[Libwebsockets] Interaction of external threads with libwebsockets server

Andy Green andy at warmcat.com
Fri Jan 3 15:38:51 CET 2014



Thomas Spitz <thomas.spitz at hestia-france.com> wrote:
>Hello Andy,
>
>It doesn't seem to work.
>
>In order to debug, I added the following printf in libwebsockets.c
>#ifdef LWS_HAS_PPOLL
>/*
> * if we changed something in this pollfd...
> *   ... and we're running in a different thread context
> *     than the service thread...
> *       ... and the service thread is waiting in ppoll()...
> *          then fire a SIGUSR2 at the service thread to force it to
> *             restart the ppoll() with our changed events
> */
>if (events != context->fds[wsi->position_in_fds_table].events) {
>    sampled_ppoll_tid = lws_idling_ppoll_tid;
>    printf("sampled_ppoll_tid: %d\n", sampled_ppoll_tid); /* added for debug */
>    if (sampled_ppoll_tid) {
>        tid = context->protocols[0].callback(context, NULL,
>                LWS_CALLBACK_GET_THREAD_ID, NULL, NULL, 0);
>        /* braces needed: without them the added printf is the only
>         * guarded statement and kill() runs unconditionally */
>        if (tid != sampled_ppoll_tid) {
>            printf("kill(sampled_ppoll_tid, SIGUSR2)\n"); /* added for debug */
>            kill(sampled_ppoll_tid, SIGUSR2);
>        }
>    }
>}
>#endif
>
>Below is the log I get when opening test.html while test-server.c is
>running (enclosed is my modified test-server.c with an asynchronous
>sending thread). The counter is increased to 1 and then 1 minute elapses
>before it is increased to 2.
>
>webserver PID : 24290
>LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
>START asynchronous sending
>Thread PID : 24290
>asynchronousSending pthread_self():-1678948608
>LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
>LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
>LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
>LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
>    GET URI = /
>    Host = 192.168.1.6:7681
>    Connection = keep-alive
>    Accept: =
>text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
>    Accept-Encoding: = gzip,deflate,sdch
>    Accept-Language: = fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4
>    Cache-Control: = max-age=0
>    Cookie: = test=LWS_1388753472_194286_COOKIE
>LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
>LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
>LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
>LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
>LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
>LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
>LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
>LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
>LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
>LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
>sampled_ppoll_tid: -1657694464
>LWS_CALLBACK_GET_THREAD_ID pthread_self():-1678948608
>kill(sampled_ppoll_tid, SIGUSR2)
>    GET URI = /libwebsockets.org-logo.png
>    Host = 192.168.1.6:7681
>    Connection = keep-alive
>    Accept: = image/webp,*/*;q=0.8
>    Accept-Encoding: = gzip,deflate,sdch
>    Accept-Language: = fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4
>    Cache-Control: = max-age=0
>    Cookie: = test=LWS_1388753472_194286_COOKIE
>    Referer: = https://192.168.1.6:7681/
>    GET URI = /xxx
>    Host = 192.168.1.6:7681
>    Connection = Upgrade
>    Protocol = dumb-increment-protocol
>    Upgrade = websocket
>    Origin = https://192.168.1.6:7681
>    Key = flv4nyR7f+VEHFSWJXQxBA==
>    Version = 13
>    Extensions = x-webkit-deflate-frame
>    Pragma: = no-cache
>    Cache-Control: = no-cache
>    Cookie: = test=LWS_1388753472_194286_COOKIE
>LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464

It's better not to use pthread_self() but the API that gives the int thread id, since we store it in an int (the negative values in your log look like a pthread_t truncated to an int).
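
A minimal sketch of getting an int thread id, assuming Linux (the raw gettid syscall is Linux-specific). The helper name current_thread_id() is made up for illustration and is not an lws API. The LWS_CALLBACK_GET_THREAD_ID case of the protocols[0] callback could return a value like this instead of pthread_self():

```c
#define _GNU_SOURCE /* for syscall() on glibc */
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, Linux only: kernel thread id of the calling thread.
 * Unlike pthread_t it is a small positive integer that fits in an int.
 */
static int current_thread_id(void)
{
    long tid = syscall(SYS_gettid);

    assert(tid > 0); /* gettid cannot fail on Linux */
    return (int)tid;
}
```

In the main thread of a process this value is the same as getpid(); every other thread gets its own distinct id.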

>    GET URI = /xxx
>    Host = 192.168.1.6:7681
>    Connection = Upgrade
>    Protocol = lws-mirror-protocol
>    Upgrade = websocket
>     Origin = https://192.168.1.6:7681
>    Key = OXYhUJt7pkvdd72hJzPp5w==
>    Version = 13
>sampled_ppoll_tid: 0
>LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
>    Extensions = x-webkit-deflate-frame
>    Pragma: = no-cache
>    Cache-Control: = no-cache
>    Cookie: = test=LWS_1388753472_194286_COOKIE
>sampled_ppoll_tid: -1657694464
>LWS_CALLBACK_GET_THREAD_ID pthread_self():-1678948608
>kill(sampled_ppoll_tid, SIGUSR2)
>LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
>sampled_ppoll_tid: 0
>LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
>LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
>LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
>LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
>    GET URI = /favicon.ico
>    Host = 192.168.1.6:7681
>    Connection = keep-alive
>    Accept: = */*
>    Accept-Encoding: = gzip,deflate,sdch
>    Accept-Language: = fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4
>    Cookie: = test=LWS_1388753472_194286_COOKIE
>sampled_ppoll_tid: -1657694464
>LWS_CALLBACK_GET_THREAD_ID pthread_self():-1678948608
>kill(sampled_ppoll_tid, SIGUSR2)
>
>I continue my investigation.

I can't make much sense of the logging without timestamping.

You can test the ppoll side on its own: make the service thread wait in ppoll() with a long timeout, fire SIGUSR2 at it with kill from another terminal, and confirm that the ppoll() returns early.
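
A rough sketch of that standalone test, assuming Linux/glibc where ppoll() is exposed under _GNU_SOURCE. prepare_wake_signal() and wait_in_ppoll() are hypothetical names, not lws functions. The signal is kept blocked outside ppoll() and unblocked only by the mask passed to ppoll(), so a signal that arrives slightly early is not lost:

```c
#define _GNU_SOURCE /* for ppoll() on glibc */
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Do-nothing handler: its only job is to make ppoll() return with EINTR. */
static void wake_handler(int sig)
{
    (void)sig;
}

/*
 * Install the handler and keep SIGUSR2 blocked outside ppoll(), so a signal
 * that arrives before the wait stays pending.
 * Returns 0 on success, -1 on error.
 */
static int prepare_wake_signal(void)
{
    struct sigaction sa;
    sigset_t block;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = wake_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR2, &sa, NULL) < 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR2);
    return sigprocmask(SIG_BLOCK, &block, NULL);
}

/*
 * Wait in ppoll() on no descriptors at all.
 * Returns 1 if a signal interrupted the wait, 0 on timeout, -1 on error.
 */
static int wait_in_ppoll(int timeout_sec)
{
    struct timespec ts;
    sigset_t during;

    assert(timeout_sec >= 0);
    ts.tv_sec = timeout_sec;
    ts.tv_nsec = 0;

    /* mask used only while inside ppoll(): current mask minus SIGUSR2 */
    if (sigprocmask(SIG_SETMASK, NULL, &during) < 0)
        return -1;
    sigdelset(&during, SIGUSR2);

    if (ppoll(NULL, 0, &ts, &during) < 0)
        return errno == EINTR ? 1 : -1;

    return 0;
}
```

To try it by hand, call prepare_wake_signal() and then wait_in_ppoll(600) from a small main() that prints getpid(), and run `kill -USR2 <pid>` from another terminal. The wait should return 1 right away rather than after ten minutes.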

Testing the whole thing needs the main ws protocol connection to be connected and idle until the other thread provokes it to do something.
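
For reference, the decision the other thread has to make can be sketched in isolation as below. This mirrors the check in the snippet quoted above, but it is only a sketch under assumptions: idling_tid stands in for lws_idling_ppoll_tid, wake_service_if_needed() is a hypothetical name, and it uses the raw Linux tgkill syscall to target a single thread where the quoted snippet uses kill().

```c
#define _GNU_SOURCE /* for syscall() on glibc */
#include <assert.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Nonzero (the service thread's tid) only while it waits in ppoll().
 * A real multithreaded build would want a proper atomic here. */
static volatile int idling_tid;

/* Counts handler runs, for demonstration only. */
static volatile sig_atomic_t wakeups;

static void wake_handler(int sig)
{
    (void)sig;
    wakeups++;
}

static int install_wake_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = wake_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR2, &sa, NULL);
}

/*
 * Called by whoever changes the events on a pollfd.
 * Returns 1 if a wake signal was sent, 0 if none was needed, -1 on error.
 */
static int wake_service_if_needed(short old_events, short new_events,
                                  int caller_tid)
{
    int sampled = idling_tid; /* sample once, it may change under us */

    assert(caller_tid > 0);
    if (old_events == new_events)
        return 0; /* nothing changed */
    if (!sampled)
        return 0; /* service thread is not waiting in ppoll() */
    if (sampled == caller_tid)
        return 0; /* we are the service thread ourselves */

    if (syscall(SYS_tgkill, getpid(), sampled, SIGUSR2) < 0)
        return -1;
    return 1;
}
```

The end-to-end test is then: the connection is idle, the service thread has stored its tid in idling_tid and is waiting, and the first events change from another thread returns 1 here and ends the wait.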

-Andy

>Thanks for the patch anyway.
>
>BR,
>Thomas
>
>
>2014/1/3 "Andy Green (林安廸)" <andy at warmcat.com>
>
>> On 01/01/14 22:38, the mail apparently from Andy Green included:
>>
>>
>>>
>>> Thomas Spitz <thomas.spitz at hestia-france.com> wrote:
>>>
>>>> Hello Andy,
>>>>
>>>> Have you had some time to try replacing poll by ppoll in order to
>>>> have poll triggered on a signal from an external thread?
>>>
>>> Not yet... in Taiwan the big holiday is Chinese New Year in a few
>>> weeks.  I'm still interested in doing it, the weekend is the most
>>> likely time.
>>
>> Please have a look at this:
>>
>> http://git.libwebsockets.org/cgi-bin/cgit/libwebsockets/commit/?id=3b3fa9e2086da6157289141e0b6fe1e5035bad25
>>
>> I didn't test it because I don't have a threaded user code, but it
>> should be pretty close if not workable already.
>>
>> Note the comment in the commit log, you have to actively enable this
>> code (I wasn't able to find a way for the compiler to understand if it
>> had ppoll() or not).
>>
>> ppoll() is a GNU extension so if this is useful, we'll need to add it
>> as a CMake-time option.
>>
>> -Andy
>>
>>
>>
>>  -Andy
>>>
>>>> At present your lib works very well with intensive data submission
>>>> from an external thread.
>>>>
>>>> Happy new year to everyone.
>>>>
>>>> BR,
>>>>
>>>> Thomas
>>>>
>>>>
>>>> On 25 Dec 2013 15:24, Andy Green (林安廸) <andy at warmcat.com> wrote:
>>>>
>>>>> On 25/12/13 20:14, the mail apparently from Thomas Spitz included:
>>>>>
>>>>>>     The choices seem to boil down to this kind of "add a fake
>>>>>>     descriptor" thing (although everything, including the
>>>>>>     "interrupt the poll" descriptor and the use of it should be
>>>>>>     defined inside the library), or maybe change to use ppoll()
>>>>>>     and fire signals at it.
>>>>>>
>>>>>> ppoll() could be an interesting solution but it only interrupts
>>>>>> the poll.
>>>>>
>>>>> I think that's all we need to do.
>>>>>
>>>>> Latency is only coming this way on an idle system where we are
>>>>> sleeping in the poll(), but another thread asked to change what a
>>>>> pollfd was waiting on.
>>>>>
>>>>> As you pointed out originally, under those circumstances the changed
>>>>> pollfd rules won't be seen and handled -- if every fd is idle for the
>>>>> events it started out with -- until the poll() timeout expires.
>>>>>
>>>>> Although in other use-cases this isn't that realistic as a problem,
>>>>> since some deal with tens of thousands of simultaneous connections
>>>>> and usually someone is breaking the poll after a short time for
>>>>> service, in other use cases it is realistic.  You can attack it by
>>>>> reducing the poll sleep period but then you're looking at maybe
>>>>> hundreds of wakes a second on what should be an idle system,
>>>>> needlessly bad for power.
>>>>>
>>>>> If we provided a way for those use-cases to have very long poll()
>>>>> timeouts and minimal latency it's good I think, so long as it doesn't
>>>>> burden or make problems when it's not wanted or needed.
>>>>>
>>>>>> I was thinking of interrupting poll() using a named pipe in which I
>>>>>> would have told lws which wsi it needs to write to. The complete
>>>>>> process would have been the following:
>>>>>
>>>>> No it's not a good way... lws already has a good semantic in poll()
>>>>> for understanding who needed service.  This would be a lot of new
>>>>> stuff doing the same job that only works in the multithreaded case.
>>>>>
>>>>>> 1) Before libwebsocket_create_context(), I create the named pipe.
>>>>>> 2) For every client connection, I book a shm in
>>>>>>    LWS_CALLBACK_ESTABLISHED through which I will share incoming
>>>>>>    data with my main thread.
>>>>>> 3) My main thread processes the incoming data and stores the answer
>>>>>>    in the shm. It then indicates to lws that an answer is ready for
>>>>>>    a given wsi by writing the ID of the shm to the named pipe.
>>>>>> 4) lws poll() is interrupted and it knows immediately which wsi it
>>>>>>    needs to write to thanks to the ID of the shm. If the wsi is
>>>>>>    closed in the meantime, lws indicates it to the shm in
>>>>>>    LWS_CALLBACK_CLOSED.
>>>>>>
>>>>>> If I use ppoll(), I could keep almost the same principle but I would
>>>>>> then need to add a SIGUSR1 and a handler OR loop through my client
>>>>>> shm array each time ppoll() got interrupted with the EINTR flag
>>>>>> set... Finally I am still wondering whether my solution is not
>>>>>> simpler?
>>>>>>
>>>>>>
>>>>> That solution is basically a threaded rewrite of lws not using
>>>>> poll(). If you're interested to do that I don't want to discourage
>>>>> you, but it's something different from lws then.  Of course lws is
>>>>> liberally licensed so you're welcome to build on it if you have a
>>>>> compatible license.
>>>>>
>>>>> However, if you think about larger scale servers, which do exist
>>>>> using lws, "knowing the exact (single) wsi" that woke it is not
>>>>> useful when there may be hundreds of fds needing service each poll().
>>>>>
>>>>>>     Either way lws_change_pollfd() is central to the solution.
>>>>>>
>>>>>> With my solution or even the ppoll one, I don't see when I need to
>>>>>> call lws_change_pollfd(), especially as lws_change_pollfd() needs a
>>>>>> pointer to a wsi which I cannot give, as my interrupt concerns the
>>>>>> complete context and not a particular wsi...? I must be missing a
>>>>>> point.
>>>>>>
>>>>>>
>>>>> lws_change_pollfd() is the point that any code which wants to change
>>>>> the events on a pollfd ends up at now.  And changing the event on a
>>>>> pollfd is the definition of the cause of latency (when poll() is idle
>>>>> and with relatively long timeout).
>>>>>
>>>>> So whether it is doing rx flow control or wait on being able to send,
>>>>> that function is the place to signal to break the poll() one way or
>>>>> the other.
>>>>
>>>>>
>>>>>>     If you pick a signal like SIGUSR1 and install a do-nothing
>>>>>>     handler for it, firing SIGUSR1 at the process from itself in
>>>>>>     lws_change_pollfd() and using ppoll() could be a really small
>>>>>>     and robust solution.
>>>>>>     Since the signal is handled it doesn't do anything except
>>>>>>     interrupt the ppoll causing a pollfd reload.
>>>>>>     You only need to fire the signal the first time anything wants
>>>>>>     to interrupt the wait *from another thread* (because if the lws
>>>>>>     thread is in poll(), it isn't doing anything else).  If a pollfd
>>>>>>     raced it and changed first, there's no problem with an
>>>>>>     additional signal interrupting the next ppoll loop.
>>>>>>
>>>>>> Ideally, if it is not too much to ask, a simple example of code
>>>>>> would be ideal.
>>>>>>
>>>>>>
>>>>> I may have some time tomorrow to give this a try.
>>>>>
>>>>> -Andy
>>>>>
>>>>
>>
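
For comparison, the "add a fake descriptor" alternative mentioned in the quoted discussion can be sketched as below. This is the generic self-pipe idea with plain poll(), not anything lws provides; struct waker and the waker_*() functions are hypothetical names. In a real server the read end would be one more entry in the pollfd array next to the sockets.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical wake-up channel: a pipe whose read end sits in the poll set. */
struct waker {
    int rd;
    int wr;
};

/* Create the pipe and make both ends non-blocking. Returns 0 or -1. */
static int waker_init(struct waker *w)
{
    int fds[2], i, fl;

    assert(w != NULL);
    if (pipe(fds) < 0)
        return -1;
    for (i = 0; i < 2; i++) {
        fl = fcntl(fds[i], F_GETFL);
        if (fl < 0 || fcntl(fds[i], F_SETFL, fl | O_NONBLOCK) < 0) {
            close(fds[0]);
            close(fds[1]);
            return -1;
        }
    }
    w->rd = fds[0];
    w->wr = fds[1];
    return 0;
}

/* Any thread: make the read end readable so the waiting poll() returns. */
static int waker_kick(const struct waker *w)
{
    char b = 0;
    ssize_t n = write(w->wr, &b, 1);

    /* EAGAIN means the pipe is already full of pending wake-ups: fine */
    return (n == 1 || (n < 0 && errno == EAGAIN)) ? 0 : -1;
}

/* Service thread: returns 1 if kicked, 0 on timeout, -1 on error. */
static int waker_wait(const struct waker *w, int timeout_ms)
{
    struct pollfd pfd = { .fd = w->rd, .events = POLLIN };
    char buf[64];
    int n = poll(&pfd, 1, timeout_ms);

    if (n < 0)
        return -1;
    if (n == 0)
        return 0;

    /* drain every pending wake-up byte so the next wait really sleeps */
    while (read(w->rd, buf, sizeof(buf)) > 0)
        ;
    return 1;
}
```

Several kicks before the service thread wakes collapse into a single wake-up, which matches the point in the quoted text that one interruption is enough to force a pollfd reload.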



