[Libwebsockets] Interaction of external threads with libwebsockets server

Thomas Spitz thomas.spitz at hestia-france.com
Fri Jan 3 15:05:37 CET 2014


Hello Andy,

It doesn't seem to work.

To debug, I added the following printf calls to libwebsockets.c:

#ifdef LWS_HAS_PPOLL
    /*
     * if we changed something in this pollfd...
     *   ... and we're running in a different thread context
     *     than the service thread...
     *       ... and the service thread is waiting in ppoll()...
     *          then fire a SIGUSR2 at the service thread to force it to
     *             restart the ppoll() with our changed events
     */
    if (events != context->fds[wsi->position_in_fds_table].events) {
        sampled_ppoll_tid = lws_idling_ppoll_tid;
        printf("sampled_ppoll_tid: %d\n", sampled_ppoll_tid); /* debug */
        if (sampled_ppoll_tid) {
            tid = context->protocols[0].callback(context, NULL,
                    LWS_CALLBACK_GET_THREAD_ID, NULL, NULL, 0);
            /* braces keep kill() under the tid check with the printf added */
            if (tid != sampled_ppoll_tid) {
                printf("kill(sampled_ppoll_tid, SIGUSR2)\n"); /* debug */
                kill(sampled_ppoll_tid, SIGUSR2);
            }
        }
    }
#endif

Below is the log I get when opening test.html while test-server is running
(my modified test-server.c, with an asynchronous sending thread, is
attached). The counter increases to 1, and then one minute elapses before
it increases to 2.

webserver PID : 24290
LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
START asynchronous sending
Thread PID : 24290
asynchronousSending pthread_self():-1678948608
LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
    GET URI = /
    Host = 192.168.1.6:7681
    Connection = keep-alive
    Accept: = text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
    Accept-Encoding: = gzip,deflate,sdch
    Accept-Language: = fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4
    Cache-Control: = max-age=0
    Cookie: = test=LWS_1388753472_194286_COOKIE
LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
sampled_ppoll_tid: -1657694464
LWS_CALLBACK_GET_THREAD_ID pthread_self():-1678948608
kill(sampled_ppoll_tid, SIGUSR2)
    GET URI = /libwebsockets.org-logo.png
    Host = 192.168.1.6:7681
    Connection = keep-alive
    Accept: = image/webp,*/*;q=0.8
    Accept-Encoding: = gzip,deflate,sdch
    Accept-Language: = fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4
    Cache-Control: = max-age=0
    Cookie: = test=LWS_1388753472_194286_COOKIE
    Referer: = https://192.168.1.6:7681/
    GET URI = /xxx
    Host = 192.168.1.6:7681
    Connection = Upgrade
    Protocol = dumb-increment-protocol
    Upgrade = websocket
    Origin = https://192.168.1.6:7681
    Key = flv4nyR7f+VEHFSWJXQxBA==
    Version = 13
    Extensions = x-webkit-deflate-frame
    Pragma: = no-cache
    Cache-Control: = no-cache
    Cookie: = test=LWS_1388753472_194286_COOKIE
LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
    GET URI = /xxx
    Host = 192.168.1.6:7681
    Connection = Upgrade
    Protocol = lws-mirror-protocol
    Upgrade = websocket
     Origin = https://192.168.1.6:7681
    Key = OXYhUJt7pkvdd72hJzPp5w==
    Version = 13
sampled_ppoll_tid: 0
LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
    Extensions = x-webkit-deflate-frame
    Pragma: = no-cache
    Cache-Control: = no-cache
    Cookie: = test=LWS_1388753472_194286_COOKIE
sampled_ppoll_tid: -1657694464
LWS_CALLBACK_GET_THREAD_ID pthread_self():-1678948608
kill(sampled_ppoll_tid, SIGUSR2)
LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
sampled_ppoll_tid: 0
LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
LWS_CALLBACK_GET_THREAD_ID pthread_self():-1657694464
    GET URI = /favicon.ico
    Host = 192.168.1.6:7681
    Connection = keep-alive
    Accept: = */*
    Accept-Encoding: = gzip,deflate,sdch
    Accept-Language: = fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4
    Cookie: = test=LWS_1388753472_194286_COOKIE
sampled_ppoll_tid: -1657694464
LWS_CALLBACK_GET_THREAD_ID pthread_self():-1678948608
kill(sampled_ppoll_tid, SIGUSR2)
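
One possible reading of this log, offered as an assumption and not confirmed
anywhere in this thread: the value handed to kill() is the pthread_self()
value of the service thread (-1657694464), whereas kill() expects a process
ID, and a pid below -1 is treated as a process group ID. The sketch below is
a standalone illustration, not the libwebsockets implementation, of waking a
thread that sleeps in ppoll() by signalling it with pthread_kill(), which
does take a pthread_t. The names wake_handler, external_thread and
ppoll_wake_demo are hypothetical.

```c
#define _GNU_SOURCE             /* ppoll() is a GNU/Linux extension */
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Do-nothing handler: it exists only so SIGUSR2 interrupts ppoll(). */
static void wake_handler(int sig)
{
    (void)sig;
}

/* Hypothetical external thread: signals the service thread by pthread_t. */
static void *external_thread(void *arg)
{
    pthread_t service = *(pthread_t *)arg;
    struct timespec delay = { 0, 100 * 1000 * 1000 };   /* 100 ms */

    nanosleep(&delay, NULL);
    pthread_kill(service, SIGUSR2);     /* thread-directed, unlike kill() */
    return NULL;
}

/* Returns 1 if ppoll() was interrupted, 0 on timeout, -1 on error. */
int ppoll_wake_demo(void)
{
    struct sigaction sa;
    sigset_t block, orig, waitmask;
    struct timespec timeout = { 5, 0 };  /* long idle timeout: 5 s */
    pthread_t self = pthread_self();
    pthread_t worker;
    int n, err;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = wake_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR2, &sa, NULL) < 0)
        return -1;

    /* Keep SIGUSR2 blocked outside ppoll() so a wake-up cannot be lost. */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR2);
    if (pthread_sigmask(SIG_BLOCK, &block, &orig) != 0)
        return -1;
    waitmask = orig;
    sigdelset(&waitmask, SIGUSR2);

    if (pthread_create(&worker, NULL, external_thread, &self) != 0) {
        pthread_sigmask(SIG_SETMASK, &orig, NULL);
        return -1;
    }

    /* ppoll() installs waitmask only while waiting, so SIGUSR2 gets in. */
    n = ppoll(NULL, 0, &timeout, &waitmask);
    err = errno;

    pthread_join(worker, NULL);
    pthread_sigmask(SIG_SETMASK, &orig, NULL);

    if (n < 0)
        return err == EINTR ? 1 : -1;
    return 0;
}
```

Calling ppoll_wake_demo() from a main() and checking that it returns 1 shows
the wait ending after about 100 ms instead of running to the 5 s timeout.
Build with -pthread.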

I will continue investigating.

Thanks for the patch anyway.

BR,
Thomas


2014/1/3 "Andy Green (林安廸)" <andy at warmcat.com>

> On 01/01/14 22:38, the mail apparently from Andy Green included:
>
>
>>
>> Thomas Spitz <thomas.spitz at hestia-france.com> wrote:
>>
>>> Hello Andy,
>>>
>>> Have you had time to try replacing poll() with ppoll(), so that the
>>> poll can be interrupted by a signal from an external thread?
>>>
>>
>> Not yet... in Taiwan the big holiday is Chinese New Year in a few weeks.
>>  I'm still interested in doing it, the weekend is the most likely time.
>>
>
> Please have a look at this:
>
> http://git.libwebsockets.org/cgi-bin/cgit/libwebsockets/commit/?id=3b3fa9e2086da6157289141e0b6fe1e5035bad25
>
> I didn't test it because I don't have any threaded user code, but it
> should be pretty close if not workable already.
>
> Note the comment in the commit log, you have to actively enable this code
> (I wasn't able to find a way for the compiler to understand if it had
> ppoll() or not).
>
> ppoll() is a GNU extension so if this is useful, we'll need to add it as a
> CMake-time option.
>
> -Andy
>
>> -Andy
>>
>>> At present your lib works very well with intensive data submission
>>> from an external thread.
>>>
>>> Happy new year to everyone.
>>>
>>> BR,
>>>
>>> Thomas
>>>
>>>
>>> On 25 Dec 2013 15:24, Andy Green (林安廸) <andy at warmcat.com> wrote:
>>>
>>>> On 25/12/13 20:14, the mail apparently from Thomas Spitz included:
>>>>
>>>>>     The choices seem to boil down to this kind of "add a fake
>>>>>     descriptor" thing (although everything, including the "interrupt
>>>>>     the poll" descriptor and the use of it should be defined inside
>>>>>     the library), or maybe change to use ppoll() and fire signals at
>>>>>     it.
>>>>>
>>>>> ppoll() could be an interesting solution but it only interrupts the
>>>>> poll.
>>>>
>>>> I think that's all we need to do.
>>>>
>>>> Latency is only coming this way on an idle system where we are
>>>> sleeping in the poll(), but another thread asked to change what a
>>>> pollfd was waiting on.
>>>>
>>>> As you pointed out originally, under those circumstances the changed
>>>> pollfd rules won't be seen and handled -- if every fd is idle for the
>>>> events it started out with -- until the poll() timeout expires.
>>>>
>>>> Although in other use-cases this isn't that realistic as a problem,
>>>> since some deal with tens of thousands of simultaneous connections and
>>>> usually someone is breaking the poll after a short time for service,
>>>> in other use cases it is realistic.  You can attack it by reducing the
>>>> poll sleep period but then you're looking at maybe hundreds of wakes a
>>>> second on what should be an idle system, needlessly bad for power.
>>>>
>>>> If we provided a way for those use-cases to have very long poll()
>>>> timeouts and minimal latency it's good I think, so long as it doesn't
>>>> burden or make problems when it's not wanted or needed.
>>>>
>>>>> I was thinking of interrupting poll() using a named pipe in which I
>>>>> would have told lws which wsi it needs to write to. The complete
>>>>> process would have been the following:
>>>>
>>>> No it's not a good way... lws already has a good semantic in poll()
>>>> for understanding who needed service.  This would be a lot of new
>>>> stuff doing the same job that only works in the multithreaded case.
>>>>
>>>>> 1) Before libwebsocket_create_context(), I create the named pipe.
>>>>> 2) For every client connection, I allocate a shm in
>>>>> LWS_CALLBACK_ESTABLISHED through which I will share incoming data
>>>>> with my main thread.
>>>>> 3) My main thread processes the incoming data and stores the answer
>>>>> in the shm. It then tells lws that an answer is ready for a given
>>>>> wsi by writing the ID of the shm to the named pipe.
>>>>> 4) lws poll() is interrupted and it knows immediately which wsi it
>>>>> needs to write to thanks to the ID of the shm. If the wsi is closed
>>>>> in the meantime, lws indicates it to the shm in LWS_CALLBACK_CLOSED.
>>>>>
>>>>> If I use ppoll(), I could keep almost the same principle but I would
>>>>> then need to add a SIGUSR1 and a handler OR loop through my client
>>>>> shm array each time ppoll() is interrupted with EINTR... In the end
>>>>> I am still wondering whether my solution is not simpler?
>>>>>
>>>>>
>>>> That solution is basically a threaded rewrite of lws not using
>>>> poll(). If you're interested to do that I don't want to discourage
>>>> you, but it's something different from lws then.  Of course lws is
>>>> liberally licensed so you're welcome to build on it if you have a
>>>> compatible license.
>>>>
>>>> However, if you think about larger scale servers, which do exist
>>>> using lws, "knowing the exact (single) wsi" that woke it is not
>>>> useful when there may be hundreds of fds needing service each poll().
>>>>
>>>>>     Either way lws_change_pollfd() is central to the solution.
>>>>>
>>>>> With my solution or even the ppoll one, I don't see when I need to
>>>>> call lws_change_pollfd(), especially as lws_change_pollfd() needs a
>>>>> pointer to a wsi, which I cannot give as my interrupt concerns the
>>>>> complete context and not a particular wsi...? I must be missing a
>>>>> point.
>>>>>
>>>>>
>>>> lws_change_pollfd() is the point that any code which wants to change
>>>> the events on a pollfd ends up at now.  And changing the event on a
>>>> pollfd is the definition of the cause of latency (when poll() is idle
>>>> and with relatively long timeout).
>>>>
>>>> So whether it is doing rx flow control or wait on being able to send,
>>>> that function is the place to signal to break the poll() one way or
>>>> the other.
>>>>
>>>>>     If you pick a signal like SIGUSR1 and install a do-nothing
>>>>>     handler for it, firing SIGUSR1 at the process from itself in
>>>>>     lws_change_pollfd() and using ppoll() could be a really small
>>>>>     and robust solution.
>>>>>     Since the signal is handled it doesn't do anything except
>>>>>     interrupt the ppoll causing a pollfd reload.
>>>>>     You only need to fire the signal the first time anything wants
>>>>>     to interrupt the wait *from another thread* (because if the lws
>>>>>     thread is in poll(), it isn't doing anything else).  If a pollfd
>>>>>     raced it and changed first, there's no problem with an additional
>>>>>     signal interrupting the next ppoll loop.
>>>>>
>>>>> If it is not too much to ask, a simple code example would be ideal.
>>>>>
>>>> I may have some time tomorrow to give this a try.
>>>>
>>>> -Andy
>>>>
>>>
>
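
For comparison, the "add a fake descriptor" option mentioned in the quoted
discussion can be illustrated with a pipe whose read end sits in the poll()
set: another thread writes one byte to the write end and the sleeping poll()
returns. This is a minimal standalone sketch under that assumption, not code
from libwebsockets; wake_pipe, waker_thread and pipe_wake_demo are
hypothetical names.

```c
#include <assert.h>
#include <poll.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/* [0] is polled by the service thread, [1] is written by other threads. */
static int wake_pipe[2];

/* Hypothetical external thread: one byte on the pipe wakes the poll(). */
static void *waker_thread(void *arg)
{
    struct timespec delay = { 0, 100 * 1000 * 1000 };   /* 100 ms */
    char byte = 'w';

    (void)arg;
    nanosleep(&delay, NULL);
    if (write(wake_pipe[1], &byte, 1) != 1)
        return (void *)1;
    return NULL;
}

/* Returns 1 if poll() was woken through the pipe, 0 if not, -1 on error. */
int pipe_wake_demo(void)
{
    struct pollfd pfd;
    pthread_t worker;
    char byte;
    int n, woke = 0;

    if (pipe(wake_pipe) < 0)
        return -1;
    if (pthread_create(&worker, NULL, waker_thread, NULL) != 0) {
        close(wake_pipe[0]);
        close(wake_pipe[1]);
        return -1;
    }

    /* In a real server this entry would sit next to the socket pollfds. */
    pfd.fd = wake_pipe[0];
    pfd.events = POLLIN;
    pfd.revents = 0;

    n = poll(&pfd, 1, 5000);            /* long idle timeout: 5 s */

    /* Drain the wake-up byte so the next poll() can sleep again. */
    if (n == 1 && (pfd.revents & POLLIN) &&
        read(wake_pipe[0], &byte, 1) == 1)
        woke = 1;

    pthread_join(worker, NULL);
    close(wake_pipe[0]);
    close(wake_pipe[1]);
    return n < 0 ? -1 : woke;
}
```

Unlike the signal approach this needs no handler or signal mask, at the cost
of one extra descriptor in the poll set. Build with -pthread.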
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test-server_PC.c
Type: text/x-csrc
Size: 25338 bytes
Desc: not available
URL: <https://libwebsockets.org/pipermail/libwebsockets/attachments/20140103/6128535d/attachment.bin>

