[Libwebsockets] Fast loop of LWS_CALLBACK_GET_THREAD_ID events.

Andy Green andy at warmcat.com
Sun Apr 26 00:15:36 CEST 2015



On 26 April 2015 01:53:31 GMT+08:00, Andrey Pokrovskiy <wonder.mice at gmail.com> wrote:
>I started to look at PONG after that comment from Bruce:
>
>> Looking at parser.c, it gets LWS_WS_OPCODE_07__CLOSE and goes to
>> process_as_ping
>
>Apparently, when lws receives CLOSE it sends a PONG back.

Ugh... I was about to correct you that it just echoes the close payload, pong-style, but in fact that change is incomplete... it sends a PONG control packet, not a CLOSE one.
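
To be clear about what is supposed to happen: under RFC 6455 a received CLOSE is answered by echoing a CLOSE frame back (carrying the peer's status code), and only a PING is answered with a PONG.  Something along these lines, although the names here are invented for the sketch and are not the actual parser.c symbols:

        /* illustrative sketch only, not the real lws code */
        switch (opcode) {
        case OPCODE_CLOSE:
                /* RFC 6455 5.5.1: echo status code + reason back in
                 * a CLOSE frame, then start tearing the session down */
                send_control(wsi, OPCODE_CLOSE, payload, len);
                start_close(wsi);
                break;
        case OPCODE_PING:
                /* RFC 6455 5.5.2 / 5.5.3: only a PING gets a PONG,
                 * carrying the same application data */
                send_control(wsi, OPCODE_PONG, payload, len);
                break;
        }

The broken path did the CLOSE bookkeeping but emitted the reply with the PONG frame type.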

I pushed a patch to fix that, thanks for finding it.  However...

>As far as I can tell, the test server and test client don't send CLOSE
>now. So maybe that's why you can't reproduce it?

... you're also right about that.  Using a browser as the client, I can reproduce it now.  The actual symptom, though, is caused by something else.

I found the root cause: the connection state we enter after preparing to send the CLOSE was never added to the list of connection states that need POLLOUT processing, so the pending CLOSE never got flushed.  A one-liner adding it fixed things here, and I pushed that too... please give it a go.
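
If it helps to picture it, the shape of the fix is something like the fragment below, sitting in the fd-service path when POLLOUT is pending.  The state names are invented for the sketch, not the actual lws source; the real change is a one-liner adding the post-CLOSE state to the set that gets POLLOUT service:

        /* POLLOUT is pending on this fd: only some states know how
         * to consume it */
        switch (wsi->state) {
        case STATE_ESTABLISHED:
        case STATE_SENT_CLOSE:  /* <-- the missing case: without it
                                 * the queued CLOSE never got flushed
                                 * and poll() kept returning
                                 * instantly, spinning at 100% CPU */
                if (lws_handle_POLLOUT_event(context, wsi, pollfd))
                        goto close_and_handled;
                break;
        default:
                break;
        }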

-Andy
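
PS: to double-check the fix locally, a long service timeout makes any spin obvious.  Roughly the test server's loop with the timeout bumped as in the diff further down; a clean close should now leave the process idle instead of eating a core:

        while (n >= 0 && !force_exit)
                n = libwebsocket_service(context, 50000);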

>On Sat, Apr 25, 2015 at 4:38 AM, Andy Green <andy at warmcat.com> wrote:
>>
>>
>> On 25 April 2015 18:13:09 GMT+08:00, Andrey Pokrovskiy
>> <wonder.mice at gmail.com> wrote:
>>>This patch fixes 100% CPU on the client during connect for me:
>>>https://github.com/wonder-mice/libwebsockets/commit/23e938f8d10aaef82600b71ac4bd30c510103553
>>>
>>>Didn't do enough testing though, so it could break a lot. But I doubt
>>>it fixes your problem.
>>>
>>>As far as I understand, after a PONG is sent the following must be
>>>called:
>>
>> I don't think any PONG is involved in what's going on in Bruce's
>> system.
>>
>> It'd be helpful if the test server was hacked to show the same thing
>> so I could see the problem.
>>
>> -Andy
>>
>>>        if (lws_change_pollfd(wsi, LWS_POLLOUT, 0))
>>>                goto failed;
>>>        lws_libev_io(context, wsi, LWS_EV_STOP | LWS_EV_WRITE);
>>>
>>>Try tracing the first lws_handle_POLLOUT_event() call after "Change
>>>poll FD fd = 8, events = 5". I think it fails to set "events" to 1
>>>for some reason.
>>>
>>>On Fri, Apr 24, 2015 at 6:22 PM, Andy Green <andy at warmcat.com> wrote:
>>>>
>>>>
>>>> On 25 April 2015 10:06:29 GMT+09:00, Andy Green <andy at warmcat.com>
>>>> wrote:
>>>>>
>>>>>
>>>>>On 25 April 2015 03:13:19 GMT+09:00, Bruce Perens
>>>>><bruce at perens.com> wrote:
>>>>>>My server currently catches LWS_CALLBACK_GET_THREAD_ID and returns
>>>>>>pthread_self(). Currently, it never
>>>>>>calls libwebsocket_callback_on_writable().
>>>>>
>>>>>Right, but something should be doing so somewhere if it sends
>>>>>anything.
>>>>>
>>>>>>Using either the released version or a pull from git today, a
>>>>>>libwebsocket_service() loop with a long delay eats the CPU while
>>>>>>a socket is closing. This is what I am seeing:
>>>>>>
>>>>>>Add poll FD, count = 1 fd = 6, events = 1      *Start of serving a
>>>>>>few files via HTML. Lots of LWS_CALLBACK_GET_THREAD_ID events
>>>>>>happening here.*
>>>>>>Add poll FD, count = 2 fd = 7, events = 1
>>>>>>Add poll FD, count = 3 fd = 8, events = 1
>>>>>>Add poll FD, count = 4 fd = 9, events = 1
>>>>>>Delete poll FD, count = 3 fd = 7, events = 0
>>>>>>Delete poll FD, count = 2 fd = 8, events = 0
>>>>>>Delete poll FD, count = 1 fd = 9, events = 0
>>>>>>Add poll FD, count = 2 fd = 8, events = 1       *Start of Websocket
>>>>>>service.*
>>>>>>Connection with protocol: radio-server-1 path: /foo
>>>>>>Change poll FD fd = 8, events = 5                  *Service is in
>>>>>
>>>>>Sorry, why is he closing?  In the interests of reproducing it.
>>>>>
>>>>>>closing state, CPU utilization goes to 100%, continuous
>>>>>>LWS_CALLBACK_GET_THREAD_ID events here.*
>>>>>>Delete poll FD, count = 1 fd = 8, events = 0    *Service is closed,
>>>>>>CPU utilization goes low again.*
>>>>>>Websocket connection closed.
>>>>>>
>>>>>>Am I providing the wrong information to the
>>>>>>LWS_CALLBACK_GET_THREAD_ID callback?
>>>>>
>>>>>No, the number reported there is opaque to lws.  Lws itself is
>>>>>truly single-threaded.
>>>>>
>>>>>I am sitting in Haneda airport at the moment with a couple of
>>>>>hours to kill; I will try to reproduce it.
>>>>
>>>> Just doing this
>>>>
>>>> diff --git a/test-server/test-server.c b/test-server/test-server.c
>>>> index 8643c93..ff969ff 100644
>>>> --- a/test-server/test-server.c
>>>> +++ b/test-server/test-server.c
>>>> @@ -973,7 +973,7 @@ int main(int argc, char **argv)
>>>>                  * the number of ms in the second argument.
>>>>                  */
>>>>
>>>> -               n = libwebsocket_service(context, 50);
>>>> +               n = libwebsocket_service(context, 50000);
>>>>  #endif
>>>>         }
>>>>
>>>> ...and then starting and killing the test client after a few
>>>> seconds does not exhibit the problem.  Top shows no load from the
>>>> server.
>>>>
>>>> Does your scenario actually promote the connection to a ws one, or
>>>> does he stay as http only?
>>>>
>>>> -Andy
>>>>
>>>>>-Andy
>>>>>
>>>>>>    Thanks
>>>>>>
>>>>>>    Bruce



