[Libwebsockets] Tight loop 100% CPU with [SSL_connect WANT_... retrying]

Andy Green andy at warmcat.com
Thu Dec 3 14:41:00 CET 2015



On 3 December 2015 21:06:14 GMT+08:00, Andrejs Hanins <andrejs.hanins at ubnt.com> wrote:
>Hi Andy
>
>On 11/24/2015 09:58 AM, Andy Green wrote:
>> 
>> 
>> On 24 November 2015 15:17:00 GMT+08:00, Andrejs Hanins <andrejs.hanins at ubnt.com> wrote:
>>> Andy,
>>>
>>> On 11/24/2015 02:42 AM, Andy Green wrote:
>>>>
>>>>
>>>> On 23 November 2015 19:46:17 GMT+08:00, Andrejs Hanins <andrejs.hanins at ubnt.com> wrote:
>>>>> Hi,
>>>>>
>>>>> 	I'm getting 100% CPU load when an LWS client in SSL mode connects to
>>>>> a TCP port which accepts the connection at TCP level but doesn't respond
>>>>> to the SSL "Client hello". It typically happens when the server process is
>>>>> very busy or hung (I test with kill -stop PID). The LWS client log outputs
>>>>> lots of "SSL_connect WANT_... retrying" messages which seem to come
>>>>> without any delay, causing 100% CPU. Also, for each such message I
>>>>> get an LWS_CALLBACK_CHANGE_MODE_POLL_FD callback with POLLIN + POLLOUT
>>>>> events set, although the current events are already set to POLLIN and
>>>>> POLLOUT.
>>>>>
>>>>> 	Isn't 100% CPU load something which should not happen in this
>>>>> situation? Any suggestions to avoid it? It is quite frustrating that
>>>>> clients which try to connect to busy servers eat 100% CPU.
>>>>
>>>> Yes... that's why there are so many states around the connection, so it can go back to the event loop and pick it up when something happens.
>>>>
>>>> Wanting read is difficult though.  OpenSSL may want to read something, but he may not be able to succeed at that until he has written something, e.g., to update keys.  That's why it's spinning: he is writeable, but this time that wasn't the problem.
>>>>
>>>> He should timeout though on client side.
>>> Just to clarify - the timeout on the client side does happen properly, as you
>>> said. But the CPU is at 100% while waiting for that timeout.
>>>
>>>>
>>>> I'll take a look at it tonight.
>>> Thanks Andy. Feel free to send me private patches for testing or
>>> debugging.
>> 
>> Please try this
>> 
>> https://github.com/warmcat/libwebsockets/commit/1728988efa97aefdcf6c4feb06877e460880cda2
>> 
>> It just treats wanting read or write at face value, separately.
>
>We've just hit another situation with 100% CPU load, presumably caused by the LWS client.
>Information is limited; we only know that the spinning code definitely goes through
>lws_ssl_remove_wsi_from_buffered_list and _lws_log many times. The last message from LWS
>before the 100% CPU load was "accepting self-signed certificate". As before, it could be
>some half-closed connection from the server.
>
>Maybe a similar correction is needed somewhere else, in the later phases of SSL connection
>establishment, when the server dies in the middle and does not close the TCP connection?
>
>The problem is that we can't reproduce it at the moment. Maybe you can at least pinpoint
>some particular places in the LWS code which might produce tight loops, so that we can try to
>make a test by stopping the LWS server exactly when the LWS client executes some suspicious line?

Tight loops are meant to be forbidden since it's all nonblocking.
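
For illustration only, here is a minimal sketch of why waiting on exactly the wanted event avoids a spin. It is not the lws code; the enum and function names are hypothetical stand-ins for the SSL_ERROR_WANT_READ / SSL_ERROR_WANT_WRITE results that SSL_get_error() reports. An idle connected socket is almost always writeable, so arming POLLOUT when only a read is wanted makes poll() return immediately every time.

```c
#include <poll.h>

/* Hypothetical stand-ins for what SSL_get_error() reports when
 * SSL_connect() returns before the handshake has completed. */
enum hs_want { HS_DONE, HS_WANT_READ, HS_WANT_WRITE };

/* Return the poll() event mask to wait on before retrying the handshake. */
static short events_for_want(enum hs_want want)
{
    switch (want) {
    case HS_WANT_READ:
        return POLLIN;   /* sleep until the peer actually sends something */
    case HS_WANT_WRITE:
        return POLLOUT;  /* sleep until the socket can take more data */
    default:
        return 0;        /* handshake finished: nothing to wait for here */
    }
}
```

The returned mask would be written into the events field of the connection's struct pollfd before going back to the event loop.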
 
Please try the patch I just pushed, which makes it treat a 0 return from ssl_read / write as a shutdown connection, and see if that helps.
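
As a rough sketch of the idea (not the actual patch; the names below are hypothetical), the return value n of a TLS read or write call can be classified so that 0 ends the connection instead of being retried:

```c
/* Hypothetical classification of a TLS read/write return value. */
enum io_result { IO_DATA, IO_CLOSED, IO_RETRY_OR_ERROR };

static enum io_result classify_io_return(int n)
{
    if (n > 0)
        return IO_DATA;           /* n bytes were transferred */
    if (n == 0)
        return IO_CLOSED;         /* treat as shutdown: close, do not retry */
    return IO_RETRY_OR_ERROR;     /* negative: check for WANT_READ/WANT_WRITE or a real error */
}
```

If a 0 return were instead treated as "try again", a dead peer could keep the connection bouncing through the event loop indefinitely.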

-Andy

>Thanks a lot in advance!
>
>> 
>> -Andy
>> 
>>>>
>>>> -Andy
>>>>
>>>>> 	Thanks in advance!
>>>>>
>>>>> BR, Andrey
>>>>> _______________________________________________
>>>>> Libwebsockets mailing list
>>>>> Libwebsockets at ml.libwebsockets.org
>>>>> http://ml.libwebsockets.org/mailman/listinfo/libwebsockets
>>>>
>> 



