[Libwebsockets] Tight loop 100% CPU with [SSL_connect WANT_... retrying]
andrejs.hanins at ubnt.com
Thu Dec 3 14:06:14 CET 2015
On 11/24/2015 09:58 AM, Andy Green wrote:
> On 24 November 2015 15:17:00 GMT+08:00, Andrejs Hanins <andrejs.hanins at ubnt.com> wrote:
>> On 11/24/2015 02:42 AM, Andy Green wrote:
>>> On 23 November 2015 19:46:17 GMT+08:00, Andrejs Hanins <andrejs.hanins at ubnt.com> wrote:
>>>> I'm getting 100% CPU load when an LWS client in SSL mode connects to
>>>> a TCP port which accepts the connection at TCP level but doesn't respond
>>>> to the SSL "Client hello". It typically happens when the server process is
>>>> busy or hung (I test with kill -stop PID). The LWS client log outputs
>>>> lots of "SSL_connect WANT_... retrying" messages which seem to come
>>>> without any delay, causing 100% CPU. Also, for each such message I
>>>> get an LWS_CALLBACK_CHANGE_MODE_POLL_FD callback with POLLIN + POLLOUT
>>>> events set, although the current events are already set for POLLIN and
>>>> POLLOUT.
>>>> Isn't 100% CPU load something which should not happen in this
>>>> situation? Any suggestions to avoid it? It is quite frustrating that
>>>> clients which try to connect to busy servers eat 100% CPU.
>>> Yes... that's why there are so many states around the connection, so
>>> it can go back to the event loop and pick it up when something happens.
>>> Wanting read is difficult though. OpenSSL may want to read
>>> something, but it may not be able to succeed at that until it has
>>> written something, e.g. to update keys. That's why it's spinning: the
>>> socket is writeable, but this time that wasn't the problem.
>>> It should time out on the client side, though.
>> Just to clarify - the timeout on the client side does happen properly, as you
>> said. But the CPU is at 100% while waiting for that timeout.
>>> I'll take a look at it tonight.
>> Thanks Andy. Feel free to send me private patches for testing or
> Please try this
> It just treats wanting read or write at face value, separately.
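
For readers of the archive: the patch itself is not reproduced in this message. As a rough illustration only of what "treating wanting read or write at face value, separately" could mean, here is a minimal sketch. The enum tls_want and the function events_for_want() are hypothetical names, not LWS or OpenSSL API; in real code the input would come from SSL_get_error() returning SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE.

```c
#include <assert.h>
#include <poll.h>

/* Hypothetical stand-ins for what SSL_get_error() reports after a
 * non-blocking SSL_connect() could not complete. */
enum tls_want {
    TLS_WANT_NONE,  /* handshake finished or failed; nothing to wait for */
    TLS_WANT_READ,  /* stands in for SSL_ERROR_WANT_READ */
    TLS_WANT_WRITE  /* stands in for SSL_ERROR_WANT_WRITE */
};

/* Return the poll() events to wait for before retrying the handshake.
 * The point is that WANT_READ asks for POLLIN only: also asking for
 * POLLOUT on a writeable socket makes poll() return at once. */
short events_for_want(enum tls_want want)
{
    switch (want) {
    case TLS_WANT_READ:
        return POLLIN;
    case TLS_WANT_WRITE:
        return POLLOUT;
    default:
        return 0;
    }
}
```

With a mapping like this, a client stuck waiting for the server's reply to "Client hello" would sleep in poll() until data arrives or the connect timeout fires, instead of retrying immediately.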
We've just hit another situation with 100% CPU load, presumably caused by the LWS client.
Information is limited: we only know that the spinning code definitely goes through
lws_ssl_remove_wsi_from_buffered_list and _lws_log many times. The last message from LWS
before the 100% CPU load was "accepting self-signed certificate". As before, it could be
some half-closed connection from the server.
Maybe a similar correction is needed somewhere else, in the later phases of SSL connection
establishment, for when the server dies in the middle and does not close the TCP connection?
The problem is that we can't reproduce it at the moment. Maybe you can at least pinpoint
some particular places in the LWS code which might produce tight loops, so that we can try to
build a test by stopping the LWS server exactly when the LWS client executes some suspicious line?
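
To illustrate the kind of spin we suspect, here is a minimal standalone sketch, not LWS code. It assumes an AF_UNIX socketpair is a fair stand-in for a TCP connection whose peer is stopped: the socket is connected and writeable, but nothing ever arrives. poll_idle_socket() is a hypothetical helper name.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Poll one end of a connected socket pair whose peer never sends.
 * Returns poll()'s result: 1 if the socket was reported ready,
 * 0 if the timeout expired, -1 on error. */
int poll_idle_socket(short events, int timeout_ms)
{
    int sv[2];
    struct pollfd pfd;
    int n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    pfd.fd = sv[0];
    pfd.events = events;
    pfd.revents = 0;

    /* With POLLOUT requested this returns immediately, because the send
     * buffer is empty; with POLLIN alone it blocks until the timeout. */
    n = poll(&pfd, 1, timeout_ms);

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

Calling poll_idle_socket(POLLIN | POLLOUT, 50) should report the socket ready at once, while poll_idle_socket(POLLIN, 50) should time out. Any code path that keeps POLLOUT set while it is really waiting for the server to speak would therefore loop without sleeping.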
Thanks a lot in advance!
>>>> Thanks in advance!
>>>> BR, Andrey