[Libwebsockets] OpenSSL error queue handling issues
ahanins at gmail.com
Wed Jul 10 18:04:17 CEST 2019
There are multiple serious issues related to handling of OpenSSL error
queue in current master and v3.1-stable branches.
v3.1-stable: Most invocations of OpenSSL APIs are followed either by
ERR_get_error() or lws_ssl_elaborate_error() when API function returns
with an error. This is a right thing to do to clear the error queue. But
lws_ssl_capable_read() which calls SSL_read() does not call anything to
clear a possible error generated by SSL_read(). As a result, when one of
next invocations of SSL_read() fails with negative return value and
SSL_get_error() is called in a hope to get the _latest_ error code the
_earliest_ error code is returned instead, the one which left inside the
error queue from the previous SSL_read(). As one might imagine it's
catastrophic, because instead of innocent SSL_ERROR_WANT_READ there
could be some SSL_ERROR_SYSCALL error code left in the queue which leads
to immediate termination of the ws connection.
master: The issue from v3.1-stable is still there, but I also noticed
an obviously wrong code inside lws_tls_server_accept():
m = lws_ssl_get_error(wsi, n);
It's wrong because lws_tls_err_describe() clears all entries from the
error queue leaving no chances for lws_ssl_get_error() to read any error
I have the following suggestions:
1. Add lws_tls_err_describe() to the lws_ssl_capable_read() right after
SSL_get_error() to make sure errors are always cleared. It's interesting
that lws_ssl_capable_write() already calls lws_tls_err_describe() in
master. So read path was just forgotten?
2. Fix obvious mistake inside lws_tls_server_accept().
3. Kindly ask responsible devs to review the code for similar issues in
other places. I personally couldn't find any other obvious mistakes.
4. Last but not least - safeguard the code by adding ERR_clear_error()
or lws_tls_err_describe() right before invocations of SSL_read() and
SSL_write() to make sure any spurious errors left in the queue can't
hurt us. Maybe even before all SSL APIs if we want to be paranoid.
I spend quite a time not only to figure it out but also to reproduce it
with 100% probability in my environment which uses libwebsockets heavily
(multiple connections, server+client, failed connection attempts due to
timeouts and/or dead sockets etc). In my experiments occasional errors
generated by SSL_read() from one connection caused disconnects of other
connections when their SSL_read() was returning with SSL_ERROR_WANT_READ
but got SSL_ERROR_SYSCALL from the queue. A particular error left in the
queue was "error:14094123:SSL routines:ssl3_read_bytes:application data
after close notify".
More information about the Libwebsockets