[Libwebsockets] OpenSSL error queue handling issues

Andrey H. ahanins at gmail.com
Wed Jul 10 18:04:17 CEST 2019

	There are multiple serious issues related to handling of OpenSSL error 
queue in current master and v3.1-stable branches.

	v3.1-stable: Most invocations of OpenSSL APIs are followed either by 
ERR_get_error() or lws_ssl_elaborate_error() when API function returns 
with an error. This is a right thing to do to clear the error queue. But 
lws_ssl_capable_read() which calls SSL_read() does not call anything to 
clear a possible error generated by SSL_read(). As a result, when one of 
next invocations of SSL_read() fails with negative return value and 
SSL_get_error() is called in a hope to get the _latest_ error code the 
_earliest_ error code is returned instead, the one which left inside the 
error queue from the previous SSL_read(). As one might imagine it's 
catastrophic, because instead of innocent SSL_ERROR_WANT_READ there 
could be some SSL_ERROR_SYSCALL error code left in the queue which leads 
to immediate termination of the ws connection.

	master: The issue from v3.1-stable is still there, but I also noticed 
an obviously wrong code inside lws_tls_server_accept():

	m = lws_ssl_get_error(wsi, n);

	It's wrong because lws_tls_err_describe() clears all entries from the 
error queue leaving no chances for lws_ssl_get_error() to read any error 
at all.

	I have the following suggestions:
1. Add lws_tls_err_describe() to the lws_ssl_capable_read() right after 
SSL_get_error() to make sure errors are always cleared. It's interesting 
that lws_ssl_capable_write() already calls lws_tls_err_describe() in 
master. So read path was just forgotten?

2. Fix obvious mistake inside lws_tls_server_accept().

3. Kindly ask responsible devs to review the code for similar issues in 
other places. I personally couldn't find any other obvious mistakes.

4. Last but not least - safeguard the code by adding ERR_clear_error() 
or lws_tls_err_describe() right before invocations of SSL_read() and 
SSL_write() to make sure any spurious errors left in the queue can't 
hurt us. Maybe even before all SSL APIs if we want to be paranoid.

	I spend quite a time not only to figure it out but also to reproduce it 
with 100% probability in my environment which uses libwebsockets heavily 
(multiple connections, server+client, failed connection attempts due to 
timeouts and/or dead sockets etc). In my experiments occasional errors 
generated by SSL_read() from one connection caused disconnects of other 
connections when their SSL_read() was returning with SSL_ERROR_WANT_READ 
but got SSL_ERROR_SYSCALL from the queue. A particular error left in the 
queue was "error:14094123:SSL routines:ssl3_read_bytes:application data 
after close notify".

BR, Andrey

More information about the Libwebsockets mailing list