[Libwebsockets] OpenSSL error queue handling issues

Andy Green andy at warmcat.com
Thu Jul 11 20:01:49 CEST 2019



On 7/11/19 11:30 AM, Andrey H. wrote:
> 
> 
> On 7/11/19 12:41 PM, Andy Green wrote:
>> I am not really clear where OpenSSL apis are going to refer to the 
>> error queue first before performing their operation.
> That's not what I'm saying. The problem is not that OpenSSL API itself 
> may refer to some error from the queue and fail prematurely when it 
> shouldn't. The potential problem is only for the callers of OpenSSL API

I don't really understand how the problem you mentioned things getting 
elevated to SYSCALL fits in then.

> which may get earliest error from the queue instead of the latest if 
> queue wasn't empty before the API call. Typical example is 
> SSL_get_error() which docs says:
> 
> The current thread's error queue must be empty before the TLS/SSL I/O 
> operation is attempted, or SSL_get_error() will not work reliably.
> 
> Note that it doesn't say that I/O operation itself may not work reliably.

Mm OK.

>> For example we've done the clear now before SSL_new(), but is it 
>> really credible SSL_new() would choose to fail because something like 
>> a cert verification failed before on the same thread on a different 
>> connection? 
> Definitely no. OpenSSL APIs don't fail just because error queue was not 
> empty. Clear before SSL_new() is necessary for the following 
> ERR_get_error() to return a real reason of SSL_new() failure, but not 
> some error code from a previous failure, like cert validation. Imagine 
> how it could confuse the troubleshooting of why SSL_new() fails.
> 
> 
>> I can imagine SSL_read() thinking it might have already failed and try 
>> to make the fail sticky. 
> As I said before - no. It doesn't happen this way. Only new errors are 
> pushed to the queue, if necessary. Existing errors don't hurt OpenSSL 
> APIs and they don't stick this way.

I can't match this to the "catastrophe" that was supposed to be 
happening.  I sounds like it's just sometimes reporting a stale error 
from the stack if it had to pick one... something had to actually go 
wrong but it may report an old error if you ask for details.

The cleardowns should have got rid of all that now.

>> So I guess the right thing is to definitively clear it before these 
>> kind of apis as we have done.
> Yes, it's safest, IMO.

We are doing it now.

>> Another possibile approach is clear it on any api where we check the 
>> return code.
> Or always use ERR_peek_last_error() which returns the _latest_ error 
> from the queue, this way it doesn't matter how many errors are in the 
> queue. But it doesn't work for SSL_get_error() which internally uses 
> ERR_peek_error() which returns the _earliest_ error, thus we need do 
> clear errors before calling the following APIs: SSL_connect(), 
> SSL_accept(), SSL_do_handshake(), SSL_read_ex(), SSL_read(), 
> SSL_peek_ex(), SSL_peek(), SSL_write_ex(), SSL_write().
> See https://www.openssl.org/docs/man1.1.1/man3/SSL_get_error.html

Maybe I am dumb but should these OpenSSL apis not clear down the error 
stack themselves if it is literally mandatory?  Instead of every project 
using it having to do it.

I think we're now manually clearing down all those mentioned that we 
actually use... in our case it's not too difficult since in the lws tree 
all the OpenSSL apis are corralled down ./lib/tls/openssl/ and 
centralized.  Ie, there is just one SSL_read() and one SSL_write() in 
the whole tree, so one place to fix up.  So I think enough is done now 
AFAIK.

>> But again if you consider crypto apis like RSA or hashes, it doesn't 
>> seem credible they will check for a pre-existing error stack and fail 
>> out on entry.  What's most likely is it will return 1 (OK) if no 
>> problems, and if problems, push on the error stack which may also then 
>> have some old entries afterwards and return 0 (bad)... it's OK I think.
> Right.

OK then I think we're good.

-Andy

> BR, Andrey


More information about the Libwebsockets mailing list