[Libwebsockets] OpenSSL error queue handling issues

Andy Green andy at warmcat.com
Thu Jul 11 11:41:43 CEST 2019



On 7/11/19 8:56 AM, Andrey H. wrote:
> 
> 
> On 7/10/19 10:46 PM, Andy Green wrote:
>> Thanks for reporting it, but well... that is not what I would call a 
>> "catastrophe".  It does sound like it isn't how it should be though.
> It's subjective indeed, but in my world it's a catastrophe when an error 
> generated by one connection leads to a disconnect of other arbitrary 
> connection.

In my world it's houses burning down, security wide open, devices 
bricked etc.  Buggy disconnects are not right or good, but they're 
within the scope of what anyway happens in normal operation, eg, train 
going into a tunnel.

>> The right thing seems to be to change the diagnostic api to use 
>> ERR_peek_... 
> For diagnostics - yes. But just for you to know - the SSL_get_error() 
> used from lws_ssl_get_error() does not modify the error queue whereas 
> ERR_get_error() does remove the earliest error from the queue. Ideally, 
> each failed OpenSSL API call should be followed by ERR_get_error() to 
> clear the queue, but that's a slightly bigger change, because all 
> invocations of OpenSSL API should be inspected. Not all of them push 
> errors to the queue. Now after we've added (see my PR) hopefully all 
> missing ERR_clear_error() the libwebsockets is guaranteed to get the 
> right error code where it really matters but as we don't always clear 
> the queue after us, some other code in the same thread may not be ready 
> for such situations. The simplest solution could be to clear the error 
> queue by ERR_clear_error() right before exiting from libwebsockets entry 
> points like lws_service_fd(). What do you think about it?

I am not really clear where OpenSSL apis are going to refer to the error 
queue first before performing their operation.  However there's an upper 
bound on how many apis can do that, because generally stuff has been 
working OK.

For example we've done the clear now before SSL_new(), but is it really 
credible SSL_new() would choose to fail because something like a cert 
verification failed before on the same thread on a different connection? 
  It does not seem likely it's going to refer to that instead of caring 
about malloc fail etc... looking at the code in OpenSSL, I can't see it 
either, although it could be implicit in something called from there.

https://github.com/openssl/openssl/blob/master/ssl/ssl_lib.c#L679

I can imagine SSL_read() thinking it might have already failed and try 
to make the fail sticky.  So I guess the right thing is to definitively 
clear it before these kind of apis as we have done.

Another possibile approach is clear it on any api where we check the 
return code.  But again if you consider crypto apis like RSA or hashes, 
it doesn't seem credible they will check for a pre-existing error stack 
and fail out on entry.  What's most likely is it will return 1 (OK) if 
no problems, and if problems, push on the error stack which may also 
then have some old entries afterwards and return 0 (bad)... it's OK I think.

>> If you had joined me at the grindstone before imagine how much better 
>> and further along everything would be.
> Andy, I truly value your and others work put into libwebsockets which
> successfully served our project for years, but I was genuinely surprised 
> how that error queue handling issue survived for so long.
> 
>> I pushed patches on v3.1-stable and master, please take a look.
> I've created a pull request for v3.1-stable which adds more 
> ERR_clear_error() before important calls. Also fixed a few places where 
> lws_ssl_get_error() was used incorrectly.

OK I pushed it and added some more of those cases on the master version.

Thanks again.

-Andy

>>
>> -Andy
>>
>>>
>>> BR, Andrey
>>>
>>> _______________________________________________
>>> Libwebsockets mailing list
>>> Libwebsockets at ml.libwebsockets.org
>>> https://libwebsockets.org/mailman/listinfo/libwebsockets


More information about the Libwebsockets mailing list