[Libwebsockets] We met an issue when using libwebsocket and fixed it, how can I submit the patch

andy at warmcat.com andy at warmcat.com
Thu Jun 4 11:52:23 CEST 2020



On June 4, 2020 9:31:47 AM UTC, Olivier Langlois <olivier at olivierlanglois.net> wrote:
>On Thu, 2020-06-04 at 14:16 +0800, 张立峰 wrote:
>> Hi, Andy:
>> 
>> 
>> Recently, we met a crash on windows.
>> Some thing like this: 
>> 
>> The program is a windows client using websocket protocol.
>> 
>> 1. We create a thread as the service thread, calling lws_service. We
>> intended to use this thread to service may ws client connections. --
>> Mark as thread A
>> 
>> 2. And we call lws_client_connect_via_info from another thread to
>> create the wsi struct.  -- Mark as thread B
>> 
>> 3. After some normal case testing, every thing is fine. Then we
>> disabled the local network (from windows`s device manager) to test
>> network failed case.
>> 4. It will crash. 
>> 
>> 5. After some code debug effort, we know that the wsi created in
>> lws_client_connect_via_info, freed twice. One is before 
>> lws_client_connect_via_info return and in the thread B,  anther is in
>> the  lws_service and in the thread A (founding a closed socket, I
>> think, in our case is in windows-service.c(_lws_plat_service_tsi)?)
>> 
>> 
>> I tried to fix it like this:
>> 1. In the /libwebsockets/lib/roles/http/client/client-handshake.c
>> 2. move following code a little down, before a successfull wsi
>> return:
>> 
>> if (__insert_wsi_socket_into_fds(wsi->context, wsi))
>>            goto try_next_result_closesock;
>> 
>> 
>> 
>> I am not very confident for this code change. Please give some help,
>> if this is right fix?
>
>While I was testing the network failure scenario, I did stumble into a
>core dump as well. It did turn out to be a problem in my code but I did
>discovered a lib peculiar behavior that was unexpected.
>
>1. While inside lws_client_connect_via_info, the callback with the
>reason LWS_CALLBACK_CLIENT_CONNECTION_ERROR can be called when there is
>a DNS lookup error.
>
>This was causing a glitch because I am rescheduling a reconnection when
>a connection error occur but also when lws_client_connect_via_info
>return value indicate a failure. Therefore, I was scheduling a
>reconnection twice.

Yes it's a bit awkward.  The problem is with the recent async dns and its cache, resolving failures may occur both at 'synchronous client connection setup time' and asynchronously depending on cache state.  The async stuff takes care to present getaddrinfo compatible results and behaviour otherwise.

>2. I was storing the wsi pointer passed with LWS_CALLBACK_PROTOCOL_INIT
>for when I wanted to call lws_timed_callback_vh_protocol from a place

Just a heads up that api has been deprecated, from v4.1 on you'll need LWS_WITH_DEPRECATED_THINGS to still have it.  By v4.3 or equivalent it will be gone.  You can see how to replace it with smarter backoff-aware sul based apis here

https://libwebsockets.org/git/libwebsockets/tree/minimal-examples/ws-client/minimal-ws-client/minimal-ws-client.c

The vh stuff predates sul, bloats wsi whether you use it or not and has basically emulation code to present what the sul stuff can do better natively.

A side effect of using a sul based scheme  is there's no problem scheduling something twice inside one event loop wake, it just gets resheduled to the last time requested and happens once.

>where I don't have access to a wsi pointer. The only place where I was
>doing this sin is when I wanted to reconnect after a
>lws_client_connect_via_info failure and this is where my crash was
>occuring (not immediatly when calling lws_timed_callback_vh_protocol
>but when the timeout was trying to call the bogus protocol CB) because
>the wsi passed to LWS_CALLBACK_PROTOCOL_INIT is a local variable that
>stops existing when you return from the callback.
>
>I fixed that by storing the protocol instead.

Right it's a fakewsi just to keep up appearances.  But the context and vhost bindings of  it should be usable and appropriate for why it's being called.  Bit there's no way to pick a wsi->protocol for fakewsi at vhost init time.

-Andy

>Those 2 things are maybe not your issue but I suggest that you validate
>those 2 points nonetheless in your code.
>
>Greetings,
>
>_______________________________________________
>Libwebsockets mailing list
>Libwebsockets at ml.libwebsockets.org
>https://libwebsockets.org/mailman/listinfo/libwebsockets


More information about the Libwebsockets mailing list