[Libwebsockets] We met an issue when using libwebsocket and fixed it, how can I submit the patch
olivier at olivierlanglois.net
Thu Jun 4 11:31:47 CEST 2020
On Thu, 2020-06-04 at 14:16 +0800, 张立峰 wrote:
> Hi, Andy:
> Recently, we met a crash on windows.
> Some thing like this:
> The program is a windows client using websocket protocol.
> 1. We create a thread as the service thread, calling lws_service. We
> intended to use this thread to service may ws client connections. --
> Mark as thread A
> 2. And we call lws_client_connect_via_info from another thread to
> create the wsi struct. -- Mark as thread B
> 3. After some normal case testing, every thing is fine. Then we
> disabled the local network (from windows`s device manager) to test
> network failed case.
> 4. It will crash.
> 5. After some code debug effort, we know that the wsi created in
> lws_client_connect_via_info, freed twice. One is before
> lws_client_connect_via_info return and in the thread B, anther is in
> the lws_service and in the thread A (founding a closed socket, I
> think, in our case is in windows-service.c(_lws_plat_service_tsi)?)
> I tried to fix it like this:
> 1. In the /libwebsockets/lib/roles/http/client/client-handshake.c
> 2. move following code a little down, before a successfull wsi
> if (__insert_wsi_socket_into_fds(wsi->context, wsi))
> goto try_next_result_closesock;
> I am not very confident for this code change. Please give some help,
> if this is right fix?
While I was testing the network failure scenario, I did stumble into a
core dump as well. It did turn out to be a problem in my code but I did
discovered a lib peculiar behavior that was unexpected.
1. While inside lws_client_connect_via_info, the callback with the
reason LWS_CALLBACK_CLIENT_CONNECTION_ERROR can be called when there is
a DNS lookup error.
This was causing a glitch because I am rescheduling a reconnection when
a connection error occur but also when lws_client_connect_via_info
return value indicate a failure. Therefore, I was scheduling a
2. I was storing the wsi pointer passed with LWS_CALLBACK_PROTOCOL_INIT
for when I wanted to call lws_timed_callback_vh_protocol from a place
where I don't have access to a wsi pointer. The only place where I was
doing this sin is when I wanted to reconnect after a
lws_client_connect_via_info failure and this is where my crash was
occuring (not immediatly when calling lws_timed_callback_vh_protocol
but when the timeout was trying to call the bogus protocol CB) because
the wsi passed to LWS_CALLBACK_PROTOCOL_INIT is a local variable that
stops existing when you return from the callback.
I fixed that by storing the protocol instead.
Those 2 things are maybe not your issue but I suggest that you validate
those 2 points nonetheless in your code.
More information about the Libwebsockets