[Libwebsockets] We met an issue when using libwebsocket and fixed it, how can I submit the patch

Andy Green andy at warmcat.com
Thu Jun 4 08:38:01 CEST 2020



On 6/4/20 7:16 AM, 张立峰 wrote:
> Hi, Andy:
> 
> Recently, we met a crash on windows.
> Some thing like this:
> The program is a windows client using websocket protocol.
> 1. We create a thread as the service thread, calling lws_service. We 
> intended to use this thread to service may ws client connections. -- 
> Mark as thread A
> 2. And we call lws_client_connect_via_info from another thread to create 
> the wsi struct.  -- Mark as thread B.

You can't do that.

> 3. After some normal case testing, every thing is fine. Then we disabled 
> the local network (from windows`s device manager) to test network failed 
> case.
> 4. It will crash

Yes, lws is not threadsafe, if you ignore that you will crash in many 
exciting and varied ways.

> 5. After some code debug effort, we know that the wsi created in 
> lws_client_connect_via_info, freed twice. One is before 
> lws_client_connect_via_info return and in the thread B,  anther is in 
> the lws_service and in the thread A (founding a closed socket, I think, 
> in our case is in windows-service.c(_lws_plat_service_tsi)?)
> 
> I tried to fix it like this:
> 1. In the /libwebsockets/lib/roles/http/client/client-handshake.c
> 2. move following code a little down, before a successfull wsi return:
> if (__insert_wsi_socket_into_fds(wsi->context, wsi))
>             goto try_next_result_closesock;
> 
> I am not very confident for this code change. Please give some help, if 
> this is right fix?

No... the right fix is to understand lws deliberately does not even try 
to be threadsafe.  Its model is that everything that uses its apis - 
with one exception - is on the same thread inside the event loop.  Your 
current architecture goes against that and you won't get a "fix" trying 
to use it that way.

You may only call lws apis from the "lws event loop thread", unless you 
are calling the one threadsafe api, lws_cancel_service(), any use of lws 
apis from another thread is strictly forbidden.

For use with other threads, you need to take the approach to have the 
other threads ask the lws thread to do work that is related to lws.  For 
example, define a mutex and struct in a shared memory area that all the 
threads can access, lock it and add your request in there.  Then unlock 
the shared area and call lws_cancel_service()

https://libwebsockets.org/git/libwebsockets/tree/minimal-examples/ws-server/minimal-ws-server-threads/protocol_lws_minimal.c#n124

to signal to the lws thread there is something to do.  Your protocol 
will immediately receive a LWS_CALLBACK_EVENT_WAIT_CANCELLED callback 
**on the lws thread context**

https://libwebsockets.org/git/libwebsockets/tree/minimal-examples/ws-server/minimal-ws-server-threads/protocol_lws_minimal.c#n266

in there it can lock the shared struct and find out what it is being 
asked to do, and do it from the lws thread, unlock and go on.

You can, eg, put your prepared _connection_info struct in the shared 
area, prepared by the other thread.  So although this is a deep change, 
actually it's probably not going to cause a big code rewrite.  But this 
change isn't optional.

-Andy

> Best wishes
> Terry Zhang
> 2020-6-4
> 
> 
> 
> ------------------ Original ------------------
> *From: * "Andy Green"<andy at warmcat.com>;
> *Date: * Wed, Mar 4, 2020 09:03 PM
> *To: * "张立峰"<lifeng.zhang at melot.cn>; 
> "libwebsockets"<libwebsockets at ml.libwebsockets.org>;
> *Subject: * Re: [Libwebsockets] We met an issue when using libwebsocket 
> and fixed it, how can I submit the patch
> 
> 
> On 3/4/20 10:28 AM, 张立峰 wrote:
>  > Hi, everyone:
>  >
>  > When using libwebsockets as a client to communicate a ws server, we
>  > write out many messages in a queue when receive
>  > a LWS_CALLBACK_CLIENT_WRITEABLE. This may cause the ws server to close
>  > our connection.
>  >
>  > We found that when lws_write return value is smaller than requested, the
>  > connection will be close when we use lws_write next time.
>  >
>  > After check the source code of libwebsockets, we found the issue cause:
>  >
>  > The "wsi->ws->inside_frame" is not set to "0" after wsi->buflist_out has
>  > been sent. This will cause the wrong coded ws frame been sent.
> 
> Thanks, I modified it a little bit for the cases ws is not enabled for
> build or it's http and not ws connection, and pushed it on master and
> v3.2-stable.
> 
> -Andy


More information about the Libwebsockets mailing list