[Libwebsockets] Re: a question about poll timeout change in libwebsockets.

Andy Green andy at warmcat.com
Mon Jun 8 10:56:09 CEST 2020

On 6/8/20 9:29 AM, huangkaicheng wrote:
> Hi,
>     Because lws is not thread safe, in some service cases we need to make lws calls safely from another thread. We added a MutexLock to keep lws safe, as follows.
> In thread A(network thread):
>      ....
>      g_wsContext = lws_create_context(&info);
>     while (retCode >= 0 && !g_bForceExit) {
>          MutexLock(&g_stWebSocketMutex);
>          retCode = lws_service(g_wsContext, RTC_TRANS_LOOP_SLEEP_TIME);
>          MutexUnLock(&g_stWebSocketMutex);
>          SleepMs(RTC_TRANS_LOOP_SLEEP_TIME);

Hm, there shouldn't be any need to sleep here.

>      }
> In thread B(service thread):
>      ...
>      MutexLock(&g_stWebSocketMutex);
>      struct lws_client_connect_info connectInfo;
>      (VOS_VOID)memset_s(&connectInfo, sizeof(struct lws_client_connect_info), 0, sizeof(struct lws_client_connect_info));
>      connectInfo.address = linkInfo->serverAddr;
>      connectInfo.context = g_wsContext;
>      connectInfo.port = (int)info->port;
>      connectInfo.ssl_connection = (int)info->bUseSsl;
>      connectInfo.origin = linkInfo->serverAddr;
>      connectInfo.host = info->domain;
>      connectInfo.path = info->url;
>      connectInfo.protocol = NULL;
>      connectInfo.ietf_version_or_minus_one = ietfVersion;
>      wsi = lws_client_connect_via_info(&connectInfo);
>      MutexUnLock(&g_stWebSocketMutex);
>     ....
> In stable 2.3 it works well, and in stable 4.0 it mostly works well too. But sometimes lws_client_connect_via_info does not execute until lws_service has returned.
> We found that lws_service only returns once the poll times out. So lws_client_connect_via_info may have to wait up to 1000ms for the mutex lock (when in fact there is nothing to do). We need lws_service to return quickly if there is nothing to do, so that lws_client_connect_via_info can execute quickly. In stable 2.3 the timeout was set through lws_service, so lws_client_connect_via_info executed quickly.
>      Do you understand what I have described, and can you suggest a better way please?

In your example the context is created in one thread and other lws apis 
(lws_client_connect_via_info()) are called from another thread.  As you 
realized, this is not safe.  But holding the mutex across the whole lws 
service action is basically an open-ended wait now, so it's not going to 
work well.

It should be enough to follow the approach of the ws-server-threads 
example (it doesn't make any difference if it is ws or http or 
whatever).  Remove the current mutex and just let the "network thread" 
run the lws_service() loop by itself, without anything else in its loop.

In the "service thread", when you want something to be done in lws, lock 
a mutex and set up an object in memory that can be seen by both threads 
(ie, make the connectInfo struct a static at file scope or inside 
another object at file scope or whatever), set a flag in the shared 
object to indicate you want a new client connection, and then unlock the 
mutex that protects access to the shared object and call 
lws_cancel_service(context) from the "service thread".


This (usually immediately) wakes up the lws thread, and triggers a 
protocol callback with LWS_CALLBACK_EVENT_WAIT_CANCELLED **from the lws 
thread context**.  So the code there is normal lws code running in lws 
thread context, but you made it go there from the other thread.


In there, it should lock the mutex protecting the shared object and from 
the contents of the shared object prepared by the other thread, figure 
out what the other thread wants it to do.  In this case, it can do the 
call to lws_client_connect_via_info() with the connectInfo prepared by 
the other thread, reset whatever "please make a client connection" flag 
you set in the object to indicate that was the task, and then release 
the mutex.
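
The handoff described above can be sketched roughly as follows.  This is 
only an illustration under stated assumptions: the struct connect_request 
layout and the names request_connect(), on_wait_cancelled(), 
wake_lws_thread() and start_client_connect() are hypothetical.  The last 
two are stubs so the sketch builds without lws; in real code they stand 
in for lws_cancel_service(context) and for filling in a struct 
lws_client_connect_info and calling lws_client_connect_via_info().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical shared object at file scope, visible to both threads. */
struct connect_request {
    pthread_mutex_t lock;   /* protects every field below */
    bool pending;           /* "please make a client connection" flag */
    char address[128];
    int port;
};

static struct connect_request g_req = { .lock = PTHREAD_MUTEX_INITIALIZER };
static int g_wakeups;       /* bookkeeping for the stubs only */
static int g_connects;

/* Stub: real code would call lws_cancel_service(context) here. */
static void wake_lws_thread(void)
{
    g_wakeups++;
}

/* Stub: real code would fill struct lws_client_connect_info from the
 * request and call lws_client_connect_via_info(). */
static void start_client_connect(const char *address, int port)
{
    (void)address;
    (void)port;
    g_connects++;
}

/* Service thread: publish the request, then wake the lws thread. */
void request_connect(const char *address, int port)
{
    assert(address != NULL);

    pthread_mutex_lock(&g_req.lock);
    strncpy(g_req.address, address, sizeof(g_req.address) - 1);
    g_req.address[sizeof(g_req.address) - 1] = '\0';
    g_req.port = port;
    g_req.pending = true;
    pthread_mutex_unlock(&g_req.lock);

    /* Called after unlocking, as described above. */
    wake_lws_thread();
}

/* lws thread: what the LWS_CALLBACK_EVENT_WAIT_CANCELLED case of the
 * protocol callback would do. */
void on_wait_cancelled(void)
{
    pthread_mutex_lock(&g_req.lock);
    if (g_req.pending) {
        start_client_connect(g_req.address, g_req.port);
        g_req.pending = false;   /* reset the flag once the task is taken */
    }
    pthread_mutex_unlock(&g_req.lock);
}
```

The mutex is only held for the short time needed to copy or read the 
request, never across lws_service(), so neither thread waits on the 
other for long.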

If you are very sensitive to latency, look at LWS_WITH_SYS_ASYNC_DNS.  
This will eliminate the call to getaddrinfo() implied by creating the 
client connection and make it return immediately if it did not fail 
before getting to the DNS resolution part.  The client connection DNS 
resolution will then also proceed asynchronously like the connect() 
always does.

If multiple events may be pending, eg, because other threads want to get 
tasks done by the lws thread, or there can be new events at any time, 
you need a linked-list or fifo or ringbuffer in the shared object so it 
can collect tasks even if new tasks come before lws can finish the first 
one.
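
One possible shape for such a queue is a small mutex-protected 
ringbuffer, sketched below.  Everything here is an assumption made for 
illustration: the struct task fields, the TASK_RING_SIZE capacity and 
the names task_push(), task_pop() and drain_tasks() are hypothetical and 
not part of lws.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define TASK_RING_SIZE 8    /* hypothetical capacity */

enum task_type { TASK_CONNECT, TASK_CLOSE };

struct task {
    enum task_type type;
    int arg;                /* hypothetical per-task argument */
};

struct task_ring {
    pthread_mutex_t lock;   /* protects slots, head and count */
    struct task slots[TASK_RING_SIZE];
    size_t head;            /* index of the oldest queued task */
    size_t count;           /* number of queued tasks */
};

static struct task_ring g_tasks = { .lock = PTHREAD_MUTEX_INITIALIZER };

/* Any thread: queue a task.  Returns false if the ring is full.  Real
 * code would call lws_cancel_service(context) after a successful push. */
bool task_push(struct task_ring *r, struct task t)
{
    bool ok = false;

    pthread_mutex_lock(&r->lock);
    if (r->count < TASK_RING_SIZE) {
        r->slots[(r->head + r->count) % TASK_RING_SIZE] = t;
        r->count++;
        ok = true;
    }
    pthread_mutex_unlock(&r->lock);
    return ok;
}

/* lws thread: take the oldest task.  Returns false if the ring is empty. */
bool task_pop(struct task_ring *r, struct task *out)
{
    bool ok = false;

    assert(out != NULL);
    pthread_mutex_lock(&r->lock);
    if (r->count > 0) {
        *out = r->slots[r->head];
        r->head = (r->head + 1) % TASK_RING_SIZE;
        r->count--;
        ok = true;
    }
    pthread_mutex_unlock(&r->lock);
    return ok;
}

/* lws thread: drain everything queued so far, eg, from the
 * LWS_CALLBACK_EVENT_WAIT_CANCELLED handler.  Returns the number of
 * tasks handled. */
int drain_tasks(struct task_ring *r)
{
    struct task t;
    int handled = 0;

    while (task_pop(r, &t)) {
        /* real code would act on t.type and t.arg here */
        handled++;
    }
    return handled;
}
```

Draining in a loop matters because several pushes may have happened 
before the lws thread gets around to handling the wakeup.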

This way should be very responsive and work cross-platform, even the 
freertos plat supports lws_cancel_service() now.  And it will work the 
same if your event lib is libuv or whatever and not poll(), now and in 
the future.

