[Libwebsockets] Thread pool example and performance

andy at warmcat.com andy at warmcat.com
Tue Apr 27 21:43:47 CEST 2021



On April 27, 2021 6:42:41 PM UTC, sas spss <sas2016spss at gmail.com> wrote:
>Hi, I am checking the service thread example of http server eventlib in
>v4.2 stable:
>https://libwebsockets.org/git/libwebsockets/tree/minimal-examples/http-server/minimal-http-server-eventlib-smp?h=v4.2-stable
>
>It has a warning on top : "this is under development, it's not stable".
>What part of this example is not stable ?

Well, that warning has been there a long while, since I started writing gitohashi, which uses it for blame lookups, and sai, which uses it to do the libgit2 clones / fetches at the builders.  Both have been pretty stable for a long while, so I guess the warning is crufty.

>Another question:
>
>In a general situation of a high load https server on internet, serving
>data from memory (2K - 200K in request / response size) after some
>calculations (for example, sort, search in linked list with 10K - 100K
>items), with sufficient physical memory and CPU cores, and with the
>consideration of pthread mutex lock/unlock, context switch overhead,
>which model of libwebsockets will perform better: single thread model
>with event lib, 1 main thread for business logic (sort, search) + 1
>dedicated thread for network I/O, or the thread pool one?  Thanks.

There's not really one answer to that... what is 'high load'... 10 clients active/s, 100?  How many of the expensive computations can be expected per second and how long do they take?  How critical is latency?

If there is cpu-intensive work measured in tens or hundreds of ms, doing it on one event loop will limit how well you can serve other events.  So if ten connections have events that each cost 100ms for user code to process, it will chew through them, but latency for everyone else with events on that loop (including connections whose computation just completed and that now want to send) spikes to 1s.

Otoh if the cpu cost is, say, 5ms, then unless it's super latency sensitive, the 50ms hit from the same ten events is lost in the noise.  If you can live with, eg, a 200ms latency budget, then 40 peak concurrent events costing 5ms each is OK, and so on.
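
The arithmetic in the two paragraphs above can be written down as a back-of-envelope model.  This is only a sketch, assuming events on one loop are handled strictly one after another and each costs the same fixed time; the function names are made up for illustration and are not part of lws.

```c
#include <stdint.h>

/*
 * Hypothetical helper: worst-case wait for the last event when n_events
 * are pending on one event loop and each needs cost_ms of user code,
 * handled serially.
 */
static uint32_t
worst_case_latency_ms(uint32_t n_events, uint32_t cost_ms)
{
	return n_events * cost_ms;
}

/*
 * Hypothetical helper: how many pending events of cost_ms each fit
 * inside a latency budget on a single loop.
 */
static uint32_t
max_concurrent_events(uint32_t budget_ms, uint32_t cost_ms)
{
	return cost_ms ? budget_ms / cost_ms : 0;
}
```

With these, worst_case_latency_ms(10, 100) gives the 1s figure, worst_case_latency_ms(10, 5) the 50ms one, and max_concurrent_events(200, 5) the 40 events.  Real workloads have varying costs, so treat the result as a rough bound.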

With one event loop the code is nice and simple and needs no locking.

For SMP, it's n of those; if there is no shared data there is also no locking, and performance scales pretty well while smp threads < cpu threads.  If there is shared data, locking starts to block unless you're careful about how long the locks are held, and that leads to multiple locks for different purposes, which is harder to get right.
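
As a rough illustration of the "no shared data, no locking" case, here is a sketch using plain pthreads rather than the lws SMP service threads.  Each thread only touches its own struct, so no mutex is needed; the totals are combined only after the threads are joined.  The struct and function names, and the MAX_THREADS limit, are assumptions made for the example.

```c
#include <pthread.h>
#include <stdint.h>

#define MAX_THREADS 8	/* arbitrary cap for this sketch */

struct per_thread {
	pthread_t tid;
	uint64_t handled;	/* only ever written by its own thread */
	uint32_t to_handle;
};

/* Stands in for one service loop: "handles" its own events privately */
static void *
service_thread(void *arg)
{
	struct per_thread *pt = arg;
	uint32_t n;

	for (n = 0; n < pt->to_handle; n++)
		pt->handled++;

	return NULL;
}

/* Run n_threads independent loops and return the total events handled */
static uint64_t
run_smp(unsigned int n_threads, uint32_t events_each)
{
	struct per_thread pt[MAX_THREADS];
	unsigned int i, started = 0;
	uint64_t total = 0;

	if (n_threads > MAX_THREADS)
		n_threads = MAX_THREADS;

	for (i = 0; i < n_threads; i++) {
		pt[i].handled = 0;
		pt[i].to_handle = events_each;
		if (pthread_create(&pt[i].tid, NULL, service_thread, &pt[i]))
			break;
		started++;
	}

	/* Results are read only after join, so no lock is required */
	for (i = 0; i < started; i++) {
		pthread_join(pt[i].tid, NULL);
		total += pt[i].handled;
	}

	return total;
}
```

As soon as the threads need to update something common, for example one shared list, that access needs a lock, and how long the lock is held starts to matter.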

For separate worker threads, the main loop remains responsive and lws does the thread sync magic, but there are only n worker threads, each doing one thing at a time.
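
The worker thread idea can be sketched with plain pthreads as below.  This is not the lws threadpool api, just an illustration under simple assumptions: the expensive job is a sort, one worker runs it, and the "main loop" keeps turning and checks a flag under a mutex.  All names here are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

struct job {
	pthread_mutex_t lock;
	int *items;
	size_t count;
	int done;		/* protected by lock */
};

static int
cmp_int(const void *a, const void *b)
{
	int x = *(const int *)a, y = *(const int *)b;

	return (x > y) - (x < y);
}

/* Worker: does the expensive part, then flags completion under the lock */
static void *
worker(void *arg)
{
	struct job *j = arg;

	qsort(j->items, j->count, sizeof(int), cmp_int);

	pthread_mutex_lock(&j->lock);
	j->done = 1;
	pthread_mutex_unlock(&j->lock);

	return NULL;
}

/*
 * Sort items on a worker thread while the caller keeps looping.
 * Returns how many loop iterations ran while waiting, or -1 on failure.
 */
static int
offload_sort(int *items, size_t count)
{
	struct job j;
	pthread_t tid;
	int spins = 0, done = 0;

	j.items = items;
	j.count = count;
	j.done = 0;
	pthread_mutex_init(&j.lock, NULL);

	if (pthread_create(&tid, NULL, worker, &j)) {
		pthread_mutex_destroy(&j.lock);
		return -1;
	}

	while (!done) {
		/* a real loop would service other connections here */
		spins++;
		pthread_mutex_lock(&j.lock);
		done = j.done;
		pthread_mutex_unlock(&j.lock);
	}

	pthread_join(tid, NULL);
	pthread_mutex_destroy(&j.lock);

	return spins;
}
```

The polling loop is only there to keep the sketch short; a real event loop would block in its wait and be woken when the worker finishes, instead of spinning.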

The answer is usually to try it in one event loop first; if there are problems, the core code can be adapted for the other cases.

-Andy

>- Joe

