[Libwebsockets] can no longer open websockets with lws_client_connect_via_info when a high number of websockets are already open

Andy Green andy at warmcat.com
Wed Nov 17 18:12:05 CET 2021

On 11/17/21 16:03, Peiffer Eric wrote:
> Hi,
> We use libwebsockets 4.1 on Linux Debian 10. Last week we encountered 
> an issue when a high number of websockets are open. After about 
> 150,000 outgoing websocket connections are opened, 
> lws_client_connect_via_info returns NULL.
> I thought we had reached the system's limit in terms of fds, but my 
> system team told me that the fd limit is up to 1,000,000 per CPU core.

There could also be intermediaries (eg, a router) between your box and the 
internet that are topping out trying to handle that many tcp connections.

> In my application I have set fd_limit_per_thread to 0, so I thought 
> that libwebsockets could use all the fds of the system.
> How can I debug what is happening in the lws_client_connect_via_info function?
> How can I get the value of errno through libwebsockets?

The lws logs will have information about it, but I guess those logs are 
going to be "a bit busy" if there are already 150K wsi open.

You can also look at the startup logs of lws at context creation time, 
which might be easier to do.

At context creation time, he will allocate the fds array; to do that 
he tries to find out the OS fd limit using sysconf(_SC_OPEN_MAX) on 
most unixes.


Due to some OSes (including newer OSX) returning a huge positive integer 
there (not 2^31, but huge) to indicate "unlimited", there's a sanity 
check: if it's more than 10M, it's assumed to be insane and taken to 
a default of 2560.  Otherwise, eg, if it was told 1M, it will try to go 
with that.

Later during context creation, at pt creation time, he logs at info 
verbosity about the allocations he made there.


For a more typical default ulimit -n of 1024 server case, it looks like this:

[2021/11/17 14:15:21:9750] I: lws_create_context: ctx:  7064B (2968 ctx 
+ pt(1 thr x 4096)), pt-fds: 1024
[2021/11/17 14:15:21:9750] I: lws_create_context:  http: ah_data: 4096, 
ah: 984, max count 1024

The interesting number is pt-fds; that's the max number of fds it 
understood the process could ever see, and so allocated for, when it 
started up.
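If you would rather pin that budget yourself than rely on the sysconf probe, you can set the fd_limit_per_thread member mentioned in the original mail to a nonzero value at context creation; a minimal, untested sketch assuming lws 4.1 headers:

```c
#include <string.h>
#include <libwebsockets.h>

/* Sketch: ask lws for an explicit per-thread fd budget instead of
 * dividing the sysconf(_SC_OPEN_MAX) result across service threads,
 * which is what fd_limit_per_thread == 0 means. */
struct lws_context *
create_ctx_with_fd_budget(void)
{
	struct lws_context_creation_info info;

	memset(&info, 0, sizeof(info));
	info.port = CONTEXT_PORT_NO_LISTEN; /* client-only context */
	info.fd_limit_per_thread = 200000;  /* room for ~200K wsi per pt */

	/* pt-fds in the info-level startup log should reflect this */
	return lws_create_context(&info);
}
```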

I don't know what your app does, but there are some lazy closes 
possible: for eg h2 or ws-over-h2, the connection is only closed after 
a grace period elapses since the last stream on it closed.  Again the 
logs should be informative if it's recent lws; well-known objects 
inside lws like wsi are accounted for much more explicitly than in 
earlier lws.  For example, creating an h2 client connection, you will 
get something like this

[2021/11/17 17:03:09:2410] N: __lws_lc_tag:  ++ 
[1139008|wsicli|0|GET/h1/default/warmcat.com] (1)

and if that upgrades to h2 and migrates to being a mux stream, that wsi 
is also accounted for

[2021/11/17 17:03:13:8741] N: __lws_lc_tag:  ++ 
[1139008|mux|0|default|h2_sid1_(1139008|wsicli|0)] (1)

The number in brackets is the count of objects in that tag group that 
are extant, "wsicli" being the client connection group and "mux" being 
the mux child stream group; there is only one of each after that.  When 
they are destroyed, there are more logs

[2021/11/17 17:03:13:9707] N: __lws_lc_untag:  -- 
[1139008|mux|0|default|h2_sid1_(1139008|wsicli|0)] (0) 96.601ms
[2021/11/17 17:03:13:9980] N: __lws_lc_untag:  -- 
[1139008|wsicli|0|GET/h1/default/warmcat.com] (0) 4.756s

Again the number in brackets is how many objects in that tag group are 
extant and the final number is how long that particular object lived.

So if wsi are persisting longer than expected, the number in brackets 
may exceed the number of active wsi your app cares about, due to grace 
periods on close.


> Regards,
> Eric
> _______________________________________________
> Libwebsockets mailing list
> Libwebsockets at ml.libwebsockets.org
> https://libwebsockets.org/mailman/listinfo/libwebsockets
