[Libwebsockets] LWS assertion probably due to start/stop of a pppd daemon

andy at warmcat.com andy at warmcat.com
Tue Sep 22 18:36:51 CEST 2020



On September 22, 2020 4:10:56 PM UTC, Thomas Spitz <thomas.spitz at hestia-france.com> wrote:
>Hello everyone,
>
>I've been using LWS in my main program for years without stability issues.
>
>My main program launches an LWS server in a thread, in the same manner as
>minimal-ws-server.c (creating the lws context, creating a vhost with my
>ws-protocol and looping on lws_service):
>
>> pthread_attr_init(&attr);
>> pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
>> pthread_create(&ThreadServeurWEBSOCKETS, &attr, serveurwebsockets, NULL)
>
>This thread manages a wsi list, receiving and sending WS messages.
>
>Now, my program also launches a pppd daemon from a thread using fork + exec.
>This pppd daemon is used to set up a mobile backup internet connection, but
>this connection is never used by WS clients at present.
>
>Testing with only one WS client and switching the mobile backup internet
>connection ON/OFF many times, this WS client is generally able to stay
>connected, or connects / disconnects without issue. However, sometimes
>(about 1 time in 40 on average), the WS server hits an assertion.
>
>Here is a log example with LWS 4.1.0-22d043a, loglevel 31:
...
>I think the cause of the above assertion is not due to lws itself but I
>don't know how to proceed in order to find the issue.
>
>> assert(context->lws_lookup[wsi->desc.sockfd -
>>                           lws_plat_socket_offset()] == 0);
>
>
>Do you think this could be due to an fd being copied during the pppd
>fork+exec?

If I understand it, it's telling you that lws still has a particular fd in use, one that has never indicated that it closed. If an fd is in use, we'll normally hear about its disconnection via POLLERR or POLLIN; when we go to read() from it we get -1 back.  Then we understand it's dead, we do the wsi close flow and logically close the fd as part of that, removing it from the fd tables and destroying / freeing the wsi too, until no trace of it remains.
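
As an illustration of that detection step, here is a minimal sketch using plain POSIX calls on a socketpair. It is not lws's actual code; peer_gone() and demo_peer_close() are hypothetical names made up for this example. Note that on an orderly close a raw read() returns 0 rather than -1, so the sketch treats both as "peer gone".

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the peer on fd looks gone, else 0. */
static int peer_gone(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    char buf[64];
    ssize_t n;

    assert(fd >= 0);

    /* Timeout or poll failure: nothing was reported, treat as alive here. */
    if (poll(&pfd, 1, timeout_ms) <= 0)
        return 0;

    if (pfd.revents & (POLLERR | POLLNVAL))
        return 1;

    if (pfd.revents & (POLLIN | POLLHUP)) {
        /* 0 means orderly close, -1 means error; both end the connection
         * in this simplified sketch (EINTR handling is left out). */
        n = read(fd, buf, sizeof(buf));
        return n <= 0;
    }

    return 0;
}

/* Hypothetical demo: create a connected pair and optionally close one end.
 * Returns what peer_gone() says about the other end, or -1 on setup error. */
static int demo_peer_close(int close_peer)
{
    int sv[2], gone;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    if (close_peer)
        close(sv[1]);

    gone = peer_gone(sv[0], 100);

    close(sv[0]);
    if (!close_peer)
        close(sv[1]);

    return gone;
}
```

Calling demo_peer_close(1) exercises the path where the close is noticed through poll() and read(); demo_peer_close(0) leaves the peer open, so poll() simply times out.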

Later the same fd number can be reused in the process, but it then arrives fresh, as a new connection starting from scratch.

The kind of error you have looks like, eg, you have manually closed an fd that lws has in use, outside of lws and without any lws-side processing of the close.  The now meaningless wsi is sitting there doing nothing but is still registered as owning the fd.  Then a new connection comes along recycling that fd as if it's something new.
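
The recycling scenario can be reproduced outside lws with a small sketch. Everything here is an assumption for illustration: owner[] is a hypothetical stand-in for an fd lookup table, and table_insert(), table_remove() and demo_stale_fd() are made-up names, not lws APIs. It relies only on the POSIX rule that open() returns the lowest unused fd number.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_FDS 1024

/* Hypothetical stand-in for an fd -> connection lookup table. */
static const char *owner[MAX_FDS];

/* Returns 0 on success, -1 if the slot is still occupied. The occupied
 * case is the condition the quoted assert() checks for. */
static int table_insert(int fd, const char *who)
{
    assert(fd >= 0 && fd < MAX_FDS);
    if (owner[fd])
        return -1;
    owner[fd] = who;
    return 0;
}

static void table_remove(int fd)
{
    assert(fd >= 0 && fd < MAX_FDS);
    owner[fd] = NULL;
}

/* Returns 1 if a recycled fd collided with a stale table entry,
 * 0 if not, -1 on setup error. */
static int demo_stale_fd(int close_behind_the_table)
{
    int fd1, fd2, collided;

    fd1 = open("/dev/null", O_RDONLY);
    if (fd1 < 0)
        return -1;
    if (fd1 >= MAX_FDS || table_insert(fd1, "first connection") != 0) {
        close(fd1);
        return -1;
    }

    if (!close_behind_the_table)
        table_remove(fd1);  /* proper close flow: table is updated first */
    close(fd1);             /* otherwise the table never hears about it */

    /* POSIX hands out the lowest free fd, so this normally reuses fd1. */
    fd2 = open("/dev/null", O_RDONLY);
    if (fd2 < 0 || fd2 >= MAX_FDS) {
        if (fd2 >= 0)
            close(fd2);
        table_remove(fd1);
        return -1;
    }

    collided = (table_insert(fd2, "second connection") != 0);

    /* Clean up so the demo can be run again. */
    table_remove(fd1);
    table_remove(fd2);
    close(fd2);

    return collided;
}
```

With demo_stale_fd(1) the fd is closed without the table being told, so the second insert finds the slot still owned; with demo_stale_fd(0) the entry is removed before the close and the recycled fd is accepted as new.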

If it feels like a problem on the lws side, the best thing is to try to reproduce it with skeletal changes to a minimal example.  Also run your own code under valgrind; it can then report if you have, eg, a bug where your code closes the wrong fd.

-Andy

>Thanks in advance for your help
>Best regards,
>Thomas

