[Libwebsockets] Bug? LWS-native poll() amidst custom polls

andy at warmcat.com andy at warmcat.com
Thu Jun 24 07:08:44 CEST 2021

On June 24, 2021 3:16:44 AM UTC, Felipe Gasper <felipe at felipegasper.com> wrote:
>> On Jun 23, 2021, at 10:51 PM, andy at warmcat.com wrote:
>> On June 24, 2021 2:39:41 AM UTC, Felipe Gasper
><felipe at felipegasper.com> wrote:
>>>> On Jun 23, 2021, at 10:20 PM, Felipe Gasper
><felipe at felipegasper.com>
>>> wrote:
>>>>> On Jun 23, 2021, at 9:52 PM, andy at warmcat.com wrote:
>>>>> On June 24, 2021 12:07:24 AM UTC, Felipe Gasper
>>> <felipe at felipegasper.com> wrote:
>>>>>> Hello,
>>>>>> 	I’m working on a Perl binding to LWS
>>>>>> (https://github.com/FGasper/p5-Net-Libwebsockets … very much
>>>>>> construction still!) and am seeing some weirdness where LWS still
>>> runs
>>>>>> its native poll() even though there is a custom event loop
>>> configured.
>>>>>> 	I’m attaching a Linux strace: you can see poll() amidst
>>> _newselect()
>>>>>> calls. Note the last in particular, where it waits 30 seconds …
>>> this is
>>>>>> the retry.secs_since_valid_hangup.
>>>>>> 	I’m inclined to think this is a LWS bug since “mixed” polling
>>> seems
>>>>>> like it shouldn’t happen, but just in case … is it possible I’m
>>>>>> “holding it wrong”?
>>>>> strace is a bit circumstantial... put an assert() on lws default
>>> poll() usage and see if that is actually involved.
>>>>> Strace is oblique enough that you might be seeing eg, libc
>>> getaddrinfo() internal implementation or something else.  30s wait
>>> what you might get on misconfigured dns resolver lookup.
>>>> That’s what I thought, too. To check it I altered my
>>> retry.secs_since_valid_hangup, and that last poll() changed
>>> accordingly. So there definitely seems to be something still giving
>>> LWS’s timeout to a poll() … which I assume is LWS’s native poll().
>>>> I’ll try the assert idea. Thank you!
>>> I got:
>>> perl: /home/pi/code/libwebsockets/lib/plat/unix/unix-service.c:153:.
>>> _lws_plat_service_tsi: Assertion `"no native poll" == NULL' failed.
>>> There are several poll() calls in the code; I assert()ed before them
>>> all then ran my code, and I got the above. When I comment out the
>>> assert things run normally; thus, I think this is the only
>>> poll() call.
>> Well, that is proving what you have been saying then... can you get a
>backtrace to see how it got there?
>> The default poll loop presents as a participant in the event lib
>stuff, it has its own event lib ops and the context has to pick one at
>init time.  So it is a mutually exclusive thing, if you set your event
>lib ops, no normal way to reference the default one.  Something might
>call it directly I guess.
>Ah. It gets there because I’m calling lws_service(ctx, 0) on timeout.
>I think I’m not supposed to, though, from looking

Right, that exposed looping is a crutch only needed for the default loop and legacy EXTERNAL_POLL.  All other event libs, including your custom one, run their own loop and never return from that API call until the loop exits.  So there is no place for any such loop in lws user code.

Lws supports 'foreign loops', where its appearance on the loop is just as a guest.  That way, lws is not involved at all in creating or starting the loop and is not allowed to stop or interrupt it; its context can be created and later destroyed, after carefully cleaning up after itself, without disturbing the loop; and all the CPU it uses over its entire life is spent in event lib callbacks.

This is the situation the example duplicates.
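
As a rough illustration of the guest arrangement (not lws code), here is a minimal sketch in C.  Everything in it is an assumption made up for the illustration: struct toy_loop stands in for whatever loop the application owns, and guest_attach() / guest_detach() / guest_event_cb() are hypothetical names.  The point is only that the guest registers and unregisters callbacks and never runs or stops the loop itself.

```c
#include <stddef.h>

/* Toy loop owned by the application; hypothetical, not an lws type. */
struct toy_loop {
	void (*cb)(void *user);	/* the single registered callback, or NULL */
	void *user;
	int running;
};

/* Hypothetical guest: it only touches the loop via attach / detach. */
struct guest {
	struct toy_loop *loop;
	int events_handled;
};

/* All of the guest's work happens inside callbacks fired by the loop. */
static void
guest_event_cb(void *user)
{
	struct guest *g = user;

	g->events_handled++;
}

/* Join a loop that somebody else created and will run. */
static void
guest_attach(struct guest *g, struct toy_loop *loop)
{
	g->loop = loop;
	loop->cb = guest_event_cb;
	loop->user = g;
}

/* Clean up after ourselves and leave; the loop itself is untouched. */
static void
guest_detach(struct guest *g)
{
	g->loop->cb = NULL;
	g->loop->user = NULL;
	g->loop = NULL;
}

/* Owner side: only the application starts and stops the loop. */
static void
toy_loop_run(struct toy_loop *loop, int iterations)
{
	loop->running = 1;
	while (iterations-- > 0 && loop->running)
		if (loop->cb)
			loop->cb(loop->user);
	loop->running = 0;
}
```

Note that there is no guest-side function that loops or waits; the owner can keep running the loop after guest_detach() and the guest simply stops being called.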

>again at the minimal
>example. I guess lws_service_adjust_timeout() both does servicing and
>returns the next appropriate timeout? I think I just wasn’t expecting
>that; I was expecting there to be a specific way to tell LWS that an
>event-loop timeout happened.

Yes, it services ripe timeouts too.  Once you have confirmed the example works, just follow it (as far as possible; the actual custom loop you want to work with might not be so nice).

>(Admittedly, there’s significant complexity here that I don’t really
>get yet … strace shows, e.g., an eventfd and a Netlink socket, whose
>relationship to WebSocket isn’t clear to me.)

If you don't care, you can turn off netlink monitoring at cmake.  But it allows lws to 1) track interfaces, source addresses and gateways to perform dns result sorting, and 2) respond immediately to existing connections becoming unroutable by closing them.  Eventually this info will be used for additional desirable features.

>So in the context of what I’m doing, that means my timer callback
>should basically just do nothing; it exists solely for the purpose of
>causing the loop to restart to trigger a call to
>lws_service_adjust_timeout(). I assume there’s some internal timer that
>LWS checks every time that function runs to see if there’s need to send
>a ping, etc.

Lws maintains a sorted, microsecond-resolution list of scheduled events.  If the amount of time you said you would wait on idle is longer than the time until the earliest event, the wait is trimmed to end at the earliest event.  The custom loop must be adapted to follow that.
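
The trimming rule can be sketched as below.  This is only an illustration under stated assumptions, not the lws implementation: trim_idle_wait_us() is a hypothetical helper, times are plain microsecond counts, and a negative next_event_us is an invented convention meaning "nothing scheduled".

```c
#include <stdint.h>

/* Hypothetical helper, not an lws API: all times are in microseconds. */
typedef int64_t usec_t;

/*
 * Return how long the loop may sleep: the caller's proposed idle wait,
 * trimmed so that it does not overshoot the earliest scheduled event.
 * next_event_us < 0 means nothing is scheduled (assumed convention).
 */
static usec_t
trim_idle_wait_us(usec_t now_us, usec_t proposed_wait_us, usec_t next_event_us)
{
	usec_t until_next;

	if (next_event_us < 0)
		return proposed_wait_us;	/* nothing scheduled */

	until_next = next_event_us - now_us;
	if (until_next <= 0)
		return 0;			/* already ripe: do not sleep */

	return until_next < proposed_wait_us ? until_next : proposed_wait_us;
}
```

With a proposed wait of 30s and an event due 5ms from now, the sketch returns 5ms; with nothing scheduled it returns the proposed wait unchanged.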

In cases where it can't, the real event lib support (which also uses only the set of event lib ops available here) instantiates an event lib native timer object that fires at the time needed for the earliest scheduled event instead.
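
A minimal sketch of that timer approach, assuming invented types: struct sched_ev is a hypothetical entry in a list kept sorted by due time, and struct native_timer is a stand-in for an event lib's one-shot timer (a real one would be armed through that library's own API).  Neither is an lws or event lib type.

```c
#include <stddef.h>
#include <stdint.h>

typedef int64_t usec_t;

/* Hypothetical scheduled-event entry, kept in a list sorted by due time. */
struct sched_ev {
	usec_t due_us;
	struct sched_ev *next;
};

/* Hypothetical stand-in for an event lib's native one-shot timer. */
struct native_timer {
	int armed;
	usec_t fire_in_us;
};

/* Insert ev so the list stays sorted, earliest first; returns the new head. */
static struct sched_ev *
sched_insert(struct sched_ev *head, struct sched_ev *ev)
{
	struct sched_ev **pp = &head;

	while (*pp && (*pp)->due_us <= ev->due_us)
		pp = &(*pp)->next;
	ev->next = *pp;
	*pp = ev;

	return head;
}

/* Arm the timer for the head of the list, or disarm it if the list is empty. */
static void
timer_rearm(struct native_timer *t, const struct sched_ev *head, usec_t now_us)
{
	if (!head) {
		t->armed = 0;
		return;
	}
	t->armed = 1;
	t->fire_in_us = head->due_us > now_us ? head->due_us - now_us : 0;
}
```

In this sketch timer_rearm() would be called whenever the head of the list changes, so the loop wakes for the earliest event even if it cannot shorten its own idle wait.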

>I don’t know what you may think of this, but it might be nice if there
>were assert()s against custom event loops in the code paths that
>shouldn’t be called in those contexts. There are probably more pieces
>in play here than I realize, but it’d be nice to get an error message
>like “Don’t call lws_service() with a custom event loop.”

I'm not sure this is going to come up very often; the event lib related examples show how to do it and encourage cut and paste.

The better way is to unify the arrangements so that the default loop looks the same as the others.  But at the moment there are users who rely on doing stuff inside the explicit loop.


>Anyhow, this seems to do the trick. Thank you!
