[Libwebsockets] Bug? LWS-native poll() amidst custom polls

Felipe Gasper felipe at felipegasper.com
Thu Jun 24 05:16:44 CEST 2021



> On Jun 23, 2021, at 10:51 PM, andy at warmcat.com wrote:
> 
> 
> 
> On June 24, 2021 2:39:41 AM UTC, Felipe Gasper <felipe at felipegasper.com> wrote:
>> 
>> 
>>> On Jun 23, 2021, at 10:20 PM, Felipe Gasper <felipe at felipegasper.com> wrote:
>>> 
>>> 
>>> 
>>>> On Jun 23, 2021, at 9:52 PM, andy at warmcat.com wrote:
>>>> 
>>>> 
>>>> 
>>>> On June 24, 2021 12:07:24 AM UTC, Felipe Gasper <felipe at felipegasper.com> wrote:
>>>>> Hello,
>>>>> 
>>>>> 	I’m working on a Perl binding to LWS
>>>>> (https://github.com/FGasper/p5-Net-Libwebsockets … very much under
>>>>> construction still!) and am seeing some weirdness where LWS still runs
>>>>> its native poll() even though there is a custom event loop configured.
>>>>> 
>>>>> 	I’m attaching a Linux strace: you can see poll() amidst _newselect()
>>>>> calls. Note the last in particular, where it waits 30 seconds … this is
>>>>> the retry.secs_since_valid_hangup.
>>>>> 
>>>>> 	I’m inclined to think this is a LWS bug since “mixed” polling seems
>>>>> like it shouldn’t happen, but just in case … is it possible I’m
>>>>> “holding it wrong”?
>>>> 
>>>> strace is a bit circumstantial... put an assert() on lws default poll() usage and see if that is actually involved.
>>>> 
>>>> Strace is oblique enough that you might be seeing eg, libc blocking getaddrinfo() internal implementation or something else.  30s wait is what you might get on misconfigured dns resolver lookup.
>>> 
>>> That’s what I thought, too. To check it I altered my retry.secs_since_valid_hangup, and that last poll() changed accordingly. So there definitely seems to be something still giving LWS’s timeout to a poll() … which I assume is LWS’s native poll().
>>> 
>>> I’ll try the assert idea. Thank you!
>> 
>> I got:
>> 
>> perl: /home/pi/code/libwebsockets/lib/plat/unix/unix-service.c:153: _lws_plat_service_tsi: Assertion `"no native poll" == NULL' failed.
>> 
>> There are several poll() calls in the code; I assert()ed before them
>> all then ran my code, and I got the above. When I comment out the
>> assert things run normally; thus, I think this is the only problematic
>> poll() call.
> 
> Well, that is proving what you have been saying then... can you get a backtrace to see how it got there?
> 
> The default poll loop presents as a participant in the event lib stuff, it has its own event lib ops and the context has to pick one at init time.  So it is a mutually exclusive thing, if you set your event lib ops, no normal way to reference the default one.  Something might call it directly I guess.

Ah. It gets there because I’m calling lws_service(ctx, 0) on timeout.

I think I’m not supposed to, though, from looking again at the minimal example. I guess lws_service_adjust_timeout() both does servicing and returns the next appropriate timeout? I think I just wasn’t expecting that; I was expecting there to be a specific way to tell LWS that an event-loop timeout happened.

(Admittedly, there’s significant complexity here that I don’t really get yet … strace shows, e.g., an eventfd and a Netlink socket, whose relationship to WebSocket isn’t clear to me.)

So in the context of what I’m doing, that means my timer callback should basically just do nothing; it exists solely to wake the loop so that the next iteration calls lws_service_adjust_timeout() again. I assume there’s some internal timer that LWS checks every time that function runs to see whether it needs to send a ping, etc.
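
For illustration, here is a rough sketch of that loop shape in C. It is a toy model only: fake_lib, fake_adjust_timeout(), on_loop_timer() and loop_iteration() are made-up stand-ins, not LWS functions, and the model simply assumes the guess above holds (that the "adjust timeout" call runs whatever internal timers are due and then reports how long to sleep).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for library state; NOT a real LWS structure. */
struct fake_lib {
    long now_ms;       /* simulated clock */
    long next_due_ms;  /* when the internal timer is next due */
    long period_ms;    /* interval between internal timer firings */
    int  timers_fired; /* how many internal timers have run so far */
};

/*
 * Hypothetical analogue of an "adjust timeout" call: run every internal
 * timer that is already due, then return how many ms the caller may
 * sleep, capped at max_ms.
 */
static int fake_adjust_timeout(struct fake_lib *lib, int max_ms)
{
    while (lib->next_due_ms <= lib->now_ms) {
        lib->timers_fired++;
        lib->next_due_ms += lib->period_ms;
    }

    long wait = lib->next_due_ms - lib->now_ms;
    return wait < max_ms ? (int)wait : max_ms;
}

/* The event loop's timer callback: intentionally does nothing. */
static void on_loop_timer(struct fake_lib *lib)
{
    (void)lib;
}

/* One loop iteration; returns the number of ms it "slept". */
static int loop_iteration(struct fake_lib *lib, int max_ms)
{
    int wait_ms = fake_adjust_timeout(lib, max_ms);

    /* Stands in for select()/poll() returning on timeout with no fd activity. */
    lib->now_ms += wait_ms;

    /* The timeout only wakes the loop; the work happens on the next pass. */
    on_loop_timer(lib);
    return wait_ms;
}
```

In this model the timer that comes due during the sleep is not run by the callback; it is run at the top of the following iteration, when fake_adjust_timeout() is called again.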

I don’t know what you may think of this, but it might be nice if there were assert()s against custom event loops in the code paths that shouldn’t be called in those contexts. There are probably more pieces in play here than I realize, but it’d be nice to get an error message like “Don’t call lws_service() with a custom event loop.”
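
A minimal sketch of the kind of guard meant here, again with invented names: fake_context, its has_custom_event_loop flag and fake_default_service() are hypothetical and do not reflect how LWS tracks its event-loop configuration. The sketch returns an error so the behavior is easy to see; a debug build could use assert() at the same spot instead.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical context; the flag name is invented for this sketch. */
struct fake_context {
    int has_custom_event_loop; /* nonzero if the app supplied its own loop */
};

/*
 * Hypothetical analogue of a default-poll service entry point.
 * Refuses to run, with a clear message, when a custom loop is configured.
 * Returns 0 on success, -1 on misuse.
 */
static int fake_default_service(struct fake_context *ctx, int timeout_ms)
{
    (void)timeout_ms;

    if (ctx->has_custom_event_loop) {
        fprintf(stderr,
                "Don't call the default service function with a custom event loop.\n");
        return -1;
    }

    /* A real implementation would wait on its own poll() here. */
    return 0;
}
```

With a guard like this, the mistaken call on timeout would fail immediately with a readable message instead of showing up only as an unexpected poll() in strace.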

Anyhow, this seems to do the trick. Thank you!

-F

