[Libwebsockets] 100% CPU and lws_adopt_socket: fail. Possible LWS issue?

Andy Green andy at warmcat.com
Tue Jul 19 01:09:02 CEST 2016


On Mon, 2016-07-18 at 14:03 +0200, Thomas Spitz wrote:
> Hello Andy,
> 
> Since July 7th I haven't had another hang, but looking a bit at the
> lws code, I saw that if the context is destroyed by mistake (I don't
> see why it would be), it could end up in an infinite loop, as it
> returns 1 instead of -1 in lws-plat-unix.c:
> > 	if (!context || !context->vhost_list)
> > 		return 1;

That's not quite what goes on there.  If you destroy the context, you
free the memory behind it, but that in itself doesn't set the user's
context pointer to NULL.

So that check actually means "something bad happens if the user code
has a NULL context pointer".  But the fact that you got as far as
trying to do lws service with a NULL context means nothing good could
possibly happen anyway, and you probably segfaulted somewhere else
first.

If you did call lws_context_destroy(), the context is invalidated and
your pointer to it is left dangling.  It will certainly segfault sooner
or later, when that memory is reallocated and the original contents
trashed.

So although you're right it'd spin, no point worrying about it I think.
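
To make the distinction concrete, here is a minimal sketch using a
hypothetical stand-in context.  demo_context and the demo_* functions
are invented for illustration and are not lws APIs.  It assumes the
caller's loop would otherwise keep going while the service call returns
>= 0, which is why a return of 1 would spin.  One caller-side guard is
to NULL your own pointer when you destroy, and to treat any nonzero
return as a reason to stop.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the library context (not struct lws_context). */
struct demo_context {
	void *vhost_list;
};

/* Mirrors the quoted check: a NULL or vhost-less context returns 1. */
static int demo_service(struct demo_context *context)
{
	if (!context || !context->vhost_list)
		return 1;
	return 0;
}

/* Frees the memory only; the caller's own pointer variable is untouched. */
static void demo_destroy(struct demo_context *context)
{
	free(context);
}

/* Caller-side helper: destroy, then clear the caller's pointer. */
static void demo_destroy_and_clear(struct demo_context **pcontext)
{
	if (!pcontext || !*pcontext)
		return;
	demo_destroy(*pcontext);
	*pcontext = NULL;
}

/*
 * Service loop that stops on any nonzero return, rather than only on a
 * negative one.  Returns the number of successful service calls.
 */
static int demo_run(struct demo_context *context, int max_iterations)
{
	int n = 0;

	while (n < max_iterations) {
		if (demo_service(context))
			break;
		n++;
	}
	return n;
}
```

With the pointer cleared at destroy time, a later service attempt hits
the NULL check and the loop exits at once instead of touching freed
memory.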

-Andy

> Best regards,
> Thomas
> 
> 
> 2016-07-07 12:33 GMT+02:00 Thomas Spitz <thomas.spitz at hestia-france.com>:
> > Hello Andy,
> > 
> > > Can you get these CLOSE_WAIT connections using the test apps?  I
> > > try to make it happen here both with lwsws + libuv and the poll()
> > > test server + client + browser, but it always closes cleanly.
> > > TIME_WAIT is expected and clears up after 60s, also as expected
> > > (the connection is gone).
> > Even with my app, it is very hard to make it happen (once it took
> > months, and most of the time weeks...). I suppose it is due to some
> > particular phone / tablet app usage, but I can't reproduce the
> > problem myself.
> > It occurred twice on an embedded device exposed to the web and open
> > to everyone. I am going to add more logging on this embedded device
> > and expect to see further info...
> > 
> > > Is it that this symptom with the spinning creates the conditions
> > > where the CLOSE_WAIT appears?
> > I think so. I also have ESTABLISHED connections whereas nobody is
> > connected anymore...
> > 
> > I will let you know the next time it hangs spinning...
> > 
> > Best regards,
> > Thomas
> > 
> > 2016-07-07 2:19 GMT+02:00 Andy Green <andy at warmcat.com>:
> > > On Wed, 2016-07-06 at 18:22 +0200, Thomas Spitz wrote:
> > > > > No, by default tcp doesn't randomly send things.  You can use
> > > > > "tcp keepalives" to make it randomly send things at specified
> > > > > intervals as probes for a dead link.  Otherwise if you're not
> > > > > sending anything either at application layer it can go on
> > > > > forever believing it's just idle.  (In ssl case that might do
> > > > > its own housekeeping on the link, I'm not sure of the
> > > > > conditions.)
> > > > > This is talking about situations where the physical link
> > > > > disappears without notification, eg phone runs out of battery.
> > > > > Just closing the connection with the link still valid should be
> > > > > nice and clean.
> > > 
> > > > In fact the server sends the time to each client every minute
> > > > (so this is a kind of keepalive). On the client side, if it
> > > > doesn't receive a message within 1 min 5 s, it closes the
> > > > connection as well. Does the server have to do something if it
> > > > sends the new time to a "dead connection" (I thought TCP would
> > > > close the connection for me...)?
> > > 
> > > If you're periodically trying to pass traffic, that should allow
> > > you to know if anything has gone wrong.  Closing the connection in
> > > LWS should be enough then.
> > > 
> > > > > > >I also made a test program that progressively increases the
> > > > > > >number of simultaneous client connections, and all connections
> > > > > > >are stopped (after 1 minute of TIME_WAIT in some cases)
> > > > >
> > > > > TIME_WAIT is okay.
> > > > >
> > > > > CLOSE_WAIT isn't okay.  I'll check this out tomorrow.
> > > 
> > > Can you get these CLOSE_WAIT connections using the test apps?  I
> > > try to make it happen here both with lwsws + libuv and the poll()
> > > test server + client + browser, but it always closes cleanly.
> > > TIME_WAIT is expected and clears up after 60s, also as expected
> > > (the connection is gone).
> > > 
> > > Please try the test apps for this from your side.
> > > 
> > > Is it that this symptom with the spinning creates the conditions
> > > where the CLOSE_WAIT appears?
> > > 
> > > It's very helpful if you can reduce this to something both of us
> > > can reproduce with the test apps, perhaps with a little patch to
> > > align with what your code does.
> > > 
> > > -Andy
> > > 
> > > 
> > > 
> > 
> > 
> 
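
For the "tcp keepalives" mentioned in the quoted discussion above, a
minimal sketch of enabling them on a connected socket might look like
the following.  It assumes Linux (TCP_KEEPIDLE, TCP_KEEPINTVL and
TCP_KEEPCNT are not available under those names everywhere), and
enable_keepalive() with its parameters is a hypothetical helper, not
part of lws.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Hypothetical helper: turn on TCP keepalive probes for fd.
 *   idle_secs     - idle time before the first probe is sent
 *   interval_secs - gap between unanswered probes
 *   probes        - unanswered probes before the link is declared dead
 * Returns 0 on success, -1 if any setsockopt() call fails.
 */
static int enable_keepalive(int fd, int idle_secs, int interval_secs,
			    int probes)
{
	int on = 1;

	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_secs,
		       sizeof(idle_secs)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_secs,
		       sizeof(interval_secs)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes,
		       sizeof(probes)) < 0)
		return -1;

	return 0;
}
```

When lws owns the socket, its context creation info has keepalive
fields of its own (ka_time, ka_probes, ka_interval), which is likely
the more natural place to set this than touching the fd directly.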


