[Libwebsockets] lws client issues lws_close_reason but socket is not closed for 20 secs

Andy Green andy at warmcat.com
Sat Feb 1 07:46:32 CET 2020



On 1/31/20 4:20 PM, Dave Horton wrote:

> I have an lws client running against a non-lws websocket server.  I
> suspect what I am seeing is a problem in the server implementation,
> but I need some insight into how the client handles close.
> 
> I am calling lws_close_reason(wsi, LWS_CLOSE_STATUS_NORMAL, NULL, 0);
> in my lws callback and then return -1 from the callback.
> 
> I expect this to close the websocket immediately. (I know I’m
> probably wrong on this)

lws_close_reason() just stashes your reason code, it's the return -1 
that starts closing the wsi.

During the close flow, it might or might not try to stage it by waiting 
to see if it can send your logical close frame.

> What I see happening is that the socket is closed on the server side
> exactly 20 secs later.
> 
> If I take a network trace, I do not see any tcp message from the
> client to the server at the point that I called lws_close_reason
> 
> Instead I see a PING message 20 seconds later, followed by a CLOSE
> frame.
...

> My suspicion is that the lws client is possibly waiting for
> outstanding binary frames I have been sending to be ACK’ed, or
> something like that?

These logical close frames are "best effort" only.

If the close flow thinks it makes sense to try to send the protocol 
frame for logical closing (eg, it isn't closing because it just noticed 
the tls tunnel / tcp connection has closed already) it asks for a 
callback on writeable and halts the close partway.  The 20s is probably 
the timeout for that.

In ws, control frames can't interrupt a frame that's in progress, so if 
you're in the middle of sending a big data frame via multiple times 
around the event loop, it's going to start the close process, stop you 
sending any more data since it's closing, be unable to send a new close 
frame, and deadlock until timeout.

> If I compare to a trace from a “good” working server (meaning it
> disconnects immediately upon lws_close_reason) I see the following
> differences
> 
> interaction with “bad” server: ———————————
> 
> client sends 2 BINARY frames (always 640 bytes each) with tcp flags
> ACK,PUSH and websocket flag FIN set server sends ACK back, containing
> acknowledgement number of next-seq in the second BINARY frame ..this
> repeats until lws_close_reason is called at which point there is a 20
> sec delay after last ACK is received from server until PING,CLOSE
> 
> interaction with “good” server —————————————
> 
> client sends 1 BINARY frame (usually 1240 bytes or more, variable
> length) with ACK, PUSH and websocket flag FIN set server sends ACK
> back, containing acknowledgement number of seq in BINARY frame above
> (note difference from above) client sends TCP frame with ACK, PUSH
> set acknowledging above frame server sends ACK backm containing
> acknowledgement number of TCP frame immediately above …this repeats
> until lws_close_reason is called at which point CLOSE frame is
> immediately sent
> 
> 
> Any insight into what is going on with the 20-sec delay?

I can't really do anything with that.  If there's a problem in lws, a 
good way is hack the closest minimal example to demonstrate it and let 
me have the diff and instructions on how to reproduce it so I can look 
at it.

If that's not possible, then enable verbose logging and look at / trace 
the flow through __lws_close_free_wsi() to see what's happening.

-Andy

> Dave
> 
> 
> 
> _______________________________________________ Libwebsockets mailing
> list Libwebsockets at ml.libwebsockets.org 
> https://libwebsockets.org/mailman/listinfo/libwebsockets
> 


More information about the Libwebsockets mailing list