[Libwebsockets] [libwebsockets] #89: Possible bug with client closing socket

Trac trac at libwebsockets.org
Wed Oct 15 20:09:36 CEST 2014


#89: Possible bug with client closing socket
-----------------------------------+-----------------
 Reporter:  panyam                 |      Owner:
     Type:  defect                 |     Status:  new
 Priority:  major                  |  Milestone:
Component:  libwebsockets library  |    Version:  1.0
 Keywords:                         |
-----------------------------------+-----------------
 Hi,

     We are using libwebsockets in two different environments:

 1. A VMware Fusion box running Ubuntu 12.04
 2. An EC2 instance running Ubuntu 12.04

 Both use the exact same image (in terms of setup).

 The scenario is this: when the client closes the websocket connection
 (Chrome 37.0.2062.124, by the way), a send returns -1 (a broken-pipe
 error, i.e. SIGPIPE/EPIPE, rightly so).  This in turn calls
 libwebsocket_close_and_free_session.
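
 For reference, a minimal sketch of how a send to a vanished peer fails.
 It assumes Linux and uses a local socketpair as a stand-in for the real
 TCP connection; the helper name send_to_closed_peer is hypothetical and
 not part of libwebsockets.  With MSG_NOSIGNAL the failure is reported as
 errno EPIPE instead of raising SIGPIPE.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not a libwebsockets call): sends msg to a peer that
 * has already closed its end. Returns the errno of the failed send, 0 if
 * the send unexpectedly succeeded, or -1 if the socketpair setup failed. */
int send_to_closed_peer(const char *msg, size_t len)
{
    int fds[2];

    assert(msg != NULL);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -1;

    close(fds[1]);  /* the peer goes away, like the browser closing */

    /* MSG_NOSIGNAL: report EPIPE via errno rather than raising SIGPIPE */
    ssize_t n = send(fds[0], msg, len, MSG_NOSIGNAL);
    int err = (n < 0) ? errno : 0;

    close(fds[0]);
    return err;
}
```

 Calling send_to_closed_peer("x", 1) is expected to report EPIPE on Linux.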

 This is where the two environments diverge.

 On the Fusion box setup (1), libwebsocket_close_and_free_session starts
 off in WSI_STATE_ESTABLISHED and gets to "just_kill_connection", where
 the socket fd is removed from the registry (wsi->truncated_send_len = 0
 here).  This in turn delivers LWS_CALLBACK_CLOSED to our callback
 function and all is well.

 On the EC2 instance (2), however, wsi->truncated_send_len is non-zero,
 so the function immediately enters the
 WSI_STATE_FLUSHING_STORED_SEND_BEFORE_CLOSE state, from which it never
 recovers.  This is the code I am referring to (in libwebsockets.c,
 around line 65; the fprintf on line 67 is my own debug output):

 {{{
  65     case WSI_STATE_FLUSHING_STORED_SEND_BEFORE_CLOSE:
  66         if (wsi->truncated_send_len) {
  67             fprintf(stderr, "wsi->truncated_send_len: %d\n", wsi->truncated_send_len);
  68             libwebsocket_callback_on_writable(context, wsi);
  69             return;
  70         }
  71         lwsl_info("wsi %p completed WSI_STATE_FLUSHING_STORED_SEND_BEFORE_CLOSE\n", wsi);
  72         goto just_kill_connection;
 }}}

 The result is a busy loop: because the state is
 WSI_STATE_FLUSHING_STORED_SEND_BEFORE_CLOSE, the library tries to flush
 the remaining data, the write fails, it re-enters
 libwebsocket_close_and_free_session, and so on.  Because of this we never
 get the LWS_CALLBACK_CLOSED event (and end up with very high CPU usage
 from the spinning).
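
 The loop described above can be illustrated with a toy model.  This is
 only a sketch of the behaviour as I understand it, not the libwebsockets
 source: every toy_* name is hypothetical, and the real library drives
 this from its poll loop rather than from a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: all toy_* names are hypothetical, not libwebsockets APIs. */
enum toy_state { TOY_ESTABLISHED, TOY_FLUSHING_BEFORE_CLOSE, TOY_CLOSED };

struct toy_wsi {
    enum toy_state state;
    int truncated_send_len;  /* bytes still buffered from a partial send */
    bool peer_gone;          /* true once the client has closed the socket */
};

/* Pretend write: fails when the peer is gone, otherwise drains the buffer. */
static int toy_flush(struct toy_wsi *w)
{
    if (w->peer_gone)
        return -1;
    w->truncated_send_len = 0;
    return 0;
}

/* Same shape as the close path in the ticket: pending data defers the close. */
static void toy_close(struct toy_wsi *w)
{
    if (w->truncated_send_len) {
        w->state = TOY_FLUSHING_BEFORE_CLOSE;
        return;              /* wait for the next writable event */
    }
    w->state = TOY_CLOSED;   /* stands in for just_kill_connection */
}

/* Starts a close, then replays up to max_events writable events.
 * Returns the number of events consumed; a return value equal to
 * max_events with the state still not TOY_CLOSED means it never closed. */
int toy_events_until_closed(struct toy_wsi *w, int max_events)
{
    int n = 0;

    assert(w != NULL);
    toy_close(w);            /* the close triggered by the failed send */
    while (w->state != TOY_CLOSED && n < max_events) {
        n++;
        (void)toy_flush(w);  /* fails forever if the peer is gone */
        toy_close(w);        /* close path is re-entered after each attempt */
    }
    return n;
}
```

 In this model a connection with peer_gone set and a non-zero
 truncated_send_len uses up every event without ever reaching TOY_CLOSED,
 while a live peer closes after a single writable event.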

 I found that the fix was simply to remove the return on line 69, which
 ensures that the socket is cleanly removed.  My reasoning was that since
 the socket is closed anyway, there is no point in flushing and writing
 anything to it.

 However, I am assuming that this method cannot tell at this point whether
 the connection is dead or not (it has no way of knowing what caused it to
 be called), which means a flush would actually make sense in some
 scenarios (i.e. just before a voluntary close from the server side).

 So I have a few questions:

 1. Is my assumption about this correct?
 2. I understand that I am unable to produce a simple test program to repro
 this (and it could even be due to differences in the networking drivers in
 the two envs) so is there anything else we are missing?
 3. Can we pass a flag to this method to avoid a flush, since the caller
 seems to know why it is calling this method anyway?
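
 To make question 3 concrete, here is a rough sketch of what I have in
 mind.  It is purely illustrative: the sketch_* names and the reason
 enum are hypothetical and do not exist in libwebsockets, and the real
 function takes different arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of question 3; none of these names exist in libwebsockets. */
enum sketch_state {
    SKETCH_ESTABLISHED,
    SKETCH_FLUSHING_BEFORE_CLOSE,
    SKETCH_CLOSED
};

enum sketch_close_reason {
    SKETCH_CLOSE_VOLUNTARY,  /* server chose to close: flushing is worthwhile */
    SKETCH_CLOSE_PEER_DEAD   /* a write already failed: flushing cannot succeed */
};

struct sketch_wsi {
    enum sketch_state state;
    int truncated_send_len;  /* bytes still buffered from a partial send */
};

/* The caller says why it is closing, so a dead peer skips the flush state. */
void sketch_close(struct sketch_wsi *w, enum sketch_close_reason why)
{
    assert(w != NULL);

    if (why == SKETCH_CLOSE_PEER_DEAD)
        w->truncated_send_len = 0;  /* drop data that can never be delivered */

    if (w->truncated_send_len) {
        w->state = SKETCH_FLUSHING_BEFORE_CLOSE;
        return;                     /* flush first, close on a later event */
    }
    w->state = SKETCH_CLOSED;       /* stands in for just_kill_connection */
}
```

 With this shape, a voluntary close with data pending still goes through
 the flush state, while a close caused by a failed write goes straight to
 the closed state.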

 Appreciate your help and apologies in advance for any obvious oversight on
 my part.

 cheers
 Sriram

--
Ticket URL: <http://libwebsockets.org/trac/libwebsockets/ticket/89>
libwebsockets <http://libwebsockets.org>
libwebsockets C library


