[Libwebsockets] #331 client application crashes libwebsocket_context_destroy on flaky network tests

Andy Green andy at warmcat.com
Mon Oct 19 18:34:49 CEST 2015

On 19 October 2015 23:42:23 GMT+08:00, Vikas Gahlan <vikas.gahlan at chargepoint.com> wrote:
>Initially, I also thought about same fix what you proposed but when I
>looked into lws_change_pollfd() function code, there are more error
>handling in this function where full cleanup using
>libwebsocket_close_and_free_session() not required IMO. As I am new to

The full close processing for a wsi is very adaptive to the wsi state, but he does assume the wsi has an entry in the fd table.  Basically if he got added to the fd table, he must get removed from it when destroyed and the close flow will handle that.

As you suspect later, lws_change_pollfd() has no meaning if the wsi not in the fd table (since it's not part of the poll() then, and he has no pollfd entry to manipulate).  So that should never happen.

>libwebsocket and still reviewing code to understand architecture , so I
>put changes in oom:  handling to ensure
>libwebsocket_close_and_free_session() is invoked only when there is
>valid entry in fdtable.

Well... if it got as far as putting a socket fd into the fd table, and then errored out, he needs to do the full close processing.  So that fix does fix something that needs fixing.

If there are other things that need fixing, we can solve that as well separately.

>I am using multithreaded application in which
>libwebsocket_client_connect_2 is invoked in 2 different context at
>context setup: 
>1) When my client application invokes libwebsocket_client_connect()
>after successful context creation using libwebsocket_create_context().
>2) If libwebsocket_client_connect() is successful in #A which is mostly
>successful then app creates specific service thread for handling
>websocket communication.
>If you are sure that lws_change_pollfd() is invoked only when sock is
>added in fdlist, then your fix of changes oom to failed in case of
>lws_change_pollfd() error is fine. 

User code can end up calling that asynchronously when using the 'call me back when wsi writeable' callbacks, but he won't / can't / shouldn't do so until the wsi has transitioned to ws protocol which is a long time after he's in the fd list.

HTTP processing that uses pollfd manipulation likewise doesn't happen until we did a poll() and the socket said he had the initial http headers for us; that only happens after he's added to the fd table.

No library use of apis changing POLL state should try to operate on a socket fd not in the table, since it won't work anyway.


>Vikas Gahlan
>-----Original Message-----
>From: Andy Green [mailto:andy.green at linaro.org] On Behalf Of Andy Green
>Sent: Sunday, October 18, 2015 4:25 PM
>To: Vikas Gahlan; libwebsockets at ml.libwebsockets.org
>Subject: Re: [Libwebsockets] #331 client application crashes in
>libwebsocket_context_destroy on flaky network tests
>On 18 October 2015 18:39:39 GMT+08:00, Andy Green <andy at warmcat.com>
>>On 18 October 2015 14:19:20 GMT+08:00, Vikas Gahlan 
>><vikas.gahlan at chargepoint.com> wrote:
>>>Attached patch work for me.
>>>Here is exact trace flow of the crash:
>>>libwebsocket_client_connect_2: address mywebsocket.com closed 
>>>close: just_kill_connection
>>>remove_wsi_socket_from_fds: wsi=0x7a0d8, sock=9, fds pos=1
>>>below fix works for me:
>>>From 300e50b2fe6a033a109d5f1f15208cb8490363e4 Mon Sep 17 00:00:00
>>>From: vgahlan-cpi <vikas.gahlan at chargepoint.com>
>>>Date: Sat, 17 Oct 2015 22:40:23 -0700
>>>Subject: [PATCH] fdpolllist cleanup requires if wsi socket exists in 
>>>fdtable to avoid double free in libwebsocket_context_destroy
>>>lib/client-handshake.c | 9 ++++++---
>>>1 file changed, 6 insertions(+), 3 deletions(-)
>>>diff --git a/lib/client-handshake.c b/lib/client-handshake.c index 
>>>7293a41..d7d72e6 100644
>>>--- a/lib/client-handshake.c
>>>+++ b/lib/client-handshake.c
>>>@@ -291,9 +291,12 @@ struct libwebsocket 
>>>        return wsi;
>>>-       lws_free(wsi->u.hdr.ah);
>>>-       lws_free(wsi);
>>>-       return NULL;
>>>+        /*In case wsi added successfully in fd polllist, clean-up
>>>requires for fdtable and session data properly using 
>>>+        if(wsi->position_in_fds_table < 0  || wsi->sock < 0 ){
>>Thanks... this definitely is a real problem, but the fix needs to be a
>>bit closer to the source of the problem I think.
>>I'll push an alternative fix for you to try in the next minutes.
>How about this?  He was the only path to oom4 after insertion into the
>fds list.
>>>+                lws_free(wsi->u.hdr.ah);
>>>+                lws_free(wsi);
>>>+                return NULL;
>>>+        }
>>>        libwebsocket_close_and_free_session(context, wsi,
>>>Vikas Gahlan
>>>ChargePoint | chargepoint.com
>>>Ofiice: +91-124-4889450 Ext. 464 | Mobile +91-9873232008
>>>This email and any attachments are intended for the sole use of the 
>>>intended recipient(s) and
>>>contain(s) confidential information that may be proprietary,
>>>or copyrighted under
>>>applicable law. If you are not the intended recipient, do not read, 
>>>copy, or forward this email message or any attachments and delete
>>>email message and any attachments immediately.
>>>Libwebsockets mailing list
>>>Libwebsockets at ml.libwebsockets.org
>>Libwebsockets mailing list
>>Libwebsockets at ml.libwebsockets.org
>This email and any attachments are intended for the sole use of the
>intended recipient(s) and
>contain(s) confidential information that may be proprietary, privileged
>or copyrighted under
>applicable law. If you are not the intended recipient, do not read,
>copy, or forward this email
>message or any attachments and delete this email message and any
>attachments immediately.

More information about the Libwebsockets mailing list