[Libwebsockets] Segfault

Jack Mitchell ml at communistcode.co.uk
Fri Jan 25 11:02:33 CET 2013


On 25/01/13 02:25, "Andy Green (林安廸)" wrote:
> On 24/01/13 23:14, the mail apparently from Jack Mitchell included:
>> On 23/01/13 12:01, "Andy Green (林安廸)" wrote:
>>> On 23/01/13 19:54, the mail apparently from Jack Mitchell included:
>>>> On 18/01/13 23:54, Andy Green wrote:
>>>>> Hi -
>>>>>
>>>>> Is your code arranged like the test server in terms of using the
>>>>> "call me back when I am writable" api when you have something to
>>>>> send, and writing a single thing in the "I am writable" callback?
>>>>>
>>>>> The mystery here is how you end up trying to do multiple things with
>>>>> a dead socket, the library shouldn't be able to call you back even
>>>>> once under those circumstances. However if your code took the (wrong)
>>>>> approach to store the wsi and randomly try to send on it, that can
>>>>> easily happen.
>>>>>
>>>>> -Andy
>>>>>
>>>>>
>>>>> Jack Mitchell <ml at communistcode.co.uk> wrote:
>>>>>
>>>>>     On 18/01/13 15:42, Jack Mitchell wrote:
>>>>>
>>>>>         On 18/01/13 14:04, "Andy Green (林安廸)" wrote:
>>>>>
>>>>>             On 18/01/13 21:20, the mail apparently from Jack Mitchell
>>>>>             included: Hi -
>>>>>
>>>>>                 Today I tried out the latest libwebsockets master in
>>>>>                 my embedded application and gave it a good thrashing..
>>>>>                 I managed to reproduce a segfault a few times - I have
>>>>>                 had this issue before but thought I had fixed it but
>>>>>                 it has reared its ugly head again in this new
>>>>>                 release. I
>>>>>
>>>>>             Hm sorry to hear that but I am glad to hear you are
>>>>>             beating on the library HEAD.
>>>>>
>>>>>                 have attached a valgrind trace below in the hope that
>>>>>                 someone could help me out. I think it is trying to
>>>>>                 write to a dead socket (null pointer) and bailing out.
>>>>>                 Should there be some extra error checking somewhere to
>>>>>                 ensure that a dead socket is never written to?
>>>>>
>>>>>             Until this week it would have been too expensive, but with
>>>>>             the new lookup array approach it should be possible to
>>>>>             cheaply confirm the struct websocket you have hold of
>>>>>             still jibes with the pollfd it claims to hold and the fds
>>>>>             match. I added an api lws_confirm_legit_wsi()
>>>>>
>>>>> http://git.libwebsockets.org/cgi-bin/cgit/libwebsockets/commit/?id=acbaee649ab62beb34609d4b79e8814a2913430f 
>>>>>
>>>>>
>>>>>
>>>>>             and used it on libwebsocket_write... if you think that's
>>>>>             the problem you can sprinkle them around and see if it
>>>>>             fires. It looks for any inconsistency between what the
>>>>>             struct websocket thinks its position is in the polling
>>>>>             table and what the polling table thinks. I wasn't really
>>>>>             able to tie up the valgrind log with the idea something
>>>>>             blows segfaults. The log shows a memcpy inside deflate is
>>>>>             reading 2 bytes it shouldn't? -Andy
>>>>>
>>>>>                 I'm going to investigate some more and will let you
>>>>>                 know if I find a solution! <snip>
>>>>>
>>>>>         Hi Andy, I turned the DEBUG levels right up (1 | 2 | 4 | 8)
>>>>>         and it stopped the segfault. I would assume this means that
>>>>>         somewhere there is maybe some error checking code that the
>>>>>         debug ifdefs out? Jack.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>     Below is a log of me thrashing it so you can see which parts of
>>>>>     the code I am giving a good kicking.
>>>>>
>>>>>     <snip>
>>>>>
>>>>
>>>> Hi Andy,
>>>>
>>>> I cannot produce this any more in the latest head. I will go over my
>>>> websocket implementation again at some point to be sure that it's not
>>>> just chance.
>>>
>>> If you see it again run it under gdb like this
>>>
>>> gdb --args libwebsocket-test-server
>>>
>>> > run
>>>
>>> if it blows chunks, you can use
>>>
>>> > bt
>>>
>>> to get a nice backtrace that will nail down where the problem is.
>>>
>>> -Andy
>>>
>>
>> Hi Andy,
>>
>> So we're back:
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0xb51ff470 (LWP 966)]
>> deflate_fast (s=0x12cc28, flush=-1256198920) at deflate.c:1652
>> 1652                INSERT_STRING(s, s->strstart, hash_head);
>>
>> (gdb) bt
>
> Thanks a lot for the clear backtrace, it eliminates a lot of guesswork.
>
>> #0  deflate_fast (s=0x12cc28, flush=-1256198920) at deflate.c:1652
>> #1  0xb6d46ee4 in deflate (strm=strm@entry=0xb5170, flush=-1227591964, flush@entry=2) at deflate.c:901
>> #2  0xb6e8505c in lws_extension_callback_deflate_frame (context=<optimized out>, ext=<optimized out>,
>>      wsi=<optimized out>, reason=<optimized out>, user=0xb5138, in=0xb51fcbb0, len=0) at extension-deflate-frame.c:224
>> #3  0xb6e844d0 in libwebsocket_write (wsi=wsi@entry=0x12f820,
>>      buf=buf@entry=0xb51fcc2a "{\"method\":[\"updateData\"],\"parameters\":{\"43\":{\"val\":477},\"45\":{\"val\":15566}}}",
>>      len=<optimized out>, protocol=protocol@entry=LWS_WRITE_TEXT) at output.c:319
>> #4  0x0000ecf4 in webSock_genericSendRecieve (context=<error reading variable: value has been optimized out>,
>>      wsi=0x12f820, wsi@entry=<error reading variable: value has been optimized out>,
>>      reason=<error reading variable: value has been optimized out>,
>>      user=<error reading variable: value has been optimized out>, in=0xb51fdc8a,
>>      in@entry=<error reading variable: value has been optimized out>, len=4118,
>>      len@entry=<error reading variable: value has been optimized out>) at webInterfaces/webInterface_webSockets.c:99
>> #5  0xb6e812b8 in user_callback_handle_rxflow (callback_function=<optimized out>, context=context@entry=0x47000,
>>      wsi=0x12f820, reason=reason@entry=LWS_CALLBACK_BROADCAST, user=0xf9768, in=in@entry=0xb51fdc8a,
>>      len=len@entry=4118) at libwebsockets.c:1347
>> #6  0xb6e8137c in libwebsockets_broadcast (protocol=0x21454 <systemConf+124>, buf=0xb51fdc8a "", len=4118)
>>      at libwebsockets.c:2138
>
> Right tickCheck() is coming from another thread and randomly wanting 
> to send things, even while the libwebsockets service is happening in 
> another thread.

Correct.

>
> All lws activity must be occurring in a single service thread only.  
> To allow what you're trying to do though, lws uses internal local 
> sockets to serialize broadcast requests from other threads. The 
> broadcast action in your tick thread should resolve to a send action 
> on these local sockets, which the service loop services when it gets 
> around to it.

This is a slightly simplified version of what I am running:

int main()
{

    // Create main web socket
    controlWebSocket = controlWeb_setupWebSocket(controlWebSocket);

    int i;

    // Setup card threads
    for (i=0; i<NUM_TOTAL_CARDS; i++)
    {

       pthread_attr_init(&cardThreadAtt[i]);
       pthread_attr_setdetachstate(&cardThreadAtt[i], PTHREAD_CREATE_DETACHED);
       pthread_attr_setschedpolicy(&cardThreadAtt[i], SCHED_FIFO);

    }

    // Set off the cards
    for (i=0; i<NUM_TOTAL_CARDS; i++)
    {

       // Ensure card is alive and well, therefore start the thread to poll and manage it
       if (card[i]->status == CARD_ALIVE)
       {

         // assumes card[i]->position == i, so everything is indexed by i
         cardWebSocketContext[i] = startWebSocket(card[i]->webSocketProtocol,
             cardWebSocketContext[i], i);

         // this is where the tickCheck() functions essentially jump in from
         pthread_create(&cardThread[i], &cardThreadAtt[i], card[i]->entryFunction, (void*) card[i]);

       }

    }

     while (1)
     {

       /* Sleep before looping */
       usleep(50000);

       /* Service the websocket here to listen for connection attempts */
       libwebsocket_service(controlWebSocket, 0);

       for (i=0; i<NUM_TOTAL_CARDS; i++)
       {

          if (card[i]->status == CARD_ALIVE)
          {

             libwebsocket_service(cardWebSocketContext[i], 0);

          }

       }

     }

}
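
One way to keep every lws call inside the loop above would be for the card
threads to queue their outgoing messages and let the service thread drain
the queue. The following is only a minimal sketch under stated assumptions:
msg_queue, queue_push, queue_pop, worker and run_demo are hypothetical names,
not libwebsockets APIs, and the actual websocket send is stubbed out with a
printf. It is not how the library serializes broadcasts internally.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define QUEUE_SLOTS     16   /* hypothetical queue capacity */
#define MSG_MAX         128  /* hypothetical max message size, incl. NUL */
#define NUM_WORKERS     4    /* stand-in for the card threads */
#define MSGS_PER_WORKER 2

/* Hypothetical mutex-protected ring buffer; not a libwebsockets type. */
struct msg_queue {
    pthread_mutex_t lock;
    char msgs[QUEUE_SLOTS][MSG_MAX];
    int head, tail, count;
};

static struct msg_queue q = { .lock = PTHREAD_MUTEX_INITIALIZER };

/* Called from worker threads. Returns 0 on success, -1 if the queue is full. */
static int queue_push(struct msg_queue *mq, const char *msg)
{
    int ret = -1;

    pthread_mutex_lock(&mq->lock);
    if (mq->count < QUEUE_SLOTS) {
        strncpy(mq->msgs[mq->tail], msg, MSG_MAX - 1);
        mq->msgs[mq->tail][MSG_MAX - 1] = '\0';
        mq->tail = (mq->tail + 1) % QUEUE_SLOTS;
        mq->count++;
        ret = 0;
    }
    pthread_mutex_unlock(&mq->lock);
    return ret;
}

/* Called only from the service thread. Returns 1 if a message was popped. */
static int queue_pop(struct msg_queue *mq, char *out)
{
    int got = 0;

    pthread_mutex_lock(&mq->lock);
    if (mq->count > 0) {
        memcpy(out, mq->msgs[mq->head], MSG_MAX);
        mq->head = (mq->head + 1) % QUEUE_SLOTS;
        mq->count--;
        got = 1;
    }
    pthread_mutex_unlock(&mq->lock);
    return got;
}

/* Stand-in for a card thread: it only queues data, it never sends. */
static void *worker(void *arg)
{
    int id = *(int *)arg;
    char buf[MSG_MAX];

    for (int n = 0; n < MSGS_PER_WORKER; n++) {
        snprintf(buf, sizeof(buf), "{\"card\":%d,\"seq\":%d}", id, n);
        queue_push(&q, buf);
    }
    return NULL;
}

/* Stand-in for the service thread. Returns the number of messages "sent". */
static int run_demo(void)
{
    pthread_t tid[NUM_WORKERS];
    int ids[NUM_WORKERS];
    char msg[MSG_MAX];
    int sent = 0;

    for (int i = 0; i < NUM_WORKERS; i++) {
        ids[i] = i;
        int rc = pthread_create(&tid[i], NULL, worker, &ids[i]);
        assert(rc == 0);  /* demo only; real code should handle failure */
    }
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(tid[i], NULL);

    /* In a real program this drain would sit next to the service call in
     * the main loop, and the printf would be the websocket send. */
    while (queue_pop(&q, msg)) {
        printf("send: %s\n", msg);
        sent++;
    }
    return sent;
}
```

Calling run_demo() from main() queues two messages from each of the four
worker threads and then sends them all from a single thread, so nothing like
the deflate state is ever touched from two threads at once.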

>
> But from the backtrace, the library is using the direct path instead 
> and basically killing zlib by trying to use it two ways at the same 
> time on the same connection eventually.

Yes, I have seen backtraces in different places, but always in deflate.c

>
> --enable-nofork on configure will short broadcasts out like that (it's 
> basically saying there are no other threads), but looking at the code 
> in fact even without that it currently relies on your calling 
> libwebsockets_fork_service_loop() to set up the local broadcast sockets.
>
> I'll have a proper look at it later today and see if that can be 
> broken out or merged somewhere else, if your code will allow it you 
> might try libwebsockets_fork_service_loop() in the meanwhile.

I'm a bit worried about having 4 forked service loops with no control 
over them - I'll investigate the service loop code and see what it 
actually does and if I'm happy using it.

Many thanks for your continued co-operation!

Cheers,
Jack.

>
> -Andy
>
>> #7  0x0000ef44 in webSock_broadcastJsonObject (jsonObj=0xb5a054e0, jsonObj@entry=0x0,
>>      card=card@entry=0x213d8 <systemConf>) at webInterfaces/webInterface_webSockets.c:223
>> #8  0x0000b8a0 in XX86data_updateAll (card=card@entry=0x213d8 <systemConf>) at XX86/XX86_data.c:203
>> #9  0x0000afdc in XX86_processFPGAData (card=card@entry=0x213d8 <systemConf>) at XX86/XX86.c:143
>> #10 0x0000da0c in XX86_tickCheck (voidCard=0x213d8 <systemConf>) at XX86/XX86_init.c:118
>> #11 0x4e3c6f5c in start_thread (arg=0xb51ff470) at pthread_create.c:313
>> #12 0x4e30e0d8 in ?? () from /lib/libc.so.6
>> #13 0x4e30e0d8 in ?? () from /lib/libc.so.6
>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>>
>> I've got a hunch that I'm going to investigate but if you have any ideas...
>>
>> Cheers,
>>
>


-- 

   Jack Mitchell (jack at embed.me.uk)
   Embedded Systems Engineer
   http://www.embed.me.uk
