"Andy Green (林安廸)"
andy at warmcat.com
Fri Jan 25 03:25:19 CET 2013
On 24/01/13 23:14, the mail apparently from Jack Mitchell included:
> On 23/01/13 12:01, "Andy Green (林安廸)" wrote:
>> On 23/01/13 19:54, the mail apparently from Jack Mitchell included:
>>> On 18/01/13 23:54, Andy Green wrote:
>>>> Hi -
>>>> Is your code arranged like the test server in terms of using the "call
>>>> me back when I am writable" API when you have something to send, and
>>>> writing a single thing in the "I am writable" callback?
>>>> The mystery here is how you end up trying to do multiple things with a
>>>> dead socket; the library shouldn't be able to call you back even once
>>>> under those circumstances. However, if your code took the (wrong)
>>>> approach of storing the wsi and randomly trying to send on it, that can
>>>> easily happen.
>>>> Jack Mitchell <ml at communistcode.co.uk> wrote:
>>>> On 18/01/13 15:42, Jack Mitchell wrote:
>>>> On 18/01/13 14:04, "Andy Green (林安廸)" wrote:
>>>> On 18/01/13 21:20, the mail apparently from Jack Mitchell included:
>>>> Hi -
>>>> Today I tried out the latest libwebsockets master in
>>>> my embedded application and gave it a good thrashing.
>>>> I managed to reproduce a segfault a few times. I have
>>>> had this issue before and thought I had fixed it, but
>>>> it has reared its ugly head again in this new
>>>> release.
>>>> Hm, sorry to hear that, but I am glad to hear you are
>>>> beating on the library HEAD.
>>>> I have attached a valgrind trace below in the hope that
>>>> someone could help me out. I think it is trying to
>>>> write to a dead socket (null pointer) and bailing out.
>>>> Should there be some extra error checking somewhere to
>>>> ensure that a dead socket is never written to?
>>>> Until this week it would have been too expensive, but with
>>>> the new lookup array approach it should be possible to
>>>> cheaply confirm that the struct websocket you have hold of
>>>> still jibes with the pollfd it claims to hold, and that the fds
>>>> match. I added an API, lws_confirm_legit_wsi(), and used it
>>>> in libwebsocket_write()... if you think that's the problem,
>>>> you can sprinkle calls to it around and see if it fires.
>>>> It looks for any inconsistency between where the struct
>>>> websocket thinks its position is in the polling table and
>>>> what the polling table thinks. I wasn't really able to tie
>>>> the valgrind log up with anything that would segfault,
>>>> though. The log shows a memcpy inside deflate reading
>>>> 2 bytes it shouldn't?
>>>> -Andy
>>>> I'm going to investigate some more and will let you
>>>> know if I find a solution! <snip>
>>>> Hi Andy, I turned the DEBUG levels right up (1 | 2 | 4 | 8)
>>>> and it stopped the segfault. I would assume this means that
>>>> somewhere there is some error-checking code that gets
>>>> #ifdef'd out when debug is off? Jack.
>>>> Below is a log of me thrashing it so you can see which parts
>>>> of the code I am giving a good kicking.
>>> Hi Andy,
>>> I cannot reproduce this any more in the latest head. I will go over my
>>> websocket implementation again at some point to be sure that it's not
>>> just chance.
>> If you see it again, run it under gdb like this:
>> gdb --args libwebsockets-test-server
>> (gdb) run
>> If it blows chunks, you can use
>> (gdb) bt
>> to get a nice backtrace that will nail down where the problem is.
> Hi Andy,
> So we're back:
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0xb51ff470 (LWP 966)]
> deflate_fast (s=0x12cc28, flush=-1256198920) at deflate.c:1652
> 1652 INSERT_STRING(s, s->strstart, hash_head);
> (gdb) bt
Thanks a lot for the clear backtrace; it eliminates a lot of guesswork.
> #0 deflate_fast (s=0x12cc28, flush=-1256198920) at deflate.c:1652
> #1 0xb6d46ee4 in deflate (strm=strm@entry=0xb5170, flush=-1227591964,
> flush@entry=2) at deflate.c:901
> #2 0xb6e8505c in lws_extension_callback_deflate_frame
> (context=<optimized out>, ext=<optimized out>,
> wsi=<optimized out>, reason=<optimized out>, user=0xb5138,
> in=0xb51fcbb0, len=0) at extension-deflate-frame.c:224
> #3 0xb6e844d0 in libwebsocket_write (wsi=wsi@entry=0x12f820,
> buf=buf@entry=0xb51fcc2a
> len=<optimized out>, protocol=protocol@entry=LWS_WRITE_TEXT) at
> #4 0x0000ecf4 in webSock_genericSendRecieve (context=<error reading
> variable: value has been optimized out>,
> wsi=0x12f820, wsi@entry=<error reading variable: value has been
> optimized out>,
> reason=<error reading variable: value has been optimized out>,
> user=<error reading variable: value has been optimized out>,
> in@entry=<error reading variable: value has been optimized out>,
> len@entry=<error reading variable: value has been optimized out>)
> at webInterfaces/webInterface_webSockets.c:99
> #5 0xb6e812b8 in user_callback_handle_rxflow
> (callback_function=<optimized out>, context=context@entry=0x47000,
> wsi=0x12f820, reason=reason@entry=LWS_CALLBACK_BROADCAST,
> user=0xf9768, in=in@entry=0xb51fdc8a,
> len=len@entry=4118) at libwebsockets.c:1347
> #6 0xb6e8137c in libwebsockets_broadcast (protocol=0x21454
> <systemConf+124>, buf=0xb51fdc8a "", len=4118)
> at libwebsockets.c:2138
Right, tickCheck() is coming from another thread and randomly wanting to
send things, even while the libwebsockets service is happening in
another thread.
All lws activity must occur in a single service thread only. To
allow what you're trying to do, though, lws uses internal local sockets
to serialize broadcast requests from other threads. The broadcast
action in your tick thread should resolve to a send on these
local sockets, which the service loop handles when it gets around to it.
But from the backtrace, the library is using the direct path instead and
eventually kills zlib by trying to use it from two threads at the same
time on the same connection.
Configuring with --enable-nofork will short-circuit broadcasts like that
(it basically says there are no other threads), but looking at the code,
even without that it currently relies on your calling
libwebsockets_fork_service_loop() to set up the local broadcast sockets.
I'll have a proper look at it later today and see if that can be broken
out or merged somewhere else. If your code allows it, you might try
libwebsockets_fork_service_loop() in the meantime.
> #7 0x0000ef44 in webSock_broadcastJsonObject (jsonObj=0xb5a054e0,
> jsonObj@entry=0x0,
> card=card@entry=0x213d8 <systemConf>) at
> #8 0x0000b8a0 in XX86data_updateAll (card=card@entry=0x213d8
> <systemConf>) at XX86/XX86_data.c:203
> #9 0x0000afdc in XX86_processFPGAData (card=card@entry=0x213d8
> <systemConf>) at XX86/XX86.c:143
> #10 0x0000da0c in XX86_tickCheck (voidCard=0x213d8 <systemConf>) at
> #11 0x4e3c6f5c in start_thread (arg=0xb51ff470) at pthread_create.c:313
> #12 0x4e30e0d8 in ?? () from /lib/libc.so.6
> #13 0x4e30e0d8 in ?? () from /lib/libc.so.6
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> I've got a hunch that I'm going to investigate but if you have any ideas...