[Libwebsockets] Segfault

Jack Mitchell ml at communistcode.co.uk
Tue Jan 29 00:05:48 CET 2013


On 01/28/13 14:41, "Andy Green (林安廸)" wrote:
> On 28/01/13 22:17, the mail apparently from Jack Mitchell included:
>> On 28/01/13 10:08, Jack Mitchell wrote:
>>> On 25/01/13 22:24, "Andy Green (林安廸)" wrote:
>>>> On 25/01/13 23:04, the mail apparently from Jack Mitchell included:
>>>>> On 25/01/13 14:29, Jack Mitchell wrote:
>>>>>> On 25/01/13 13:11, "Andy Green (林安廸)" wrote:
>>>>>>> <snip>
>>>>>>>
>>>>>>> I studied it thisevening and changed this code around somewhat.
>>>>>>> Basically you should use libwebsockets_broadcast_foreign() from
>>>>>>> another thread now and it will take care of things properly without
>>>>>>> needing to use libwebsockets_fork_service_loop().  I added some
>>>>>>> documentation about it to README.coding also in this patch:
>>>>>>>
>>>>>>> http://git.libwebsockets.org/cgi-bin/cgit/libwebsockets/commit/?id=52f28ce67acd96f468d9ebbfe6f61fea5be4502b 
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> <snip>
>>>>>>>
>>>>>>
>>>>>> Hi Andy,
>>>>>>
>>>>>> Ok, so this seem to have successfully stopped the segfaulting.
>>>>>
>>>>> Apologies, I take that back. This has done something odd and it
>>>>> seems as
>>>>> though only some of the data gets through, almost like it's only
>>>>> writing
>>>>> the last piece of data? I'm not 100% as my websocket data isn't
>>>>> deterministic...
>>>>
>>>> How much data are we trying to write at once?
>>>>
>>>> It must pass through a socket to synchronize with the service loop,
>>>> you will have to check the return code of
>>>> libwebsockets_broadcast_foreign() to see if the send succeeded. If
>>>> there's a big spam it will have to wait until the service loop gets
>>>> to it.
>>>
>>> Hi Andy,
>>>
>>> I think I've gotten to the bottom of this, and it is that it only
>>> sends the broadcast to the client.
>>>
>>> All broadcasts are in thread 1 and all services are in thread 0.
>>>
>>> Serviced card socket 0
>>> Serviced card socket 0
>>> Serviced card socket 0
>>> Serviced card socket 0
>>> Serviced card socket 0
>>> Serviced card socket 0
>>> Serviced card socket 0
>>> Broadcasted:
>>> {"method":["updateRegisters"],"parameters":{"10":{"val":539},"12":{"val":15504}}} 
>>>
>>> // Sends
>>> Serviced card socket 0
>>> Broadcasted:
>>> {"method":["updateData"],"parameters":{"43":{"val":539},"45":{"val":15504}}} 
>>>
>>> //Sends
>>> Device (0) : Reading and Processing FPGA data took 15ms
>>> Serviced card socket 0
>>> Serviced card socket 0
>>> Serviced card socket 0
>>> Serviced card socket 0
>>> Serviced card socket 0
>>> Serviced card socket 0
>>> Serviced card socket 0
>>> Serviced card socket 0
>>> Serviced card socket 0
>>> Broadcasted:
>>> {"method":["updateRegisters"],"parameters":{"10":{"val":471},"12":{"val":15571},"28":{"val":87}}} 
>>>
>>> // Doesn't send
>>> Broadcasted:
>>> {"method":["updateData"],"parameters":{"43":{"val":471},"45":{"val":15571}}} 
>>>
>>> // Doesn't send
>>> Broadcasted: {"method":["updateData"],"parameters":{"47":{"val":87}}}
>>> // Sends
>>> Device (0) : Reading and Processing FPGA data took 18ms
>>> Serviced card socket 0
>>> Serviced card socket 0
>>> Serviced card socket 0
>>> Serviced card socket 0
>>> Serviced card socket 0
>>> Serviced card socket 0
>>> Serviced card socket 0
>>> Serviced card socket 0
>>> Serviced card socket 0
>>> Broadcasted:
>>> {"method":["updateRegisters"],"parameters":{"10":{"val":488},"12":{"val":15555},"28":{"val":86}}} 
>>>
>>> // Doesn't send
>>> Broadcasted:
>>> {"method":["updateData"],"parameters":{"43":{"val":488},"45":{"val":15555}}} 
>>>
>>> // Doesn't send
>>> Broadcasted: {"method":["updateData"],"parameters":{"47":{"val":86}}}
>>> // Sends
>>> Device (0) : Reading and Processing FPGA data took 18ms
>>> Serviced card socket 0
>>> Serviced card socket 0
>>> Serviced card socket 0
>>> Serviced card socket 0
>>>
>>> I'm sure you get the gist. So, the question, is it valid to call
>>> broadcast_foreign multiple times without a service?
>>>
>>> If not. Is it valid to call service() immediatly after a broadcast()
>>> to ensure all data is serviced correctly there and then.
>>>
>>> Best Regards,
>>> Jack.
>>
>> Ok, so I think I have gotten to the bottom of this. I was storing the
>> string to be sent in a variable and I was overwriting it each time.
>>
>> With broadcast it would pretty much immediately write the string, with
>> broadcast_foreign, it would queue the broadcast callback which lead to
>> sending the same string X amount of times.
>>
>> I have now implemented service() in each thread immediately after
>> broadcast(), with a mutex to stop the main thread and child thread both
>> executing at the same time, which I think caused segfaults.
>>
>> However, now I am facing a different challenge, I am getting a segfault
>> again. When you perform a write in broadcast, what should you do if it
>> fails? At the moment I have this chain of events which leads to a 
>> segfault:
>>
>> service()
>>      receive()
>>          write()
>>          broadcast()
>>              write() <-- fails (websocket dead?)
>>                  checkLegitWSI() <-- returns OK
>
> I realized the other day checkLegitWSI or whatever it is, is not 
> workable.  Because the wsi may have been free()d by then.
>
>>                  return
>>                  segfault
>>
>> Does that make sense? I know it's not very clear but I'm trying to break
>> down a fairly complex program with an involved chain of events...
>
> Hum it should be simpler than all that, you shouldn't need any mutexes 
> either.  However, by adding things people wanted over the years, it is 
> possible the api is no longer internally consistent in all cases with 
> the supposed rules (maybe broadcast is evil), in which case we'll fix 
> / trim the api.
>
> The basic idea is that you should only attempt to do operations on a 
> wsi handed to you in the user callback.  In order to get a callback 
> with the wsi, it must still be in the fd tables and not closed.  There 
> should be no question about other threads closing it, since logical 
> closing should only be something provoked by service, and that is 
> singlethreaded.
>
> That alone should eliminate the whole idea of operations on a dead 
> wsi, the callback cannot be handed a dead wsi.
>
> I didn't really get what receive() means in your sequence above, it's 
> "receive" from your FPGA stuff is it?
>
> If you saw that you have something new to distribute to interested 
> websocket parties, you can pull it into a FIFO and then call
>
> LWS_EXTERN int
> libwebsocket_callback_on_writable_all_protocol(
>                  const struct libwebsocket_protocols *protocol);
>
> which will get you a callback coming with reason 
> LWS_CALLBACK_SERVER_WRITEABLE for ALL live wsi clients that are 
> connected using that protocol.
>
> In that callback, you can do like the test server does...
>
>     case LWS_CALLBACK_SERVER_WRITEABLE:
>
>         while (pss->ringbuffer_tail != ringbuffer_head) {
>
>             n = libwebsocket_write(wsi, (unsigned char *)
>                    ringbuffer[pss->ringbuffer_tail].payload +
>                    LWS_SEND_BUFFER_PRE_PADDING,
>                    ringbuffer[pss->ringbuffer_tail].len,
>                                 LWS_WRITE_TEXT);
>
>             (... check errors etc ...)
>
>             if (lws_send_pipe_choked(wsi)) {
>                 libwebsocket_callback_on_writable(context, wsi);
>                 return 0;
>             }
>         }
>         break;
>
> ie, it pushes as much of the pending FPGA fifo that is outstanding for 
> that connection into libwebsocket_write() as it can until that socket 
> would block.  Just before then, it sets up another callback when that 
> socket is writable and bails.
>
> So you should be able to do that without broadcast, the current test 
> server code for mirror protocol will be interesting to study if I 
> understood your application... if I have the wrong idea, well, please 
> let me know.

Hi Andy,

I have bitten the bullet and earlier today I started re-implementing my 
websocket push functionallity with LWS_CALLBACK_WRITABLE_ALL and a 
circular buffer. I will let you know how it goes, but so far the initial 
building blocks are working well.

Just for clarity (and maybe interest, I don't know!), the way I use 
libwebsockets is as follows:

I have two main simultaneous ideas going one as follows:

Case 1:

1) Every second FPGA issues interrupt
2) FPGA memory is read and processed
3) Data which has changed, and is relevant to the user get pushed over 
websocket to a realtime web page

I'll come back to this, as I am interested in whether I can check the 
number of open websockets, so I may be able to skip JSON object creation 
etc... to reduce processor usage.

Case 2:

1) User visits webpage
2) User does an action which requires interaction with the application 
(e.g. change config)
3) App recieves new config change over websocket
4) App makes config change internally
5) Data recieved from the websocket contained a javascript callback function
6) javascript callback function and newly changed data is propogated to 
ALL clients of the websocket (thus keeping everyone in sync, and also 
not changing any data on the webpage locally, so it is in essence 
impossible to become out ouf sync).

What makes the whole thing more difficult is that I have multiple FPGAs 
and multiple websockets. I have one for each FPGA (as the FPGAs are 
modular, and hopefully, eventually hot pluggable) and one for the 
'control' application . Each FPGA has it's own thread in the program 
which pushes each FPGA's new data to the client, then there is also the 
thread which listens for incoming data, and then off the back of the 
recieved data from the client, broadcasts and propogates the new data 
that has changed to all the currently connected clients.

So, as you can see it's quite an intertwined web and I'm only really 
just getting into programming professionally, so I'm learning quite a 
bit along the way! However, the current web prototype is looking 
fantastic, and the realtime data aquisiiton coupled with the low 
overhead makes using it a dream compared to our old AJAX polling 
implementations.

I applogoise for being a bit of a pain, but I thank you for bearing with 
me and my questions. I have been a long time open source user and hope 
that someday soon I will become proficiant enough to also contribute and 
improve, rather than question and flounder! I hope that the above has 
given you a pleasant insight into how your library is helping my 
development, and I also hope that you're proud of the fantastic work 
you're doing here.

Best Regards,
Jack.

----------------------
Jack Mitchell
Embedded Software Engineer
http://www.embed.me.uk

>
> -Andy
>
>
>> Thanks,
>> Jack.
>>
>>>
>>>>
>>>>>> However there seems to be a small issue with the socket function as
>>>>>> Valgrind kindly points out:
>>>>>>
>>>>>> ==3475== Thread 5:
>>>>>> ==3475== Syscall param socketcall.send(msg) points to uninitialised
>>>>>> byte(s)
>>>>>> ==3475==    at 0x4FEEF374: send (in /lib/libpthread-2.16.so)
>>>>>> ==3475==    by 0x498AA0B: libwebsockets_broadcast_foreign
>>>>>> (libwebsockets.c:2186)
>>>>>> ==3475==    by 0xEF4B: webSock_broadcastJsonObject
>>>>>> (webInterface_webSockets.c:223)
>>>>>> ==3475==    by 0xAFDB: XX86_processFPGAData (XX86.c:141)
>>>>>> ==3475==    by 0xDA13: XX86_tickCheck (XX86_init.c:118)
>>>>>> ==3475==    by 0x4FEE6F5B: start_thread (pthread_create.c:313)
>>>>>> ==3475==    by 0x4FE2E0D7: ??? (in /lib/libc-2.16.so)
>>>>>> ==3475==  Address 0x6f75d42 is on thread 5's stack
>>>>>> ==3475==
>>>>>> ==3475== Thread 1:
>>>>>> ==3475== Syscall param socketcall.send(msg) points to uninitialised
>>>>>> byte(s)
>>>>>> ==3475==    at 0x4FEEF374: send (in /lib/libpthread-2.16.so)
>>>>>> ==3475==    by 0x498AA0B: libwebsockets_broadcast_foreign
>>>>>> (libwebsockets.c:2186)
>>>>>> ==3475==    by 0xEF4B: webSock_broadcastJsonObject
>>>>>> (webInterface_webSockets.c:223)
>>>>>> ==3475==    by 0xD88B: XX86socket_handleReceive (XX86_socket.c:110)
>>>>>> ==3475==    by 0xEE3B: webSock_genericSendRecieve
>>>>>> (webInterface_webSockets.c:147)
>>>>>> ==3475==    by 0x498B3E7: user_callback_handle_rxflow
>>>>>> (libwebsockets.c:1347)
>>>>>> ==3475==    by 0x498D61B: libwebsocket_rx_sm (parsers.c:968)
>>>>>> ==3475==    by 0x498D72B: libwebsocket_interpret_incoming_packet
>>>>>> (parsers.c:1037)
>>>>>> ==3475==    by 0x498D973: libwebsocket_read (handshake.c:233)
>>>>>> ==3475==    by 0x498BB2F: libwebsocket_service_fd
>>>>>> (libwebsockets.c:884)
>>>>>> ==3475==    by 0x498BC13: libwebsocket_service 
>>>>>> (libwebsockets.c:1047)
>>>>>> ==3475==    by 0xA947: main (R0005.c:108)
>>>>>> ==3475==  Address 0xbd8999ba is on thread 1's stack
>>>>
>>>> Hm this is wrong, you should not call
>>>> libwebsockets_broadcast_foreign() from your service loop.
>>>> libwebsockets_broadcast() is the one for that.  I guess this is the
>>>> reason for your data loss.
>>>>
>>>>>> I also get the following (which isn't a new problem, just one I've
>>>>>> ignored for a bit):
>>>>>>
>>>>>> ==3475== Conditional jump or move depends on uninitialised value(s)
>>>>>> ==3475==    at 0x498B7F8: libwebsocket_service_fd
>>>>>> (libwebsockets.c:716)
>>>>>> ==3475==    by 0x498BC13: libwebsocket_service 
>>>>>> (libwebsockets.c:1047)
>>>>>> ==3475==    by 0xA947: main (R0005.c:108)
>>>>>>
>>>>>> I think this small snippet is trying to service a now closed 
>>>>>> context?
>>>>>> As for the big snippet I'm not so sure about that...
>>>>
>>>> 716 is currently this
>>>>
>>>>         if (context->started_with_parent &&
>>>> kill(context->started_with_parent, 0) < 0)
>>>>
>>>> context gets memset after allocation IIRC.
>>>>
>>>> -Andy
>>>
>>>
>>
>>
>




More information about the Libwebsockets mailing list