[Libwebsockets] Segfault

"Andy Green (林安廸)" andy at warmcat.com
Mon Jan 28 15:41:14 CET 2013


On 28/01/13 22:17, the mail apparently from Jack Mitchell included:
> On 28/01/13 10:08, Jack Mitchell wrote:
>> On 25/01/13 22:24, "Andy Green (林安廸)" wrote:
>>> On 25/01/13 23:04, the mail apparently from Jack Mitchell included:
>>>> On 25/01/13 14:29, Jack Mitchell wrote:
>>>>> On 25/01/13 13:11, "Andy Green (林安廸)" wrote:
>>>>>> <snip>
>>>>>>
>>>>>> I studied it thisevening and changed this code around somewhat.
>>>>>> Basically you should use libwebsockets_broadcast_foreign() from
>>>>>> another thread now and it will take care of things properly without
>>>>>> needing to use libwebsockets_fork_service_loop().  I added some
>>>>>> documentation about it to README.coding also in this patch:
>>>>>>
>>>>>> http://git.libwebsockets.org/cgi-bin/cgit/libwebsockets/commit/?id=52f28ce67acd96f468d9ebbfe6f61fea5be4502b
>>>>>>
>>>>>>
>>>>>> <snip>
>>>>>>
>>>>>
>>>>> Hi Andy,
>>>>>
>>>>> Ok, so this seem to have successfully stopped the segfaulting.
>>>>
>>>> Apologies, I take that back. This has done something odd and it
>>>> seems as
>>>> though only some of the data gets through, almost like it's only
>>>> writing
>>>> the last piece of data? I'm not 100% as my websocket data isn't
>>>> deterministic...
>>>
>>> How much data are we trying to write at once?
>>>
>>> It must pass through a socket to synchronize with the service loop,
>>> you will have to check the return code of
>>> libwebsockets_broadcast_foreign() to see if the send succeeded. If
>>> there's a big spam it will have to wait until the service loop gets
>>> to it.
>>
>> Hi Andy,
>>
>> I think I've gotten to the bottom of this, and it is that it only
>> sends the broadcast to the client.
>>
>> All broadcasts are in thread 1 and all services are in thread 0.
>>
>> Serviced card socket 0
>> Serviced card socket 0
>> Serviced card socket 0
>> Serviced card socket 0
>> Serviced card socket 0
>> Serviced card socket 0
>> Serviced card socket 0
>> Broadcasted:
>> {"method":["updateRegisters"],"parameters":{"10":{"val":539},"12":{"val":15504}}}
>> // Sends
>> Serviced card socket 0
>> Broadcasted:
>> {"method":["updateData"],"parameters":{"43":{"val":539},"45":{"val":15504}}}
>> //Sends
>> Device (0) : Reading and Processing FPGA data took 15ms
>> Serviced card socket 0
>> Serviced card socket 0
>> Serviced card socket 0
>> Serviced card socket 0
>> Serviced card socket 0
>> Serviced card socket 0
>> Serviced card socket 0
>> Serviced card socket 0
>> Serviced card socket 0
>> Broadcasted:
>> {"method":["updateRegisters"],"parameters":{"10":{"val":471},"12":{"val":15571},"28":{"val":87}}}
>> // Doesn't send
>> Broadcasted:
>> {"method":["updateData"],"parameters":{"43":{"val":471},"45":{"val":15571}}}
>> // Doesn't send
>> Broadcasted: {"method":["updateData"],"parameters":{"47":{"val":87}}}
>> // Sends
>> Device (0) : Reading and Processing FPGA data took 18ms
>> Serviced card socket 0
>> Serviced card socket 0
>> Serviced card socket 0
>> Serviced card socket 0
>> Serviced card socket 0
>> Serviced card socket 0
>> Serviced card socket 0
>> Serviced card socket 0
>> Serviced card socket 0
>> Broadcasted:
>> {"method":["updateRegisters"],"parameters":{"10":{"val":488},"12":{"val":15555},"28":{"val":86}}}
>> // Doesn't send
>> Broadcasted:
>> {"method":["updateData"],"parameters":{"43":{"val":488},"45":{"val":15555}}}
>> // Doesn't send
>> Broadcasted: {"method":["updateData"],"parameters":{"47":{"val":86}}}
>> // Sends
>> Device (0) : Reading and Processing FPGA data took 18ms
>> Serviced card socket 0
>> Serviced card socket 0
>> Serviced card socket 0
>> Serviced card socket 0
>>
>> I'm sure you get the gist. So, the question, is it valid to call
>> broadcast_foreign multiple times without a service?
>>
>> If not. Is it valid to call service() immediatly after a broadcast()
>> to ensure all data is serviced correctly there and then.
>>
>> Best Regards,
>> Jack.
>
> Ok, so I think I have gotten to the bottom of this. I was storing the
> string to be sent in a variable and I was overwriting it each time.
>
> With broadcast it would pretty much immediately write the string, with
> broadcast_foreign, it would queue the broadcast callback which lead to
> sending the same string X amount of times.
>
> I have now implemented service() in each thread immediately after
> broadcast(), with a mutex to stop the main thread and child thread both
> executing at the same time, which I think caused segfaults.
>
> However, now I am facing a different challenge, I am getting a segfault
> again. When you perform a write in broadcast, what should you do if it
> fails? At the moment I have this chain of events which leads to a segfault:
>
> service()
>      receive()
>          write()
>          broadcast()
>              write() <-- fails (websocket dead?)
>                  checkLegitWSI() <-- returns OK

I realized the other day checkLegitWSI or whatever it is, is not 
workable.  Because the wsi may have been free()d by then.

>                  return
>                  segfault
>
> Does that make sense? I know it's not very clear but I'm trying to break
> down a fairly complex program with an involved chain of events...

Hum it should be simpler than all that, you shouldn't need any mutexes 
either.  However, by adding things people wanted over the years, it is 
possible the api is no longer internally consistent in all cases with 
the supposed rules (maybe broadcast is evil), in which case we'll fix / 
trim the api.

The basic idea is that you should only attempt to do operations on a wsi 
handed to you in the user callback.  In order to get a callback with the 
wsi, it must still be in the fd tables and not closed.  There should be 
no question about other threads closing it, since logical closing should 
only be something provoked by service, and that is singlethreaded.

That alone should eliminate the whole idea of operations on a dead wsi, 
the callback cannot be handed a dead wsi.

I didn't really get what receive() means in your sequence above, it's 
"receive" from your FPGA stuff is it?

If you saw that you have something new to distribute to interested 
websocket parties, you can pull it into a FIFO and then call

LWS_EXTERN int
libwebsocket_callback_on_writable_all_protocol(
				 const struct libwebsocket_protocols *protocol);

which will get you a callback coming with reason 
LWS_CALLBACK_SERVER_WRITEABLE for ALL live wsi clients that are 
connected using that protocol.

In that callback, you can do like the test server does...

	case LWS_CALLBACK_SERVER_WRITEABLE:

		while (pss->ringbuffer_tail != ringbuffer_head) {

			n = libwebsocket_write(wsi, (unsigned char *)
				   ringbuffer[pss->ringbuffer_tail].payload +
				   LWS_SEND_BUFFER_PRE_PADDING,
				   ringbuffer[pss->ringbuffer_tail].len,
								LWS_WRITE_TEXT);

			(... check errors etc ...)

			if (lws_send_pipe_choked(wsi)) {
				libwebsocket_callback_on_writable(context, wsi);
				return 0;
			}
		}
		break;

ie, it pushes as much of the pending FPGA fifo that is outstanding for 
that connection into libwebsocket_write() as it can until that socket 
would block.  Just before then, it sets up another callback when that 
socket is writable and bails.

So you should be able to do that without broadcast, the current test 
server code for mirror protocol will be interesting to study if I 
understood your application... if I have the wrong idea, well, please 
let me know.

-Andy


> Thanks,
> Jack.
>
>>
>>>
>>>>> However there seems to be a small issue with the socket function as
>>>>> Valgrind kindly points out:
>>>>>
>>>>> ==3475== Thread 5:
>>>>> ==3475== Syscall param socketcall.send(msg) points to uninitialised
>>>>> byte(s)
>>>>> ==3475==    at 0x4FEEF374: send (in /lib/libpthread-2.16.so)
>>>>> ==3475==    by 0x498AA0B: libwebsockets_broadcast_foreign
>>>>> (libwebsockets.c:2186)
>>>>> ==3475==    by 0xEF4B: webSock_broadcastJsonObject
>>>>> (webInterface_webSockets.c:223)
>>>>> ==3475==    by 0xAFDB: XX86_processFPGAData (XX86.c:141)
>>>>> ==3475==    by 0xDA13: XX86_tickCheck (XX86_init.c:118)
>>>>> ==3475==    by 0x4FEE6F5B: start_thread (pthread_create.c:313)
>>>>> ==3475==    by 0x4FE2E0D7: ??? (in /lib/libc-2.16.so)
>>>>> ==3475==  Address 0x6f75d42 is on thread 5's stack
>>>>> ==3475==
>>>>> ==3475== Thread 1:
>>>>> ==3475== Syscall param socketcall.send(msg) points to uninitialised
>>>>> byte(s)
>>>>> ==3475==    at 0x4FEEF374: send (in /lib/libpthread-2.16.so)
>>>>> ==3475==    by 0x498AA0B: libwebsockets_broadcast_foreign
>>>>> (libwebsockets.c:2186)
>>>>> ==3475==    by 0xEF4B: webSock_broadcastJsonObject
>>>>> (webInterface_webSockets.c:223)
>>>>> ==3475==    by 0xD88B: XX86socket_handleReceive (XX86_socket.c:110)
>>>>> ==3475==    by 0xEE3B: webSock_genericSendRecieve
>>>>> (webInterface_webSockets.c:147)
>>>>> ==3475==    by 0x498B3E7: user_callback_handle_rxflow
>>>>> (libwebsockets.c:1347)
>>>>> ==3475==    by 0x498D61B: libwebsocket_rx_sm (parsers.c:968)
>>>>> ==3475==    by 0x498D72B: libwebsocket_interpret_incoming_packet
>>>>> (parsers.c:1037)
>>>>> ==3475==    by 0x498D973: libwebsocket_read (handshake.c:233)
>>>>> ==3475==    by 0x498BB2F: libwebsocket_service_fd
>>>>> (libwebsockets.c:884)
>>>>> ==3475==    by 0x498BC13: libwebsocket_service (libwebsockets.c:1047)
>>>>> ==3475==    by 0xA947: main (R0005.c:108)
>>>>> ==3475==  Address 0xbd8999ba is on thread 1's stack
>>>
>>> Hm this is wrong, you should not call
>>> libwebsockets_broadcast_foreign() from your service loop.
>>> libwebsockets_broadcast() is the one for that.  I guess this is the
>>> reason for your data loss.
>>>
>>>>> I also get the following (which isn't a new problem, just one I've
>>>>> ignored for a bit):
>>>>>
>>>>> ==3475== Conditional jump or move depends on uninitialised value(s)
>>>>> ==3475==    at 0x498B7F8: libwebsocket_service_fd
>>>>> (libwebsockets.c:716)
>>>>> ==3475==    by 0x498BC13: libwebsocket_service (libwebsockets.c:1047)
>>>>> ==3475==    by 0xA947: main (R0005.c:108)
>>>>>
>>>>> I think this small snippet is trying to service a now closed context?
>>>>> As for the big snippet I'm not so sure about that...
>>>
>>> 716 is currently this
>>>
>>>         if (context->started_with_parent &&
>>> kill(context->started_with_parent, 0) < 0)
>>>
>>> context gets memset after allocation IIRC.
>>>
>>> -Andy
>>
>>
>
>




More information about the Libwebsockets mailing list