[Libwebsockets] multithreaded client and ring buffer question

Andy Green andy at warmcat.com
Tue Jan 15 21:17:04 CET 2019



On January 16, 2019 4:22:43 AM GMT+09:00, Dave Horton <daveh at beachdognet.com> wrote:
>I think I'm not understanding something about lws_cancel_service, because
>it is not working as I expected.
>Based on this thread, I expected that when I call lws_cancel_service, I'd
>get a LWS_CALLBACK_EVENT_WAIT_CANCELLED for every wsi that I had.
>In my case, I have two wsi, and only one is getting
>a LWS_CALLBACK_EVENT_WAIT_CANCELLED callback.

That's not what I wrote though.

>What I've done:
>- created a single context and then loop calling lws_service in one thread

The best plan is to look at and / or play with a related minimal example, then duplicate it and modify it.

If there are still open questions, a quick git grep will usually shed some light.  Eg

	LWS_CALLBACK_EVENT_WAIT_CANCELLED = 71,
	/**< This is sent to every protocol of every vhost in response
	 * to lws_cancel_service() or lws_cancel_service_pt().  This
	 * callback is serialized in the lws event loop normally, even
	 * if the lws_cancel_service[_pt]() call was from a different
	 * thread. */

https://libwebsockets.org/git/libwebsockets/tree/include/libwebsockets/lws-callbacks.h#n774
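
For reference, the wakeup mechanism described earlier in this thread (the foreign thread writes a byte into a pipe the event loop is waiting on, and everything queued is drained into a single event) can be sketched with plain POSIX calls.  This is not the lws implementation, only a model of the idea; wake_init(), wake_signal() and wake_wait() are hypothetical names standing in for what lws_cancel_service() and the service loop do internally.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical stand-in for the wakeup pipe lws sets up by itself. */
static int wake_fd[2];
static int wake_ready;

/* Create the pipe and make both ends non-blocking. */
static int wake_init(void)
{
	if (pipe(wake_fd))
		return -1;
	for (int i = 0; i < 2; i++) {
		int fl = fcntl(wake_fd[i], F_GETFL);

		if (fl < 0 || fcntl(wake_fd[i], F_SETFL, fl | O_NONBLOCK) < 0)
			return -1;
	}
	wake_ready = 1;
	return 0;
}

/* Foreign-thread side: one byte into the pipe, no locking required. */
static void wake_signal(void)
{
	char c = 0;
	ssize_t n;

	assert(wake_ready);
	n = write(wake_fd[1], &c, 1);
	(void)n; /* if the pipe is full, a wakeup is already pending */
}

/*
 * Service-thread side: wait up to timeout_ms, then discard everything
 * queued.  Returns 1 for "cancelled" (one event, however many bytes
 * were written), 0 on timeout, -1 on poll() error.
 */
static int wake_wait(int timeout_ms)
{
	struct pollfd pfd = { .fd = wake_fd[0], .events = POLLIN };
	char buf[64];
	int n;

	assert(wake_ready);
	n = poll(&pfd, 1, timeout_ms);
	if (n <= 0)
		return n;
	while (read(wake_fd[0], buf, sizeof(buf)) > 0)
		;
	return 1;
}
```

Several wake_signal() calls made before the service thread gets round to wake_wait() collapse into one return of 1, which is why the handler has to work out for itself what changed.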

>- I observe that this creates an initial wsi, what I think you referred to
>as a 'fake' wsi.

Yeah.  Because the callback is triggered once per vhost-protocol, it has no relation to any real wsi.
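
A toy model of that fan-out, assuming nothing about lws internals: the broadcast walks vhosts and their protocols and never looks at how many connections exist.  struct model_vhost, broadcast_wait_cancelled() and on_cancelled() are all hypothetical names made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PROTOCOLS 4

/* Hypothetical callback type: receives only the vhost and protocol names. */
typedef int (*proto_cb)(const char *vhost, const char *protocol);

/* Hypothetical, heavily simplified stand-in for an lws vhost. */
struct model_vhost {
	const char *name;
	const char *protocols[MAX_PROTOCOLS]; /* NULL-terminated */
	int live_connections;                 /* ignored by the broadcast */
};

/* Sample callback: just counts how often it was invoked. */
static int seen;

static int on_cancelled(const char *vhost, const char *protocol)
{
	(void)vhost;
	(void)protocol;
	seen++;
	return 0;
}

/*
 * One callback per vhost-protocol pair.  The number of connections on
 * a vhost has no influence on the number of callbacks made.
 */
static int broadcast_wait_cancelled(const struct model_vhost *vh,
				    size_t n_vh, proto_cb cb)
{
	int calls = 0;

	assert(cb != NULL);
	for (size_t v = 0; v < n_vh; v++)
		for (size_t p = 0; p < MAX_PROTOCOLS && vh[v].protocols[p]; p++) {
			cb(vh[v].name, vh[v].protocols[p]);
			calls++;
		}
	return calls;
}
```

With one vhost ("default") and one protocol, the model makes exactly one callback whether there are two connections or a million, which matches what the logs in the question show.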

>- some time later, another thread wants to initiate a connection - it adds
>connection info to a list and calls lws_cancel_service
>- I get a LWS_CALLBACK_EVENT_WAIT_CANCELLED callback for my "fake wsi"
>- it checks the list, sees there is a desire for a new connection and
>launches it (lws_client_connect_via_info)
>- I get a LWS_CALLBACK_CLIENT_ESTABLISHED with a new wsi for my new client
>connection. So now I have two wsi.
>- some time later, another thread wants to cause this connection to be
>dropped from the near end
>- it updates some state information in the per-session (user) data pointer,
>and calls lws_cancel_service
>- I get a LWS_CALLBACK_EVENT_WAIT_CANCELLED for my fake wsi, BUT
>- I DO NOT get a LWS_CALLBACK_EVENT_WAIT_CANCELLED for the other wsi - the
>one I am trying to drop
>
>(Note: had I gotten a callback for my "real" wsi, I would have returned -1
>from the event loop, as I understand that would cause the connection to be
>terminated).
>
>I thought calling lws_cancel_service resulted in receiving
>LWS_CALLBACK_EVENT_WAIT_CANCELLED for all my wsi s?

You thought wrong... imagine a server with 1M connections and the ringbuffer for one of them changed... it'd be dumb to sit there calling back on 1M - 1 wsi telling them someone else has something to do.  So the callback is called one time to tell it "something happened".  The handler can take your mutex once and figure out itself which connections need what (in the 1M connection scale you'd want a linked-list of modified ringbuffers to save you iterating through them all).
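
A minimal sketch of that "list of modified connections" idea, using pthreads only.  struct conn, conn_mark_dirty(), conn_drain_dirty() and request_writable() are hypothetical names for illustration; in real code the foreign thread would follow conn_mark_dirty() with lws_cancel_service(context), and the visit function would call lws_callback_on_writable() on the wsi belonging to that connection.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-connection state; real code would also keep the wsi. */
struct conn {
	int id;
	int pending;             /* items queued by foreign threads */
	int on_dirty_list;       /* stops the same conn being linked twice */
	struct conn *next_dirty;
};

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static struct conn *dirty_head;

/* Foreign thread: note that this connection has new work. */
static void conn_mark_dirty(struct conn *c)
{
	pthread_mutex_lock(&lock);
	c->pending++;
	if (!c->on_dirty_list) {
		c->on_dirty_list = 1;
		c->next_dirty = dirty_head;
		dirty_head = c;
	}
	pthread_mutex_unlock(&lock);
}

/* Sample visit function, standing in for lws_callback_on_writable(). */
static int writable_requests;

static void request_writable(struct conn *c)
{
	(void)c;
	writable_requests++;
}

/*
 * Service thread, called once per cancel event: take the mutex once,
 * visit only the connections that changed, and empty the list.
 * visit() must not call conn_mark_dirty(), since the mutex is held.
 * Returns the number of connections visited.
 */
static int conn_drain_dirty(void (*visit)(struct conn *))
{
	struct conn *c, *next;
	int n = 0;

	assert(visit != NULL);
	pthread_mutex_lock(&lock);
	for (c = dirty_head; c; c = next) {
		next = c->next_dirty;
		c->next_dirty = NULL;
		c->on_dirty_list = 0;
		visit(c);
		n++;
	}
	dirty_head = NULL;
	pthread_mutex_unlock(&lock);
	return n;
}
```

The cost of handling one cancel event then scales with the number of connections that actually changed, not with the total number of connections.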

If you studied the related minimal examples you would have seen how it worked, eg

	case LWS_CALLBACK_EVENT_WAIT_CANCELLED:
		if (!vhd)
			break;
		/*
		 * When the "spam" threads add a message to the ringbuffer,
		 * they create this event in the lws service thread context
		 * using lws_cancel_service().
		 *
		 * We respond by scheduling a writable callback for all
		 * connected clients.
		 */
...

https://libwebsockets.org/git/libwebsockets/tree/minimal-examples/ws-server/minimal-ws-server-threads/protocol_lws_minimal.c#n256
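
Applied to the "drop this connection from another thread" case in the question, the same pattern might look roughly like the model below.  It is a self-contained simulation, not lws code: struct session, request_close(), model_callback() and model_service_once() are hypothetical, and the comments mark where the real lws calls would go.  The vhost-wide event only schedules a writable callback; the close itself happens by returning nonzero from the callback that belongs to the connection.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

enum model_reason { MODEL_WAIT_CANCELLED, MODEL_WRITEABLE };

/* Hypothetical per-session data. */
struct session {
	atomic_int close_requested; /* set by a foreign thread */
	int want_writable;          /* touched by the service thread only */
	int open;
};

#define N_SESSIONS 3
static struct session sessions[N_SESSIONS];

/* Foreign thread: flag the session; real code would then call
 * lws_cancel_service(context). */
static void request_close(struct session *s)
{
	atomic_store(&s->close_requested, 1);
}

/*
 * Model protocol callback.  For the vhost-wide event s is NULL here
 * for brevity; lws itself passes a placeholder wsi, never NULL.
 */
static int model_callback(struct session *s, enum model_reason reason)
{
	switch (reason) {
	case MODEL_WAIT_CANCELLED:
		/* Called once: find out which sessions need attention. */
		for (int i = 0; i < N_SESSIONS; i++)
			if (sessions[i].open &&
			    atomic_load(&sessions[i].close_requested))
				/* real code: lws_callback_on_writable(wsi) */
				sessions[i].want_writable = 1;
		break;
	case MODEL_WRITEABLE:
		assert(s != NULL);
		if (atomic_load(&s->close_requested))
			return -1; /* nonzero closes this connection */
		break;
	}
	return 0;
}

/* One pass of a model service loop; returns how many sessions closed. */
static int model_service_once(void)
{
	int closed = 0;

	model_callback(NULL, MODEL_WAIT_CANCELLED);
	for (int i = 0; i < N_SESSIONS; i++) {
		struct session *s = &sessions[i];

		if (!s->open || !s->want_writable)
			continue;
		s->want_writable = 0;
		if (model_callback(s, MODEL_WRITEABLE)) {
			s->open = 0;
			closed++;
		}
	}
	return closed;
}
```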

>What am I doing wrong?

Spend a little time with the minimal examples.

>I am assuming, for one thing, that there is a single context shared by all
>wsi, and that I can simply keep that in a static variable and call
>lws_cancel_service on that, but am I wrong -- is there a different context
>per wsi or something?
>
>I have posted a gist with the detailed debug logs at
>https://gist.github.com/davehorton/e0df812242c816c92f6f13013234dde0

-Andy

>
>On Sun, Jan 13, 2019 at 9:51 PM Andy Green <andy at warmcat.com> wrote:
>
>>
>>
>> On 14/01/2019 10:04, Dave Horton wrote:
>> > Thanks so much for the help on this - it's much appreciated.
>> >
>> > I'm starting to build my client application now (slowly), and I
>want to
>> > make sure I understand a few things about what a "wsi" represents,
>and
>> > what a vhost is.
>>
>> > So far, I have a fairly simple client application that creates a context
>> > with a protocol and starts the event loop.
>> >
>> > My protocols structure looks like this:
>> >
>> >    static const struct lws_protocols protocols[] = {
>> >      {
>> >        "audiostream.drachtio.org",
>> >        lws_callback,
>> >        0,
>> >        0,
>> >      },
>> >      { NULL, NULL, 0, 0 }
>> >    };
>> >
>> > then I am simply calling 'lws_create_context' and then looping calling
>> > 'lws_service'.
>>
>> I think this is what you are doing already but as a general remark
>the
>> best way to get started is confirm a minimal example builds and works
>as
>> far as it goes, and then start modifying it.  Because that way you
>start
>> with something that "works" and as you change it, you can easily
>detect
>> if it "stopped working" as soon as that happens and back up.
>>
>> In a sense all you need to do is "not break it" as you change and
>> increase what it does.  This sounds like a facile observation but
>> actually it's a really good way to get straight to a stable result.
>>
>> The examples are CC0 (== Public Domain) to help people take this
>route.
>>
>> > I am simply logging out the callbacks that are then invoked (not
>yet
>> > doing anything with them), and I see that the callback is invoked
>with
>> > either of two wsi pointers:
>> > one has no vhost associated, and the other has a vhost of
>'default'.
>> >
>> > I'm unclear why there are two vhosts, which each represents, and
>whether
>> > I need to be doing anything in the callback with the null vhost.
>>
>> The callback is used both for callbacks specific to a wsi (==
>specific
>> to an fd / socket fd / "connection") and for vhostwide or systemwide
>> events.  In the case that the callback is not really associated with
>a
>> specific wsi, a fake wsi is provided just for the callback.  This is
>> done because some user callback implementations always, eg, get the
>> lws_context from the wsi... with context = lws_get_context(wsi)... if
>> the wsi was NULL, they will fault.  So the wsi is never NULL.
>>
>> > When the service loop starts (and before I attempt any connections) I
>> > get these 3 callbacks:
>> >
>> > lws_callback wsi: 0x7fffd79758b0, vhost (null), reason:
>> > LWS_CALLBACK_GET_THREAD_ID
>> > lws_callback wsi: 0x7fffd7975570, vhost: default, reason:
>> > LWS_CALLBACK_EVENT_WAIT_CANCELLED
>> > lws_callback wsi: 0x7fffd7975440, vhost: default, reason:
>> > LWS_CALLBACK_PROTOCOL_INIT
>> >
>> > What is the purpose of the callback in the wsi with null vhost?  In
>>
>> The meaning of the callback is not related to any vhost.
>>
>> It's asking the callback to return the tid of the service thread.  This
>> is used to understand if callback_on_writable() is being called from a
>> foreign thread.
>>
>> It's also legitimate to have a wsi that is not bound to any vhost,
>and
>> so actually has a NULL vhost.
>>
>> > general, am I expected to handle these callbacks when vhost=NULL in
>any
>> > particular fashion?
>> > ****
>>
>> If you have a single thread you don't need to deal with it.
>>
>> $ git grep LWS_CALLBACK_GET_THREAD_ID
>>
>> is your friend.  It will quickly lead you to this
>>
>> LWS_EXTERN int
>> _lws_plat_service_tsi(struct lws_context *context, int timeout_ms, int tsi)
>> {
>> ...
>>
>>         if (!pt->service_tid_detected) {
>>                 struct lws _lws;
>>
>>                 memset(&_lws, 0, sizeof(_lws));
>>                 _lws.context = context;
>>
>>                 pt->service_tid  =
>>                         context->vhost_list->protocols[0].callback(
>>                         &_lws, LWS_CALLBACK_GET_THREAD_ID, NULL, NULL, 0);
>>                 pt->service_tid_detected = 1;
>>         }
>>
>> a) it's quicker to get a hint that way for you and b) it scales much
>> better than asking me.
>>
>> -Andy
>>
>> > On Sat, Jan 12, 2019 at 1:33 PM Andy Green <andy at warmcat.com
>> > <mailto:andy at warmcat.com>> wrote:
>> >
>> >
>> >
>> >     On January 12, 2019 11:53:37 PM GMT+08:00, Dave Horton
>> >     <daveh at beachdognet.com <mailto:daveh at beachdognet.com>> wrote:
>> >      >Trying to understand suggested use of
>> >      >LWS_CALLBACK_EVENT_WAIT_CANCELLED....
>> >      >
>> >      >I notice in minimal-ws-client.c, where you have multiple spam
>> threads
>> >      >creating content, the foreign/spam threads
>> >      >call LWS_CALLBACK_EVENT_WAIT_CANCELLED,
>> >      >and then in the service loop the handler calls
>> >      >'lws_callback_on_writable'.
>> >      >
>> >      >Could not the foreign thread(s) have called
>> lws_callback_on_writable
>> >      >directly?  I want to make sure I follow the best / suggested
>> practice.
>> >
>> >     lws_cancel_service() is safe against reentrancy, it writes to a
>pipe
>> >     and job done.  So 2 or more threads can call it without
>locking.
>> >
>> >     on_writable() calls through to the event loop library to change
>> >     POLLOUT waits.  Although the lws part can handle a single
>foreign
>> >     thread calling it, libuv or whatever can't.
>> >
>> >     On busy systems, cancel_service() is not as expensive as it
>sounds.
>> >     It is only checked in the service thread on exit from the event
>loop
>> >     wait.
>> >
>> >     -Andy
>> >
>> >      >On Fri, Jan 11, 2019 at 10:19 PM Andy Green <andy at warmcat.com
>> >     <mailto:andy at warmcat.com>> wrote:
>> >      >
>> >      >>
>> >      >>
>> >      >> On 12/01/2019 09:50, Dave Horton wrote:
>> >      >> > Hi - I’m a bit of a newbie, looking forward to using this
>> library
>> >      >to
>> >      >> build a high-performance multithreaded websocket client.
>> >      >>
>> >      >> It seems you understood the main point, which is lws runs
>in a
>> >     single
>> >      >> thread with an event loop.
>> >      >>
>> >      >>    My program will need to establish multiple connections
>to many
>> >      >> different far-end web servers, and send a large amount of
>near
>> >      >real-time
>> >      >> data over those connections.
>> >      >>
>> >      >> Multiple clients is no problem (in one event loop /
>thread).
>> >      >>
>> >      >> >  From reading the README docs and examples (thanks!) I
>think I
>> >     have
>> >      >a
>> >      >> basic idea of what I need to do (but interested in key
>things I
>> am
>> >      >missing):
>> >      >> > - While my program will have many threads that want to
>send, I
>> >     will
>> >      >have
>> >      >> one wsi service thread running my callback, and just call
>> >      >> ‘lws_callback_on_writable’ from the other threads
>> >      >> > - I will use ringbuffers to stash the data waiting to go
>out
>> (e.g
>> >      >> waiting to get a LWS_CALLBACK_CLIENT_WRITEABLE event).
>> >      >>
>> >      >> Sounds good.
>> >      >>
>> >      >> > My first question is whether there is any documentation
>on the
>> >      >> ringbuffer API that I could study?  It looks like there are
>a
>> bunch
>> >      >of
>> >      >> useful functions, but if there is a good overview doc on
>them it
>> >      >would be
>> >      >> helpful.
>> >      >>
>> >      >> Not really... there's some docs in the header, and
>> >     "documentation" in
>> >      >> the form of the related examples, eg
>> >      >>
>> >      >>
>> >      >>
>> >      >
>>
>https://libwebsockets.org/git/libwebsockets/tree/minimal-examples/ws-server/minimal-ws-server-ring/protocol_lws_minimal.c
>> >      >>
>> >      >>
>> >      >
>>
>https://libwebsockets.org/git/libwebsockets/tree/minimal-examples/client-server/minimal-ws-proxy/protocol_lws_minimal.c
>> >      >>
>> >      >>
>> >      >
>>
>https://libwebsockets.org/git/libwebsockets/tree/minimal-examples/dbus-server/minimal-dbus-ws-proxy/protocol_lws_minimal_dbus_ws_proxy.c
>> >      >>
>> >      >>
>> >      >
>>
>https://libwebsockets.org/git/libwebsockets/tree/minimal-examples/http-server/minimal-http-server-sse-ring/minimal-http-server-sse-ring.c
>> >      >>
>> >      >>
>> >      >
>>
>https://libwebsockets.org/git/libwebsockets/tree/minimal-examples/ws-client/minimal-ws-client-echo/protocol_lws_minimal_client_echo.c
>> >      >>
>> >      >>
>> >      >
>>
>https://libwebsockets.org/git/libwebsockets/tree/minimal-examples/ws-client/minimal-ws-client-tx/minimal-ws-client.c
>> >      >>
>> >      >>
>> >      >
>>
>https://libwebsockets.org/git/libwebsockets/tree/minimal-examples/ws-server/minimal-ws-broker/minimal-ws-broker.c
>> >      >>
>> >      >>
>> >      >
>>
>https://libwebsockets.org/git/libwebsockets/tree/minimal-examples/ws-server/minimal-ws-server-echo
>> >      >>
>> >      >>
>> >      >
>>
>https://libwebsockets.org/git/libwebsockets/tree/plugins/raw-proxy/protocol_lws_raw_proxy.c
>> >      >>
>> >      >>  > Most of the ws client examples illustrate a client that
>> >      >initializes a
>> >      >> context and then creates a client connection right away.
>> >      >>  >
>> >      >>  > My case is slightly different — at startup I need to
>create
>> the
>> >      >> context and then poll in the service thread, and then some
>time
>> >     later
>> >      >a
>> >      >> foreign thread needs to connect to a remote endpoint.
>> >      >>  >
>> >      >>  > I know how a foreign thread can call
>> >     lws_callback_on_writable
>> >      >> when it has data to write on an existing connection, but
>how can
>> it
>> >      >> signal the service thread so as to cause a new client
>connection
>> >      >> entirely to be made?
>> >      >>
>> >      >> The best tool for thread synchronization is
>lws_cancel_service().
>> >      >He's
>> >      >> very robust (the foreign thread simply adds a byte into a
>pipe)
>> and
>> >      >lws
>> >      >> has automatically both created the pipe and set it up that
>any
>> >      >incoming
>> >      >> data on the lws end of it "causes an event" in the event
>loop.
>> If
>> >      >the
>> >      >> "event loop" is poll(), it means the poll wait is
>immediately
>> >     stopped
>> >      >> and lws will broadcast LWS_CALLBACK_EVENT_WAIT_CANCELLED to
>every
>> >      >> protocol on every vhost.  In the case multiple threads
>called it
>> >      >before
>> >      >> we can respond, lws reads and discards all the pipe
>content, so
>> it
>> >      >only
>> >      >> creates one cancel event instead of spamming n uselessly.
>> >      >>
>> >      >> This is the recommended method to do any thread sync,
>including
>> the
>> >      >> "there's something to write in a ringbuffer".  In your code
>> handling
>> >      >> LWS_CALLBACK_EVENT_WAIT_CANCELLED, you can take your mutex
>> >     protecting
>> >      >> your shared structs / ringbuffers and find out what needs
>> attention,
>> >      >> calling on_writable() from the lws context on affected wsi.
>> >      >>
>> >      >> > Any other pointers or guidance welcome..
>> >      >>
>> >      >> You don't have to use lws_ring, you can use some other
>ringbuffer
>> >      >> abstraction.  But I actually use lws_ring in difficult
>cases
>> >     like the
>> >      >> mirror protocol where it handles important but non-obvious
>> situations
>> >      >like
>> >      >> rx flow control management.
>> >      >>
>> >      >>
>> >      >>
>> >      >
>>
>https://libwebsockets.org/git/libwebsockets/tree/plugins/protocol_lws_mirror.c
>> >      >>
>> >      >> ... so it may be worth the effort to get it working based
>on the
>> >      >examples.
>> >      >>
>> >      >> -Andy
>> >      >>
>> >      >> > Dave
>> >      >> >
>> >      >> >
>> >      >>
>> >
>>

