[Libwebsockets] performance client vs server

Andy Green andy at warmcat.com
Mon Nov 7 04:58:19 CET 2016


On Mon, 2016-11-07 at 14:40 +1100, jb wrote:
> Hi Andy,
> I managed to get it about 6x faster by inserting this
> chunk of code into lws_handshake_client.
> 
> Sorry I'm not too good with diffs or pull requests yet ..

Assuming lws came to you via git clone, you can just do

$ git diff

to get a diff of your changes.  Stick them in a file like

$ git diff > mydiff

and sending that file is enough, assuming it's not swamped by unrelated
changes.

Note lws is LGPL... the easiest way to deal with that is to contribute
your patch.
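
(If you're comfortable committing locally, a slightly tidier way to send a
change is a format-patch; roughly like this -- the file path and commit
message here are just placeholders:

$ git add lib/client.c
$ git commit -m "client: bulk-copy unmasked payload in rx state machine"
$ git format-patch -1 HEAD

and attach the resulting 0001-*.patch file.  But a plain diff is fine too.)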

> However you can see from the screen shot that if one
> gulps payload when the parser is in its most common state
> then there is massive performance improvement for 
> extended runs of large payloads that are not masked.

Yes this isn't news... the code I pointed to did the same thing to the
very similar code on the server side.

> Before: Server 10% client 100% - throughput maybe 1.5 gigabit
> After: Server 50% client 100% - throughput about 6-7 gigabit.
> 
> The new code is in braces, in lws_handshake_client.
> This stops most of the character-by-character calling
> of lws_client_rx_sm,
> which has local variables and a lot of popping and pushing

Local vars don't cost anything per se in C; the stack frame is adjusted
once on entry and once on exit.  However, we know from the server-side
change that avoiding the bytewise calls, the way that patch did, is much
more efficient.
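
To make the shape of that optimization concrete, here's a rough sketch of
the idea -- this is not the actual lws parser, and the struct, state names
and deliver() callback below are invented for illustration.  The point is:
while the state machine is sitting in its payload state and the frame is
unmasked, hand the whole available run to the user in one go, and only fall
back to the per-byte state machine for headers and the end of the frame.

/*
 * Sketch only -- not the actual lws code.  Types, state names and the
 * deliver() callback are invented to illustrate the "gulp" idea.
 */
#include <stdint.h>
#include <stddef.h>

enum rx_state { RXS_HEADER, RXS_PAYLOAD };

struct rx_ctx {
	enum rx_state state;
	size_t payload_left;  /* payload bytes still expected in this frame */
	int masked;           /* server->client frames are not masked */
	/* where decoded payload goes -- stands in for the user callback */
	int (*deliver)(void *user, const uint8_t *p, size_t n);
	void *user;
};

/*
 * Slow path, one byte at a time -- stands in for lws_client_rx_sm().
 * Real code parses frame headers, applies the mask, and handles
 * fragmentation and control frames; all of that is elided here.
 */
static int
rx_sm(struct rx_ctx *ctx, uint8_t c)
{
	if (ctx->state != RXS_PAYLOAD || !ctx->payload_left)
		return 0;  /* header parsing elided in this sketch */

	if (ctx->deliver(ctx->user, &c, 1))  /* would unmask first if masked */
		return -1;
	if (!--ctx->payload_left)
		ctx->state = RXS_HEADER;  /* frame complete */

	return 0;
}

static int
handle_rx(struct rx_ctx *ctx, const uint8_t *in, size_t len)
{
	while (len) {
		if (ctx->state == RXS_PAYLOAD && !ctx->masked &&
		    ctx->payload_left > 1) {
			/*
			 * Fast path: deliver everything except the last byte
			 * of the frame in one call, so the state machine
			 * still sees the end of the frame and does its usual
			 * bookkeeping.
			 */
			size_t n = ctx->payload_left - 1;

			if (n > len)
				n = len;
			if (ctx->deliver(ctx->user, in, n))
				return -1;
			ctx->payload_left -= n;
			in += n;
			len -= n;
			continue;
		}

		if (rx_sm(ctx, *in++))
			return -1;
		len--;
	}

	return 0;
}

Leaving the last byte of the frame (or the tail of whatever chunk size you
pick, like the 128 bytes you tried) to the state machine keeps all the
end-of-frame bookkeeping in one place; the fast path only bypasses it for
the boring middle of large frames.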

> and if()s.  Similar performance if you just gulp 128 byte
> chunks, rather than "len" and then let lws_client_rx_sm()
> clean up the remainder.
> 
> I've no idea what this breaks! but yeah the gain is available.

That's why following the code I pointed to is maybe a good idea... it's
been in for a while.

-Andy

> On Mon, Nov 7, 2016 at 1:40 PM, Andy Green <andy at warmcat.com> wrote:
> > On Mon, 2016-11-07 at 13:12 +1100, jb wrote:
> > > Using a modified fraggle.c, removing deflate, increasing the message
> > > size to batches of 32k, and removing the generation of random data
> > > and the checksums, I see that when the client runs at 100% cpu the
> > > server is only running at 10% cpu. (fraggle.c is arranged so when a
> > > client connects the server sends a bunch of messages.)
> > >
> > > Doing a quick profile it looks like all the client cpu time is taken
> > > up by lws_client_rx_sm, which appears to be a character-by-character
> > > state machine for receiving bytes.
> > >
> > > It isn't totally clear to me why the server is 10x faster at sending
> > > data than the client is at reading it. If the server sends a 32k
> > > block of zeros as a binary message, at some point isn't there a
> > > payload length and a payload of 32k? Does each byte have to be
> > > processed individually on one side but not the other?
> > 
> > Take a look at this
> > 
> > https://github.com/warmcat/libwebsockets/blob/master/lib/parsers.c#L1461
> > 
> > On the server side, the equivalent parser got a patch optimizing the
> > bulk data flow.
> > 
> > If you'd like to port that to the client side, patches are welcome.
> > 
> > -Andy
> > 
> > 
> > > thanks.
> > > _______________________________________________
> > > Libwebsockets mailing list
> > > Libwebsockets at ml.libwebsockets.org
> > > http://libwebsockets.org/mailman/listinfo/libwebsockets
> > 
> 


