[Libwebsockets] lws_write speed

Andy Green andy at warmcat.com
Wed May 18 06:10:24 CEST 2016

On 05/17/2016 10:39 PM, Roger Light wrote:
> On Tue, May 17, 2016 at 1:43 AM, Andy Green <andy at warmcat.com> wrote:
>> But with a single-threaded / event driven system the goal is to make sure it
>> never blocks.  If the OS isn't able to accept your whole buffer on the
>> socket, actually lws steps in and copies the rest into a buffer and
>> auto-drains it, emulating the threaded model (and consequently making
>> everything inefficient, if functional).  But that's a backup for
>> emergencies, needed because there is no way to know how much the socket will
>> accept until after you did the write, it's not how it should work.
>> How it should work is write stuff in chunks that are usually accepted by the
>> socket, for example 2KB or 4KB.  You can do that once per WRITEABLE
>> callback, and if there's more, ask to be called back when writable.  If the
>> system is otherwise idle and more can be written, you'll be called back
>> immediately.
> Is this what happens in all cases? It doesn't match my experience, but
> maybe something changed. I see the calls to lws_write() as operating
> in a similar (but not identical...) manner to write(), i.e. I do
> roughly this:
> pos = 0;
> len = 100000;
> ...
> count = lws_write(wsi, &buf[pos], len, LWS_WRITE_BINARY);
> pos += count;
> len -= count;
> if(len) lws_callback_on_writable(context, wsi);
> ...
> If it was making a copy of the rest into a buffer then this wouldn't
> work. The only case I can think of where what you're saying would apply
> is if there are extensions enabled.

No, it works fine, because after copying to the internal buffer 
lws_write() lies and returns the whole amount as "sent".  It has to do 
that because the buffer it was given is usually on the stack and will 
immediately be lost.

Seeing what has happened, lws then disables any further WRITABLE 
callbacks to the user code and requests and services them automatically 
from the temp buffer.  When the malloc'd temp buffer is drained, it is 
kept around (on the basis that if you wrote that much once on this wsi, your 
code is probably planning to do so again) and only realloc'd if the next 
one is bigger.  WRITABLE callbacks are reenabled when the temp buffer is 
drained.  The temp buffer is freed when the wsi closes.
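
As a rough illustration of the temp-buffer behaviour described above, here 
is a small standalone simulation.  It is only a sketch and not the actual 
lws code: struct conn, fake_socket_write(), conn_write(), conn_drain() and 
conn_close() are hypothetical names, and the "room" argument stands in for 
however much the socket happens to accept on that call.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-connection state for illustration; not the real lws wsi. */
struct conn {
    unsigned char *spill;      /* malloc'd temp buffer, kept between uses */
    size_t spill_cap;          /* allocated size of the temp buffer */
    size_t spill_pos;          /* offset of the next unsent byte */
    size_t spill_len;          /* unsent bytes remaining in the temp buffer */
    int user_writable_enabled; /* nonzero: WRITABLE events reach user code */
};

/* Stand-in for a nonblocking socket that accepts at most `room` bytes. */
static size_t fake_socket_write(const unsigned char *buf, size_t len, size_t room)
{
    (void)buf;
    return len < room ? len : room;
}

/* Returns len even on a partial socket write, or -1 if allocation fails. */
static long conn_write(struct conn *c, const unsigned char *buf, size_t len,
                       size_t room)
{
    size_t sent, rest;

    assert(c->spill_len == 0); /* user code is not called while draining */
    sent = fake_socket_write(buf, len, room);
    rest = len - sent;
    if (rest) {
        if (rest > c->spill_cap) { /* grow only if this remainder is bigger */
            unsigned char *p = realloc(c->spill, rest);
            if (!p)
                return -1;
            c->spill = p;
            c->spill_cap = rest;
        }
        memcpy(c->spill, buf + sent, rest);
        c->spill_pos = 0;
        c->spill_len = rest;
        c->user_writable_enabled = 0; /* drain first, then resume user code */
    }
    return (long)len; /* the "lie": everything is reported as sent */
}

/* Run on each writable event while the temp buffer holds data. */
static void conn_drain(struct conn *c, size_t room)
{
    size_t sent;

    if (c->spill_len == 0)
        return;
    sent = fake_socket_write(c->spill + c->spill_pos, c->spill_len, room);
    c->spill_pos += sent;
    c->spill_len -= sent;
    if (c->spill_len == 0)
        c->user_writable_enabled = 1; /* buffer stays allocated for reuse */
}

/* The temp buffer is only released when the connection closes. */
static void conn_close(struct conn *c)
{
    free(c->spill);
    c->spill = NULL;
    c->spill_cap = c->spill_pos = c->spill_len = 0;
}
```

With this model, writing 100 bytes when the socket takes only 5 still 
returns 100, leaves 95 bytes in the temp buffer, and keeps user WRITABLE 
callbacks off until later drain calls empty it.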

It 'feels like' - has the semantics of - a blocking write, but it 
isn't... there is only one thread by default so it couldn't be a real 
blocking write or everything would grind to a halt.

If the overhead is OK, then this is OK... but it's actually intended to 
perfectly hide the emergency case where write() returns something really 
unexpected like 5 (it is perfectly free to do so, due to dynamic memory 
conditions) when it almost always otherwise takes 2048 or 4096.  With 
this backup system, the user code doesn't have to worry about it; 
lws will deal with it seamlessly.

But if that affects performance, especially if "len" is big, then the 
buffer can be sent piecemeal as I described, avoiding the malloc / memcpy.
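
To make the piecemeal approach concrete, here is a minimal sketch of the 
bookkeeping it needs.  struct chunked_send, chunk_next() and 
chunk_advance() are hypothetical helpers invented for this example, not 
part of lws, and the chunk size is whatever the caller chooses (2KB or 4KB 
as suggested earlier).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one large outgoing message. */
struct chunked_send {
    const unsigned char *data; /* whole message, owned by the caller */
    size_t len;                /* total length in bytes */
    size_t pos;                /* bytes already handed to the write call */
};

/* Select the next piece to write; returns its length, 0 when finished. */
static size_t chunk_next(const struct chunked_send *s, size_t chunk_size,
                         const unsigned char **piece)
{
    size_t left = s->len - s->pos;

    assert(chunk_size > 0);
    *piece = s->data + s->pos;
    return left < chunk_size ? left : chunk_size;
}

/*
 * Record that n bytes were accepted.  Returns nonzero if data remains,
 * meaning another writable callback should be requested.
 */
static int chunk_advance(struct chunked_send *s, size_t n)
{
    assert(n <= s->len - s->pos);
    s->pos += n;
    return s->pos < s->len;
}
```

In a WRITEABLE callback the idea would be to take one piece from 
chunk_next(), hand it to lws_write(), pass the result to chunk_advance(), 
and call lws_callback_on_writable() only if that returns nonzero.  Each 
write is then small enough that the socket usually takes it whole, so the 
temp buffer path is rarely hit.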


PS: I also learned, to my surprise, that the pattern of giving write() a 
big length and letting it nibble what it wants is really bad for 
performance.  The problem is that the kernel processes all of the pages 
on every call before passing the request to the network stack.  If len is 
counted in MB, that is a huge amount of time and CPU lost on each call, 
when the call will only accept a fraction of the processed pages.

lws now restricts the length write() sees each time to rx_buffer_size 
(default 4096) even if the length is huge.
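
The length restriction can be pictured with a tiny wrapper around POSIX 
write().  This is a sketch under assumptions, not the lws implementation: 
capped_write() is a hypothetical name, and its "cap" argument plays the 
role that rx_buffer_size plays in lws.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Offer the kernel at most `cap` bytes per call, however large len is. */
static ssize_t capped_write(int fd, const void *buf, size_t len, size_t cap)
{
    assert(cap > 0);
    if (len > cap)
        len = cap; /* the kernel never sees the pages beyond the cap */
    return write(fd, buf, len);
}
```

The caller still has to handle a short or failed write as usual; the only 
change is that a multi-megabyte len no longer costs a multi-megabyte page 
walk on every call.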

> Cheers,
> Roger
