[Libwebsockets] lws_write speed
andy at warmcat.com
Wed May 18 06:10:24 CEST 2016
On 05/17/2016 10:39 PM, Roger Light wrote:
> On Tue, May 17, 2016 at 1:43 AM, Andy Green <andy at warmcat.com> wrote:
>> But with a single-threaded / event driven system the goal is to make sure it
>> never blocks. If the OS isn't able to accept your whole buffer on the
>> socket, actually lws steps in and copies the rest into a buffer and
>> auto-drains it, emulating the threaded model (and consequently making
>> everything inefficient, if functional). But that's a backup for
>> emergencies, needed because there is no way to know how much the socket will
>> accept until after you did the write; it's not how it should work.
>> How it should work is write stuff in chunks that are usually accepted by the
>> socket, for example 2KB or 4KB. You can do that once per WRITEABLE
>> callback, and if there's more, ask to be called back when writable. If the
>> system is otherwise idle and more can be written, you'll be called back
> Is this what happens in all cases? It doesn't match my experience, but
> maybe something changed. I see the calls to lws_write() as operating
> in a similar (but not identical...) manner to write(), i.e. I do
> roughly this:
> pos = 0;
> len = 100000;
> count = lws_write(wsi, &buf[pos], len, LWS_WRITE_BINARY);
> pos += count;
> len -= count;
> if(len) lws_callback_on_writable(context, wsi);
> If it was making a copy of the rest into a buffer then this wouldn't
> work. The time I could think when what you're saying would apply is if
> there are extensions enabled.
No, it works fine, because after copying to the internal buffer
lws_write() lies and returns the whole amount as "sent". It has to do
that because the buffer it was given is usually on the stack and will
immediately be lost.
Seeing what has happened, lws then disables any further WRITABLE
callbacks to the user code and requests and services them automatically
from the temp buffer. When the malloc'd temp buffer is drained, it is
kept around (on the basis that if you wrote that much once on this wsi, your
code is probably planning to do so again) and only realloc'd if the next
one is bigger. WRITABLE callbacks are reenabled when the temp buffer is
drained. The temp buffer is freed when the wsi closes.
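To make the sequence above concrete, here is a minimal sketch of that kind of backup buffering. It is not the lws source: struct conn, conn_write(), conn_drain(), conn_close() and the fake_* socket stand-in are all hypothetical names made up for illustration, and the real implementation will differ in detail.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-connection state; NOT the real lws internals. */
struct conn {
    unsigned char *spill;  /* malloc'd temp buffer, kept until close */
    size_t spill_alloc;    /* allocated size of spill */
    size_t spill_pos;      /* offset of the next unsent byte */
    size_t spill_len;      /* bytes still waiting to be sent */
};

/* Stand-in for a socket that accepts at most fake_room bytes per call. */
unsigned char fake_sink[64];
size_t fake_sink_len;
size_t fake_room;

size_t fake_sock_write(const unsigned char *buf, size_t len)
{
    size_t n = len < fake_room ? len : fake_room;

    if (n > sizeof(fake_sink) - fake_sink_len)
        n = sizeof(fake_sink) - fake_sink_len;
    memcpy(fake_sink + fake_sink_len, buf, n);
    fake_sink_len += n;
    return n;
}

/*
 * Try to send len bytes. Whatever the socket refuses is copied into the
 * temp buffer, and the whole length is still reported as "sent", because
 * the caller's buffer may be gone as soon as we return.
 * Returns len, or -1 if the temp buffer could not be allocated.
 */
long conn_write(struct conn *c, const unsigned char *buf, size_t len)
{
    size_t sent, rest;

    /* user writes are not expected while a remainder is still pending */
    assert(c->spill_len == 0);

    sent = fake_sock_write(buf, len);
    rest = len - sent;
    if (rest) {
        if (rest > c->spill_alloc) {  /* grow only if this one is bigger */
            unsigned char *p = realloc(c->spill, rest);

            if (!p)
                return -1;
            c->spill = p;
            c->spill_alloc = rest;
        }
        memcpy(c->spill, buf + sent, rest);
        c->spill_pos = 0;
        c->spill_len = rest;
    }
    return (long)len;
}

/*
 * Called when the socket is writable again. Returns 1 once the temp
 * buffer is empty (user WRITABLE callbacks may resume), otherwise 0.
 * The allocation is deliberately kept for reuse.
 */
int conn_drain(struct conn *c)
{
    if (c->spill_len) {
        size_t n = fake_sock_write(c->spill + c->spill_pos, c->spill_len);

        c->spill_pos += n;
        c->spill_len -= n;
    }
    return c->spill_len == 0;
}

/* The temp buffer is released only when the connection goes away. */
void conn_close(struct conn *c)
{
    free(c->spill);
    c->spill = NULL;
    c->spill_alloc = c->spill_pos = c->spill_len = 0;
}
```

With fake_room set to 4, a 10-byte conn_write() reports 10 even though only 4 bytes reached the "socket"; two conn_drain() calls then push out the remaining 6.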
It 'feels like' - has the semantics of - a blocking write, but it
isn't... there is only one thread by default so it couldn't be a real
blocking write or everything would grind to a halt.
If the overhead is OK, then this is OK... but it's actually intended to
perfectly hide the emergency case that write() returned something really
unexpected like 5 (it is perfectly free to do so, due to dynamic memory
conditions) when it almost always otherwise takes 2048 or 4096. With
this backup system, the user code doesn't have to take care of it;
lws will deal with it seamlessly.
But if that affects performance, especially if "len" is big, then the
buffer can be sent piecemeal as I described, avoiding the malloc / memcpy.
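As a rough sketch of the piecemeal approach, the bookkeeping on the user side could look like the following. The names struct send_state, next_chunk(), chunk_sent() and more_to_send() are hypothetical helpers, and CHUNK is an assumed value picked from the 2KB / 4KB range mentioned earlier. The comment at the end shows where lws_write() and lws_callback_on_writable() would fit; it is indicative only and leaves out details such as the LWS_PRE headroom lws_write() expects in front of the payload and the flags needed if the chunks are meant to be fragments of a single message.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK 4096  /* assumed chunk size, something the socket usually takes */

/* Hypothetical per-connection send state kept by the user code. */
struct send_state {
    const unsigned char *buf;  /* whole payload */
    size_t len;                /* total payload length */
    size_t pos;                /* bytes already handed to lws_write() */
};

/* Size of the next piece to write: at most CHUNK, 0 when finished. */
size_t next_chunk(const struct send_state *s)
{
    size_t left = s->len - s->pos;

    return left < CHUNK ? left : CHUNK;
}

/* Record that n more bytes have been handed over. */
void chunk_sent(struct send_state *s, size_t n)
{
    assert(n <= s->len - s->pos);
    s->pos += n;
}

/* Nonzero while there is still payload left to send. */
int more_to_send(const struct send_state *s)
{
    return s->pos < s->len;
}

/*
 * Inside the protocol callback this would be used roughly like so
 * (one chunk per WRITEABLE callback, then ask to be called again):
 *
 *   case LWS_CALLBACK_SERVER_WRITEABLE:
 *       n = next_chunk(s);
 *       if (lws_write(wsi, chunk_ptr, n, LWS_WRITE_BINARY) < 0)
 *           return -1;
 *       chunk_sent(s, n);
 *       if (more_to_send(s))
 *           lws_callback_on_writable(wsi);
 *       break;
 *
 * chunk_ptr is a placeholder for a pointer to the current piece, copied
 * into a buffer with the required headroom in front of it.
 */
```

Because each piece is small enough that the socket usually accepts it whole, the malloc / memcpy fallback should rarely be triggered.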
PS: I also learned, to my surprise, that the pattern of giving write() a
big length and letting it nibble what it wants is really bad for
performance. The problem is that the kernel processes all of the pages
every time before passing the request to the network stack. If len is
counted in MB, that is a huge amount of time and CPU lost on each call,
since the call will only accept a fraction of the processed pages.
lws now restricts the length write() sees each time to rx_buffer_size
(default 4096) even if the length is huge.
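The idea of capping what write() sees can be sketched in a few lines. This is only an illustration of the technique, not the lws code: clamp_write_len() and capped_write() are hypothetical names, and the cap is passed in by the caller rather than read from any lws setting.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Never offer more than cap bytes to a single write() call. */
size_t clamp_write_len(size_t len, size_t cap)
{
    assert(cap > 0);
    return len > cap ? cap : len;
}

/*
 * Write at most cap bytes from buf to fd, however large len is.
 * Returns what write() returns: bytes accepted, or -1 with errno set.
 * The caller advances by the returned count and tries again later.
 */
ssize_t capped_write(int fd, const void *buf, size_t len, size_t cap)
{
    return write(fd, buf, clamp_write_len(len, cap));
}
```

For example, capped_write(fd, buf, len, 4096) with a multi-megabyte len only ever asks the kernel to look at the first 4096 bytes.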