[Libwebsockets] lws_write speed

Andy Green andy at warmcat.com
Thu May 19 04:05:22 CEST 2016



On 05/18/2016 05:35 PM, Roger Light wrote:
> On Wed, May 18, 2016 at 5:10 AM, Andy Green <andy at warmcat.com> wrote:
>
>> No it works fine, because after copying to the internal buffer lws_write()
>> lies and returns the whole amount as "sent".  It has to do that because the
>> buffer it was given is usually on the stack and will immediately be lost.
>
> Ok, I don't believe that this will be done on the stack most of the
> time but I understand the reasoning.

It depends on how the user code generates the data.  Buffering it on
the stack is often desirable unless wherever it's stored has already made
arrangements for LWS_PRE and for the data getting XOR'd; sometimes the raw
data needs transforming when it's sent as part of the ws protocol.
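
To illustrate why the headroom and the XOR matter, here is a minimal
self-contained sketch.  It is not lws code: SKETCH_PRE and
sketch_frame_in_place() are hypothetical stand-ins for LWS_PRE and for the
framing work done at write time, and only the short (under 126 bytes) masked
text frame layout is handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for LWS_PRE: headroom kept free before the payload */
#define SKETCH_PRE 16

/*
 * Hypothetical stand-in for the framing done at write time.  Builds a
 * masked ws text frame in place: the header is written into the headroom
 * just before `payload`, and the payload itself is XOR'd with the mask key.
 * Returns the start of the frame and stores its total length in *frame_len.
 */
static unsigned char *
sketch_frame_in_place(unsigned char *payload, size_t len,
                      const unsigned char mask[4], size_t *frame_len)
{
    unsigned char *start = payload - 6; /* 2 header bytes + 4 mask bytes */
    size_t n;

    assert(len < 126); /* only the 7-bit length form is sketched here */

    start[0] = 0x81;                        /* FIN + text opcode */
    start[1] = (unsigned char)(0x80 | len); /* MASK bit + payload length */
    memcpy(start + 2, mask, 4);

    for (n = 0; n < len; n++)
        payload[n] ^= mask[n & 3];          /* caller's data is modified */

    *frame_len = len + 6;
    return start;
}

/* User-side pattern: build the message in a stack buffer with headroom */
static size_t
sketch_send_hello(unsigned char *out, size_t out_size)
{
    static const unsigned char mask[4] = { 0x12, 0x34, 0x56, 0x78 };
    unsigned char buf[SKETCH_PRE + 64];
    unsigned char *p = &buf[SKETCH_PRE];
    unsigned char *frame;
    size_t frame_len;

    memcpy(p, "hello", 5);
    frame = sketch_frame_in_place(p, 5, mask, &frame_len);

    assert(frame_len <= out_size);
    memcpy(out, frame, frame_len); /* stands in for handing it to the socket */

    return frame_len;
}
```

The point is that the buffer is both extended backwards and scribbled on, so
a throwaway stack copy is a natural fit unless the long-lived storage was
laid out with that in mind.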

>> Seeing what has happened, lws then disables any further WRITABLE callbacks
>> to the user code and requests and services them automatically from the temp
>> buffer.  When the malloc'd temp buffer is drained, it is kept around (on the
>> basis if you wrote that much once on this wsi, your code is probably
>> planning to do so again) and only realloc'd if the next one is bigger.
>> WRITABLE callbacks are reenabled when the temp buffer is drained.  The temp
>> buffer is freed when the wsi closes.
>
> I see, that makes sense in the context of the previous
>
>> PS: also I learned to my surprise, the pattern of giving write() a big
>> length and letting it nibble what it wants is a really bad performance idea.
>> The problem is the kernel processes all of the pages every time before
>> passing the request to the network stack, if len is counted in MB that is a
>> huge amount of time and CPU lost each call, that will only accept a fraction
>> of the processed pages.
>
> Ah, that's very interesting and not something I'd thought about,
> thanks for the tip.
>
> I wanted to test it out of course though, so tried sending a ~60MB
> file (not using websockets) either using write(full_remaining_length)
> or write(4096) and using callgrind to look at both cases  (yes, this
> is only looking at the user space). This is with an application
> operating as a client with only 2 socket connections open. What I saw
> was that passing the full length meant write() was called 1665 times,
> but limited to 4096 bytes it was called 15725 times.
>
> Doing further investigation to look at what was actually being
> returned from write() gave me a smallest write of 1428 (this only
> happened twice), a mean of 40559, median of 19992 and maximum of
> 1098132.
>
> It's clear that those numbers are much smaller than the 60MB total
> size and so trying to pass that every single time would result in a
> loss in performance from what you said. On the other hand, limiting to
> 4096 bytes at once seems like it would reduce performance as well from
> what I've seen.
>
> FWIW, this is Ubuntu 14.04 running on an Intel Atom N2800 with 2GB RAM
> - connecting to a remote host in a different country.

The surprising thing is the kernel ever took 1MB for a non-127.x.x.x 
address.  I guess it did it at the very start.

There's a tradeoff between the requested length and the chance of having
to buffer some of it.  I think the median is most indicative: under the
test conditions the kernel would mostly take ~20K.  But of course that
figure is highly dynamic; it would change if the kernel came under memory
pressure, or there were many active connections, or connections slow to ACK.
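
For reference, a capped send loop along the lines of the plain write() test
can be sketched as below.  This is only an illustration under assumptions:
write_capped() and its `cap` and `calls` parameters are hypothetical names,
and it uses plain POSIX write() rather than anything from lws.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send `len` bytes from `buf` on `fd`, never asking write() for more than
 * `cap` bytes at once.  Every write() call is counted in *calls when `calls`
 * is non-NULL.  Returns the number of bytes accepted so far (which may be
 * short if a nonblocking fd fills up), or -1 on a hard error.
 */
static ssize_t
write_capped(int fd, const unsigned char *buf, size_t len, size_t cap,
             unsigned int *calls)
{
    size_t done = 0;

    assert(cap > 0);

    while (done < len) {
        size_t ask = len - done;
        ssize_t n;

        if (ask > cap)
            ask = cap;

        n = write(fd, buf + done, ask);
        if (calls)
            (*calls)++;

        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted, just retry */
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                break;    /* fd is full: wait for it to become writable */
            return -1;
        }

        done += (size_t)n; /* partial writes are normal */
    }

    return (ssize_t)done;
}
```

A small cap costs more write() calls; a very large one makes each call more
expensive and, in the lws_write() case, raises the chance that a leftover has
to be buffered.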

Maybe what we should do is keep a small ringbuffer of per-connection
stats like that and let the user code predict what would be an optimal
size based on what went through in the last few sends.  Occasionally
probing whether it should go bigger isn't so bad: if it only goes over by
a bit, the malloc'd buffer for the leftover is only small, although
sending the small leftover next time might hit throughput a bit.
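
A rough sketch of what such a per-connection ringbuffer could look like
follows.  Nothing like this exists in lws at this point; struct send_stats,
send_stats_record() and send_stats_suggest() are hypothetical names, and
using the median of the recent sends as the suggestion is just one possible
policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SEND_STATS_SLOTS 8 /* how many recent sends to remember */

/* Hypothetical per-connection stats; must start out zeroed */
struct send_stats {
    size_t accepted[SEND_STATS_SLOTS]; /* bytes the kernel took per send */
    unsigned int head;                 /* next slot to overwrite */
    unsigned int count;                /* number of valid slots */
};

/* Record how much the kernel accepted on the latest send */
static void
send_stats_record(struct send_stats *s, size_t accepted)
{
    s->accepted[s->head] = accepted;
    s->head = (s->head + 1) % SEND_STATS_SLOTS;
    if (s->count < SEND_STATS_SLOTS)
        s->count++;
}

static int
cmp_size(const void *a, const void *b)
{
    size_t x = *(const size_t *)a, y = *(const size_t *)b;

    return (x > y) - (x < y);
}

/*
 * Suggest a size for the next send: the (upper) median of the recorded
 * sends, or `fallback` if nothing has been recorded yet.
 */
static size_t
send_stats_suggest(const struct send_stats *s, size_t fallback)
{
    size_t sorted[SEND_STATS_SLOTS];

    assert(s->count <= SEND_STATS_SLOTS);

    if (!s->count)
        return fallback;

    /* valid entries always occupy slots 0 .. count - 1 */
    memcpy(sorted, s->accepted, s->count * sizeof(sorted[0]));
    qsort(sorted, s->count, sizeof(sorted[0]), cmp_size);

    return sorted[s->count / 2];
}
```

User code would record the accepted size after each send and ask for a
suggestion before the next one; probing for a bigger size would be a matter
of occasionally asking for somewhat more than the suggestion.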

-Andy

> Cheers,
>
> Roger
>
