[Libwebsockets] lws_write speed
andy at warmcat.com
Thu May 19 04:05:22 CEST 2016
On 05/18/2016 05:35 PM, Roger Light wrote:
> On Wed, May 18, 2016 at 5:10 AM, Andy Green <andy at warmcat.com> wrote:
>> No it works fine, because after copying to the internal buffer lws_write()
>> lies and returns the whole amount as "sent". It has to do that because the
>> buffer it was given is usually on the stack and will immediately be lost.
> Ok, I don't believe that this will be done on the stack most of the
> time but I understand the reasoning.
It depends on how the user code generates the data. Buffering it on
the stack is often desirable, unless wherever it's stored has already
made arrangements for LWS_PRE and for getting the data XOR'd; sometimes
the raw data needs transforming when it's sent as part of the ws protocol.
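
To make the "XOR'd" part concrete, here is a minimal sketch of the masking that RFC 6455 requires on client-to-server frame payloads. It is not the lws internal code, and ws_mask_payload() is a made-up name used only for illustration. The point is that the transform modifies the payload in place, which is one reason a scratch copy is convenient.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Mask (or unmask, the operation is its own inverse) a websocket
 * payload in place: each payload byte is XOR'd with one of the four
 * masking-key bytes, cycling through the key (RFC 6455 section 5.3).
 *
 * ws_mask_payload() is a hypothetical helper, not an lws API.
 */
static void
ws_mask_payload(uint8_t *payload, size_t len, const uint8_t key[4])
{
	size_t i;

	for (i = 0; i < len; i++)
		payload[i] ^= key[i & 3];
}
```

Because the caller's bytes are overwritten, data that has to stay intact elsewhere needs to be copied somewhere disposable first, and a stack buffer with LWS_PRE headroom in front of the payload is the easy place for that.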
>> Seeing what has happened, lws then disables any further WRITABLE callbacks
>> to the user code and requests and services them automatically from the temp
>> buffer. When the malloc'd temp buffer is drained, it is kept around (on the
>> basis if you wrote that much once on this wsi, your code is probably
>> planning to do so again) and only realloc'd if the next one is bigger.
>> WRITABLE callbacks are reenabled when the temp buffer is drained. The temp
>> buffer is freed when the wsi closes.
> I see, that makes sense in the context of the previous
>> PS: also I learned to my surprise, the pattern of giving write() a big
>> length and letting it nibble what it wants is a really bad performance idea.
>> The problem is the kernel processes all of the pages every time before
>> passing the request to the network stack, if len is counted in MB that is a
>> huge amount of time and CPU lost each call, that will only accept a fraction
>> of the processed pages.
> Ah, that's very interesting and not something I'd thought about,
> thanks for the tip.
> I wanted to test it out of course though, so tried sending a ~60MB
> file (not using websockets) either using write(full_remaining_length)
> or write(4096) and using callgrind to look at both cases (yes, this
> is only looking at the user space). This is with an application
> operating as a client with only 2 socket connections open. What I saw
> was that passing the full length meant write() was called 1665 times,
> but limited to 4096 bytes it was called 15725 times.
> Doing further investigation to look at what was actually being
> returned from write() gave me a smallest write of 1428 (this only
> happened twice), a mean of 40559, median of 19992 and maximum of
> It's clear that those numbers are much smaller than the 60MB total
> size and so trying to pass that every single time would result in a
> loss in performance from what you said. On the other hand, limiting to
> 4096 bytes at once seems like it would reduce performance as well from
> what I've seen.
> FWIW, this is Ubuntu 14.04 running on an Intel Atom N2800 with 2GB RAM
> - connecting to a remote host in a different country.
The surprising thing is the kernel ever took 1MB for a non-127.x.x.x
address. I guess it did it at the very start.
There's a tradeoff between the requested length and the chance of having
to buffer some of it. I think the median is most indicative: under the
test conditions the kernel would mainly take ~20K. But of course that
figure is highly dynamic; it would change if the kernel came under memory
pressure, or there were many active connections, or connections slow to ACK.
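
As a rough sketch of what limiting the requested length looks like on a plain socket (no lws involved), something like the following offers the kernel at most a fixed cap per call. write_capped() and its cap parameter are hypothetical names for illustration; the right cap value is exactly the open question here.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Offer the kernel at most `cap` bytes per call instead of the whole
 * remaining length.  Returns the number of bytes the kernel accepted,
 * 0 if a nonblocking fd would block (wait for POLLOUT and retry), or
 * -1 on any other error.
 *
 * write_capped() is a hypothetical helper, not part of any library.
 */
static ssize_t
write_capped(int fd, const unsigned char *buf, size_t remaining, size_t cap)
{
	size_t ask = remaining < cap ? remaining : cap;
	ssize_t n;

	do {
		n = write(fd, buf, ask);
	} while (n < 0 && errno == EINTR); /* retry if interrupted by a signal */

	if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
		return 0; /* kernel took nothing this time */

	return n;
}
```

The caller advances its buffer pointer by the return value and calls again when the socket is writable. With the numbers quoted above, a cap somewhere around the ~20K median would look more reasonable than either 4096 or the full remaining length, but that is only a guess from one test.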
Maybe what we should do is keep a small ringbuffer of per-connection
stats like that and let the user code predict what would be an optimal
size based on what went through the last few sends. Occasionally
probing whether it should go bigger isn't so bad: if it only goes over
by a bit, the malloc'd buffer for the leftover is small, although
sending the small leftover next time might hurt throughput a little.
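
A minimal sketch of what such a per-connection ringbuffer might look like follows. Nothing like this exists in lws at the moment; the struct, the function names and the history depth of 8 are all assumptions made up for illustration, and using the median as the predictor is just one possible choice.

```c
#include <stddef.h>

#define SEND_HIST 8 /* hypothetical history depth */

/* Zero-initialize before first use. */
struct send_stats {
	size_t accepted[SEND_HIST]; /* bytes the kernel took on recent sends */
	unsigned int head;          /* next slot to overwrite */
	unsigned int count;         /* number of valid entries, <= SEND_HIST */
};

/* Record how much the kernel accepted on the latest send. */
static void
send_stats_note(struct send_stats *s, size_t accepted)
{
	s->accepted[s->head] = accepted;
	s->head = (s->head + 1) % SEND_HIST;
	if (s->count < SEND_HIST)
		s->count++;
}

/*
 * Suggest a request size for the next send: the median of the recorded
 * sizes (the upper median when the count is even), or `fallback` when
 * there is no history yet.
 */
static size_t
send_stats_suggest(const struct send_stats *s, size_t fallback)
{
	size_t sorted[SEND_HIST];
	unsigned int i, j;

	if (!s->count)
		return fallback;

	/* insertion-sort a copy of the valid entries */
	for (i = 0; i < s->count; i++) {
		size_t v = s->accepted[i];

		for (j = i; j > 0 && sorted[j - 1] > v; j--)
			sorted[j] = sorted[j - 1];
		sorted[j] = v;
	}

	return sorted[s->count / 2];
}
```

User code would call send_stats_note() with what actually went out after each send and ask send_stats_suggest() before the next one. Probing upward could be layered on top, for example by requesting somewhat more than the suggestion every so many sends.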