[Libwebsockets] Malloc and http headers
andy at warmcat.com
Fri Dec 25 08:23:38 CET 2015
On 12/24/2015 08:54 AM, Andy Green wrote:
> Hi -
> While wandering about the code in various ways the last few weeks, doing
> quite restful cleaning activities, I noticed we are basically willing to
> allocate header storage for any number of connections at the moment. We
> free() it when the connection upgrades to ws or closes, of course, but the
> peak allocation in a connection storm is not really controlled.
> At the moment if we get a connection, it's enough to make us allocate
> the struct lws (~256 bytes) and allocated_headers (~2300 bytes).
> Actually for mbed3, where there's only 256KB RAM in the system, that's
> not so good... even for larger systems it's better if, under stress, it
> doesn't just spew mallocs but makes new connections wait beyond a
> certain point until the connections using the headers have completed,
> timed out or upgraded to ws.
> The default header content limit of 1024 could then be increased, if we
> strictly controlled how many of them could be around at a time.
> About mallocs in general, ignoring one-time small allocs and the
> extensions, we have these lws_malloc + lws_zalloc:
> - client-parser.c: allocates and keeps a client per-connection buffer
> for ping payload (PONG must later repeat the payload according to the
> standard). Specified to be < 128 bytes.
> - client.c: the client per-connection user buffer
> - client-handshake.c: the client struct lws
> - getifaddrs.c: allocates the connection peer name temporarily
> - hpack.c (http2): allocates dynamic header dictionary as needed
> - libwebsockets.c: user space allocation
> - output.c: per-connection truncated send storage
> - parsers.c: the http header storage freed at ws upgrade (struct
> allocated_headers = 1024 header content + 164 x 8-byte frags + ~100 =
> ~2300 bytes); server per-connection ping payload buffer (<128)
> - server.c: the server per-connection rx_user_buffer; the struct lws
> for new connections
> - service.c: rx flow control cache (the connection had a buffer of rx,
> but rx flow control was set during processing... the remainder must be
> cached until we return to the event loop)
> How about the following:
> 1) Make new connection accepts flow controllable (modulate the listen
> socket's POLLIN)
> 2) Have the max connection header content size settable by info, defaulting
> to 2048.
> 3) Preallocate a pool of struct allocated_headers in the context, with
> the count set by info, defaulting to say 8 (so 16KB reserved for HTTP
> headers in the context by default... it can be as low as 1 x 1024 set in
> info, or as big as you like... but it will be finite now).
> 4) Switch to using the pool and flow control accepts if they run dry...
> timeouts should stop this becoming a DoS
> 5) Put the PONG / Close buffer as an unsigned char pingbuf in
> struct _lws_websocket_related (the part of the wsi union active when in
> ws mode) and eliminate the related malloc management code. struct lws
> will bloat to ~384 bytes, but the PONG / Close buffer is part of the ws
> standard and the related malloc is gone. If PONG is in use, it will be
> used on every connection. And every connection may receive a Close at
> some point, which also needs this buffer. So might as well bite the bullet.
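Steps 3 and 4 above can be sketched as a fixed-size pool: a minimal, self-contained illustration, not the actual lws implementation. The names (struct ah, ah_get, ah_put, AH_POOL_SIZE) are hypothetical; in the real library the dry-pool case would clear POLLIN on the listen socket rather than just return NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define AH_POOL_SIZE 8    /* "default to say 8" from step 3 (hypothetical) */
#define AH_DATA_LEN  2048 /* step 2's default header content size */

struct ah {
	char data[AH_DATA_LEN]; /* header content storage */
	int in_use;
};

struct ah_pool {
	struct ah slot[AH_POOL_SIZE]; /* preallocated once, in the context */
	int in_use_count;
};

/*
 * Grab a free slot, or return NULL when the pool is dry: the caller
 * would then stop accepting new connections (flow control the listen
 * socket's POLLIN) until a slot is returned, as in step 4.  Connection
 * timeouts stop a dry pool becoming a DoS.
 */
static struct ah *ah_get(struct ah_pool *pool)
{
	int n;

	for (n = 0; n < AH_POOL_SIZE; n++)
		if (!pool->slot[n].in_use) {
			pool->slot[n].in_use = 1;
			pool->in_use_count++;
			return &pool->slot[n];
		}

	return NULL; /* dry: caller defers the accept */
}

/* Return a slot at ws upgrade, close, or timeout; accepts can resume */
static void ah_put(struct ah_pool *pool, struct ah *ah)
{
	ah->in_use = 0;
	pool->in_use_count--;
}
```

Because the pool lives in the context, peak header memory is bounded at AH_POOL_SIZE x AH_DATA_LEN regardless of how many connections arrive at once; a connection storm just queues at the accept stage instead of spewing mallocs.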
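Step 5 might look roughly like the following sketch: the struct and member names here (wsi, ws_related, http_related, pingbuf) are simplified stand-ins, not the real lws definitions. The point is that the control-frame buffer lives inside the union member that is only active in ws mode, so it costs nothing before upgrade and needs no malloc/free afterwards; RFC 6455 caps control-frame payloads at 125 bytes, so a fixed array suffices.

```c
#include <assert.h>
#include <string.h>

#define MAX_CONTROL_PAYLOAD 125 /* RFC 6455 control frame payload limit */

struct ws_related {
	/* PONG must echo the PING payload; Close may carry one too */
	unsigned char pingbuf[MAX_CONTROL_PAYLOAD];
	unsigned char ping_payload_len;
};

struct http_related {
	void *ah;	 /* pooled header storage, only needed pre-upgrade */
	int parser_state;
};

/* Simplified stand-in for struct lws: one union arm active at a time */
struct wsi {
	int mode;
	union {
		struct http_related http; /* valid before ws upgrade */
		struct ws_related ws;	  /* valid after ws upgrade */
	} u;
};

/* Stash a PING (or Close) payload so it can be echoed back later */
static void ws_store_control_payload(struct wsi *wsi,
				     const unsigned char *buf,
				     unsigned char len)
{
	if (len > MAX_CONTROL_PAYLOAD)
		len = MAX_CONTROL_PAYLOAD;
	memcpy(wsi->u.ws.pingbuf, buf, len);
	wsi->u.ws.ping_payload_len = len;
}
```

The union means the embedded buffer only bloats the ws-mode arm; an http-mode connection pays nothing extra for it.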
> This shouldn't affect the ABI except wrt the info struct; everything
> else is private / internal change. (A bit late now, but it might have
> been smart to bloat the info struct with a spare array at the end that
> we reduce as we add new members.)
> Lws is used in a lot of different use cases, from very small to very
> large. I think this makes things better for everyone, but if it sounds
> like trouble or could be better, discussion is welcome.
I eventually understood Bruce's point in offlist email, but lws has
actually done what he's suggesting (drop the associated allocated_headers
after CALLBACK_HTTP even for non-upgrades) for a long time already, and
this won't affect that.
So I implemented all these above changes earlier today.
It defaults to 16 x 1024 header storage, but I changed test-server to
restrict it to 1 x 1024 so we can flush out any problems. I hit it
with 100 ab connections at 10 concurrent, and it was fine, serving
them all inside 24ms each... also fine with the test client and 2 x
Chrome windows doing Ctrl-R.
So I think the incoming connection queuing is OK.
I also improved the logging at startup about who is using what memory.
The effect of 64-bit pointers + extensions + ssl is pretty noticeable:
on x86_64 with all those the wsi is 520 bytes, but on mbed3 (no ext,
32-bit, server only) it's only 232 bytes.
For the info struct, I attempted a scheme where there was an unused
void * array at the end, and we removed entries according to what was
added. But I couldn't think of a nice way to do it when the new member
might be a pointer, an int or a char, and sizeof(void *) may be 4 or 8...
and the Windows compiler blows up if computed padding arrays have zero
length and are not at the end of the struct.
So I left it that we'll permanently have void *s at the end of the
info struct, to give some cover for older app binaries: they provide
zeroed (default) new members to newer libraries. We haven't been adding
info members very often, so that should cover several versions if
nothing else changed (let's face it... historically something else
changed).
Oh... and github says 1337 commits atm... Happy Christmas.