libwebsockets
Lightweight C library for HTML5 websockets
<figcaption>LHP stream-parses HTML and CSS into a DOM and then into DLOs (lws Display List Objects). Multiple antialiased proportional fonts, JPEG and PNG are supported. A linewise rasterizer, well suited to resource-constrained devices with SPI-based displays, is provided.</figcaption>
<figcaption>Page fetched from https://libwebsockets.org/lhp-tests/t1.html by an ESP32, and rendered by lws on a 600x448 ACEP 7-colour EPD with 24-bit composition. The warning symbol at the bottom right is a .png img in an absolutely positioned div. The yellow shapes at the top right are divs with CSS-styled rounded corners. The red div is partly transparent. The display only has a 7-colour palette. The server only sends CSS/HTML/JPEG/PNG; all parsing and rendering is done on the ESP32.</figcaption>
Lws is able to parse and render a subset of CSS + HTML5 on very constrained devices such as the ESP32, which has at best a total of 200KB of heap available after boot. Some technology advances in lws allow much greater capability than has previously been possible on those platforms.
The goal is that all system display content is expressed in HTML/CSS by user code, and may also be dynamically generated, with CSS responsive layout simplifying the management of the same UI across different display dimensions.
There are restrictions (most generic HTML on the internet is too complex, or wants more assets from different hosts than tiny devices can connect to), but what is possible goes well beyond what you would expect from a 200KB heap limit. It is very possible to mix remote and local HTTP content over h2, including large JPEG and PNG images, and express all UI in HTML/CSS.
<img> is supported. The toplevel async renderer API takes an lws VFS file:// or https:// URL, retrieved via SS (Secure Streams). There is easy, customizable lws VFS support in SS for transparently referencing dynamically generated HTML, HTML stored in .text or on an SD card, or other assets. style= element attributes are not supported.
Heap costs during active decode (while rendering a line that includes the image):
Image type | Decoder cost in heap (600px width) |
---|---|
JPEG-Grayscale | 6.5KB |
JPEG-YUV 4:4:4 | 16.4KB |
JPEG-YUV 4:4:2v | 16.4KB |
JPEG-YUV 4:4:2h | 31KB |
JPEG-YUV 4:4:0 | 31KB |
PNG | 36KB |
Connecting to an external TLS source costs around 50KB of heap. So for very constrained targets like the ESP32, the only practical approach is a single h2 connection that provides the assets as streams multiplexed inside a single TLS tunnel.
lws integrates dynamic querying of a CA trust bundle, with both OpenSSL and mbedTLS. It can support all of the typical 130+ Mozilla-trusted CAs: it uses the trust chain information from the server certificate to identify the CA certificate required, and instantiates just that one to validate the server certificate, if it trusts it. The trust CTX is kept in heap for a little while in case multiple connections need it.
No heap is needed for trusted certs that are not actively required. This means lws can securely connect over TLS to arbitrary servers, as a browser would, without using up all the memory; without this, it is not possible to support arbitrary connections securely within the memory constraints.
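The on-demand lookup with a short-lived cache can be sketched as below. This is a minimal illustration under stated assumptions, not the lws implementation: the ca_index and ca_cache types, ca_get(), ca_expire() and the 10s CA_CACHE_TTL are all hypothetical, and the step that would parse the DER certificate into an OpenSSL or mbedTLS trust store is left as a comment.

```c
#include <string.h>
#include <time.h>

/* Hypothetical index entry: the bundle itself stays in flash; only this
 * small index (subject name -> location of the DER cert) is consulted. */
struct ca_index {
	const char	*subject;
	size_t		der_offset, der_len;
};

/* One cached, instantiated trust context (stand-in for a parsed cert). */
struct ca_cache {
	const struct ca_index	*ent;		/* NULL when nothing cached */
	time_t			last_used;
	int			instantiations;	/* counts expensive loads */
};

#define CA_CACHE_TTL 10	/* hypothetical: seconds to keep an idle CA */

/* Find the CA named as issuer by the server's chain, instantiating it only
 * if it is not already cached. Returns NULL if the bundle lacks it. */
static const struct ca_index *
ca_get(struct ca_cache *c, const struct ca_index *idx, size_t n,
       const char *issuer, time_t now)
{
	if (c->ent && !strcmp(c->ent->subject, issuer)) {
		c->last_used = now;	/* reuse for a further connection */
		return c->ent;
	}
	for (size_t i = 0; i < n; i++)
		if (!strcmp(idx[i].subject, issuer)) {
			/* real code would load idx[i]'s DER from flash and
			 * build the trust store here, replacing the old one */
			c->ent = &idx[i];
			c->last_used = now;
			c->instantiations++;
			return c->ent;
		}
	return NULL;
}

/* Periodic housekeeping: free the cached CA once it has sat unused. */
static void
ca_expire(struct ca_cache *c, time_t now)
{
	if (c->ent && now - c->last_used > CA_CACHE_TTL)
		c->ent = NULL;
}
```

Only the CA actually named by the server's chain ever costs heap, and a burst of connections to the same host pays for instantiation once.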
Lws supports a logical Display List for graphical primitives common in HTML + CSS, including compressed antialiased fonts, JPEG, PNG and rounded rectangles.
This intermediate representation allows display surface layout without having all the details to hand, and provides flexibility for how to render the logical representation of the layout.
There may not be enough heap to hold a framebuffer for even a midrange display device; e.g., an RGB buffer for the 600 x 448 display at the top of the page is about 800KB. Even if there is, display devices that hold their own framebuffer (e.g. SPI TFT, OLED, or electrophoretic displays) are anyway sent the display data linewise (perhaps in two planes, but still linewise).
In this case, no framebuffer is needed on the device, if the software stack is rewritten to stream-parse all the page elements asynchronously and, each time enough is buffered, to process and compose the next line's worth of pixels. Then only one or two lines' worth of buffer is required.
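The arithmetic behind the framebuffer-versus-line-buffer comparison, assuming 3 bytes per pixel for RGB and 4 for the RGBA composition line; pixel_buf_bytes() is just an illustrative helper.

```c
#include <stddef.h>

/* Bytes needed for `lines` rows of `width` pixels at `bytes_pp` bytes each. */
static size_t
pixel_buf_bytes(size_t width, size_t lines, size_t bytes_pp)
{
	return width * lines * bytes_pp;
}
```

For the 600 x 448 panel, pixel_buf_bytes(600, 448, 3) is 806,400 bytes for a full RGB framebuffer, while two RGB lines, pixel_buf_bytes(600, 2, 3), need only 3,600 bytes.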
This is the lws approach: the asset decoders are rewritten to operate completely statefully, so they can collaborate to provide just the next line's data Just-In-Time.
Lws includes fully stream-parsed decoders, which can safely run dry of input, or of output space, in any state, and pick up where they left off when more data or space is next available.
These were rewritten from UPNG and Picojpeg to be wholly stateful. These DLOs are bound to flow-controlled SS, so the content can be provided to the composer Just-In-Time. The rewrite requires that the decoder can exit at any byte boundary, due to running out of input or needing to flush output, and resume with the same state. This is a complete inversion of the original program flow, where the decoder only returns once it has rendered the whole image into a full-size buffer, and decode state is spread across stack and file-scope variables.
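The "exit at any byte boundary and resume" property can be shown with a toy format instead of JPEG or PNG. This is a minimal sketch, assuming a made-up stream of a big-endian u16 width, a u16 height, then 8-bit gray pixels; toy_dec and toy_dec_feed() are hypothetical and not part of lws. All progress lives in the struct, never on the stack, and a finished line is handed back before any more input is consumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_W 64

enum toy_ret { TOY_NEED_INPUT, TOY_LINE_READY, TOY_DONE, TOY_ERROR };
enum toy_state { ST_HDR, ST_PIXELS, ST_FINISHED, ST_ERROR };

/* All decode progress lives here, so the decoder can return at any byte. */
struct toy_dec {
	enum toy_state	state;
	uint8_t		hdr[4];		/* big-endian u16 width, u16 height */
	size_t		hdr_have;
	uint16_t	w, h;
	uint16_t	y;		/* lines completed so far */
	size_t		x;		/* pixels collected on the current line */
	uint8_t		line[TOY_MAX_W];	/* the only pixel storage */
};

static void
toy_dec_init(struct toy_dec *d)
{
	memset(d, 0, sizeof(*d));
	d->state = ST_HDR;
}

/* Consume from in[0..len); *used reports how many bytes were taken. */
static enum toy_ret
toy_dec_feed(struct toy_dec *d, const uint8_t *in, size_t len, size_t *used)
{
	size_t i = 0;

	while (i < len) {
		switch (d->state) {
		case ST_HDR:
			d->hdr[d->hdr_have++] = in[i++];
			if (d->hdr_have < 4)
				break;
			d->w = (uint16_t)((d->hdr[0] << 8) | d->hdr[1]);
			d->h = (uint16_t)((d->hdr[2] << 8) | d->hdr[3]);
			if (!d->w || d->w > TOY_MAX_W) {
				d->state = ST_ERROR;
				*used = i;
				return TOY_ERROR;
			}
			d->state = d->h ? ST_PIXELS : ST_FINISHED;
			break;
		case ST_PIXELS:
			d->line[d->x++] = in[i++];
			if (d->x < d->w)
				break;
			/* a full line is ready: hand it out before taking more */
			d->x = 0;
			if (++d->y == d->h)
				d->state = ST_FINISHED;
			*used = i;
			return TOY_LINE_READY;
		case ST_FINISHED:
			*used = i;
			return TOY_DONE;
		case ST_ERROR:
			*used = i;
			return TOY_ERROR;
		}
	}
	*used = i;
	if (d->state == ST_FINISHED)
		return TOY_DONE;
	return d->state == ST_ERROR ? TOY_ERROR : TOY_NEED_INPUT;
}
```

Because nothing is kept on the stack between calls, feeding the stream one byte at a time produces exactly the same lines as feeding it all at once.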
PNG transparency is supported via its A channel and composed by modulating alpha.
Based on mcufont, these are 4-bit antialiased fonts produced from arbitrary TTFs. They are compressed: a set of a dozen different font sizes from 10px through 32px, plus bold sizes, costs only 100KB of storage. The user can choose their own fonts and sizes; the encoder is included in lws.
The mcufont decompressor was likewise rewritten to be completely stateful: glyphs present on the current line are statefully decoded to produce only that line's worth of output, then paused until the next line. Only glyphs that appear on the current line have instantiated decoders.
The anti-alias information is composed into the line buffer as alpha.
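Composing 4-bit glyph coverage, or a PNG's A channel, into the line buffer comes down to an alpha blend per channel. The sketch below is an illustration under assumptions rather than the lws composer: it assumes an RGB line buffer with 3 bytes per pixel, expands 4-bit coverage to 8-bit alpha by multiplying by 17, and optionally scales it by an element opacity; blend8(), cov4_to_a8() and compose_glyph_px() are hypothetical helpers.

```c
#include <stdint.h>

/* Blend one source channel over a destination channel with 8-bit alpha,
 * rounding to nearest: dst' = (src * a + dst * (255 - a)) / 255. */
static uint8_t
blend8(uint8_t dst, uint8_t src, uint8_t a)
{
	return (uint8_t)((src * a + dst * (255 - a) + 127) / 255);
}

/* Expand 4-bit antialiasing coverage (0..15) to 8-bit alpha (0..255). */
static uint8_t
cov4_to_a8(uint8_t cov4)
{
	return (uint8_t)((cov4 & 0xf) * 17);	/* 15 * 17 == 255 */
}

/* Compose a text colour into RGB line pixel x using the glyph's coverage,
 * scaled by an element opacity in 0..255 (255 = opaque). */
static void
compose_glyph_px(uint8_t *line_rgb, int x, const uint8_t rgb[3],
		 uint8_t cov4, uint8_t opacity)
{
	uint8_t a = (uint8_t)((cov4_to_a8(cov4) * opacity + 127) / 255);

	for (int c = 0; c < 3; c++)
		line_rgb[x * 3 + c] = blend8(line_rgb[x * 3 + c], rgb[c], a);
}
```

A PNG pixel would use the same blend8() with its own A byte in place of the expanded coverage.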
Secure Streams and lws VFS now work together via file:// URLs: an SS can be directed to a local VFS resource the same way as to an https:// resource. Resources from https:// and file:// can cleanly refer to each other in CSS or <img>.
<figcaption>All local and remote resources are fetched using Secure Streams with a VFS file:// or https:// URL. Delivery of enough data to render the next line from multiple sources, without excess buffering, is handled by lws_flow.</figcaption>
Dynamic content, such as dynamic HTML, can be registered in a DLO VFS filesystem and referenced via SS either as the toplevel html document or by URLs inside the HTML.
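The idea of referencing registered dynamic content and remote content through the same kind of URL can be sketched as a small dispatcher. This is a hypothetical illustration only: vfs_file, resolve_url() and the in-memory table stand in for whatever lws's VFS registration actually uses, and the https:// branch only classifies the URL rather than fetching it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory VFS entry: content user code registered by path. */
struct vfs_file {
	const char	*path;
	const char	*data;
	size_t		len;
};

enum url_kind { URL_VFS, URL_HTTPS, URL_UNSUPPORTED };

/* Classify a URL; for file:// also resolve its path against the registered
 * files, setting *hit to the match or NULL if nothing is registered there. */
static enum url_kind
resolve_url(const char *url, const struct vfs_file *fs, size_t n,
	    const struct vfs_file **hit)
{
	*hit = NULL;
	if (!strncmp(url, "file://", 7)) {
		const char *path = url + 7;	/* "file:///a/b" -> "/a/b" */

		for (size_t i = 0; i < n; i++)
			if (!strcmp(fs[i].path, path)) {
				*hit = &fs[i];
				break;
			}
		return URL_VFS;
	}
	if (!strncmp(url, "https://", 8))
		return URL_HTTPS;	/* would be fetched over the network */
	return URL_UNSUPPORTED;
}
```

With a page registered at /ui/index.html, "file:///ui/index.html" resolves locally, while an https:// image URL inside that page would go to the network, which is what lets local and remote resources refer to each other.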
.jpg and .png resources can be used in the HTML and are fetched using their own SS; if they come from the same server over h2, they have very modest extra memory needs, since they share the existing h2 connection and TLS session.
All of the effort to make JPEG and PNG stream-parsed is wasted if either an h1 connection requires a new TLS session that exhausts the heap, or, even when multiplexed into the same h2 session, the whole JPEG or PNG is dumped into the device faster than it can buffer it.
On constrained devices, the only mix that allows multiple streaming assets to be decoded as they arrive is an h2 server with streaming modulated by h2 tx credit. The demos stream CSS, HTML, JPEG and PNG from libwebsockets.org over h2. In lws, lws_flow provides the link between maximum buffering targets and tx_credit flow control management.
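The link between a buffering target and tx credit can be sketched as simple accounting. This is a minimal sketch under assumptions, not the lws_flow structure or algorithm: struct flow and its three functions are hypothetical, and the idea is only that the credit granted to the peer (for h2, via WINDOW_UPDATE frames) never lets buffered plus in-flight bytes exceed the target.

```c
#include <stddef.h>

/* Hypothetical per-stream flow accounting; not the lws_flow struct. */
struct flow {
	size_t	target;		/* most bytes we are willing to buffer */
	size_t	buffered;	/* received but not yet consumed by decoder */
	size_t	granted;	/* credit given to the peer, not yet used */
};

/* How much more credit to grant so buffered + in-flight <= target. */
static size_t
flow_credit_to_grant(struct flow *f)
{
	size_t committed = f->buffered + f->granted;
	size_t grant;

	if (committed >= f->target)
		return 0;
	grant = f->target - committed;
	f->granted += grant;
	return grant;
}

/* The peer sent n bytes: they move from outstanding credit to the buffer. */
static void
flow_rx(struct flow *f, size_t n)
{
	f->granted = n > f->granted ? 0 : f->granted - n;
	f->buffered += n;
}

/* The decoder consumed n buffered bytes, making room for more credit. */
static void
flow_consumed(struct flow *f, size_t n)
{
	f->buffered = n > f->buffered ? 0 : f->buffered - n;
}
```

Since credit is only returned as the decoder consumes data, a fast server cannot push a whole JPEG into a device that is only rendering one line at a time. A real implementation would likely also wait for a low-water mark before granting, to avoid sending many tiny updates.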
The number of assets that can be handled simultaneously on an HTML page is restricted by the irreducible heap cost of decoding them: about 36KB plus an RGB line buffer for PNG, and either an 8-line (YUV 4:4:4) or 16-line (4:4:2 or 4:4:0) RGB line buffer for JPEG.
However, PNG and JPEG decode occurs lazily, starting at the render line where the object starts becoming visible, and all DLO objects are destroyed after the last line where they are visible. The SS responsible for fetching and regulating the buffer space needed is started at layout time, and the parser is also started, but runs only until the header containing the image dimensions has been decoded, stopping before the point where the large decoder allocation is required.
It means only images that appear on the same line have decoders instantiated in memory at the same time; images that don't share any common horizontal lines never exist in heap simultaneously. Basically, multiple vertically stacked images cost little more than one.
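Why vertically stacked images cost little more than one can be checked with a few lines of arithmetic. This sketch assumes each decoder exists exactly on the lines where its image is visible and uses per-image costs of the kind listed in the heap cost table; img_span and peak_decoder_heap() are illustrative, not lws code.

```c
#include <stddef.h>

/* An image's vertical extent on the page and its heap cost while decoding. */
struct img_span {
	int	y0, y1;		/* visible on lines y0 .. y1 - 1 */
	size_t	cost;		/* decoder heap while instantiated */
};

/* Peak decoder heap if each decoder lives only while its image is on the
 * current render line: the maximum, over all lines, of the live costs. */
static size_t
peak_decoder_heap(const struct img_span *im, size_t n, int page_h)
{
	size_t peak = 0;

	for (int y = 0; y < page_h; y++) {
		size_t live = 0;

		for (size_t i = 0; i < n; i++)
			if (y >= im[i].y0 && y < im[i].y1)
				live += im[i].cost;
		if (live > peak)
			peak = live;
	}
	return peak;
}
```

A PNG (about 36KB) stacked above a 4:4:4 JPEG (about 16.4KB) peaks at the PNG's cost alone, while the same two images overlapping vertically peak at their sum.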
The demo shows that even on ESP32, the images are cheap enough to allow a full size background JPEG with a partially-transparent PNG composed over it.
Internally, lws provides either an 8-bit Y (grayscale) or a 32-bit RGBA (truecolor) composition pipeline for all display elements, depending on whether the display device is monochrome. Alpha (opacity) is supported. This is true regardless of the final bit depth of the display device, so even B&W devices can approximate the same output.
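When the Y-only pipeline is chosen, RGB sources such as a color JPEG must be reduced to luma. The sketch below uses the common fixed-point BT.601 approximation (weights 77, 150, 29, summing to 256); that this is the exact conversion lws uses is an assumption, and compose_bpp() is a hypothetical helper reflecting the choice described above.

```c
#include <stdint.h>

/* Approximate BT.601 luma using 8-bit fixed-point weights summing to 256. */
static uint8_t
rgb_to_y(uint8_t r, uint8_t g, uint8_t b)
{
	return (uint8_t)((77 * r + 150 * g + 29 * b) >> 8);
}

/* Bytes per composed pixel: 8-bit Y for mono displays, else 32-bit RGBA. */
static unsigned
compose_bpp(int display_is_mono)
{
	return display_is_mono ? 1 : 4;
}
```

A mono display therefore composes a quarter of the bytes per line, which is one reason to select the pipeline from the display's capabilities rather than from the assets.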
Gamma of 2.2 is also applied before palettization, followed by Floyd-Steinberg dithering, all with just a line buffer and no framebuffer needed at the device. Assets like JPEG can be normal RGB ones, and the rendering adapts dynamically down to the display's palette and capabilities.
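Floyd-Steinberg dithering needs only two rows of error terms, not a framebuffer, because error is pushed at most one line ahead. This is a minimal sketch under stated assumptions: 8-bit gray input quantized to a black/white palette, with gamma correction and multi-colour palettes deliberately left out; fs_state, fs_init() and fs_dither_line() are illustrative names, not lws API.

```c
#include <stdint.h>
#include <string.h>

#define FS_MAX_W 640

/* Error rows for the current and next line. Index x + 1 holds the error for
 * pixel x, so writes to x - 1 and x + 1 never go out of bounds. */
struct fs_state {
	int	w;
	int	err[2][FS_MAX_W + 2];
	int	cur;		/* which err[] row belongs to the current line */
};

static void
fs_init(struct fs_state *s, int w)
{
	memset(s, 0, sizeof(*s));
	s->w = w;
}

/* Dither one 8-bit gray line in place to 0 / 255, pushing the quantization
 * error right (7/16) and onto the next line (3/16, 5/16, 1/16). */
static void
fs_dither_line(struct fs_state *s, uint8_t *line)
{
	int *ec = s->err[s->cur], *en = s->err[!s->cur];

	for (int x = 0; x < s->w; x++) {
		int v = line[x] + ec[x + 1];
		int q = v < 128 ? 0 : 255;
		int e = v - q;

		line[x] = (uint8_t)q;
		ec[x + 2] += e * 7 / 16;	/* right neighbour */
		en[x]     += e * 3 / 16;	/* below left */
		en[x + 1] += e * 5 / 16;	/* below */
		en[x + 2] += e * 1 / 16;	/* below right */
	}
	/* The next-line row becomes current; clear the old one for reuse. */
	memset(ec, 0, sizeof(s->err[0]));
	s->cur = !s->cur;
}
```

Each call consumes one composed line and leaves only about 2 x (width + 2) integers of state, which fits the linewise pipeline.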
The lws_display support in lws has been extended to a variety of common EPD controllers such as the UC8171, supporting B&W, B&W plus a third colour (typically red or yellow), and 4-level gray. The ILI9341 driver for the displays found on the WROVER KIT and the ESP32S Kaluga KIT has been enhanced to work with the new display pipeline using native RGB565.
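Native 565 means each composed pixel is packed into 16 bits: 5 bits of red, 6 of green and 5 of blue. A minimal sketch of that packing from 8-bit channels, by simple truncation; rgb888_to_565() is an illustrative helper, and whether the driver truncates or rounds is not stated above.

```c
#include <stdint.h>

/* Pack 8-bit-per-channel RGB into RGB565 by dropping the low bits. */
static uint16_t
rgb888_to_565(uint8_t r, uint8_t g, uint8_t b)
{
	return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}
```

For example, white packs to 0xFFFF and pure red to 0xF800.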
<figcaption>HTML rendered on the device from file:// VFS-stored normal RGB JPEG and HTML/CSS, by ESP32 with BW-Red palette 400x300 EPD</figcaption>
<figcaption>Test html rendered to 24-bit RGB data directly</figcaption>
<figcaption>Test html rendered to a 300x240 4-gray palette EPD (from an RGB JPEG also fetched from the server during render) using Y-only composition; notice how effectively error diffusion uses the palette</figcaption>
<figcaption>Test html rendered to a 104 x 212 BW-Red palette EPD, red h1 text set by CSS color:#f00, on a lilygo ESP32-based EPD label board</figcaption>
<figcaption>Test html rendered to 104 x 212 BW flexible EPD, notice font legibility, effectiveness of dither and presence of line breaks</figcaption>
<figcaption>ESP32 WROVER KIT running the example carousel on a 320x200 565 RGB SPI display. The 10s delay between tests is snipped for brevity; otherwise shown in realtime. Moiré is an artifact of the camera. As composition is linewise, the JPEG and other data from libwebsockets.org arrive and are completely parsed and composed in the time taken to update the display. Interleaved SPI DMA is used to send a line to the display while rendering the next.</figcaption>
To maximize scalability, HTML is parsed into an element stack, consisting of a set of nested parent-child elements. As an element goes out of scope and parsing moves on to the next, it is destroyed, and its parents are likewise destroyed as they go out of scope; parents are kept on the stack only while they have children in scope. This keeps strict pressure against large instantaneous heap allocations for HTML parsing, but it has some implications.
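The memory effect of destroying elements as soon as they leave scope can be sketched with '(' and ')' standing in for open and close tags. This is a hypothetical illustration, not the lws parser: struct elem carries only a parent pointer where real elements would hold tags, attributes and computed CSS, and parse_nesting() just tracks live and peak allocations.

```c
#include <stdlib.h>

/* Hypothetical minimal element; a real one carries tag, attrs and CSS. */
struct elem {
	struct elem	*parent;
};

struct parse_stats {
	size_t	live, peak, total;
};

/* '(' opens an element, ')' closes and immediately frees it, so live
 * allocations track nesting depth rather than document size.
 * Returns 0 on success, -1 on unbalanced input or allocation failure. */
static int
parse_nesting(const char *s, struct parse_stats *st)
{
	struct elem *top = NULL;

	st->live = st->peak = st->total = 0;
	for (; *s; s++) {
		if (*s == '(') {
			struct elem *e = malloc(sizeof(*e));

			if (!e)
				goto fail;
			e->parent = top;
			top = e;
			st->total++;
			if (++st->live > st->peak)
				st->peak = st->live;
		} else if (*s == ')') {
			struct elem *e = top;

			if (!e)
				goto fail;
			top = e->parent;
			free(e);	/* out of scope: gone right away */
			st->live--;
		}
	}
	if (!top)
		return 0;
fail:
	while (top) {		/* release anything still open */
		struct elem *p = top->parent;

		free(top);
		top = p;
		st->live--;
	}
	return -1;
}
```

For "((()())(()))", six elements are created in total but at most three ever exist at once; the cost of this discipline is that nothing about already-closed elements remains available when later content turns out to affect their layout.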
This "goldfish memory", "keyhole parsing" scheme is by itself inadequate when the dimensions of future elements affect the dimensions of the current one, e.g. a table where we don't find out until later how many rows it has, and so how high it is. There is also a class of retrospective dimension acquisition, e.g. where a JPEG img is in a table but we don't find out its dimensions until we parse its header much later, long after the whole HTML parser stack related to it has been destroyed, and possibly many other things have been laid out after it.
You basically give the renderer an https:// or file:// URL, a structure for the render state, and a callback for when the DLOs have been created and lines of pixels are being emitted. The source fetching, parsing, layout, and finally rendering proceed asynchronously on the event loop, without blocking for longer than the time taken to emit (by default) 4 lines.
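The "emit a few lines, then return to the event loop" behaviour can be sketched as a step function with a per-call budget. This is a hypothetical illustration, not the API in lws-html.h: render_job, line_cb_t and render_step() are made up, and a real renderer would also stop early when the next line is waiting for more network data.

```c
#include <stddef.h>

typedef void (*line_cb_t)(void *opaque, int y);

/* Hypothetical render state; not the lws-html.h structure. */
struct render_job {
	int		height;		/* total lines to produce */
	int		next_y;		/* next line to emit */
	int		lines_per_step;	/* budget per event-loop turn, e.g. 4 */
	line_cb_t	cb;		/* receives each finished line */
	void		*opaque;
};

/* Emit at most lines_per_step lines, then return so the event loop can
 * service other work. Returns 1 while lines remain, 0 when finished. */
static int
render_step(struct render_job *j)
{
	int budget = j->lines_per_step;

	while (budget-- > 0 && j->next_y < j->height)
		j->cb(j->opaque, j->next_y++);

	return j->next_y < j->height;
}
```

An event loop would reschedule render_step() while it returns 1; with a budget of 4, a 10-line job completes in three steps, and no single step blocks for longer than four lines' worth of work.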
In these examples, the renderer callback passes the lines of pixels to the lws_display blit op.
See ./include/libwebsockets/lws-html.h for more information.
Also see the ./minimal-examples-lowlevel/api-tests/api-test-lhp-dlo/main.c example; you can render to 24-bit RGB on stdout by giving it a URL, e.g.
The raw RGB can be opened in GIMP.