[Libwebsockets] LWS Full-text search
andy at warmcat.com
Fri Oct 19 01:29:54 CEST 2018
Master has grown a generic, scalable, lightweight full-text search API
that has been migrated from gitohashi; gitohashi now uses the lws
implementation. It was originally designed for cheap full-text searches of
potentially huge git trees like the Linux kernel.
It can very rapidly index one or more UTF-8 text "files" (up to hundreds
of thousands of them) into a single index file... the input files may be
virtual / in-memory only as in the gitohashi case.
The index file can be queried to provide:
- smart autocomplete results (these are optimized with the paths
leading to the highest number of hits first)
- lists of files that have matches
- line numbers and the file offsets of those lines for "hits"
- optionally, the actual text of the hit lines, if the original files
are still available.
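
To give a feel for what a "hit" carries, here is a minimal sketch in plain C.
It is not the lws implementation: the real code reads this information out of
the index file, whereas this just scans an in-memory buffer linearly. The
names struct hit and find_hits() are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hit record, NOT the lws struct layout */
struct hit {
	int line;		/* 1-based line number containing the match */
	size_t line_ofs;	/* byte offset of the start of that line */
};

/*
 * Linear scan of buf[0..len) for needle, storing at most max hits.
 * Each occurrence is its own hit, so a line with two matches yields two
 * records with the same line and line_ofs.  Returns the number stored.
 */
static size_t
find_hits(const char *buf, size_t len, const char *needle,
	  struct hit *hits, size_t max)
{
	size_t nlen, n = 0, i, line_ofs = 0;
	int line = 1;

	assert(buf && needle && hits);

	nlen = strlen(needle);
	if (!nlen || nlen > len)
		return 0;

	for (i = 0; i < len && n < max; i++) {
		if (i + nlen <= len && !memcmp(buf + i, needle, nlen)) {
			hits[n].line = line;
			hits[n].line_ofs = line_ofs;
			n++;
		}
		if (buf[i] == '\n') {
			/* the next line starts just after the newline */
			line++;
			line_ofs = i + 1;
		}
	}

	return n;
}
```

With the line offset in hand, quoting the hit line is just a matter of
seeking to line_ofs in the original file and reading up to the next newline,
which is why that part needs the original files to still be around.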
The results come as linked-lists of structs inside a struct lwsac.
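
If the idea of linked-lists living inside one allocator object is unfamiliar,
the sketch below shows the general pattern with a toy chunked arena. It is
only an illustration under my own assumptions: struct arena, arena_use(),
arena_free(), struct file_result and result_add() are hypothetical names, and
the real lwsac layout and API differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical arena chunk header; the payload follows it in memory */
struct arena {
	struct arena *next;	/* previously allocated chunk, or NULL */
	size_t size, used;	/* payload capacity and bytes handed out */
};

/* Hand out len bytes from the newest chunk, adding a chunk if needed */
static void *
arena_use(struct arena **head, size_t len)
{
	struct arena *a;
	void *p;

	assert(head);

	/* round up so every allocation stays pointer-aligned */
	len = (len + sizeof(void *) - 1) & ~(sizeof(void *) - 1);

	a = *head;
	if (!a || a->size - a->used < len) {
		size_t size = len > 4096 ? len : 4096;

		a = malloc(sizeof(*a) + size);
		if (!a)
			return NULL;
		a->next = *head;
		a->size = size;
		a->used = 0;
		*head = a;
	}

	p = (char *)(a + 1) + a->used;
	a->used += len;

	return p;
}

/* Free every chunk, and so every result, in one pass */
static void
arena_free(struct arena **head)
{
	while (*head) {
		struct arena *n = (*head)->next;

		free(*head);
		*head = n;
	}
}

/* Hypothetical result node, allocated inside the arena */
struct file_result {
	struct file_result *next;
	int matches;
	char path[];		/* NUL-terminated copy of the filepath */
};

/* Prepend a result to list; returns the new list head or NULL on OOM */
static struct file_result *
result_add(struct arena **head, struct file_result *list,
	   const char *path, int matches)
{
	size_t plen = strlen(path) + 1;
	struct file_result *r = arena_use(head, sizeof(*r) + plen);

	if (!r)
		return NULL;
	r->next = list;
	r->matches = matches;
	memcpy(r->path, path, plen);

	return r;
}
```

The point of the pattern is that the consumer never frees individual result
nodes; it walks the lists, and when it is done it drops the whole arena in
one call.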
There's a demo here
which is an indexed text of "The Picture of Dorian Gray" in searchable
form with autocomplete. The demo converts the results lwsac to JSON for
transport over XHR.
The minimal example behind the demo is here (the libwebsockets.org
version is the same protocol plugin running in lwsws)
General overview and some info is here:
CI test is here
The api-test-fts minimal example includes a CLI app that lets you
create index files from other files given as an argument list, but it's
also simple to do programmatically.
The actual querying only needs enough memory to hold the results in the
lwsac; it costs almost nothing otherwise, since it just walks various
structures directly in the index file.
Creating the indexes is more expensive, but for example indexing all the
.c and .h files in lws master (about 3MB of source) takes 124ms with a peak
allocation of 3MB on my box, producing a 1.4MB index file.
Doing the same on the Linux kernel sources at 4.14 (53K files, 695MB
source) takes 50s with a peak RAM allocation under 80MB and an index
file of 350MB. Again the queries are very low cost even on weak hardware.
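
To put the two runs side by side, the sketch below derives indexing
throughput and index-to-source size ratio from the figures quoted above. The
struct and function names are just for illustration, and the lws source size
is the approximate "about 3MB" figure, so treat the first set of numbers as
rough.

```c
#include <assert.h>

/* One indexing run, using the figures quoted in this mail */
struct index_run {
	double source_mb;	/* total source indexed, in MB */
	double seconds;		/* wall time to build the index */
	double index_mb;	/* size of the resulting index file, in MB */
};

/* Source megabytes indexed per second */
static double
throughput_mb_s(const struct index_run *r)
{
	assert(r && r->seconds > 0);

	return r->source_mb / r->seconds;
}

/* Index file size as a fraction of the source size */
static double
index_ratio(const struct index_run *r)
{
	assert(r && r->source_mb > 0);

	return r->index_mb / r->source_mb;
}

/* lws master: about 3MB source, 124ms, 1.4MB index */
static const struct index_run lws_run = { 3.0, 0.124, 1.4 };

/* Linux 4.14: 695MB source, 50s, 350MB index */
static const struct index_run kernel_run = { 695.0, 50.0, 350.0 };
```

That works out to roughly 24MB/s with an index about 47% of the source size
for lws, and 13.9MB/s with an index about 50% of the source size for the
kernel, so the index-to-source ratio stays close to one half at both scales.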