[Libwebsockets] Possible bugs - minimal-http-client-multi example

Andy Green andy at warmcat.com
Thu Sep 17 09:38:04 CEST 2020



On 9/16/20 3:21 PM, Dorit Mari wrote:
> Hi,
> 
> I found 3 possible problems when running the minimal example 
> “minimal-examples\http-client\minimal-http-client-multi”, which I’d like 
> to report.
> 
> I used version v4.1.0, on PC running Windows 10.
> 
>  1. *_Crash:_*
> 
> Without any code changes, when running “minimal-http-client-multi” 
> example with cmd line input:
> 
> --no-tls --h1 --port 80 --path /api/MVC/status/current --server 
> 10.33.70.85 -c 3
> 
> lws crashes.

I can't exactly reproduce this easily since no suitable local server, 
and yours not public, the nearest equivalent I can come up with we can 
both try is

$ lws-minimal-http-client-multi --no-tls --h1 --port 80 --path 
/status/401 --server httpbin.org -c 3

this is returning 3 x 401 OK (on linux) and valgrind shows no problems 
there either.

[2020/09/17 07:10:07:4522] U: LWS minimal http client [-s (staggered)] 
[-p (pipeline)]
[2020/09/17 07:10:07:4995] U:    [--h1 (http/1 only)] [-l (localhost)] 
[-d <logs>]
[2020/09/17 07:10:07:4999] U:    [-n (numbered)] [--post]
[2020/09/17 07:10:07:5006] N: LWS: 4.1.99-v4.1.0-15-g44b46f470, loglevel 
1031
[2020/09/17 07:10:07:5007] N: NET CLI SRV H1 H2 WS MQTT SS-JSON-POL 
SSPROX IPv6-absent
[2020/09/17 07:10:08:3633] U: started connection 0x54d2ec0: idx 0 
(/status/401)
[2020/09/17 07:10:08:3683] U: started connection 0x54ed360: idx 1 
(/status/401)
[2020/09/17 07:10:08:3701] U: started connection 0x5500740: idx 2 
(/status/401)
[2020/09/17 07:10:08:4754] U: callback_http: doing GET flow
[2020/09/17 07:10:08:4847] U: callback_http: doing GET flow
[2020/09/17 07:10:08:4850] U: callback_http: doing GET flow
[2020/09/17 07:10:08:6201] U: LWS_CALLBACK_ESTABLISHED_CLIENT_HTTP: idx: 
0, resp 401
[2020/09/17 07:10:08:6217] U: LWS_CALLBACK_COMPLETED_CLIENT_HTTP 
0x54d2ec0: idx 0
[2020/09/17 07:10:08:6247] U: LWS_CALLBACK_ESTABLISHED_CLIENT_HTTP: idx: 
2, resp 401
[2020/09/17 07:10:08:6248] U: LWS_CALLBACK_COMPLETED_CLIENT_HTTP 
0x5500740: idx 2
[2020/09/17 07:10:08:6459] U: LWS_CALLBACK_ESTABLISHED_CLIENT_HTTP: idx: 
1, resp 401
[2020/09/17 07:10:08:6459] U: LWS_CALLBACK_COMPLETED_CLIENT_HTTP 
0x54ed360: idx 1
[2020/09/17 07:10:08:6462] U: Done: all OK
[2020/09/17 07:10:08:6909] U: Duration: 319ms
[2020/09/17 07:10:08:6960] U: Exiting with 0

and I try it on windows 10

C:\Users\agreen\libwebsockets\b1>.\bin\Debug\lws-minimal-http-client-multi.exe 
--no-tls --h1 --port 80 --path /status/401 --server httpbin.org -c 4
[2020/09/16 20:17:02:1420] U: LWS minimal http client [-s (staggered)] 
[-p (pipeline)]
[2020/09/16 20:17:02:1420] U:    [--h1 (http/1 only)] [-l (localhost)] 
[-d <logs>]
[2020/09/16 20:17:02:1420] U:    [-n (numbered)] [--post]
[2020/09/16 20:17:02:1580] N: LWS: 4.1.99-v4.1.0-17-g87f06b3d, loglevel 1031
[2020/09/16 20:17:02:1580] N: NET CLI SRV H1 H2 WS IPv6-absent
[2020/09/16 20:17:02:2040] U: started connection 0000020FD6BFEE90: idx 0 
(/status/401)
[2020/09/16 20:17:02:2200] U: started connection 0000020FD6C00010: idx 1 
(/status/401)
[2020/09/16 20:17:02:2200] U: started connection 0000020FD6BFE090: idx 2 
(/status/401)
[2020/09/16 20:17:02:2360] U: started connection 0000020FD6BFFC90: idx 3 
(/status/401)
[2020/09/16 20:17:02:2990] U: callback_http: doing GET flow
[2020/09/16 20:17:02:3140] U: callback_http: doing GET flow
[2020/09/16 20:17:02:3140] U: callback_http: doing GET flow
[2020/09/16 20:17:02:3140] U: callback_http: doing GET flow
[2020/09/16 20:17:02:3920] U: LWS_CALLBACK_ESTABLISHED_CLIENT_HTTP: idx: 
1, resp 401
[2020/09/16 20:17:02:4080] U: LWS_CALLBACK_COMPLETED_CLIENT_HTTP 
0000020FD6C00010: idx 1
[2020/09/16 20:17:02:4080] U: LWS_CALLBACK_ESTABLISHED_CLIENT_HTTP: idx: 
2, resp 401
[2020/09/16 20:17:02:4080] U: LWS_CALLBACK_COMPLETED_CLIENT_HTTP 
0000020FD6BFE090: idx 2
[2020/09/16 20:17:02:4230] U: LWS_CALLBACK_ESTABLISHED_CLIENT_HTTP: idx: 
0, resp 401
[2020/09/16 20:17:02:4230] U: LWS_CALLBACK_COMPLETED_CLIENT_HTTP 
0000020FD6BFEE90: idx 0
[2020/09/16 20:17:02:4230] U: LWS_CALLBACK_ESTABLISHED_CLIENT_HTTP: idx: 
3, resp 401
[2020/09/16 20:17:02:4390] U: LWS_CALLBACK_COMPLETED_CLIENT_HTTP 
0000020FD6BFFC90: idx 3
[2020/09/16 20:17:02:4390] U: Done: all OK
[2020/09/16 20:17:02:4390] U: Duration: 203ms
[2020/09/16 20:17:02:4540] U: Exiting with 0

C:\Users\agreen\libwebsockets\b1>

> It sends 3 Get requests, and receives 401 response for all 3.
> 
> The crash also happens with "-c 1" (only 1 Get request is sent, and lws 
> receives 1 401 response).
> 
> the 401 response looks like:
> 
> HTTP/1.1 401 Unauthorized
> 
> WWW-Authenticate: Basic realm="Realm1"
> 
> Content-Type: application/json
> 
> Transfer-Encoding: chunked
> 
> Server: Web-Server
> 
> Connection: close
> 
> The crash is in __lws_close_free_wsi_final(struct lws *wsi), in the clause
> 
>                  if (wsi->a.vhost)
> 
>                                  
> wsi->a.vhost->protocols[0].callback(wsi, LWS_CALLBACK_WSI_DESTROY,
> 
>                                                                                                    wsi->user_space, NULL, 0);

So what is the "crash"... something is NULL, or crazy, or what?  Eg, wsi 
is NULL, the .callback is NULL... what actually makes the segfault or 
whatever?

> The console output from the run is:
> 
> [2020/09/16 14:38:56:1630] U: LWS minimal http client [-s (staggered)] 
> [-p (pipeline)]
> 
> [2020/09/16 14:38:56:1690] U:    [--h1 (http/1 only)] [-l (localhost)] 
> [-d <logs>]
> 
> [2020/09/16 14:38:56:1730] U:    [-n (numbered)] [--post]
> 
> [2020/09/16 14:38:56:1760] N: LWS: 4.1.99-v4.1.0-12-g18d4ac4e, loglevel 1031
> 
> [2020/09/16 14:38:56:1790] N: NET CLI SRV H1 H2 WS IPv6-absent
> 
> [2020/09/16 14:38:56:4493] U: started connection 0067f630: idx 0 
> (/api/MVC/status/current)
> 
> [2020/09/16 14:38:56:4582] U: started connection 00680f18: idx 1 
> (/api/MVC/status/current)
> 
> [2020/09/16 14:38:56:4652] U: started connection 00682618: idx 2 
> (/api/MVC/status/current)
> 
> [2020/09/16 14:38:56:4702] U: callback_http: doing GET flow
> 
> [2020/09/16 14:38:56:4742] U: callback_http: doing GET flow
> 
> [2020/09/16 14:38:56:4772] U: callback_http: doing GET flow
> 
> [2020/09/16 14:38:56:4832] U: LWS_CALLBACK_ESTABLISHED_CLIENT_HTTP: idx: 
> 0, resp 401
> 
> [2020/09/16 14:38:56:5111] U: LWS_CALLBACK_ESTABLISHED_CLIENT_HTTP: idx: 
> 1, resp 401
> 
> [2020/09/16 14:38:56:5161] U: LWS_CALLBACK_ESTABLISHED_CLIENT_HTTP: idx: 
> 2, resp 401
> 
> [2020/09/16 14:38:56:5391] E: Done: failed: 3
> 
> If I change the code, and add in "case 
> LWS_CALLBACK_CLIENT_APPEND_HANDSHAKE_HEADER" an Authorization header, so 
> that instead of 401, we receive 200 OK responses -> there is no crash.

OK.  But generally if it's not a well-known redirect, lws doesn't feel a 
non-200 is a big deviation, it proceeds with the rest of the parsing as 
normal.

There are platform differences but in this kind of logic, it should be 
the same for windows and linux.

>  2. *_HTTP Requests aren't sent when pipelining in http 1.1 is used,
>     without "staggering" (the command line input doesn't include "-s"):_*
> 
> I tried sending 4 Get requests, with pipelining enabled, with http 1.1, 
> no "staggering":
> 
> with cmd line input:
> 
> --no-tls --h1 -p --port 80 --path /api/MVC/status/current --server 
> 10.33.70.85 -c 4
> 
> ==>> no HTTP request was sent out. The server does listen at the IP 
> address and port I set, and would respond with 200 OK to a Get request 
> to the path I set.
> 
> console output:
> 
> [2020/09/16 14:50:34:4482] U: LWS minimal http client [-s (staggered)] 
> [-p (pipeline)]
> 
> [2020/09/16 14:50:34:4561] U:    [--h1 (http/1 only)] [-l (localhost)] 
> [-d <logs>]
> 
> [2020/09/16 14:50:34:4611] U:    [-n (numbered)] [--post]
> 
> [2020/09/16 14:50:34:4631] N: LWS: 4.1.99-v4.1.0-12-g18d4ac4e, loglevel 1031
> 
> [2020/09/16 14:50:34:4661] N: NET CLI SRV H1 H2 WS IPv6-absent
> 
> [2020/09/16 14:50:34:7544] U: started connection 0067f660: idx 0 
> (/api/MVC/status/current)
> 
> [2020/09/16 14:50:34:7613] N: lws_vhost_active_conns: apply 00680f48 to 
> txn queue on 0067f660 state 0x10000202
> 
> [2020/09/16 14:50:34:7673] U: started connection 00680f48: idx 1 
> (/api/MVC/status/current)
> 
> [2020/09/16 14:50:34:7703] N: lws_vhost_active_conns: apply 00682620 to 
> txn queue on 0067f660 state 0x10000202
> 
> [2020/09/16 14:50:34:7753] U: started connection 00682620: idx 2 
> (/api/MVC/status/current)
> 
> [2020/09/16 14:50:34:7803] N: lws_vhost_active_conns: apply 00683cf8 to 
> txn queue on 0067f660 state 0x10000202
> 
> [2020/09/16 14:50:34:7853] U: started connection 00683cf8: idx 3 
> (/api/MVC/status/current)
> 
> [2020/09/16 14:50:34:7893] E: CLIENT_CONNECTION_ERROR: Unable to connect
> 
> [2020/09/16 14:50:34:7932] E: Done: failed: 4
> 
> [2020/09/16 14:50:34:7952] U: Duration: 6ms
> 
> [2020/09/16 14:50:34:8631] U: Exiting with 1
> 
> Press <RETURN> to close this window...

Hm again on Linux

valgrind ./bin/lws-minimal-http-client-multi --no-tls --h1 --port 80 
--path /status/401 --server httpbin.org -c 4 -p
==54113== Memcheck, a memory error detector
==54113== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==54113== Using Valgrind-3.16.0 and LibVEX; rerun with -h for copyright info
==54113== Command: ./bin/lws-minimal-http-client-multi --no-tls --h1 
--port 80 --path /status/401 --server httpbin.org -c 4 -p
==54113==
[2020/09/17 07:16:22:2985] U: LWS minimal http client [-s (staggered)] 
[-p (pipeline)]
[2020/09/17 07:16:22:3451] U:    [--h1 (http/1 only)] [-l (localhost)] 
[-d <logs>]
[2020/09/17 07:16:22:3455] U:    [-n (numbered)] [--post]
[2020/09/17 07:16:22:3461] N: LWS: 4.1.99-v4.1.0-15-g44b46f470, loglevel 
1031
[2020/09/17 07:16:22:3463] N: NET CLI SRV H1 H2 WS MQTT SS-JSON-POL 
SSPROX IPv6-absent
[2020/09/17 07:16:23:2191] U: started connection 0x54d2ec0: idx 0 
(/status/401)
[2020/09/17 07:16:23:2210] N: lws_vhost_active_conns: apply 0x54ed360 to 
txn queue on 0x54d2ec0 state 0x10000202
[2020/09/17 07:16:23:2238] U: started connection 0x54ed360: idx 1 
(/status/401)
[2020/09/17 07:16:23:2239] N: lws_vhost_active_conns: apply 0x54eec80 to 
txn queue on 0x54d2ec0 state 0x10000202
[2020/09/17 07:16:23:2240] U: started connection 0x54eec80: idx 2 
(/status/401)
[2020/09/17 07:16:23:2241] N: lws_vhost_active_conns: apply 0x54f05a0 to 
txn queue on 0x54d2ec0 state 0x10000202
[2020/09/17 07:16:23:2242] U: started connection 0x54f05a0: idx 3 
(/status/401)
[2020/09/17 07:16:23:3372] U: callback_http: doing GET flow
[2020/09/17 07:16:23:4777] U: LWS_CALLBACK_ESTABLISHED_CLIENT_HTTP: idx: 
0, resp 401
[2020/09/17 07:16:23:4792] U: LWS_CALLBACK_COMPLETED_CLIENT_HTTP 
0x54d2ec0: idx 0
[2020/09/17 07:16:23:4868] N: 
_lws_generic_transaction_completed_active_conn: pipeline queue passed 
wsi 0x54d2ec0 on to queued wsi 0x54ed360
[2020/09/17 07:16:23:5038] U: callback_http: doing GET flow
[2020/09/17 07:16:23:5956] U: LWS_CALLBACK_ESTABLISHED_CLIENT_HTTP: idx: 
1, resp 401
[2020/09/17 07:16:23:5958] U: LWS_CALLBACK_COMPLETED_CLIENT_HTTP 
0x54ed360: idx 1
[2020/09/17 07:16:23:5961] N: 
_lws_generic_transaction_completed_active_conn: pipeline queue passed 
wsi 0x54ed360 on to queued wsi 0x54eec80
[2020/09/17 07:16:23:5967] U: callback_http: doing GET flow
[2020/09/17 07:16:23:6899] U: LWS_CALLBACK_ESTABLISHED_CLIENT_HTTP: idx: 
2, resp 401
[2020/09/17 07:16:23:6901] U: LWS_CALLBACK_COMPLETED_CLIENT_HTTP 
0x54eec80: idx 2
[2020/09/17 07:16:23:6905] N: 
_lws_generic_transaction_completed_active_conn: pipeline queue passed 
wsi 0x54eec80 on to queued wsi 0x54f05a0
[2020/09/17 07:16:23:6911] U: callback_http: doing GET flow
[2020/09/17 07:16:23:7835] U: LWS_CALLBACK_ESTABLISHED_CLIENT_HTTP: idx: 
3, resp 401
[2020/09/17 07:16:23:7838] U: LWS_CALLBACK_COMPLETED_CLIENT_HTTP 
0x54f05a0: idx 3
[2020/09/17 07:16:23:7847] U: Done: all OK
[2020/09/17 07:16:23:8636] U: Duration: 638ms
[2020/09/17 07:16:23:8674] U: Exiting with 0
==54113==
==54113== HEAP SUMMARY:
==54113==     in use at exit: 0 bytes in 0 blocks
==54113==   total heap usage: 30,559 allocs, 30,559 frees, 2,455,965 
bytes allocated
==54113==
==54113== All heap blocks were freed -- no leaks are possible
==54113==
==54113== For lists of detected and suppressed errors, rerun with: -s
==54113== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)


> If I do use Staggering (this is the only change from above scenario 
> which fails), so the cmd line input is:
> 
> --no-tls -s --h1 -p --port 80 --path /api/MVC/status/current --server 
> 10.33.70.85 -c 4
> 
> All 4 Get requests succeed.
> 
> The only code change I made to minimal-http-client-multi.c was in 
> callback_http(), to add Authorization header (since my server expects it):
> 
>                  case LWS_CALLBACK_CLIENT_APPEND_HANDSHAKE_HEADER:
> 
>                                  /*
> 
>                                  * Tell lws we are going to send the 
> body next...
> 
>                                  */
> 
>                                  if (posting && 
> !lws_http_is_redirected_to_get(wsi)) {
> 
>                                                  lwsl_user("%s: doing 
> POST flow\n", __func__);
> 
>                                                  
> lws_client_http_body_pending(wsi, 1);
> 
>                                                  
> lws_callback_on_writable(wsi);
> 
>                                  } else
> 
>                                  {
> 
>                                                  lwsl_user("%s: doing 
> GET flow\n", __func__);
> 
>                                                  unsigned char **p = 
> (unsigned char **)in, *end = (*p) + len;
> 
>                                                  char buff[128];
> 
>                                                  if 
> (lws_http_basic_auth_gen("Admin", "Admin", buff, sizeof(buff)))
> 
>                                                                  break;
> 
>                                                  if 
> (lws_add_http_header_by_token(wsi, WSI_TOKEN_HTTP_AUTHORIZATION,
> 
>                                                                                  (unsigned char *)buff, (int)strlen(buff), p, end))
> 
>                                                                  return -1;
> 
>                                  }
> 
>                                  break;
> 
>  3. *_I am not sure that pipelining in http 1.1  actually works:_*
> 
> I changed the "staggering" code, so lws will only wait 1 millisecond 
> between requests, like this:

The stagger test is there because it reflects the underlying situation 
in the process of the client connections for eg h2 case, whoever the 
first client request was to the same endpoint becomes the "leader", 
while he is trying to establish his connection asynchronously, if more 
connection requests for his endpoint come in, they are queued against 
him so they can potentially reuse his connection.

If the leader's h2 connection succeeds, then the queued guys jump in on 
his network connection as parallel mux streams.

If there is a valid h2 connection already established to the same 
endpoint at the time a new request comes, the additional client 
connection requests simply join it as parallel mux streams directly, 
without being queued.

So there is a different behaviour that can be expected depending on the 
interval between one client connection request to the same endpoint and 
another.  After one transaction completes, there is another interval 
based heuristic that the idle network connection is held open for a few 
seconds so additional connections may jump in.

H1 has a similar situation, the first guy must do the dns lookup, the 
connection at tcp and then tls level, if more connection requests come 
during this they queue on this "leader" and then are passed in as the 
transactions complete.  If they come shortly after the leader completed, 
and the connection idle, they can go out or queue if more than one.  If 
they come after the idle connection terminated, the first of these new 
guys becomes the leader and regenerates his own connection, any more 
queue on him.

> static void
> 
> stagger_cb(lws_sorted_usec_list_t *sul)
> 
> {
> 
>                  lws_usec_t next;
> 
>                  /*
> 
>                  * open the connections at 100ms intervals, with the
> 
>                  * last one being after 1s, testing both queuing, and
> 
>                  * direct H2 stream addition stability
> 
>                  */
> 
>                  lws_try_client_connection(&i, stagger_idx++);
> 
>                  if (stagger_idx == count)
> 
>                                  return;
> 
>                  next = 1 * LWS_US_PER_MS;
> 
>                  lws_sul_schedule(context, 0, &sul_stagger, stagger_cb, 
> next);
> 
> }
> 
> and also added a print log in "case 
> LWS_CALLBACK_CLIENT_APPEND_HANDSHAKE_HEADER":
> 
>                  case LWS_CALLBACK_CLIENT_APPEND_HANDSHAKE_HEADER:
> 
>                                  
> lwsl_user("LWS_CALLBACK_CLIENT_APPEND_HANDSHAKE_HEADER %p: idx %d\n",
> 
>                                                    wsi, idx);
> 
> and used cmd line input:
> 
> --no-tls -s --h1 -p --port 80 --path /api/MVC/status/current --server 
> 10.33.70.55 -c 6
> 
> 6 Get requests were sent and succeeded, but they were sent serially. 
> Each request was sent only after the previous request completed
> 
> successfully with 200 OK - I saw this in wireshark capture. In the 
> console output, I also saw that lws_client_connect_via_info()
> 
> ("started connection ..." line) was called 5 times before the first 
> request succeeded. But each request is done serially.
> 
> When pipelining is enabled, next request should be sent before the 
> response for the last request is received.

"should"... lws doesn't do that because it is too complicated to deal 
with speculative issue of post bodies, which may be huge.

If you want that, this is FOSS, patches are welcome... but you should 
carefully look first at the amount of CI testing around the variations 
in GET and POST multiple client connections that already exist and work 
well, and figure in how you are going to test whatever you want to the 
same extent.

Generally speaking, all h1 is a legacy 30+ year old protocol, it should 
work but most of its problems are already solved in h2.  There isn't any 
pipelining in that you can just join in parallel.  Any time spent / 
wasted on new implementations or optimizations is better done in h2, or 
quic.

-Andy



More information about the Libwebsockets mailing list