[Libwebsockets] Possible bugs - minimal-http-client-multi example

Dorit Mari Dorit.Mari at audiocodes.com
Thu Sep 17 11:51:11 CEST 2020


Andy, thank you for your quick response. Please see my answers inline (for first and third issues).

-----Original Message-----
From: Andy Green [mailto:andy at warmcat.com]
Sent: Thursday, September 17, 2020 10:38
To: Dorit Mari <Dorit.Mari at audiocodes.com>; libwebsockets at ml.libwebsockets.org
Subject: Re: [Libwebsockets] Possible bugs - minimal-http-client-multi example



On 9/16/20 3:21 PM, Dorit Mari wrote:
> Hi,
>
> I found 3 possible problems when running the minimal example
> “minimal-examples\http-client\minimal-http-client-multi”, which I’d
> like to report.
>
> I used version v4.1.0, on PC running Windows 10.
>
>  1. *_Crash:_*
>
> Without any code changes, when running “minimal-http-client-multi”
> example with cmd line input:
>
> --no-tls --h1 --port 80 --path /api/MVC/status/current --server
> 10.33.70.85 -c 3
>
> lws crashes.

I can't exactly reproduce this easily since no suitable local server, and yours not public, the nearest equivalent I can come up with we can both try is

$ lws-minimal-http-client-multi --no-tls --h1 --port 80 --path
/status/401 --server httpbin.org -c 3

this is returning 3 x 401 OK (on linux) and valgrind shows no problems there either.

[2020/09/17 07:10:07:4522] U: LWS minimal http client [-s (staggered)] [-p (pipeline)]
[2020/09/17 07:10:07:4995] U:    [--h1 (http/1 only)] [-l (localhost)]
[-d <logs>]
[2020/09/17 07:10:07:4999] U:    [-n (numbered)] [--post]
[2020/09/17 07:10:07:5006] N: LWS: 4.1.99-v4.1.0-15-g44b46f470, loglevel
1031
[2020/09/17 07:10:07:5007] N: NET CLI SRV H1 H2 WS MQTT SS-JSON-POL SSPROX IPv6-absent
[2020/09/17 07:10:08:3633] U: started connection 0x54d2ec0: idx 0
(/status/401)
[2020/09/17 07:10:08:3683] U: started connection 0x54ed360: idx 1
(/status/401)
[2020/09/17 07:10:08:3701] U: started connection 0x5500740: idx 2
(/status/401)
[2020/09/17 07:10:08:4754] U: callback_http: doing GET flow
[2020/09/17 07:10:08:4847] U: callback_http: doing GET flow
[2020/09/17 07:10:08:4850] U: callback_http: doing GET flow
[2020/09/17 07:10:08:6201] U: LWS_CALLBACK_ESTABLISHED_CLIENT_HTTP: idx:
0, resp 401
[2020/09/17 07:10:08:6217] U: LWS_CALLBACK_COMPLETED_CLIENT_HTTP
0x54d2ec0: idx 0
[2020/09/17 07:10:08:6247] U: LWS_CALLBACK_ESTABLISHED_CLIENT_HTTP: idx:
2, resp 401
[2020/09/17 07:10:08:6248] U: LWS_CALLBACK_COMPLETED_CLIENT_HTTP
0x5500740: idx 2
[2020/09/17 07:10:08:6459] U: LWS_CALLBACK_ESTABLISHED_CLIENT_HTTP: idx:
1, resp 401
[2020/09/17 07:10:08:6459] U: LWS_CALLBACK_COMPLETED_CLIENT_HTTP
0x54ed360: idx 1
[2020/09/17 07:10:08:6462] U: Done: all OK
[2020/09/17 07:10:08:6909] U: Duration: 319ms
[2020/09/17 07:10:08:6960] U: Exiting with 0

and I try it on windows 10

C:\Users\agreen\libwebsockets\b1>.\bin\Debug\lws-minimal-http-client-multi.exe
--no-tls --h1 --port 80 --path /status/401 --server httpbin.org -c 4
[2020/09/16 20:17:02:1420] U: LWS minimal http client [-s (staggered)] [-p (pipeline)]
[2020/09/16 20:17:02:1420] U:    [--h1 (http/1 only)] [-l (localhost)]
[-d <logs>]
[2020/09/16 20:17:02:1420] U:    [-n (numbered)] [--post]
[2020/09/16 20:17:02:1580] N: LWS: 4.1.99-v4.1.0-17-g87f06b3d, loglevel 1031
[2020/09/16 20:17:02:1580] N: NET CLI SRV H1 H2 WS IPv6-absent
[2020/09/16 20:17:02:2040] U: started connection 0000020FD6BFEE90: idx 0
(/status/401)
[2020/09/16 20:17:02:2200] U: started connection 0000020FD6C00010: idx 1
(/status/401)
[2020/09/16 20:17:02:2200] U: started connection 0000020FD6BFE090: idx 2
(/status/401)
[2020/09/16 20:17:02:2360] U: started connection 0000020FD6BFFC90: idx 3
(/status/401)
[2020/09/16 20:17:02:2990] U: callback_http: doing GET flow
[2020/09/16 20:17:02:3140] U: callback_http: doing GET flow
[2020/09/16 20:17:02:3140] U: callback_http: doing GET flow
[2020/09/16 20:17:02:3140] U: callback_http: doing GET flow
[2020/09/16 20:17:02:3920] U: LWS_CALLBACK_ESTABLISHED_CLIENT_HTTP: idx:
1, resp 401
[2020/09/16 20:17:02:4080] U: LWS_CALLBACK_COMPLETED_CLIENT_HTTP
0000020FD6C00010: idx 1
[2020/09/16 20:17:02:4080] U: LWS_CALLBACK_ESTABLISHED_CLIENT_HTTP: idx:
2, resp 401
[2020/09/16 20:17:02:4080] U: LWS_CALLBACK_COMPLETED_CLIENT_HTTP
0000020FD6BFE090: idx 2
[2020/09/16 20:17:02:4230] U: LWS_CALLBACK_ESTABLISHED_CLIENT_HTTP: idx:
0, resp 401
[2020/09/16 20:17:02:4230] U: LWS_CALLBACK_COMPLETED_CLIENT_HTTP
0000020FD6BFEE90: idx 0
[2020/09/16 20:17:02:4230] U: LWS_CALLBACK_ESTABLISHED_CLIENT_HTTP: idx:
3, resp 401
[2020/09/16 20:17:02:4390] U: LWS_CALLBACK_COMPLETED_CLIENT_HTTP
0000020FD6BFFC90: idx 3
[2020/09/16 20:17:02:4390] U: Done: all OK
[2020/09/16 20:17:02:4390] U: Duration: 203ms
[2020/09/16 20:17:02:4540] U: Exiting with 0

C:\Users\agreen\libwebsockets\b1>


>> Dorit: I tried the same input as you "--no-tls --h1 --port 80 --path /status/401 --server httpbin.org -c" - and there was no crash.
>> When it crashes when accessing my server, it is due to "SIGSEV" signal. "wsi->a.vhost->protocols" has value 0xfeeefeee (the wsi and wsi->a look OK, but wsi->a.vhost doesn't look like it was initialized; it has other members also with values like 0xfeeefeee or -17891602), so we crash when trying to access "wsi->a.vhost->protocols[0].callback".
>> I notices the 401 response from httpbin.org is different than the 401 from my server (which can be found below). Notably, there is "Content-Length: 0" in the 401 from httpbin.org, while the >> 401 from my server doesn't contain Content-Length header. Maybe this is relevant?


> It sends 3 Get requests, and receives 401 response for all 3.
>
> The crash also happens with "-c 1" (only 1 Get request is sent, and
> lws receives 1 401 response).
>
> the 401 response looks like:
>
> HTTP/1.1 401 Unauthorized
>
> WWW-Authenticate: Basic realm="Realm1"
>
> Content-Type: application/json
>
> Transfer-Encoding: chunked
>
> Server: Web-Server
>
> Connection: close
>
> The crash is in __lws_close_free_wsi_final(struct lws *wsi), in the
> clause
>
>                  if (wsi->a.vhost)
>
>
> wsi->a.vhost->protocols[0].callback(wsi, LWS_CALLBACK_WSI_DESTROY,
>
>
> wsi->user_space, NULL, 0);

So what is the "crash"... something is NULL, or crazy, or what?  Eg, wsi is NULL, the .callback is NULL... what actually makes the segfault or whatever?

> The console output from the run is:
>
> [2020/09/16 14:38:56:1630] U: LWS minimal http client [-s (staggered)]
> [-p (pipeline)]
>
> [2020/09/16 14:38:56:1690] U:    [--h1 (http/1 only)] [-l (localhost)]
> [-d <logs>]
>
> [2020/09/16 14:38:56:1730] U:    [-n (numbered)] [--post]
>
> [2020/09/16 14:38:56:1760] N: LWS: 4.1.99-v4.1.0-12-g18d4ac4e,
> loglevel 1031
>
> [2020/09/16 14:38:56:1790] N: NET CLI SRV H1 H2 WS IPv6-absent
>
> [2020/09/16 14:38:56:4493] U: started connection 0067f630: idx 0
> (/api/MVC/status/current)
>
> [2020/09/16 14:38:56:4582] U: started connection 00680f18: idx 1
> (/api/MVC/status/current)
>
> [2020/09/16 14:38:56:4652] U: started connection 00682618: idx 2
> (/api/MVC/status/current)
>
> [2020/09/16 14:38:56:4702] U: callback_http: doing GET flow
>
> [2020/09/16 14:38:56:4742] U: callback_http: doing GET flow
>
> [2020/09/16 14:38:56:4772] U: callback_http: doing GET flow
>
> [2020/09/16 14:38:56:4832] U: LWS_CALLBACK_ESTABLISHED_CLIENT_HTTP: idx:
> 0, resp 401
>
> [2020/09/16 14:38:56:5111] U: LWS_CALLBACK_ESTABLISHED_CLIENT_HTTP: idx:
> 1, resp 401
>
> [2020/09/16 14:38:56:5161] U: LWS_CALLBACK_ESTABLISHED_CLIENT_HTTP: idx:
> 2, resp 401
>
> [2020/09/16 14:38:56:5391] E: Done: failed: 3
>
> If I change the code, and add in "case
> LWS_CALLBACK_CLIENT_APPEND_HANDSHAKE_HEADER" an Authorization header,
> so that instead of 401, we receive 200 OK responses -> there is no crash.

OK.  But generally if it's not a well-known redirect, lws doesn't feel a
non-200 is a big deviation, it proceeds with the rest of the parsing as normal.

There are platform differences but in this kind of logic, it should be the same for windows and linux.

>  2. *_HTTP Requests aren't sent when pipelining in http 1.1 is used,
>     without "staggering" (the command line input doesn't include
> "-s"):_*
>
> I tried sending 4 Get requests, with pipelining enabled, with http
> 1.1, no "staggering":
>
> with cmd line input:
>
> --no-tls --h1 -p --port 80 --path /api/MVC/status/current --server
> 10.33.70.85 -c 4
>
> ==>> no HTTP request was sent out. The server does listen at the IP
> address and port I set, and would respond with 200 OK to a Get request
> to the path I set.
>
> console output:
>
> [2020/09/16 14:50:34:4482] U: LWS minimal http client [-s (staggered)]
> [-p (pipeline)]
>
> [2020/09/16 14:50:34:4561] U:    [--h1 (http/1 only)] [-l (localhost)]
> [-d <logs>]
>
> [2020/09/16 14:50:34:4611] U:    [-n (numbered)] [--post]
>
> [2020/09/16 14:50:34:4631] N: LWS: 4.1.99-v4.1.0-12-g18d4ac4e,
> loglevel 1031
>
> [2020/09/16 14:50:34:4661] N: NET CLI SRV H1 H2 WS IPv6-absent
>
> [2020/09/16 14:50:34:7544] U: started connection 0067f660: idx 0
> (/api/MVC/status/current)
>
> [2020/09/16 14:50:34:7613] N: lws_vhost_active_conns: apply 00680f48
> to txn queue on 0067f660 state 0x10000202
>
> [2020/09/16 14:50:34:7673] U: started connection 00680f48: idx 1
> (/api/MVC/status/current)
>
> [2020/09/16 14:50:34:7703] N: lws_vhost_active_conns: apply 00682620
> to txn queue on 0067f660 state 0x10000202
>
> [2020/09/16 14:50:34:7753] U: started connection 00682620: idx 2
> (/api/MVC/status/current)
>
> [2020/09/16 14:50:34:7803] N: lws_vhost_active_conns: apply 00683cf8
> to txn queue on 0067f660 state 0x10000202
>
> [2020/09/16 14:50:34:7853] U: started connection 00683cf8: idx 3
> (/api/MVC/status/current)
>
> [2020/09/16 14:50:34:7893] E: CLIENT_CONNECTION_ERROR: Unable to
> connect
>
> [2020/09/16 14:50:34:7932] E: Done: failed: 4
>
> [2020/09/16 14:50:34:7952] U: Duration: 6ms
>
> [2020/09/16 14:50:34:8631] U: Exiting with 1
>
> Press <RETURN> to close this window...

Hm again on Linux

valgrind ./bin/lws-minimal-http-client-multi --no-tls --h1 --port 80 --path /status/401 --server httpbin.org -c 4 -p ==54113== Memcheck, a memory error detector ==54113== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==54113== Using Valgrind-3.16.0 and LibVEX; rerun with -h for copyright info ==54113== Command: ./bin/lws-minimal-http-client-multi --no-tls --h1 --port 80 --path /status/401 --server httpbin.org -c 4 -p ==54113==
[2020/09/17 07:16:22:2985] U: LWS minimal http client [-s (staggered)] [-p (pipeline)]
[2020/09/17 07:16:22:3451] U:    [--h1 (http/1 only)] [-l (localhost)]
[-d <logs>]
[2020/09/17 07:16:22:3455] U:    [-n (numbered)] [--post]
[2020/09/17 07:16:22:3461] N: LWS: 4.1.99-v4.1.0-15-g44b46f470, loglevel
1031
[2020/09/17 07:16:22:3463] N: NET CLI SRV H1 H2 WS MQTT SS-JSON-POL SSPROX IPv6-absent
[2020/09/17 07:16:23:2191] U: started connection 0x54d2ec0: idx 0
(/status/401)
[2020/09/17 07:16:23:2210] N: lws_vhost_active_conns: apply 0x54ed360 to txn queue on 0x54d2ec0 state 0x10000202
[2020/09/17 07:16:23:2238] U: started connection 0x54ed360: idx 1
(/status/401)
[2020/09/17 07:16:23:2239] N: lws_vhost_active_conns: apply 0x54eec80 to txn queue on 0x54d2ec0 state 0x10000202
[2020/09/17 07:16:23:2240] U: started connection 0x54eec80: idx 2
(/status/401)
[2020/09/17 07:16:23:2241] N: lws_vhost_active_conns: apply 0x54f05a0 to txn queue on 0x54d2ec0 state 0x10000202
[2020/09/17 07:16:23:2242] U: started connection 0x54f05a0: idx 3
(/status/401)
[2020/09/17 07:16:23:3372] U: callback_http: doing GET flow
[2020/09/17 07:16:23:4777] U: LWS_CALLBACK_ESTABLISHED_CLIENT_HTTP: idx:
0, resp 401
[2020/09/17 07:16:23:4792] U: LWS_CALLBACK_COMPLETED_CLIENT_HTTP
0x54d2ec0: idx 0
[2020/09/17 07:16:23:4868] N:
_lws_generic_transaction_completed_active_conn: pipeline queue passed wsi 0x54d2ec0 on to queued wsi 0x54ed360
[2020/09/17 07:16:23:5038] U: callback_http: doing GET flow
[2020/09/17 07:16:23:5956] U: LWS_CALLBACK_ESTABLISHED_CLIENT_HTTP: idx:
1, resp 401
[2020/09/17 07:16:23:5958] U: LWS_CALLBACK_COMPLETED_CLIENT_HTTP
0x54ed360: idx 1
[2020/09/17 07:16:23:5961] N:
_lws_generic_transaction_completed_active_conn: pipeline queue passed wsi 0x54ed360 on to queued wsi 0x54eec80
[2020/09/17 07:16:23:5967] U: callback_http: doing GET flow
[2020/09/17 07:16:23:6899] U: LWS_CALLBACK_ESTABLISHED_CLIENT_HTTP: idx:
2, resp 401
[2020/09/17 07:16:23:6901] U: LWS_CALLBACK_COMPLETED_CLIENT_HTTP
0x54eec80: idx 2
[2020/09/17 07:16:23:6905] N:
_lws_generic_transaction_completed_active_conn: pipeline queue passed wsi 0x54eec80 on to queued wsi 0x54f05a0
[2020/09/17 07:16:23:6911] U: callback_http: doing GET flow
[2020/09/17 07:16:23:7835] U: LWS_CALLBACK_ESTABLISHED_CLIENT_HTTP: idx:
3, resp 401
[2020/09/17 07:16:23:7838] U: LWS_CALLBACK_COMPLETED_CLIENT_HTTP
0x54f05a0: idx 3
[2020/09/17 07:16:23:7847] U: Done: all OK
[2020/09/17 07:16:23:8636] U: Duration: 638ms
[2020/09/17 07:16:23:8674] U: Exiting with 0 ==54113== ==54113== HEAP SUMMARY:
==54113==     in use at exit: 0 bytes in 0 blocks
==54113==   total heap usage: 30,559 allocs, 30,559 frees, 2,455,965
bytes allocated
==54113==
==54113== All heap blocks were freed -- no leaks are possible ==54113== ==54113== For lists of detected and suppressed errors, rerun with: -s ==54113== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)


> If I do use Staggering (this is the only change from above scenario
> which fails), so the cmd line input is:
>
> --no-tls -s --h1 -p --port 80 --path /api/MVC/status/current --server
> 10.33.70.85 -c 4
>
> All 4 Get requests succeed.
>
> The only code change I made to minimal-http-client-multi.c was in
> callback_http(), to add Authorization header (since my server expects it):
>
>                  case LWS_CALLBACK_CLIENT_APPEND_HANDSHAKE_HEADER:
>
>                                  /*
>
>                                  * Tell lws we are going to send the
> body next...
>
>                                  */
>
>                                  if (posting &&
> !lws_http_is_redirected_to_get(wsi)) {
>
>                                                  lwsl_user("%s: doing
> POST flow\n", __func__);
>
>
> lws_client_http_body_pending(wsi, 1);
>
>
> lws_callback_on_writable(wsi);
>
>                                  } else
>
>                                  {
>
>                                                  lwsl_user("%s: doing
> GET flow\n", __func__);
>
>                                                  unsigned char **p =
> (unsigned char **)in, *end = (*p) + len;
>
>                                                  char buff[128];
>
>                                                  if
> (lws_http_basic_auth_gen("Admin", "Admin", buff, sizeof(buff)))
>
>
> break;
>
>                                                  if
> (lws_add_http_header_by_token(wsi, WSI_TOKEN_HTTP_AUTHORIZATION,
>
>
> (unsigned char *)buff, (int)strlen(buff), p, end))
>
>
> return -1;
>
>                                  }
>
>                                  break;
>
>  3. *_I am not sure that pipelining in http 1.1  actually works:_*
>
> I changed the "staggering" code, so lws will only wait 1 millisecond
> between requests, like this:

The stagger test is there because it reflects the underlying situation in the process of the client connections for eg h2 case, whoever the first client request was to the same endpoint becomes the "leader", while he is trying to establish his connection asynchronously, if more connection requests for his endpoint come in, they are queued against him so they can potentially reuse his connection.

If the leader's h2 connection succeeds, then the queued guys jump in on his network connection as parallel mux streams.

If there is a valid h2 connection already established to the same endpoint at the time a new request comes, the additional client connection requests simply join it as parallel mux streams directly, without being queued.

So there is a different behaviour that can be expected depending on the interval between one client connection request to the same endpoint and another.  After one transaction completes, there is another interval based heuristic that the idle network connection is held open for a few seconds so additional connections may jump in.

H1 has a similar situation, the first guy must do the dns lookup, the connection at tcp and then tls level, if more connection requests come during this they queue on this "leader" and then are passed in as the transactions complete.  If they come shortly after the leader completed, and the connection idle, they can go out or queue if more than one.  If they come after the idle connection terminated, the first of these new guys becomes the leader and regenerates his own connection, any more queue on him.

> static void
>
> stagger_cb(lws_sorted_usec_list_t *sul)
>
> {
>
>                  lws_usec_t next;
>
>                  /*
>
>                  * open the connections at 100ms intervals, with the
>
>                  * last one being after 1s, testing both queuing, and
>
>                  * direct H2 stream addition stability
>
>                  */
>
>                  lws_try_client_connection(&i, stagger_idx++);
>
>                  if (stagger_idx == count)
>
>                                  return;
>
>                  next = 1 * LWS_US_PER_MS;
>
>                  lws_sul_schedule(context, 0, &sul_stagger,
> stagger_cb, next);
>
> }
>
> and also added a print log in "case
> LWS_CALLBACK_CLIENT_APPEND_HANDSHAKE_HEADER":
>
>                  case LWS_CALLBACK_CLIENT_APPEND_HANDSHAKE_HEADER:
>
>
> lwsl_user("LWS_CALLBACK_CLIENT_APPEND_HANDSHAKE_HEADER %p: idx %d\n",
>
>                                                    wsi, idx);
>
> and used cmd line input:
>
> --no-tls -s --h1 -p --port 80 --path /api/MVC/status/current --server
> 10.33.70.55 -c 6
>
> 6 Get requests were sent and succeeded, but they were sent serially.
> Each request was sent only after the previous request completed
>
> successfully with 200 OK - I saw this in wireshark capture. In the
> console output, I also saw that lws_client_connect_via_info()
>
> ("started connection ..." line) was called 5 times before the first
> request succeeded. But each request is done serially.
>
> When pipelining is enabled, next request should be sent before the
> response for the last request is received.

"should"... lws doesn't do that because it is too complicated to deal with speculative issue of post bodies, which may be huge.

If you want that, this is FOSS, patches are welcome... but you should carefully look first at the amount of CI testing around the variations in GET and POST multiple client connections that already exist and work well, and figure in how you are going to test whatever you want to the same extent.

Generally speaking, all h1 is a legacy 30+ year old protocol, it should work but most of its problems are already solved in h2.  There isn't any pipelining in that you can just join in parallel.  Any time spent / wasted on new implementations or optimizations is better done in h2, or quic.



>> Dorit: Please help me make sure I understand correctly the behavior when pipelining (LCCSCF_PIPELINE) is enabled for HTTP 1.1 client - If we have an established, idle connection to some destination, and  then we send multiple requests (e.g. HTTP Get) to that destination - will they always be sent serially, each request only sent after the previous request received a response, and all of them on the same TCP/TLS connection? Does the pipelining option (LCCSCF_PIPELINE) only mean reuse of the TCP/TLS connection?
>> I changed the stagger_cb() function, so after the first request we wait 5 seconds. The first request succeeds, and leaves an established, idle TCP connection to the destination. After that, there is 5 milliseconds interval between requests (between calls to lws_try_client_connection()). All the requests are sent on the same TCP connection, serially (only after a request receives a response, we send the next request). Is this the expected behavior?


-Andy

This email and any files transmitted with it are confidential material. They are intended solely for the use of the designated individual or entity to whom they are addressed. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, use, distribution or copying of this communication is strictly prohibited and may be unlawful.

If you have received this email in error please immediately notify the sender and delete or destroy any copy of this message


More information about the Libwebsockets mailing list