[Libwebsockets] surprise when using equivalent file descriptors

andy at warmcat.com
Wed Nov 4 21:14:50 CET 2020



On November 4, 2020 7:32:42 PM UTC, Per Bothner <per at bothner.com> wrote:
>I just ran into a confusing bug.  I may have
>already found a fix, but I'm not sure if it's the right fix.
>
>DomTerm uses lws for websockets, https, plain sockets, and file
>descriptors.  Specifically, it wraps logical stdin/stdout/stderr
>using lws_adopt_descriptor_vhost.  There are some complications
>because "logical" stdin/stdout/stderr may be proxied over either
>ssh or a Unix domain socket, but the problem I encountered is when
>stdin/stdout are connected to the terminal (pty), created before
>domterm starts up.  Since they have different file descriptors
>(0 and 1), domterm calls lws_adopt_descriptor_vhost once for each.
>
>So what seems to happen is I get a RAW_RX_FILE callback on the lws
>corresponding to fd 0, and then I get another RAW_RX_FILE callback
>on the lws corresponding to fd 1.  The latter fails with read
>returning -1 and errno=11=EAGAIN, which confused me until (I think)
>I figured out the issue and fix.
>
>The obvious fix once I realized the failure was EAGAIN was:
>
>         n = read(...);
>         if (n < 0 && errno == EAGAIN)
>             return 0;
>         if (n <= 0)
>             return -1;
>         return handle_input(...);
>
>That seems to work, but perhaps it is wrong to create 2 different lws
>instances for distinct file descriptors that are dup'd to each other?

If we're literally talking about the original fd and its dup() both being managed by the lws event loop, it's not really 'wrong', but it means that from the event / wake perspective, if one fd sees POLLIN because there is something to read on the shared underlying socket, the other, dup'd fd will definitely see the same thing, and probably atomically, ie, either neither fd will come out of the wait with POLLIN indicated, or both will.

Normally the RX callback only runs because of a POLLIN indication on the fd, so we already know there is something to read and EAGAIN isn't possible.  But servicing the first fd also read the data that was thought to be waiting on the dup'd fd, so when that fd services its own POLLIN, there is usually nothing left to read in the shared underlying object, unless new rx data arrived in between.

>It may be difficult to detect this situation, but doing raw
>synchronous writes to stdout (without using a struct lws)
>is unlikely to hang or cause other problems, especially if
>I restrict it to the isatty(1) case.

If it's just that dup case, accepting EAGAIN as normal for the fd isn't much overhead... but from your earlier description, if fd 1 == stdout, the fd might not support POLLIN / rx at all, eg, if it was opened O_WRONLY.  In both cases, if the fd's real purpose is simplex output like driving stdout or stderr, you can stop POLLIN events on that fd by 'setting' rx flow control on it permanently (ie, passing 0 as the flag):

https://libwebsockets.org/git/libwebsockets/tree/include/libwebsockets/lws-misc.h#n598

-Andy


