[Libwebsockets] how is partial write handled

Andy Green andy at warmcat.com
Fri Oct 18 00:23:42 CEST 2013


LANGLOIS Olivier PIS -EXT <olivier.pis.langlois at transport.alstom.com> wrote:
>Hi Andy,
>>
>> Normally that would be correct.
>>
>> But for lws I don't believe that's the case.  We use SO_SNDBUF to reserve
>> an amount of buffer space on the socket set by the protocol struct
>> definition.  Although it's not well documented what the effect of
>> SO_SNDBUF is I understand this to mean we won't get signalled by poll()
>> the socket is writeable until that amount of packet data is writeable in
>> one hit.  I have been unable to reproduce partial sends since adding this.
>
>This is true for UDP where the send buffer is not really used anyway.
>However the socket option you need to use to have the effect that you
>describe is SO_SNDLOWAT.
>
>but even if you have sufficient space in the send buffer, I'm not even
>sure that you can totally rule out the possibility of a partial write.
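To make the two socket options being discussed concrete, here is a minimal sketch, assuming a Linux-style sockets API. The helper names request_sndbuf() and request_sndlowat() are hypothetical and not part of lws. Linux's socket(7) documents that the kernel doubles the requested SO_SNDBUF value and that SO_SNDLOWAT cannot be changed there (setsockopt() fails with ENOPROTOOPT), so the second helper has to be treated as best effort.

```c
#include <assert.h>
#include <sys/socket.h>

/*
 * Ask the kernel for a send buffer of `bytes` and return the size it
 * actually granted, or -1 on error.  On Linux the granted value is
 * normally double the request (bookkeeping overhead) and is capped by
 * the net.core.wmem_max sysctl.
 */
int request_sndbuf(int fd, int bytes)
{
    int granted = 0;
    socklen_t len = sizeof(granted);

    assert(bytes > 0);
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &granted, &len) < 0)
        return -1;
    return granted;
}

/*
 * Try to set the send low-water mark.  Returns 0 on success and -1 on
 * failure.  Linux documents this option as not changeable, so callers
 * must tolerate failure.
 */
int request_sndlowat(int fd, int bytes)
{
    return setsockopt(fd, SOL_SOCKET, SO_SNDLOWAT, &bytes, sizeof(bytes));
}
```

Neither call by itself promises that a later send() of that size completes in one go; that is the open question in this thread.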



>Can I use your library if I cannot predict ahead of time the length of
>my messages because their length is variable?

Yes... websockets itself defines fragments for that purpose.

>> For websocket messages that are larger than the maximum buffer
>> reservation the kernel would accept, you must use websocket fragments.
>> But at some point with multi-MB messages you have to use websocket
>> fragments anyway.
>
>I respectfully disagree with you. While using websocket fragments might
>be preferable to make the multiplexing extension more effective, as far
>as I can see, there is nothing in the protocol stopping you from making
>a message bigger than SO_SNDBUF bytes. Quite the opposite: the WebSocket
>frame header's payload length field supports lengths up to 64 bits wide.

Yes, I was not clear: I was explaining how it is with lws and nonblocking behaviour.  You 'must use websocket fragments' because we cannot block until the buffer is exhausted, and lws does not keep the caller's buffer alive after the write call, so it cannot go back into the poll loop and send the remainder later.
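For reference, the framing side of this can be sketched at the wire level. The code below is an illustration based on the RFC 6455 frame layout, not lws code; ws_frame_header() and ws_fragment_header() are hypothetical names. It builds unmasked (server-to-client) headers: the first fragment carries the message opcode, later fragments use the continuation opcode 0x0, and only the last one sets FIN. It also shows the three payload length encodings (7-bit, 16-bit, 64-bit).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { WS_OP_CONTINUATION = 0x0, WS_OP_TEXT = 0x1, WS_OP_BINARY = 0x2 };

/*
 * Write an unmasked frame header into `out` (at least 10 bytes) and
 * return the number of header bytes written: 2, 4 or 10.
 */
size_t ws_frame_header(unsigned char *out, int fin, unsigned opcode,
                       uint64_t len)
{
    size_t n = 0;
    int shift;

    assert(out != NULL);
    out[n++] = (unsigned char)((fin ? 0x80 : 0x00) | (opcode & 0x0f));
    if (len < 126) {
        out[n++] = (unsigned char)len;          /* 7-bit length */
    } else if (len <= 0xffff) {
        out[n++] = 126;                         /* 16-bit length follows */
        out[n++] = (unsigned char)(len >> 8);
        out[n++] = (unsigned char)(len & 0xff);
    } else {
        out[n++] = 127;                         /* 64-bit length follows */
        for (shift = 56; shift >= 0; shift -= 8)
            out[n++] = (unsigned char)((len >> shift) & 0xff);
    }
    return n;
}

/*
 * Header for the fragment starting at byte `off` of a `total`-byte
 * message, sending at most `chunk` payload bytes per fragment.
 * Stores the fragment's payload size in *frag_len.
 */
size_t ws_fragment_header(unsigned char *out, unsigned msg_opcode,
                          uint64_t total, uint64_t off, uint64_t chunk,
                          uint64_t *frag_len)
{
    uint64_t remaining, n;

    assert(frag_len != NULL && off < total && chunk > 0);
    remaining = total - off;
    n = remaining < chunk ? remaining : chunk;
    *frag_len = n;
    return ws_frame_header(out, off + n == total,
                           off == 0 ? msg_opcode : WS_OP_CONTINUATION, n);
}
```

With `chunk` chosen no larger than the per-socket buffer reservation, each fragment is a self-contained write, which is the property the nonblocking path relies on.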

>> > I'm currently using 1.22. Is the situation better in the git head
>> > version?
>> >
>> > If not, is this enhancement in the short-term roadmap?
>>
>> If you can demonstrate the truncated send issue exists under the
>> conditions above I will add support for dealing with it.
>>
>I do not have the time to do the experiment myself but I can give some
>ideas and links to help better appreciate the issue.
>
>1. In Stevens UNIX Network Programming Volume 1, you can check what
>Stevens has to say. I do not have my copy around but I can probably
>quote a paragraph from the non-blocking i/o chapter tomorrow saying
>that no matter what, you must prepare to handle partial writes.

I agree with it, but if the kernel cooperates by only signalling writability when a whole packet fits, and the code never issues packets larger than that, the problem is worked around.
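For comparison, the general handling Stevens describes can be sketched as below. This is a generic illustration, not what lws does at this point; struct pending_tx, set_nonblocking() and flush_pending() are hypothetical names. The sketch assumes the caller keeps the buffer alive until flush_pending() reports completion, which is exactly the buffer-lifetime requirement that comes up earlier in this reply.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* A buffer that is partly sent; `off` counts the bytes already accepted. */
struct pending_tx {
    const unsigned char *buf;
    size_t len;
    size_t off;
};

/* Put the descriptor into nonblocking mode.  Returns 0 or -1. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Push as much of `tx` as the socket accepts right now.
 * Returns 1 when everything has been sent, 0 when the socket would
 * block (wait for POLLOUT and call again), -1 on a real error.
 */
int flush_pending(int fd, struct pending_tx *tx)
{
    assert(tx != NULL && tx->off <= tx->len);
    while (tx->off < tx->len) {
        ssize_t n = send(fd, tx->buf + tx->off, tx->len - tx->off, 0);

        if (n > 0) {
            tx->off += (size_t)n;   /* partial write: remember progress */
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return 0;
        return -1;
    }
    return 1;
}
```

The caller would go back to poll() whenever this returns 0 and retry on the next POLLOUT, without touching the buffer in between.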

>2. Read Linux source code that handles non-blocking i/o. I would think
>part of it could be found under fs subfolder.

Dude that's defined in the networking stack.  This isn't anything to do with file nonblocking io.  I did try to study it but the behaviour about deciding when there is 'enough' buffer reservation on a socket vs memory pressure was too difficult to understand.

>3. To easily reproduce and assuming (payload len < SO_SNDBUF value &&
>SO_SNDBUF value % payload len != 0), just use a network simulator
>gateway that drops TCP ACK segments randomly and have a server sending
>at a decent pace payloads of size equal to SO_SNDBUF value/2 -1.
>
>Also note that even if your send buffer is perfectly x times the size
>of your payload len, you are dependent on how your stack will segment
>the stream based on its MTU and also dependent on what is the
>acknowledgement strategy of the peer stack and packet drops of the
>network and possibly many others variables that are out of control of a
>single process.

Yeah... no doubt that leaves little room in the socket's buffer.  But the decision on the kernel side is way more complex.  I found this problem originally by putting the server under memory pressure, not by network behaviour.  Under those conditions the kernel estimates that waiting for more buffer space may not lead to good throughput and signals POLLOUT early.

However, since adding SO_SNDBUF I can't reproduce it.

The trac bug currently has someone saying they can reproduce, so I'll follow that up.  Unfortunately this is happening just as I am about to enter 'I have no time at all' mode.

-Andy





