[Libwebsockets] trouble with external POLL array handling

Edwin van den Oetelaar oetelaar.automatisering at gmail.com
Wed Jan 16 20:41:54 CET 2013


Hello Andy and other folks,

I was already very impressed with the speed improvements of the last
few days; now it is time for another approach.
We were using a hash table to find the index of the fd, which seems logical.
Let's see how many fds there can be, i.e. the maximum fd whose index we
need to look up.
Some research gave me:

    int k = getdtablesize();
    fprintf(stderr, "I can open %d files\n", k);
    // this means my lookup table has to be this large to be safe!!
    // getdtablesize() returns the maximum number of files a process
    // can have open, one more than the largest possible value for a file
    // descriptor.
    // source : http://linux.die.net/man/2/getdtablesize

This reports 32000 on my system; on normal systems the default is much
lower (often 1024 or 4096, I think).
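
The same limit can also be read through getrlimit(RLIMIT_NOFILE), which is the POSIX interface for it. A minimal sketch, assuming a helper called max_open_fds() (a hypothetical name, not part of libwebsockets):

```c
#include <sys/resource.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only: returns the soft limit on
 * open file descriptors for this process, or -1 if it cannot be determined.
 */
long max_open_fds(void)
{
    struct rlimit rl;

    /* The soft limit is what open()/socket() are actually checked against. */
    if (getrlimit(RLIMIT_NOFILE, &rl) == 0 && rl.rlim_cur != RLIM_INFINITY)
        return (long)rl.rlim_cur;

    /* Fall back to sysconf(), which returns -1 when there is no fixed limit. */
    return sysconf(_SC_OPEN_MAX);
}
```

A value obtained this way could be used to size the lookup table at startup instead of hard-coding a constant, at the cost of allocating the table dynamically.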

So we go to all this trouble just to look up an int-to-int mapping with
a maximum of 32000 entries?

I tried this:

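#ifdef EXTERNAL_POLL
#define MAX_POLL_ELEMENTS (32000)

/* file-scope arrays are zero-initialized; "= {}" is not standard C before C23 */
struct pollfd pollfds[MAX_POLL_ELEMENTS];
int fd_lookup[MAX_POLL_ELEMENTS]; /* fd -> index into pollfds[] */
#endif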

and replaced the external poll handling with this:

#ifdef EXTERNAL_POLL
        /*
         * callbacks for managing the external poll() array appear in
         * protocol 0 callback
         */

    case LWS_CALLBACK_ADD_POLL_FD:
        if (count_pollfds == MAX_POLL_ELEMENTS) {
            fprintf(stderr, "LWS_CALLBACK_ADD_POLL_FD: too many sockets to track\n");
            return 1;
        }
        // fd is used directly as the index, so it must fit the table
        assert(fd >= 0 && fd < MAX_POLL_ELEMENTS);

        fd_lookup[fd] = count_pollfds; // no hashtable anymore

        pollfds[count_pollfds].fd = (int) (long) user;
        pollfds[count_pollfds].events = (int) len;
        pollfds[count_pollfds++].revents = 0;
        break;

    case LWS_CALLBACK_DEL_POLL_FD:
        if (v) fprintf(stderr, "Removing fd %d\n", fd);

        m = fd_lookup[fd]; // m is position of fd in pollfds[] array
        count_pollfds--;
        if (m != count_pollfds) {
            pollfds[m] = pollfds[count_pollfds]; // move last into the freed slot
            fd_lookup[pollfds[m].fd] = m; // the moved fd now lives at index m
        }
        pollfds[count_pollfds].fd = -1; // invalidate last item
        fd_lookup[fd] = -1; // invalidate
        break;

    case LWS_CALLBACK_SET_MODE_POLL_FD:
        if (v) fprintf(stderr, "Set Poll mode fd=%d mode=%d\n", fd, (int) len);
        n = fd_lookup[fd]; // n is position of fd in pollfds[] array

        pollfds[n].events |= (int) (long) len;
        break;

    case LWS_CALLBACK_CLEAR_MODE_POLL_FD:
        n = fd_lookup[fd]; // n is position of fd in pollfds[] array
        if (v) fprintf(stderr, "Clear Poll mode fd=%d mode=%d\n", fd, (int) len);
        pollfds[n].events &= ~(int) (long) len;
        break;
#endif
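
To make the bookkeeping easier to test outside the callback, the same idea can be written as a small standalone table. This is only a sketch under my own assumptions: the names table_init(), table_add(), table_del() and table_slot() and the MAX_FDS bound are made up for illustration and are not libwebsockets API. One detail worth handling is that when the last entry is moved into the freed slot, its lookup entry has to follow it.

```c
#include <poll.h>

#define MAX_FDS 1024                /* assumed bound, for illustration only */

static struct pollfd pfds[MAX_FDS]; /* densely packed array handed to poll() */
static int fd_to_slot[MAX_FDS];     /* fd -> index in pfds[], -1 if absent */
static int npfds;                   /* number of entries in use */

/* Mark every fd as absent; zero would wrongly mean "slot 0". */
void table_init(void)
{
    int i;

    for (i = 0; i < MAX_FDS; i++)
        fd_to_slot[i] = -1;
    npfds = 0;
}

/* Append fd at the end of pfds[] and remember its slot. */
int table_add(int fd, short events)
{
    if (fd < 0 || fd >= MAX_FDS || npfds == MAX_FDS || fd_to_slot[fd] != -1)
        return -1;

    fd_to_slot[fd] = npfds;
    pfds[npfds].fd = fd;
    pfds[npfds].events = events;
    pfds[npfds].revents = 0;
    npfds++;
    return 0;
}

/* Remove fd by moving the last entry into its slot (O(1), keeps pfds[] dense). */
int table_del(int fd)
{
    int slot, last;

    if (fd < 0 || fd >= MAX_FDS || fd_to_slot[fd] == -1)
        return -1;

    slot = fd_to_slot[fd];
    last = --npfds;
    if (slot != last) {
        pfds[slot] = pfds[last];
        fd_to_slot[pfds[slot].fd] = slot; /* the moved fd has a new index */
    }
    pfds[last].fd = -1;
    fd_to_slot[fd] = -1;
    return 0;
}

/* Look up the slot of fd, or -1 if it is not tracked. */
int table_slot(int fd)
{
    return (fd >= 0 && fd < MAX_FDS) ? fd_to_slot[fd] : -1;
}
```

The trade-off is memory for speed: the lookup array costs one int per possible fd, but every add, delete and mode change is a single array access.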

This is even faster than the hash table approach.
It is also simpler, and uses little memory or CPU.

Do you want to try and benchmark it?
Maybe this could be a way to handle other stuff too?

Best regards,
Edwin van den Oetelaar


On Wed, Jan 16, 2013 at 2:24 AM, "Andy Green (林安廸)" <andy at warmcat.com> wrote:
> On 16/01/13 00:28, the mail apparently from "Andy Green (林安廸)" included:
>
>> On 15/01/13 18:03, the mail apparently from "Andy Green (林安廸)" included:
>>>
>>> On 15/01/13 17:54, the mail apparently from Edwin van den Oetelaar
>>> included:
>>>>
>>>> Under high concurrency and high load I notice connection refused.
>>>>
>>>> socket.h
>>>> #define SOMAXCONN    128
>>>>
>>>> in libwebsockets/lib/libwebsockets.c
>>>> line 2965: listen(sockfd, SOMAXCONN );
>>>>
>>>> This seems to be logical, and I never looked at it before but....
>>>> since I can do stuff like :
>>>>
>>>> echo "2048 64512" > /proc/sys/net/ipv4/ip_local_port_range
>>>> echo "1" > /proc/sys/net/ipv4/tcp_tw_recycle
>>>> echo "1" > /proc/sys/net/ipv4/tcp_tw_reuse
>>>> echo "10" > /proc/sys/net/ipv4/tcp_fin_timeout
>>>>
>>>>>>>>> echo "65536" > /proc/sys/net/core/somaxconn
>>>
>>>
>>> ha
>>>
>>>> echo "65536" > /proc/sys/net/ipv4/tcp_max_syn_backlog
>>>> echo "262144" > /proc/sys/net/netfilter/nf_conntrack_max
>>>>
>>>> Maybe the listen() backlog should be set to a large value (which will
>>>> be truncated automatically to what the system can handle) or
>>>> configurable?
>>>
>>>
>>> Yes configurable is the way I think, I'll add a patch for it later
>>> tonight.
>>>
>>> Actually these big instantaneous backlogs are always going to be
>>> synthetic in origin I would think.  Of course we then need to support it
>>> to see it work with the synthetic tests, so it doesn't get us out of the
>>> problem.  But then having it configurable will be enough.
>>>
>>>> Another way would be to handle the listen() fd differently and accept
>>>> more than 1 connection per loop over the fd list?
>>>>
>>>> Just some thoughts,
>>>
>>>
>>> I think it's OK to leave them as backlog, since they are newbies to the
>>> server it's better they wait a little under load than everybody else
>>> gets latency because we favour new joiners.
>>
>>
>> Your idea was a good one... this is the only thing I found that will
>> shift the symptom
>>
>> diff --git a/test-server/test-server.c b/test-server/test-server.c
>> index ed316ac..7219de5 100644
>> --- a/test-server/test-server.c
>> +++ b/test-server/test-server.c
>> @@ -689,6 +689,15 @@ int main(int argc, char **argv)
>>                                          if (libwebsocket_service_fd(context,
>>                                                                   &pollfds[n]) < 0)
>>                                                  goto done;
>> +
>> +
>> +               n = 1;
>> +               while (n > 0) {
>> +                       n = poll(pollfds, 1, 0);
>> +                       if (n > 0)
>> +                               libwebsocket_service_fd(context, &pollfds[0]);
>> +               }
>> +
>>   #else
>>                  n = libwebsocket_service(context, 50);
>>   #endif
>>
>> (the patch is a hack not for commit, it relies on some special knowledge
>> that first fd in there is the listen socket)
>>
>>
>> [agreen@kaiji libwebsockets]$ ab -t 100 -n 5000 -c 300 'http://127.0.0.1:7681/'
>> This is ApacheBench, Version 2.3 <$Revision: 1373084 $>
>> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
>> Licensed to The Apache Software Foundation, http://www.apache.org/
>>
>> Benchmarking 127.0.0.1 (be patient)
>> Completed 500 requests
>> Completed 1000 requests
>> Completed 1500 requests
>> Completed 2000 requests
>> Completed 2500 requests
>> Completed 3000 requests
>> Completed 3500 requests
>> Completed 4000 requests
>> Completed 4500 requests
>> Completed 5000 requests
>> Finished 5000 requests
>>
>>
>> Server Software:        libwebsockets
>> Server Hostname:        127.0.0.1
>> Server Port:            7681
>>
>> Document Path:          /
>> Document Length:        8447 bytes
>>
>> Concurrency Level:      300
>> Time taken for tests:   9.555 seconds
>> Complete requests:      5000
>> Failed requests:        0
>> Write errors:           0
>> Total transferred:      42680000 bytes
>> HTML transferred:       42235000 bytes
>> Requests per second:    523.28 [#/sec] (mean)
>> Time per request:       573.304 [ms] (mean)
>> Time per request:       1.911 [ms] (mean, across all concurrent requests)
>> Transfer rate:          4362.05 [Kbytes/sec] received
>>
>> Connection Times (ms)
>>                min  mean[+/-sd] median   max
>> Connect:        0   10  27.0      2     173
>> Processing:   266  550  57.7    557     683
>> Waiting:       20   54  17.0     52     134
>> Total:        271  561  53.9    565     683
>>
>> Percentage of the requests served within a certain time (ms)
>>    50%    565
>>    66%    582
>>    75%    594
>>    80%    602
>>    90%    620
>>    95%    631
>>    98%    647
>>    99%    661
>>   100%    683 (longest request)
>
>
> I added a patch in the library to formalize that we gratuitously check the
> listen socket for pending connections every n non-listen socket fd services.
> It also autodetects connection storms as found in ab and adapts while that
> is ongoing to service up to 2 listen socket connections per normal socket
> service.
>
> http://git.libwebsockets.org/cgi-bin/cgit/libwebsockets/commit/?id=65b0e910610fce54bc88df19b530c2294575f55c
>
> I also optimized the http file send code..
>
> http://git.libwebsockets.org/cgi-bin/cgit/libwebsockets/commit/?id=583f8b3b11aa68d9823786dfdd8d28c4352907e3
>
> ...the same test that was 9.5s last night is now able to complete in 3.4s
>
>
> $ ab -t 100 -n 5000 -c 300 'http://127.0.0.1:7681/'
> This is ApacheBench, Version 2.3 <$Revision: 1373084 $>
> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
> Licensed to The Apache Software Foundation, http://www.apache.org/
>
> Benchmarking 127.0.0.1 (be patient)
> Completed 500 requests
> Completed 1000 requests
> Completed 1500 requests
> Completed 2000 requests
> Completed 2500 requests
> Completed 3000 requests
> Completed 3500 requests
> Completed 4000 requests
> Completed 4500 requests
> Completed 5000 requests
> Finished 5000 requests
>
>
> Server Software:        libwebsockets
> Server Hostname:        127.0.0.1
> Server Port:            7681
>
> Document Path:          /
> Document Length:        8447 bytes
>
> Concurrency Level:      300
> Time taken for tests:   3.400 seconds
>
> Complete requests:      5000
> Failed requests:        0
> Write errors:           0
> Total transferred:      42680000 bytes
> HTML transferred:       42235000 bytes
> Requests per second:    1470.76 [#/sec] (mean)
> Time per request:       203.976 [ms] (mean)
> Time per request:       0.680 [ms] (mean, across all concurrent requests)
> Transfer rate:          12260.17 [Kbytes/sec] received
>
>
> Connection Times (ms)
>               min  mean[+/-sd] median   max
> Connect:        7   24  15.6     20     125
> Processing:    32  172  50.2    161     407
> Waiting:       27  154  49.4    142     386
> Total:         81  196  48.3    182     428
>
>
> Percentage of the requests served within a certain time (ms)
>   50%    182
>   66%    185
>   75%    188
>   80%    194
>   90%    304
>   95%    316
>   98%    322
>   99%    328
>  100%    428 (longest request)
>
> There's still spread in the latencies but I think that's just normal now.
>
> -Andy
>


