Faster HTTP usage

What module do you use when doing any kind of HTTP programming in Perl - LWP? It is by far the most popular module for the HTTP protocol in Perl, and one which just works, and works well, with a minimal amount of application code. You might be put off by the 123 bug reports in the LWP RT queue, but in all my time I've never hit one myself - other than an issue which turned out to be Perl rather than LWP, to do with tainted data used with LWP (see Why is taint mode so slow?).

I've recently been trying to speed up a small application that makes a huge number of HTTP GETs and small POSTs. One important requirement was cookie support, as the initial POST is a login (where a cookie is set) and subsequent POSTs need to pass that cookie back. Whenever I profile the application with Devel::NYTProf (thank you Tim), LWP-related modules come up at the top (although in interesting areas like select()). Knowing that LWP is a lot of Perl code, the obvious thing to look at was an XS module backed by compiled C. We tried GHTTP, Furl and WWW::Curl::Easy in addition to LWP and there was a clear winner (for us) - WWW::Curl::Easy. Comparing the code using LWP and WWW::Curl::Easy for a simple GET:

use LWP::UserAgent;
use HTTP::Request;

my $url = 'http://xxx.yyy.zzz:82/x.dat';
my $ua = LWP::UserAgent->new(agent => 'Bet');
my $get = HTTP::Request->new(GET => $url);
my $response = $ua->request($get);
die "argh ", $response->message if !$response->is_success;

and with WWW::Curl::Easy:

use WWW::Curl::Easy;
my $url = 'http://xxx.yyy.zzz:82/x.dat';
my $curl = WWW::Curl::Easy->new;
$curl->setopt(CURLOPT_NOPROGRESS, 1); # shut off the built-in progress meter
$curl->setopt(CURLOPT_HEADER, 0); # don't include hdr in body
$curl->setopt(CURLOPT_URL, $url);
my $response_body;
open (my $fileb, "+>", \$response_body);
$curl->setopt(CURLOPT_WRITEDATA, $fileb);
my $head_ref;
open (my $fileh, "+>", \$head_ref);
$curl->setopt(CURLOPT_WRITEHEADER, $fileh);
my $retcode = $curl->perform;
die "argh ", $curl->errbuf unless $retcode == 0;

So, not much more code, just a little extra to write the HTTP headers separately from the body, which makes parsing them easier later. You also need to remember that, because Curl is writing the header data to a scalar via a file handle, you need to seek to the start and truncate it between requests unless you set WRITEHEADER each time. Of course, reading the headers is considerably easier in LWP.
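
Concretely, reusing the header handle from the example above, something like this between requests does the job (a minimal sketch):

# rewind and empty the in-memory header handle before the next request,
# otherwise the new headers just get appended after the old ones
seek $fileh, 0, 0;
truncate $fileh, 0;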

WWW::Curl::Easy is a fairly simple wrapper around libcurl, which is written in C. You might initially be put off by the lack of POD, but in fact it does not need much: since it is just a wrapper around libcurl, most of the documentation lives in libcurl itself. The equivalent LWP and WWW::Curl::Easy code shows quite clearly that LWP is far more concise (in Perl terms), but the speedup we got using libcurl was worth having. For 2000 simple GET requests we got:

LWP: 9 wallclock secs ( 5.76 usr + 1.05 sys = 6.81 CPU) @ 293.69/s (n=2000)
GHTTP: 3 wallclock secs ( 0.50 usr + 0.70 sys = 1.20 CPU) @ 1666.67/s (n=2000)
Curl: 3 wallclock secs ( 0.26 usr + 0.25 sys = 0.51 CPU) @ 3921.57/s (n=2000)
Furl: 3 wallclock secs ( 0.93 usr + 0.22 sys = 1.15 CPU) @ 1739.13/s (n=2000)
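
For reference, Benchmark output like the above comes from a harness along these lines (a sketch; lwp_get() and curl_get() are hypothetical subs wrapping the two snippets above, and GHTTP and Furl were added the same way):

use Benchmark qw(timethese);

timethese(2000, {
    LWP  => sub { lwp_get() },
    Curl => sub { curl_get() },
});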

So far so good. Changing to WWW::Curl::Easy seemed a no-brainer, except that when I tried POSTs it was massively slower. Strangely, it used far less CPU time but was more than twice as slow in real time:

Benchmark: timing 1000 iterations of curl, lwp...
curl: 56 wallclock secs ( 0.16 usr + 0.09 sys = 0.25 CPU) @ 4000.00/s (n=1000)
(warning: too few iterations for a reliable count)
lwp: 22 wallclock secs ( 4.29 usr + 0.60 sys = 4.89 CPU) @ 204.50/s (n=1000)

Those numbers seemed strange to me: why was Curl apparently waiting, when it was clearly spending far less CPU time? A quick experiment with the Curl options showed that setting either CURLOPT_TCP_NODELAY or CURLOPT_FORBID_REUSE changed the results to:

Benchmark: timing 1000 iterations of curl, lwp...
curl: 18 wallclock secs ( 0.28 usr + 0.10 sys = 0.38 CPU) @ 2631.58/s (n=1000)
(warning: too few iterations for a reliable count)
lwp: 22 wallclock secs ( 4.33 usr + 0.61 sys = 4.94 CPU) @ 202.43/s (n=1000)
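
For reference, the two options in question are set like this (assuming your libcurl and WWW::Curl build export these constants; either one on its own was enough to remove the stall for me):

# disable the Nagle algorithm on the socket
$curl->setopt(CURLOPT_TCP_NODELAY, 1);
# ...or force a fresh connection for every request
$curl->setopt(CURLOPT_FORBID_REUSE, 1);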

But surely the Nagle algorithm was not involved here: I was only doing one POST, so surely Curl was doing only a single write to the socket! I'd looked at the WWW::Curl POD and WWW::Curl::Form has a nice formadd method for adding form arguments, so my code looked like:

    # $curlp and $curl_hdr_fh are set up once, much as in the wrapper shown later;
    # rewind and empty the in-memory header handle before reusing them
    seek $curl_hdr_fh, 0, 0;
    truncate $curl_hdr_fh, 0;

    my $url = 'http://xxx.yyy.zzz:82';
    $curlp->setopt(CURLOPT_URL, $url);

    my $form = WWW::Curl::Form->new;
    $form->formadd('method', 'send_keep_alive');
    $curlp->setopt(CURLOPT_HTTPPOST, $form);
    my @hdrs = ('Expect:');
    $curlp->setopt(CURLOPT_HTTPHEADER, \@hdrs);
    my $response_body;
    open(my $fileb, "+>", \$response_body);
    $curlp->setopt(CURLOPT_WRITEDATA, $fileb);
    my $retcode = $curlp->perform;
    my $response_code =  $curlp->getinfo(CURLINFO_HTTP_CODE);
    if ($retcode != 0 || ($response_code != 200 && $response_code != 304)) {
        die "Failed to get url - $response_code, ", $curlp->errbuf;
    }
    #print $response_body, "\n";

Examining an strace of the code I could see two writes to the socket connected to the HTTP server: the first for the HTTP header and the second for the POSTed data (which is very small). This is where Nagle intervenes: if the second write is small and happens before the ACK for the first has come back, Nagle holds on to the data briefly in the hope that you are going to write more. Thanks to the nice people on the libcurl mailing list the problem was identified: formadd builds a multipart form, and internally in libcurl this results in two writes to the socket. There is, however, a way to do a simple urlencoded POST like this:

    # as before, $curlp and $curl_hdr_fh are already set up;
    # uri_escape_utf8 comes from URI::Escape
    seek $curl_hdr_fh, 0, 0;
    truncate $curl_hdr_fh, 0;

    $curlp->setopt(CURLOPT_POST, 1);
    my $url = 'http://xxx.yyy.zzz:82';
    $curlp->setopt(CURLOPT_URL, $url);

    my %args = (method => 'send_keep_alive');
    my $rp = '';
    while (my ($key, $value) = each %args) {
        $rp .= uri_escape_utf8($key) . '=' . uri_escape_utf8($value) . '&';
    }
    chop $rp; # remove the trailing &
    $curlp->setopt(CURLOPT_POSTFIELDS, $rp);
    my @hdrs = ('Expect:',
                'Content-Type: application/x-www-form-urlencoded;charset=utf-8');
    $curlp->setopt(CURLOPT_HTTPHEADER, \@hdrs);
    my $response_body;
    open (my $fileb, "+>", \$response_body);
    $curlp->setopt(CURLOPT_WRITEDATA, $fileb);
    my $retcode = $curlp->perform;
    my $response_code =  $curlp->getinfo(CURLINFO_HTTP_CODE);
    if ($retcode != 0 || ($response_code != 200 && $response_code != 304)) {
        die "Failed to get url - $response_code, ", $curlp->errbuf;
    }
    #print $response_body, "\n";

and this results in a single write to the socket and hence no Nagle delay \o/

Benchmarking again I now get:

Benchmark: timing 1000 iterations of curl, lwp...
curl: 17 wallclock secs ( 0.36 usr + 0.08 sys = 0.44 CPU) @ 2272.73/s (n=1000)
lwp: 22 wallclock secs ( 4.14 usr + 0.71 sys = 4.85 CPU) @ 206.19/s (n=1000)

Finally there was the issue of cookies. In LWP, using cookies and storing them to a file is simple:

    use LWP::UserAgent;
    use HTTP::Cookies;

    my $ua = LWP::UserAgent->new;

    $ua->env_proxy;
    my $cookie_jar = HTTP::Cookies->new(
        file     => 'cookie_file',
        autosave => 1,
    );
    $ua->cookie_jar($cookie_jar);

As I'm using the same LWP::UserAgent for all GET and POST requests, and they are to the same domain, I can issue a POST to log in and follow it with a GET (which needs the cookie) and all is well; LWP just does the right thing. However, with Curl I'm using a separate WWW::Curl::Easy object for GETting and for POSTing, because I never found a way of resetting the request method on a single object, so the cookie obtained in the POST is not available to the following GETs. Fortunately, Curl has a way of sharing cookies between easy handles, so I ended up with something like the following in my own wrapper:

        $self->{protocol_share} = XXX::Protocol::Curl::create_share();
        $self->{protocolg} = XXX::Protocol::Curl->new(
            cookie_file => 'cookie_file',
            share       => $self->{protocol_share});
        $self->{protocolp} = XXX::Protocol::Curl->new(
            cookie_file => 'cookie_file',
            share       => $self->{protocol_share});

and XXX::Protocol::Curl itself looks something like:

package XXX::Protocol::Curl;

use strict;
use warnings;
use Carp;
use URI::Escape;      # for uri_escape_utf8
use HTTP::Response;
use WWW::Curl::Easy;
use WWW::Curl::Share;

sub create_share {
    my $curlsh = WWW::Curl::Share->new();
    $curlsh->setopt(CURLSHOPT_SHARE, CURL_LOCK_DATA_COOKIE);
    return $curlsh;
}

sub new {
    my ($class, %attrs) = @_;

    croak ("Need a cookie file") if !exists($attrs{cookie_file});

    my $self = {};

    $self->{curl} = WWW::Curl::Easy->new();
    if (exists $attrs{share}) {
        $self->{curl}->setopt(CURLOPT_SHARE, $attrs{share});
    }

    $self->{curl}->setopt(CURLOPT_NOPROGRESS, 1); # shut off the built-in progress meter
    $self->{curl}->setopt(CURLOPT_HEADER, 0); # don't include hdr in body
    $self->{curl}->setopt(CURLOPT_ENCODING, 'gzip');
    $self->{curl}->setopt(CURLOPT_PROXY, ''); # no proxy
    my $curl_hdr = '';
    $self->{curl_hdr_fh} = undef;
    open $self->{curl_hdr_fh}, "+>", \$curl_hdr;
    $self->{curl}->setopt(CURLOPT_WRITEHEADER, $self->{curl_hdr_fh});
    $self->{curl}->setopt(CURLOPT_COOKIEFILE, $attrs{cookie_file});
    $self->{curl}->setopt(CURLOPT_COOKIEJAR, $attrs{cookie_file});

    return bless $self, $class;
}

sub ourpost {
    my ($self, $url, $postargs) = @_;

    my $hdr_fh = $self->{curl_hdr_fh};

    seek $hdr_fh, 0, 0;
    truncate $hdr_fh, 0;

    $self->{curl}->setopt(CURLOPT_POST, 1); # we are POSTing
    $self->{curl}->setopt(CURLOPT_URL, $url);

    my $postdata = '';
    while (my ($key, $val) = each %$postargs) {
        $postdata .= uri_escape_utf8($key) . '=' . uri_escape_utf8($val) . '&';
    }
    chop $postdata; # remove the trailing &
    $self->{curl}->setopt(CURLOPT_POSTFIELDS, $postdata);

    $self->{curl}->setopt(
        CURLOPT_HTTPHEADER,
        ['Expect:',
         'Content-Type: application/x-www-form-urlencoded;charset=utf-8']);

    my $response_body;
    open(my $fileb, "+>", \$response_body);
    $self->{curl}->setopt(CURLOPT_WRITEDATA, $fileb);

    my $ret = $self->{curl}->perform;

    my $response_code =  $self->{curl}->getinfo(CURLINFO_HTTP_CODE);

    if ($ret != 0) {
        confess("Failed to get ", $url, ", ", $response_code, ", ",
            $self->{curl}->errbuf);
    }

    # get the headers
    seek $hdr_fh, 0, 0;
    my @rhdrs;
    while (<$hdr_fh>) {
        #print "$_";
        chomp;
        if (m/^([^:]+):(.*)$/) {
            push @rhdrs, $1, $2;
        }
    }

    my $response = HTTP::Response->new(
        $response_code, $self->{curl}->errbuf, \@rhdrs, $response_body);

    return $response;
}
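
The GET side of the wrapper (used by the protocolg object above) is not shown here; a hypothetical ourget following the same pattern might look like this (the CURLOPT_HTTPGET line is my addition and is only needed if a handle has ever been switched to POST):

sub ourget {
    my ($self, $url) = @_;

    my $hdr_fh = $self->{curl_hdr_fh};
    seek $hdr_fh, 0, 0;
    truncate $hdr_fh, 0;

    # only needed if this easy handle has ever been switched to POST
    $self->{curl}->setopt(CURLOPT_HTTPGET, 1);
    $self->{curl}->setopt(CURLOPT_URL, $url);

    my $response_body;
    open(my $fileb, "+>", \$response_body);
    $self->{curl}->setopt(CURLOPT_WRITEDATA, $fileb);

    my $ret = $self->{curl}->perform;
    my $response_code = $self->{curl}->getinfo(CURLINFO_HTTP_CODE);
    if ($ret != 0) {
        confess("Failed to get ", $url, ", ", $response_code, ", ",
            $self->{curl}->errbuf);
    }

    # parse the headers written to the in-memory handle
    seek $hdr_fh, 0, 0;
    my @rhdrs;
    while (<$hdr_fh>) {
        chomp;
        push @rhdrs, $1, $2 if m/^([^:]+):(.*)$/;
    }

    return HTTP::Response->new(
        $response_code, $self->{curl}->errbuf, \@rhdrs, $response_body);
}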

So in addition to getting a speedup from using libcurl, my application code hardly changes, since I return an HTTP::Response for both LWP and Curl.
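
For example, the calling code just deals in HTTP::Response objects whichever backend produced them (a sketch using the wrapper objects created above):

    my $response = $self->{protocolp}->ourpost(
        'http://xxx.yyy.zzz:82', {method => 'send_keep_alive'});
    die "argh ", $response->code unless $response->is_success;
    my $body = $response->content;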

Now when I profile the code using Curl the HTTP parts barely register and my application is loads faster.

Update: In the Perl Weekly of September 19, 2011 Gabor Szabo said "The code is more complex though and I am not sure if being faster on the CPU really matters. After all these both solve network intensive tasks. Waiting for the HTTP request still takes a lot more time than the CPU usage". I would have liked to answer that, but there does not seem to be a way to comment on the Perl Weekly. First, I don't know of any computers which run only one process, so the less CPU time your process uses, the more is available to the other processes. Second, WWW::Curl::Easy is faster in real time than LWP; if you don't think 2000 GETs in 3s instead of 9s is an improvement you need, then you don't need to bother using Curl - I do want that improvement.

Update 2: When I ran this code against our actual application, the POSTs were still taking longer than with LWP, even though my test code shows the opposite. For the moment I'm still setting CURLOPT_FORBID_REUSE.

Comments

Test repeated with new modules

Since I wrote this, a few new modules have been brought to my attention, e.g., Hijk and HTTP::Tiny. I thought it was about time I revisited this (and Mr Bunce prodded me a little too), as if there is anything faster than our current Curl solution it would be fairly easy to switch. However, we need GET, POST and cookie support, which I think makes using Hijk a little more difficult.
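
For illustration, a bare Hijk GET looks roughly like this (a sketch based on the Hijk docs; any cookie would have to be added and read back by hand via the request and response headers):

use Hijk;

my $res = Hijk::request({
    host   => "xxx.yyy.zzz",
    port   => 82,
    path   => "/x.dat",
    method => "GET",
});
die "argh, status $res->{status}" unless $res->{status} == 200;
my $body = $res->{body};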

For now I haven't looked into cookie support or POST, but here are my results for 2000 GETs of a JSON-encoded file of 335114 bytes.

The lwpghttp variant is LWP with LWP::Protocol::implementor('http', 'LWP::Protocol::GHTTP') set.
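
Roughly, that looks like this (a sketch, assuming LWP::Protocol::GHTTP and HTTP::GHTTP are installed):

use LWP::UserAgent;
use LWP::Protocol::GHTTP;

# make LWP use the GHTTP-based handler for the http scheme
LWP::Protocol::implementor('http', 'LWP::Protocol::GHTTP');

my $ua  = LWP::UserAgent->new;
my $res = $ua->get('http://xxx.yyy.zzz:82/x.dat');
die "argh ", $res->message if !$res->is_success;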

The http_tiny here is actually the fork from https://github.com/clintongormley/p5-http-tiny with keep_alive => 1 set (which Tim asked for). BTW, in this test keep_alive => 1 made no difference, so I hope I was doing it right (but see below):

use Benchmark;  # for timethis
use HTTP::Tiny; # the forked version with keep_alive support

# $its (2000) and $url are set up as for the other tests
my $http_tiny = HTTP::Tiny->new(keep_alive => 1);
timethis($its, 'http_tiny()');

sub http_tiny {
    my $response = $http_tiny->get($url);
}

lwp       2000: 23 wallclock secs (16.54 usr +  4.66 sys = 21.20 CPU) @ 94.34/s (n=2000)
lwpghttp  2000: 16 wallclock secs ( 6.48 usr +  7.17 sys = 13.65 CPU) @ 146.52/s (n=2000)
ghttp     2000:  9 wallclock secs ( 1.72 usr +  3.38 sys =  5.10 CPU) @ 392.16/s (n=2000)
curl      2000:  5 wallclock secs ( 1.02 usr +  2.83 sys =  3.85 CPU) @ 519.48/s (n=2000)
furl      2000: 10 wallclock secs ( 4.56 usr +  2.01 sys =  6.57 CPU) @ 304.41/s (n=2000)
hijk      2000:  7 wallclock secs ( 3.23 usr +  1.79 sys =  5.02 CPU) @ 398.41/s (n=2000)
http_tiny 2000: 13 wallclock secs ( 6.72 usr +  3.90 sys = 10.62 CPU) @ 188.32/s (n=2000)

Versions used were:

LWP::UserAgent 6.04
HTTP::GHTTP 1.07
WWW::Curl::Easy 4.15
Furl 3.01
Hijk 0.12
HTTP::Tiny 0.017

It was suggested that I use Dumbbench, so here are the results for the same test, except that I didn't run lwpghttp and added HTTP::Tiny with and without keep_alive => 1.

LWP: Ran 2551 iterations (533 outliers).
LWP: Rounded run time per iteration: 1.07349e-02 +/- 2.7e-06 (0.0%)
GHTTP: Ran 2811 iterations (793 outliers).
GHTTP: Rounded run time per iteration: 4.11825e-03 +/- 6.2e-07 (0.0%)
WWW::Curl::Easy: Ran 2951 iterations (871 outliers).
WWW::Curl::Easy: Rounded run time per iteration: 2.45655e-03 +/- 9.5e-07 (0.0%)
Furl: Ran 2315 iterations (239 outliers).
Furl: Rounded run time per iteration: 4.6248e-03 +/- 7.2e-06 (0.2%)
hijk: Ran 3098 iterations (1068 outliers).
hijk: Rounded run time per iteration: 3.17664e-03 +/- 7.7e-07 (0.0%)
HTTP::Tiny keep_alive: Ran 2551 iterations (521 outliers).
HTTP::Tiny keep_alive: Rounded run time per iteration: 4.74813e-03 +/- 8.4e-07 (0.0%)
HTTP::Tiny: Ran 2678 iterations (637 outliers).
HTTP::Tiny: Rounded run time per iteration: 4.8193e-03 +/- 1.1e-06 (0.0%)
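
For reference, a Dumbbench harness for a test like this looks roughly as follows (a sketch; lwp_get() and curl_get() stand in for the per-module request subs, and the other modules are added the same way):

use Dumbbench;

my $bench = Dumbbench->new(
    target_rel_precision => 0.005, # stop once the estimate is within 0.5%
    initial_runs         => 20,
);
$bench->add_instances(
    Dumbbench::Instance::PerlSub->new(name => 'LWP',             code => sub { lwp_get() }),
    Dumbbench::Instance::PerlSub->new(name => 'WWW::Curl::Easy', code => sub { curl_get() }),
);
$bench->run;
$bench->report;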

HTTP::Tiny uses Dist::Zilla, which for various reasons I cannot install on this machine, so I just ran the test code with perl -I/path/to/lib pointing at the git repo. I know I was running that copy because $HTTP::Tiny::VERSION was undefined and warn $INC{'HTTP/Tiny.pm'}; printed the correct path.

I did verify I was getting into the keep_alive code in HTTP::Tiny:

    if ( !$drop_connection && $self->{keep_alive} ) {
        my $connection = $response->{headers}{connection} || '';
        #print "KEEP $connection\n"; # <------------ this printed KEEP keep-alive
        if ( $connection eq 'keep-alive'
            or ( $response->{protocol} eq 'HTTP/1.1' and $connection ne 'close' )
            )
        {
            $self->{handle} = $handle;
        }
        else {
            $handle->close;
        }
    }

I assumed keep_alive in the pull request for HTTP::Tiny meant sending an HTTP "Connection: Keep-Alive" header, but I could not see that in the headers sent. It seems it is more "don't close the socket": HTTP/1.1 connections are persistent by default, so no header is needed. I am using HTTP/1.1 in nginx with "keepalive_timeout 65" and "tcp_nodelay on".

HTTP Benchmarking

Clinton mailed this to me because he could not register on this site; I disabled registration when I was getting loads of spam. Clinton's email:

I can't see anything obviously wrong with the keep-alive test on HTTP::Tiny, but I haven't looked too closely there. A comment on the benchmarking in general: you're testing the download of a 300kB file, but for a lot of use cases, this isn't the typical load.

For instance, with Elasticsearch.pm, users will be both sending and receiving, and often much smaller files than 300kB. I currently support an LWP, HTTP::Tiny and NetCurl backend for Elasticsearch, and I've just thrown together a backend for Hijk as well:

* LWP: https://github.com/elasticsearch/elasticsearch-perl/blob/master/lib/Elas...
* HTTP::Tiny https://github.com/elasticsearch/elasticsearch-perl/blob/master/lib/Elas...
* NetCurl https://github.com/elasticsearch/elasticsearch-perl/blob/master/lib/Elas...
* Hijk https://github.com/elasticsearch/elasticsearch-perl/blob/hijk/lib/Elasti...

I ran read and write tests with different body sizes against a local Elasticsearch instance (10,000 read & 10,000 write, run 3 times, all with persistent connections), with these results: https://gist.github.com/clintongormley/8980968

Hijk seems to do very well on these tests, pipping NetCurl at the post on all body sizes. I have no idea what black magic the authors have employed to make this work :) What I don't understand is why LWP is so slow. When I NYTProf'ed it a year ago, it looked like it was spending a lot of time parsing headers for some reason, but I didn't delve any further.

I also compared my fork of HTTP::Tiny with keepalive on and off. (The tests above were run with my fork and keepalive on). I tested this on a Mac, which runs out of sockets pretty quickly if you don't use persistent connections, so this test was run with 1,000 read/write requests run 10 times: https://gist.github.com/clintongormley/8980882

The benchmarking code is here: https://gist.github.com/clintongormley/8980988

Anything I'm missing here?

thanks

Clint