teh bigbro blog(tm)
Bigbro's foray into the scary world of blogging

Thu, 29 Jun 2006

ApacheCon '06 : mod_perl for speed freaks

Philippe Chiasson

We've seen that mod_perl can make your CGI code run a lot faster, without making any changes to the source. Let's look at how we can make things even faster by making some code changes to optimise things.

First, use mod_status and ab for benchmarking; comparing apples with apples is important. When you have mod_perl installed, you can also use Apache2::Status, which is the mod_perl equivalent of mod_status. It adds no overhead and comes with the mod_perl package, so you already have it, and there's no disadvantage to having it configured on a production server (assuming you don't allow the world to access it).
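As a rough sketch of what "configured but not open to the world" might look like, the fragment below follows the pattern in the Apache2::Status documentation. The /perl-status location and the localhost-only restriction are assumptions for illustration; the access-control lines use the Apache 2.0/2.2 syntax of the time (newer servers would use a Require line instead).

```apache
# httpd.conf (sketch)
<Location /perl-status>
    # Keep the status pages private: local access only
    Order Deny,Allow
    Deny from all
    Allow from 127.0.0.1

    SetHandler modperl
    PerlOptions +GlobalRequest
    PerlResponseHandler Apache2::Status
</Location>
```

A benchmark run could then be something like ab -n 1000 -c 10 against the same URL before and after each change, so that the numbers stay comparable.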

A useful tool for looking at memory is GTop for Perl, a binding to libgtop. It only works on Linux systems, but libgtop usually comes with Gnome.

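use strict;
use warnings;
use GTop ();

my $gtop = GTop->new;

# Memory statistics (in bytes) for the current process
my $proc_mem = $gtop->proc_mem($$);
for my $field (qw(size vsize share rss)) {
	printf "%s => %d\n", $field, $proc_mem->$field();
}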
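The same calls can be used to estimate what a single module costs, by measuring the process before and after loading it. This is only a sketch: it assumes GTop is installed, uses POSIX purely as an example module, and the figure will vary between Perl builds and platforms.

```perl
use strict;
use warnings;
use GTop ();

my $gtop = GTop->new;

# Process size in bytes before loading the module under test
my $before = $gtop->proc_mem($$)->size;

# Load at run time so the cost falls between the two measurements
require POSIX;
POSIX->import;    # same effect as "use POSIX;"

# Process size in bytes after loading
my $after = $gtop->proc_mem($$)->size;

printf "POSIX with default imports: %d bytes\n", $after - $before;
```

Running it once with the import line and once without gives a feel for how much of the cost comes from importing rather than loading.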

Perl sacrifices memory for speed, so let's take advantage of module pre-loading to get libraries loaded before forking. All the children can then use shared copies of the libraries, rather than each of them having to load the libraries themselves. This works because the code in libraries rarely changes, so the memory stays shared and the copy-on-write mechanism is never triggered.
PerlModule CGI
PerlModule DBI
...

Perl was meant to be run as a quick scripting language, so in general it uses delayed initialisation and lazy instantiation where possible. Scripts start quickly, since initialisation only occurs when required, but this works against mod_perl: anything deferred until after the fork is repeated in every child and is not shared. Force initialisation at startup to get all the methods initialised and ready:
DBI->install_driver('mysql');
CGI->compile(':all');
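Put together, the pre-loading and forced initialisation above might live in a startup file along these lines. This is a sketch, not the speaker's file: the path in the comment is hypothetical, and it assumes CGI.pm, DBI and DBD::mysql are installed.

```perl
# startup.pl: loaded once by the parent httpd, e.g. via
#   PerlRequire /path/to/startup.pl      (hypothetical path)
use strict;
use warnings;

# Load the modules in the parent so that the children share them
use CGI ();
use DBI ();

# Force the lazy parts to initialise now rather than in each child
CGI->compile(':all');
DBI->install_driver('mysql');    # assumes DBD::mysql is installed

1;
```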

You can use ModPerl::RegistryLoader and call its handler($url, $filename) method to pre-populate the Registry cache. Be very careful that the URL does match the file pointed at, otherwise you're putting dirty data in the cache and all bets are off. This means your script is in shared memory, so be careful: if you change the code, every thread will have to reload the script and you'll have n copies of the script in memory (where n is the number of threads running). Of course, the obvious way of getting around this is to restart the server so that the cache is pre-populated with the updated code.
The cache is indexed by filename, so if you are using mod_rewrite, you might want to make multiple calls to handler(), one for each URL.
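A minimal sketch of pre-populating the cache from a startup file is shown below. The new() and handler() calls follow the ModPerl::RegistryLoader documentation; the URLs and file paths are hypothetical and must be replaced with pairs that really map to each other on your server.

```perl
# startup.pl (sketch): pre-populate the ModPerl::Registry cache before the fork
use strict;
use warnings;
use ModPerl::RegistryLoader ();

my $rl = ModPerl::RegistryLoader->new(
    package => 'ModPerl::Registry',
    debug   => 1,    # log what gets loaded at startup
);

# Hypothetical URL => file pairs; each URL must really map to that file
my %scripts = (
    '/perl/report.pl' => '/var/www/perl/report.pl',
    '/perl/search.pl' => '/var/www/perl/search.pl',
);

for my $url (sort keys %scripts) {
    $rl->handler($url, $scripts{$url});
}

1;
```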

use Foo vs use Foo ()
use POSIX; adds about 696k to our memory, because it imports about 800 functions into the calling namespace.
use POSIX (); adds only about 316k, because we tell it not to import anything unless explicitly specified.
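To see the difference in imports (rather than in memory), the sketch below loads POSIX both ways in two throwaway packages and counts the subroutines that end up in each namespace. The package names and the count_subs helper are made up for the illustration, and the exact counts depend on the Perl version.

```perl
use strict;
use warnings;

package WithImports;
use POSIX;       # default import list

package WithoutImports;
use POSIX ();    # load the module, import nothing

package main;

# Count the subroutines visible in a package's symbol table
sub count_subs {
    my ($pkg) = @_;
    no strict 'refs';
    return scalar grep { defined &{"${pkg}::$_"} } keys %{"${pkg}::"};
}

printf "%-15s %d subs\n", $_, count_subs($_) for qw(WithImports WithoutImports);
```

With the empty import list, functions are still available under their full names, for example POSIX::floor(3.7).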

use Apache2::Const
This imports the constants, so a copy ends up in every namespace that imports them.
use Apache2::Const qw(OK DECLINED);
	return OK;

Whereas this compiles the constants without importing them, so there is only one copy, though you need to refer to the Apache2::Const namespace in your code.
use Apache2::Const -compile => qw(OK DECLINED);
	return Apache2::Const::OK;
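In context, a response handler using the compile-only form might look like the sketch below. The package name My::Handler is hypothetical; the rest uses the standard mod_perl 2 request API and only runs inside a mod_perl-enabled server.

```perl
package My::Handler;    # hypothetical module name

use strict;
use warnings;

use Apache2::RequestRec ();    # provides $r->content_type
use Apache2::RequestIO  ();    # provides $r->print
use Apache2::Const -compile => qw(OK);

sub handler {
    my $r = shift;

    $r->content_type('text/plain');
    $r->print("hello from mod_perl\n");

    # Fully qualified, because nothing was imported into this package
    return Apache2::Const::OK;
}

1;
```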

SetHandler perl-script means that lots of things are done for you: STDIN and STDOUT are tied, %ENV and @INC are saved and restored, and %ENV changes are propagated. All of these things are done automatically, but if we don't need them (or are willing to work around them for performance gains) we can use SetHandler modperl, which does none of these things. This means that it is NOT thread safe (since the environment is a per-process value) and may leave HTTP environment variables in the environment. We have to be careful of the security implications of this.
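The two handler types can sit side by side, so only the code that has been adapted needs to move. The fragment below is a sketch with made-up locations and a hypothetical My::Handler module; a handler running under modperl has to write through the request object (for example $r->print) instead of relying on a tied STDOUT.

```apache
# httpd.conf (sketch)

# CGI-style scripts: STDIN/STDOUT tied, %ENV set up for each request
<Location /perl>
    SetHandler perl-script
    PerlResponseHandler ModPerl::Registry
    Options +ExecCGI
</Location>

# Lean handler: none of the per-request setup is done
<Location /fast>
    SetHandler modperl
    PerlResponseHandler My::Handler
</Location>
```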

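We can get much closer control over mod_perl than we have over CGI. Options are switched on with a + prefix and off with a - prefix, and comments have to go on their own lines in httpd.conf.
# Noted in the talk as the default. Disabling it means that you MUST preload everything.
PerlOptions +Autoload
PerlOptions -GlobalRequest
PerlOptions +ParseHeaders
# Disable (-SetupEnv) to give an almost empty environment.
PerlOptions +SetupEnv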

KeepAlive Off
With mod_perl, you generally want to set KeepAlive off, since the default timeout of 15 seconds for keeping a connection up is very high and ties up a heavyweight mod_perl process for each idle connection. With a threaded MPM this is less of a problem than with prefork, though you then have to be careful about the thread safety of anything that is linked into the scripts you might be running.

Memory Leaks
Perl was designed to run fast and then terminate, whereas mod_perl is a long-running process. Since some Perl optimisations don't apply, all kinds of things can and most likely will 'leak'. Strictly speaking it's not a leak: Perl knows the memory is there and will release it when it terminates. It's just that mod_perl might not terminate for a long time.
Remember that the usual my (...) = @_; idiom copies its arguments, so passing large arrays or hashes around by value means copies are made. This can cause memory requirements to grow very quickly. Passing by reference avoids that.
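A small illustration of the difference, using made-up data and function names: the first sub copies every element into a new array, while the second receives a single reference and works on the caller's data.

```perl
use strict;
use warnings;

my @rows = (1 .. 100_000);

# Copies the whole list into a new array inside the sub
sub count_by_value {
    my @copy = @_;
    return scalar @copy;
}

# Copies only one reference; the underlying array is shared
sub count_by_ref {
    my ($rows) = @_;
    return scalar @$rows;
}

print count_by_value(@rows), "\n";
print count_by_ref(\@rows), "\n";
```

Both calls return the same count; only the first one temporarily doubles the memory needed for the data.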

Apache2::SizeLimit
# startup.pl
use Apache2::SizeLimit;
# All sizes are in KB
$Apache2::SizeLimit::MAX_PROCESS_SIZE       = 12000;
$Apache2::SizeLimit::MIN_SHARE_SIZE         =  6000;
$Apache2::SizeLimit::MAX_UNSHARED_SIZE      =  5000;
# Only check the process size on every 4th request
$Apache2::SizeLimit::CHECK_EVERY_N_REQUESTS = 4;

# httpd.conf
PerlCleanupHandler Apache2::SizeLimit
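Later releases of Apache2::SizeLimit document class methods in place of the package variables. Assuming such a release is installed, the same limits could be written as in the sketch below; the PerlCleanupHandler line in httpd.conf stays the same.

```perl
# startup.pl (sketch, assuming a release of Apache2::SizeLimit with the method API)
use strict;
use warnings;
use Apache2::SizeLimit ();

# All sizes are in KB
Apache2::SizeLimit->set_max_process_size(12_000);
Apache2::SizeLimit->set_min_shared_size(6_000);
Apache2::SizeLimit->set_max_unshared_size(5_000);

# Only check the process size on every 4th request
Apache2::SizeLimit->set_check_interval(4);

1;
```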
More information on the sites mentioned in my earlier post on mod_perl.
posted at: 12:04


copyright © 2005-2008, Gareth Eason