teh bigbro blog(tm)
Bigbro's foray into the scary world of blogging
Thu, 29 Jun 2006
ApacheCon '06 : CGI to mod_perl 2.0, fast
Philippe Chiasson
After a comedy false start involving a picture of his new small child and the basics of how to install mod_perl 2.0, we got down to business.
mod_perl means you can save on CPU cycles, memory, money, time, effort (with Apache::* modules on CPAN) and probably a few other things too.
The CGI model:
- Forking
- Startup
- Teardown
...so this is quite expensive in terms of memory and CPU. 2 x forks, 2 x execs, perl startup and perl shutdown per page hit. There's got to be a better alternative.
mod_perl is embedded in the httpd server - so perl starts once, with the web server, and gets torn down once, when the web server exits. Perl CGI scripts are then called as perl functions, which is cheap. Let's look at how much faster this is using Apache Bench.
ab -c1 -n50 http://blah/cgi-bin/hello.pl
This gives us about 5.5 requests per second. That's not bad for the slow laptop the test is running on. But let's see if we get a performance improvement by installing and using mod_perl.
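The hello.pl script itself isn't reproduced in these notes. As a rough idea of the kind of script being benchmarked, here is a minimal stand-in; the body and the hello_response name are assumptions for illustration, not the script from the talk:

```perl
#!/usr/bin/perl
# Hypothetical stand-in for hello.pl; the script used in the talk is not shown.
use strict;
use warnings;

# Build the complete CGI response: header line, blank line, body.
sub hello_response {
    return "Content-type: text/plain\n\n" . "Hello, world\n";
}

print hello_response();
```

Anything this small spends nearly all of its time on process startup and teardown under plain CGI, which is what makes it a useful benchmark target here.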
LoadModule perl_module modules/mod_perl.so
Alias /perlrun /var/www/cgi-bin
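The two lines above only load the module and map the URL. A PerlRun setup normally also needs a handler section for that URL. A sketch in the style of the mod_perl 2.0 documentation, assuming the /perlrun alias above (the exact section used in the talk is not recorded here):

```apache
<Location /perlrun>
    SetHandler perl-script
    PerlResponseHandler ModPerl::PerlRun
    PerlOptions +ParseHeaders
    Options +ExecCGI
</Location>
```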
On apachectl restart, we get an extra message telling us the version of mod_perl that Apache is linked with. Scripts run under mod_perl also see an extra environment variable, MOD_PERL, containing the version string of the mod_perl the script runs under.
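A script can use that environment variable to tell which world it is running in. A small sketch, assuming the variable is MOD_PERL (which is what mod_perl sets); the runtime_description helper and the sample version string in the comment are made up for illustration:

```perl
use strict;
use warnings;

# Describe the runtime based on the MOD_PERL environment variable,
# which mod_perl sets to something like "mod_perl/2.0.2".
sub runtime_description {
    my ($env) = @_;    # hash reference, normally \%ENV
    return exists $env->{MOD_PERL}
        ? "running under $env->{MOD_PERL}"
        : "running as plain CGI";
}

print "Content-type: text/plain\n\n";
print runtime_description(\%ENV), "\n";
```

Passing the environment in as a hash reference keeps the check easy to exercise from the command line, where MOD_PERL is not set.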
For the exact same invocation of ab as above, on the exact same script, we get about 83 requests per second - roughly 15 times the plain CGI figure.
ModPerl::PerlRun, the setup used above, aims to be the closest alternative to mod_cgi - but if we are willing to stray slightly from CGI methods, we can use something like ModPerl::Registry. Benchmarking the same script with the same ab command line as above gives 156 requests per second, nearly double the previous figure, because the compiled code is now cached.
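Switching handlers is mostly a matter of configuration. A sketch of what a Registry section might look like, following the mod_perl 2.0 documentation; the /registry URL is a hypothetical name chosen here, not one from the talk:

```apache
Alias /registry /var/www/cgi-bin
<Location /registry>
    SetHandler perl-script
    PerlResponseHandler ModPerl::Registry
    PerlOptions +ParseHeaders
    Options +ExecCGI
</Location>
```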
Bugzilla will NOT run under Registry, but will run happily under mod_perl. Remember that mod_perl is NOT CGI, it's merely an emulation - so certain code constructs will not work.
How about something like the following?
#!/usr/bin/perl
use CGI;
my $q = new CGI;
print $q->header('text/plain');
$counter++;
print <<"EOF";
counted $counter
EOF
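Since $counter is never declared or initialised, it is a package global. Under mod_perl / Registry the compiled script stays cached in each Apache child process, and $counter stays with it, so its value survives from one request to the next. Depending on which child you hit, the counter keeps incrementing and you'll get values greater than 1.
The simple solution is to always declare the variable with my, as in my $counter.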
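The effect can be reproduced outside Apache by calling the "script body" several times in one process. This is only a simulation of a persistent interpreter, with made-up sub names, not Registry itself:

```perl
use strict;
use warnings;

our $global_counter;    # package global, like the undeclared $counter above

# Stands in for the script body as originally written.
sub hit_with_global {
    $global_counter++;
    return $global_counter;
}

# Same body, but with a lexical that is created afresh on every call.
sub hit_with_lexical {
    my $counter = 0;
    $counter++;
    return $counter;
}

# Three "requests" served by the same long-lived process.
our @global_results  = map { hit_with_global() }  1 .. 3;
our @lexical_results = map { hit_with_lexical() } 1 .. 3;

print "global:  @global_results\n";
print "lexical: @lexical_results\n";
```

The global version counts 1 2 3 across the three calls, while the lexical version reports 1 each time.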
Remember that Registry treats every script as a subroutine, so even if you use my to create a variable at script-global scope, using that variable from inside a subroutine in the same script can produce funny effects similar to the previous example. This is because Registry uses 'subification' to convert each script into a subroutine. Subs in your scripts therefore become subroutines nested inside a subroutine, which creates a closure. A warning sign might be 'Variable "$counter" will not stay shared at...' in the logs. This effectively turns the $counter variable into a global attached to the inner subroutine.
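Here is a sketch of that nesting in plain Perl, with wrapped_script standing in for the wrapper Registry builds around a script. The sub names are invented, and passing the value as an argument is just one possible fix:

```perl
use strict;
use warnings;

# Stands in for the wrapper sub that Registry builds around a script.
sub wrapped_script {
    my $counter = 0;

    # Named inner sub: it is bound to the $counter of the FIRST call only.
    # Perl warns at compile time: Variable "$counter" will not stay shared
    sub increment { $counter++ }

    increment();
    return $counter;
}

# One fix: pass the value in and out instead of sharing the variable.
sub increment_value {
    my ($n) = @_;
    return $n + 1;
}

sub wrapped_script_fixed {
    my $counter = 0;
    $counter = increment_value($counter);
    return $counter;
}

# Three "requests" against each version.
our @broken = map { wrapped_script() }       1 .. 3;
our @fixed  = map { wrapped_script_fixed() } 1 .. 3;

print "nested named sub: @broken\n";
print "argument passing: @fixed\n";
```

The nested version gives 1 on the first call and 0 afterwards, because increment() keeps updating the first call's $counter while later calls return a fresh one. The argument-passing version gives 1 every time.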
Best practices for making sure your scripts will work under PerlRun or Registry
- use strict
- use warnings
- avoid globals
- look for hints in your error_log file
Under mod_perl, for speed reasons, we don't change directory to the script's location, which means that relative requests for libraries will likely fail. So mod_perl doesn't fork and doesn't chdir. There's another reason for not changing directory: Apache 2 can run with threaded MPMs, and chdir is not thread safe (remember that the current working directory is a property of the process, not of the thread). To avoid breaking things, mod_perl avoids changing directory if it can possibly help it. Fortunately, for file inclusions, we can fix this pretty easily:
- use lib "/fully/qualified/lib";
- require "/fully/qualified/lib/path.pl";
- If you're using MPM Prefork, use one of these handlers, which do chdir to the script's directory:
  - ModPerl::RegistryPrefork
  - ModPerl::PerlRunPrefork
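To see why the fully qualified form matters, the sketch below writes a throwaway library into a temporary directory, moves the working directory somewhere else (as a rough imitation of mod_perl not changing into the script's directory), and then tries both styles of require. The file name path.pl and the greeting sub are placeholders:

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Create a throwaway library file in a temporary directory.
my $lib_dir  = tempdir(CLEANUP => 1);
my $lib_file = File::Spec->catfile($lib_dir, 'path.pl');
open my $fh, '>', $lib_file or die "cannot write $lib_file: $!";
print {$fh} "sub greeting { return 'hello from path.pl' }\n1;\n";
close $fh or die "cannot close $lib_file: $!";

# Imitate mod_perl: the working directory is NOT the library's directory.
chdir File::Spec->rootdir or die "cannot chdir: $!";

# A bare relative name is looked up via @INC and is not found.
our $relative_ok = eval { require 'path.pl'; 1 } ? 1 : 0;

# A fully qualified path does not depend on the working directory.
our $absolute_ok = eval { require $lib_file; 1 } ? 1 : 0;

print "relative require worked: $relative_ok\n";
print "absolute require worked: $absolute_ok\n";
print greeting(), "\n" if $absolute_ok;
```

The same reasoning applies to use lib: an absolute directory keeps working no matter where the process happens to be.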
Another issue can occur with require. Perl records each required file and, because the interpreter persists under mod_perl, the code is loaded from the required file only once. If a library file contains raw top-level code that performs initialisation, be sure to wrap that code in a subroutine - otherwise it will only run once, the first time the file is loaded. Wrapping it in a subroutine and calling that explicitly will ensure that the initialisation code runs on every request.
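A small simulation of this, again in plain Perl: the loop plays the part of three requests handled by one persistent interpreter. The library name init_lib.pl, the init sub and the two counters are all invented for the example:

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

our ($raw_runs, $sub_runs) = (0, 0);

# Write a throwaway library containing both styles of initialisation code.
my $dir = tempdir(CLEANUP => 1);
my $lib = File::Spec->catfile($dir, 'init_lib.pl');
open my $fh, '>', $lib or die "cannot write $lib: $!";
print {$fh} <<'LIB';
our ($raw_runs, $sub_runs);
$raw_runs++;               # raw top-level code: runs when the file is loaded
sub init { $sub_runs++ }   # wrapped code: runs whenever init() is called
1;
LIB
close $fh or die "cannot close $lib: $!";

# Three "requests" handled by one persistent interpreter.
for my $request (1 .. 3) {
    require $lib;    # loads the file once, then is a no-op thanks to %INC
    init();          # explicit call, so this part runs on every request
}

print "raw code ran $raw_runs time(s), init() ran $sub_runs time(s)\n";
```

The raw top-level code runs once, on the first require, while init() runs on all three passes.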
More information available at:
posted at: 10:32

copyright © 2005-2008, Gareth Eason