Following my last two posts (http://blog.entek.org.uk/?p=106 and http://blog.entek.org.uk/?p=112) I took some profilers to my codes.
First up my revised Python implementation:
> python -m cProfile checkmail2
...
16225093 function calls in 8.663 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 8.663 8.663 :1()
1 0.000 0.000 0.000 0.000 UserDict.py:17(__getitem__)
1 5.906 5.906 8.662 8.662 checkmail2:3()
1 0.000 0.000 8.663 8.663 {execfile}
37 0.000 0.000 0.000 0.000 {method '__enter__' of 'file' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
22 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects}
1 0.000 0.000 0.000 0.000 {method 'iteritems' of 'dict' objects}
16224990 2.755 0.000 2.755 0.000 {method 'startswith' of 'str' objects}
37 0.001 0.000 0.001 0.000 {open}
1 0.000 0.000 0.000 0.000 {posix.listdir}
Quite clearly there is a huge number (16 million!) of calls to startswith which is the biggest time-sink outside the main script.
Comparing the Perl implementation:
> perl -d:DProf checkmail3
...
> dprofpp
Total Elapsed Time = 3.426896 Seconds
User+System Time = 3.401494 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
0.12 0.004 0.007 4 0.0010 0.0016 main::BEGIN
0.03 0.001 0.001 5 0.0002 0.0002 File::Basename::BEGIN
0.03 0.001 0.001 1 0.0009 0.0009 warnings::BEGIN
0.03 0.001 0.001 37 0.0000 0.0000 File::Basename::_strip_trailing_sep
0.03 0.001 0.001 37 0.0000 0.0000 File::Basename::fileparse
0.00 0.000 0.002 37 0.0000 0.0000 File::Basename::basename
0.00 0.000 0.000 1 0.0003 0.0003 File::Glob::doglob
0.00 0.000 0.000 1 0.0001 0.0001 DynaLoader::dl_load_file
0.00 0.000 0.000 1 0.0001 0.0003 XSLoader::load
0.00 0.000 0.000 1 0.0001 0.0001 File::Basename::fileparse_set_fstype
0.00 0.000 0.000 1 0.0001 0.0001 Exporter::import
0.00 0.000 0.000 2 0.0000 0.0000 warnings::import
0.00 0.000 0.000 3 0.0000 0.0000 strict::import
0.00 0.000 0.000 1 0.0000 0.0003 File::Glob::csh_glob
0.00 0.000 0.000 1 0.0000 0.0000 strict::bits
Ignoring the actual times which are not directly comparable due to the profiling overheads we can clearly see Perl is benefiting hugely from the inbuilt regex engine as there are 0 function calls associated with each line check.
I did replace the ‘str.startswith’ implementation of the Python script with a version which used ‘re’ regex objects, but this showed even worse performance:
> time python checkmail4
...
python checkmail4 8.42s user 0.33s system 99% cpu 8.765 total
Profiling this one we see the overhead of using ‘re.match’ was about double that of ‘str.startswith’ and, obviously, the number of calls remained the same. On top of this I introduced additional overhead of two calls to ‘re.compile’ at the start of the script, which the profiler showed incurred not insignificant function calls of their own:
> python -m cProfile checkmail4
...
16225312 function calls in 12.416 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 12.416 12.416 :1()
1 0.000 0.000 0.000 0.000 UserDict.py:17(__getitem__)
1 6.662 6.662 12.415 12.415 checkmail4:3()
2 0.000 0.000 0.000 0.000 re.py:186(compile)
2 0.000 0.000 0.000 0.000 re.py:227(_compile)
2 0.000 0.000 0.000 0.000 sre_compile.py:367(_compile_info)
2 0.000 0.000 0.000 0.000 sre_compile.py:38(_compile)
4 0.000 0.000 0.000 0.000 sre_compile.py:480(isstring)
2 0.000 0.000 0.000 0.000 sre_compile.py:486(_code)
2 0.000 0.000 0.000 0.000 sre_compile.py:501(compile)
15 0.000 0.000 0.000 0.000 sre_parse.py:144(append)
2 0.000 0.000 0.000 0.000 sre_parse.py:146(getwidth)
2 0.000 0.000 0.000 0.000 sre_parse.py:184(__init__)
21 0.000 0.000 0.000 0.000 sre_parse.py:188(__next)
2 0.000 0.000 0.000 0.000 sre_parse.py:201(match)
19 0.000 0.000 0.000 0.000 sre_parse.py:207(get)
2 0.000 0.000 0.000 0.000 sre_parse.py:307(_parse_sub)
2 0.000 0.000 0.000 0.000 sre_parse.py:385(_parse)
2 0.000 0.000 0.000 0.000 sre_parse.py:669(parse)
2 0.000 0.000 0.000 0.000 sre_parse.py:73(__init__)
2 0.000 0.000 0.000 0.000 sre_parse.py:96(__init__)
2 0.000 0.000 0.000 0.000 {_sre.compile}
16224990 5.751 0.000 5.751 0.000 {built-in method match}
1 0.000 0.000 12.416 12.416 {execfile}
6 0.000 0.000 0.000 0.000 {isinstance}
44 0.000 0.000 0.000 0.000 {len}
37 0.000 0.000 0.000 0.000 {method '__enter__' of 'file' objects}
59 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
24 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects}
2 0.000 0.000 0.000 0.000 {method 'items' of 'dict' objects}
1 0.000 0.000 0.000 0.000 {method 'iteritems' of 'dict' objects}
4 0.000 0.000 0.000 0.000 {min}
37 0.002 0.000 0.002 0.000 {open}
13 0.000 0.000 0.000 0.000 {ord}
1 0.000 0.000 0.000 0.000 {posix.listdir}
Quite clearly from the profiler output each call to either ‘str.startswith’ or ‘re.match’ use a very small amount of processor time (too small to be output) but the cumulative effect of 16 million calls is where the big slowdown was occurring. To get around this I tried implementing the ‘str.startswith’ version using string splicing (i.e. “line[:5] == ‘From ‘” rather than “line.startswith(‘From ‘)”) and the result was dramatic:
> time python checkmail4
...
python checkmail4 3.86s user 0.31s system 99% cpu 4.186 total
The profiler output for this version shows that the number of function calls is now on a par with the Perl implementation:
> python -m cProfile checkmail4
...
110 function calls in 4.311 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 4.311 4.311 :1()
1 0.000 0.000 0.000 0.000 UserDict.py:17(__getitem__)
1 4.308 4.308 4.311 4.311 checkmail4:3()
1 0.000 0.000 4.311 4.311 {execfile}
37 0.000 0.000 0.000 0.000 {method '__enter__' of 'file' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
29 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects}
1 0.000 0.000 0.000 0.000 {method 'iteritems' of 'dict' objects}
37 0.003 0.000 0.003 0.000 {open}
1 0.000 0.000 0.000 0.000 {posix.listdir}
This puts the Python version within 0.6s of the Perl version, which is close enough for me. Especially considering this is effectively comparing Perl to Python on Perl’s hometurf of text matching.
I think Perl would probably still out perform Python if I was wanting to do something more fancy involving regex substitutions but Python’s performance issues, in this case, seem to be purely down to function call overheads which Perl sidesteps by incorporating the regex engine into the core language.