There should be more than one way to do it in Python

Python has a philosophy of ‘There should be one– and preferably only one –obvious way to do it’ (http://www.python.org/dev/peps/pep-0020/) rather than Perl’s ‘There’s more than one way to do it’ (Programming Perl, Third Edition, Larry Wall et al.). This is great, in theory – it leads to greater consistency between disparate programs and makes it easier for individual programmers to pick up someone else’s code.

The problem comes when the obvious way is not, for whatever reason, the practical way. For example, the obvious way to test whether a string begins with another string is to use "string1".startswith("string2"). This is, however, significantly slower than "string1"[7:] == "string2", which means that when doing a large number of these tests you have to use the slicing form, despite it not being the obvious method.

Unless you are familiar with Python’s sequence slicing syntax (http://docs.python.org/library/stdtypes.html#typesseq), "string1"[7:] is not obvious (even though I would expect most programmers to be able to hazard a, probably correct, guess at what it does), which means I would precede that line with a comment such as # Check if string1 starts with "string2". When I feel the need to comment on a specific line of code, it is usually because I do not think what it is doing is sufficiently obvious, which means it violates Python’s ‘There should be one– and preferably only one –obvious way to do it’.

3 thoughts on “There should be more than one way to do it in Python”

  1. Pretty sure you mean "string1"[:7]. The irony.

    Another way of handling this would be to write a starts_with function and call that; you could then document the reason in that function, away from the uses of it.

  2. Yes I did, which kinda proves the point that it’s not obvious unless you are familiar with Python (I just re-read the doc I linked to, which describes the string slicing syntax, and was not 100% sure myself which way round it was until I tested it!).
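
    For reference, this is roughly the kind of check that settles which way round the slice goes. It is only a minimal sketch, written for Python 3; the prefix "str" is a value picked for illustration and does not come from the original post.

```python
# Illustrative values only: "str" is a hypothetical prefix chosen for this sketch.
text = "string1"
prefix = "str"

head = text[:len(prefix)]   # characters before index 3
tail = text[len(prefix):]   # characters from index 3 onwards

print(head)  # str
print(tail)  # ing1

# Only the [:n] form gives the same answer as startswith.
same_as_startswith = (head == prefix) == text.startswith(prefix)
print(same_as_startswith)  # True
```

    The [:n] form keeps the first n characters, so that is the one to compare against the prefix; the [n:] form discards them.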

    Your suggestion (which I will have to try from a performance testing point of view) brings up another interesting ‘pythonic’ quirk – Python does not allow manipulating its inbuilt objects (whereas Ruby does, for example), so I cannot simply add a new method to ‘str’ (called, say, ‘faststartswith’). I can see why this is the case – the idea of changing the implementation of a core language object makes me shudder and would be fraught with danger if something critical was “tweaked”.
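
    To make that concrete, here is a small sketch of what happens in CPython when you try, written for Python 3. The names fast_startswith and FastStr are made up for this example; the subclass is just one possible workaround, not something I have benchmarked.

```python
def fast_startswith(self, prefix):
    # Hypothetical helper, used only for this sketch.
    return self[:len(prefix)] == prefix

# Attempting to attach the helper to the built-in type is rejected.
try:
    str.faststartswith = fast_startswith
    patched = True
except TypeError as exc:
    patched = False
    print("Cannot patch str:", exc)

# A subclass can carry the extra method instead (FastStr is a made-up name).
class FastStr(str):
    faststartswith = fast_startswith

line = FastStr("From someone@example.com")
print(line.faststartswith("From "))
```

    The catch with the subclass is that lines read from a file are plain str objects, so each one would have to be wrapped in FastStr first, which adds yet another call per line.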

    The point is, we have just discussed 3 different ways of performing the same task. One is clearly the pythonic “obvious” way (str.startswith inbuilt) but, for performance reasons, maybe undesirable – certainly in cases where I’m crunching data and need to check on the order of thousands or millions of strings. The other two are, in my opinion, perfectly acceptable solutions, but the mere fact of their existence and desirability in this use case flies in the face of Python’s “There should be one […] obvious way to do it”. In summary: I think this philosophy is wrong.

  3. I’ve done a quick test (code below) which shows that adding another function which wraps the slicing method in a generic way is even less performant than using the inbuilt str.startswith method. I can’t say I’m surprised. It looks to me as though Python’s function calling overheads are quite significant.

    The output was:

    > python -mcProfile test.py /tmp
    Running test startswith
    469
    Running test slicing
    469
    Running test method
    469
    slicing: 6s
    startswith: 14s
    method: 19s
    __main__: 40s
    63384348 function calls in 40.119 seconds

    Ordered by: standard name

      ncalls  tottime  percall  cumtime  percall filename:lineno(function)
           1    6.369    6.369    6.369    6.369 test.py:17(slicing)
    21128110    7.735    0.000    9.606    0.000 test.py:29(starts_with)
           1    0.001    0.001   40.119   40.119 test.py:3(<module>)
           1    9.688    9.688   19.294   19.294 test.py:32(method)
           1    9.049    9.049   14.455   14.455 test.py:9(startswith)
    21128111    1.871    0.000    1.871    0.000 {len}
           1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
           1    0.000    0.000    0.000    0.000 {method 'keys' of 'dict' objects}
    21128110    5.406    0.000    5.406    0.000 {method 'startswith' of 'str' objects}
           3    0.000    0.000    0.000    0.000 {open}
           8    0.000    0.000    0.000    0.000 {time.time}

    The code is:


    #!/usr/bin/env python

    from time import time

    FILE="/Volumes/Macintosh HD/tmp/huge_mbox_file"
    SEARCH_STRING="From "

    def startswith():
        count=0
        with open(FILE) as f:
            for line in f:
                if line.startswith(SEARCH_STRING):
                    count = count + 1
        print count

    def slicing():
        count=0
        # Yes, this is an optimisation to avoid calling len > 21million times
        # HOWEVER if we were using slicing in a tight loop for a specific search
        # string the length would be known and could be hard coded.
        search_len=len(SEARCH_STRING)
        with open(FILE) as f:
            for line in f:
                if line[:search_len] == SEARCH_STRING:
                    count = count + 1
        print count

    def starts_with(haystack, needle):
        return haystack[:len(needle)] == needle

    def method():
        count=0
        with open(FILE) as f:
            for line in f:
                if starts_with(line, SEARCH_STRING):
                    count = count + 1
        print count

    if __name__ == '__main__':
        starttimes={}
        endtimes={}

        starttimes['__main__']=time()

        for method_ in (startswith, slicing, method):
            print "Running test %s" % method_.__name__
            starttimes[method_.__name__]=time()
            method_()
            endtimes[method_.__name__]=time()

        endtimes['__main__']=time()

        for key in starttimes.keys():
            print "%10s: %ds" % (key, endtimes[key] - starttimes[key])
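
    For anyone who wants to repeat the comparison without a huge mbox file, something along these lines might do. It is only a sketch, written for Python 3, using the standard timeit module. The synthetic LINES list and the count_* function names are assumptions made up for this example, and the timings will depend on the interpreter version and machine, so do not expect the same ratios as above.

```python
import timeit

SEARCH_STRING = "From "

# Synthetic stand-in for the mbox file: every tenth line starts with "From ".
LINES = [("From user%d\n" % i) if i % 10 == 0 else ("body line %d\n" % i)
         for i in range(10000)]

def starts_with(haystack, needle):
    # Same wrapper as in the test above.
    return haystack[:len(needle)] == needle

def count_startswith():
    return sum(1 for line in LINES if line.startswith(SEARCH_STRING))

def count_slicing():
    n = len(SEARCH_STRING)  # computed once, outside the loop
    return sum(1 for line in LINES if line[:n] == SEARCH_STRING)

def count_function():
    return sum(1 for line in LINES if starts_with(line, SEARCH_STRING))

for fn in (count_startswith, count_slicing, count_function):
    # Run each counter 20 times and report the total elapsed time.
    seconds = timeit.timeit(fn, number=20)
    print("%-18s matches=%d  %.4fs" % (fn.__name__, fn(), seconds))
```

    All three counters should report the same number of matches; only the elapsed times should differ.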
