Why does python's built in binary search function run so much faster?

16

(Already answered by sharth's comment.)

I've written a binary search algorithm in python, that more or less follows the same structure as the bisect_left function found in the bisect module. In fact it has a couple less conditionals as I know that the high point will be the length of the list and the low will be 0. Yet for some reason the built in function runs 5 times as fast as mine.

My code is as follows:

def bisection_search(word, t):

    high = len(t)
    low = 0

    while low < high:
        half = (high+low)/2
        if t[half] < word:
            low = half + 1
        else:
            high = half
    return low

The source code for the built in function is:

def bisect_left(a, x, lo=0, hi=None):
    if lo < 0:
        raise ValueError('lo must be non-negative')
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo+hi)//2
        if a[mid] < x: lo = mid+1
        else: hi = mid
    return lo

As you can see, virtually identical. However the timed out put for the my function (searching for the last term in an ordered list of 100,000 words) is -3.60012054443e-05, where as the built in achieves -6.91413879395e-06. What explains this difference?

In the source code there is a comment at the end that says "Overwrite above definitions with a fast C implementation" - is this what explains the difference? If so, how would I go about creating such a precompiled module?

Any advice is greatly appreciated.

advert2013

Posted 2014-02-22T01:27:06.720

Reputation: 2 171

1http://docs.python.org/2/extending/index.html – Ignacio Vazquez-Abrams – 2014-02-22T01:28:28.333

7is this what explains the difference? yes – roippi – 2014-02-22T01:29:05.443

1You can try a JIT compiler like PyPy - I believe your code will run considerably faster on it at the expense of the availability of some modules – Benjamin Gruenbaum – 2014-02-22T01:32:30.407

6

Here is the c code that makes up _bisect: http://hg.python.org/cpython/file/0199bff14c5c/Modules/_bisectmodule.c

– Bill Lynch – 2014-02-22T01:34:15.737

Thanks for your help, especially the links - I'm really surprised there is such a massive difference, that would add up really quickly if done on any sort of scale. – advert2013 – 2014-02-22T01:49:37.950

1Yeah, well-written C code is usually at least an order of magnitude faster than pure Python code. There's a lot of things involved in executing Python code, and all of them consume CPU time. – Max Noel – 2014-02-22T16:39:03.363

1"how would I go about creating such a precompiled module?" -- you could try Cython (Python-like language to create C extensions for CPython). pyximport allows to compile extensions on the fly (for development). – jfs – 2014-02-22T22:27:17.620

Of course, a poorly-written C implementation can run much slower than a Python implementation, so don't think of C as magical optimization dust to make your scripts faster. – Ignacio Vazquez-Abrams – 2014-02-23T01:31:54.257

and you need to provide pre-built, with the right compiler(s), versions of your extension for those environments that don't include the right compiler(s). – Steve Barnes – 2014-03-18T15:15:29.530

Answers

7

To summarise the remarks above so the question can be closed, the reason the built in module is faster is because the modules are precompiled in c. There are basically two options to attempt to replicate such performance, one is to use a JIT compiler like PyPy where the compilation is done at run time, the other is to compile your own modules in C, using Cython or some other variant to integrate the C code with python. The link from sharth above to the c code for bisect is particularly helpful and can be found here. Thanks again for all the help.

advert2013

Posted 2014-02-22T01:27:06.720

Reputation: 2 171

2Please include a general summary of the actual answer here for the sake of future readers, as your answer (the one I'm commenting on) currently isn't an answer. – Vulcan – 2014-02-22T22:29:14.383

Done. Yeah I wasn't sure what you are supposed to do when the question is answered in the comments, hope that summary is ok. I will close the question tomorrow when it lets me. – advert2013 – 2014-02-23T01:27:22.337