We often rely on old habits. We get comfortable and tend to do things the same old way. The same thing happens when you’re programming. But a day will come when you’ll ask yourself: is this the fastest way to perform this task? When that happens (and if the task is in Python), you’ll be glad that a package like timeit exists. Sure, there are other ways to organize a timing contest in Python. With the time package, for example, you can start by setting t0 = time.time(), perform your task, and then print the elapsed time with print(time.time() - t0). I use this all the time.
timeit makes it simple to test small chunks of code, and it is callable from the command line.
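To make the two approaches concrete, here is a minimal sketch that times the same work both ways. The task function is a hypothetical stand-in (it is not from the original test), and the iteration count of 1000 is an arbitrary assumption.

```python
import time
import timeit

def task():
    # Hypothetical stand-in for the code being measured
    return [str(n).replace('1', 'one') for n in range(1000)]

# Manual timing with the time module: one call, one measurement
t0 = time.time()
task()
elapsed_manual = time.time() - t0
print('manual : %.6f sec for 1 call' % elapsed_manual)

# timeit calls the function `number` times and returns the total time
elapsed_timeit = timeit.timeit(task, number=1000)
print('timeit : %.6f sec for 1000 calls' % elapsed_timeit)
```

From the command line, the same module can be invoked with something like python -m timeit "'-'.join(str(n) for n in range(100))", which picks a suitable number of loops on its own.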
Following the example in this article by Xiaonuo Gantan on PythonCentral, I was able to confirm that a list comprehension is still the fastest way to replace characters in a list of strings.
Actually, I’m using a pandas dataframe with really messy column names. I want to replace all the weird characters in them because I need to convert my pandas dataframe to an R data frame, and R doesn’t like weird symbols in column names. I’ve always used a list comprehension to do this, but I recently saw that pandas has a map function. I was wondering whether the map function would be faster. So, here’s my test:
    import re
    import timeit

    # df is the pandas dataframe whose column names are being cleaned

    def wrapper(func, *args, **kwargs):
        # Bind the arguments so that timeit gets a zero-argument callable
        def wrapped():
            return func(*args, **kwargs)
        return wrapped

    def f1(l):
        # Using regular expression
        return [re.sub(r'=', 'eq', x, flags=re.IGNORECASE) for x in list(l)]

    def f2(l):
        # Using map function with a conditional check for string
        # (on Python 2, check against (str, unicode) instead)
        return l.map(lambda x: x.replace('=', 'eq') if isinstance(x, str) else x)

    def f3(l):
        # Using map function without the check
        return l.map(lambda x: x.replace('=', 'eq'))

    def f4(l):
        # Using list comprehension
        return [x.replace('=', 'eq') for x in list(l)]

    def f5(l):
        # Using a for loop
        c = []
        for e in l:
            c.append(e.replace('=', 'eq'))
        return c

    fs = [f1, f2, f3, f4, f5]
    for f in fs:
        wrapped = wrapper(f, df.columns)
        print('%s : %.3f sec for 10000 iterations ' % (f.__name__, timeit.timeit(wrapped, number=10000)))
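The wrapper is there because timeit.timeit, when given a callable, calls it with no arguments. As a minimal sketch of the same idea, functools.partial from the standard library can bind the argument instead. The column names below are made up for illustration and stand in for df.columns, and replace_eq is a hypothetical name for the list comprehension variant.

```python
import functools
import timeit

# Hypothetical messy column names, standing in for df.columns
columns = ['a=b', 'price=usd', 'plain', 'x=y=z']

def replace_eq(names):
    # Same list comprehension as f4
    return [x.replace('=', 'eq') for x in names]

# partial binds the argument, giving timeit a zero-argument callable
timed = functools.partial(replace_eq, columns)
print(timed())  # ['aeqb', 'priceequsd', 'plain', 'xeqyeqz']

elapsed = timeit.timeit(timed, number=10000)
print('replace_eq : %.3f sec for 10000 iterations' % elapsed)
```

A lambda such as lambda: replace_eq(columns) would work too. The hand-written wrapper has the advantage of making the mechanism explicit.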
Here’s the output:
f2 : 0.927 sec for 10000 iterations
f3 : 0.607 sec for 10000 iterations
f4 : 0.358 sec for 10000 iterations
f5 : 0.478 sec for 10000 iterations
And the winner is… f4!
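When two candidates are close, a single timing can be disturbed by whatever else the machine is doing. One way to double-check a result like this, sketched here under assumptions, is timeit.repeat, which runs the whole measurement several times so the minimum can be compared. The column names are made up, and the counts of 1000 iterations and 5 repeats are arbitrary choices, not the settings used for the numbers above.

```python
import timeit

# Hypothetical column names standing in for df.columns
columns = ['a=b', 'price=usd', 'plain', 'x=y=z'] * 25

def comprehension(names):
    # List comprehension, as in f4
    return [x.replace('=', 'eq') for x in names]

def for_loop(names):
    # Explicit loop, as in f5
    out = []
    for x in names:
        out.append(x.replace('=', 'eq'))
    return out

best = {}
for f in (comprehension, for_loop):
    # repeat() returns one total per run; the minimum is the least disturbed run
    runs = timeit.repeat(lambda: f(columns), number=1000, repeat=5)
    best[f.__name__] = min(runs)
    print('%s : best of 5 = %.4f sec for 1000 iterations' % (f.__name__, best[f.__name__]))
```

The absolute numbers will differ from machine to machine, so only the ordering between the candidates is worth reading from the output.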