Let’s get back to the statistical module, which will calculate a hypergeometric distribution (HD) p-value so we can define the overrepresented motifs. Last time we saw it, we had just defined the factorial function, which is immensely helpful here because of the number of factorial calculations the HD requires. The factorial function was the one below:

However, as mentioned in the comments by Dave and by Mike via email, that method is not the best way to calculate a factorial in Python. The best approach in this case is to use `operator.mul`. All functions in the `operator` module are implemented in C, and they mirror the corresponding operators in Python. So in this module we can find `mul` for multiplication, `sub` for subtraction, `add` for addition, etc. `operator.mul` takes two arguments to multiply, so in our case we still need `reduce` to accumulate the product across the whole series of multiplications. As the sequence we should use a `range` that can start at 2 and goes up to the number whose factorial we want, plus one, because the end of a `range` is exclusive. Finally, our function would be:
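A minimal sketch of the function as described above, assuming Python 3, where `reduce` has to be imported from `functools` (in Python 2 it is a builtin). The function name `factorial` and the initial value of 1 are my assumptions; the initial value is there so that 0! and 1! return 1 instead of failing on an empty range.

```python
import operator
from functools import reduce  # builtin in Python 2, imported here for Python 3


def factorial(n):
    """Return n! by folding operator.mul over the integers 2..n."""
    # range(2, n + 1) is empty for n < 2, so the initial value 1 covers 0! and 1!
    return reduce(operator.mul, range(2, n + 1), 1)


print(factorial(5))  # 120
```

Starting the range at 2 simply skips the pointless multiplication by 1.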

The time gain, quickly measured in a nonscientific fashion on my system, is around 5 to 15%, depending on the factorial being calculated. It may seem a small gain, but when you need to calculate almost a million factorials for all possible motifs, the time saved adds up. Next time we will be back with more statistics, expanding the module.
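For anyone who wants to repeat the rough timing above, here is one way it could be done with `timeit`. The lambda-based baseline is only an assumption about what a pure-Python alternative might look like, not necessarily the earlier function, and the values of `n` and the repetition count are arbitrary. Results will vary by machine and Python version.

```python
import operator
import timeit
from functools import reduce


def factorial_lambda(n):
    # Hypothetical baseline: the multiplication is done by a Python-level lambda
    return reduce(lambda x, y: x * y, range(2, n + 1), 1)


def factorial_mul(n):
    # Same fold, with the multiplication delegated to operator.mul
    return reduce(operator.mul, range(2, n + 1), 1)


# Both versions must agree before it makes sense to time them
assert factorial_lambda(20) == factorial_mul(20)

for n in (10, 50, 200):
    t_lambda = timeit.timeit(lambda: factorial_lambda(n), number=2000)
    t_mul = timeit.timeit(lambda: factorial_mul(n), number=2000)
    print(f"n={n}: lambda {t_lambda:.4f}s, operator.mul {t_mul:.4f}s")
```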