After a long hiatus we are (almost) back on track in order to get our scripts to determine overrepresented motifs in DNA sequences. Last time we checked we defined the “best” factorial function in Python
1 2 3 4 5 

and Andrew Dalke sent a couple of links pointing out to a binomial calculation function, one of them is below
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 

There is a similar implementation in the Python Cookbook online and it is clearly more convenient using this function than actually coding to calculate an identical value using factorials. But anyway, let’s see how a function that calculates one binomial coefficient from the Hypergeometric Distribution (HD) by using the factorial function. Later we benchmark both methods. Each binomial coefficient can be expanded as !%7D) and the HD has three of them. From the Wikipedia “In probability theory and statistics, the hypergeometric distribution is a discrete probability distribution that describes the number of successes in a sequence of n draws from a finite population without replacement.” In the next post we will define each term of the HD for the motifs case. In each binomial expansion we would have to calculate three factorial values, nine in total. With the binomial function, only three values need to be calculated. So, using the factorial function we would need to code something like this in order to calculate one of the binomials
1 2 3 4 5 6 7 

Next, we will benchmark and see if there is an advantage to either method.