We will take a break from developing the statistical module to obtain overrepresented motifs (I will introduce mul in the next stats post), and take a deeper look at the different ways of obtaining the motif quorums. Mike DeHaemer, a regular commenter and contributor to the blog, sent me a Python script with 8 different approaches, spread over 13 distinct functions, for obtaining the motif quorums.
I will take advantage of his contribution and post all of them, with some quick comments on each one (his code comments were kept in each function). Afterwards, a small benchmark will be posted. Most of the functions need to import a couple of modules

and they take two parameters: a list of sequences and the length of the motifs. The first function again uses defaultdict and is very similar to the one used in the final version of the quorum script. The defaultdict is initialized with list as its default factory, so the keys are motifs and each value is a list of sequence ids. An id is appended to a motif's list only if it is not already present in it. The sequence id is kept in a variable that is incremented each time the loop iterates.
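The description above can be sketched roughly as follows. This is a minimal reconstruction based only on the text, not Mike's original function: the name quorum_counter, the variable names and the example sequences are all assumptions made for illustration.

```python
from collections import defaultdict


def quorum_counter(sequences, length):
    """Map each motif to the list of ids of the sequences containing it.

    Hypothetical sketch: names and layout are assumptions, not the
    original code from the script.
    """
    quorums = defaultdict(list)  # missing motifs start with an empty list
    seq_id = 0                   # incremented by hand on every iteration
    for seq in sequences:
        # slide a window of the given length along the sequence
        for start in range(len(seq) - length + 1):
            motif = seq[start:start + length]
            # record each sequence id at most once per motif
            if seq_id not in quorums[motif]:
                quorums[motif].append(seq_id)
        seq_id += 1
    return quorums


seqs = ["ACGTAC", "CGTTT", "ACACAC"]
quorums = quorum_counter(seqs, 2)
print(quorums["AC"])  # [0, 2]
print(quorums["CG"])  # [0, 1]
```

The quorum of a motif is then simply the length of its id list, for example len(quorums["AC"]).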

The second function is very similar to the first one, except that the sequence id numbers are generated with enumerate.
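A hedged sketch of what such a variant could look like is shown here. As before, the function name quorum_enumerate and the example data are assumptions, not the code from the original script. The only difference from a hand-incremented counter is that enumerate supplies the id.

```python
from collections import defaultdict


def quorum_enumerate(sequences, length):
    """Same motif-to-ids mapping, with ids supplied by enumerate.

    Hypothetical sketch: names are assumptions, not the original code.
    """
    quorums = defaultdict(list)
    # enumerate pairs every sequence with its position in the list
    for seq_id, seq in enumerate(sequences):
        for start in range(len(seq) - length + 1):
            motif = seq[start:start + length]
            # record each sequence id at most once per motif
            if seq_id not in quorums[motif]:
                quorums[motif].append(seq_id)
    return quorums


seqs = ["ACGTAC", "CGTTT", "ACACAC"]
print(quorum_enumerate(seqs, 3)["CGT"])  # [0, 1]
```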

enumerate builds an iterator on top of another iterable object. Each item it yields is a tuple pairing an index with the corresponding element. For instance, in our case above, enumerate will yield the series of tuples (0, sequence1), (1, sequence2) ... (N-1, sequenceN). That's the reason the enumerate loop unpacks a tuple into two loop variables, the id and the sequence.
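A small illustration of this behaviour, using made-up sequences (the list and the variable names are assumptions for the example):

```python
sequences = ["ACGT", "TTGA", "CCAG"]

# enumerate is lazy, so wrap it in list() to see the tuples it yields
pairs = list(enumerate(sequences))
print(pairs)  # [(0, 'ACGT'), (1, 'TTGA'), (2, 'CCAG')]

# in a for loop each tuple is unpacked into an id and a sequence
for seq_id, seq in enumerate(sequences):
    print("%d %s" % (seq_id, seq))
```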


The next couple of posts will cover the other functions sent by Mike. Then we will go back to the statistical module.