Zientzilaria

Covering the bioinformatics niche and much more

Obtaining Overrepresented Motifs in DNA Sequences, Part 7

Continuing with Mike’s functions to obtain motif quorums, we now look at functions 3, 4 and 5. Function get_quorums_03 uses an old friend of the blog: sets. Recall that sets are similar to lists, but they are unordered and their items are unique.

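from collections import defaultdict

def get_quorums_03(seqs, mlen):
    """
    add seq id_no to a set
    use explicit counter to create seq_no
    """
    # each motif maps to the set of sequence numbers it occurs in
    quorum = defaultdict(set)
    id_no = 0
    for seq in seqs:
        id_no += 1
        # + 1 so the last window of the sequence is included
        for n in range(len(seq) - mlen + 1):
            quorum[seq[n:n+mlen]].add(id_no)
    return quorum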
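
The effect of storing sequence numbers in a set can be seen in isolation. The following is a minimal sketch on made-up observations (the motifs, ids and variable names are hypothetical, not taken from Mike's code); it compares list-based bookkeeping, which needs an explicit membership test, with set-based bookkeeping, which does not.

```python
from collections import defaultdict

# Hypothetical toy data: "ACG" is seen twice in sequence 1 and once in sequence 2.
observations = [("ACG", 1), ("ACG", 1), ("ACG", 2), ("CGT", 1)]

# List-based bookkeeping: duplicates must be filtered by hand.
as_lists = defaultdict(list)
for motif, id_no in observations:
    if id_no not in as_lists[motif]:
        as_lists[motif].append(id_no)

# Set-based bookkeeping: adding an id that is already present changes nothing.
as_sets = defaultdict(set)
for motif, id_no in observations:
    as_sets[motif].add(id_no)

print(dict(as_lists))  # {'ACG': [1, 2], 'CGT': [1]}
print(dict(as_sets))   # {'ACG': {1, 2}, 'CGT': {1}}
```

Both structures end up holding the same ids per motif, but the set version has one line less in the inner loop and its membership handling does not slow down as the number of ids grows.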

Basically, the sequence numbers (an incremented counter) are added to a defaultdict whose default factory is set, so every motif maps to the set of sequences in which it occurs. This way there is no need to check whether a sequence number is already stored for a motif: a set holds each item only once, so adding a duplicate changes nothing. Function 4 is very similar to function 3; the difference is that it uses enumerate (as in function 02) to generate the sequence numbers.

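from collections import defaultdict

def get_quorums_04(seqs, mlen):
    """
    add seq id_no to a set
    use 'enumerate' to create seq_no
    """
    # each motif maps to the set of sequence numbers it occurs in
    quorum = defaultdict(set)
    # enumerate yields (index, sequence) pairs, replacing the manual counter
    for id_no, seq in enumerate(seqs):
        # + 1 so the last window of the sequence is included
        for n in range(len(seq) - mlen + 1):
            quorum[seq[n:n+mlen]].add(id_no)
    return quorum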
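
One detail worth noting when comparing the counter with enumerate: the explicit counter is incremented before it is used, so its ids start at 1, while enumerate starts at 0 by default. The sketch below, on made-up sequences, shows the two numberings and how the `start` argument of enumerate would make them agree. Whether the offset matters depends on how the ids are used later, which this post does not specify.

```python
seqs = ["ACGT", "CGTA", "GTAC"]  # made-up sequences, for illustration only

# Explicit counter, incremented before use: ids run 1, 2, 3.
counter_ids = []
id_no = 0
for seq in seqs:
    id_no += 1
    counter_ids.append(id_no)

# Plain enumerate: ids run 0, 1, 2.
enum_ids = [id_no for id_no, seq in enumerate(seqs)]

# enumerate with start=1 reproduces the counter's numbering.
enum_ids_from_one = [id_no for id_no, seq in enumerate(seqs, start=1)]

print(counter_ids)        # [1, 2, 3]
print(enum_ids)           # [0, 1, 2]
print(enum_ids_from_one)  # [1, 2, 3]
```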

Function 5 adds a twist: it wraps a range in enumerate to produce the start and stop indices of each motif (word) window. The range supplies the stop index, beginning at mlen, while enumerate supplies the matching start index, beginning at 0. The window therefore slides according to the (start, stop) tuples yielded by enumerate, rather than the n:n+mlen arithmetic used in the other functions. As before, the defaultdict is initialized with set and the sequence numbers are generated by enumerate.

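from collections import defaultdict

def get_quorums_05(seqs, mlen):
    """
    add seq id_no to a set
    use 'enumerate' to create seq_no
    use enumerate(range(...)) to create start/stop indices for motif
    """
    # each motif maps to the set of sequence numbers it occurs in
    quorum = defaultdict(set)
    for id_no, seq in enumerate(seqs):
        # s counts up from 0 while e counts up from mlen, so e - s == mlen;
        # the + 1 lets e reach len(seq), which includes the last window
        for s, e in enumerate(range(mlen, len(seq) + 1)):
            quorum[seq[s:e]].add(id_no)
    return quorum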
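
To make the enumerate(range(...)) trick concrete, here is a small sketch on a made-up sequence that lists the (start, stop) pairs produced by both approaches. The bounds in this sketch are chosen so that the final window of the sequence is included; that choice, like the sequence and variable names, is an assumption of the example.

```python
seq = "ACGTAC"  # made-up sequence, for illustration only
mlen = 3

# Windows from offset arithmetic: start n, stop n + mlen.
by_offset = [(n, n + mlen) for n in range(len(seq) - mlen + 1)]

# Windows from enumerate(range(...)): enumerate supplies the start, range the stop.
by_enumerate = list(enumerate(range(mlen, len(seq) + 1)))

print(by_offset)     # [(0, 3), (1, 4), (2, 5), (3, 6)]
print(by_enumerate)  # [(0, 3), (1, 4), (2, 5), (3, 6)]
print([seq[s:e] for s, e in by_enumerate])  # ['ACG', 'CGT', 'GTA', 'TAC']
```

Both approaches visit the same windows; the enumerate version simply avoids recomputing n + mlen for every slice.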