Zientzilaria

Covering the bioinformatics niche and much more

Fasta Module: Generating Reverse Complement of DNA Sequences


As shown in the GenBank DNA parser script, it is really useful to be able to get the reverse complement of a DNA sequence. The reverse complement of a 5’-3’ DNA sequence is the sequence of its complementary strand, also read 5’-3’. Using our fasta module, it is easy to implement a function that generates this antiparallel sequence. Basically, we can do this in two steps: first obtaining the complement, then reversing it.

def complement(seq):
  abuffer = []
  for i in seq:
      if i == 'A':
          abuffer.append('T')
      elif i == 'C':
          abuffer.append('G')
      elif i == 'T':
          abuffer.append('A')
      elif i == 'G':
          abuffer.append('C')

  return abuffer

That’s not a Pythonic approach, but it works reasonably well for short sequences. What would be a Pythonic approach? Using dictionaries and list comprehensions. From the Python online documentation:

"List comprehensions provide a concise way to create lists without resorting to use of map(), filter() and/or lambda. The resulting list definition tends often to be clearer than lists built using those constructs. Each list comprehension consists of an expression followed by a for clause, then zero or more for or if clauses. The result will be a list resulting from evaluating the expression in the context of the for and if clauses which follow it."

It is basically a compact way to build a new list from a loop, with if clauses when needed. In a couple of lines we can do what took about ten lines in the excerpt above. First we need a dictionary that defines how the nucleotides pair up:

complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}

and then a list comprehension to modify each nucleotide

complseq = [complement[base] for base in seq]

The list comprehension means: for each base in the sequence, look up the dictionary value stored under that key (a nucleotide in the initial sequence). The complete function would be:

def complement(seq):
    complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    complseq = [complement[base] for base in seq]
    return complseq
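One difference between the two versions is worth noting: the if/elif version silently skips any character that is not A, C, G or T, while the dictionary lookup raises a KeyError. The sketch below is only an illustration of one way to handle that, not part of the fasta module. The names PAIRS, complement_strict and complement_lenient are hypothetical, and mapping unknown characters to 'N' is an assumption.

```python
# Base-pairing table, same content as the dictionary in the post.
PAIRS = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}


def complement_strict(seq):
    # Same behaviour as the post's function: raises KeyError on any
    # character that is not a key of PAIRS (e.g. 'N' or lowercase 'a').
    return [PAIRS[base] for base in seq]


def complement_lenient(seq, unknown='N'):
    # Hypothetical variant: upper-case the input first, then map any
    # character outside A/C/G/T to `unknown` instead of raising.
    return [PAIRS.get(base, unknown) for base in seq.upper()]


print(complement_lenient('acgtn'))  # ['T', 'G', 'C', 'A', 'N']
```

Whether to fail loudly or substitute a placeholder depends on the data; for sequences that may contain ambiguity codes, the lenient form avoids a crash but hides the problem.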

One step solved. We can already get the complement; now we need to reverse it. Simple: the Python list method reverse() reverses a list's elements in place. Basically, what we need is below:

def reverse_complement(seq):
    seq = list(seq)
    seq.reverse()
    return ''.join(complement(seq))

The call to this function from any script would simply be:

fasta.reverse_complement('AACCGGTTTTGGCCAA')
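To check the result without the fasta module at hand, here is a self-contained sketch that puts both functions in a single file (an assumption, since the module layout is not shown here; the inner dictionary is renamed pairs for readability). It also includes a swapped-in alternative, the hypothetical reverse_complement_translate, which uses str.translate and slicing instead of a list comprehension. Note that reversing first and complementing second gives the same result as the other order.

```python
def complement(seq):
    # Look up the pairing base for every nucleotide in seq.
    pairs = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    return [pairs[base] for base in seq]


def reverse_complement(seq):
    # Reverse a list copy of the sequence in place, then complement it
    # and join the resulting list back into a string.
    seq = list(seq)
    seq.reverse()
    return ''.join(complement(seq))


# Alternative (not the post's method): a translation table plus a
# reversing slice. Assumes Python 3, where str.maketrans exists.
_TABLE = str.maketrans('ACGT', 'TGCA')


def reverse_complement_translate(seq):
    return seq.translate(_TABLE)[::-1]


print(reverse_complement('AACCGGTTTTGGCCAA'))  # TTGGCCAAAACCGGTT
print(reverse_complement_translate('ATGC'))    # GCAT
```

The example sequence from the call above happens to read the same forwards and backwards, so a shorter asymmetric input such as 'ATGC' is a better test that the reversal is really happening.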

Batteries included, indeed.

More can be found here, which served as an inspiration for the function.