Zientzilaria

Covering the bioinformatics niche and much more

Sequences and Strings - Part II

Another common task for biologists is to merge (concatenate) separate strings of DNA into a single sequence. We can modify the previous script to concatenate two distinct DNA sequences into one. We start from the structure of code_01 and add a second sequence, myDNA2, which we print together with the first:

#! /usr/bin/env python 
myDNA = 'ACGTACGTACGTACGTACGTACGT'
myDNA2 = "TCGATCGATCGATCGATCGA"
print myDNA, myDNA2
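
A note for readers on Python 3, where print is a function rather than a statement: the line above would be written print(myDNA, myDNA2). The sketch below assumes Python 3 and uses a small helper, printed, which is a hypothetical name introduced here only to capture what print would write. It illustrates how the separator and the line ending are controlled through the sep and end arguments.

```python
import io

myDNA = "ACGTACGTACGTACGTACGTACGT"
myDNA2 = "TCGATCGATCGATCGATCGA"


def printed(*items, **options):
    """Return exactly the text that print() would write for these arguments.

    Hypothetical helper for illustration; it is not part of the original scripts.
    """
    buffer = io.StringIO()
    print(*items, file=buffer, **options)
    return buffer.getvalue()


# Default: items are separated by one space and a newline is appended.
default_output = printed(myDNA, myDNA2)

# end="" suppresses the newline (roughly the role of a trailing comma in Python 2).
no_newline = printed(myDNA, myDNA2, end="")

# sep="" writes the items with nothing between them.
no_space = printed(myDNA, myDNA2, sep="")

# repr() makes the spaces and newlines visible.
print(repr(default_output))
print(repr(no_newline))
print(repr(no_space))
```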

So far, we have added a second string holding another DNA sequence, and we print both sequences. In Python 2, which these scripts use, the print statement automatically adds a newline after whatever it prints, unless you end the statement with a comma (,). A comma is also what separates the items when you print more than one string, and print puts a single space between them (try removing the comma from the code above; Python will complain about invalid syntax). Now, how do we merge myDNA and myDNA2? Easy in Python: just add them with a plus sign:

myDNA3 = myDNA + myDNA2
print myDNA3
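
As a quick sanity check on the concatenation above, here is a minimal sketch that verifies nothing was lost in the merge and shows str.join as an alternative when there are more than two fragments. The fragments list and its third entry, "GGCC", are made up for illustration. Each print call receives a single parenthesised value, so the snippet should behave the same on Python 2 and Python 3.

```python
myDNA = "ACGTACGTACGTACGTACGTACGT"
myDNA2 = "TCGATCGATCGATCGATCGA"

# Concatenation with + keeps every base of both inputs, in order.
myDNA3 = myDNA + myDNA2
assert len(myDNA3) == len(myDNA) + len(myDNA2)
assert myDNA3.startswith(myDNA) and myDNA3.endswith(myDNA2)

# For several fragments, "".join(...) concatenates the whole list at once.
fragments = [myDNA, myDNA2, "GGCC"]  # "GGCC" is a made-up third fragment
merged = "".join(fragments)

print("Length of myDNA3: %d" % len(myDNA3))
print("Length of merged: %d" % len(merged))
```

With many fragments, join is usually preferred over repeated +, since it avoids building a chain of intermediate strings.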

Notice that in Python strings are immutable, meaning that a string cannot be changed once it has been created. A variable name is only a label that refers to a string; the string itself cannot be modified anywhere in the program, which makes code easier to reason about and allows the interpreter some performance gains. So, to merge our sequences, the + operator builds a new, third string that contains both, and we store it under the name myDNA3. Our final code is (some captions were added):

#! /usr/bin/env python
myDNA = "ACGTACGTACGTACGTACGTACGT"
myDNA2 = "TCGATCGATCGATCGATCGA"
print "First and Second sequences"
print myDNA, myDNA2
myDNA3 = myDNA + myDNA2
print "Concatenated sequence"
print myDNA3
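
The immutability mentioned earlier can be checked directly. The following is a minimal sketch, not part of the original scripts: it tries to change the first base of myDNA in place, which Python rejects with a TypeError, and then confirms that concatenation leaves both input strings untouched. The names in_place_change_worked and originals_unchanged are introduced here only for illustration.

```python
myDNA = "ACGTACGTACGTACGTACGTACGT"
myDNA2 = "TCGATCGATCGATCGATCGA"

# In-place modification is rejected: strings do not support item assignment.
try:
    myDNA[0] = "T"
    in_place_change_worked = True
except TypeError:
    in_place_change_worked = False

# Concatenation builds a new string and leaves both inputs as they were.
myDNA3 = myDNA + myDNA2
originals_unchanged = (myDNA == "ACGT" * 6 and myDNA2 == "TCGA" * 5)

print("In-place change worked: %s" % in_place_change_worked)
print("Originals unchanged: %s" % originals_unchanged)
```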

Easy, eh? Of course, these two simple scripts do not even scratch the surface of Python programming, but they are a start.