After a long time away, I am back. Things have been hectic in the lab, and there has been little free time for anything else. Let’s get back to Python. Last time we saw a dictionary structure that contained our genetic code. This time we are going to see a script that translates DNA into protein, and we are also going to see how to create a module in Python that can be imported into any script. This module will contain the function that translates the DNA and, for the time being, only that. Let’s see what our module looks like.
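A module along these lines could be sketched as follows. This is only a minimal sketch: the file name `dnatranslator.py`, the variable names, and the use of `*` for stop codons are assumptions made for illustration; the codon table itself is the standard genetic code.

```python
# dnatranslator.py  (hypothetical file name, assumed for this sketch)

def translate_dna(sequence):
    """Translate a DNA sequence into a protein sequence, codon by codon."""
    # Standard genetic code; '*' marks a stop codon (an assumed convention).
    gencode = {
        'ATA': 'I', 'ATC': 'I', 'ATT': 'I', 'ATG': 'M',
        'ACA': 'T', 'ACC': 'T', 'ACG': 'T', 'ACT': 'T',
        'AAC': 'N', 'AAT': 'N', 'AAA': 'K', 'AAG': 'K',
        'AGC': 'S', 'AGT': 'S', 'AGA': 'R', 'AGG': 'R',
        'CTA': 'L', 'CTC': 'L', 'CTG': 'L', 'CTT': 'L',
        'CCA': 'P', 'CCC': 'P', 'CCG': 'P', 'CCT': 'P',
        'CAC': 'H', 'CAT': 'H', 'CAA': 'Q', 'CAG': 'Q',
        'CGA': 'R', 'CGC': 'R', 'CGG': 'R', 'CGT': 'R',
        'GTA': 'V', 'GTC': 'V', 'GTG': 'V', 'GTT': 'V',
        'GCA': 'A', 'GCC': 'A', 'GCG': 'A', 'GCT': 'A',
        'GAC': 'D', 'GAT': 'D', 'GAA': 'E', 'GAG': 'E',
        'GGA': 'G', 'GGC': 'G', 'GGG': 'G', 'GGT': 'G',
        'TCA': 'S', 'TCC': 'S', 'TCG': 'S', 'TCT': 'S',
        'TTC': 'F', 'TTT': 'F', 'TTA': 'L', 'TTG': 'L',
        'TAC': 'Y', 'TAT': 'Y', 'TAA': '*', 'TAG': '*',
        'TGC': 'C', 'TGT': 'C', 'TGA': '*', 'TGG': 'W',
    }

    protein = ''
    # Walk the sequence in steps of three nucleotides (one codon at a time).
    for n in range(0, len(sequence), 3):
        codon = sequence[n:n + 3]
        # Skip anything that is not a known codon, e.g. a short trailing piece.
        if codon in gencode:
            protein += gencode[codon]
    return protein


print(translate_dna('ATGGCCTAA'))
```

With this table, the example call prints `MA*`: methionine, alanine, and a stop.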
It has one function, called translate_dna, that receives a DNA sequence and outputs a protein sequence. It is a “long” function because it has the genetic code in the middle, in Python’s dictionary format. Our translation loop is very simple: it reads the DNA sequence three nucleotides at a time. The loop runs from 0 to the length of the DNA sequence in steps of 3, so we start at 0 and then jump directly to 3, then 6, and so on. This follows the structure of translation, which is based on codons of three nucleotides. Sometimes the DNA sequence entered does not have a length that is a multiple of three, and that is why we check that each codon is a key in the dictionary before accessing it.
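To make the stepping concrete, here is a small sketch of how `range` with a step of 3, combined with slicing, cuts a sequence into codons. The variable name `dna` and the example sequence are made up for illustration; the sequence is deliberately one nucleotide short of three full codons.

```python
dna = "ATGGCCTA"  # 8 nucleotides: two full codons plus a leftover "TA"

# range(start, stop, step) yields the start position of each codon.
starts = list(range(0, len(dna), 3))

# Slicing past the end of a string does not raise an error;
# it simply returns the shorter remainder.
codons = [dna[n:n + 3] for n in starts]

print(starts)
print(codons)
```

This prints `[0, 3, 6]` and `['ATG', 'GCC', 'TA']`. The last piece, `TA`, is not a valid codon, which is exactly the case a lookup in the dictionary has to guard against.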
The dictionary check catches any possible error, such as codons that are shorter than three nucleotides. If the key exists, its amino acid is returned and added to our protein string. Next comes the code that uses this function.
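A calling script might look roughly like the sketch below. To keep the example runnable on its own, it first writes a tiny stand-in module to a temporary directory and adds that directory to `sys.path`; in practice the module file would simply sit next to the script. The file name `dnatranslator.py`, the truncated codon table, and the example sequence are all assumptions made for illustration.

```python
import importlib
import os
import sys
import tempfile

# Stand-in for the real module file: a hypothetical dnatranslator.py with a
# deliberately tiny codon table, written out only so this sketch runs alone.
module_source = '''
def translate_dna(sequence):
    gencode = {'ATG': 'M', 'GCC': 'A', 'TAA': '*'}  # truncated table
    protein = ''
    for n in range(0, len(sequence), 3):
        codon = sequence[n:n + 3]
        if codon in gencode:
            protein += gencode[codon]
    return protein
'''

module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, 'dnatranslator.py'), 'w') as handle:
    handle.write(module_source)

# Python searches the directories listed in sys.path for modules; the
# directory of the running script is normally among them.
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

import dnatranslator  # the import name is the file name without ".py"

dna = 'ATGGCCTAA'
# The function is reached through the module name: module.function(...)
protein = dnatranslator.translate_dna(dna)
print(protein)
```

The part that matters is the last few lines: `import dnatranslator` followed by `dnatranslator.translate_dna(dna)`, which here prints `MA*`.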
No secrets or new things here. Just notice that we import a module which is not part of the common Python modules, but was created by us. In this case the name used in the import is the name of the .py file (without the extension) that contains the function(s) we are going to use. This file also needs to be located in the same directory as the script, unless it is installed in the Python libraries/modules directory. Notice that to use the function we need to call it through the module name, that is, the module name, a dot, and then the function name, as we would do with functions from the sys or the re modules.

We can now create modules containing different functions that can be reused at any time without extra coding. For instance, someone can create a module that reads a FASTA file and returns the sequences and sequence names as strings or lists, and send it to anyone interested in the same functionality. All we would need to do is have this file installed in Python or in the same directory as our script, and we could take advantage of all the functionality contained in the module. That easy.
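As an illustration of the FASTA idea, a function like the one below could live in such a module. This is a minimal sketch, not a full parser: the function name `read_fasta` and the choice to return a list of (name, sequence) tuples are assumptions, and the input here is an in-memory example rather than a real file.

```python
import io


def read_fasta(handle):
    """Return a list of (name, sequence) tuples from an open FASTA file."""
    records = []
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith('>'):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, ''.join(chunks)))
            name = line[1:]
            chunks = []
        elif name is not None:
            # Sequence lines may be wrapped, so collect and join them later.
            chunks.append(line)
    if name is not None:
        records.append((name, ''.join(chunks)))
    return records


# In-memory stand-in for a FASTA file, so the sketch runs without any data.
example = io.StringIO(">seq1 test\nATGGCC\nTAA\n>seq2\nGGGTTT\n")
records = read_fasta(example)
print(records)
```

This prints `[('seq1 test', 'ATGGCCTAA'), ('seq2', 'GGGTTT')]`. With a real file, the same function would be called on the object returned by `open`.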