Let’s improve our previous script and put the contents of the file in a
variable similar to an array. Python understands different formats of
compound data types, and the list is the most versatile. A list in
Python can be assigned as a series of elements (or values) separated by
commas and surrounded by square brackets.
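As a minimal sketch of that syntax (the variable names and values here are only illustrative and do not come from the original script), a list can be created and inspected like this:

```python
# A list is written as comma-separated values inside square brackets.
nucleotides = ["A", "C", "G", "T"]   # illustrative values only

# A list can also mix values of different types.
mixed = ["AY162388", 4, 3.5]

print(nucleotides)        # the whole list
print(len(nucleotides))   # number of elements
print(nucleotides[0])     # first element (indexing starts at 0)
```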
Now, we are going to read the same file, store the DNA sequence in a
list, and output this variable. The beginning of the script is the same,
where we basically tell Python that the file name is AY162388.seq.
We are going to change the way we read the file. Instead of just opening it and then reading it line by line, we are going to open it and read all the lines at once, by using this
Notice the part in bold? In the previous script, we opened the file and
stored its contents in a file object. Now, just after the file is
opened, we read all of its lines at once and store them in the variable
file, which is not a file object but a Python list of strings. Before,
if we wanted to manipulate our DNA sequence, we would have had to read
it and then, inside the loop, store it in a variable of our choice. In
this script, we do that all at once, and the result is a variable that
we can change any way we want. The code without the output part is
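The original listing did not survive here, so the following is only a sketch of the one-step read described above. It assumes the variable names dnafile and file, and it writes a tiny made-up stand-in for AY162388.seq to a temporary directory so that it runs without the real file:

```python
import os
import tempfile

# Stand-in for AY162388.seq so the sketch runs anywhere. The three short
# lines are made up for illustration and are NOT the real sequence.
dnafile = os.path.join(tempfile.mkdtemp(), "AY162388.seq")
with open(dnafile, "w") as handle:
    handle.write("GTGACT\nTTTAAA\nAGTGAC\n")

# Open the file and immediately read every line; `file` is a list of strings.
file = open(dnafile, "r").readlines()
```

With the real file in the working directory, the stand-in lines can be replaced by a plain assignment of the file name to dnafile. Because file is an ordinary list, expressions such as len(file) and file[0] work on it directly.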
Try putting a print statement after the last line to print the file
list. You will get something like this
['GTGACTT...TTGTTC\n', 'TTTAAATA....TAATC\n', 'AGTGA...CTATG\n', 'GAGCTCA....TATAGC\n', ...]
which is exactly the description of a Python list. You see all the
lines, separated by commas and surrounded by square brackets. Notice
that each line has a newline (\n) character at the end. Let’s make the
output a little nicer by including a loop. Remember, when I introduced
loops I wrote that Python iterates over “items in a sequence of items”,
which is a good synonym for a list. So the loop should be as
straightforward as
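A minimal sketch of such a loop, written for Python 3, could look like the following. The list below is a made-up stand-in for the result of reading the file, and end="" is an addition of this sketch: each line already ends with a newline, so it stops print() from adding a second one.

```python
# Made-up stand-in for the list of lines read from the file.
file = ["GTGACT\n", "TTTAAA\n", "AGTGAC\n"]

# Iterate over the items of the list, one line at a time.
for line in file:
    # Each line still ends with "\n"; end="" avoids printing blank lines.
    print(line, end="")
```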
Putting everything together gives us
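The complete listing is missing from this copy, so what follows is a hedged reconstruction rather than the original script. The variable names are assumptions, the syntax is Python 3, and a small made-up stand-in file replaces the real AY162388.seq so the script runs as-is:

```python
import os
import tempfile

# Made-up stand-in for AY162388.seq (NOT the real sequence).
dnafile = os.path.join(tempfile.mkdtemp(), "AY162388.seq")
with open(dnafile, "w") as handle:
    handle.write("GTGACT\nTTTAAA\nAGTGAC\n")

# Read all lines at once into a list of strings.
file = open(dnafile, "r").readlines()

# Print the sequence line by line, without doubling the newlines.
for line in file:
    print(line, end="")
```

To run it against the real data, drop the stand-in lines and assign the name of the downloaded file to dnafile instead.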