Last time we saw how to use a set intersection to check for clusters of DNA/protein sequences and their genome intersections. We are going to use the same example, but this time we will change it and create a function that can calculate the intersection of an arbitrary number of sets at once, not just two. Our previous code calculated the intersections only between pairs of sets.
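As a reminder of that limitation, here is a minimal sketch of pairwise intersections. It is not the listing from the previous post: the contents of genA, genB and genC are made-up cluster identifiers, used only for illustration.

```python
# Hypothetical cluster identifiers for three genomes (illustrative data only).
genA = {"c1", "c2", "c3", "c4"}
genB = {"c2", "c3", "c5"}
genC = {"c3", "c4", "c5"}

# Each call compares exactly two sets.
a_and_b = genA.intersection(genB)
a_and_c = genA.intersection(genC)
b_and_c = genB.intersection(genC)

print(sorted(a_and_b))  # ['c2', 'c3']
print(sorted(a_and_c))  # ['c3', 'c4']
print(sorted(b_and_c))  # ['c3', 'c5']
```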
That does not give us the most important piece of information: the intersection of A, B and C. Let’s see how we can do it. Python has a function that can help us this time: reduce.
This function reduces a series of values to a single one by repeatedly applying a function passed as its first parameter.
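To make this concrete, here is a minimal sketch of reduce summing a list. The add function and the numbers are illustrative choices, and the import assumes Python 3, where reduce lives in the functools module.

```python
from functools import reduce  # in Python 2, reduce is also available as a built-in


def add(x, y):
    """Combine two values into one."""
    return x + y


# reduce computes add(add(add(1, 2), 3), 4)
total = reduce(add, [1, 2, 3, 4])
print(total)  # 10
```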
Basically, reduce applies the function cumulatively to the items of an iterable (e.g. a list), carrying the result of each step into the next one. In our case, reduce helps us calculate the intersection of many sets at once (our sets are genA, genB and genC).
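A minimal sketch of that call, assuming Python 3 and the built-in set type; the contents of genA, genB and genC are made-up identifiers for illustration.

```python
from functools import reduce

# Hypothetical cluster identifiers for three genomes (illustrative data only).
genA = {"c1", "c2", "c3", "c4"}
genB = {"c2", "c3", "c5"}
genC = {"c3", "c4", "c5"}

# Equivalent to set.intersection(set.intersection(genA, genB), genC)
common = reduce(set.intersection, [genA, genB, genC])
print(sorted(common))  # ['c3']
```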
Here reduce applies the set method intersection to a list of sets passed to it. In Python 3, reduce is no longer a built-in, but it is still available as functools.reduce, so it is worth knowing. Now that we have a Pythonic way of checking the intersections, we need to put reduce in a function to make our code more elegant.
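A sketch of such a function, using the names get_intersection, my_set1, my_set2 and my_set3 from the text; the body is an assumption based on the reduce call described above, written for Python 3.

```python
from functools import reduce


def get_intersection(my_set1, my_set2, my_set3):
    """Return the elements shared by all three sets."""
    return reduce(set.intersection, [my_set1, my_set2, my_set3])


print(sorted(get_intersection({1, 2, 3}, {2, 3, 4}, {3, 4, 5})))  # [3]
```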
Done. In this function, reduce first computes the intersection between my_set1 and my_set2 and stores it internally, then computes the intersection between that result and my_set3. But wait … get_intersection receives three arguments, so it can return the intersection only when exactly three sets are sent. We would have to create another function that receives two sets in order to calculate the intersection between two. And what if we have more than three genomes to compare? There is a trick.
If we define our function with a single parameter and put a star/asterisk in front of it, the function accepts an arbitrary number of arguments, which are collected in a tuple (and a tuple is iterable, so it can be used in a reduce). We can change our function accordingly.
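A sketch of the starred version, assuming Python 3; the parameter name my_sets is a placeholder chosen here, not taken from the original listing.

```python
from functools import reduce


def get_intersection(*my_sets):
    """Return the elements shared by every set passed in."""
    # my_sets is a tuple holding all positional arguments.
    return reduce(set.intersection, my_sets)


print(sorted(get_intersection({1, 2, 3}, {2, 3, 4})))             # [2, 3]
print(sorted(get_intersection({1, 2, 3}, {2, 3, 4}, {3, 4, 5})))  # [3]
```

Note that calling this sketch with no arguments raises a TypeError, because reduce has nothing to start from. The built-in set.intersection also accepts several sets in a single call, so reduce is not the only way to write this.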
Now we can send 2, 3, 4, 5 or any number of sets to this function and it will return the intersection of all of them. A simple script then needs only this function and the sets to compare.
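A minimal sketch of such a script, assuming Python 3. The contents of genA, genB and genC are made-up cluster identifiers, and my_sets is a placeholder parameter name.

```python
from functools import reduce


def get_intersection(*my_sets):
    """Return the elements shared by every set passed in."""
    return reduce(set.intersection, my_sets)


# Hypothetical cluster identifiers for three genomes (illustrative data only).
genA = {"c1", "c2", "c3", "c4"}
genB = {"c2", "c3", "c5"}
genC = {"c3", "c4", "c5"}

# The same function handles two sets or three.
print("A and B:", sorted(get_intersection(genA, genB)))
print("A and C:", sorted(get_intersection(genA, genC)))
print("B and C:", sorted(get_intersection(genB, genC)))
print("A, B and C:", sorted(get_intersection(genA, genB, genC)))
```

With this sample data, the last line reports the single cluster c3, the only one present in all three genomes.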