Zientzilaria

Covering the bioinformatics niche and much more

Managing Your Biological Data With Python: A Review


In an effort to bring this blog and website back from existing only in the realms of the Internet Archive, I am trying to set up a series of quasi in-depth reviews of Bioinformatics and programming books published recently (or not so recently).

I would like to start the series with a title that has had my attention since I first saw it on Amazon: Managing your Biological Data with Python by Via, Rother and Tramontano, published by CRC Press. I had high hopes for this book and thought it would become a good addition to my arm’s-reach bookshelf for daily programming problems. But I was wrong, very wrong.

Overall, it is merely a mediocre book. The text feels rushed, and the code seems carelessly designed, with little or no technical editing by the publisher and editors. There are many small mistakes in the listings, and sometimes the listings use different approaches to obtain the same result without any explanation from the authors. The quotes on the back cover mention the discussion of “Python-driven applications” for beginner and intermediate programmers, with packages like PyMol and BioPython. The coverage of the latter, one of the most comprehensive Python modules for Bioinformatics applications, is only 40-plus pages long; I don’t think that’s enough even to introduce BioPython. Compare that to the classic (for me) Python for Bioinformatics, where more than half of the pages are dedicated to BioPython, and you get the idea.

Some quick examples of the lack of care in preparing the book:

  • on page 64 the authors present a table with operators used in if conditions. Strangely a very Pythonic comparison is missing: A < variable < B.

  • on page 70 a code listing shows line[0:1] for list slicing. On the next page a similar listing shows line[0]. It is strange enough to slice a list in Python to get one item …

  • some code examples have wrong indentation.

  • on page 85 there’s a dictionary search for a key. The has_key method is not used, introduced or mentioned.

  • when dealing with regexes, ‘^>’ is considered complex. Again, I understand that the book has a beginner demographic, but this regex is far from being complex.

  • easy_install is used in the book as the go-to method to add third-party libraries. A 2014 book that does not even touch on pip is quite strange.

  • Matplotlib, one of the most complete and complex Python plotting libraries, gets fewer than 12 pages.

  • There’s a Python-R chapter, but no Pandas.

  • There are 15 pages about Classes.

  • it’s 2014, and os.system() calls are still used in the book. There’s a discussion from 2008 saying that the method was going to be deprecated.
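A few of the points in the list above are easier to see in code. The following is only a minimal sketch of mine, not code from the book: the names (`in_range`, `codon_table` and so on) and the sample data are made up for illustration. The dictionary check uses the `in` operator, since `has_key` exists only in Python 2, and the external call uses the `subprocess` module instead of os.system().

```python
import subprocess
import sys


def in_range(low, value, high):
    """Chained comparison: same as (low < value) and (value < high)."""
    return low < value < high


# Hypothetical record, split into fields
line = ["ATOM", "1", "N"]
first_as_list = line[0:1]  # slicing returns a one-item list
first_item = line[0]       # indexing returns the item itself

# Hypothetical lookup table; `in` tests membership on the keys
codon_table = {"ATG": "M", "TGG": "W"}
has_start = "ATG" in codon_table

# subprocess.run takes an argument list and raises on a non-zero
# exit status when check=True; here the child is Python itself
result = subprocess.run(
    [sys.executable, "-c", "print('hello from a child process')"],
    capture_output=True,
    text=True,
    check=True,
)
child_output = result.stdout.strip()
```

Note that the slice and the index are not interchangeable on a list: the first gives back a list, the second the element.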

And this is just a short list of things that really caught my eye when skimming through the book, after reading the first four or five chapters with some attention. Another extremely annoying aspect of the book is the line

Source: Adapted from code published by A.Via/K.Rother under the Python License

in every code listing. There are extreme examples, where a couple of lines of code carry the notice above. This is either a reason to believe that we scientists are too far removed from what happens in the world around us, or a sign that the authors worry that their groundbreaking Python code will be used somewhere else without attribution.

In conclusion, do yourself a favour and do not buy this book. But if you really want to buy it, my used-as-new copy is for sale.

PacktPub New Campaign


PacktPub is back with a new campaign: Level up. Until October 2nd, all ebooks and videos are $10 or less, and you save more if you buy more:

  • Any 1 or 2 eBooks/Videos – $10 each

  • Any 3 to 5 eBooks/Videos – $8 each

  • Any 6 or more eBooks/Videos – $6 each

To take advantage of this promotion click here.

Alternative Way of Installing Circos on OS X


In my last post about installing Circos on OS X, we dealt with the traditional way of building the libraries Circos requires. Most of the comments that post gets are about GD having trouble finding some of its libraries (either PNG or JPEG). I have tested another way of installing these libraries and getting Circos to run properly.

In the past, MacPorts was the main source of pre-compiled libraries and programs for OS X that could be installed directly from the CLI, something like Debian’s apt-get. But while it was a nice tool to have, MacPorts had a lot of problems: sometimes it couldn’t find the proper files, or wouldn’t even complete the installation. Lately, Homebrew has been gaining a lot of ground on MacPorts, and it’s a very easy tool to use. Homebrew has a list of recipes to download, compile and symlink software on OS X (you might have problems here and there, but it’s quite stable).

I used an OS X virtual machine to install the libraries Circos requires, in order to avoid conflicts with my currently installed versions.

Starting the terminal, we first need to install Homebrew, just run:

ruby -e "$(curl -fsSL https://raw.github.com/mxcl/homebrew/go)"

We can also run

brew doctor

to check for problems, but you should be ready to go. We don’t need to install all libraries for GD, just make Homebrew install GD and all requirements will be fulfilled:

$ brew install gd
==> Installing gd dependency: libpng
==> Downloading http://downloads.sf.net/project/libpng/libpng15/older-releases/1.5.14/libpng-1.5.14.tar.gz
######################################################################## 100.0%
==> ./configure --prefix=/usr/local/Cellar/libpng/1.5.14
==> make install
🍺  /usr/local/Cellar/libpng/1.5.14: 15 files, 1.0M, built in 14 seconds
==> Installing gd dependency: jpeg
==> Downloading http://www.ijg.org/files/jpegsrc.v8d.tar.gz
######################################################################## 100.0%
==> ./configure --prefix=/usr/local/Cellar/jpeg/8d
==> make install
🍺  /usr/local/Cellar/jpeg/8d: 18 files, 768K, built in 14 seconds
==> Installing gd
==> Downloading https://bitbucket.org/libgd/gd-libgd/downloads/libgd-2.1.0.tar.gz
######################################################################## 100.0%
==> ./configure --prefix=/usr/local/Cellar/gd/2.1.0 --with-png=/usr/local/opt/libpng --without-freetype --with-jpeg=/usr
==> make install
🍺  /usr/local/Cellar/gd/2.1.0: 32 files, 1.0M, built in 18 seconds

Check for any other missing Perl modules and install them with cpan

cpan$ install Config::General
-output omitted-

cpan$ install Graphics::ColorObject
-output omitted-

cpan$ install Math::Bezier
-output omitted-

cpan$ install Math::VecStat
-output omitted-

cpan$ install Readonly
-output omitted-

cpan$ install Set::IntSpan
-output omitted-

and install Perl’s GD interface manually

Installing GD on perl
srctemp$ curl -O http://www.cpan.org/authors/id/L/LD/LDS/GD-2.49.tar.gz  # if curl fails, paste the URL into your browser
srctemp$ tar -xzvf GD-2.49.tar.gz
srctemp$ cd GD-2.49
srctemp/GD-2.49$ perl Makefile.PL
srctemp/GD-2.49$ make
srctemp/GD-2.49$ sudo make install

and you should be good to go.

Instant Django 1.5 Application Development Starter, a Review


I have been invited by PacktPub to review their new Django-based book Instant Django 1.5 Application Development Starter. PacktPub’s Instant series of books is geared to give users new to a technology a quick start on development or software usage. The books are short and to the point, a no-frills approach.

In this particular case, I think the book is very well organized, the examples are clear, the pace is right and, overall, the book achieves the intended result, which is to give an introduction to Django development. The book is written by Mauro Rocco, who is Italian, and sometimes his background in Latin languages can be seen (like in this blog too), but it’s not detrimental to his English or the grammar and syntax in the book.

I haven’t been a Python web developer in my career, and I have started and stopped working with Django (and other simpler frameworks) many times. I always have a good reference at hand, but some are either too advanced for me to just check some basic stuff, or their examples and code are messy and scattered all over the book. With this book, whenever I have to do my yearly web development in Python (and Django), I can use it as a start. Coupled with a best-practices guide and a more advanced compendium on the framework, you have a good set to work with. I highly recommend this book.

Circos Data Visualization How-to: A Review


Following a discussion topic on LinkedIn, I received from Packt Publishing a copy of Circos Data Visualization How-to, by Tom Schenk Jr. In the past couple of weeks I toyed with reading it, until I finally got around to reading it all. This is quite a short book, topping out at 54 pages of actual content. And as a short book it doesn’t give you a full perspective of Circos’ capabilities, but it’s a start.

The author does not come from the biological field, so he takes some time explaining part of the biological terminology used in Circos, something that could have been skipped for people in the area. But this does not detract from the usefulness of this book. The chapters cover most of Circos’ features, mostly using examples away from biology, which is somewhat useful, as it explains how to create your own diagrams and “chromosomes”. At the same time, a couple more chapters about some of the finer details of Circos would have been helpful in the long run.

The book is pleasantly written, there’s no fluff and the text is direct. But in the end you keep asking for more details and information, due to its abbreviated size. As I said, it’s a good start, covering in somewhat more detail what you find on the Circos website, but I definitely wanted a bit more.

Installing Circos on OS X


I think one of the main topics in the Circos discussion group is how to install it, especially as it requires some specific Perl and C libraries to be in place. This short tutorial gives a step-by-step overview of how to make Circos work on OS X, but it should be similar on other *nix flavours. I just installed it on my Mountain Lion box, but, again, it should be identical on previous versions of OS X.

My Circos is installed on

Circos location
/Applications/circos-0.55/

and if we try running it

Circos trial run
$ perl /Applications/circos-0.55/bin/circos

I get this error, initially

Circos runtime error
Can't locate Config/General.pm in @INC (@INC contains: /Applications/circos-0.55/bin/lib /Applications/circos-0.55/bin/../lib
 /Applications/circos-0.55/bin /Library/Perl/5.12/darwin-thread-multi-2level /Library/Perl/5.12
 /Network/Library/Perl/5.12/darwin-thread-multi-2level /Network/Library/Perl/5.12
 /Library/Perl/Updates/5.12.4/darwin-thread-multi-2level /Library/Perl/Updates/5.12.4
 /System/Library/Perl/5.12/darwin-thread-multi-2level /System/Library/Perl/5.12
 /System/Library/Perl/Extras/5.12/darwin-thread-multi-2level /System/Library/Perl/Extras/5.12 .) at
 /Applications/circos-0.55/bin/../lib/Circos.pm line 53.
BEGIN failed--compilation aborted at /Applications/circos-0.55/bin/../lib/Circos.pm line 53.
Compilation failed in require at /Applications/circos-0.55/bin/circos line 184.
BEGIN failed--compilation aborted at /Applications/circos-0.55/bin/circos line 184.

So, a lot of things are going wrong; we need to check what is missing and install it. Circos has a couple of commands bundled in its package that help us work through the errors. The best way to run them is to cd into the Circos bin directory

Circos: checking modules
$ cd /Applications/circos-0.55/bin/
/Applications/circos-0.55/bin$ ./list.modules

In my case, running this script gave me a list of the required modules for Circos

Circos requirements
Carp
Config::General
Data::Dumper
Digest::MD5
File::Basename
File::Spec::Functions
FindBin
GD
GD::Polyline
Getopt::Long
Graphics::ColorObject
IO::File
List::MoreUtils
List::Util
Math::Bezier
Math::BigFloat
Math::Round
Math::VecStat
Memoize
POSIX
Params::Validate
Pod::Usage
Readonly
Regexp::Common
Set::IntSpan
Storable
Time::HiRes

Another script checks for the current status of each module (still from the same dir)

Circos checking modules
/Applications/circos-0.55/bin$ ./test.modules

and this finally gives me a list of the current status of each one of the required modules

Circos: requirements
ok   Carp
fail Config::General is not usable (it or a sub-module is missing)
ok   Data::Dumper
ok   Digest::MD5
ok   File::Basename
ok   File::Spec::Functions
ok   FindBin
fail GD is not usable (it or a sub-module is missing)
fail GD::Polyline is not usable (it or a sub-module is missing)
ok   Getopt::Long
fail Graphics::ColorObject is not usable (it or a sub-module is missing)
ok   IO::File
ok   List::MoreUtils
ok   List::Util
fail Math::Bezier is not usable (it or a sub-module is missing)
ok   Math::BigFloat
ok   Math::Round
fail Math::VecStat is not usable (it or a sub-module is missing)
ok   Memoize
ok   POSIX
ok   Params::Validate
ok   Pod::Usage
fail Readonly is not usable (it or a sub-module is missing)
ok   Regexp::Common
fail Set::IntSpan is not usable (it or a sub-module is missing)
ok   Storable
ok   Time::HiRes

We have to address everything that failed in the test. In this case, Config::General, GD, GD::Polyline, Graphics::ColorObject, Math::Bezier, Math::VecStat, Readonly and Set::IntSpan. We leave GD (and GD::Polyline, which depends on it) behind for a moment and focus on the other modules (this list might vary for each Perl installation, so you might need to install more or fewer modules, but the commands are similar).
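If the report is long, a few lines of Python can pull out the failing modules so none is overlooked. This is only a sketch: it assumes the report keeps the `ok`/`fail` layout shown above, and `failed_modules` and the sample text are names and data I made up for illustration.

```python
def failed_modules(report):
    """Return the module names from lines that start with 'fail'."""
    failed = []
    for raw_line in report.splitlines():
        fields = raw_line.split()
        # A failing line looks like: "fail Some::Module is not usable (...)"
        if len(fields) >= 2 and fields[0] == "fail":
            failed.append(fields[1])
    return failed


# Shortened, hypothetical copy of a test.modules report
sample_report = """ok   Carp
fail Config::General is not usable (it or a sub-module is missing)
ok   Data::Dumper
fail GD is not usable (it or a sub-module is missing)
"""

missing = failed_modules(sample_report)
print(missing)
```

In practice you would paste in, or read from a file, the full output of the script instead of the shortened sample.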

The easiest way to install modules in Perl is to use cpan, the client for CPAN, the repository of Perl modules. It has an interactive shell that we can use, and we will see how to do that. In order to make sure our installation works we use sudo and call cpan (from any directory)

$ sudo cpan

If this is the first time you are running it, just answer yes to all config questions and you are good to go. Now we have to install the six modules required. This is done with the install command

cpan$ install Config::General
-output omitted-

cpan$ install Graphics::ColorObject
-output omitted-

cpan$ install Math::Bezier
-output omitted-

cpan$ install Math::VecStat
-output omitted-

cpan$ install Readonly
-output omitted-

cpan$ install Set::IntSpan
-output omitted-

Now we deal with the last module, and usually the most laborious to install: GD. Ideally you should have all possible library support for GD, and for this you have to install additional libraries. We are going to start with two of the most common and see if we need anything else. Usually libjpeg and libpng are required by GD. So, let’s download both of them

Downloading and installing libjpeg and libpng
$ mkdir srctemp
$ cd srctemp
srctemp$ curl -O http://www.ijg.org/files/jpegsrc.v8d.tar.gz
srctemp$ tar -xzvf jpegsrc.v8d.tar.gz
srctemp$ cd jpeg-8d
srctemp/jpeg-8d$ ./configure
srctemp/jpeg-8d$ make
srctemp/jpeg-8d$ sudo make install

srctemp/jpeg-8d$ cd ..
srctemp$ curl -O ftp://ftp.simplesystems.org/pub/libpng/png/src/libpng16/libpng-1.6.2.tar.gz
srctemp$ tar -xzvf libpng-1.6.2.tar.gz
srctemp$ cd libpng-1.6.2
srctemp/libpng-1.6.2$ ./configure
srctemp/libpng-1.6.2$ make
srctemp/libpng-1.6.2$ sudo make install

That should do it for now. We will download GD and check if the configuration we have so far is enough. GD’s website is still down, but we can get the source from Bitbucket and use identical commands to install it

Installing GD
srctemp$ curl -O https://bitbucket.org/libgd/gd-libgd/downloads/libgd-2.1.0-rc2.tar.gz
srctemp$ tar -xzvf libgd-2.1.0-rc2.tar.gz
srctemp$ cd libgd-2.1.0-rc2
srctemp/libgd-2.1.0-rc2$ ./configure

At the end of the configuration run, you should see something like this

** Configuration summary for gd 2.1.0:

   Support for PNG library:          yes
   Support for JPEG library:         yes
   Support for Freetype 2.x library: yes
   Support for Fontconfig library:   yes
   Support for Xpm library:          no
   Support for pthreads:             yes

In my case, I’m good to go. But if in your case Freetype and Fontconfig are missing, you would have to download, configure, make and install them, just like libpng and libjpeg. So now

srctemp/libgd-2.1.0-rc2$ make
srctemp/libgd-2.1.0-rc2$ sudo make install

We are almost there. The last step is to install GD in Perl. Normally, if we use cpan to install it on OS X, it fails. So, we will have to do it by hand. We go to the CPAN website, download the latest version of Perl’s GD module and install it with commands similar to the ones above.

Installing GD on perl
srctemp$ curl -O http://www.cpan.org/authors/id/L/LD/LDS/GD-2.49.tar.gz  # if curl fails, paste the URL into your browser
srctemp$ tar -xzvf GD-2.49.tar.gz
srctemp$ cd GD-2.49
srctemp/GD-2.49$ perl Makefile.PL
srctemp/GD-2.49$ make
srctemp/GD-2.49$ sudo make install

You should see some output, maybe some warnings, but if you followed all the steps above the installation should have worked. You can run the test.modules script again in order to check.
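A quick way to check a single module without rerunning the whole test is to ask Perl to load it with `perl -MModule -e 1`, which exits with a non-zero status when the module cannot be loaded. Below is a minimal Python sketch wrapping that one-liner; the helper names `perl_check_command` and `module_is_usable` are hypothetical, and it assumes the `perl` you want to test is the one found on your PATH.

```python
import shutil
import subprocess


def perl_check_command(module, perl="perl"):
    """Build the command that asks Perl to load one module and exit."""
    return [perl, "-M" + module, "-e", "1"]


def module_is_usable(module, perl="perl"):
    """Return True if the given Perl interpreter can load the module."""
    if shutil.which(perl) is None:
        return False  # no such interpreter on PATH
    result = subprocess.run(
        perl_check_command(module, perl),
        capture_output=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    # Modules taken from the failing entries earlier in this post
    for name in ["GD", "GD::Polyline", "Config::General"]:
        print(name, "ok" if module_is_usable(name) else "fail")
```

This checks only that the module loads; the test.modules script remains the reference for what Circos itself needs.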

If you get any errors, leave a comment or send me an email.

New Blog


Welcome to the new home of Beginning Python for Bioinformatics and Blind.Scientist: Zientzilaria, scientist in Basque. Most posts from the old blogs have been converted here, now in an Octopress installation. Slowly, I will start fixing the small mistakes of the WordPress migration and getting everything up to speed.

In this new home I will try to be more optimistic, bring more news and criticize less. Cheers to a new beginning.

Scientists Are Human Beings


Sometimes I wonder if non-scientists perceive us as a lot that only does good things, has no ego and only searches for the betterment of Science, with the exception of climate scientists, clearly. But sometimes I also forget that scientists are human beings too, and one aspect of being human is to be driven by hype. It doesn’t matter where you come from: hype is an important part of society, product development, PR departments, the economy and Facebook groups.

When that happens, I remember Zotero. A nice, nifty little application that does some wonderful things with references. It’s open source, you can modify it, make it better, report bugs, develop with them, donate if you wish. And even better, it’s developed by real scientists, not by spoon-fed intercontinental jet-setters backed by corporations. Zotero developers are supported by grants, by people like you and me, like your parents and in-laws, and it’s an application that anyone can benefit from. Did I mention that it’s free?

So, when I forget about hype and scientists, I remember Zotero and I get sad about its lack of traction in Academia compared to other similar applications. And then I remember: Zotero has no PR department. We’re its PR department, and we suck at PR. After all, we’re human beings too.