Zientzilaria

Covering the bioinformatics niche and much more

Managing Your Biological Data With Python: A Review


In an effort to bring this blog and website back from the realm of the Internet Archive, I am starting a series of quasi in-depth reviews of Bioinformatics and programming books, some recently published and some not so recent.

I would like to start the series with a title that caught my attention when I first saw it on Amazon: Managing your Biological Data with Python by Via, Rother and Tramontano, published by CRC Press. I had high hopes for this book and thought it would become a good addition to my arm's-reach bookshelf for everyday programming problems. But I was wrong, very wrong.

Overall, it is merely a mediocre book. The text feels rushed, and the code seems carelessly designed, with little or no technical editing by the publishers and editors. There are many small mistakes in the listings, and sometimes different approaches are used to obtain the same result without any warning from the authors. The quotes on the back cover promise a discussion of “Python-driven applications” such as PyMol and BioPython for beginner and intermediate programmers. The coverage of the latter, one of the most comprehensive Python modules for Bioinformatics, is only 40-plus pages long, which I don’t think is enough even to introduce BioPython. Compare that to the (for me) classic Python for Bioinformatics, where more than half of the pages are dedicated to BioPython, and you get the idea.

Some quick examples of the lack of care in preparing the book:

  • on page 64, the authors present a table of operators used in if conditions. Strangely, a very Pythonic comparison is missing: the chained A < variable < B.

  • on page 70, a code listing uses line[0:1] to get the first item by slicing; a similar listing on the next page uses line[0]. Slicing in Python just to get one item is strange enough …

  • some code examples have wrong indentation.

  • on page 85, there’s a dictionary search for a key. The has_key method is not used, introduced, or mentioned.

  • when dealing with regexes, ‘^>’ is described as complex. Again, I understand that the book is aimed at beginners, but this regex is far from complex.

  • easy_install is used throughout the book as the go-to method for installing third-party libraries, but a 2014 book that doesn’t even touch on pip is quite strange.

  • Matplotlib, one of the most complete and complex of Python’s plotting libraries, gets fewer than 12 pages.

  • There’s a Python-R chapter, but no Pandas.

  • There are 15 pages about Classes.

  • it’s 2014, and the book still uses os.system() calls, even though there has been discussion since 2008 that the function was going to be deprecated.
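To make a few of the idioms from the list concrete, here is a minimal Python 3 sketch. The variable names and toy data are hypothetical and not taken from the book. Note that `line[0]` and `line[0:1]` only coincide for non-empty strings; on a list, slicing returns a one-item list rather than the item itself. Also note that `dict.has_key()` exists only in Python 2 and was removed in Python 3, where the `in` operator is the way to test for a key.

```python
# Toy data, purely illustrative (hypothetical names and values).
line = ">seq1 example header"
fields = ["ATG", "GCC", "TAA"]

# Chained comparison: same result as (0 < x) and (x < 10),
# except that x is evaluated only once.
x = 5
in_range = 0 < x < 10

# Indexing returns one element; slicing returns a new sequence of the same type.
first_char_index = line[0]       # '>'
first_char_slice = line[0:1]     # also '>', because line is a string
first_codon = fields[0]          # 'ATG'
first_codon_slice = fields[0:1]  # ['ATG'], a one-item list, not a string

# On an empty string, slicing quietly returns '' while indexing raises.
empty = ""
safe = empty[0:1]
raised = False
try:
    empty[0]
except IndexError:
    raised = True

# Key lookup with `in` (works in Python 2 and 3) and dict.get with a default.
codon_table = {"ATG": "M", "TAA": "*"}  # deliberately partial table
has_atg = "ATG" in codon_table
aa = codon_table.get("GCC", "?")        # 'GCC' is absent here, so '?' is returned
```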
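For reference, this is about all `'^>'` does: it anchors a `>` to the start of a line, which is how FASTA header lines are recognised. A minimal sketch with made-up records; the `re.MULTILINE` flag is what makes `^` match at every line start rather than only at the start of the whole string. A plain `str.startswith` check gives the same result without a regex at all.

```python
import re

# '^' matches at the start of each line because of re.MULTILINE;
# '(.*)' captures the rest of the header line without the leading '>'.
HEADER = re.compile(r"^>(.*)$", re.MULTILINE)

# Made-up FASTA records for illustration only.
fasta_text = """>seq1 example record
ATGGCCTAA
>seq2 another record
ATGTTTTAG
"""

# findall returns the captured group for every header line.
headers = HEADER.findall(fasta_text)

# Line-by-line variant, as one would use when reading a file.
header_lines = [ln for ln in fasta_text.splitlines() if re.search(r"^>", ln)]

# The same filter without a regex.
also_headers = [ln for ln in fasta_text.splitlines() if ln.startswith(">")]
```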
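Python’s own documentation for `os.system` says the `subprocess` module is preferable. A minimal sketch of the modern equivalent, assuming Python 3.7 or later (needed for `capture_output`); the child command simply runs the current interpreter as a stand-in for an external tool and is not anything from the book.

```python
import subprocess
import sys

# Pass the command as a list of arguments (no shell parsing involved),
# capture stdout/stderr as text, and raise CalledProcessError on a
# non-zero exit status. The child command is a hypothetical stand-in.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from a child process')"],
    capture_output=True,
    text=True,
    check=True,
)

output = result.stdout.strip()  # the child's printed text
status = result.returncode      # 0 on success
```

Unlike `os.system`, which only returns the exit status and lets the output go straight to the terminal, this gives structured access to both the output and the return code.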

And this is just a short list of things that really caught my eye when skimming through the book, after reading the first four or five chapters with some attention. Another extremely annoying aspect of the book is the line

Source: Adapted from code published by A.Via/K.Rother under the Python License

in every code listing. There are extreme examples where just a couple of lines of code carry the notice above. This suggests either that we scientists are too far removed from what happens in the world around us, or that the authors worry their groundbreaking Python code will be used elsewhere without attribution.

In conclusion, do yourself a favour and do not buy this book. But if you really want it, my used, as-new copy is for sale.
