
Saturday 26 October 2013

Your eLearning Directory of Bioinformatics Essentials!

In the previous section we discussed "What makes you a Good Bioinformatician?" and concluded that one needs to take a slice from the three major pillars of bioinformatics, i.e. biology, mathematics and computer science. One should gain expertise in any two of them and learn the fundamentals of the third. We also discussed that there is a range of options to choose from within each of the three. Here, I have hand-picked a list of online resources that can provide a PRIMER on some of these options.


Note: This page will be updated from time to time! You might want to save it for the current listings or bookmark it for future updates to this catalogue.


Biology
Mathematics
Computer Sciences
Cross Platform Courses


This catalogue is intentionally kept small, even though dozens of other courses are publicly available. The intention is NOT to trigger "decision fatigue", a very common problem that arises when the mind is subjected to too many options (as very well described by Rolf Dobelli), but to make the selection easier.

If you want to give the wider selection a try :-) visit:
http://www.coursera.org
http://ocw.mit.edu/courses
http://nptel.iitm.ac.in/
http://videolectures.net/


If you find a better alternative course for any of the given topics, please use the comment box below. I will update this list whenever possible.

Sunday 20 October 2013

What Makes You a Good Bioinformatician?

Does knowing contemporary data analysis tools and keeping a set of powerful Unix commands at your fingertips make you a good bioinformatician? Or is it stunning programming skills, the kind that turn you into a one-person tool production house, that make you a good bioinformatician?

I recently did some extensive literature mining and considered the opinions of pioneering researchers and scientists in the field of bioinformatics on this topic.

Disclaimer:
  1. If you are an established researcher looking at this from a professional angle, then this article is not for you (although we are going to discuss the foundations here). I would be happy to redirect you to "10 Steps to Success in Bioinformatics".
  2. If you are not in category 1, then you are on the right page.

While mining, I came across various raging discussions about what bioinformaticians are meant to do. Some people were adamant that experts from various fields should collaborate instead of one person doing everything poorly! It is crucial to understand that a bioinformatician differs from a mathematician/statistician, and also from a biologist, in his/her ability to establish an interface for research and to channel requirements and findings in both directions.

A disparity between a biologist, a mathematician and a bioinformatician!

One thing I observed is that the definition of bioinformatics is not universally agreed upon. Generally speaking, we define it as the development and application of information technology for understanding the problems in biology, most commonly molecular biology (but increasingly in other areas of biology as well). As such, it deals with methods for systematic storage, retrieval and analysis of biological data including but not limited to nucleic acid and protein sequences, their structures, functional roles, regulatory pathways and biophysical interactions.

Some researchers interpret bioinformatics more narrowly and include only those issues dealing with high-throughput sequencing data along with the integration and analysis of omics data. A few construe bioinformatics more broadly and include all areas of computational biology, including population modelling and molecular simulations. Others construe bioinformatics as the development of the tools and databases needed to analyse biological data and draw meaningful conclusions.

In spite of the diversity of bioinformatics domains, their foundation can be summed up in three major pillars of education:
  1. Biology
  2. Mathematics and Statistics
  3. Computer Science
It is ideal, but at the same time ambitious, to grasp all three pillars of bioinformatics. A good bioinformatician would possess a thorough knowledge of any two pillars but AT THE SAME TIME should also be aware of the fundamentals of the third.

If you have trouble deciding, then let your INTEREST pick your two pillars. Those two will shape your bioinformatics flavor, anywhere from sequence analysis to molecular dynamics simulations.

These pillars are themselves very diverse once we dive into them. For example, biology comprises molecular biology, biochemistry, evolution, ecology, behaviour etc. Similarly, in mathematics and statistics, you have a range of options right from probability theory to analytical combinatorics. On the computing side, you have options from programming languages to learning systems.

You can browse through some of these streams and learn more about them here at "Your eLearning Directory of Bioinformatics essentials".

One becomes a bioinformatics domain expert based on the slice he/she takes from the stack. You may choose molecular biology, differentials and integrals, and Matlab or an equivalent, and be a good bioinformatician in the domain of molecular dynamics and simulations. Some people might argue that this is a very naive theory and that one needs to know much more than that. Well, naturally there is no upper limit. If you let your knowledge stagnate for long, you will soon realise that you have stopped growing. Given the pace of advancement in biology and technology, it has become vital to upgrade our knowledge, skills and techniques, but at the same time it is absolutely essential to stick to your stream, because success comes from experience and experience comes from dedicated effort.

Your integrity is like the tip of an iceberg. From the outside, the tip may appear smaller than a frozen lake, but when summer comes, the tip is all that remains, since it has a strong foundation which no ordinary summer can melt.




Tuesday 15 October 2013

Extracting Specific Fasta record/s from a Multi-fasta File

While dealing with multi-fasta files, it is often necessary to extract a few fasta sequences that contain the keyword(s) of interest. One fast way to do this is with awk.

For example:

Input file: hg19_genome.fa

    >Chr1
    ATCTGCTGCTCGGGCTGCTCTAT...
    >Chr2
    GTACGTCGTAGGACATGCATCG...
    >MT1
    TACGATCGATCAGCTCAGCATC...
    >MT2
    CGCCATGGATCAGCTACATGTA...



We would like to extract the sequence for Chr2 from hg19_genome.fa. Use the following command:


$ awk 'BEGIN {RS=">"} /Chr2/ {print ">"$0}' hg19_genome.fa


Output:

    >Chr2
    GTACGTCGTAGGACATGCATCG...

Note that the search keyword (here 'Chr2') doesn't have to be an exact match. If you use 'MT' instead, you will get the third and fourth entries, since 'MT' is a substring of the third and fourth fasta records. For the same reason, /Chr2/ would also pick up a record named Chr20 if the file contained one.
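If an exact match on the sequence name is needed, one option is to compare only the first word of the header line instead of searching the whole record. The following is a minimal sketch under stated assumptions, not part of the original command: the file demo.fa, its toy sequences, the function name extract_exact and the variable name id are all made up for illustration.

```shell
# Build a tiny demo file (hypothetical toy data, not real hg19 sequence).
cat > demo.fa <<'EOF'
>Chr2
GTACGTCG
>Chr20 toy record
TTTTAAAA
>MT1
TACGATCG
EOF

# extract_exact ID FILE  (hypothetical helper)
# FS="\n" makes $1 the header line of each record.
# ORS=""  avoids an extra blank line, since each record already
#         carries its own trailing newline.
extract_exact() {
    awk -v id="$1" 'BEGIN {RS=">"; FS="\n"; ORS=""}
         { split($1, h, " ") }        # h[1] = first word of the header
         h[1] == id { print ">" $0 }' "$2"
}

extract_exact Chr2 demo.fa
```

With this toy file, only the Chr2 record should be printed; the Chr20 record is skipped because its first header word is not identical to the requested name.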

Now let's break down the command so that we don't have to memorize it and can adapt it for use in a variety of other places.


  • awk -- This is the main command (or rather, a very powerful programming language in its own right).
  • '' -- We write every bit of awk code inside these single quotes.
  • BEGIN -- This tells awk to execute the code in the curly brackets that immediately follow once, before any input is read.
  • {RS=">"} -- Record separator. If we look at the file, we can see that every sequence starts with a ">" sign, so this splits the input into one record per fasta entry.
  • /Chr2/ -- Keyword to search for in the entire record (header and sequence).
  • {print ">"$0} -- Here $0 is the current record (from "Chr2" through the entire sequence, up to the next ">"). We add ">" at the beginning to restore the standard identifier prefix, which is not included in $0.
  • hg19_genome.fa -- This is the input multi-fasta file that we have used.
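To see how awk splits a multi-fasta file with this record separator, and how the same skeleton can be molded for another task, here is a small sketch that lists each record's header together with its sequence length. The file demo.fa, its sequences and the function name fasta_lengths are invented for illustration, and setting FS to a newline is an addition that the original command does not use.

```shell
# Tiny demo file (hypothetical toy data); the first sequence is wrapped
# over two lines, as is common in real fasta files.
cat > demo.fa <<'EOF'
>Chr1
ATCTGCTG
CTCG
>Chr2
GTACGT
EOF

# fasta_lengths FILE  (hypothetical helper)
# RS=">"  : one record per fasta entry; whatever precedes the first ">"
#           is an empty record, which the $1 != "" test skips.
# FS="\n" : field 1 is the header line, fields 2..NF are sequence lines.
fasta_lengths() {
    awk 'BEGIN {RS=">"; FS="\n"}
         $1 != "" {
             seq = ""
             for (i = 2; i <= NF; i++) seq = seq $i   # join wrapped lines
             print $1 "\t" length(seq)
         }' "$1"
}

fasta_lengths demo.fa
```

For the toy file this should report a length of 12 for Chr1 and 6 for Chr2, separated from the name by a tab.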

Now, suppose we are interested in more than one keyword. Two possibilities arise:


You want BOTH keywords present,
awk 'BEGIN {RS=">"} /Chr2/ && /MT/ {print ">"$0}' hg19_genome.fa

You want EITHER of the keywords present,
awk 'BEGIN {RS=">"} /Chr2|MT/ {print ">"$0}' hg19_genome.fa
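Both commands search the whole record, so a keyword that happens to occur in the sequence text would also count as a hit. A possible variation, sketched here under stated assumptions, is to test only the header line and to pass the keywords in as variables. The file demo.fa, its toy records and the function names records_with_both and records_with_either are hypothetical; index() does a plain substring search rather than a regular expression match.

```shell
# Tiny demo file (hypothetical toy data).
cat > demo.fa <<'EOF'
>Chr2 nuclear
GTACGTCG
>MT1 mitochondrial
TACGATCG
>Chr2_MT_like toy
CGCCATGG
>Chr1 nuclear
ATGATG
EOF

# Print records whose HEADER line contains both keywords.
records_with_both() {
    awk -v a="$1" -v b="$2" 'BEGIN {RS=">"; FS="\n"; ORS=""}
         index($1, a) && index($1, b) { print ">" $0 }' "$3"
}

# Print records whose HEADER line contains at least one keyword.
records_with_either() {
    awk -v a="$1" -v b="$2" 'BEGIN {RS=">"; FS="\n"; ORS=""}
         index($1, a) || index($1, b) { print ">" $0 }' "$3"
}

records_with_both Chr2 MT demo.fa
records_with_either Chr2 MT demo.fa
```

On the toy file, the first call should return only the Chr2_MT_like record, while the second should return every record except Chr1.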



Note: If you are using Windows, you can download and install 'gawk' and use a similar command. Since the awk code sits inside single quotes, the | (pipe) should reach awk as-is in zsh as well as bash, without an escape character.


I am sure many of you have your own ways of doing the same thing. If you think yours is worth sharing, then the comment box is all yours.

Happy Coding !!