Who this is for
Anybody who is brand new to bioinformatics and wants to automate the tedious things you do over and over, or anybody who is simply new to Python.
Why should you care about Python if you are a biologist?
Since the start of Next Generation Sequencing, biology has entered the realm of big data science. That is, huge files of data are being generated to study biology. Think of the problem like this: you have huge files filled with lines of text that need to be processed into something useful. You can either open each file manually and sift through it looking for things (and probably make mistakes), or you can spend some time up front and write a small Python script to do it for you.
Spend X amount of time now to save X*Y amount of time later.
If you have Windows, then go download Python and install it.
If you have Linux or Mac then you already have Python!
I know this will probably upset some, but it is probably best not to use Windows for programming. (This is not true in all cases, but for bioinformatics, using Linux or Mac has many advantages.)
For this post, though, I will just focus on the interpreter to get you used to simple Python expressions.
Quick familiarization with the Interpreter
How do you open the interpreter?
Linux or Mac: open a terminal, type python, and press Enter
Windows: Start -> Programs -> Python -> Python (command line)
What the heck is an interpreter?
Think of it as a translator. It translates the text you type into the 1’s and 0’s that the computer can understand, much like a translator of spoken languages.
Your first Python Instruction
Open the Python interpreter.
For the rest of this post, any time you see >>> assume it is a Python interpreter command.
So let’s just do a simple command
>>> print "Hello World"
This should then print Hello World to the screen just below that command.
You are now all set up to start exploring.
The super quick tutorial. a.k.a The basics
Types and Variables
Variables are simply something that holds a value. Types are simply what type of value is being held. Thus, every variable has a type.
>>> a = 5 # a is the name of the variable and is of type integer
>>> b = 5.0 # b is of the type float
>>> c = "5" # c is of the type string(Text)
>>> d = True # d is of the type boolean and has the value True
>>> def testfunction():
>>>     print "Hello"
>>> e = testfunction # e is of type function and has the value of the testfunction reference.
Don’t worry about that last bit too much; I’m just trying to show you that, no matter what, every variable has a type and a value.
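If you want to check this for yourself, the built-in type() function will tell you what type a value has. Here is a small sketch using the same variables as above (print is written with parentheses so it also runs on Python 3):

```python
# Ask Python what type each value has.
a = 5
b = 5.0
c = "5"
d = True

print(type(a).__name__)  # int
print(type(b).__name__)  # float
print(type(c).__name__)  # str
print(type(d).__name__)  # bool

# Types matter: the string "5" is not the number 5 until you convert it.
print(a == c)            # False
print(a == int(c))       # True
```

This is why text read from a file usually has to be converted with int() or float() before you can do math on it.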
I leave it to you to read up on all the built in types and what type of data they hold. You have to get a good feeling for how variables and types work before you can really progress in any programming language. Just know that there are only a few basic types in any language.
- Integer – Holds non decimal numbers
- Float – (Also called Double in some languages) Holds a decimal number
- Boolean – True or False only!
- Character – A single text letter (Python has no separate character type; a single character is just a string of length one)
- String – Composed of a list of characters
All the built in Python Types
Use a module
>>> import <module>
How you use a module varies from module to module, and figuring it out can be frustrating. All I can say is try to read the documentation for that module.
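To make that a little more concrete, here is a small sketch using two modules that ship with Python, math and os.path. The file path is made up purely for illustration.

```python
import math                     # import the whole module; use names as math.<name>
from os.path import basename    # or import just the one name you need

print(math.sqrt(16))            # 4.0

# Hypothetical path, only here to show what basename does.
print(basename("/data/run1/myfastafile.fasta"))  # myfastafile.fasta
```

The two import styles do the same job; the first keeps it obvious where each name came from, which helps when you are still learning a module.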
Write a function
>>> def functionname( parameter1, parameter2 ):
>>>     python code that belongs to the function
>>>     more code that is indented belongs to the function
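As a concrete version of that template, here is a rough sketch of a function a biologist might actually want. The name gc_content and its behaviour are my own invention for illustration, not something from a library.

```python
def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence (0.0 to 1.0)."""
    sequence = sequence.upper()          # treat "atgc" the same as "ATGC"
    if len(sequence) == 0:
        return 0.0                       # avoid dividing by zero on empty input
    gc = sequence.count("G") + sequence.count("C")
    return float(gc) / len(sequence)     # float() keeps Python 2 from rounding down

print(gc_content("ATGC"))  # 0.5
```

Notice that the function returns its answer instead of printing it, so you can store the result in a variable and use it later.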
If Statement(Run some code only when a condition is True)
>>> x = 5 > 3
>>> if x:
>>>     print "5 is greater than 3"
>>> else:
>>>     print "You won't get here unless x is False"
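You can also check several conditions in order by adding elif ("else if") between the if and the else. Below is a small sketch; the function name describe_line is made up for this example.

```python
def describe_line(line):
    """Classify one line of a FASTA file."""
    if line.startswith(">"):
        return "identifier"       # first test that matches wins
    elif line.strip() == "":
        return "blank"            # only checked if the first test failed
    else:
        return "sequence"         # reached when nothing above matched

print(describe_line(">seq1"))     # identifier
print(describe_line("ATGC"))      # sequence
```

Python works through the tests from top to bottom and runs only the first branch that matches.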
While Loop (think of it as an if statement that happens over and over). Use a while loop if you are unsure how many times you will want to loop. The value of x in the example below could easily have come from user input.
>>> x = 0
>>> while x < 10:
>>>     print x
>>>     x = x + 1
For Loop (think of it as a fixed number of loops)
>>> for x in range( 0, 10 ):
>>>     print x
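A for loop is not limited to range(). It can walk over anything that holds a series of items, such as the letters of a string. Here is a quick sketch that counts the bases in a made-up sequence:

```python
sequence = "ATGCGATA"    # made-up sequence for illustration
counts = {}              # dictionary mapping each base to how often it was seen

for base in sequence:    # base takes the value of each letter in turn
    counts[base] = counts.get(base, 0) + 1

print(counts["A"])       # 3
print(counts["G"])       # 2
```

The loop runs exactly once per letter, so you never have to work out the length of the sequence yourself.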
The Example. Print all fasta identifier lines
>>> fh = open( 'myfastafile.fasta' )
>>> for lineinfile in fh:
>>>     if lineinfile.startswith( '>' ):
>>>         print lineinfile
>>> fh.close()
I have no idea where each of those commands came from or how to understand them
Good! You are not an android.
This is a perfect exercise for you to learn how to look stuff up.
open command – http://docs.python.org/2/library/functions.html#open
file handle(fh) – http://docs.python.org/2/library/stdtypes.html#bltin-file-objects (Look under file.next())
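You can also look things up without leaving Python. Most built-in functions and methods carry their own short documentation, and dir() lists what an object can do. A small sketch (in the interactive interpreter, help(open) shows the same kind of information):

```python
# Print the built-in documentation for the string method used in the example.
print(str.startswith.__doc__)

# List the methods available on any string, skipping the internal "_" names.
string_methods = [name for name in dir(str) if not name.startswith("_")]
print(string_methods)
```

Skimming that list is a good way to stumble onto useful methods such as rstrip, split, and upper.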
Now that you have an idea of our example, maybe the thought went through your head that we would likely do that operation again, but on another file. Let’s apply the DRY principle (Don’t Repeat Yourself) and convert it to a function.
>>> def printFastaIdentifiers( fastafile ):
>>>     fh = open( fastafile )
>>>     for lineinfile in fh:
>>>         if lineinfile.startswith( '>' ):
>>>             print lineinfile
>>>     fh.close()
That was easy, right? We just generalized that piece of code. To use the function, simply call it like this
>>> printFastaIdentifiers( 'myfastafile.fasta' )
Let’s make it a module so we can import it later on and use it.
Save the function (don’t put the >>> at the beginning of the lines!) in a file called fastautils.py
Now, while in the same directory as your fastautils.py, open the Python interpreter.
>>> from fastautils import printFastaIdentifiers
>>> printFastaIdentifiers( 'myfastafile.fasta' )
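For reference, here is roughly what the saved fastautils.py could look like. Treat it as a sketch with two small additions of my own: rstrip() so the identifiers are not printed with a blank line between them, and an optional block at the bottom that lets you run the file directly from a terminal. print is written with parentheses so the file also works on Python 3.

```python
# fastautils.py
import sys


def printFastaIdentifiers(fastafile):
    """Print every identifier line (lines starting with '>') in a FASTA file."""
    fh = open(fastafile)
    for lineinfile in fh:
        if lineinfile.startswith('>'):
            # rstrip() removes the newline that is already at the end of the line
            print(lineinfile.rstrip())
    fh.close()


# This part runs only when the file is executed directly, for example
#     python fastautils.py myfastafile.fasta
# and is skipped when the module is imported from the interpreter.
if __name__ == '__main__':
    for fastafile in sys.argv[1:]:
        printFastaIdentifiers(fastafile)
```

With that last block in place, the same file works both as a module you import and as a small command line tool.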
Now you are really rolling in the right direction. I leave the rest for you to explore.
A few tips
- Think of Python as an instructional language. Well, all programming languages are basically that, but let’s focus on Python.
- Each line you put in is a single instruction. Put a bunch of instructions together to make a script (module). Put a bunch of modules together and build an application.
- Any time you find yourself repeating code, even if the code is not 100% the same but close, think of the DRY principle.
- Another good principle to follow is Do One Thing and Do It Well. That is, don’t overcomplicate your scripts.
- Make sure what you are doing isn’t already done. Check here and here for modules that may be doing what you are trying to do.
- Functions, functions, functions. Write a function for every piece of the script that does a single task. This is part of the DRY principle.
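Here is one way the DRY tip might play out in practice. Say you have one loop that pulls out lines starting with > and a nearly identical loop for lines starting with @. Instead of keeping both, write a single function and pass in the part that differs. The function name and the sample lines below are made up for illustration.

```python
def lines_starting_with(lines, prefix):
    """Return the lines that begin with prefix, with trailing whitespace removed."""
    matches = []
    for line in lines:
        if line.startswith(prefix):
            matches.append(line.rstrip())
    return matches


# Made-up sample data standing in for lines read from real files.
fasta_lines = [">seq1\n", "ATGC\n", ">seq2\n", "GGCC\n"]
fastq_lines = ["@read1\n", "ATGC\n", "+\n", "IIII\n"]

print(lines_starting_with(fasta_lines, ">"))  # ['>seq1', '>seq2']
print(lines_starting_with(fastq_lines, "@"))  # ['@read1']
```

Keep in mind this is only a simplification for FASTQ, since quality lines can also begin with @, so a real FASTQ parser needs more care than a prefix check.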
You may feel that this post really isn’t that helpful. There is still so much you may not understand.
Python is pointless unless you explore with it. The best way to start learning is to just struggle through until it starts to click with you. Every time you find yourself repeating some manual operation, think about exactly what you are doing and what steps are needed. Then try to do it using some programmatic method. In the beginning it will take you much, much longer to write a script to automate a simple task, but in the beginning it is not about saving time, it is about investing in your future. Spend the time now and trust me, the payoff will be huge.
Frustrated to no end? Leave a comment and I will respond.