Sorry for the delay between posts. I really intend to post every week, but it's summer and I have a lot of important drinking while sunbathing by the pool to do!
But don't worry, this post is totally worth the wait!
OK, on to the Python and bioinformatics. This post builds on my last post, so I would recommend checking it out here before reading this one.
In Python, True and False (not strings; no quotes when you use them in code) are Boolean (or bool) values. Booleans are a third data type. In my last post I introduced strings and numbers as data types. Booleans can only be True or False.
You can store Booleans in variables.
Boolean values are really useful in conditions and loops (if and while statements—coming up!).
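Here is a quick sketch of storing Booleans in variables. The sequence and the variable names (seq, is_long, has_ambiguity) are made up for illustration, and I've written print with parentheses so the example runs in both Python 2 and Python 3.

```python
# Hypothetical example sequence, made up for illustration
seq = "ATGCN"

# Comparisons and the "in" test both produce Boolean values
is_long = len(seq) > 10        # False: the sequence has only 5 characters
has_ambiguity = "N" in seq     # True: there is an N in the sequence

print(is_long)
print(has_ambiguity)
print(type(has_ambiguity) is bool)
```

Both variables hold a bool, so they can be dropped straight into an if statement later on.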
Boolean Operations (also not strings- don’t use quotes)
-“or” checks if at least one of the arguments is true. “or” will only evaluate the second argument if the first argument is False (short-circuit operator)
>>>X or Y
-“and” checks to see if both statements are true. “and” will only evaluate the second argument if the first argument is True (also a short-circuit operator)
>>>X and Y
-“not” gives the opposite of the statement
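To see the short-circuit behavior for yourself, here is a small sketch. The noisy() helper is something I made up for this example: it records that it was called and then hands back the value it was given.

```python
def noisy(value, log):
    """Hypothetical helper: record that it was called, then return value."""
    log.append(value)
    return value

calls = []

# "or" stops as soon as the first argument is True
result_or = True or noisy(False, calls)
# "and" stops as soon as the first argument is False
result_and = False and noisy(True, calls)
skipped = list(calls)      # still empty: noisy() was never called

# Here the first argument does not settle the answer, so noisy() runs
result_or_2 = False or noisy(True, calls)

result_not = not True

print(result_or)      # True
print(result_and)     # False
print(skipped)        # []
print(result_or_2)    # True
print(calls)          # [True]
print(result_not)     # False
```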
Booleans also have an order of operations (like math): “not” is evaluated first, followed by “and”, and lastly “or”.
You can play all kinds of “fun” logic games with Boolean values and their operations.
>>>True and not True or False
“not” is evaluated first, not True becomes False
“and” is evaluated second, True and False becomes False
“or” is evaluated third, False or False remains False
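If you want to check the order of operations yourself, parentheses make the grouping explicit. This is just a sketch; the variable names are mine.

```python
# The expression from the walkthrough
result = True and not True or False

# The same expression with the implied grouping spelled out
grouped = (True and (not True)) or False

# Parentheses can change the answer
default_order = not True or True      # (not True) or True
forced_order = not (True or True)     # "or" is forced to go first

print(result)          # False
print(grouped)         # False
print(default_order)   # True
print(forced_order)    # False
```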
For more examples, check out the screenshot below.
Conditions are if/else statements; they allow you to write a script that chooses between two or more actions. Conditionals span multiple lines, so they are easier to write in a script editor such as IDLE or TextWrangler than in the basic Python shell.
This also brings up the importance of whitespace in Python. Whitespace is the term for the spaces and tabs between words, and Python uses it (specifically the indentation at the start of a line) to structure code, which is not true of all programming languages.
When the whitespace is off (for example, the line after an if statement is not indented), you will get an error like this one:
IndentationError: expected an indented block
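You can trigger this error on purpose to see what causes it. In the sketch below I store two tiny scripts as strings and ask Python to compile them. The check_indentation() helper is hypothetical (I made it up for this example), and the exact wording of the error message varies a little between Python versions.

```python
# A tiny script with the print line NOT indented under the if statement
bad_script = "a = 5\nif a == 5:\nprint('a equals five')\n"

# The same script with the print line indented
good_script = "a = 5\nif a == 5:\n    print('a equals five')\n"

def check_indentation(source):
    """Hypothetical helper: return True if the script compiles cleanly."""
    try:
        compile(source, "<example>", "exec")
        return True
    except IndentationError:
        return False

print(check_indentation(bad_script))    # False
print(check_indentation(good_script))   # True
```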
In Python, a conditional statement starts with the if statement, which ends with a colon (e.g. if a==5:). The next line, indented one level, says what should happen if the condition is true (e.g. print "a equals five"). On the next line, without indenting (so that it lines up with the if statement), is the else statement followed by a colon (else:). The line after that, again indented one level, says what should happen if the condition is not true (e.g. print "a does not equal five").
You can add other conditions between the if and the else statement. These are elif statements. So in the above example you could add an elif statement (e.g. elif a==6:). The elif statement should line up with the if and else statements, and the next line should be indented one level (like the lines under if and else) and say what happens if the elif condition is true (e.g. print "a equals six").
if fav_food=='candy':
    print "I love candy!"
elif fav_food=='fruit':
    print "I love fruit!"
elif fav_food=='vegetables':
    print "I'm a weirdo!"
else:
    print "I live on sunshine and air!"
Remember: one equals sign is used to assign a value to a variable, while two equals signs are used to test for equality.
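Here is a runnable sketch of the fav_food conditional that uses both kinds of equals sign. I wrapped it in a function called food_reaction(), which is a name I made up so the same conditional can be tried with different foods. It returns the message instead of printing it directly.

```python
def food_reaction(fav_food):
    """Hypothetical helper wrapping the fav_food conditional."""
    if fav_food == 'candy':          # two equals signs: comparison
        return "I love candy!"
    elif fav_food == 'fruit':
        return "I love fruit!"
    elif fav_food == 'vegetables':
        return "I'm a weirdo!"
    else:
        return "I live on sunshine and air!"

fav_food = 'fruit'                   # one equals sign: assignment
print(food_reaction(fav_food))       # I love fruit!
print(food_reaction('pizza'))        # I live on sunshine and air!
```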
***Just learned this in the process of writing/editing this post (shout out to the awesome Cassie Ettinger, @cassetron on Twitter)
You can also write conditionals with multiple if statements
if x%2==0:
    print "x is even"
if x%3==0:
    print "x is divisible by 3"
if x%4==0:
    print "x is divisible by 4"
else:
    print "x is not divisible by 4"
The second if statement will be evaluated whether or not the first if statement is true. If it was replaced with an elif statement it would only be evaluated if the first statement was false. Likewise the final else statement is only evaluated if the statement above it is false.
In the above example, if x=18 the code would print x is even, x is divisible by 3, and x is not divisible by 4 (the else belongs to the last if, and 18 is not divisible by 4). In the example below (with elif statements), if x=18 the code would only print x is even. It would not evaluate the statements that follow. This distinction isn’t super important (I’ve successfully written multiple scripts without knowing it), so don’t worry if it is a little confusing.
Conditional with elif statements
if x%2==0:
    print "x is even"
elif x%3==0:
    print "x is divisible by 3"
elif x%4==0:
    print "x is divisible by 4"
else:
    print "x is not divisible by 4"
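To compare the two versions side by side, here is a sketch that puts each one in a function and collects the messages in a list instead of printing them. The function names separate_ifs() and elif_chain() are mine.

```python
def separate_ifs(x):
    """Version with several independent if statements."""
    messages = []
    if x % 2 == 0:
        messages.append("x is even")
    if x % 3 == 0:
        messages.append("x is divisible by 3")
    if x % 4 == 0:
        messages.append("x is divisible by 4")
    else:
        messages.append("x is not divisible by 4")
    return messages

def elif_chain(x):
    """Version with one if/elif/else chain: only one branch ever runs."""
    messages = []
    if x % 2 == 0:
        messages.append("x is even")
    elif x % 3 == 0:
        messages.append("x is divisible by 3")
    elif x % 4 == 0:
        messages.append("x is divisible by 4")
    else:
        messages.append("x is not divisible by 4")
    return messages

print(separate_ifs(18))  # ['x is even', 'x is divisible by 3', 'x is not divisible by 4']
print(elif_chain(18))    # ['x is even']
print(elif_chain(9))     # ['x is divisible by 3']
```

Notice that in the elif version the "divisible by 4" branch can never run, because any number divisible by 4 is also even and gets caught by the first test.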
For more info on if/else statements you can check out the python documentation
or the wiki
or this other cool site
Solution to the first bioinformatics problem!
The following problem was posted at the end of the last post (here)
Write a program to count the number of each base (A, T, C, G) and the number of ambiguities (N) in the given nucleotide sequence.
Here is my solution. Remember, there are lots of different ways to get to the correct answer in programming; if your syntax doesn’t match mine, it isn’t necessarily wrong.
print 'A ' + str(nt.count('A'))
print 'T ' + str(nt.count('T'))
print 'C ' + str(nt.count('C'))
print 'G ' + str(nt.count('G'))
print 'N ' + str(nt.count('N'))
To count the number of bases and ambiguities in the sequence, I transformed the given sequence into a string by putting it between quotes and setting it equal to nt.
“print nt.count('A')” alone would have returned the number of As in the sequence, but printing the 'A ' label documents which base I was counting. However, 'A ' is a string while nt.count('A') is an integer, and a string and an integer can’t be concatenated, so I used the str() function to change the integer to a string.
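Here is a small sketch of what goes wrong without str(). The short sequence is made up for illustration, and I've written print with parentheses so it runs in both Python 2 and Python 3.

```python
# Hypothetical example sequence, made up for illustration
nt = "AATGC"

count = nt.count('A')          # an integer
label = 'A ' + str(count)      # str() turns the integer into a string first
print(label)                   # A 2

# Without str(), Python refuses to glue a string and an integer together
try:
    broken = 'A ' + count
    failed = False
except TypeError:
    failed = True

print(failed)                  # True
```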
David States (@statesdj) pointed out something I overlooked in my code. “Real” sequence data has multiple codes for different types of ambiguities (R, Y, S, etc.; for all of them check out the IUPAC nucleotide ambiguity code page). To check for these, you can compare the sum of the base counts to the sequence length using the len() function.
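A minimal sketch of that check is below. The sequence is one I made up for illustration (it is not the sequence from the problem), and the variable names are mine.

```python
# Hypothetical example sequence containing a few extra ambiguity codes
nt = "ATGCRYNNATSG"

# Add up the four standard bases, and count the Ns separately
standard = nt.count('A') + nt.count('T') + nt.count('C') + nt.count('G')
n_count = nt.count('N')

# Anything left over must be some other ambiguity code (R, Y, S, ...)
other = len(nt) - standard - n_count

print('Length ' + str(len(nt)))             # Length 12
print('Standard bases ' + str(standard))    # Standard bases 7
print('N ' + str(n_count))                  # N 2
print('Other ambiguities ' + str(other))    # Other ambiguities 3
```

If other comes out as 0, every character in the sequence was an A, T, C, G or N.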
New Bioinformatics problem!
Create a sequence that is the complement of the given sequence
Use the sequence in this file