Session 4.1

=**Functions and Modules**=

**Topics:**

 * Function definitions
 * Function documentation
 * Functions within functions
 * Modules
 * Modules of Interest: sys, math, collections
 * Making your own libraries

**Introduction**
This afternoon we'll concentrate on our last fundamental programming concept for the course. To date, we've been writing all of our program logic in the main body of our scripts. And we've seen how built-in python **functions** like **raw_input** are used to operate on variables and their values. In this session, we'll learn how to write **functions** of our own, how to properly document them for ourselves and other users, and how to collect them into **modules**, and make our own local repositories, or **libraries**.

If you properly leverage a well-designed function, writing the main logic of your programs becomes almost-too-easy. Instead of writing out meticulous logical statements and loops for every task, you just call forth your previously-crafted logic, which you've vested in well-made **functions**.

**Functions**
Functions are the basic means to manage complexity in your programs, allowing you to avoid nesting and repeating large chunks of code that could otherwise make your tasks unmanageable. They allow you to bundle code with a defined input and output into single lines, and you should use them frequently from now on.

We will start with the syntax:

code format="python"
#!/usr/bin/env python

# define the function
def hello(name):
    greeting = "Hello %s!" % (name)
    return greeting

# use the function
functionInput = 'Zaphod Beeblebrox'
functionOutput = hello(functionInput)
print functionOutput
code

To define a function, you use the keyword **def**. Then comes the function name, in this case **hello**, with parentheses containing any input **arguments** the function might need. In this case, we need a name to form a proper greeting, so we're giving the **hello** function an **argument** called **name**. After that, the function does its thing, executing the indented block of code immediately below. In this case, it creates a greeting such as "Hello Zaphod Beeblebrox!". The last thing that it does is **return** that greeting to the rest of the program.

Technically speaking, a function does not //need// to explicitly return something, although it's uncommon that you'll write any that don't. If you don't **return** something explicitly, Python will nevertheless return the special object **None**. None is logically false (for if statements), and **print**ing None displays the word None (None is not the empty string, although at the interactive prompt an expression that evaluates to None displays nothing). It's easy to forget to **return** a value, so this is an easy first thing to check in case your functions don't work as expected.
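As a minimal sketch of the forgotten-**return** problem described above: the function names add_broken and add_fixed are made up for illustration, and print is written with parentheses so the snippet runs under both Python 2 and Python 3.

```python
def add_broken(a, b):
    total = a + b          # computes the sum, but never returns it

def add_fixed(a, b):
    total = a + b
    return total           # hands the sum back to the caller

broken_result = add_broken(2, 3)
fixed_result = add_fixed(2, 3)

print(broken_result)       # prints None
print(fixed_result)        # prints 5

# Comparing against None is a quick way to spot the missing return
if broken_result is None:
    print('add_broken forgot to return a value')
```

If a variable unexpectedly holds None after a function call, the function body is the first place to look.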

Note that the variable names are different on the inside and the outside of the function: I give it **functionInput**, although it takes **name**, and it returns **greeting**, although that return value is fed into **functionOutput**. I did this on purpose, as I want to emphasize that the function only knows to expect something, which it internally refers to as **name**, and then to give something else back. In fact, there is some insulation against the outside world, as you can see in this example:

code format="python"
#!/usr/bin/env python

def hello(name):
    greeting = "Hello %s!" % (name)
    testVariable = """The hotel room is a mess, there's a chicken hangin'
                  out, somebody's baby is in the closet, there's a
                  tiger in the bathroom that Mike Tyson wants back, Stu
                  lost a tooth and eloped, and Doug is missing."""
    print 'Inside of the function:', testVariable
    return greeting

testVariable = "What happens in Vegas stays in Vegas."
grt = hello("Stu Price")
print 'Outside of the function:', testVariable
code

Even though the epic story of a bachelor party gone horrifically awry was assigned to a variable called **testVariable** inside the function, nothing happened to that variable outside the function. Variables created inside a function occupy their own **namespace** in memory distinct from variables outside of the function, and so reusing names between the two can be done without you having to keep track of it. (Refer to this article about **namespace** for more information.) That means you can use functions written by other people without having to keep track of what variables those functions are using internally. Just like a sleazy town in Nevada, what happens in the function stays in the function. (An important exception lies with lists and dictionaries, which you will examine in the exercises.)

Let's have another example, returning to a more pressing subject:

code format="python"
#!/usr/bin/env python

def whichFood(balance):
    if balance < 10:
        return 'ramen'
    elif balance < 100:
        return 'good ramen'
    elif balance < 200:
        return 'better ramen'
    else:
        return 'ramen that is truly profound in its goodness'

print whichFood(14)
code

Here we've made a slightly more complicated function-- it contains some control statements, and there is more than one way for it to **return**. We also never explicitly create an input variable (as we did with **functionInput** in the first example), and we don't store the output to a variable either (as we did with **functionOutput**).

Finally, we've shown examples with one input variable and one return value, but functions can accept zero, one, or many input variables. Likewise, a function doesn't need to return anything to the program, and it can also return several values at once. Functions can even have other functions nested inside them!
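Nesting is the least familiar of these, so here is a small sketch of a function defined inside another function. The names gc_report and fraction are invented for this example, and print is written with parentheses so it runs under both Python 2 and Python 3.

```python
def gc_report(seq):
    # fraction() is defined inside gc_report, so it only exists while
    # gc_report is running and cannot be called from the main program
    def fraction(count, total):
        if total == 0:
            return 0.0
        return float(count) / total

    seq = seq.upper()
    gc = seq.count('G') + seq.count('C')
    return fraction(gc, len(seq))

print(gc_report('ATGGTGGG'))   # 5 of the 8 bases are G or C
```

Keeping the helper inside is a design choice: it signals that fraction is a private detail of gc_report rather than something the rest of the program should rely on.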

Here are a few more examples of the syntax used with functions:

code format="python"
#!/usr/bin/env python

# functions can do their thing without taking input or returning output

def useless():
    print 'What was the point of that?'
    print

useless()

def countToTen():
    for i in range(1, 11):    # range(1, 11) gives 1 through 10
        print i

countToTen()
code

code format="python"
#!/usr/bin/env python

# functions can also take multiple items in and return multiple items out

def doLaundry(amtDetergent, dirtyClothes):
    cleanClothes = []
    for load in dirtyClothes:
        amtDetergent -= 1
        cleanClothes.append(load)
    return (amtDetergent, cleanClothes)

amtTide = 5
dirtyLaundry = ['socks','shirts','pants']
(amtTide, cleanLaundry) = doLaundry(amtTide, dirtyLaundry)
print "Amount of Tide left:", amtTide
print cleanLaundry
code

Above, in **doLaundry**, I returned a **tuple** of the two variables enclosed in parentheses. You could also return a **list**, which works much the same way. (Some more information on the distinctions between tuples and lists can be found at this link.) You could return other objects as well, like **dictionaries**.

code format="python"
#!/usr/bin/env python

def returnStuff():
    a = '>Gene1'
    b = 'ATGGTGGG'
    return [a,b]    # returns the output as a list

print type(returnStuff())

# We can index the output the same as any list
print returnStuff()[0]
print returnStuff()[1]

(name, seq) = returnStuff() # stores output to the variables name & seq,
                            # so you can access name and seq directly
print name
print seq

both = returnStuff()    # stores the output to the variable both
                        # which will be a list
print both

dictOfStuff = {}
dictOfStuff[returnStuff()[0][1:]] = returnStuff()[1]
print dictOfStuff
code
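Returning a dictionary, as mentioned above, might look like the following sketch. The function name describe_gene and the keys 'name' and 'seq' are assumptions made for this example; print uses parentheses so the snippet runs under both Python 2 and Python 3.

```python
def describe_gene():
    # The same made-up record as above, returned as a dictionary so the
    # caller can look values up by name instead of by position
    return {'name': 'Gene1', 'seq': 'ATGGTGGG'}

gene = describe_gene()
print(gene['name'])
print(len(gene['seq']))
```

A dictionary is handy when a function returns several values and you'd rather not remember their order.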

So how do functions make our lives easier? We can exploit functions to break difficult tasks into a number of easier tasks, and then these easier tasks into ones easier still, and so on. Even the main body of a large program, once it is built from a few function calls, may be only tens of lines long, and many functions are only a handful of lines. This allows us to program in large, structural sweeps, rather than getting lost in the details. This makes programs both easier to write and easier to read:

code format="python"
def publishAPaper(authors,topic,journal):
    data = doWork(topic)
    figures = analyze(data)
    paper = writePaper(data,figures)
    submit(authors,paper,journal)
code

And, a big part of that ease comes with the use of:

**Modules**
In all of the examples above, we defined our functions right above the code that we hoped to execute. If you have many functions, you can see how this would get messy in a hurry. Furthermore, part of the benefit of functions is that you can call them multiple times within a program to execute the same operations without tiresomely writing them all out again. But wouldn't it be nice to share functions across programs, too? For example, working with genomic data means lots of time getting sequence out of FASTA files, and shuttling that sequence from program to program. Many of the programs we work with overlap to a significant degree, as they need to parse FASTA files, calculate evolutionary rates, and interface with our lab servers, for example -- all of which means that many of them share functions. And if the same function exists in two or more different programs, we hit the same problems that we hit before: complex debugging, decreased readability, and, of course, too much typing.


**Modules** solve these problems. In short, they're collections of functions and variables (and often objects, which we'll get to towards the end of the course) that are kept together in a single file that can be read and imported by any number of programs.

**Using a module: the basics**
To illustrate the basics, we'll go through the use of two modules, **sys** and **math**, one of which we use almost all the time. In fact, it's a very rare program indeed that doesn't use the **sys** module. **sys** contains a lot of really esoteric functions, but it also contains a simple, everyday thing -- what you typed on the command line. To illustrate, if we were to create a new program called //testprogram.py// and type the following commands into Terminal:


**$ ./testprogram.py argument1 argument2 argument3**

then the **sys** module would contain a list of strings called **argv** composed of the following: ['./testprogram.py', 'argument1', 'argument2', 'argument3']. We can access the list **argv** from our program by importing the module **sys**.

code format="python"
#!/usr/bin/env python

import sys      # gaining access to the module

# you can access variables stored in the module by using a dot
# to get at the variable 'argv' which is stored in 'sys', type:
commandLine = sys.argv

print commandLine
code



Conveniently, we can access functions stored inside modules. To demonstrate this, I'll use the module **math**.

code format="python"
#!/usr/bin/env python

import sys
import math

# sys.argv contains only strings, even if you type integers.
# And, remember, the first element is the command itself-- usually
# not very useful.

x = float(sys.argv[1])  # argv stores the command line arguments as
                        # strings, but python isn't especially clever,
                        # so we can't do math with strings
logX = math.log(x)

print logX
code
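A slightly more defensive version of the program above might check its arguments before doing any math. This is only a sketch: the function name log_of_first_argument is invented here, it relies on try/except (which may not have come up in the course yet), and print is written with parentheses so it runs under both Python 2 and Python 3.

```python
import math
import sys

def log_of_first_argument(argv):
    # argv is expected to look like sys.argv: the program name first,
    # followed by the arguments, all of them strings
    if len(argv) < 2:
        return None                 # no number was supplied
    try:
        x = float(argv[1])
    except ValueError:
        return None                 # the argument was not a number
    if x <= 0:
        return None                 # math.log is undefined here
    return math.log(x)

print(log_of_first_argument(['./testprogram.py', '1']))
print(log_of_first_argument(['./testprogram.py', 'banana']))

# In a real script you would pass in the actual command line:
# print(log_of_first_argument(sys.argv))
```

Taking the argument list as a function input, rather than reading sys.argv directly, makes the logic easy to try out with made-up lists like the ones above.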

There's actually a really great module that lets you call your program really easily from the command line, without having to manually parse out what each of the arguments does. I'll show you how to use that next week.

Great! Not so hard.

**Modules have more than just functions: the collections module**
We already knew this: sys.argv is a list. Another thing that modules often contain is datatypes. Just as Python has some built-in datatypes (like int, list, str, and dict), it's also possible (although outside the scope of this course) to create full-fledged data types of your own.

One of the more useful modules for extra data types is the collections module. It has a bunch of new data types that are, as you might guess from the name, collections of other things. There are two of them that I use with some regularity: Counter and defaultdict. Let's start with Counter, which counts things.

code format="python"
import collections

# A "Hello world" acrostic
my_genera = ['Helicobacter', 'Escherichia', 'Lactobacillus', 'Lactobacillus',
             'Oryza', 'Wolbachia', 'Oryza', 'Rattus', 'Lactobacillus',
             'Drosophila']

c = collections.Counter(my_genera)
print c

# or

d = collections.Counter()

for genus in my_genera:
    d[genus] += 1

print d
code

Now, we could do this same thing with a dictionary:

code format="python"
# no import needed: a plain dictionary is built in

my_genera = ['Helicobacter', 'Escherichia', 'Lactobacillus', 'Lactobacillus',
             'Oryza', 'Wolbachia', 'Oryza', 'Rattus', 'Lactobacillus',
             'Drosophila']

c = {}

for genus in my_genera:
    if genus not in c:
        c[genus] = 0
    c[genus] += 1

print c
code

But using a Counter is faster to write, shorter to read, and makes it more obvious that we are counting, as opposed to a dictionary, which could be used for almost anything. Another big advantage of the Counter type is that it makes it really easy to sort by frequency:

code format="python"
import collections

# A "Hello world" acrostic
my_genera = ['Helicobacter', 'Escherichia', 'Lactobacillus', 'Lactobacillus',
             'Oryza', 'Wolbachia', 'Oryza', 'Rattus', 'Lactobacillus',
             'Drosophila']

c = collections.Counter(my_genera)

print c.most_common()
code
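Two more Counter behaviours are worth a quick sketch: most_common accepts an optional number of entries to return, and looking up something that was never counted gives 0 rather than an error. The variable name top_two is made up here, and print uses parentheses so the snippet runs under both Python 2 and Python 3.

```python
import collections

my_genera = ['Helicobacter', 'Escherichia', 'Lactobacillus', 'Lactobacillus',
             'Oryza', 'Wolbachia', 'Oryza', 'Rattus', 'Lactobacillus',
             'Drosophila']

c = collections.Counter(my_genera)

# Only the two most frequent genera, as (genus, count) pairs
top_two = c.most_common(2)
print(top_two)

print(c['Lactobacillus'])   # counted three times
print(c['Homo'])            # never seen, so the count is 0
```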

The other collections type I really like is the defaultdict, which is like a dictionary, but supplies a default value for a key that we haven't seen before (with a normal dict, if you try to read something where the key isn't in the dict, then you get an error). Let's think about how we'd make a dictionary where each key is a genus, and the value is a list of species in that genus:

code format="python"
# no import needed: this version uses a plain dictionary

my_species_list = [('Helicobacter','pylori'), ('Escherichia','coli'),
                   ('Lactobacillus', 'helveticus'),
                   ('Lactobacillus', 'acidophilus'), ('Oryza', 'sativa'),
                   ('Wolbachia', 'pipientis'), ('Oryza', 'glaberrima'),
                   ('Rattus', 'norvegicus'), ('Lactobacillus','casei'),
                   ('Drosophila','melanogaster')]

d = {}

for genus, species in my_species_list:
    if genus not in d:
        d[genus] = []
    d[genus].append(species)

print d
code

With a defaultdict, we can once again save the line in the for loop where we check for a non-existent key:

code format="python"
import collections

my_species_list = [('Helicobacter','pylori'), ('Escherichia','coli'),
                   ('Lactobacillus', 'helveticus'),
                   ('Lactobacillus', 'acidophilus'), ('Oryza', 'sativa'),
                   ('Wolbachia', 'pipientis'), ('Oryza', 'glaberrima'),
                   ('Rattus', 'norvegicus'), ('Lactobacillus','casei'),
                   ('Drosophila','melanogaster')]

d = collections.defaultdict(list)

for genus, species in my_species_list:
    d[genus].append(species)    # append the species, not the type 'list'

print d
code

One thing to look at is the line where we actually declare the defaultdict: here we've given it another type, and if we use a key that's not in the dictionary already, it will initialize it to be an empty value of that type. Most often, this will be a list, but you could imagine uses for other types, like a string, an integer (here "empty" actually would mean 0), or even another dict. It's even possible to have a defaultdict of defaultdicts, but doing so would require covering more than we have time for.
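For the curious, here is a hedged sketch of two of those other default types. The variable names species_per_genus and letter_counts and the shortened species list are invented for this example; the nested version uses lambda, a way of writing a tiny function inline that the course has not covered. print uses parentheses so the snippet runs under both Python 2 and Python 3.

```python
import collections

my_species_list = [('Lactobacillus', 'helveticus'),
                   ('Lactobacillus', 'acidophilus'),
                   ('Oryza', 'sativa'), ('Rattus', 'norvegicus')]

# defaultdict(int): a missing key starts out as 0
species_per_genus = collections.defaultdict(int)
for genus, species in my_species_list:
    species_per_genus[genus] += 1

print(species_per_genus['Lactobacillus'])
print(species_per_genus['Drosophila'])   # unseen key: 0, and the key now exists

# defaultdict of defaultdicts: each missing genus gets its own
# defaultdict(int), here counting species by their first letter
letter_counts = collections.defaultdict(lambda: collections.defaultdict(int))
for genus, species in my_species_list:
    letter_counts[genus][species[0]] += 1

print(letter_counts['Lactobacillus']['h'])
```

Note that merely reading a missing key from a defaultdict creates it, which is convenient in loops but can surprise you if you later check which keys are present.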

It turns out that it's easy to write our own modules too:

**Making a module**
Any file of python code with a //.py// extension can be imported as a module from your script. When you invoke an import operation from a program, all the statements in the imported module are executed immediately. The program also gains access to names assigned in the file (names can be functions, variables, classes, //etc.//), which can be invoked in the program using the syntax **module.name**. Go ahead and make your first module by pasting the following code into your text editor and saving as //greeting_module.py//:

code format="python"
print 'The top of the greeting_module has been read.'

def hello(name):
    greeting = "Hello %s!" % name
    return greeting

def ahoy(name):
    greeting = "Ahoy-hoy %s!" % name
    return greeting

x = 5

print 'The bottom of the greeting_module has been read.'
code

Now make a new program called //test.py// with the following code and include your first name as an argument in the Terminal command line when you execute it:

code format="python"
#!/usr/bin/env python

import greeting_module

hi = greeting_module.hello('Peter')
print hi
print greeting_module.x

# What happens if you try 'print x' here?

# Remember how to access argv?
import sys

# This will take your Terminal argument as input for the greeting
# module's hello function
print greeting_module.hello(sys.argv[1])
code

And that's it! See-- no more messy function declarations at the beginning of your script. Now if you need any other program to say hi to you, all you need to do is import the greeting module.

**Using modules: slightly more than just 'import'**
Although creating a basic module is easy, sometimes you want more than just the basics. And although using a module in the most basic manner is easy, it's best to get a more thorough picture of how modules behave.

First, what if you only want one function from a given module? Let's say, as an Alexander Graham Bell loyalist, you really only dealt in 'ahoys' rather than 'hellos.' We need to use a modified syntax for retrieving //only// the **ahoy** function from the module, without cluttering things up by loading the newfangled **hello** function preferred by T.A. Edison's entourage.

Change the code in //test.py// to the following:

code format="python"
#!/usr/bin/env python

from greeting_module import ahoy

# if you grab a function from a module with a 'from' statement,
# you don't need to use the . syntax
hi = ahoy('everybody')
print hi
code

We see that we can now write **ahoy('everybody')** directly, instead of having to write **greeting_module.ahoy('everybody')**. And if we wanted to access both functions this way, we could import them both in one statement by changing the import line in //test.py// to the following:

code format="python"
#!/usr/bin/env python

from greeting_module import ahoy, hello
code

Or, what if there were a lot of functions from the **greeting_module** we wanted to use, but didn't want to write out the full name? Rather than writing out all of the function names to import individually (there could be a lot of them), we can use the asterisk wildcard (*) symbol to refer to them.

code format="python"
#!/usr/bin/env python

from greeting_module import *

hi = ahoy('everybody')
hi2 = hello('everybody')
code

While this may be useful if we are familiar with the contents of the **module**, including all of the **names** inside, there are a few reasons to be careful about using the **from modulename import** * syntax. First, if the module contains a lot of names that we don't need, we clutter our program's namespace with all of them (the module itself is loaded in full whichever form of import we use). Second, and perhaps more importantly, if the **module** being imported contains names that are also used inside your program, one definition silently replaces the other and you lose access to the original. For example, you might have a problem if //yourprogram.py// and //yourmodule.py// each define distinct functions called hello. If instead you use the syntax **import yourmodule**, then you can call the function in //yourprogram.py// using hello and you can call the function in //yourmodule.py// using yourmodule.hello. If you want to import a whole module, but don't want to type out its full name every time, you can use the syntax: **import a_long_module_name as mname**.

Finally, you can also import variables from modules and assign them new names in your program using the syntax **from modulename import variablename as newvariablename**.
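The different import forms can be tried side by side with a module that ships with Python. The sketch below uses **math** as a stand-in for your own module; the short names m and root are arbitrary choices made for this example, and print uses parentheses so the snippet runs under both Python 2 and Python 3.

```python
import math                      # access names as math.name
import math as m                 # same module, shorter name
from math import log             # log is now usable without the dot
from math import sqrt as root    # import a name and rename it

print(math.sqrt(16.0))
print(m.sqrt(16.0))
print(root(16.0))
print(log(1.0))

# A later definition with the same name silently replaces the imported one
def log(x):
    return 'my own log, called with %s' % x

print(log(1.0))        # our own function now answers to the name log
print(math.log(1.0))   # the module's version is still reachable with the dot
```

The last two lines illustrate the name clash discussed above: only the dotted form keeps both versions available.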

**Where to store your modules: using PYTHONPATH**
Over time, you'll end up accumulating lots of these modules, and they'll tend to fall together in meaningful collections. For example, you might have a module for all your functions related to reading and parsing files, called files_tools.py. You might have another for common sequence-related tasks, called sequence_tools.py. Python keeps its modules installed in a system directory that you may or may not have access to on a remote server. Therefore, it's useful and simpler to just create your own python modules directory and then let your operating system environment know about it. In MacOS, I accomplish this by placing my modules in **/Users/nathaniel/Library/Python/Modules** and then adding a few lines to my **.bash_profile** file in my home directory with the following terminal commands:


**$ echo 'PYTHONPATH=~/Library/Python/Modules' >> .bash_profile**
**$ echo 'export PYTHONPATH' >> .bash_profile**
**$ source .bash_profile**

NOTE: ~ is a shortcut to your own full home path: //e.g.// '/Users/pacombs/'. (Remember, you can see what your home path is called in Terminal by typing pwd from your home folder.)

And with that, any .py file that ends up in this directory will be treated as a module by Python. And though this is a good final resting place for your polished modules, you can also prototype them by simply saving them in your current working directory, and moving them over when you're happy with them.
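Inside Python, the directories searched for modules are visible as the list **sys.path**, and directories named in PYTHONPATH are added to it when Python starts. The sketch below imitates that for a single run by appending a directory by hand; the module name demo_tools and its shout function are hypothetical, and a temporary directory stands in for your modules folder. print uses parentheses so the snippet runs under both Python 2 and Python 3.

```python
import os
import sys
import tempfile

# Make a throwaway directory and write a tiny module into it
module_dir = tempfile.mkdtemp()
module_path = os.path.join(module_dir, 'demo_tools.py')
fh = open(module_path, 'w')
fh.write("def shout(text):\n    return text.upper() + '!'\n")
fh.close()

# Directories listed in PYTHONPATH end up in sys.path when Python starts;
# appending to sys.path by hand has a similar effect for this one run
sys.path.append(module_dir)

import demo_tools
print(demo_tools.shout('ahoy'))
```

If an import fails unexpectedly, printing sys.path is a quick way to check whether your modules directory is actually being searched.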

=**Exercises:**=

**1. Practice with functions**
Make a function that:


A) Takes an integer x as input and prints x * 2.

B) Takes integers x and y as input and prints x * y.

C) Takes a list xs as input and prints xs[0] * xs[1].

D) Modify the above programs so that the function returns the result instead of printing it, and the output is then printed by the program that called the function.

**2. What happens in functions doesn't always stay in functions**
As promised, most things that happen in functions stay in the functions, but there are important exceptions. Make the following functions, which should illustrate this property:

A) The function takes an integer as input and increments the integer by one using the '+=' operator. Print the value of the integer before and after the function is called.

B) The function takes a list as input and changes the first element of the list to the string 'x'. Print the value of the list before and after the function is called.

C) The function takes a dictionary as input and adds the key 'x' with value 'y' to this dictionary. Print the dictionary before and after the function is called.

**3. Reverse Complement**
A) Write a function that takes a DNA sequence as an argument, ensures that the sequence is in capital letters, and then returns the reverse complement of the sequence.

B) Modify the function to ensure that only the characters A, T, G, C and N (for unknown nucleotide) are in the input sequence.

**4. Making a module**
Create a directory in your PythonCourse directory called pylib, then add it to your PYTHONPATH. Create a module in this directory called //exercises.py//. Put your functions from Exercise 1 into this module. Now write two programs that import and call all of the functions in the module both of these ways:

A) A program that uses the line **import exercises.**

B) A program that uses the line **from exercises import** *. What happens if you have print statements in //exercises.py//? Are they printed when you use the **from** statement?

C) Add your reverse complement function from Exercise 3 to this module.

**5. Make a FASTA parser**
Starting with your script from this morning, make a function that takes a FASTA file as input, reads through the file using open, distinguishes between ID-containing lines and sequence-containing lines, and returns a dictionary with gene IDs as keys and sequences as values. Put this function, along with your reverse complement function, into your //exercises.py// **module.**

Copy and paste the following lines into a file called //testFasta.fa//. Create a program that imports the //exercises.py// module and prints the sequence corresponding to the gene ID 'gene3'.

code
>gene1
ATGAGACGTAGTGCCAGTAGCGCGATGTAGCG
ATGACGCATGACGCGCGACGCGCGAGTGAGCC
ATACGCACGCATTGGCA
>gene2
ATGTTCGACGCATACGACGCGCAGTACCAGCA
ATGACGCACCGGGATACACGACGCGGATTTTT
ACGCACCGAGATAGCATAAAAGACCATTAG
>gene3
TTATGGCACCCACTAGAGCCAGATTATTTTAAA
AGATGGGGG
code

code format="python"
#!/usr/bin/env python

from exercises import fastaParser

geneDict = fastaParser('testFasta.fa')
print geneDict['gene3']
code

**6. (Bonus) Create an ORF finder**
For our purposes, we will define an open reading frame (ORF) as a start codon followed at some distance by a stop codon in the same frame. This program should take a dictionary from a parsed FASTA file, as in (5), as input and output a dictionary of gene name->ORF sequence key-value pairs. If the sequence does not contain an ORF, then the gene name should not be in the dictionary.

**7. For This and Giggles**
Try out the following code:

code format="python"
#!/usr/bin/env python

import this
code

code format="python"
#!/usr/bin/env python

import antigravity
code

=Solutions=

**1) Practice with functions:**
code format="python"
#!/usr/bin/env python

# a) Takes an integer x as input, prints x*2 (x multiplied by 2)

def timestwo(x):
    print '%.0f multiplied by 2 is %.0f' % (x, x * 2)

print
num = float(raw_input('Input number to multiply by 2: '))
x = timestwo(num)
print

# b) Takes integers x and y as input, prints x * y

# Below I'll generate the list using command arguments, since
# we learned that today, but you could write them into the
# script instead
import sys
commandLine = sys.argv
print 'You entered the numbers', commandLine[1:], 'into the commandline.'

def product(x,y):
    print "The product of the first two numbers is %.0f." % (x*y)

numToMultiply1 = float(sys.argv[1])
numToMultiply2 = float(sys.argv[2])
multiplied = product(numToMultiply1, numToMultiply2)
print

# c) Takes a list xs as input, prints xs[0] * xs[1]

listOfNumbers = [2,3,3,4]

def product(xs):
    result = xs[0] * xs[1]
    print 'You supplied the list: %s' % (xs)
    print 'The product of the first two numbers in the list is %.0f.' % (result)

multipliedNumbers = product(listOfNumbers)
print multipliedNumbers     # prints None, since product() has no return
print

# d) Modify the above programs so that the function returns
# the result instead of printing it. This result is then
# printed by the program that called the function.

listOfNumbers = [2,3,3,4]

def product(xs):
    result = xs[0] * xs[1]
    print 'You supplied the list: %s' % (xs)
    return result

multipliedNumbers = product(listOfNumbers)
print 'The product of the first two numbers in the list is %.0f,' % (multipliedNumbers),
print 'but this time we returned the result',
print 'from the function.'
print
code

**2. What happens in functions doesn't always stay in functions.**
code format="python"
#!/usr/bin/env python

# a) The function takes an integer as input, and it increments that integer
#    by one using the '+=' operator. Print the value of the integer before
#    and after the function is called.

def increment(numberToIncrement):
    numberToIncrement += 1

numberToIncrement = 5
print 'The number to increment was', numberToIncrement
increment(numberToIncrement)
print 'The number is still', numberToIncrement
print

# b) The function takes a list as input, and it changes the first element
#    of the list to the string 'x'.
#    Print the value of the list before and after the function is called.

def modifyList(x):
    x[0] = 'x'
    return x

stringlist = ['1', '33', '5', 'dog']    # could have used list of integers,
                                        # or any type of list
print 'The list was', stringlist
modifyList(stringlist)
print 'Now the list is', stringlist
print

# c) The function takes a dictionary as input, and it adds the key 'x'
#    with value 'y' to this dictionary.
#    Print the dictionary before and after the function is called.

def appendToDict(Dict_with_a_new_name):
    Dict_with_a_new_name['x'] = 'y'

Dict = {}
Dict['0'] = 'zero'
Dict['1'] = 'one'
Dict['2'] = 'two'
print 'Before:', Dict

appendToDict(Dict)
print 'After:', Dict
print
code

**3. Reverse Complement**
code format="python"
#!/usr/bin/python

def revComp(seq):
    seq = seq.upper()               # Makes seq uppercase
    seq = seq[::-1]                 # Reverses seq
    seq = seq.replace('A','t')      # Replace ACGT with lowercase complement
    seq = seq.replace('C','g')
    seq = seq.replace('G','c')
    seq = seq.replace('T','a')
    seq = seq.upper()               # Make seq uppercase again

    # Strip out every allowed character; anything left over is improper
    isitempty = seq
    isitempty = isitempty.replace('A',"")
    isitempty = isitempty.replace('C',"")
    isitempty = isitempty.replace('G',"")
    isitempty = isitempty.replace('T',"")
    isitempty = isitempty.replace('N',"")
    if isitempty != "":
        print "Careful, improper characters!"

    return seq

# Iterative method

def revCompIterative(watson):
    complements = {'A':'T', 'T':'A', 'C':'G', 'G':'C', 'N':'N'}
    watson = watson.upper()
    watsonrev = watson[::-1]
    crick = ""
    for nt in watsonrev:
        crick += complements[nt]

    return crick

print revComp("aTNrg")
code
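As one possible variation on part B, the dictionary used by the iterative method can double as the list of allowed characters. This is a sketch, not the course's official solution: the function name rev_comp_checked and the choice to return None for bad input are assumptions made here, and print uses parentheses so the snippet runs under both Python 2 and Python 3.

```python
def rev_comp_checked(seq):
    complements = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C', 'N': 'N'}
    seq = seq.upper()

    # Refuse any sequence containing a character we cannot complement
    for nt in seq:
        if nt not in complements:
            return None

    # Walk the reversed sequence, complementing one base at a time
    result = ''
    for nt in seq[::-1]:
        result += complements[nt]
    return result

print(rev_comp_checked('aTNcg'))   # a valid sequence
print(rev_comp_checked('aTNrg'))   # 'r' is not allowed, so this gives None
```

Returning None lets the calling program decide what to do about a bad sequence, instead of printing a warning from inside the function.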

**4. Making a module.**
code format="python"
#!/usr/bin/env python

# Make a directory in /Users/[username]/PythonCourse/pylib
# Open a new terminal window and type the following:
#   cd ~
#   echo "PYTHONPATH=~/PythonCourse/pylib" >>.bash_profile
#   echo "export PYTHONPATH" >>.bash_profile
#   source .bash_profile
# Create a file called exercises.py in the pylib folder,
# copy in your timestwo function
# To verify it worked, try part a

# Part a
import exercises
print exercises.timestwo(4) # or whatever your function was called

# Part b --note, this should be run separately from part a
from exercises import *
print timestwo(6)

# Part c
# Copy the reverse complement function from problem 3 to
# PythonCourse/pylib/exercises.py
code

**5. Make a FASTA parser**
Below is the module called //exercises.py// where we have stored our functions.

code format="python"
#!/usr/bin/env python

def fastaParser(filename):
    current_gene = ""
    genes = {}
    fh = open(filename, 'r')

    for line in fh:
        line = line.strip()
        if line.startswith('>'):
            current_gene = line[1:]
            genes[current_gene] = ''
        else:
            genes[current_gene] += line

    fh.close()
    return genes

# Takes 1 integer x as input, prints x*2
def timestwo(x):
    print '%.0f multiplied by 2 is %.0f' % (x, x*2)

# Takes 2 integers x and y as input, prints x * y
def product1(x,y):
    print "The product of the first two numbers is %.0f." % (x*y)

# Takes a list as input, prints xs[0] * xs[1]
def product2(xs):
    result = xs[0] * xs[1]
    print 'You supplied the list: %s' % (xs)
    print 'The product of the first two numbers in the list is %.0f.' % (result)

# Same as product2 except this function returns the
# result instead of printing it. This result can then
# be printed by the program that called the function.
def product3(xs):
    result = xs[0] * xs[1]
    print 'You supplied the list: %s' % (xs)
    return result
code
Below is the script called //Exercise5.py// that will import functions from the module //exercises.py//.

code format="python"
#!/usr/bin/env python

# a) A program that uses the 'import exercises' line.

import exercises
x = exercises.timestwo(12)

# b) A program that uses the 'from exercises import *' line

from exercises import *
product1(2,3)

# Exercise 5: parse the FASTA file and print the resulting dictionary

from exercises import fastaParser

x = fastaParser('testFasta.fa')
print x
code


**6. (Bonus) Create an ORF finder**

code format="python"
#!/usr/bin/env python

import sys
import exercises

def find_orfs(sequence):
    """ Finds all valid open reading frames in the string 'sequence', and
    returns them as a list"""

    starts = find_all(sequence, 'ATG')
    stop_amber = find_all(sequence, 'TAG')
    stop_ochre = find_all(sequence, 'TAA')
    stop_umber = find_all(sequence, 'TGA')
    stops = stop_amber + stop_ochre + stop_umber
    stops.sort()

    orfs = []

    for start in starts:
        for stop in stops:
            if start < stop and (start - stop) % 3 == 0:    # Stop is in-frame
                orfs.append(sequence[start:stop+3])  # the +3 includes the stop codon
                break   # break out of the inner for loop
                        # when we hit the first stop codon
    return orfs

def find_all(sequence, subsequence):
    """ Returns a list of indexes within sequence that are the start of
    subsequence"""
    start = 0
    idxs = []
    next_idx = sequence.find(subsequence, start)

    while next_idx != -1:
        idxs.append(next_idx)
        start = next_idx + 1    # Move past this on the next time around
        next_idx = sequence.find(subsequence, start)

    return idxs

fname = sys.argv[1]     # Read in from the first command-line argument

genedict = exercises.fastaParser(fname)

orfdict = {}

for gene in genedict:
    gene_seq = genedict[gene]
    orfs = find_orfs(gene_seq)
    if len(orfs) > 0:
        orfdict[gene] = orfs

print orfdict
code

**7. For This and Giggles**

**import this** should print 'The Zen of Python' to the Terminal.
**import antigravity** opens your web browser and points it to the xkcd comic about **import antigravity** (a bit meta, no?)