Session 10.1

=Advanced Data Structures and Parsing Structural Data=

__BLAST searches against the PDB and Homology Models__

We will begin by tying up a loose end from Session 8.2: how to search for structural data relevant to any protein or nucleotide sequence. Suppose you have identified a SNP that maps to a missense mutation in a protein-coding region, and you would like to know whether a structure of that protein exists to give the SNP biological context (there are currently ~83,000 structures in the PDB, so it's always worth looking). This is easily done by downloading the PDB amino acid or nucleotide database from NCBI: [|BLAST databases available for download] The database we want is either pdbaa.tar.gz or pdbnt.tar.gz, and we can run the BLAST search locally as described in Session 6.1. New structures are added to the PDB every week, so always run your searches against the most recent BLAST database.

If your protein of interest is not in the PDB, but a homologous protein is (at least 20-30% amino acid sequence identity; higher is better), then you can try the Phyre server to generate a homology model: [|Phyre homology model server]
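Once a local search against pdbaa has finished, the hits still need to be screened against the identity threshold mentioned above. The sketch below assumes the search was run with BLAST+ tabular output (`-outfmt 6`), whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. The function name `filterHitsByIdentity`, the 30% default cutoff, and the two sample lines are invented for illustration; they are not real search results.

```python
def filterHitsByIdentity(tabularLines, minIdentity=30.0):
    """Return (subject id, percent identity, e-value) for hits at or above minIdentity."""
    hits = []
    for line in tabularLines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subjectId = fields[1]               # column 2: sseqid
        percentIdentity = float(fields[2])  # column 3: pident
        eValue = float(fields[10])          # column 11: evalue
        if percentIdentity >= minIdentity:
            hits.append((subjectId, percentIdentity, eValue))
    return hits

# Two made-up lines in the 12-column tabular layout (placeholders, not real hits)
sampleLines = [
    "myProtein\tpdb|XXXX|A\t45.2\t310\t170\t0\t1\t310\t5\t314\t2e-80\t250",
    "myProtein\tpdb|YYYY|B\t18.9\t120\t97\t3\t40\t159\t12\t131\t0.5\t30.1",
]

goodHits = filterHitsByIdentity(sampleLines)
print(goodHits)
```

With real output you would pass the open results file in place of `sampleLines`, since a file object also yields one line at a time.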

__Advanced Data Structures in Python__

Today we will learn more about functions in Python. We will also learn how to define classes, which let us create objects of our own custom types. These data structures may seem unwieldy at first, but they can greatly simplify code that requires complicated data structures.

Functions (defined with the "def" keyword in Python) are objects that can be passed as arguments into other functions at will. To illustrate this, let's look at a simple function:

code format="python"
def arbitraryFunction(a, b):
    # 1) Do something here
    return a, b  # placeholder: return something involving a and b
code

We're completely comfortable with a and b being objects of type string, int, float, list, dictionary, etc. The truth is, they can be objects of any type. Functions themselves are also objects, and when we type "def _functionNameHere_(variables):", we are initializing an object of type function. Because of this, we can also pass functions as arguments:

code format="python"
def numberTimesTwo(x):
    return x*2

def arbitraryFunction(a, b):
    # a is a function; call it on b
    return a(b)

print(arbitraryFunction(numberTimesTwo, 4))  # 8
code
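Because functions are ordinary objects, they can also be bound to a second name, stored in a dictionary, or swapped for a built-in. The following is a minimal sketch of those three cases; `numberPlusTen`, `alias`, and `operations` are names invented here for illustration.

```python
def numberTimesTwo(x):
    return x * 2

def numberPlusTen(x):
    return x + 10

def arbitraryFunction(a, b):
    # a is expected to be a function; call it on b
    return a(b)

# A function can be bound to a second name, just like any other object.
alias = numberTimesTwo
aliasResult = alias(4)

# Functions can be stored in containers such as dictionaries.
operations = {"double": numberTimesTwo, "addTen": numberPlusTen}
results = {}
for name in operations:
    results[name] = operations[name](5)

# Built-in functions are objects too, so they can be passed the same way.
lengthResult = arbitraryFunction(len, "ACGT")

print(aliasResult)        # 8
print(results["double"])  # 10
print(results["addTen"])  # 15
print(lengthResult)       # 4
```

Note that the function is passed by name without parentheses; writing `numberTimesTwo()` would call it instead of handing it over.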

This may seem like just a cute trick at first, but it can greatly simplify complicated code. Have you ever copied and pasted a function that you've written just to replace the name of one function with another? You can bypass the need to do this entirely by passing the function that changes as an argument. Let's look at some code that we can clean up by passing functions as arguments:

code format="python"
def squareEveryObjectOfAList(tempList):
    output = []
    for each in tempList:
        toAdd = each**2
        output.append(toAdd)
    return output

def cubeEveryObjectOfAList(tempList):
    output = []
    for each in tempList:
        toAdd = each**3
        output.append(toAdd)
    return output

def stringOfEveryObjectOfAList(tempList):
    output = []
    for each in tempList:
        toAdd = str(each)
        output.append(toAdd)
    return output
code

Cleaning up the first two functions could be easily accomplished by passing the exponent as a variable. Incorporating the third function into the same syntax, though, is a bit more challenging. Let's pass a function as an argument to clean up this example:

code format="python"
def doSomethingToEveryObjectOfAList(whatToDo, tempList):
    # Apply whatToDo to each item and collect the results
    output = []
    for each in tempList:
        toAdd = whatToDo(each)
        output.append(toAdd)
    return output

def squareANumber(number):
    x = int(number)
    return x**2

def cubeANumber(number):
    x = int(number)
    return x**3  # use the converted value, as in squareANumber

def stringANumber(number):
    output = str(number)
    return output

print(doSomethingToEveryObjectOfAList(squareANumber, [1, 2, 3]))  # [1, 4, 9]
code

This pattern is rarely worth the effort for simple sets of functions, but it can be invaluable when you are dealing with complicated analyses.

__Defining classes__

Just as we have objects of type string, integer, float, list, dictionary, function, etc. in Python, we can also define new classes at will:

code format="python"
class widget:
    def __init__(self, a, b):
        # Store the two values on the new object
        self.x = a
        self.y = b

    def multiplyValues(self):
        return self.x*self.y

    def isAWidget(self):
        return True

example = widget(2, 3)
temp = example.multiplyValues()  # the parentheses call the method
print(temp)  # 6
code
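A few more things can be done with an object of this class. The sketch below repeats a trimmed copy of the widget class so that it runs on its own; the names `boundMethod`, `other`, and `callIt` are invented for illustration. It shows attribute access, the difference between naming a method and calling it, and that a method can be passed around like any other function.

```python
class widget:
    def __init__(self, a, b):
        self.x = a
        self.y = b

    def multiplyValues(self):
        return self.x * self.y

example = widget(2, 3)

# Attributes set in __init__ can be read (and changed) directly.
originalX = example.x   # 2
example.y = 10

# Without parentheses we get the method object itself; with them we call it.
boundMethod = example.multiplyValues
product = boundMethod()  # 2 * 10

# Each instance keeps its own data.
other = widget(4, 5)
otherProduct = other.multiplyValues()  # 4 * 5

# A bound method is a function object, so it can be passed as an argument.
def callIt(f):
    return f()

passedResult = callIt(other.multiplyValues)

print(product)                      # 20
print(otherProduct)                 # 20
print(isinstance(example, widget))  # True
```

Printing `boundMethod` itself, rather than `boundMethod()`, shows a description of the method object instead of a number, which is a common source of confusion.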

Ex. 1 - Write a function that takes in the output from the atributeFileParse function from S8.2 Exercise 4 ([|Download here]), a cutoff value, and a boolean. If the boolean is true, the function returns a list of the indices of all residues whose values are greater than the cutoff; if the boolean is false, it returns a list of the indices of all residues whose values are less than the cutoff. [|SAS File Here]

Ex. 2 - Re-write your function from Exercise 1 such that it takes an arbitrary function for determining whether the data should be above or below the cutoff, and pass that function in as a variable. Write a function for percentConserved files and another for SAS files, and pass them as arguments to your first function.

Ex. 3 - Write a script to read in SAS attribute files for both the open and closed ring Rho structures, and also read in percent conserved attribute files. Save the information for each file as a list of dictionaries (where position 0 corresponds to Chain A, position 1 to Chain B, etc) keyed by residue number. Generate a list of residues for which SAS>40.0 in every position in both the open and closed ring Rho structures, and all residues that are less than 50% conserved. [|Open SAS File] [|Closed SAS File]
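The list-of-dictionaries layout described in Ex. 3 can be pictured with a small sketch. The SAS numbers below are made up purely to show the shape of the data, and the names `sasByChain` and `chainIndex` are hypothetical. Reading the real attribute files and building the final residue lists is left to the exercise.

```python
# One dictionary per chain; position 0 is Chain A, position 1 is Chain B.
# Keys are residue numbers, values are (made-up) SAS values.
sasByChain = [
    {10: 55.0, 11: 12.5, 12: 80.1},  # Chain A
    {10: 61.3, 11: 8.0, 12: 35.9},   # Chain B
]

def chainIndex(chainLetter):
    """Convert a chain letter ('A', 'B', ...) to its list position (0, 1, ...)."""
    return ord(chainLetter.upper()) - ord("A")

# Look up residue 12 of Chain B.
valueB12 = sasByChain[chainIndex("B")][12]
print(valueB12)  # 35.9

# Walk through every chain and residue in order.
for position in range(len(sasByChain)):
    chainLetter = chr(ord("A") + position)
    for residueNumber in sorted(sasByChain[position]):
        print(chainLetter + " " + str(residueNumber) + " " + str(sasByChain[position][residueNumber]))
```

Keying each dictionary by residue number means a residue can be looked up directly, even if numbering in the structure does not start at 1 or has gaps.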

Ex. 4 - Write a class of type residue that is initialized with SAS, % conservation, residue number, chain letter and x/y/z coordinates. 4a - Ensure that it is possible to retrieve any of these values from an object of type residue. 4b - Use the getDist function from S8.2 Exercise 3 ([|Download here]) to write a method that takes another residue as an argument and returns the distance to that residue.

Ex. 5 - Use Mike's code ([|Download here]) to generate suitable labeling pairs and to visualize these pairs in Chimera. What cutoff values make sense to you, and what pairs appear to be best for getting signal differences between the open and closed conformation of Rho?