Session+3.1

=Fancy Data Structures=

Topics:

 * Tuples
 * Sets
 * Lists of lists
 * Dictionaries of dictionaries
 * Lists of dictionaries, dictionaries of lists
 * Nested structures

Introduction
Let's say that you have a dictionary. In fact, let's say that you are a collector of English dictionaries, and you have all of them, from Cawdrey's 1604 A Table Alphabeticall all the way to a hard copy of the ever-contemporary Wiktionary project of Wikipedia. Finally, let us also suppose that you have some interest in the evolution of words over time. After all, why did you go out and buy all those dictionaries in the first place?

Given the number of dictionaries and the number of words, it seems only natural that you would need some knowledge of programming to tackle this project. However, it is here that we run into a problem: how should we best store this immense and complex amount of data? We have many dictionaries, each with many words; could we use what we've learned of lists and strings to store it? We certainly could have a list of strings, but then it would be difficult to associate them with their definitions. This is where the Pythonean dictionary can come in handy!

//dictionaries.py//

code format="python" wiktionary = {}
 * 1) define dictionaries

aTableAlphabeticall = {}

wiktionary['ventricle'] = '''1. (anatomy) One of two lower chambers of the heart.\ 2. (anatomy) One of four cavities in the brain.'''
 * 1) insert words, starting with, naturally, 'ventricle'

aTableAlphabeticall['ventricle'] = 'the stomacke which receiues the meate' code

The trouble is, you've got thousands of dictionaries, and you're trying to make sense of them in relation to each other. This way, you'd have thousands upon thousands of lines of code just devoted to defining dictionaries, and afterwards, you've still got to refer to each of them individually in order to do any kind of analysis. This doesn't sound right. Wouldn't it be great to put all of those dictionaries into some kind of big list, perhaps chronologically arranged, so that you could tackle the data entry and analysis systematically? That's right, it would be great.

Similarly, we very often run into non-trivial data in biology. For instance, you might want a list of binding site positions for each of ten different transcription factors, or we might want to describe the single nucleotide polymorphisms in each of a thousand people. We often want to store several kinds of information about each position in a protein sequence. In order to accomplish these tasks and many others, we need fancier data structures: lists of lists, dictionaries of dictionaries, lists of dictionaries, even lists of dictionaries of dictionaries of lists of dictionaries of lists. Although we try to stay away from the latter.

This morning will be devoted to teaching you how to create fancy-pants data structures and how to combine them with tools that we're already learned, such as loops.

A Bit more on tuples
As we saw yesterday, a tuple is like a frozen list: once you've made a tuple, it is immutable, and you can't change it (although you can make a new one with similar elements. Since the keys of a dictionary need to be immutable, this is useful if you want to key a dictionary based on two things at once. The syntax for a tuple is much like a list, except instead of [square brackets], it's conventional to use (parentheses).

code format="python" mytuple = (1,2) mytuple = 1,2
 * 1) Actually, the parentheses are optional (but helpful for clarity)

mylist = [1,2]

mydict = {} mydict[mytuple] = 'hello' print mydict mydict[mylist] = 'world' print mydict code

Also, if your tuple only has one item, you need to use a comma to make it clear that the tuple is a tuple and not just a value in parentheses:

code format="python" tuple_A = ("Is this a tuple?") tuple_B = ("What about this?", )

print tuple_A, type(tuple_A) print tuple_B, type(tuple_B) code

Tuples actually underlie the syntax for the in-place swap:

code format="python" a = 1 b = 2 a,b = b,a print a, b

a = 1 b = 2 mytuple = (b,a) a,b = mytuple code
 * 1) Is equivalent to saying:

We won't be using tuples extremely heavily, but be aware that if you ever have a function with multiple simultaneous return values, under the hood, it's actually combining them into a tuple.

Sets
A set is another nifty data structure. Like a set in mathematics, it has a bunch of elements with no repeats. To build a set, you pass in a list, and it will automatically remove duplicates

code format="python" myset = set([1,1,2,3,5,8]) print myset

myset = {1,1,2,3,5,8} print myset
 * 1) You can also use the new syntax:

myset = set myset.add(1) myset.add(1) myset.add(2) myset.add(3) myset.add(5) myset.add(8) print myset code
 * 1) or you can build it up as you go along:

Sets make doing Venn diagrams of your data really easy:

code format="python" UCs = set(["UC Berkeley", "UCLA", "UCSF", "UCSB", "UC Riverside", "UC Davis",          "UCSD", "UC Santa Cruz", "UC Irvine", "UC Merced"]) bay_area_schools = set(["UC Berkeley", "Stanford", "UCSF", "San Jose State",          "Santa Clara University", "University of San Francisco" ])

bay_area_UCs = UCs.intersection(bay_area_schools) print "Bay Area UCs", bay_area_UCs non_bay_area_UCs = UCs.difference(bay_area_schools) print "UCs in less awesome places", non_bay_area_UCs bay_area_non_UCs = bay_area_schools.difference(UCs) print "Bay area schools that aren't UCs", bay_area_non_UCs

california_schools = UCs.union(bay_area_schools) code

There are also ways to remove things from a set. Let's say that due to rampant grade inflation and falling academic standards, that school south of here loses it's accreditation:

code format="python" bay_area_schools.remove("Stanford") bay_area_schools.discard("Stanford") code
 * or

The difference between remove and discard is that discard doesn't care whether or not the item is already in the set or not, whereas if the item isn't in the set, remove will give an error.

There are other **[|set]** functions, but this covers most of what I use them for, and should put you in good stead as well.

Lists of lists
Let's create a few lists. Being self-centered, let's make the list of our lives:

code format="python"
 * 1) !/usr/bin/env python

coffee = ['yalis','strada','brewedawakening'] lab = ['stanley','barker','LSA','Koshland','VLSB'] code
 * 1) make a couple lists

As it turns out, you can pretty much put anything into a list that you like, including other lists. Accessing and changing individual elements has a fairly straightforward syntax:

code format="python" importantPlaces = [coffee,lab]
 * 1) make a list of lists!

importantPlaces = 'yalis','strada','brewedawakening'],                  ['stanley','barker','LSA', 'Koshland','VLSB'
 * 1) another way to write it
 * 1) Note that you're allowed to split long lines if it's inside of a parenthesis,
 * 2) bracket, or curly brace

aList = importantPlaces[0] print aList aPlace = aList[0] print aPlace aPlace = importantPlaces[0][0] # aPlace is 'yalis' importantPlaces[0][0] = 'peets'
 * or

anotherPlace = importantPlaces[0][0] code

All of the list operations from this morning still apply:

code format="python" home = ['rockridge','the mission','lake merritt','gourmet ghetto'] importantPlaces.append(home) # important places is now [coffee,lab,home] importantPlaces[2].append('bushrod park') # home now includes 'bushrod park'
 * 1) make a list of lists!

print importantPlaces[2] print home code

Take note of that very last step: although you made a change while naming importantPlaces, you also changed home. That's because they're one and the same: importantPlaces[2] is actually a location in the warehouse, and Python knows when it sees a location to pretend like that thing is inside the list. How would you make importantPlaces carry a copy of home instead of home itself?

code format="python" importantPlaces.append([]) # important places is now [coffee,lab,[]] importantPlaces[2].append(home[0]) importantPlaces[2].append(home[1]) ...
 * 1) make a list of lists!

importantPlaces.append(home[:]) importantPlaces[2][3] = 'alcatel' print importantPlaces print home code
 * or, using a shorthand:
 * 1) replace 'gourmet ghetto' with name of landmark liquor store

This may seem like a subtlety now, and it some ways it is, but keep it in mind when you start coding and start seeing errors. Often a copy is what you want.

Dictionaries of dictionaries
Dictionaries of dictionaries work the same way. Let's return to the example used in the introduction:

code format="python"
 * 1) !/usr/bin/env python

definitions = {}
 * 1) we keep our definitions in a dictionary

definitions['wiktionary'] = {} definitions['aTableAlphabeticall'] = {}
 * 1) and, as above, we have several dictionaries in this dictionary

print definitions

definitions['aTableAlphabeticall']['statue'] = 'an image of wood, or any other matter' definitions['aTableAlphabeticall']['vapor'] = 'moisture, ayre, hote breath, or reaking'
 * 1) and each dictionary is full of definitions

print definitions code

Lists of dictionaries, dictionaries of lists
Can we mix data structures? Of course! The main thing to keep in mind (besides what your data structure actually looks like) is that if you're building something up from scratch, you need to match the square brackets and curly braces:

code format="python" a = [{},{}, ] # a is a list of empty dictionaries b = {'l1': [], 'l2': []} # b is a dictionary of empty lists c = [{]} # c is a Syntax Error! d = {[}] # so is D! code

Finally, everything that we've learned so far can be elaborated to even more complex data structures. After, all, there are often multiple definitions for a word...

code format="python" listOfDictionaries = [{},{}] listOfDictionaries[1]['ventricle'] = [] listOfDictionaries[1]['ventricle'].append('One of two lower chambers of the heart') listOfDictionaries[1]['ventricle'].append('One of four cavities in the brain')
 * 1) let's make the ventricle entry a little more sensible

print listOfDictionaries code

Also, you can put different types of things into the same level of a list or dictionary, but I don't recommend it:

code format="python" list_of_stuff = [[], 1, "Hello", {'name': 'Peter'}]
 * 1) This is totally legal, but just try writing a for loop that handles each one correctly!

for thing in list_of_stuff: print thing code
 * 1) This is about the only thing I can think of that would work on
 * 2) all of the things in that list...

Fancy loops for fancy data structures
You are also allowed to loop over a dictionary. Be aware that it loops over the keys of the dictionary (i.e. the ones you put in brackets to get the values), and that you shouldn't count on any particular ordering to those keys:

code format="python" knights = {'gallahad': 'the pure', 'robin': 'the brave'} for x in knights: print x   print knights[x] code

$**./loops.py** code format="python" gallahad the pure robin the brave code

It turns out that you can put just about any valid Python code you want inside of a for loop. That happens to include other for loops! This is really useful for our fancy nested structures:

code format="python" liLi = 1,2,3],[7,8,9 for x in liLi: print x   for y in x:        print y    print code

$**./loops.py** code format="python" [1, 2, 3] 1 2 3

[7, 8, 9] 7 8 9 code

1. A dictionary of dictionaries.
a) You want to store sequence information from human, mouse, and rat. Each species has a set of gene names corresponding to a set of sequences-- make a dictionary that has keys 'human', 'mouse', and 'rat', with each key having as a value a dictionary with gene names as keys and sequences as values.

The data are: Human genes: 'TallnessGene' has sequence 'AATAGCAG' 'SmartnessGene' has sequence 'TGACGGA'

Mouse genes: 'FuzzynessGene' has sequence 'CCCCCCA' 'BeadyLittleEyesGene' has sequence 'ATAGCGC'

Rat genes: 'FuzzynessGene' has sequence 'CCCTCCA' 'BiggerThanMouseGene' has sequence 'GGACAATT'

b) Using the 'keys' method, print the names of the human genes. c) Print out the sequence of FuzzynessGene in rat and mouse. d) Print the third nucleotide of the human tallness gene.

2. A list of lists.
You want to store the results of three different time series experiments, each with four data points. You should do this by creating a list with three elements. Each of these elements is a four element list. There are a few ways to do this, which are outlined in parts (a), (b), and (c) below.

run1: 2,3,5,5 run2: 2,2,4,5 run3: 3,3,4,6

a) Create an empty list. Use the append method three times to make this a list of three empty lists. For each of these empty lists, use the append method four times to populate the list of lists with the data above.

b) Create a list of three empty lists without using either 'append' or '*' (you will run into a complication of using '*' in this context in the last exercise today). For each of these empty lists, use the append method four times to populate the list of lists with the data above-- feel free to copy and paste this part from your solution to part (a).

c) Create your list of lists, complete with data, all in one line.

3. A dictionary of lists.
Here, you have some structural data for several different species-- you want to store the B-factor of each position in several orthologous proteins. I picture this as a dictionary of lists: the dictionary is keyed with the species name, which has as a value a list that relates the position to the B-factor.

Human: 5,4,6,7 Mouse: 8,12,11,14 Rat: 10,11,13,15

a) Create an empty dictionary. Create three key-value pairs with the names as keys and empty lists as values. Use the append method to populate this dictionary of lists with data.

b) Create the populated dictionary all in one line, as in problem 2c.

4. Run Lola Run
Return to the time point data from problem two. Make three lists called 'run1', 'run2', and 'run3', and from them make a list of lists:

run1 = [2,3,5,5] run2 = [2,2,4,5] run3 = [3,3,4,6]

listOfRuns = [run1,run2,run3]

a) Alter the first element of run1. Has anything happened to listOfRuns? b) Specify listOfRuns similarly, but using copies of run1, run2, and run3 instead of the lists themselves. Now, alter the first element of run1 and see what has happened to listOfRuns.

5. Multiplying Lists.
Use '*' (multiplication) to create a list of three empty lists. Use the append method to add the element "i belong first!" to the first list of the list of lists. Print out the list of lists. What has happened? This is hard-- don't worry about getting a TA.

1. Dictionary of Dictionaries.
code format="python" species = {} species['human'] = {} species['mouse'] = {} species['rat'] = {} species['human']['Tallness'] = 'AATAGCAG' species['human']['Smartness'] = 'TGACGGA' species['mouse']['Fuzzyness'] = 'CCCCCCA' species['mouse']['BeadyLittleEyes'] = 'ATAGCGC' species['rat']['Fuzzyness'] = 'CCCTCCA' species['rat']['BiggerThanMouse'] = 'GGACAATT'
 * 1) !/usr/bin/env python
 * 1) set out top-level dictionary
 * 1) add human data
 * 1) add mouse data
 * 1) add rat data

print species['human'].keys print species['mouse']['Fuzzyness'] print species['rat']['Fuzzyness']
 * b)
 * c)

print species['human']['tallness'][2] code
 * d)

2. Lists of Lists.
code format="python" List = [] List.append([]) List.append([]) List.append([]) List[0].append(2) List[0].append(3) List[0].append(5) List[0].append(5) List[1].append(2) List[1].append(2) List[1].append(4) List[1].append(5) List[2].append(3) List[2].append(3) List[2].append(4) List[2].append(6) print List List = [ [], [], [] ] List[0].append(2) List[0].append(3) List[0].append(5) List[0].append(5) List[1].append(2) List[1].append(2) List[1].append(4) List[1].append(5) List[2].append(3) List[2].append(3) List[2].append(4) List[2].append(6) print List List = 2,3,5,5], [2,2,4,5], [3,3,4,6 print List code
 * 1) !/usr/bin/env python
 * 1) part b
 * 1) !/usr/bin/env python
 * 1) part c
 * 1) !/usr/bin/env python

3. Dictionary of Lists.
code format="python" dict = {} dict['human'] = [] dict['mouse'] = [] dict['rat'] = [] dict['human'].append(5) dict['human'].append(4) dict['human'].append(6) dict['human'].append(7) dict['mouse'].append(8) dict['mouse'].append(12) dict['mouse'].append(11) dict['mouse'].append(14) dict['rat'].append(10) dict['rat'].append(11) dict['rat'].append(13) dict['rat'].append(15) dict = {'human':[5,4,6,7],'mouse':[8,12,11,14],'rat':[10,11,13,15]} code
 * 1) !/usr/bin/env python
 * 1) part A
 * 1) part B

4. Run Lola Run
code format="python" run1 = [2,3,5,5] run2 = [2,2,4,5] run3 = [3,3,4,6] listOfRuns = [run1,run2,run3] run1[0] = 500 print listOfRuns tead of two run1 = [2,3,5,5] run2 = [2,2,4,5] run3 = [3,3,4,6] listOfRuns = [run1[:],run2[:],run3[:]] run1[0] = 500 print listOfRuns code
 * 1) !/usr/bin/env python
 * 1) part A
 * 1) the first element of the first element of listOfRuns is now 500 ins
 * 1) part B
 * 1) list of runs remains the same

5. Multiplying Lists.
code format="python" listOfLists = * 3 print listOfLists listOfLists[0].append('i belong first!') print listOfLists code
 * 1) !/usr/bin/env python
 * 1) the multiplication operation has taken a single empty list and
 * 2) duplicated it-- it's now three copies of the same empty list that are all
 * 3) equivalent to each other.  By changing one, we change all of them.