Sunday, December 14, 2008

Quick primer on sqlite

The sqlite database I created of the US Mayors projects might be a little opaque to anyone who is not familiar with databases, but it's actually quite simple to access. I work almost exclusively in python these days so I'll give you a little example for loading the file and getting on with the querying.

Example python session:

$ python
>>> import sqlite # Get access to the sqlite library
>>> conn = sqlite.connect('mayors-data') # Connect to the database file
>>> curs = conn.cursor() # Get a database cursor

>>> # Execute a query against the database
>>> curs.execute("select pr.name, count(*) from programs pr join projects
pj on pr.program_id = pj.program_id group by pr.name")

>>> rows = curs.fetchall() # Retrieve all values from the query
>>> histogram = dict(rows) # Turn the tupled values into a dictionary
>>> print histogram # Show the result
{'Airport': 366.0,
'Amtrak': 41.0,
'CDBG': 2412.0,
'Energy': 942.0,
'Housing': 431.0,
'Public Safety': 979.0,
'Schools': 652.0,
'Streets/Roads': 2404.0,
'Transit': 628.0,
'Water': 2536.0}

That works, but let's go one step better and turn those decimal values into integers, since there is really no reason to consider a fraction of a project.

>>> histogram = dict((name,int(count)) for name, count in rows)
{'Airport': 366,
'Amtrak': 41,
'CDBG': 2412,
'Energy': 942,
'Housing': 431,
'Public Safety': 979,
'Schools': 652,
'Streets/Roads': 2404,
'Transit': 628,
'Water': 2536}

You now have a dictionary of the 10 program types. The name of each type maps to the number of projects of that type.

You could print one of them out like this:

>>> print histogram['Schools']
652

Happy hacking!

No comments: