Above some arbitrary threshold, each police department reports public statistics about crimes reported within its jurisdiction. The Hyattsville City Police are no exception: they mail out a weekly e-mail with a synopsis of each crime. This e-mail usually runs about a week behind the actual events and comes out in text and PDF formats. From what I can see, police departments have varying levels of sophistication when it comes to publishing their crime blotter, but a common theme is that none of them seem to publish it in a form that a program can easily consume.
When I moved to Hyattsville in 2003, I was a little concerned about the crime in PG County. I started poking around the electronic services available for the city and soon found out how to get the crime blotter e-mailed to me. Having seen the Chicago Police Department's Google mashup, I figured it was time for Hyattsville to get the same thing, so I built one. It works great (notwithstanding persistent TIGER geocoding problems), and judging by my logs I serve quite a few people who are visualizing their local crime situation. But, I'm sad to report, it's currently broken and I'm behind on at least three or four crime reports.
I'm screen scraping. Screen scraping is the lowest of the low, the option of last resort when you are trying to acquire data from some source. It is essentially the practice of writing a program that pretends to be human and "reads" the screen, just as a human would while sitting in front of a browser window. It's rife with problems and prone to failure. It's brittle. Somewhere in the chain between Hyattsville PD and my e-mail inbox, someone adjusted some white space or something else about the format of the reports earlier this month, and that broke the whole process.
Why do I screen scrape? It's a matter of structure. The crime reports have structure and humans are very good at parsing arbitrary structures. But ill-defined structures are very difficult to properly capture in a program. There are tricks to defend against failures due to subtle changes, but the real solution is to define an interface between the two endpoints.
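To make the "tricks" concrete, here is a minimal Python sketch of one such defense: normalize white space before matching, and collect lines that fail to parse instead of crashing. This is not the actual Hyattsville scraper. The line layout (date, offense, separator, address), the `ENTRY` pattern, and the function names are all assumptions made up for illustration, since the real blotter format isn't shown here.

```python
import re
from typing import Optional

# Hypothetical entry layout: "<date> <offense> - <address>".
# The real blotter format is not shown in this post; this is an assumption.
ENTRY = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2}/\d{2,4})\s+"  # e.g. 03/14/2007
    r"(?P<offense>.+?)\s*[-:]\s*"             # offense, then '-' or ':'
    r"(?P<address>.+)$"                       # rest of the line
)


def parse_entry(line: str) -> Optional[dict]:
    """Parse one blotter line, tolerating changes in spacing."""
    # Collapse tabs and runs of spaces so a white space tweak upstream
    # does not change what the pattern sees.
    normalized = re.sub(r"\s+", " ", line.strip())
    match = ENTRY.match(normalized)
    return match.groupdict() if match else None


def parse_report(text: str):
    """Return (entries, rejects) so unparsed lines are visible, not fatal."""
    entries, rejects = [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        parsed = parse_entry(line)
        if parsed is None:
            rejects.append(line)
        else:
            entries.append(parsed)
    return entries, rejects
```

Keeping the rejects around at least turns a silent format change into something I can notice and fix, but it is still guessing at a structure nobody promised to keep stable.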
Someone has done this. CrimeReports.com currently provides an interface for law enforcement to publish their crime reports, and they'll essentially handle the rest. It's an attractive idea for small-time departments that don't have the budget to run a web site but have citizens who find these services essential. The catch with CrimeReports is that they charge the police a fee for this service. They want $50-$200 per month from each individual department.
But this is old school. The Web calls for something more social, more distributed. What we need is to define a standard interface for the crime report and make it easy for police departments to publish a well-defined document with the crime information. It's silly for them to pay a web site to publish this information -- the information is what's valuable! Without crime reports there is no CrimeReports.com. Once reports are published in the same format, probably on some sort of RSS-style feed, the inventive programmers of the Web will take care of the rest, all at no recurring cost to the local police.
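As a rough idea of what such a feed might look like, here is a small Python sketch that turns structured crime reports into an RSS 2.0 document using only the standard library. No such standard exists yet, so the `CrimeReport` fields, the function names, and the sample values in the usage note are hypothetical placeholders, not a proposed schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime
from email.utils import format_datetime
from typing import Iterable


@dataclass
class CrimeReport:
    # Hypothetical fields; a real schema would need agreement among departments.
    case_id: str
    offense: str
    address: str
    occurred: datetime


def to_rss_item(report: CrimeReport) -> ET.Element:
    """Build one RSS <item> from a report."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = report.offense
    ET.SubElement(item, "description").text = f"{report.offense} at {report.address}"
    # RSS 2.0 dates use the RFC 822 style that format_datetime produces.
    ET.SubElement(item, "pubDate").text = format_datetime(report.occurred)
    guid = ET.SubElement(item, "guid", isPermaLink="false")
    guid.text = report.case_id
    return item


def build_feed(title: str, link: str, reports: Iterable[CrimeReport]) -> str:
    """Serialize reports as an RSS 2.0 feed string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Weekly crime blotter"
    for report in reports:
        channel.append(to_rss_item(report))
    return ET.tostring(rss, encoding="unicode")
```

Calling `build_feed("Example PD blotter", "https://example.org/blotter", reports)` with a list of `CrimeReport` objects returns the feed as a string. A real format would probably also want coordinates and an offense code, but even this much would let a mapping site consume the blotter without scraping.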
3 comments:
Would you be willing to share your data? Colin - SpotCrime.com
For the right cause, sure. I don't know that I have time to make it useful generally (even though I'm essentially arguing for some kind of generally useful crime blotter schema) but I can extract it from my postgres database.
Our project is free to the public, and delivers crime alerts by email. Additionally, we don't charge the police department for the service. Eventually, we intend to support the project through advertising on the emails.
If you are willing to share the data, we'll map it.