I'm running a petition on causes.com, and I wanted to do some geographic analysis of the results. Causes.com gives me information about the signers with regard to their name, email address, country and zip code. The country information was useful for doing the first phase of the analysis, which I wrote about in the Lumira space. However, I wanted to do a bit more fine grained analysis, and that means I needed to convert that zip code information into something more directly useful (e.g., city, county and state for the United States). There's 1500 data points though, so I need an automated method. Google Geogoding API to the rescue.
The Google Geocoding API is a REST services, so I created a REST client in PowerBuilder.Net using the following URL:
http://maps.google.com/maps/api/geocode/xml?components=country:USA|postal_code:{zip}&sensor=false
Where {zip} represents the zip code that I'll be passing in as an argument. Most REST services don't have a WSDL you can use to create client data types from. For the PowerBuilder.Net REST client perhaps the simplest way to get it to be able to determine what data types to create is to provide it with a sample response from the service. What I provided was the following:
<?xml version="1.0" encoding="UTF-8"?>
<GeocodeResponse>
<status>OK</status>
<result>
<type>postal_code</type>
<formatted_address>Azusa, CA 91702, USA</formatted_address>
<address_component>
<long_name>91702</long_name>
<short_name>91702</short_name>
<type>postal_code</type>
</address_component>
<address_component>
<long_name>Azusa</long_name>
<short_name>Azusa</short_name>
<type>locality</type>
<type>political</type>
</address_component>
<address_component>
<long_name>Los Angeles</long_name>
<short_name>Los Angeles</short_name>
<type>administrative_area_level_2</type>
<type>political</type>
</address_component>
<address_component>
<long_name>California</long_name>
<short_name>CA</short_name>
<type>administrative_area_level_1</type>
<type>political</type>
</address_component>
<address_component>
<long_name>United States</long_name>
<short_name>US</short_name>
<type>country</type>
<type>political</type>
</address_component>
<geometry>
<location>
<lat>34.2667473</lat>
<lng>-117.8545867</lng>
</location>
<location_type>APPROXIMATE</location_type>
<viewport>
<southwest>
<lat>34.1031590</lat>
<lng>-118.0513950</lng>
</southwest>
<northeast>
<lat>34.3791980</lat>
<lng>-117.6556881</lng>
</northeast>
</viewport>
<bounds>
<southwest>
<lat>34.1031590</lat>
<lng>-118.0513950</lng>
</southwest>
<northeast>
<lat>34.3791980</lat>
<lng>-117.6556881</lng>
</northeast>
</bounds>
</geometry>
</result>
</GeocodeResponse>
Note that the information I want comes in the "administrative_area_level_1" (State), "administrative_area_level_2" (County) and "locality" (City) address component types. I'm also going to grab the approximate lattitude and longitude coordinates from the geometry portion of the response.
With the REST proxy created, I create a WPF application that has a datawindow that reads the CSV data that causes.com gives me and then loops through it adding the information from the Google Geocoding API. The main code of interest is as follows.
long ll_index, ll_count
string ls_zip, ls_lat, ls_long, ls_country, ls_county, ls_city, ls_state
googlegeo_proxy proxy
GeocodeResponse response
ll_count = dw_1.RowCount()
proxy = create googlegeo_proxy
FOR ll_index = 1 to ll_count
ls_country = dw_1.Object.country[ll_index]
if ls_country <> "United States" THEN CONTINUE
ls_zip = dw_1.Object.zip[ll_index]
if Len ( ls_zip ) = 4 then
ls_zip = '0' + ls_zip
end if
response = proxy.GetMessage ( ls_zip )
if response.status = "OK" then
ls_lat = String ( response.result.geometry.location.lat )
ls_long = string ( response.result.geometry.location.lng )
System.Collections.IEnumerator enum
enum = response.result.address_component.GetEnumerator()
GeocodeResponseResultAddress_component address
do while enum.MoveNext()
address = enum.Current
Choose CASE address.@type[1]
CASE "administrative_area_level_1"
ls_state = address.long_name
CASE "administrative_area_level_2"
ls_county = address.long_name
CASE "administrative_area_level_3"
ls_city = address.long_name
CASE "locality"
ls_city = address.long_name
END choose
loop
dw_1.Object.lattitude[ll_index] = ls_lat
dw_1.Object.longitude[ll_index] = ls_long
dw_1.Object.state[ll_index] = ls_state
dw_1.Object.county[ll_index] = ls_county
dw_1.Object.city[ll_index] = ls_city
else
dw_1.Object.lattitude[ll_index] = response.status
end if
// Wait 2 seconds or Google will put us into OVER_QUERY_LIMIT condition
Sleep ( 2 )
NEXT
The zip codes were imported as numbers, so I'm prefixing them with 0 of they are only four digits long. I'm using an enum to loop through the address_components because PowerBuilder.Net doesn't offer a particularly easy way of requesting a specific element in the collection. And referencing address.@type[1] is a bit of a hack, as there are a number of type attributes returned and I'm assuming the one I want is always the first one. That code will break if they don't always come back in the order I expect.
The last thing you might note is that I do a Sleep(2) between calls to the API. Google places some restrictions on calls to their APIs to prevent abuse, often in terms of total number of calls or calls per day. In the case of the Geocoding API, the restriction is that you can't call it more often than once every 2 seconds. The Sleep makes sure that I don't exceed that limit and start getting overy query limit error response.
And here's the results on a state by state basis for the United States.
Unfortuantely, Lumira was unable to recognize enough of the city and county names that Google provided that it make trying to do the analysis on any finer detail rather difficult.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
User | Count |
---|---|
12 | |
7 | |
4 | |
3 | |
3 | |
3 | |
3 | |
3 | |
3 | |
2 |