This blog post is the third and final part of my entry for the 2014 Data Geek Challenge.
Step 1: SAP Lumira Extension: Google Maps
Step 2: Lumira Dataset: Bus Tracker from the Chicago Transit Authority
As described in my previous blog post about the Bus Tracker dataset (step 2), the City of Chicago in general and the Chicago Transit Authority in particular have made major efforts to leverage IT and deliver a better experience for their customers. In addition, this data is available to developers like us. Let's analyze bus delays in the city of Chicago (the final infographic is available as an attachment).
More than 99% of buses are on time!
I collected the position and status (delayed / on time) of buses in Chicago over the month of October 2014 (see step 2). My data collection system wasn't very reliable, so the data is pretty spotty. To sum it up, out of 4,261 checks, only 36 buses were delayed. Believe it or not, this represents only about 0.84% of the checks, which means that more than 99% of the buses were on time!
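The arithmetic can be checked in a few lines of Python. This is only a sketch that uses the two counts quoted above; the variable names are mine, and nothing here comes from Lumira itself.

```python
# Counts quoted in the post: 4,261 status checks, 36 of them flagged as delayed.
total_checks = 4261
delayed_checks = 36

# Share of checks that found a delayed bus, and its complement.
delayed_share = delayed_checks / total_checks
on_time_share = 1 - delayed_share

print(f"Delayed: {delayed_share:.2%}")
print(f"On time: {on_time_share:.2%}")
```

Keeping the share as a fraction and formatting it only when printing avoids mixing up 0.84 (a percentage) with 0.0084 (a ratio).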
Which routes? What day?
Drilling down into the available information, we need to figure out whether these delays were isolated exceptions or whether any outliers could be found. If you select all the delayed vehicles, group them by route and visualize them in a bar chart, you'll notice that there are no real outliers. An outlier could have helped identify a troublesome intersection or district, but that is not the case here.
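Outside Lumira, the same "filter, group by route, chart" step could be sketched as follows. The sample rows and route numbers are made up for illustration and are not taken from my dataset; only the idea of counting delayed vehicles per route comes from the analysis above.

```python
from collections import Counter

# Hypothetical sample rows (route, status); NOT the real Bus Tracker data.
checks = [
    ("22", "delayed"),
    ("22", "on time"),
    ("66", "delayed"),
    ("9", "delayed"),
    ("9", "on time"),
    ("151", "on time"),
]

# Keep only the delayed vehicles, then count them per route.
delays_per_route = Counter(
    route for route, status in checks if status == "delayed"
)

# Text-mode bar chart, routes with the most delays first.
for route, count in delays_per_route.most_common():
    print(f"Route {route:>4}: {'#' * count} ({count})")
```

A route whose bar is much longer than the others would be the kind of outlier worth investigating.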
However, if you repeat the same analysis by day, the result clearly identifies Sunday as a sore spot. Looking a little closer, we can see that only 3 out of 7 days show up at all. In this particular instance, I believe the reason is that the source data is inconsistent. Since I mostly worked on the data source on Sundays, most of the data was collected on that day, so a high number of Sunday delays does not mean that most delays happen on weekends.
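One way to guard against this kind of sampling bias is to compare delay rates rather than raw delay counts. The sketch below uses invented sample rows (not my real data) in which Sunday has the most delays simply because it has the most checks.

```python
from collections import defaultdict

# Hypothetical sample rows (weekday, was_delayed); NOT the real dataset.
checks = [
    ("Sunday", True), ("Sunday", False), ("Sunday", False), ("Sunday", False),
    ("Sunday", True), ("Sunday", False), ("Sunday", False), ("Sunday", False),
    ("Tuesday", True), ("Tuesday", False),
    ("Friday", False), ("Friday", False),
]

totals = defaultdict(int)  # number of checks per day
delays = defaultdict(int)  # number of delayed checks per day
for day, was_delayed in checks:
    totals[day] += 1
    if was_delayed:
        delays[day] += 1

# Normalize by the number of checks so heavily sampled days are not penalized.
delay_rate = {day: delays[day] / totals[day] for day in totals}

for day in totals:
    print(f"{day}: {delays[day]} delays / {totals[day]} checks "
          f"= {delay_rate[day]:.0%}")
```

In this made-up sample Sunday has the highest count but not the highest rate, which is exactly the distortion uneven collection can introduce.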
Where?
I used the Google Maps custom extension detailed in step 1 to identify the location of incidents. There's an old saying in the Windy City: "There are only 2 seasons: snow and construction". Therefore, I searched for pockets of delays that could be explained by local events. Again, as you can see here, there are no clear clusters of delays in the data I collected.
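I looked for clusters visually on the map, but a rough numeric check is also possible. The sketch below snaps each delayed position to a grid cell and counts positions per cell. The coordinates, the 0.01 degree cell size and the threshold of 2 are all assumptions chosen for illustration, not values from my dataset.

```python
import math
from collections import Counter

CELL_DEGREES = 0.01  # assumed cell size, roughly 1 km north to south
MIN_CLUSTER_SIZE = 2  # assumed threshold for calling a cell a "pocket"

def grid_cell(lat, lon, size=CELL_DEGREES):
    """Map a coordinate to the integer indices of its grid cell."""
    return (math.floor(lat / size), math.floor(lon / size))

# Hypothetical positions of delayed buses (latitude, longitude).
delayed_positions = [
    (41.8781, -87.6298),
    (41.8789, -87.6291),
    (41.9484, -87.6553),
    (41.7943, -87.5907),
]

cell_counts = Counter(grid_cell(lat, lon) for lat, lon in delayed_positions)

# Cells that hold several delays are candidate pockets worth a closer look.
clusters = {
    cell: n for cell, n in cell_counts.items() if n >= MIN_CLUSTER_SIZE
}
print(clusters)
```

Fixed grid cells are crude (two nearby points can fall on either side of a cell border), so this is a first-pass filter rather than a real clustering method.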
Why?
Since I couldn't find a clear pattern in the routes, the days or the locations of the delays, I tried a text analysis of the service bulletins. There is no measurable correlation between bus delays and the events described in the bulletins, but we can easily see here that "Reroute" and "Bus Stop Relocation" are the events with the most impact.
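A very simple form of this text analysis is counting how many bulletins mention each event type. The sketch below is not the text analysis I ran in Lumira. The bulletin texts are invented, and only the two event names come from the analysis above.

```python
from collections import Counter

# The two event types named in the post.
EVENT_TYPES = ["Reroute", "Bus Stop Relocation"]

# Hypothetical bulletin texts; NOT real CTA service bulletins.
bulletins = [
    "Temporary Reroute due to street closure",
    "Bus Stop Relocation at a construction site",
    "Reroute for a weekend street festival",
    "Elevator out of service",
]

event_counts = Counter()
for text in bulletins:
    lowered = text.lower()
    for event in EVENT_TYPES:
        # Case-insensitive substring match on the event name.
        if event.lower() in lowered:
            event_counts[event] += 1

for event, count in event_counts.most_common():
    print(f"{event}: {count}")
```

Substring matching only shows which event types are frequent. Relating them to delays would also require joining bulletins to the delayed checks by route and date.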
Conclusion and Learning Points
Overall, this data visualization exercise didn't reveal any unexpected insights, except for the fact that buses in Chicago are almost always on time. That's pretty good news.
On the other hand, I believe the idea of the Data Geek Challenge is less in the insights than in the learning process, which was my biggest achievement this year. Here are some of my learning points:
I would encourage everybody to give the next Data Geek Challenge a try. It's a great way to have fun while learning.