It started innocently one spring day, when I noted a mapping data discrepancy. The display showed a bus stop with a shelter that no longer exists. Mapping errors occur in data collection and maintenance on a regular basis, and not everyone notices or feels obliged to right the scales of justice. Or tweak the maps, anyway.
Being between major cases, I thought it would be a trivial matter to supply a tip to the global authorities, not even expecting a reward. My flag was ignored so I took the plunge and began the casework. Little did I know how many bus stops and routes I would encounter trying to right one wrong.
The field location records showed 2 stops, one in each direction, except some duplication or redundancies had crept into the matrix. Ah, I thought, removing duplicate values merely required a unique foreign key, no big deal. Looking closer, the node were not exact copies; one was a “stop” and the other a “platform.” Prior art, or working at cross purposes, I didn’t know. Digging deeper, I found inconsistent record-keeping. For a logical detective, the differences can be clues to the culprit, or at least to repair the road.
Multiple nodes for one location.
For this tale, I am glossing over routes that represent roadways between stops, focusing on the stops or nodes. In theory, all travel stops/nodes have a route coming or going in at least one direction. Two nodes should be there if the routes go in both directions, one on each side. In the case I examined, one of 2 nodes was removed, and the other moved locations. How did I find the duplicate or missing keys? Short answer: export to a spreadsheet, database or other tool.
The transit management entity kindly provides a geographic information system interface through the big mapping service ESRI. After investigating the portal contents, I found the “export” or “ex-filtration” option, picking key fields like name, stop number, routes served, and, well, location. The last element turned out to be painful, using a reference system in meters instead of the vulgar latitude and longitude many use. With data available for my own analysis, I turned on the SQL tap and examined the data more closely.
Generally, I would employ the “having count(*) >=2” clause to determine uniqueness, or lack thereof. The results showed me many duplicates, but also data gaps where a node had no ID. Then there were the typos. And the misplaced points. And the phantom records.
Finally, after recognizing the historic problem scope, I dug back to the official portals, finding a non-GIS portal that listed routes between nodes. I recognized that there were discrepancies in the official data streams, not just in the OpenStreetMap unofficial repository. Now instead of 2 data stores being out of synch, I found 3 that differed among themselves.
To complete the initial case review, I needed to locate a few points in the official records to document the state (no pun intended) as something to ponder for reaction.
The topology of transit routes, being bound by physics, generally has vehicles moving on streets in one direction, sometimes easily shown as “east-bound” or “west-bound”. Finding a westbound stop on an eastbound route is likely an error. One set of nodes I found was near a parking facility, where the route might make a loop rather than continuing in a line. At either route end, there would be a type of loop-back; within the route loops might be into residential or office complexes for passenger convenience. Those tight loops would be documented so the passengers can find valid stops.
In my local data store, I tried to add a uniqueness constraint to the column containing stop serialization. If there were redundancies, this statement should fail. It passed, meaning these data are consistent.
ALTER TABLE "bus-stops-20250611"
ADD CONSTRAINT stop_uniq UNIQUE (field_4)
;
Looking into the routes that cross a node, one error in the source was missing intra-field separators. Mostly commas, but also semi-colons, leading to this quality assurance review query ("find any routes over 3 characters not containing separators"). Not too many, but some errors in this source.
select
distinct field_2
from
"bus-stops-20250611"
where
length (field_2) > 3
and
field_2 not like '%,%'
and
field_2 not like '%;%'
order by
field_2
;
"On top" of OSM data are query tools, and I used Turbo Overpass [ ] to look at crowd-sourced transit data. Unlike the "official" download above, there were redundant records looking at the highway/bus stop column.
node[highway=bus_stop]["ref"]({{bbox}});
for(t["ref"]) {
if(count(nodes) > 1) {
out tags center;
}
}
Very fine image return showing 221 key collisions:
Duplicate node query results
Source of the above logic: https://community.openstreetmap.org/t/overpass-queries-finding-duplicate-values/121436/6
As my first encounter with the grammar of Overpass etc., I liken the structure to Perl and awk, as there are regular expression concepts, and not so much like SQL.
db=> \d "bus-stops-20250611"
Table "public.bus-stops-20250611"
Column | Type | Collation | Nullable | Default
---------+-------------------+-----------+----------+--------------------------------------------------
id | integer | | not null | nextval('"bus-stops-20250611_id_seq"'::regclass)
field_1 | character varying | | |
field_2 | character varying | | |
field_3 | character varying | | |
field_4 | character varying | | |
field_5 | character varying | | |
field_6 | character varying | | |
field_7 | character varying | | |
field_8 | character varying | | |
Indexes:
"bus-stops-20250611_pkey" PRIMARY KEY, btree (id)
"stop_uniq" UNIQUE CONSTRAINT, btree (field_4)
Eastbound or westbound stops would usually be on opposite sides of the road. Unless there is a service loop, like the Park + Ride in these images:
Example stops westbound
Example stops eastbound
While the different directions mainly show eastbound (as Eb) or westbound (as Wb), the service loop stop in both directions says westbound.
"EASTERN BLVD & ESSEX PARK & RIDE wb", "OR, 40, 59, 62, 160", "Yes", 7585, "", -8514327.581, 4765283.076800004,
On the ground, a different perspective. Stop 7597 advertises the 40 stops there.
"EASTERN BLVD & VIRGINIA AVE eb", "OR, 59, 62, 160", "No", 7597, "", -8514337.1545, 4765238.0484
Transit stop sign 7597
Case in progress, now waiting for contacts to collaborate on resolving the case. Until and unless the data structures more closely align with expected content, the visualizations may be skewed like this:
https://overpass-api.de/api/sketch-line?network=BaltimoreLink&ref=OR&operator=
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
User | Count |
---|---|
10 | |
8 | |
7 | |
6 | |
6 | |
4 | |
4 | |
4 | |
4 | |
4 |