Thunder Bay Transit publishes GTFS schedule and realtime feeds — the same standard behind Google Maps and NextLift. Those apps show where buses are now and discard the data. We store every position, delay, and cancellation as a raw event.
All metrics are measured at timepoint stops (timepoint=1 in the GTFS stop_times.txt file), the 3–5 schedule checkpoints per direction where the timetable is enforced. This matches how agencies compute performance internally.
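As a rough illustration of the timepoint filter, here is a minimal Python sketch that keeps only the rows of a GTFS stop_times.txt file flagged as timepoints. The sample rows and the function name timepoint_rows are made up for the example. It assumes only an explicit value of 1 counts as a timepoint. The GTFS specification also treats an empty value as exact, so a real pipeline may want to handle that case differently.

```python
import csv
import io

# Hypothetical sample of stop_times.txt; real feeds carry more columns and rows.
SAMPLE = """trip_id,arrival_time,departure_time,stop_id,stop_sequence,timepoint
T1,08:00:00,08:00:00,1001,1,1
T1,08:04:00,08:04:00,1002,2,0
T1,08:09:00,08:09:00,1003,3,1
"""

def timepoint_rows(stop_times_text):
    """Return the stop_times rows whose timepoint column is exactly "1"."""
    reader = csv.DictReader(io.StringIO(stop_times_text))
    # Assumption: rows with an empty or missing timepoint value are skipped.
    return [row for row in reader if (row.get("timepoint") or "").strip() == "1"]

if __name__ == "__main__":
    for row in timepoint_rows(SAMPLE):
        print(row["trip_id"], row["stop_id"], row["arrival_time"])
```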
“Without an explicit SLO, users often develop their own beliefs about desired performance, which may be unrelated to the beliefs held by the people designing and operating the service.”
Google defined the GTFS transit standard and is known for running complex systems with legendary reliability. Its SRE Handbook has strongly influenced how I think about building and operating software systems.
The handbook calls its metrics service level indicators (SLIs) — the same idea as a KPI, but focused on what the user experiences rather than what the operator reports. An indicator becomes a service level objective (SLO) when stakeholders commit to a target: not just “we track on-time performance” but “we agree 75% is the floor.”
Baseball stats can’t tell you who wins tonight. Transit metrics are the same — they can’t say whether every rider got where they needed to be. What they can do is show whether the system is trending in the right direction over time.
A trip is "on time" if it arrives no more than 1 minute early and no more than 5 minutes late relative to the schedule. This is the standard window used by most North American agencies.
Early departures are penalized because if a bus leaves a stop before the scheduled time, you miss it — that's worse than a bus running late.
Typical range for a mid-size Canadian city: 65–85%. Above 90% is world-class. Below 60% indicates a systemic problem.
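The on-time window described above can be sketched in a few lines of Python. This is an illustration, not the site's actual code: the names classify and on_time_percent are hypothetical, delays are assumed to be in seconds (negative means early), and the window boundaries are assumed to be inclusive.

```python
EARLY_LIMIT_S = -60   # up to 1 minute early still counts as on time
LATE_LIMIT_S = 300    # up to 5 minutes late still counts as on time

def classify(delay_s):
    """Label one timepoint observation as 'early', 'on_time', or 'late'."""
    if delay_s < EARLY_LIMIT_S:
        return "early"
    if delay_s > LATE_LIMIT_S:
        return "late"
    return "on_time"

def on_time_percent(delays_s):
    """Share of observations inside the on-time window, as a percentage."""
    if not delays_s:
        return 0.0
    on_time = sum(1 for d in delays_s if classify(d) == "on_time")
    return 100.0 * on_time / len(delays_s)

if __name__ == "__main__":
    # Two of these four observations fall inside the window.
    print(on_time_percent([0, -120, 400, 60]))  # 50.0
```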
This metric measures how evenly spaced buses are along a route. If buses arrived exactly on schedule, they'd be evenly spaced and this number would be zero. In practice, buses bunch up or leave big gaps — the higher the number, the more unpredictable your wait becomes.
Think of it like darts. The bullseye is the scheduled headway. Each bus arrival is a throw. A low coefficient of variation (Cv) means the darts cluster around the bullseye: buses arrive close to when they should. A high Cv means the darts are scattered across the board: arrivals are all over the place.
Below 0.3 is good — riders perceive the service as regular. A Cv near zero isn’t the goal either — that would be like standing too close to the dartboard, where hitting the bullseye says nothing about your aim. Some variance is inevitable and healthy. Above 0.5, gaps feel random and riders lose trust in the system.
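For readers who want to see the arithmetic, here is a minimal sketch of a headway coefficient of variation in Python. It assumes Cv is the population standard deviation of observed headways divided by their mean. The exact definition used on this site may differ, for example by measuring deviations against the scheduled headway. The function name headway_cv is hypothetical.

```python
from statistics import fmean, pstdev

def headway_cv(headways_min):
    """Coefficient of variation of headways: standard deviation / mean."""
    mean = fmean(headways_min)
    if mean <= 0:
        raise ValueError("mean headway must be positive")
    # Assumption: population standard deviation, not the sample version.
    return pstdev(headways_min) / mean

if __name__ == "__main__":
    print(headway_cv([15, 15, 15, 15]))  # perfectly even spacing gives 0.0
    print(headway_cv([10, 20, 10, 20]))  # alternating short and long gaps
```

Under these assumptions the second example comes out at about 0.33, just above the 0.3 threshold mentioned above.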
Excess Wait Time (EWT) is arguably the most important operational metric. Transit managers publish a schedule, and that schedule is a promise. EWT measures how many extra minutes you actually wait beyond that promise because buses aren’t arriving at regular intervals. It’s the gap between what was committed to and what was delivered, which is why transit agencies worldwide treat it as a north-star metric (TransitCenter).
If a route runs every 15 minutes, you'd expect to wait 7.5 minutes on average. But if two buses arrive together and then nothing comes for 30 minutes, the average headway is still 15 minutes — yet the experience is far worse. EWT captures exactly this gap.
Why does EWT matter more than average delay? Because it counts people, not just buses. A long gap doesn't just mean one late bus — it means every person who showed up during that gap is standing at the stop, waiting. The longer the gap, the more people accumulate, and the longer each of them waits. EWT weights gaps by the number of riders they actually affect, making it a social metric: it measures total human time wasted, not just vehicle timing.
The two timelines below have the same average headway (15 min), but the rider experience is very different:
The calculation squares each gap, sums those squares, and divides by twice the total time. This gives the wait a random rider actually experiences. Subtract the scheduled wait (half the planned headway) and the remainder is the excess. Bigger gaps get disproportionate weight because more riders accumulate during them — double the gap, quadruple the rider-minutes lost.
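The calculation described above can be written out as a short Python sketch. The function names average_wait and excess_wait are hypothetical, headways are assumed to be in minutes, and riders are assumed to arrive at the stop at a uniform random rate.

```python
def average_wait(headways_min):
    """Wait a randomly arriving rider experiences: sum(h^2) / (2 * sum(h))."""
    total = sum(headways_min)
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return sum(h * h for h in headways_min) / (2 * total)

def excess_wait(headways_min, scheduled_headway_min):
    """Actual average wait minus the scheduled wait (half the planned headway)."""
    return average_wait(headways_min) - scheduled_headway_min / 2

if __name__ == "__main__":
    even = [15, 15, 15, 15]     # buses exactly 15 minutes apart
    bunched = [0, 30, 0, 30]    # pairs of buses arriving together, then a 30-minute gap
    print(excess_wait(even, 15))     # 0.0
    print(excess_wait(bunched, 15))  # 7.5
```

Both lists have the same 15-minute average headway, yet the bunched pattern doubles the average wait from 7.5 to 15 minutes. With the population variance, the actual wait also equals half the mean headway times (1 + Cv²), which is the link between EWT and Cv.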
Note: EWT and Cv measure the same thing in different units.
EWT uses minutes, the unit relevant to riders.
Cv is dimensionless.
Both measure how regularly buses arrive compared to schedule. They punish big misses much more heavily than simple averages do, which better reflects how people perceive the service.
The route finder uses RAPTOR (Round-bAsed Public Transit Optimized Router), an algorithm developed at Microsoft Research for computing optimal multi-leg transit journeys. Unlike graph-based approaches, RAPTOR works directly on the timetable, scanning routes round by round, where each round allows one more vehicle. Round 1 finds all destinations reachable by a single bus, round 2 finds journeys with one transfer, and so on.
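To make the round structure concrete, here is a heavily simplified Python sketch of the idea. It is not the route finder's implementation. The toy timetable ROUTES and the function raptor are hypothetical, times are minutes after midnight, transfers are assumed to take zero minutes, walking between stops is ignored, and every route is scanned in every round, whereas the published algorithm only scans routes serving recently improved stops.

```python
from math import inf

# Hypothetical toy timetable: an ordered stop list per route, and for each trip
# the time (minutes after midnight) at every stop on that route.
ROUTES = {
    "R1": {"stops": ["A", "B", "C"], "trips": [[480, 490, 500], [510, 520, 530]]},
    "R2": {"stops": ["C", "D"], "trips": [[505, 515], [535, 545]]},
}

def raptor(routes, source, depart, max_rounds=3):
    """Return a list where entry k maps stops to the earliest arrival using at most k vehicles."""
    best = {source: depart}      # earliest known arrival per stop across all rounds
    rounds = [dict(best)]        # rounds[0]: only the source is reached
    marked = {source}            # stops whose arrival improved in the previous round
    for _ in range(max_rounds):
        prev = rounds[-1]
        current = dict(prev)
        newly_marked = set()
        for route in routes.values():
            stops, trips = route["stops"], route["trips"]
            trip = None          # the trip currently being ridden along this route
            for i, stop in enumerate(stops):
                # While riding, record any improved arrival at this stop.
                if trip is not None and trip[i] < best.get(stop, inf):
                    current[stop] = best[stop] = trip[i]
                    newly_marked.add(stop)
                # Board (or switch to an earlier trip) at stops reached last round.
                if stop in marked:
                    catchable = [t for t in trips if t[i] >= prev[stop]]
                    if catchable:
                        earliest = min(catchable, key=lambda t: t[i])
                        if trip is None or earliest[i] < trip[i]:
                            trip = earliest
        rounds.append(current)
        marked = newly_marked
        if not marked:
            break                # nothing improved, so more vehicles cannot help
    return rounds

if __name__ == "__main__":
    result = raptor(ROUTES, "A", depart=475)
    print(result[1])  # one bus: B and C are reachable, D is not
    print(result[2])  # one transfer: D becomes reachable at 515
```

Boarding decisions use only arrival times from the previous round, which is what keeps each round limited to exactly one additional vehicle.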