NYC Mobility
The MTA origin-destination (O-D) dataset is the cleanest public reconstruction of NYC's transit circulatory system that exists, built by probabilistic inference from entry-only turnstile data. The TLC trip records are a separate beast: too big to serve live, distributed as Parquet files, and the subject of an ongoing pipeline-design effort documented in the TLC placeholder story.
Stories
The subway tide
Four million weekday riders. The MTA used to know where they boarded but not where they got off, because turnstiles only read entries. Then the agency built an algorithm to infer the exits. The result is the cleanest public view of NYC's transit circulatory system that has ever existed.
The taxi data is coming
1.5 billion rows of NYC taxi trips. The largest mobility dataset any U.S. city publishes — and the first to include the new Manhattan congestion-toll field. Why it doesn't fit our live-Socrata pattern, and what the planned pipeline looks like.
Datasets
MTA Subway Origin-Destination
The MTA's algorithmic reconstruction of where four million weekday subway riders actually go. Turnstiles capture only entries; each exit is probabilistically inferred from the rider's next entry. The cleanest public view of NYC's transit circulatory system.
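To make the "next entry" idea concrete, here is a minimal sketch of trip chaining, a simplified and deterministic stand-in for the inference described above. It is not the MTA's implementation, which is probabilistic. The function name, the sample taps, and the rule that a rider's last trip of the day returns to their first station are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def infer_od_counts(taps):
    """Count inferred origin-destination pairs from entry-only taps.

    taps: iterable of (card_id, timestamp, station) entry events.
    Assumption: a trip's destination is the station of the same card's
    next entry, and the last trip of the day returns to the first origin.
    """
    by_card = defaultdict(list)
    for card_id, timestamp, station in taps:
        by_card[card_id].append((timestamp, station))

    od = Counter()
    for events in by_card.values():
        events.sort()  # chronological order per card
        stations = [station for _, station in events]
        if len(stations) < 2:
            continue  # a single entry gives no next tap to infer from
        # Pair each entry with the following one; the last wraps to the first.
        for origin, dest in zip(stations, stations[1:] + stations[:1]):
            if origin != dest:
                od[(origin, dest)] += 1
    return od


# Made-up sample data: timestamps are minutes after midnight.
taps = [
    ("card-a", 480, "Jay St"),
    ("card-a", 1050, "Times Sq"),
    ("card-b", 540, "Jay St"),
    ("card-b", 720, "Union Sq"),
    ("card-b", 1080, "Times Sq"),
    ("card-c", 600, "Bedford Av"),
]
od = infer_od_counts(taps)
for (origin, dest), trips in sorted(od.items()):
    print(f"{origin} -> {dest}: {trips}")
```

A real pipeline would assign probabilities across candidate exit stations rather than picking a single one, and would have to handle riders who appear only once.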
NYC TLC taxi trip records
One and a half billion yellow, green, and for-hire-vehicle (FHV) trips since 2009. Stories use DuckDB aggregates computed at build time. The Playground tab runs DuckDB-Wasm in the browser: ad-hoc SQL against remote Parquet, no server required.
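As a rough illustration of ad-hoc SQL against remote Parquet, the sketch below builds an hourly pickup count query and, optionally, runs it with the DuckDB Python package. The URL is a hypothetical placeholder, and the `tpep_pickup_datetime` column assumes the yellow-taxi schema. Neither the helper names nor the query are taken from this site's pipeline.

```python
def hourly_pickups_sql(parquet_url: str) -> str:
    """Build a DuckDB query counting trips per pickup hour.

    Assumes a yellow-taxi style file with a tpep_pickup_datetime column.
    """
    safe_url = parquet_url.replace("'", "''")  # escape quotes for the SQL literal
    return (
        "SELECT date_part('hour', tpep_pickup_datetime) AS pickup_hour, "
        "count(*) AS trips "
        f"FROM read_parquet('{safe_url}') "
        "GROUP BY pickup_hour ORDER BY pickup_hour"
    )


def run_query(sql: str):
    """Execute the query; requires the third-party duckdb package and network access."""
    import duckdb  # imported lazily so the query builder works without it

    return duckdb.sql(sql).fetchall()


# Hypothetical placeholder URL, not a real TLC file location.
sql = hourly_pickups_sql("https://example.com/yellow_tripdata.parquet")
print(sql)
```

The same SQL string could be pasted into a browser-side DuckDB session. Reading over HTTPS relies on DuckDB's httpfs extension being available, which is assumed here.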