topic · 2 stories · 2 datasets

NYC Mobility

The MTA origin-destination (O-D) dataset is the cleanest public reconstruction of NYC's transit circulatory system that exists, built by probabilistic inference on entry-only turnstile data. The TLC trip records are a separate beast: too big to serve live, distributed as Parquet, and the subject of an ongoing pipeline-design effort documented in the TLC placeholder story.

Stories

scrollytelling 2026-05-10

The subway tide

Four million weekday riders. The MTA used to know where they boarded but not where they got off, because turnstiles only read entries. Then the agency built an algorithm to infer the exits. The result is the cleanest public view of NYC's transit circulatory system that has ever existed.

article 2026-05-10

The taxi data is coming

1.5 billion rows of NYC taxi trips. The largest mobility dataset any U.S. city publishes — and the first to include the new Manhattan congestion-toll field. Why it doesn't fit our live-Socrata pattern, and what the planned pipeline looks like.

Datasets

MTA · NYS Open Data · Socrata

MTA Subway Origin-Destination

The MTA's algorithmic reconstruction of where 4M daily subway riders actually go. Turnstiles only capture entries; exits are probabilistically inferred from each rider's next entry. The cleanest public view of NYC's transit circulatory system.

live · mobility · sql
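
To make "exits inferred from each rider's next entry" concrete, here is a minimal sketch of the idea. It is a deterministic simplification, not the MTA's actual method, which is probabilistic and is not reproduced here. The function name, the `(rider_id, timestamp, station)` record shape, and the sample stations are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def infer_od_flows(entries):
    """Count inferred (origin, destination) pairs from entry-only records.

    `entries` is an iterable of (rider_id, timestamp, station) tuples.
    This record shape is hypothetical, chosen only for the sketch.
    """
    taps_by_rider = defaultdict(list)
    for rider_id, timestamp, station in entries:
        taps_by_rider[rider_id].append((timestamp, station))

    flows = Counter()
    for taps in taps_by_rider.values():
        taps.sort()  # chronological order per rider
        # Assumption: a trip ends at the station where the rider next enters.
        for (_, origin), (_, next_entry) in zip(taps, taps[1:]):
            if origin != next_entry:
                flows[(origin, next_entry)] += 1
        # The final entry of each rider has no later entry, so its
        # destination stays unknown in this simplified version.
    return flows


sample = [
    ("a", 800, "Jay St"), ("a", 1730, "Times Sq"),
    ("b", 830, "Jay St"), ("b", 1800, "Times Sq"),
    ("c", 900, "Astoria"),  # single entry: nothing can be inferred
]
flows = infer_od_flows(sample)
```

A probabilistic version would spread each trip across several candidate exit stations with weights rather than committing to a single one, which is closer to what the dataset description implies.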

NYC TLC · Parquet · DuckDB WASM · 1,500,000,000 rows

NYC TLC taxi trip records

One-and-a-half billion yellow / green / FHV trips since 2009. Stories use build-time DuckDB aggregates. The Playground tab runs DuckDB WASM in the browser — ad-hoc SQL against remote Parquet, no server required.

live · mobility · parquet · large · duckdb
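
The "build-time aggregates" pattern can be sketched as follows: reduce the raw trips to a small summary once, at build time, and ship only that summary to the page. To keep the example runnable without extra dependencies, Python's standard-library `sqlite3` stands in for DuckDB here; it is not what the site uses. The `trips` table, its two columns, and the function name are hypothetical, and the query would need adapting to the real TLC Parquet schema.

```python
import json
import sqlite3


def build_monthly_aggregate(rows):
    """Collapse (pickup_date, fare_amount) rows into per-month summaries.

    sqlite3 is a stand-in for DuckDB; the table and column names are
    assumptions made for this sketch.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE trips (pickup_date TEXT, fare_amount REAL)")
    con.executemany("INSERT INTO trips VALUES (?, ?)", rows)
    cursor = con.execute(
        """
        SELECT substr(pickup_date, 1, 7) AS month,
               COUNT(*)                  AS trips,
               ROUND(AVG(fare_amount), 2) AS avg_fare
        FROM trips
        GROUP BY month
        ORDER BY month
        """
    )
    result = [
        {"month": month, "trips": trips, "avg_fare": avg_fare}
        for month, trips, avg_fare in cursor
    ]
    con.close()
    return result


sample_rows = [
    ("2024-01-03", 12.0),
    ("2024-01-15", 18.0),
    ("2024-02-01", 30.0),
]
monthly = build_monthly_aggregate(sample_rows)
payload = json.dumps(monthly)  # small JSON a story page could load
```

The design point is that the page never touches the full dataset: a story reads a summary of a few kilobytes, while ad-hoc exploration is left to the Playground tab.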