dataset · NYC TLC · Parquet

NYC TLC taxi trip records

1.5 billion yellow, green, and FHV (for-hire vehicle) trip records since 2009, distributed as monthly Parquet files. Stories use aggregates computed with DuckDB at build time. The Playground tab runs DuckDB WASM in your browser: queries execute locally against remote Parquet files, with no query server involved.


How this works: DuckDB WASM is a full OLAP database engine compiled to WebAssembly, so it runs entirely in your browser. The ~2 MB bundle is lazy-loaded when you open this tab.
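As a rough illustration of the lazy-loading idea, the sketch below wraps a loader so the expensive work starts only on first use and is shared by later callers. The names `lazyOnce`, `Engine`, and `loadEngine` are hypothetical, and the stub loader stands in for the real bundle; this is not necessarily how this page does it.

```typescript
// Wrap a loader so it runs at most once; later calls reuse the same promise.
// Note: a rejected promise is cached too, so a failed load is not retried here.
function lazyOnce<T>(loader: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | null = null;
  return () => {
    if (pending === null) pending = loader();
    return pending;
  };
}

// Hypothetical stand-in for the engine bundle. In a browser app the loader
// would typically be a dynamic import, e.g. () => import("@duckdb/duckdb-wasm").
interface Engine {
  name: string;
}

let loadCount = 0;
const loadEngine = lazyOnce<Engine>(async () => {
  loadCount += 1; // counts how many times the bundle is actually fetched
  return { name: "stub-engine" };
});
```

Calling `loadEngine()` from the tab's open handler would trigger the download once; any later call gets the same promise back.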

Parquet queries: The read_parquet('url') function reads remote Parquet files with HTTP range requests, the same approach DuckDB's httpfs extension takes outside the browser, so it only downloads the columns and row groups a query needs. Because the requests come from your browser, the server hosting the Parquet files must send CORS headers. The TLC CloudFront CDN does not, so production use requires mirroring the files to an R2 bucket with CORS enabled.
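A minimal sketch of what such a query could look like, written as a TypeScript helper that builds the SQL text. The mirror URL is a placeholder, and the file-name pattern and the `tpep_pickup_datetime` column are assumptions about how a mirror of the yellow-taxi files would be laid out. The query touches a single column, which is the case where column pruning saves the most bandwidth.

```typescript
// Placeholder mirror location; the real bucket is not named in the text.
const MIRROR_BASE = "https://example-mirror.invalid/tlc";

// Build the URL for one month of yellow-taxi data, assuming the mirror keeps
// file names of the form yellow_tripdata_YYYY-MM.parquet.
function monthlyFileUrl(year: number, month: number, base: string = MIRROR_BASE): string {
  const mm = month < 10 ? "0" + month : String(month);
  return base + "/yellow_tripdata_" + year + "-" + mm + ".parquet";
}

// Build a trips-per-day query. Only the pickup timestamp column is referenced,
// so the engine has no reason to fetch any other column from the file.
function tripsPerDaySql(url: string): string {
  const quoted = url.replace(/'/g, "''"); // escape single quotes for the SQL literal
  return [
    "SELECT CAST(tpep_pickup_datetime AS DATE) AS day, count(*) AS trips",
    "FROM read_parquet('" + quoted + "')",
    "GROUP BY day",
    "ORDER BY day",
  ].join("\n");
}
```

For example, `tripsPerDaySql(monthlyFileUrl(2023, 1))` yields SQL that could be pasted into the Playground, provided the mirror exists and sends CORS headers.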

Sharing: The Run button encodes your SQL as base64 in the URL's ?q= parameter, so shareable links work without a backend.
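One way the share link could be built and read back is sketched below. The function names are hypothetical, and encoding the SQL as UTF-8 before base64 is an assumption made so that non-ASCII characters survive the round trip; the page's actual encoding details are not stated.

```typescript
// Encode SQL as base64 of its UTF-8 bytes (btoa alone only handles Latin-1).
function encodeSql(sql: string): string {
  const bytes = new TextEncoder().encode(sql);
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// Reverse of encodeSql: base64 -> bytes -> UTF-8 text.
function decodeSql(b64: string): string {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return new TextDecoder().decode(bytes);
}

// Put the encoded SQL in the ?q= parameter. URLSearchParams percent-encodes
// the '+', '/', and '=' characters that standard base64 can contain.
function buildShareUrl(pageUrl: string, sql: string): string {
  const url = new URL(pageUrl);
  url.searchParams.set("q", encodeSql(sql));
  return url.toString();
}

// Read the SQL back from a shared link, or null if there is no ?q= parameter.
function readSharedSql(href: string): string | null {
  const q = new URL(href).searchParams.get("q");
  return q === null ? null : decodeSql(q);
}
```

On page load, `readSharedSql(window.location.href)` would recover the query to prefill the editor, with no server round trip needed.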