Handling Large CSV Files in the Browser
Techniques for working with datasets of 100K+ rows in the browser without freezing the UI or exhausting memory.
The Challenge
A CSV file with 100,000 rows and 20 columns contains roughly 2 million cells. Rendering all of them as DOM elements would create a document with millions of nodes, consuming hundreds of megabytes of memory and taking seconds (or minutes) to render. The browser would become unresponsive, scrolling would stutter, and users would leave.
Yet there are legitimate reasons to work with large datasets in the browser: privacy (no server upload), speed (no network round-trip), and convenience (no software to install). The key is using the right techniques to keep the browser responsive while handling data volumes it was never designed for.
Virtual Scrolling
Virtual scrolling (also called windowed rendering) is the most important technique for displaying large datasets. Instead of rendering every row, you render only the rows that are currently visible in the viewport -- typically 20 to 50 rows. As the user scrolls, the visible rows are swapped out and replaced with new ones.
The illusion of a full scrollable list is maintained by a container element whose height is set to the total height of all rows. Only a small window of actual row elements exists at any time, positioned absolutely within this container.
How Virtual Scrolling Works
1. Container: A div with height: totalRows x rowHeight creates the scrollbar.
2. Viewport: A fixed-height container with overflow-y: auto handles scrolling.
3. Window: Only 20-50 row elements exist, positioned at top: startIndex x rowHeight.
4. Scroll handler: On scroll, recalculate which rows should be visible and update the DOM.
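The steps above can be sketched as a single pure function that maps scroll position to a window of rows. This is an illustrative sketch, not the API of any particular library; `computeWindow` and its parameters are hypothetical names.

```javascript
// Sketch of the window calculation at the heart of virtual scrolling.
// Overscan (see below) is included: extra rows above and below the viewport.
function computeWindow(scrollTop, viewportHeight, rowHeight, totalRows, overscan) {
  const firstVisible = Math.floor(scrollTop / rowHeight);
  const visibleCount = Math.ceil(viewportHeight / rowHeight);
  const start = Math.max(0, firstVisible - overscan);
  const end = Math.min(totalRows, firstVisible + visibleCount + overscan);
  // Rendered rows are absolutely positioned at offsetTop inside the
  // full-height container, so the scrollbar reflects all rows.
  return { start, end, offsetTop: start * rowHeight };
}

// On each scroll event, re-render only rows[start..end):
// viewport.addEventListener('scroll', () => {
//   const { start, end, offsetTop } = computeWindow(
//     viewport.scrollTop, viewport.clientHeight, 36, rows.length, 10);
//   ...swap the rendered row elements...
// });
```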
Libraries like TanStack Virtual, react-window, and react-virtuoso implement this pattern efficiently.
Overscan
Overscan renders extra rows above and below the visible area (typically 5-20 rows). This ensures that fast scrolling does not briefly show blank space while new rows are being rendered. It is a trade-off between smoothness (more overscan) and performance (less overscan).
Fixed Row Height vs. Variable Height
Virtual scrolling works best with fixed row heights because the position of any row can be calculated instantly: position = rowIndex * rowHeight. With variable heights, the virtualizer must measure each row, which is more complex and slightly slower. For data grids, fixed-height rows (typically 36-40px) are standard.
Streaming Parsing
Before you can display data, you must parse it. Parsing a 50 MB CSV file synchronously on the main thread will block the UI for several seconds. Streaming parsers solve this by processing the file in chunks, yielding control back to the browser between chunks so the UI remains responsive.
PapaParse, the most popular JavaScript CSV parser, supports streaming natively. When streaming is enabled, PapaParse reads the file chunk by chunk, invoking a callback for each chunk -- or, with the step option, for each individual row. This means you can start displaying data before the entire file is parsed.
Streaming Parse Pattern
const rows = [];
Papa.parse(file, {
  worker: true,          // Parse in a Web Worker
  step: (result) => {    // Called once per row
    rows.push(result.data);
  },
  complete: () => {      // Called when parsing finishes
    setData(rows);
  }
});
Web Workers
Web Workers run JavaScript in a background thread, separate from the main UI thread. Offloading CSV parsing to a Web Worker ensures the UI never freezes, even for very large files. The worker parses the data and sends the results back to the main thread via message passing.
PapaParse has built-in Web Worker support: setting worker: true in the config automatically spawns a worker for parsing. For custom processing (sorting, filtering, statistics), you can create your own workers.
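A custom worker for filtering might look like the sketch below. The message handler is written as a plain function so the logic is testable; the file name and message shapes are hypothetical, not a fixed protocol.

```javascript
// Sketch of a custom filtering worker (hypothetical file: filter.worker.js).
// The worker keeps the full dataset; the main thread only sends queries.
let data = [];

function handleMessage(msg) {
  if (msg.type === 'load') {
    data = msg.rows;                 // keep the full dataset in the worker
    return { type: 'loaded', count: data.length };
  }
  if (msg.type === 'filter') {
    const q = msg.query.toLowerCase();
    const matches = data.filter(row =>
      row.some(cell => String(cell).toLowerCase().includes(q)));
    return { type: 'result', rows: matches };
  }
}

// In the actual worker file, wire the handler to message passing:
// self.onmessage = (e) => self.postMessage(handleMessage(e.data));
// Main thread: worker.postMessage({ type: 'filter', query: 'berlin' });
```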
Structured Cloning Overhead
Data passed between the main thread and a Web Worker is copied via the structured clone algorithm. For large arrays of strings, this copy can itself take significant time and memory. For datasets over 500K rows, consider keeping the data in the worker and only transferring the visible window of rows when the user scrolls.
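Keeping the data worker-side and transferring only the visible window can be sketched as follows; the function and message names are illustrative assumptions.

```javascript
// Sketch: the worker owns the parsed rows; the main thread only ever
// receives the slice it needs to render.
let allRows = [];   // lives in the worker, never cloned wholesale

function getWindow(start, end) {
  // slice() copies only (end - start) rows, so the structured-clone
  // cost is proportional to the visible window, not the whole dataset.
  return allRows.slice(start, end);
}

// Worker wiring:
// self.onmessage = (e) => {
//   if (e.data.type === 'rows') allRows = e.data.rows;
//   if (e.data.type === 'window')
//     self.postMessage(getWindow(e.data.start, e.data.end));
// };
```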
Memory Management
JavaScript strings are UTF-16, meaning each character uses 2 bytes. A CSV with 1 million cells, averaging 10 characters each, requires at least 20 MB just for the string data, plus overhead for the array structures. In practice, a 50 MB CSV file will consume 150-300 MB of JavaScript heap memory when fully parsed into arrays.
Strategies for Reducing Memory
- Typed arrays for numeric data: If a column is purely numeric, store it as a Float64Array instead of an array of strings. This uses 8 bytes per value instead of ~30+ bytes for a string representation.
- String interning: If a column has a small set of repeating values (like status codes or country names), intern them: store each unique string once and reference it by index.
- Lazy parsing: Parse only the first N rows immediately and parse the rest on demand as the user scrolls.
- Discard unused columns: If the user only needs 5 of 50 columns, offer a column selector and discard the rest.
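The first two strategies can be sketched concretely. These helper functions are illustrative, assuming rows are stored as arrays of string cells:

```javascript
// 1. Typed array for a numeric column: 8 bytes per value, contiguous.
function toNumericColumn(rows, colIndex) {
  const out = new Float64Array(rows.length);
  for (let i = 0; i < rows.length; i++) out[i] = Number(rows[i][colIndex]);
  return out;
}

// 2. String interning for a low-cardinality column: store each unique
//    string once and reference it by a small integer index.
function internColumn(rows, colIndex) {
  const values = [];                        // unique strings, first-seen order
  const lookup = new Map();                 // string -> index into values
  const indices = new Uint32Array(rows.length);
  for (let i = 0; i < rows.length; i++) {
    const s = rows[i][colIndex];
    let idx = lookup.get(s);
    if (idx === undefined) {
      idx = values.length;
      values.push(s);
      lookup.set(s, idx);
    }
    indices[i] = idx;
  }
  return { values, indices };               // cell i reads as values[indices[i]]
}
```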
Sorting and Filtering at Scale
Sorting 100K+ rows in JavaScript is fast -- Array.prototype.sort is O(n log n) (modern engines such as V8 implement it with TimSort) and can sort 1 million strings in under 500 ms. However, sorting triggers a complete re-render of the virtual list, which can cause a visible delay.
Filtering with Array.prototype.filter is O(n) and typically very fast. For interactive search (filtering as the user types), debounce the input by 200-300ms to avoid re-filtering on every keystroke.
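A debounced filter can be sketched in a few lines. The 250 ms delay sits in the range suggested above; the function names are illustrative.

```javascript
// Minimal debounce: defers fn until delayMs of inactivity has passed.
function debounce(fn, delayMs) {
  let timer = null;
  return (...args) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), delayMs);
  };
}

// O(n) substring match across all cells of each row.
function filterRows(rows, query) {
  const q = query.toLowerCase();
  return rows.filter(row =>
    row.some(cell => String(cell).toLowerCase().includes(q)));
}

// Wiring it to a search box so typing re-filters at most every 250 ms:
// searchInput.addEventListener('input', debounce((e) => {
//   setVisibleRows(filterRows(rows, e.target.value));
// }, 250));
```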
Practical Limits
While the techniques described here push the boundary, there are practical limits to what a browser can handle:
- ~500K rows: Comfortable limit for most devices. Parsing, sorting, and virtual scrolling work well.
- ~1M rows: Possible but requires careful memory management. Sorting may take 1-2 seconds. Mobile devices may struggle.
- ~5M+ rows: Approaching browser memory limits (typically 2-4 GB per tab). Consider server-side processing or a database.
Our CSV Viewer uses TanStack Virtual for virtual scrolling with a fixed row height of 36px and 20 rows of overscan. It handles 100K+ rows comfortably on modern devices.
Further Reading
- TanStack Virtual Documentation
Framework-agnostic virtual scrolling library used by DevPane CSV Viewer.
- PapaParse -- In-Browser CSV Parser
The fastest in-browser CSV parser with streaming and Web Worker support.
- Web Workers API -- MDN
MDN documentation on using Web Workers for background processing.