Database Seeding: Dev & Test Data Guide
Generate SQL seed scripts and JSON fixtures for Prisma, Drizzle, and raw SQL workflows.
Database seeding is the process of populating a database with an initial set of data. Every development team needs it: local development environments need realistic data to build against, test suites need predictable datasets to assert against, staging environments need enough volume to simulate production behavior, and new team members need a single command that gives them a working database.
Without a seed strategy, teams end up manually inserting rows through a GUI, sharing SQL dumps over Slack, or worse — pointing local apps at a shared development database. All of these approaches are fragile, non-reproducible, and create friction every time someone sets up a new environment.
This guide covers the three main approaches to database seeding, shows you how to generate ready-to-use SQL and JSON seed files with our Mock Data Generator, and walks through best practices that keep your seeds maintainable as your schema evolves.
Seeding Strategies Compared
There are three common approaches to generating seed data, each with distinct trade-offs. The right choice depends on your team size, data sensitivity requirements, and how closely your test data needs to mirror production.
| Strategy | Privacy | Realism | Reproducibility | Maintenance |
|---|---|---|---|---|
| Hardcoded Fixtures | Excellent | Poor | Excellent | High effort |
| Generated Data | Excellent | Good | Good (with seeds) | Low effort |
| Production Copies | Poor | Excellent | Varies | Medium effort |
Hardcoded fixtures are hand-written JSON or SQL files committed to your repository. They are perfectly reproducible and contain no real user data, but they become tedious to maintain as your schema changes — every migration means updating every fixture file. They also tend to be small and unrealistic: "John Doe" and "test@example.com" do not exercise edge cases like Unicode names, long email addresses, or varied phone formats.
Generated data uses a library like Faker.js (or a tool like our Mock Data Generator) to produce realistic values programmatically. When you use a deterministic seed, the output is reproducible across runs and machines. Generated data adapts instantly when you add or remove columns — you change the schema definition, not dozens of fixture rows.
Production copies give you the most realistic data but introduce serious privacy and compliance risks. Sanitizing production data is non-trivial: you need to scrub PII from every table, handle foreign key relationships, and ensure the anonymization is irreversible. Most teams that start with production copies eventually migrate to generated data to avoid GDPR and SOC 2 headaches.
SQL Seed Scripts
The most direct way to seed a database is with raw SQL. A seed script typically contains a CREATE TABLE statement (or uses IF NOT EXISTS to be idempotent) followed by INSERT INTO statements with the seed data. This works with PostgreSQL, MySQL, SQLite, and any SQL-compatible database.
Our SQL export format infers column types from your field definitions: UUID fields become UUID, price fields become DECIMAL(10,2), date fields become TIMESTAMP, and text fields become VARCHAR(255). The result is a valid DDL + DML script you can run directly with psql, mysql, or any SQL client.
Users Table
A standard users table with UUID primary keys, unique email constraints, and realistic personal information. This is the most common seed table — nearly every application has one.
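A sketch of what that export looks like. Column names, sizes, and the sample row are illustrative — your actual output follows your field definitions:

```sql
-- Illustrative users seed script (column names and values are hypothetical)
CREATE TABLE IF NOT EXISTS users (
  id UUID PRIMARY KEY,
  first_name VARCHAR(255) NOT NULL,
  last_name VARCHAR(255) NOT NULL,
  email VARCHAR(255) NOT NULL UNIQUE,
  phone VARCHAR(255),
  created_at TIMESTAMP NOT NULL
);

INSERT INTO users (id, first_name, last_name, email, phone, created_at) VALUES
  ('0b6e3d2a-8c41-4f9a-9d37-2a1e5c8b7f60', 'Amelie', 'Fournier', 'amelie.fournier@example.com', '+1-555-0142', '2025-03-15 14:30:00'),
  ('7d2f1c4b-9a53-4e8d-b1c6-3f0a9e2d5c81', 'Kenji', 'Tanaka', 'kenji.tanaka@example.com', '+1-555-0199', '2025-03-16 09:05:00');
```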
Products Table
An e-commerce products table with generated SKUs (using the API key type for unique alphanumeric strings), decimal prices, and category assignments. The SKU field uses a UNIQUE constraint to match real-world product catalog requirements.
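A sketch of the products export under the same assumptions (hypothetical column names and rows):

```sql
-- Illustrative products seed script (column names and values are hypothetical)
CREATE TABLE IF NOT EXISTS products (
  id UUID PRIMARY KEY,
  sku VARCHAR(255) NOT NULL UNIQUE,
  name VARCHAR(255) NOT NULL,
  price DECIMAL(10,2) NOT NULL,
  category VARCHAR(255)
);

INSERT INTO products (id, sku, name, price, category) VALUES
  ('3c9a1f7e-2d64-48b0-a5e1-6b8d4c2f9a03', 'SKU-8F3K2M9Q', 'Walnut Desk Organizer', 39.99, 'Office'),
  ('9e4b2c8d-1a75-4f3e-b6c2-0d7f5a3e8b14', 'SKU-2J7N4P1X', 'Ceramic Pour-Over Set', 54.50, 'Kitchen');
```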
You can run these scripts directly against your database. For PostgreSQL: psql -d mydb -f users_seed.sql. For MySQL: mysql mydb < users_seed.sql. For more complex seeding scenarios with foreign keys across multiple tables, see our E-Commerce Test Data guide which covers multi-table relational seeding.
ORM-Based Seeding with JSON
Modern TypeScript ORMs like Prisma, Drizzle, and TypeORM support programmatic seeding where you import a JSON file and pass it to a bulk insert function. This approach is popular because it keeps your seed data in a format that is easy to generate, version-control, and diff. The ORM handles type coercion, connection management, and transaction wrapping.
Start by exporting your schema as JSON instead of SQL. Here is the same users dataset in JSON format, ready for import:
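A hypothetical shape for that export — the keys are illustrative and in practice come from your schema definition:

```json
[
  {
    "id": "0b6e3d2a-8c41-4f9a-9d37-2a1e5c8b7f60",
    "firstName": "Amelie",
    "lastName": "Fournier",
    "email": "amelie.fournier@example.com",
    "createdAt": "2025-03-15T14:30:00Z"
  },
  {
    "id": "7d2f1c4b-9a53-4e8d-b1c6-3f0a9e2d5c81",
    "firstName": "Kenji",
    "lastName": "Tanaka",
    "email": "kenji.tanaka@example.com",
    "createdAt": "2025-03-16T09:05:00Z"
  }
]
```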
Prisma Seed File
Prisma supports a dedicated prisma/seed.ts file that runs with npx prisma db seed. Import your generated JSON and use createMany for efficient bulk insertion:
```typescript
import { PrismaClient } from '@prisma/client';
import users from './seeds/users.json';

const prisma = new PrismaClient();

async function main() {
  await prisma.user.createMany({ data: users });
}

main()
  .then(() => console.log('Seed complete'))
  .catch(console.error)
  .finally(() => prisma.$disconnect());
```

Add the seed command to your package.json so Prisma knows how to run it:
```json
{
  "prisma": {
    "seed": "ts-node prisma/seed.ts"
  }
}
```

Drizzle Seed File
Drizzle ORM uses a similar pattern. Define your seed script and run it as part of your setup process:
```typescript
import { db } from './db';
import { users } from './schema';
import seedData from './seeds/users.json';

async function seed() {
  await db.insert(users).values(seedData);
  console.log('Seed complete');
}

seed();
```

TypeORM Seed Pattern
TypeORM does not have a built-in seed command, but the community convention is to use a migration or a standalone script with the DataSource API:
```typescript
import { AppDataSource } from './data-source';
import { User } from './entities/User';
import seedData from './seeds/users.json';

async function seed() {
  await AppDataSource.initialize();
  await AppDataSource.getRepository(User).save(seedData);
  await AppDataSource.destroy();
}

seed();
```

Regardless of which ORM you use, the workflow is the same: generate JSON from the Mock Data Generator, save it as a fixture file, and import it in your seed script. When your schema changes, regenerate the JSON — no manual editing required.
Seed Data Best Practices
Make Seeds Idempotent
A seed script should be safe to run multiple times without creating duplicate data or crashing on unique constraint violations. Use INSERT ... ON CONFLICT DO NOTHING (PostgreSQL) or INSERT IGNORE (MySQL), or truncate the table before inserting. In ORM land, use upsert or wrap the insert in a check:
```typescript
// Prisma upsert pattern for idempotent seeding
for (const user of users) {
  await prisma.user.upsert({
    where: { email: user.email },
    update: {},
    create: user,
  });
}
```

Use Deterministic Seeds for Reproducibility
When seed data is generated programmatically, always use a fixed random seed so every developer and CI run produces identical data. Our generator uses deterministic seeding by default — the same seed number always produces the same output. This makes failing tests debuggable: you can reproduce the exact dataset that caused the failure.
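To make the mechanism concrete, here is a minimal sketch of seeded generation using a small, self-contained PRNG (mulberry32). This is an illustration of the principle, not our generator's actual algorithm, and the name list is hypothetical:

```typescript
// Minimal seeded PRNG (mulberry32): same seed, same sequence, every time.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t = (t + Math.imul(t ^ (t >>> 7), t | 61)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const FIRST_NAMES = ['Ava', 'Liam', 'Mei', 'Noah', 'Sofia'];

// Deterministic fixture: every developer and CI run gets identical rows.
function generateUsers(seed: number, count: number) {
  const rand = mulberry32(seed);
  return Array.from({ length: count }, (_, i) => ({
    id: i + 1,
    firstName: FIRST_NAMES[Math.floor(rand() * FIRST_NAMES.length)],
  }));
}

console.log(
  JSON.stringify(generateUsers(42, 3)) === JSON.stringify(generateUsers(42, 3))
); // true
```

Swapping the seed value gives you a different but equally reproducible dataset, which is handy for generating distinct fixtures per test suite.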
Respect Foreign Key Ordering
When seeding multiple tables with foreign key relationships, insert parent tables before child tables. If your orders table references users.id, seed users first. For teardown, reverse the order: delete child records before parent records. Most ORMs handle this automatically when you use transactions, but raw SQL scripts need explicit ordering.
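A minimal sketch of dependency-ordered seeding and teardown, assuming hypothetical users and orders tables where orders.user_id references users.id:

```sql
-- Seed parents first...
INSERT INTO users (id, email) VALUES ('abc-123', 'ava@example.com');

-- ...then children that reference them
INSERT INTO orders (id, user_id, total) VALUES (1, 'abc-123', 49.99);

-- Teardown runs in reverse dependency order
DELETE FROM orders;
DELETE FROM users;
```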
Scale Seeds by Environment
Local development needs just enough data to fill a UI — 10 to 50 rows per table is usually sufficient. Staging environments may need thousands of rows to test pagination, search performance, and bulk operations. Use an environment variable to control seed volume:
```typescript
const ROW_COUNT = process.env.SEED_SIZE === 'large' ? 5000 : 25;
```

Integrate Seeds into CI
Add your seed command to your CI pipeline so every test run starts with a known dataset. This eliminates "works on my machine" issues caused by different developers having different local data. A typical CI step looks like:
```yaml
# GitHub Actions example
- name: Seed database
  run: npx prisma db seed
  env:
    DATABASE_URL: postgresql://test:test@localhost:5432/testdb
```

Common Pitfalls
Unique Constraint Violations
The most common seeding failure. If your email column has a UNIQUE constraint and your seed data contains duplicate emails, the insert will fail. Generated data with the unique flag enabled avoids this — our generator guarantees uniqueness within a dataset. For idempotent re-runs, use upsert patterns or truncate before inserting.
Foreign Key Order Mistakes
Inserting an order that references user_id = 'abc-123' before that user exists will throw a foreign key violation. Always seed tables in dependency order. If you have circular dependencies (rare but possible), temporarily disable foreign key checks during seeding:
```sql
-- PostgreSQL
SET session_replication_role = 'replica'; -- Disables FK checks
-- ... run your inserts ...
SET session_replication_role = 'origin';  -- Re-enables FK checks
```

Timezone Issues in Date Fields
Seed data with dates stored as strings can behave differently depending on the database's timezone configuration. A timestamp like 2025-03-15T14:30:00 has no timezone suffix, so the database interprets it using the session or server timezone setting, which means the same seed file can produce different instants in different environments. Always include timezone information in generated dates, or use TIMESTAMP WITH TIME ZONE columns. Our generator outputs ISO 8601 dates with timezone suffixes by default.
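The same ambiguity exists in JavaScript seed scripts: a sketch of generating unambiguous, UTC-anchored timestamps for seed data (the helper name is illustrative):

```typescript
// Suffix-less strings are ambiguous: JavaScript parses them as LOCAL time,
// so the same fixture can yield different instants on different machines.
const ambiguous = new Date('2025-03-15T14:30:00');

// With an explicit UTC suffix the instant is fixed everywhere.
const explicit = new Date('2025-03-15T14:30:00Z');

// Emit seed timestamps as ISO 8601 UTC ('Z' suffix) to avoid the ambiguity.
function seedTimestamp(d: Date): string {
  return d.toISOString();
}

console.log(seedTimestamp(explicit)); // '2025-03-15T14:30:00.000Z'
```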
Character Encoding Surprises
If your seed data includes names with accents (e.g., "José"), emoji, or CJK characters, make sure your database connection uses UTF-8 encoding. SQL files should be saved as UTF-8 without BOM. Most modern databases default to UTF-8, but legacy MySQL installations may still use latin1, which silently truncates multi-byte characters.
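As a quick check, you can set the connection encoding explicitly at the top of a seed script — a sketch using the standard commands for each database:

```sql
-- PostgreSQL: force the client/connection encoding to UTF-8
SET client_encoding TO 'UTF8';

-- MySQL: use utf8mb4 (full 4-byte UTF-8, required for emoji)
SET NAMES utf8mb4;
```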
Validating Seed Data
Before running a seed script against your database, validate the data format. For JSON seeds, run the file through our JSON Formatter & Validator to catch syntax errors, trailing commas, and encoding issues. For SQL seeds, most databases support a dry-run or explain mode that parses the SQL without executing it.
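For automated pipelines, the same pre-flight check can live in the seed script itself. A minimal sketch (the function name is illustrative) that parses the fixture and confirms the top-level shape is an array of rows:

```typescript
// Validate a JSON fixture before seeding: catches syntax errors
// (e.g. trailing commas) and rejects non-array top-level shapes.
function validateFixture(raw: string): unknown[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (err) {
    throw new Error(`Invalid JSON: ${(err as Error).message}`);
  }
  if (!Array.isArray(parsed)) {
    throw new Error('Expected a top-level JSON array of rows');
  }
  return parsed;
}

console.log(validateFixture('[{"email":"ava@example.com"}]').length); // 1
```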
If you are working with CSV-based seeding (common with PostgreSQL's COPY command or MySQL's LOAD DATA INFILE), check out our Sample CSV Data guide for generating properly formatted CSV files with the correct delimiters and escaping.
Further Reading
- Prisma Seeding Guide: Official Prisma documentation on database seeding with seed scripts and best practices.
- PostgreSQL COPY Documentation: High-performance bulk data loading using PostgreSQL's COPY command for large seed datasets.
- Rails Database Seeding (db:seed): The Rails approach to database seeding — a well-established pattern that influenced modern ORM seed workflows.