If you’ve ever tried to learn PostgreSQL — whether you’re coming from SQL Server, MySQL, or just getting started with databases — you’ve probably run into the same problem I did: where do you find a good sample database that is easy to understand and has data that’s not stale?
Sure, there are datasets everywhere. Kaggle currently lists over 600,000 public datasets, but most of them are static CSV files that you load once and never touch again. Great for a one-time analysis, not so great for learning how a real database behaves over time. The Postgres Wiki lists a few dozen sample databases, too. And shoot, your shiny new AI coding buddy can help you create one if you want to put the time in.
The problem with most of these datasets is that they’re primarily static. If you’re lucky, some of them might produce new data dumps once a month to keep things “current”. But you can’t really practice query tuning if your data never changes. You can’t explore vacuum behavior when there are no updates. You can’t test monitoring tools when nothing is happening.
This is why I created Bluebox, and now I’m excited to share the next evolution: Bluebox Docker — a ready-to-run PostgreSQL container that gives you a realistic, continuously-updating sample database with zero setup required.
Pretty cool, right?
## What’s in the Box?
The name Bluebox is a play on the US DVD rental kiosk company Redbox, but blue for our favorite PostgreSQL elephant (Slonik, for those keeping track at home 🐘).
Side note: Yes, I realize the Redbox connection means nothing to anyone outside of the U.S., and therefore “Bluebox” probably sounds pretty strange. But let’s be honest, so does “Pagila”… and its MySQL grandfather, “Sakila”.
When you spin up the container, you get:
- A fully populated database simulating a video rental kiosk business
- Real movie data from The Movie Database (TMDB) — actual titles you’ll recognize, not “ACADEMY DINOSAUR”
- Geographically realistic store and customer locations across the state of New York (thanks to Ryan Lambert’s excellent Geofaker project), with more locales planned soon
- Automated `pg_cron` jobs that generate new rentals every 5 minutes
- Customer lifecycle events: churn, reactivation, status tracking
- PostGIS enabled for spatial queries
- A boatload of popular extensions pre-installed: `pg_stat_statements`, `hypopg`, `pgvector`, TimescaleDB, and more
In short, it’s a database that actually does something. Leave it running for a few days and you’ll have meaningful data to analyze. Perfect for training, demos, or just poking around to learn how Postgres works under the hood.
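For example, once the container has been running for a while, a quick query will show the generators at work. A sketch, assuming a Pagila-style `rental` table with a `rental_date` column (check the actual Bluebox schema for the real names):

```sql
-- How many rentals were opened in the last hour?
-- (table/column names assumed from the Pagila lineage)
SELECT count(*) AS rentals_last_hour
FROM rental
WHERE rental_date >= now() - interval '1 hour';
```

If the background jobs are running, this number should keep climbing every time you re-run it.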
## Why I Built This Docker Repository
I’ve been using some version of Bluebox for my own presentations and training for a while now. The schema repository has been public, but getting it set up required multiple steps: installing Postgres, loading the schema, importing the data, configuring pg_cron, and hoping you didn’t miss anything along the way.
That’s a lot of friction when you just want to learn PostgreSQL, especially if you’re coming from an entirely different, mostly UI-driven database like SQL Server.
With Bluebox Docker, you run one command and you’re done:
```bash
./start.sh
```
That’s it. Pick your Postgres version, pick your port, and a few moments later you have a fully functional database with historical rental data already backfilled and new transactions appearing every few minutes.
## Multiple Postgres Versions? Yes, Please.
One of the features I’m particularly happy about is multi-version support. Right now, you can run Postgres 14, 15, 16, 17, 18, or even 19-dev (built weekly from the PostgreSQL master branch).
This means you can do things like:
```bash
# Run Postgres 18 on the default port
PG_VERSION=18 PG_PORT=5432 docker-compose -p bluebox-pg18 up -d

# Run Postgres 17 alongside it
PG_VERSION=17 PG_PORT=5433 docker-compose -p bluebox-pg17 up -d
```
Suddenly you can compare query plans between versions, test new features, or verify that your application works across multiple Postgres releases. All with the same dataset.
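Comparing plans is as simple as running the same `EXPLAIN` against both containers and diffing the output. A sketch, again assuming a Pagila-style `rental` table:

```sql
-- Run this once against port 5432 (PG 18) and once against
-- port 5433 (PG 17), then compare the plans side by side.
EXPLAIN (ANALYZE, BUFFERS)
SELECT date_trunc('day', rental_date) AS rental_day,
       count(*) AS rentals
FROM rental
WHERE rental_date >= now() - interval '7 days'
GROUP BY 1
ORDER BY 1;
```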
## The Data Never Sleeps
As you may have picked up on, I don’t like stale data. This is primarily because my entire data career has been related to time-series data in some way. Therefore, one of my early frustrations when learning Postgres, and especially when trying to teach it, was that most sample databases felt… dead. You’d run a few queries, maybe create an index, and then what? The data just sat there, unchanging.
Bluebox solves this with automated pg_cron jobs:
| Job | Schedule | What it Does |
|---|---|---|
| `generate-rentals` | Every 5 min | Creates new open rentals |
| `complete-rentals` | Every 15 min | Closes rentals and generates payments |
| `process-lost` | Daily 2 AM | Marks 30+ day overdue items as lost |
| `customer-activity` | Daily 3 AM | Churns inactive customers, runs win-back campaigns |
| `rebalance-inventory` | Weekly | Redistributes inventory between stores |
| `analyze-tables` | Daily 1 AM | Updates table statistics |
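You can inspect these jobs yourself through pg_cron’s own catalog tables once you’re connected:

```sql
-- List the scheduled jobs
SELECT jobid, jobname, schedule
FROM cron.job;

-- Check recent runs -- did anything fail?
SELECT j.jobname, d.status, d.start_time
FROM cron.job_run_details d
JOIN cron.job j USING (jobid)
ORDER BY d.start_time DESC
LIMIT 20;
```

`cron.job_run_details` is also a nice little dataset in its own right once the container has been running for a few days.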
Leave the container running and you’ll accumulate realistic transaction history. This is incredibly useful when you’re learning about:
- Vacuuming and table bloat: With constant updates, you’ll see dead tuples accumulate and VACUUM doing its thing
- Index usage patterns: Monitor which indexes are actually being used with real query traffic
- Query performance over time: Watch how plans change as table statistics evolve
- Monitoring tools: Test out `pg_stat_statements`, `auto_explain`, pganalyze, pgNow, or your observability stack of choice against actual activity
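A couple of starter queries for exploring those first two points, using the standard statistics views (the `total_exec_time` column name assumes Postgres 13 or newer):

```sql
-- Dead tuples piling up between autovacuum runs
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

-- Top statements by total execution time
SELECT left(query, 60) AS query_start,
       calls,
       round(total_exec_time::numeric, 1) AS total_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```

Run these once right after startup and again a day later — the difference is exactly the kind of thing static sample databases can’t show you.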
## Getting Started
If you want to take Bluebox Docker for a spin, here’s all you need:
1. Clone the repository:

   ```bash
   git clone https://github.com/ryanbooz/bluebox-docker.git
   cd bluebox-docker
   ```

2. Run the start script:

   ```bash
   ./start.sh
   ```

3. Connect with your favorite tool using these credentials:
| User | Password | Purpose |
|---|---|---|
| `bb_app` | `app_password` | Application queries |
| `bb-admin` | `admin_password` | Schema admin |
| `postgres` | `password` | Superuser (use sparingly!) |
That’s the whole process. No Flyway migrations to run, no data imports to manage, no pg_cron to configure. It’s all baked in.
Windows Users: Docker Desktop is all you need — it includes Docker Compose and will prompt you to enable WSL2 during installation. The setup wizard handles most of the complexity for you. That said, if you’re coming from a pure SQL Server / Windows world and terms like “WSL2” or “containers” feel foreign, I’m planning a dedicated follow-up post walking through the entire setup step-by-step. Stay tuned!
## What’s Next?
I consider this a solid v1.0, but there’s always more to do. I’m planning to continue improving the data generation procedures to make rental patterns even more realistic. The schema will continue to evolve as I find new use cases in my training and presentations.
If you find bugs, have feature requests, or want to contribute, the GitHub repository is the place to go. Issues and PRs are welcome!
And if you want to understand the underlying schema in more detail — why certain decisions were made, how the data generation works, the history of evolving from Pagila — check out the Bluebox schema repository.
## Give It a Try
Whether you’re a SQL Server professional dipping your toes into PostgreSQL (I’ve been there!), a student learning your first database, or an experienced DBA who just wants a convenient test environment, I hope Bluebox Docker makes your life a little easier.
Pull the repository, spin up a Bluebox container, and let me know what you think!
Happy Postgres’ing! 🐘