For PGSQL Phriday #004, Hettie Dombrovskaya challenged us to think about working with PostgreSQL as code, by ourselves or within teams, and how we keep track of the scripts and changes that are important to our day-to-day work. There are so many facets of this topic we could discuss, and Hettie gave a few really great prompts, but I’ll focus on two main areas: Database DevOps (generally) and then more specifically PostgreSQL scripts that I find helpful (and how to track them).
Let’s hop to it!
What if Joe joins the circus?
At the risk of sounding like a broken record, the longest tenure of my career, 15 years, was at a software company that used SQL Server dating back to the 1990s. (their usage of SQL Server, not the start of my employment! How old do you think I am?!?! 😱👴)
The only real schema management companies had at the time was a host of Entity Relationship tools like ER/Win (what we used), ER Studio, and others. Nobody was talking about Agile, DevOps (or Database DevOps), and application releases took months or more of effort.
Around 2010, Agile was taking hold and many changes were incorporated into “regular” application development processes to improve efficiency and the overall speed of product delivery. Unfortunately, database development didn’t quite follow suit in most organizations. At best, database changes began to be incorporated as ORM migrations, often at the expense of utilizing the database features to their fullest extent. It also meant that developers with little real database experience were modifying production databases through feature migrations without much oversight.
Oddly enough, we had the opposite problem. Our part-time DBA was still using the ER-based changeset process developed in the 1990s using bash scripts. We had to rely on him to incorporate all the changes each release, modify upgrade (manual migration) scripts to catch edge cases, ensure static data was up-to-date, and more. Aside from asking quite a lot of one person, and despite numerous attempts to get others up to speed on the process, it became clear that we were very exposed to a break in this link of the development chain. If Joe
got hit by a bus joined the circus, database change management was going to be a major issue.
It was here, struggling to lead the development team in a growing SaaS business, managing hundreds of client databases, that I knew there must be a better way.
“DevOps is the union of people, process, and products to enable continuous delivery of value to our end users.”Donovan Brown, Microsoft
At the core, the biggest problem we needed to solve was making our database change process visible to the whole team. We had to modify our processes and the expectations we had of individual people within the team to deliver better, more consistent value at the data layer.
The tools we chose for our circumstances don’t apply completely to PostgreSQL because they were SQL Server specific. But the process and benefits of building a “database as code” mindset within the team accelerated our velocity and quality. And Joe didn’t have the weight of the world on his shoulders. 😀
A few principles that we found important and carry through in how I approach/teach database management to this day include:
- Knowing the true database state is invaluable: One of the biggest issues I have with a migration-only process (whether SQL or ORM) is that it’s hard to validate the intended state of the database schema – the “source of truth”. If you’re tooling doesn’t provide a method for easily maintaining a record of the expected state at various milestones, you’ll never really know which schema to trust (or find when a production database has “drifted” from that source of truth)
- All SQL is versioned and committed: Migrations, repeatable scripts, static data, and helper scripts need to be versioned and committed. This provides the same PR-led observability on changes and allows all developers to know where the “newest, best version” of a helper script can be found. When they’re strewn about across developer computers or in browser bookmarks, that’s a recipe for trouble.
- CI/CD is for databases, too: Database changes should be automated and incorporated just like regular application code. Even when committed code that’s not put through a similar CI/CD process will eventually fall short to manual processes. If the toolset you’re using doesn’t help with this, consider moving toolsets. The pain now will reap greater ROI than you can probably see now.
I believe in the process of database development so much that I’ve recently joined Redgate, a leader in database DevOps and tooling for over two decades. Our flagship too, Flyway Enterprise, incorporates both state-based tracking with intelligent migration generation. Although I had been a Flyway Community user before (a migration-only toolset), the added benefits of tracking state, too, provides control over migrations and assurance that there is always a source of truth to verify against.
If your current toolset doesn’t allow you to work towards a state of committing changes and rolling them out in an automated, test-driven, PR-based way, find a tool that does and start taking small steps there. I promise that once you see a script work its way through dev, staging, and production environments, you’ll be amazed that you worked so long without that process.
Helpful PostgreSQL Scripts
Great! Now that you have a process for managing your database changes and scripts that matter – what are those scripts and where do I find them. The answer to that is more tricky than I’d like.
PostgreSQL, the global community, and core “leadership” have put so much effort into documentation and examples over the
years last few decades. Sometimes, however, it can be hard to remember that PostgreSQL is maintained by volunteers, even if some of them are being employed by various companies as contributors to the project. Maintaining pages and repositories of example scripts is challenging for the most committed teams, let alone a globally distributed set of “hackers” focused on turning out some pretty 💯 releases of PostgreSQL every year (for many years running)!!
That said, there have been a few attempts over the years to provide a place for this extra information, a few of which aren’t referred to often enough (IMO). One site that holds lots of information is the PostgreSQL Wiki. Below are a few pages that I’ve referred to over the years, some of which probably need to be updated and moved into some more conspicuous, “official” areas like documentation.
- Don’t Do This: Still (mostly) relevant information about PostgreSQL features, data types, or setup in a “do this, not that” way.
- Index Maintenance Scripts: Common starting place for finding duplicate or unused indexes.
- Developer FAQ: Exactly as it says on the tin, many questions that are repeated often on mailing lists, Slack, and elsewhere. Continues to be maintained.
- Performance Snippets: Database bloat, loose index scan, finding locks, and more!
In a real project, whenever I use one of these scripts as a starting point, we’ll commit it to a “helpers” repository or folder in the current project with comments that references the source, provide a URL, and if necessary, any blog post & Stack Overflow links that were relevant. Without this habit or expectation, many scripts die the death of an accidently closed IDE tab. 😬
Work to be an example
Most of the time I’m the least qualified developer in the PostgreSQL space to provide elegant examples. My Postgres knowledge is hard won (and hard work). But I love the platform, the community, and the constant improvements. Learning how to use data, harness the power of SQL and Postgres, and share it all with others is something I’m trying to get better at.
Whenever I give a presentation at a conference or meetup, I try to make the slides and any SQL available through my GitHub repo. More recently, I’ve started recording a video for each day of the Advent of Code as I’ve completed them, talking through my solutions and what I learned along the way.
I’d encourage you to do the same. Through the 2022 State of PostgreSQL Survey, we learned that many people in the PostgreSQL community prefer reading blog posts and watching videos. The more you contribute, the more likely your work will end up in the comment section of a script that helps a team use Postgres more effectively. 😉