Adventures in PostgreSQL Backups

For PGSQL Phriday #002, Andreas Scherbaum asked us to consider PostgreSQL backups, how we do them, any recommendations we have, and whether pg_dump is a backup tool.

To be honest, I feel pretty unqualified to talk much about backups in Postgres, partially because it’s been a couple of years since I’ve had to manage the uptime of an application. As PostgreSQL has grown in popularity and usage, a lot of effort has been put into tools like pgBackRest, Barman, and some newer solutions like pgcopydb (ok, I realize this isn’t billed as a “backup” tool, but…).

What I can share are a couple of things I’ve learned about restoring clusters and databases, particularly given the extensible nature of Postgres and the variety of tooling.

Which is a good segue to mention…

Restores > Backups

I think this goes without saying in most cases, but if I’ve learned anything by parenting six kids, repeating the basics (almost) never hurts. 😉

Backups are only as good as your ability to restore the data in a business-appropriate timeframe.

It’s really that simple. No matter how much you work to dump the data or back up the data files in the appropriate way, if they can’t be restored it was all an exercise in futility.

It doesn’t matter if the tool is pg_dump or pgBackRest: if you can’t get back to a state that is appropriate for your business or application, the backup hasn’t done its job. Knowing how to use the tools requires practice, patience, and knowledge of the potential pitfalls.
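
The cheapest way to build that practice is a periodic restore drill. A minimal sketch, assuming you have a scratch server you can safely restore into (the database names, host, and table are placeholders):

```shell
# Minimal restore drill. All names here are placeholders -- this is a
# sketch, not a production script.

# 1. Take a dump in the custom format so pg_restore can be used later.
pg_dump --format=custom --file=app.dump app_db

# 2. Restore it into a throwaway database on a scratch server.
createdb --host=scratch-host app_db_drill
pg_restore --host=scratch-host --dbname=app_db_drill app.dump

# 3. Sanity-check the result: row counts, key queries, a smoke test
#    of the application against the restored copy.
psql --host=scratch-host --dbname=app_db_drill \
  -c "SELECT count(*) FROM important_table;"
```

Even a drill this simple surfaces the problems below — missing extensions, missing roles — long before a real outage does.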

What Counts as a Restore?

For most of us, talking about “restores” usually means the data only, and maybe security/users included. In my experience there’s one other aspect of Postgres that can significantly impact the restore of your data and the smooth recovery of your application.

Extensions! 😱

One thing that most users don’t take into account when they restore data is that extensions are not backed up with your data. Regardless of how you do the backup (again, pg_dump or a file-level backup), that’s just (usually) the schema and data of your database. If you installed some extensions that are necessary for your application to work, the extension’s actual files on the server (control files, SQL scripts, shared libraries) are not backed up. This presents a few problems if you didn’t plan ahead and are caught in a bind.

Extension Availability

First, if you restore your database to a new server and those extensions aren’t installed or available in the contrib packages that ship with the core PostgreSQL installation, your application might not work even if all of the data was restored.

Let’s say your application relies on a specific Foreign Data Wrapper, maybe the redis_fdw. If that extension isn’t installed on the new server when you restore the data, your application probably isn’t going to work even though all of the data has been restored.

This is because the command to install the extension is backed up (and, in some cases, configuration information the extension needs to run normally), but when the restore executes something like CREATE EXTENSION redis_fdw;, you’re going to get an error. Even if the restore continues and all of the data still gets restored, the extension itself still isn’t available.

Therefore, all of the clusters that you want to use as destinations for restoring data must have the same extensions installed and ready to go. Again, if you’re only using extensions that are provided as part of contrib, you’re probably OK most of the time. But due diligence is required when your application and users rely upon you being responsible with their data and availability.
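You can check this ahead of time with a couple of catalog queries. As a sketch (the extension names in the IN list are placeholders for whatever your source database actually uses):

```sql
-- On the source database: which extensions are installed, and at what version?
SELECT extname, extversion FROM pg_extension;

-- On the destination cluster: which of those are available to CREATE?
-- Anything missing here needs to be installed before you restore.
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name IN ('redis_fdw', 'hstore');
```

Comparing the two lists before a restore is a quick way to catch a missing extension while it’s still a five-minute fix.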

Extension Versions

To make things even more complicated, the version of the extension is important to consider, too. Depending on what the extension is and the functionality being used, you are generally responsible for making sure that the correct version of the extension is available. This is easy to see in a plain-text pg_dump backup: the CREATE EXTENSION... command doesn’t specify a version, only that the extension needs to be created.

When an extension is upgraded between versions, migration scripts may run to modify catalogs or metadata. For example, if my application relies on version 1.0 of an extension and I restore to a server that has version 1.5 installed instead, the restored database may contain schema changes that my application doesn’t expect. That usually means bad things when I try to query data or use (deprecated or removed) extension functionality.
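If you need to control this, PostgreSQL lets you pin and upgrade extension versions explicitly. A sketch (some_extension and the version numbers are placeholders, and installing an older version only works if that version’s scripts are still present on the server):

```sql
-- Install a specific version rather than the server's default.
CREATE EXTENSION some_extension VERSION '1.0';

-- Later, run the upgrade scripts deliberately, once the application
-- is ready for the new version's schema changes.
ALTER EXTENSION some_extension UPDATE TO '1.5';
```

Pinning the version in your provisioning scripts makes the destination cluster’s state explicit instead of depending on whatever the packager shipped last.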

I don’t want to make too big a deal of this. In many cases, extensions don’t update very frequently and new versions might not have made changes to functions, data, or catalog settings. But any time you rely on a new feature or setup, this is a notable issue to consider.

In short, if you are responsible for data integrity among multiple PostgreSQL clusters, ensure that all extensions are updated to the same versions and document it somewhere along with a changelog. This is helpful for you to remember the state of your database and for others in your organization that need to help maintain the database.

Selective Backup and Restore

The other thing that is particularly helpful about pg_dump and pg_restore is that they offer many options for selective operations.

  • Need to back up everything but that really large table only used for transient ETL data? pg_dump... --exclude-table=large_table
  • Are you trying to create a baseline script for a new database DevOps tool like Flyway and only require the database schema? pg_dump... --schema-only
  • Will this file be used for demonstration purposes and you can’t guarantee the same roles are available on the destination server? pg_dump... --no-owner
  • Conversely, did you receive a backup script and keep getting “role does not exist” warnings when you try to restore it? pg_restore... --no-owner
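
Spelled out in full, those four bullets might look like this (database, table, and file names are placeholders):

```shell
# Dump everything except a large transient table.
pg_dump --exclude-table=large_table --file=app.sql app_db

# Schema only, e.g. as a baseline script for a migration tool.
pg_dump --schema-only --file=schema.sql app_db

# Dump without ownership commands, for demos or sharing.
pg_dump --no-owner --file=demo.sql app_db

# Restore a custom-format dump, ignoring ownership, to avoid
# "role does not exist" warnings on a server with different roles.
pg_restore --no-owner --dbname=app_db_copy app.dump
```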

There are so many runtime options and permutations for each of these tools that we could give many more examples. Suffice it to say, there’s a lot of power in using pg_dump and pg_restore for doing daily development work and helping other PostgreSQL users try to solve problems. Learning how to use both tools effectively is helpful for any PostgreSQL user.

Bonus: Is pg_dump a Backup Tool?

But seriously, I’m not sure I really want to wade into this discussion. At face value, yes: pg_dump produces a transaction-safe backup of whatever you told it to dump. Assuming you know how to correctly restore that dump file, you technically have a good, working backup.

That said, this isn’t a plan for maintaining a production system that requires fine-grained control and responsive replication. And that is an entirely different post (or series of posts).

In summary, pg_dump is technically a backup tool. But you’re going to be in a lot of pain if you try to use it for a high-availability setup. 😀

The Conversation That Never Ends

I’m excited to see what others talk about as part of this PGSQL Phriday. There are so many aspects of backup and restore that impact our daily lives. There’s no shortage of tips, tricks, triumphs, and horror stories in this area of PostgreSQL database management – and I think we could blog about it every month for a few years and still not cover everything.

Hopefully, shedding a little light on how extensions require just a little bit more attention as part of your backup scheme is helpful. Again, it all comes back to a successful restore, and extensions are part of that success.
