Five years ago, I began my transition back to using PostgreSQL full-time, and it wasn’t the smoothest journey. One of the biggest reasons I struggled to make progress learning PostgreSQL early on was simply not knowing what I should be looking for. In fact, I often have conversations multiple times a month about the differences between SQL Server and PostgreSQL with folks setting out on a similar journey.
My guess is that you’re trying to figure out the same things, which is how you ended up on this series of posts.
Yeah, I’m smart that way. 😉
It doesn’t really matter how you got to PostgreSQL. Whether you did a careful migration over months of planning, used one of the (increasingly) common “one-click migration” tools from major providers, or are starting a new greenfield project from scratch, many of the little things you learned about how to set up and query SQL Server don’t translate directly into PostgreSQL.
So, what’s at the top of the list… the first thing I typically ask new PostgreSQL converts about?
Memory settings. Exciting, right?
Memory Settings and Why They’re Important
The adage in SQL Server has generally been to configure the memory as high as possible for a given server because SQL Server will use as much of it as possible. For the most part, however, SQL Server takes ownership of how all that memory is managed. While there are some advanced levers to reserve memory for specific uses, most guidance focuses on the main amount of memory available to a SQL Server instance so that the server can manage what happens dynamically in the background.
In some ways, PostgreSQL is no different. Queries still process 8KB pages of data, and query plans need memory to execute properly. However, the way that memory is configured and used requires a bit more hands-on attention. Understanding and initially tweaking the settings I discuss in this article, and then checking them over time, is essential to having a performant server and application. If you use a hosted solution like Amazon RDS or Azure Database for PostgreSQL – Flexible Server, most of the settings mentioned here have reasonable starting defaults. However, your data, application, and query patterns might not fit well with those default values, and it’s up to you to configure them correctly.
For example, with PostgreSQL, we need to specifically configure the amount of memory that will be used for the shared buffers (i.e., the “buffer pool” in SQL Server). However, that is the only memory setting we can effectively hard-code for runtime. There is no setting that limits the potential maximum amount of memory that PostgreSQL can use for other processes. Therefore, knowing how each of these memory settings works in tandem with ongoing server and query processing is really key.
Let’s dig in.
A note about “serverless” database options: Some providers’ serverless offerings have modified the PostgreSQL codebase more extensively and try to dynamically adjust most of these settings across time and workloads. In some cases, you can’t even modify these values, which may or may not work in your favor.
Still, in my opinion, it’s important for any PostgreSQL user to understand how these settings impact your PostgreSQL experience.
How to Modify Configuration Settings in PostgreSQL
The settings listed here, and hundreds of others, can be set at the server level in the postgresql.conf file. If you’re using a hosted solution like AWS or Azure, these values are changed through a CLI or through the parameter forms the provider exposes in its console.
In some cases, these settings can be modified for a specific session, which can be helpful when one query needs more memory than the rest of the workload. If a setting can be changed during the session, I’ve noted it below.
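As a sketch of the mechanics (the setting names are real PostgreSQL parameters; the values are only illustrative):

```sql
-- In postgresql.conf (server-wide; some settings require a restart):
--   shared_buffers = '2GB'

-- Or from a superuser session, persisted to postgresql.auto.conf:
ALTER SYSTEM SET work_mem = '16MB';
SELECT pg_reload_conf();  -- apply reloadable settings without a restart

-- Check the current value of any setting:
SHOW work_mem;

-- Override for the current session only (where the setting allows it):
SET work_mem = '64MB';
```

On hosted platforms, the `ALTER SYSTEM` step is typically replaced by the provider’s parameter-group or server-parameter interface, but `SHOW` and session-level `SET` work the same way.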
Caching Data Pages
The first setting that needs your attention is shared_buffers. This setting determines the amount of memory allocated for caching data in shared memory buffers. Unlike SQL Server, we get to tell PostgreSQL how much memory to reserve for frequently accessed data to reduce the need for disk I/O operations.
If your server has a lot of memory and you can allocate enough to cache the entire database (or databases), great! PostgreSQL will try to use the memory. However, if you have a limited amount of server memory and your database is many times larger than shared_buffers, PostgreSQL will have to do more swapping of data from disk as queries come and go.
There are lots of (generally old) posts and formulas in the wild which talk about how to set this value. Starting at 25% of available server memory is common and reasonable advice. But that’s only a starting point.
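For instance, applying that 25% starting point to a hypothetical 16GB server would look like this (shared_buffers only takes effect after a full server restart, and the 4GB figure is purely illustrative):

```sql
-- Starting point on an assumed 16GB server: 25% of RAM.
-- Requires a restart, not just a reload, to take effect.
ALTER SYSTEM SET shared_buffers = '4GB';
```

From there, adjust up or down based on the cache hit ratio discussed next rather than treating 25% as a final answer.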
The best way to tune shared_buffers is to track the cache hit ratio. This is not a specifically tracked statistic, but one you can find through the simple query shown below. Unlike SQL Server (where page life expectancy (PLE) is preferred over cache hit ratio alone), tracking this value provides the indicator you need to see whether you have enough buffer memory allocated.

If the ratio is consistently too low (or drops every time certain jobs run), a lot of data is moving between memory and disk to satisfy query results. Ideally you want the cache hit ratio as close to 100% as possible under normal operation. Anything consistently below ~80–85% indicates that your application isn’t performing as well as it should, and increasing shared_buffers should be considered.
However, if increasing this setting would push shared_buffers beyond 35–40% of the total memory available to PostgreSQL, that’s an indication the server is resource constrained, and you should probably invest in a bigger instance if application and data changes don’t have a measurable impact.
PostgreSQL cache hit ratio query:

```sql
SELECT sum(heap_blks_read) AS heap_read,
       sum(heap_blks_hit)  AS heap_hit,
       sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) AS ratio
FROM pg_statio_user_tables;
```
Main takeaway: we get to tell PostgreSQL how much memory to use for caching data pages. This setting, coupled with the available memory resources and the other memory settings discussed next, directly impacts how efficient PostgreSQL will be with your data.
Memory for Query Operations

work_mem is the next most important setting to investigate in PostgreSQL because it directly impacts query planning and performance. It also works very differently from SQL Server and most of what you know about memory usage during query planning.
When PostgreSQL plans a query, it doesn’t pre-allocate memory like SQL Server does. Most of the time SQL Server does a reasonable job requesting a specific memory grant so that the data and query operations can all be held in memory for quick work. If SQL Server requests too much memory, the query might be delayed while it waits for memory to be freed. If it requests too little memory, then we get the dreaded warning icon (⚠️) in the query plan and data spills to disk.
In PostgreSQL, the nodes of the query plan (what SQL Server would call operators) can each use up to work_mem for their specific operation. If a complex query plan has many sorting or hashing nodes, each node, for each currently running query, can use up to this amount.

By default, work_mem is set to 4MB per node operation. If you execute a query that contains two sorting nodes in the plan, that query plan could potentially use up to 2x work_mem (i.e., 8MB with the default value). If any node needs more than work_mem, that operation will spill the data to disk and your query will be slower.
The easiest way to monitor for an undersized work_mem setting is to look at the EXPLAIN ANALYZE output of larger, slow queries. Any time the plan contains Sort Method: external merge Disk: xxxxkB, the query spilled to disk because it didn’t have enough memory to perform the operation. Remember, PostgreSQL didn’t do anything wrong; it’s simply constrained by the setting.

In this case the query cannot run faster unless it has at least as much work_mem as the total size of data that had to be sorted on disk. My recommendation is to find a monitoring solution that tracks slow queries and grabs the execution plan detail. Look over a representative timeframe for how high (and consistent) the external disk requirement was. This is a good starting point for setting the work_mem value.
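Here is a sketch of what that looks like in practice; the `orders` table is hypothetical and the numbers in the comments are only illustrative of the plan lines you’d be scanning for:

```sql
-- Hypothetical example: sorting a large table with the default 4MB work_mem
EXPLAIN ANALYZE
SELECT * FROM orders ORDER BY order_date;

-- Illustrative plan fragment when work_mem is too small (your numbers will vary):
--   Sort Method: external merge  Disk: 102408kB

-- After raising work_mem above the on-disk sort size, the same node sorts in memory:
--   Sort Method: quicksort  Memory: 98204kB
```

The Disk: figure is the signal: it tells you roughly how much work_mem that one sort node needed to stay in memory.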
With this value in hand, you can modify it in one of two ways: individually for the sessions/queries that are struggling, or at the server level, which impacts all queries.
The potential problem with raising the work_mem setting at the server level is that PostgreSQL won’t check whether there’s enough memory left to run a query. If you set work_mem to 256MB and your server only has 4GB of total RAM, you could easily run out of memory with just a handful of connections running queries simultaneously.
If you can change the application code, a viable option is to modify the setting on the fly for queries that need more memory to run quickly and efficiently. Reporting queries, for instance, often need more memory but only run a few times a day. Therefore, as part of the query session (or through parameters that some frameworks provide), modify the work_mem value with the SET command:

```sql
SET work_mem = '32MB'; -- per-session value
```
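If you only want the bump to last for a single transaction, SET LOCAL reverts automatically at COMMIT or ROLLBACK, which limits the blast radius of a generous value; the reporting query below is a hypothetical placeholder:

```sql
BEGIN;
SET LOCAL work_mem = '128MB';      -- reverts when the transaction ends
SELECT * FROM daily_sales_report;  -- hypothetical expensive reporting query
COMMIT;
```

This pattern is handy in connection-pooled environments, where a plain session-level SET could otherwise leak to whichever client borrows the connection next.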
Again, my biggest caution here is that setting the server-wide value too high will probably waste a lot of available memory and can quickly cause “out of memory” errors if you’re not careful. For modern hardware, increasing this to 16MB is probably safe in most cases as a starting value. Another quick back-of-the-napkin check is to ensure that (max_connections × work_mem) + shared_buffers is less than the total memory available to PostgreSQL.
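If you’d rather let the server do that napkin math, a rough worst-case figure can be computed from the live settings (pg_size_bytes() is available in PostgreSQL 9.6+). This ignores per-node multipliers and maintenance memory, so treat it as a floor, not a guarantee:

```sql
-- Rough worst case: every allowed connection using one full work_mem,
-- plus the shared buffer cache.
SELECT pg_size_pretty(
         current_setting('max_connections')::bigint
         * pg_size_bytes(current_setting('work_mem'))
         + pg_size_bytes(current_setting('shared_buffers'))
       ) AS rough_worst_case;
```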
Main takeaway: work_mem is one of the most overlooked settings for developers coming from SQL Server. In PostgreSQL we get to turn a few more knobs to influence how query planning and execution work. Look for external disk sorting activity and adjust this value accordingly, but stay within the limits of how much memory your server has and the total number of connections that could be running at any time.
Number of Connections
The third setting to look at is how many connections your PostgreSQL server will allow. By default, PostgreSQL limits the available connections to 100, and an active application without connection pooling can quickly exhaust that limit.
However, because each connection uses a process, simply increasing this value isn’t the magic fix, either. Remember, every connection could use at least one work_mem of memory, and more for each sort/hash node. If the server doesn’t have enough memory, or you’ve increased the server-wide work_mem setting, raising the connection limit could quickly lead to memory exhaustion.
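A simple way to watch headroom is to compare current connections against the limit; note that pg_stat_activity includes background workers, so the count is approximate:

```sql
SELECT count(*)                                  AS current_connections,
       current_setting('max_connections')::int   AS max_connections
FROM pg_stat_activity;
```

Trending this pair over time in your monitoring tool tells you whether you’re approaching the ceiling or just seeing short spikes.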
The real answer here is to look at a connection pooler like Pgpool-II. Some frameworks in popular languages like Python have also begun to implement their own pooling capabilities.
Main takeaway: simply raising the connection limit isn’t the first answer. If you find that your application is constantly running out of connections, or you have to raise the server-level work_mem, keep the connection limit in mind. Ensure that your monitoring software tracks connections over time and keep an eye on it.
Ongoing Maintenance and Backup/Restores
Like any database, PostgreSQL does background work to maintain the data, schema, statistics, dump/restore data, and more. One of the main reasons it’s important to tune the background workers is because of something called dead tuples.
PostgreSQL uses a process called Multi-version Concurrency Control (MVCC) to maintain transaction isolation and concurrency of data. One of the benefits of this implementation is that readers (almost) never block writers under normal querying. It also allows cool features like transactional DDL statements.
The downside is that every UPDATE is a delete and insert, which creates dead tuples (rows). To reclaim that space (and do some other necessary stuff in the background), PostgreSQL must run a process called VACUUM on a regular basis to keep data clean and the database in tiptop shape!
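The statistics collector already tracks dead tuples per table, so you can see where they’re accumulating and when autovacuum last cleaned up:

```sql
-- Tables with the most dead tuples, plus the last autovacuum run
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```

A table whose n_dead_tup keeps climbing while last_autovacuum stays stale is a candidate for the tuning discussed below.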
MVCC and VACUUM are really important concepts in PostgreSQL and worthy of at least a separate blog post. There are thousands of blog posts and conference talks available on the subject, and it’s worth understanding better. I won’t go into more detail in this post (it’s already gotten long!), but just knowing that the process exists is helpful so that we can briefly talk about maintenance_work_mem.
The background jobs that keep your database in shape need enough memory to free the space of dead tuples, update table statistics, modify the visibility map (a future post too!), and more. In my experience, if the memory settings are reasonable for your workload, you probably won’t think about VACUUM for 95% of your tables. It “just works” and you can keep moving forward. 😊
But, under load, or with databases that have heavily updated tables, VACUUM can fall behind and you end up with table bloat (at least), indexes that take too long to create or update, outdated statistics, and other scary things (if things get REALLY out of control).
By default, PostgreSQL sets maintenance_work_mem to 64MB, 16x the default 4MB work_mem setting. Background processes only run one at a time per process type, so a higher memory setting won’t impact server load as much as increasing work_mem (which is multiplied across every connection).
If you have 16GB of RAM or more, consider increasing this to at least 256MB and see how things improve. One suggestion from a reputable PostgreSQL shop is available RAM * 0.05, but beyond a few hundred megabytes in most cases, that calculation starts to break down, too. If you’re having issues with maintenance jobs keeping up, my recommendation is to increase the value by 32MB or 64MB at a time and see if things improve. If not, increase it again.
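When you suspect maintenance is falling behind, you can watch vacuum work in flight (pg_stat_progress_vacuum is available in PostgreSQL 9.6+):

```sql
-- Vacuum/autovacuum operations currently running, with scan progress
SELECT p.pid, c.relname, p.phase,
       p.heap_blks_scanned, p.heap_blks_total
FROM pg_stat_progress_vacuum p
JOIN pg_class c ON c.oid = p.relid;
```

Long-running vacuums that repeatedly cycle through index cleanup phases on the same table are one sign that more maintenance_work_mem could help.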
Unless you have 32GB of RAM or more, I’d be cautious about going above 256MB. It might work and be exactly what you need, but reserving that memory will have an impact on the other memory that’s available to PostgreSQL.
Main Takeaway: PostgreSQL uses background processes to keep the database in good working order. Those processes need enough memory to maintain the database, and different workloads will often require more memory to be as quick and efficient as possible.
Tip of the Iceberg
Transitioning from SQL Server to PostgreSQL can be challenging and rewarding. My frustrations in the journey always reached their pinnacle when I knew that something I was attempting to do should work, but I didn’t have any framework for quickly translating my previous experience into tangible next steps.
In this post I discussed four PostgreSQL settings that will have a direct impact on how efficiently your PostgreSQL instance works. Knowing that they exist and what they do, in relation to what you know about SQL Server setup, should get you one step further down the line.
But as the section title says, these are just the tip of the iceberg. There are hundreds of settings in PostgreSQL, some of which we’ll talk more about in future posts and how they relate to your SQL Server knowledge and expectations. 😉