Amazon Linux security mitigations and Postgres performance

Eric Green

Published Feb 9, 2018

Hi, it's me again. You might remember me as the person who broke Aurora Postgres during the beta period because I have too much data for it to index via pg_restore (over 2 billion records in just *one* of my tables).

So anyhow, while Aurora Postgres remains broken for me, RDS Postgres worked fine, until recently, as a temporary instance. Spin it up, load several billion records into it over about 12 hours of pg_restore, run some batch jobs against it for a week or so, tear it down. Much more convenient than spinning up my own Postgres servers. Until now.

Last week I spun up an RDS Postgres instance and ran my batch jobs against it, and... a batch job that normally runs in 12 minutes took 21 minutes instead. In other words, the RDS instance was running about 43% slower than normal. Here is what I tried in order to get the speed back:
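The "43% slower" figure reads naturally as lost throughput: at 21 minutes instead of 12, the job gets through only 12/21 (about 57%) as much work per minute. Measured as wall-clock time, the same run took 75% longer. A minimal Python sketch of that arithmetic, using a hypothetical helper named `slowdown`:

```python
def slowdown(baseline_minutes: float, observed_minutes: float) -> tuple[float, float]:
    """Return (extra wall-clock time, lost throughput), both as fractions."""
    # How much longer the job ran, relative to the baseline run time.
    extra_time = observed_minutes / baseline_minutes - 1.0
    # How much less work per minute gets done, relative to the baseline rate.
    lost_throughput = 1.0 - baseline_minutes / observed_minutes
    return extra_time, lost_throughput


if __name__ == "__main__":
    extra, lost = slowdown(12, 21)
    # Prints: 75% more wall-clock time, 43% less throughput
    print(f"{extra:.0%} more wall-clock time, {lost:.0%} less throughput")
```

Both numbers describe the same regression; the throughput view is the one that matches "43% slower".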

1) Upgraded the RDS instance to two sizes larger than I usually run. No effect on performance: still 43% slower.

2) Moved to a different availability zone in case the problem was a fault in the original one: spun up a replica in another AZ, failed over to the replica, and changed my compute node ASG so the compute nodes also moved to that AZ (one I'm already running things in). No improvement.

3) Upgraded to higher provisioned IOPS. No improvement.

4) Upgraded compute node instances to latest C5 instances. No improvement.

At that point I was done with RDS Postgres, since it clearly wasn't going to run my batch jobs within the designated time frame, so I spun up my own Postgres instance on a c5.2xlarge, striping data across multiple EBS volumes in a setup I'd used before (all of this setup and configuration is Puppet-driven, by the way; I don't set up any of this *manually*, that'd be insane). *STILL* the same 43% slowdown on my batch jobs. And this is the same configuration I'm successfully using on the production cluster, which is churning out 12-minute processing times, yet the new server was still taking 21 minutes.

At that point I started looking at the Linux security mitigations for the SPECTRE and MELTDOWN bugs, which have not been applied to the production cluster's Postgres server because there haven't been any service windows for our service since Amazon introduced its mitigations. First I disabled the retpoline mitigation by adding retpoline=off to the kernel command line in GRUB. This made little difference to Postgres performance.
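On a GRUB 2 system, a boot parameter such as pti=off usually goes into the GRUB_CMDLINE_LINUX line of /etc/default/grub, after which the GRUB configuration is regenerated and the machine rebooted. The original setup was Puppet-driven and its exact files aren't described, so the sketch below assumes that GRUB 2 layout and a double-quoted value, and only shows the string edit itself. The helper name `add_kernel_param` is hypothetical.

```python
def add_kernel_param(grub_line: str, param: str) -> str:
    """Add or replace a kernel parameter in a GRUB_CMDLINE_LINUX="..." line.

    Assumes the GRUB 2 /etc/default/grub convention with a double-quoted value.
    """
    key, sep, value = grub_line.partition("=")
    if not sep or key.strip() != "GRUB_CMDLINE_LINUX":
        raise ValueError("expected a GRUB_CMDLINE_LINUX=... line")
    args = value.strip().strip('"').split()
    name = param.split("=", 1)[0]
    # Drop any existing setting for the same parameter (e.g. an old pti=on),
    # so running this twice leaves a single copy of the parameter.
    args = [a for a in args if a.split("=", 1)[0] != name]
    args.append(param)
    return f'{key}="{" ".join(args)}"'


if __name__ == "__main__":
    # Prints: GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0 pti=off"
    print(add_kernel_param('GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0"', "pti=off"))
```

Replacing rather than blindly appending keeps the edit idempotent, which matters when a configuration-management tool applies it on every run.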

Then I disabled the PTI (page table isolation) mitigation with pti=off. Postgres immediately ran at full speed again.
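Before and after a reboot like this, it helps to confirm which mitigations the running kernel actually has active. Kernels that expose it report per-vulnerability status under /sys/devices/system/cpu/vulnerabilities (with PTI active, the meltdown entry typically reads "Mitigation: PTI"; with pti=off, "Vulnerable"), and /proc/cmdline shows the boot parameters in effect. A minimal Python sketch under those assumptions, with hypothetical helper names:

```python
from pathlib import Path

VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def mitigation_status(vuln_dir: Path = VULN_DIR) -> dict[str, str]:
    """Map each vulnerability file (meltdown, spectre_v2, ...) to its status line."""
    if not vuln_dir.is_dir():
        return {}  # older kernel or non-Linux host: no sysfs reporting
    return {
        f.name: f.read_text().strip()
        for f in sorted(vuln_dir.iterdir())
        if f.is_file()
    }


def boot_params(cmdline_path: Path = Path("/proc/cmdline")) -> list[str]:
    """Return the kernel boot parameters, or an empty list if unavailable."""
    try:
        return cmdline_path.read_text().split()
    except OSError:
        return []


if __name__ == "__main__":
    for name, status in mitigation_status().items():
        print(f"{name:>20}: {status}")
    # Only show the parameters relevant to the mitigations discussed here.
    prefixes = ("pti", "nopti", "retpoline", "spectre_v2")
    flags = [p for p in boot_params() if p.startswith(prefixes)]
    print("relevant boot params:", flags or "none")
```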

Recommendations: The MELTDOWN vulnerability allows unvetted software running on a server to read data belonging to other processes on that server. Unfortunately, the PTI mitigation for it has a severe impact on Postgres performance.

RDS does not need the PTI mitigation for security, since RDS servers do not run any unvetted software. It may therefore be worthwhile for Amazon to offer an option to disable PTI and restore performance on Postgres RDS instances. Without that, Postgres RDS instances will simply be too slow for many purposes.

Looking at the Postgres mailing lists, the developers acknowledge that PTI can have a dire impact on certain workloads, because Postgres makes heavy use of system calls (more than it really needs to), and PTI makes every switch into and out of the kernel more expensive. Unfortunately any fix on the Postgres side is some time away, and will likely *never* arrive in the 9.x series. So if you are running your own Postgres servers with currently supported versions of Postgres on Amazon Linux, remember this: pti=off is your friend.
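To see how much PTI adds to each system call on a particular host, a crude microbenchmark can time a cheap syscall in a loop: run it once booted normally and once with pti=off, then compare. This sketch assumes Linux with glibc, where getppid() is not cached and enters the kernel on every call. Python's interpreter overhead is included in the number, so only the difference between the two boots is meaningful, and it says nothing direct about any particular Postgres workload. The helper name `syscall_ns` is hypothetical.

```python
import os
import time


def syscall_ns(iterations: int = 1_000_000) -> float:
    """Average wall-clock nanoseconds per os.getppid() call, Python overhead included."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    getppid = os.getppid  # local alias keeps the loop overhead low
    start = time.perf_counter_ns()
    for _ in range(iterations):
        getppid()  # each call is a real getppid system call on Linux/glibc
    elapsed = time.perf_counter_ns() - start
    return elapsed / iterations


if __name__ == "__main__":
    print(f"{syscall_ns():.0f} ns per getppid() call")
```

A syscall-heavy process pays the difference between the two runs on every kernel entry, which is why the slowdown shows up in Postgres far more than in CPU-bound code.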
