From the course: AWS Certified Data Engineer Associate (DEA-C01) Cert Prep

RDS performance optimization

- [Instructor] Relational databases rely on users to carefully design schemas and queries that are optimized for performance. Most performance bottlenecks and relational database are due to long running SQL queries that are tying up system resources. In this lesson, we'll discuss how to monitor, troubleshoot, and remedy performance issues with RDS and Aurora databases. You can enable performance insights for RDS and Aurora databases to get a quick picture of the load on your database and what specifically is the cause of performance issues. It graphically shows the number of sessions running at certain intervals, and the dashboard breaks down the database load by wait events, SQL queries, and SQL users. A wait event causes a SQL statement to wait for a specific event to happen before it can continue running. Every active session is either running on the CPU or waiting. When sessions aren't consuming CPU, they might be waiting for a memory buffer to become free, a data file to be read, or a log to be written to. The more time that a session waits for resources, the less time it runs on the CPU. Another cause of wait events is row locking, which is a normal characteristic of an asset compliant database. When a database transaction is occurring, that is modifying data like insert, update, or delete statements, the database engine will lock access to the rows until the transaction has been committed. This makes other queries wait before they can run, but it assures the returned results are consistent. If this is happening too frequently and causing performance issues, then you may need to adjust when updates are made or else batch them all together. Where wait events show bottlenecks, the top SQL tab of performance insights shows which queries are contributing the most to the database load. For example, many queries might be concurrently running on the database, but one query might be consuming 99% of the database load. In this case, the high load may indicate a problem with the query. Top users shows which database users are contributing the most to the database load. A handy feature of performance insights is to enable the max CPU line on the dashboard graph. This gives you a quick indication of whether requested CPU utilization is exceeding the maximum causing queries to have to wait for execution. Also on the dashboard, you can access the CloudWatch metrics of the database, including CPU utilization, the free memory, query latency, and lock time. When performance metrics exceed certain thresholds, RDS will actually make recommendations to you on how to troubleshoot. In addition to using RDS insights, another thing you should do is enable the logging of slow queries, which will log any queries that take longer than a set threshold to run. To remediate performance issues, usually the first place to start is with optimizing the SQL queries that are contributing to the database load. Relational databases perform well when queries make use of indexes, rather than full table scans. The explain command will show you the query plan and whether large table scans will be happening with your query. To quickly evaluate a query, you can look at a metric known as the query cost, which refers to how expensive the database engine thinks the query is in terms of overall utilization of the CPU memory, disc, et cetera. This example is for our Northwind database running on a Aurora MySQL instance. For MySQL, simple queries generally have a query cost of less than 1,000. First, we explain this query without setting primary keys on the tables, and as you can see, we get a huge query cost. But when we add primary keys to the orders and products table and a compound key on the orders details table, the query cost drops dramatically. That is because while our query needs to retrieve all of the rows in the orders table, the orders details and products tables are going to look up the needed rows in the index and not need to scan the tables. In general, any fields that are used in joins should be indexed in order to minimize table scans. Indexes also help with sorting in your queries because the index is sorted rather than the entire table, so you also want to index columns that you are sorting by.

Contents