Why is my database write IO so high?

Bharath Kotha
4 min read · Aug 7, 2022

A SELECT query might be one of the things causing the high disk write IO under the hood. Yes, reading data from the database can cause writes to disk. However strange it may seem, it happens because the database has to return the data in the format/order that we ask for. This post sets up a small SELECT query that causes disk writes and explains which part of the query is responsible.

To set things up, we’ll be creating two tables: books and authors. Postgres is used for the setup in this post; the scripts might need to be modified a little for other databases. To create the tables, run the following SQL statements.
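The exact original DDL is not shown here; a minimal schema consistent with the queries used later in the post might look like this:

```sql
-- Minimal sketch of the two tables (exact original DDL not shown)
CREATE TABLE author (
    id   SERIAL PRIMARY KEY,
    name TEXT NOT NULL
);

CREATE TABLE book (
    id        SERIAL PRIMARY KEY,
    title     TEXT NOT NULL,
    author_id INTEGER NOT NULL REFERENCES author (id)
);
```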

Once the tables are created, the below script generates the data to be loaded into the tables.
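A minimal sketch of such a generator (the original script is not shown; the `author(id, name)` and `book(id, title, author_id)` columns are assumed from the queries used later):

```python
# Illustrative generator sketch (the original script is not shown here).
# Assumes author(id, name) and book(id, title, author_id) tables.

NUM_AUTHORS = 100    # bump these up to generate a dataset large enough
NUM_BOOKS = 10_000   # to make the sort/hash spill to disk

def generate_statements(num_authors, num_books):
    """Return INSERT statements for the author and book tables."""
    stmts = []
    for a in range(1, num_authors + 1):
        stmts.append(f"INSERT INTO author (id, name) VALUES ({a}, 'author_{a}');")
    for b in range(1, num_books + 1):
        author_id = b % num_authors + 1  # spread books across authors
        stmts.append(
            f"INSERT INTO book (id, title, author_id) "
            f"VALUES ({b}, 'book_{b}', {author_id});"
        )
    return stmts

if __name__ == "__main__":
    with open("data.sql", "w") as f:
        f.write("\n".join(generate_statements(NUM_AUTHORS, NUM_BOOKS)))
```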

The script generates the SQL statements required to populate the data and writes them to a data.sql file. The number of authors and the number of books can be controlled by the variables at the top of the script. Once the file is generated, the data can be imported into the database using the following shell command:

psql -q -U username -d database_name -h localhost --password < /path/to/data.sql

Generating and loading the data might take some time depending on its size. Also, if the data was already loaded before and needs to be loaded again, the tables might need to be truncated first.

Now let’s run EXPLAIN ANALYZE on a couple of simple queries and see what happens. Run the following query in the database terminal:

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM book INNER JOIN author ON book.author_id = author.id ORDER BY book.id LIMIT 100;

This is the sort of query that gets run when admin pages are opened in the Django dashboard (and select_related is configured on the admin). The above query results in the following query plan.
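The plan Postgres produces for this query has roughly the following shape (a representative sketch; the exact costs, row counts, timings, and buffer counts are elided with `...`):

```
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Limit  (cost=...)  (actual time=...  rows=100  loops=1)
   Buffers: shared hit=...
   ->  Nested Loop  (cost=...)  (actual time=...  rows=100  loops=1)
         Buffers: shared hit=...
         ->  Index Scan using book_pkey on book  (actual ...  rows=100 ...)
               Buffers: shared hit=...
         ->  Index Scan using author_pkey on author  (actual ...  loops=100)
               Index Cond: (id = book.author_id)
               Buffers: shared hit=...
 Planning Time: ...
 Execution Time: ...
```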

This is how the above query plan is read:

Lines 7–8: Scan through the books table following the primary key
Lines 9–11: For each book, fetch its author from the authors table by the author id (the author_id of the book is available from lines 7–8)
Lines 5–6: Do the above two steps for the rows in the books table
Lines 3–4: Limit the results to 100

In plain words, the query plan says: find the first 100 rows of the books table sorted by ID, and follow the foreign key to find the authors of those books. More importantly, the query does not write anything back to the DB and computes the entire result in memory. The query itself finishes in around 25 ms.

Let’s see what happens if the query changes slightly to sort by book title alongside id. This could be the query executed when a column header in the Django admin dashboard is clicked to sort the table by a different column. Execute the following EXPLAIN ANALYZE query:

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM book INNER JOIN author ON book.author_id = author.id ORDER BY title ASC, book.id ASC LIMIT 100;

And the query plan output looks like the following:
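A representative sketch of what such a parallel plan looks like (exact costs, timings, and buffer counts are elided with `...`):

```
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Limit  (cost=...)  (actual time=...  rows=100  loops=1)
   Buffers: shared hit=... read=..., temp read=... written=...
   ->  Gather Merge  (cost=...)  (actual time=...  rows=100  loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: ...
         ->  Sort  (cost=...)  (actual time=...  loops=3)
               Sort Key: book.title, book.id
               Sort Method: top-N heapsort  Memory: ...
               Worker 0:  Sort Method: top-N heapsort  Memory: ...
               Worker 1:  Sort Method: top-N heapsort  Memory: ...
               Buffers: ...
               ->  Parallel Hash Join  (cost=...)  (actual ...  loops=3)
                     Hash Cond: (book.author_id = author.id)
                     Buffers: ... temp read=... written=...
                     ->  Parallel Seq Scan on book  (actual ...  loops=3)
                           Buffers: ...
                     ->  Parallel Hash  (cost=...)  (actual ...  loops=3)
                           Buckets: ...  Batches: ...  Memory Usage: ...
                           Buffers: ... temp written=...
                           ->  Parallel Seq Scan on author  (actual ...)
                                 Buffers: ...
 Planning Time: ...
 Execution Time: ...
```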

Wow!! That’s a huge query plan. Let’s dig a little deeper into what’s happening here:

Lines 6–7: Plan and launch 2 parallel workers
Lines 20–24: Using the workers, hash the authors table by sequentially scanning it, to prepare it for the join
Lines 15–19: Using the workers, sequentially scan the books table and join it with the authors table using the hash
Lines 9–14: Sort the joined data using a top-N heapsort
Lines 5–8: Merge the sorted data from the workers
Lines 3–4: Limit the output to 100 rows

A seemingly innocuous query caused the database to sequentially scan the tables, hash them, and join them. Since the working memory was not sufficient to perform these operations, the database had to write temporary data to disk. Sequentially scanning large tables is itself a very expensive operation; having to write roughly the same amount of data back to disk is even more costly and slows the query considerably. A total of 195896 temporary blocks were written while executing the query, and it took almost 26 seconds, which is usually not acceptable in OLTP systems.
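The amount of memory a single sort or hash operation may use before spilling to disk is controlled by Postgres’s work_mem setting. As a sketch (the 256MB value below is illustrative, not a tuned recommendation), it can be inspected and raised for the current session to check whether the temporary writes go away:

```sql
-- Illustrative: check the per-operation memory budget, then raise it
-- for this session only and re-run the EXPLAIN (ANALYZE, BUFFERS)
-- query to compare the "temp written" buffer counts.
SHOW work_mem;
SET work_mem = '256MB';
```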

This is an issue we actually faced in our production system. We saw the database’s disk write IO shoot up and the available storage space shrink rapidly. It was very unintuitive to identify a SELECT query as the cause, and it had a cascading effect on system performance for quite a while. Thanks to the monitoring we had in place, we were able to debug and resolve the issue. Adding alerts around disk usage and write IOPS will also help catch such issues in time and keep production systems stable.
