Within software development there’s almost always a thousand ways to solve a problem. Which one we chose depend on complexity, time consumption, quality etc.
A couple of months ago we were facing a problem. We had just developed a new report in our user portal. The report listed transactions to and from a user's account. It had some fancy features like filtering on type, date ranges, pagination, export. The new report worked fine for most of our use cases except one. It was slow for large amounts of data. When customers had many transactions during a time period the report would simply time out.
The reason for the timeout was a database lookup. For the troublesome, worst case, use case we we’re extracting around 300 000 rows (out of about 700 million) and it took 72 seconds.
So we started to refactor, checking indexes etc. Trying to grab the easy wins in making this new report replying faster. We, a team of 3 backend engineers, rewrote the whole backend logic 3-4 times during a couple of weeks and managed to shave off some time. The worst case was now down to 31 seconds. It was a great improvement, however, 31 seconds was the time the user had to wait for the first page of the report in our user portal. No one waits 31 seconds for an ajax call.
What bugged us so much at this point was that the tables were correctly designed, the query plans were perfectly fine, indexes were in place: it should be fast!
Now we were getting desperate, considering massive architectural changes that would take months to achieve. Rewriting an entire micro service, rewriting the supporting services, perhaps introducing a report generator that would generate files and even removing the visible part of the report in the portal.
Then we got a suggestion, how about testing out a repacking tool for the database. We use postgres so the specific tool to try out was http://reorg.github.io/pg_repack/ .
So what does repacking tables mean? It’s pretty simple actually. It does what it sounds like. It changes the way the data in the table is stored on disk. Physically reordering the data. This means that when you query the data based on a certain criteria, if the data is structured on that criteria, you don’t have to fetch it from sporadic locations on the storage medium. Very much like the cluster command in postgres (http://www.postgresql.org/docs/current/static/sql-...) only that a repack tool should not require a full lock on the table while working.
Repacking or clustering is a one-time operation meaning that updates to the table would not be ordered. Fortunately our database structure for this specific lookup allowed us to repack aggregated month tables.
So we got to work, created a repacking script that would repack one month at a time, hooked it up to our test environment and then went home for the weekend.
Checking the result the following monday, 157.. milliseconds! Wait, what, really?
Some more on the repack tool. It does require some disk space while working. While running it creates a copy of the table that is being repacked, the copy then being ordered, the last step is then to swap out the old unordered table with the new. The new ordered table’s indexes will be smaller though so at the end you gain some disk space.
It also requires a primary key or not-null unique key(s) and since it’s a one-time operation it’s best suited for tables that aren’t continuously updated.
So our conclusions? For this particular problem, the repacking tool really helped us. We decided to go with it for now. We might still decide to rewrite our backend services in the future though.