
A strategy to manage large SQL tables

SQL Databases
Alejandro Duarte
Alejandro Duarte is a Software Engineer, published author, and award winner. He currently works for MariaDB plc as a Developer Relations Engineer. Starting his coding journey at 13 with BASIC on a rudimentary black screen, Alejandro quickly transitioned to C, C++, and Java during his academic years at the National University of Colombia. Relocating first to the UK and then to Finland, Alejandro deepened his involvement in the open-source community. He’s a recognized figure in Java circles, credited with articles and videos amassing millions of views, and presentations at international events.

Some months ago, I got involved in a project where I needed to generate rather large reports (more than 1 million rows) extracted mainly from an SQL table that was growing at a very fast pace. This table played a central role in the everyday use of the system.

To avoid losing performance, we scheduled a database backup and a process to empty that large, troublesome table every 3 months. But the problem with the reports remained. At some point we ended up with a table of more than 3 million rows, and sometimes we had to generate reports with more than 1 million rows. Even when the report was small, querying this table took way too much time (sometimes more than 10 minutes).

Here is what we did. First, we decided to design a kind of ETL process using only SQL (some will argue that this is not ETL, but I think it is). How? We developed a module that executed SQL scripts, defined at run time, every day at 3:00 am.
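Our executor was custom code, but just to sketch the idea, a similar nightly job could be set up with the database's built-in event scheduler (the event and procedure names below are placeholders, not our actual schema):

```sql
-- Minimal sketch: run the reporting scripts every day at 3:00 am.
-- The event scheduler must be enabled first: SET GLOBAL event_scheduler = ON;
CREATE EVENT nightly_report_etl
  ON SCHEDULE EVERY 1 DAY
  STARTS '2013-01-01 03:00:00'   -- first run at 3:00 am
  DO CALL refresh_report_data(); -- hypothetical procedure holding the INSERT scripts
```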

These scripts were basically “INSERT INTO” statements. They took data from some tables and inserted the result into another table. Data in the destination table was more suitable for reporting, so we gained some time when generating the reports by moving the processing (expensive JOINs, mostly) from working hours to 3:00 am, when nobody, not even the CEO, was using the system. Reports became way faster (around 20 seconds for the biggest one).
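To make this concrete, the scripts looked roughly like the following INSERT ... SELECT, which pays the cost of the JOINs once per night instead of at report time (the tables and columns here are made up for illustration):

```sql
-- Illustrative only: materialize a flat, report-friendly table overnight.
INSERT INTO report_data (order_id, customer_name, product_name, total, created_at)
SELECT o.id, c.name, p.name, o.total, o.created_at
FROM orders o
JOIN customers c ON c.id = o.customer_id
JOIN products  p ON p.id = o.product_id
WHERE o.created_at >= CURRENT_DATE - INTERVAL 1 DAY  -- only yesterday's rows;
  AND o.created_at <  CURRENT_DATE;                  -- older data is already loaded
```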

A couple of months after going to production with this scheme, we faced another problem. The destination table, which was supposed to be more suitable for reporting, became a problem as well. As you may guess, it was too big. We were facing our initial situation again. The solution: split that large data set into smaller ones. How? Every row in that table had a timestamp, so we divided the data into semesters by creating one table per semester. The script executor module was modified to put the data into the correct table according to the current date. The reporting application was also updated to let users select the semester, so the app could query the correct table.
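In rough terms, and again with placeholder names, the per-semester layout looked like this:

```sql
-- One destination table per semester, all with the same structure.
CREATE TABLE report_data_2013_s1 LIKE report_data;
CREATE TABLE report_data_2013_s2 LIKE report_data;

-- The script executor targets the table matching the current date,
-- e.g. during the first semester of 2013:
INSERT INTO report_data_2013_s1 (order_id, customer_name, product_name, total, created_at)
SELECT o.id, c.name, p.name, o.total, o.created_at
FROM orders o
JOIN customers c ON c.id = o.customer_id
JOIN products  p ON p.id = o.product_id
WHERE o.created_at >= '2013-01-01' AND o.created_at < '2013-07-01';
```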

Reports are pretty fast at the time of writing this. This strategy gave us some time to think about implementing a Big Data solution.
