I have often noticed people complaining about LOAD DATA performance when loading tables with a large number of rows. Just today I saw a case where LOAD DATA on a simple 3-column table with about 5 million rows took ~15 minutes. This is because the server had no tuning at all for bulk insertion.
Consider the following simple MyISAM table on Red Hat Linux 32-bit.
CREATE TABLE load1 (
  `col1` varchar(100) NOT NULL default '',
  `col2` int(11) default NULL,
  `col3` char(1) default NULL,
  PRIMARY KEY (`col1`)
) TYPE=MyISAM;
The table has a string primary key. Here is the data file (downloadable as loaddata.tar.gz, linked below) that I used for testing:
[vanugant@escapereply:t55 tmp]$ wc loaddata.csv
  5164946   5164946 227257389 loaddata.csv
[vanugant@escapereply:t55 tmp]$ ls -alh loaddata.csv
-rw-r--r--  1 vanugant users 217M Nov  6 14:42 loaddata.csv
[vanugant@escapereply:t55 tmp]$
Here are the default MySQL system variables related to LOAD DATA:
mysql> show variables;
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
| bulk_insert_buffer_size | 8388608  |
| myisam_sort_buffer_size | 16777216 |
| key_buffer_size         | 33554432 |
+-------------------------+----------+
and here is the actual LOAD DATA statement that loads all ~5M rows (~217MB of data) into the table, with its timing.
mysql> LOAD DATA INFILE '/home/vanugant/tmp/loaddata.csv' IGNORE
    -> INTO TABLE load1 FIELDS TERMINATED BY ',';
Query OK, 4675823 rows affected (14 min 56.84 sec)
Records: 5164946  Deleted: 0  Skipped: 489123  Warnings: 0
Now, let's experiment by disabling the table's keys before running LOAD DATA:
mysql> SET SESSION BULK_INSERT_BUFFER_SIZE=314572800;
Query OK, 0 rows affected (0.00 sec)

mysql> alter table load1 disable keys;
Query OK, 0 rows affected (0.00 sec)

mysql> LOAD DATA INFILE '/home/vanugant/tmp/loaddata.csv' IGNORE
    -> INTO TABLE load1 FIELDS TERMINATED BY ',';
Query OK, 4675823 rows affected (13 min 47.50 sec)
Records: 5164946  Deleted: 0  Skipped: 489123  Warnings: 0
No use; barely a 1% improvement, essentially the same. Now let's set realistic MyISAM buffer sizes and try again:
mysql> SET SESSION BULK_INSERT_BUFFER_SIZE=256217728;
Query OK, 0 rows affected (0.00 sec)

mysql> set session MYISAM_SORT_BUFFER_SIZE=256217728;
Query OK, 0 rows affected (0.00 sec)

mysql> set global KEY_BUFFER_SIZE=256217728;
Query OK, 0 rows affected (0.05 sec)

mysql> alter table load1 disable keys;
Query OK, 0 rows affected (0.00 sec)

mysql> LOAD DATA INFILE '/home/vanugant/tmp/loaddata.csv' IGNORE
    -> INTO TABLE load1 FIELDS TERMINATED BY ',';
Query OK, 4675823 rows affected (1 min 55.05 sec)
Records: 5164946  Deleted: 0  Skipped: 489123  Warnings: 0

mysql> alter table load1 enable keys;
Query OK, 0 rows affected (0.00 sec)
Wow, that is almost a 90% reduction in load time (from ~15 minutes down to under 2). So disabling keys in MyISAM is not the whole story; tuning the buffer sizes plays a major role, depending on the input data.
For the same load with InnoDB, here is the result after setting innodb_buffer_pool_size=1G and innodb_log_file_size=256M, with innodb_flush_log_at_trx_commit=1.
mysql> show variables like '%innodb%size';
+---------------------------------+------------+
| Variable_name                   | Value      |
+---------------------------------+------------+
| innodb_additional_mem_pool_size | 26214400   |
| innodb_buffer_pool_size         | 1073741824 |
| innodb_log_buffer_size          | 8388608    |
| innodb_log_file_size            | 268435456  |
+---------------------------------+------------+

mysql> LOAD DATA INFILE '/home/vanugant/tmp/loaddata.csv' IGNORE
    -> INTO TABLE load1 FIELDS TERMINATED BY ',';
Query OK, 4675823 rows affected (2 min 37.53 sec)
Records: 5164946  Deleted: 0  Skipped: 489123  Warnings: 0
With innodb_flush_log_at_trx_commit=2, innodb_flush_method=O_DIRECT and innodb_doublewrite=0, it shaves off roughly another 30% (2 min 37 sec down to 1 min 54 sec). Use all of these variables with caution, unless you know what you are doing:
mysql> LOAD DATA INFILE '/home/vanugant/tmp/loaddata.csv' IGNORE
    -> INTO TABLE load1 FIELDS TERMINATED BY ',';
Query OK, 4675823 rows affected (1 min 53.69 sec)
Records: 5164946  Deleted: 0  Skipped: 489123  Warnings: 0
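For reference, these InnoDB settings would normally be made permanent in my.cnf rather than set per session. A hypothetical fragment mirroring the values above (the variable names are real; the grouping and comments are illustrative):

```ini
# my.cnf fragment for bulk-load testing; NOT a general-purpose production config
[mysqld]
innodb_buffer_pool_size        = 1G
innodb_log_file_size           = 256M      # changing this requires removing old ib_logfile*
innodb_flush_log_at_trx_commit = 2         # can lose ~1s of transactions on a crash
innodb_flush_method            = O_DIRECT
innodb_doublewrite             = 0         # disables torn-page protection; risky outside bulk loads
```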
Are the rows in primary-key order? I suspect that might make a significant difference.
No, the file is not sorted on any of the columns. You can download the load data file here:
loaddata.tar.gz
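Regarding key order: one way to test that theory yourself is to pre-sort the CSV on the key column before loading it. A minimal sketch with GNU sort, using a tiny stand-in file (the paths and sample rows are made up for illustration):

```shell
# Tiny stand-in for loaddata.csv: 3 comma-separated columns, string key first
printf 'zebra,1,a\napple,2,b\nmango,3,c\n' > /tmp/sample.csv

# Sort on the first field only (-t, sets the delimiter, -k1,1 limits the key),
# so rows arrive at LOAD DATA in primary-key order
sort -t, -k1,1 /tmp/sample.csv > /tmp/sample_sorted.csv

cat /tmp/sample_sorted.csv
# apple,2,b
# mango,3,c
# zebra,1,a
```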
Enabling keys after the load is blazingly slow for big tables with more than 30 million rows. That is what my experience says.
I also found that after 20-30m rows, bulk inserting into the table takes forever. I have been running this process on 120m rows for the last 15 hours and it is not yet finished; going by file size, it is only 45% done.
I work with 20-25m rows of inserts every day. Before I load the data, I split it into files of 2m rows each. Your experiment helped me speed up my inserts by a factor of five. Because the table is used read-only, I compress it, which saves 50% to 70% of the space.
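The chunking step this commenter describes can be done with the standard split utility (the numeric-suffix flag -d is GNU coreutils). A minimal sketch with a toy 5-row file; a real run would use something like -l 2000000:

```shell
# Toy input: 5 rows standing in for a multi-million-row dump
printf 'r1\nr2\nr3\nr4\nr5\n' > /tmp/big.csv

# Split into numbered 2-row chunks: /tmp/chunk_00, /tmp/chunk_01, /tmp/chunk_02
split -l 2 -d /tmp/big.csv /tmp/chunk_

wc -l /tmp/chunk_00
```

Each chunk can then be fed to LOAD DATA INFILE in its own statement, keeping individual transactions and key buffers to a manageable size.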
Hello, I have one query. I have written a stored procedure that deletes a particular row based on an input parameter. When I test it with CALL procedure_delete(in_value) using values that exist, it works fine. But if I pass an in_value for which no record exists in the table, I just get "Query OK, 0 rows affected". For this kind of case, how can I write an error handler inside my stored procedure? Please help me out; thanks in advance.

Thanks and regards,
Srinivas Antarvedi
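One possible approach (a sketch only, assuming MySQL 5.5+ for the SIGNAL statement; the table and column names here are hypothetical stand-ins for the commenter's schema): check ROW_COUNT() immediately after the DELETE and raise an error when it is zero.

```sql
DELIMITER //
CREATE PROCEDURE procedure_delete(IN in_value INT)
BEGIN
  DELETE FROM my_table WHERE id = in_value;

  -- ROW_COUNT() reports rows affected by the immediately preceding statement
  IF ROW_COUNT() = 0 THEN
    SIGNAL SQLSTATE '45000'
      SET MESSAGE_TEXT = 'procedure_delete: no matching row for input value';
  END IF;
END //
DELIMITER ;
```

On versions older than 5.5 there is no SIGNAL, so the usual workaround is to return a status flag via an OUT parameter instead of raising an error.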
Increase myisam_sort_buffer_size to make the index repair faster after loading data and enabling keys.
Thank you for the tip! Import performance now takes 2hrs instead of 9hrs.
Thanks a lot,
I set
SET SESSION BULK_INSERT_BUFFER_SIZE=256217728;
set session MYISAM_SORT_BUFFER_SIZE=256217728;
set global KEY_BUFFER_SIZE=3019898880;
SET GLOBAL concurrent_insert=2;
and it took 20 minutes to load 50 million records into a table with 4 indexes.
What are ways to speed up INSERT queries in MySQL?…
It also depends on which engine type you are using underneath. A few points: * Developer point of view: batch loading is the best option (LOAD DATA, multi-row INSERT statements, or batch inserts in JDBC/ODBC) * DBA point of view: You need enough memory fo…
One definite reason for the slowness of this process is the PK, a varchar(100), which hits performance with a big hammer. Change it to an auto-increment integer and make the varchar(100) column a unique secondary index, and you will gain a boost of around 60% right away.
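The schema change this commenter suggests could be sketched as follows (untested and worth benchmarking; the surrogate key name `id` is made up). Note that the UNIQUE key still has to be checked on every insert, so the gain comes mainly from cheaper primary-key maintenance, not from skipping uniqueness checks:

```sql
-- Replace the varchar(100) primary key with an auto-increment surrogate,
-- demoting col1 to a unique secondary index
ALTER TABLE load1
  DROP PRIMARY KEY,
  ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT FIRST,
  ADD PRIMARY KEY (id),
  ADD UNIQUE KEY uk_col1 (col1);
```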