Here is a quick comparison of the performance of the recently introduced InnoDB plugin across its different compression options and row formats.
The table is a pretty simple one:
CREATE TABLE `sbtest` (
  `id` int(10) unsigned NOT NULL,
  `k` int(10) unsigned NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k` (`k`)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
The table is populated with 10M rows, with an average row length of 224 bytes. The tests are performed for the Compact, Dynamic and Compressed (8K and 4K) row formats using MySQL-5.1.24 with InnoDB plugin-1.0.0-5.1 on a Dell PE2950 (1x quad-core Xeon, 16G RAM, RAID-10) running RHEL-4 64-bit.
Here are the four test scenarios (a DDL sketch for each follows the list):
- No compression, ROW_FORMAT=Compact
- ROW_FORMAT=Compressed with KEY_BLOCK_SIZE=8
- ROW_FORMAT=Compressed with KEY_BLOCK_SIZE=4
- ROW_FORMAT=Dynamic
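For reference, a rough sketch of the DDL behind each scenario. Only the 8K variant is the actual CREATE TABLE shown above; for the rest I am assuming that just the trailing ROW_FORMAT/KEY_BLOCK_SIZE clause changes, and the ALTER TABLE statements below are only there to illustrate the options (each scenario was presumably loaded into a freshly created table rather than ALTERed):

-- prerequisites for COMPRESSED/DYNAMIC: innodb_file_per_table=1 (my.cnf)
-- and the Barracuda file format, which the plugin lets you enable at runtime:
SET GLOBAL innodb_file_format = Barracuda;

-- Scenario 1: no compression, the default Antelope-compatible format
ALTER TABLE sbtest ROW_FORMAT=COMPACT;
-- Scenario 2: compressed, 8K key block size (as in the CREATE TABLE above)
ALTER TABLE sbtest ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
-- Scenario 3: compressed, 4K key block size
ALTER TABLE sbtest ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;
-- Scenario 4: uncompressed, but long columns can be stored fully off-page
ALTER TABLE sbtest ROW_FORMAT=DYNAMIC;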
All the above tests are repeated with innodb_buffer_pool_size=6G and 512M, so that one configuration fits everything in memory and the other overflows the buffer pool. The rest of the InnoDB settings are left at their defaults, except innodb_thread_concurrency=32.
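A minimal sketch of those settings, assuming everything else stays at its default (innodb_buffer_pool_size cannot be changed at runtime in 5.1, so it has to go into my.cnf before each run):

-- in my.cnf, per run:
--   innodb_buffer_pool_size = 6G     (everything fits in memory)
--   innodb_buffer_pool_size = 512M   (working set overflows the pool)

-- thread concurrency can be adjusted at runtime:
SET GLOBAL innodb_thread_concurrency = 32;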
Here is the summary of the test results:
Table Load:
Load time from an SQL dump containing 10M rows (single-row inserts, not batched):
Compact | Compressed (8K) | Compressed (4K) | Dynamic |
28m 18s | 29m 46s | 36m 43s | 27m 55s |
File Sizes:
Here is the size of the .ibd file after each data load
Compact | Compressed (8K) | Compressed (4K) | Dynamic |
2.3G | 1.2G | 592M | 2.3G |
Data and Index Size from Table Status:
Here are the Data and Index sizes in bytes from SHOW TABLE STATUS; note that these report the original data size rather than the compressed size (a query sketch follows the table):
Size | Compact | Compressed (8K) | Compressed (4K) | Dynamic |
Data | 2247098368 | 2247098368 | 2249195520 | 2247098368 |
Index | 137019392 | 137035776 | 160301056 | 137019392 |
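If you want to pull the same numbers yourself, a couple of equivalent ways to do it (only the table name is taken from the post):

SHOW TABLE STATUS LIKE 'sbtest';

-- or via the standard dictionary view:
SELECT data_length, index_length
  FROM information_schema.TABLES
 WHERE table_schema = DATABASE()
   AND table_name = 'sbtest';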
Compression Stats:
Here are the compression stats from information_schema.INNODB_CMP after the table is populated; notice that 4K takes more operations and more time for both compression and decompression (a query sketch follows the table):
KEY_BLOCK_SIZE | Page_size | Compress_ops | Compress_ops_ok | Compress_time | Uncompress_ops | Uncompress_time |
8K | 8192 | 446198 | 445598 | 73 | 300 | 0 |
4K | 4096 | 1091421 | 1012917 | 463 | 38801 | 13 |
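These counters come from the plugin's compression statistics tables; a quick way to read them, and to reset them between runs:

-- cumulative stats since startup
SELECT * FROM information_schema.INNODB_CMP;

-- same columns, but reading it resets the counters: handy for per-test deltas
SELECT * FROM information_schema.INNODB_CMP_RESET;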
Performance:
Here is the performance of the various row formats with thread counts ranging from 1 to 512, for both the 512M and 6G buffer pool sizes, under concurrent reads and writes.
Observations:
A few key observations from the performance tests. These are based on the results alone, without looking into any of the source code, so I could be wrong; feel free to correct me. It is hard to draw firm conclusions from these scenarios, but they help to estimate what is what.
- The load time is almost the same across formats, except that 4K compression takes noticeably longer than the rest; compression in general hits INSERT/load performance a little.
- With Compact or Dynamic there is no compression, so the data and index file sizes are almost the same.
- SHOW TABLE STATUS for a compressed table reports the original Data_length and Index_length rather than the compressed sizes (this may be a bug, or InnoDB may need to extend SHOW TABLE STATUS or provide some other means to report compressed sizes; right now the only option is to look at the files manually).
- 8K compression reduced the .ibd file by nearly 50% (1.2G vs. 2.3G) and 4K compression reduced it to about a quarter of the original size (592M vs. 2.3G); this will vary with table structure and data.
- 8K compression takes fewer operations and less time for both compression and decompression compared to 4K (as expected).
- When the InnoDB buffer pool is large enough to keep all the data in memory, compression adds some overhead, but you still save disk space.
- When the working set overflows the buffer pool (I/O bound), compression really seems to help.
- 4K compression in general is slower than 8K or any other row format.
Venu, thanks for sharing the results! Could you provide more details about the testing, such as the OS and how long you ran the tests, …?
Thanks,
Calvin
Interesting and helpful, Venu! Thanks … can you tell us something about how the table was populated? We assume this was artificial data, but since compression efficiency is very dependent on data distributions, it would be nice to know what you did. Completely random data tends not to compress much (or at all ;-)) while a string of constants like “aaaaaaaaaaaaaaaaaaaa” would compress very well of course. Is your data somewhere in between?
Since you created the key on k as part of the CREATE TABLE, you are NOT using the more efficient “fast index create” method. If you have time, it would be interesting to see you repeat this test by creating the table with only the primary key, loading the data, and then alter the table to add the index.
We appreciate your testing and thorough reporting of results. Very helpful to all!
Thanks
Ken
Calvin
The test uses sysbench; the table is pre-populated with 10M rows before the tests, which are then executed with varying numbers of threads. It is on RHEL4 64-bit.
Coming to the timings: they depend on the number of threads, so they vary. I should have included another graph with timings. But in general the timings are directly proportional to the transactions performed; they vary from 90 to 550 seconds per iteration.
Venu, thanks for these results and thorough reporting. Very helpful to see this sort of stuff and your analysis.
Your CREATE TABLE includes creating the key for “k”, so this did NOT use the new “fast index create” mechanism, which can be much faster in total for loading and indexing data (and results in more dense and efficient secondary indexes as well). It would be interesting to see the performance if you create the table, load the data, then add the index with CREATE INDEX. If you have time to repeat the tests, the results could be very informative.
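For concreteness, something along these lines, sketched from the table definition above (assume the 10M-row load happens between the two statements; adapting it to the other row formats is straightforward):

-- create the table with only the primary key
CREATE TABLE sbtest (
  id  INT(10) UNSIGNED NOT NULL,
  k   INT(10) UNSIGNED NOT NULL DEFAULT '0',
  c   CHAR(120) NOT NULL DEFAULT '',
  pad CHAR(60) NOT NULL DEFAULT '',
  PRIMARY KEY (id)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

-- ... load the 10M rows here ...

-- then build the secondary index, which uses the plugin's fast index create
CREATE INDEX k ON sbtest (k);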
Also, can you tell us something more about the test and the data you used? It looks like it might have been sysbench with artificial data. Compression effectiveness is of course very sensitive to the data distributions. Completely random data won’t compress much (if at all), and values like “aaaaaaaaaaaaaaaaaaaa” would of course compress very well indeed. Can you let us know something more about the data?
Lastly, the test with a large buffer pool and 4K pages is somewhat surprising. If your data fits in memory and doesn’t compress well, that’s a poor case for compression. You end up fitting the data in memory twice: once in compressed form (1/4 or 1/2 the size) and once in original uncompressed form. So there is little reason to incur the overhead. Nevertheless, we wouldn’t expect to see such a significant difference between 4K and 8K pages in this case.
Is it possible for you to post (or otherwise make available to us) more information? We’d like to see the contents of the Information Schema tables taken at 1-minute intervals during the run, as well as similar dumps of SHOW INNODB STATUS taken at similar intervals.
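The capture itself only needs a few statements run at each interval, roughly (how they are scheduled is up to the tester):

-- compression counters and how much of the buffer pool holds compressed pages
SELECT * FROM information_schema.INNODB_CMP;
SELECT * FROM information_schema.INNODB_CMPMEM;

-- general InnoDB state: semaphores, I/O, buffer pool, row operations
SHOW ENGINE INNODB STATUS;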
Thanks …
Ken
Sorry for the double post … don’t know why I didn’t see that my first post was logged; I thought it was lost. Oh well …
For some reason, your posts are getting into my SPAM 🙂 so I need to approve them manually ..
Coming to your questions, I will answer them by repeating the test scenario as suggested (re-creating the index) and will cover all the scenarios … but the surprising factor for me is 4K; the next iteration will add 2K to see how that goes…
Thanks, Venu .. we are also quite curious about that 4K case. So, if you can add the additional reporting (InfoSchema and SHOW INNODB STATUS), that would really help, and it only adds a little extra effort to each test.
THANKS!
Ken
In case you are curious, here is the SHOW TABLE STATUS output from the 4K run (after the table has been populated); I do not have the status after the test completed, but I will capture it this time.
*************************** 1. row ***************************
Name: sbtest
Engine: InnoDB
Version: 10
Row_format: Compressed
Rows: 10000060
Avg_row_length: 224
Data_length: 2249195520
Max_data_length: 0
Index_length: 160301056
Data_free: 2048
Auto_increment: NULL
Create_time: 2008-04-24 17:17:24
Update_time: NULL
Check_time: NULL
Collation: latin1_swedish_ci
Checksum: NULL
Create_options: row_format=COMPRESSED
Comment:
Great results Venu,
Though I should say that for compression, sbtest is a pretty bad choice – it is designed to exercise data sizes, but it does not really use strings whose compression properties are anywhere close to the real world.
This is one of the challenges of testing a system which uses compression – you not only need to reproduce the cardinalities and row lengths of the scenario you’re interested in, but also its compression properties.
Venu,
Also, the load time looks rather strange – what exactly did you do? Did you dump with --no-opt so that mysqldump would use single-value inserts? Or do you mean something else by “not batched”?
Peter,
Thanks for the feedback. I already tweaked sysbench to suit my needs, even though I couldn’t contribute the code back, as nobody seems to be taking any patches out there.
Anyway,
“Not batched” means single-row inserts here. mysqldump uses --opt by default, and I did not use --no-opt for sure. I was also a bit surprised by the 4K/2K results, but after multiple iterations they come out the same way. I may file a bug on my findings as well. Just watch my second post today or tomorrow and let me know if you have any questions on top.
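To make the terminology concrete, here is roughly what the two dump styles look like (the values below are made up): mysqldump’s --opt default emits multi-row extended INSERTs, while a “not batched” load replays one row per statement.

-- extended insert (the mysqldump --opt / --extended-insert default)
INSERT INTO sbtest VALUES (1,0,'...','...'),(2,0,'...','...'),(3,0,'...','...');

-- single-row inserts ("not batched")
INSERT INTO sbtest VALUES (1,0,'...','...');
INSERT INTO sbtest VALUES (2,0,'...','...');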
[…] is a continuation post on InnoDB plugin row format performance with few more test scenarios to cover based on the feedback and questions that I got in my […]
Venu,
Very nice work!
The fact that compression to 4K shows nearly 7% compression failures during load indicates that this table is not a good candidate for compression to 4K. I also tried it and got the same results (i.e., nearly 7% compression failures for 4K and 0.1% failures for 8K). That is why we say in the documentation that if the compression failures are greater than 1%, it is not a good idea to compress such tables.
Performance is so sensitive to compression failures because not only do failures entail a page split and recompression (which is a heavy-duty operation), but if we are hitting failures we are also unlikely to have much space left in the page for the modification log. Remember that InnoDB attempts to avoid repeated recompression by keeping a log of changes made to a page in a per-page modification log. This micro-journal resides in the free space within the page. If a page compresses to a size where the 16K of data only just fits into the compressed page size, not much is left for this log, and hence there are extra calls to recompression.
BTW: if you capture innodb_cmp during a read/write run of sysbench, you’ll see roughly 2% failures for 8K pages and nearly 60% failures for 4K.
regards,
inaam
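A query along these lines (column names as in the plugin’s INNODB_CMP table) computes the failure percentage Inaam is referring to:

SELECT page_size,
       compress_ops,
       compress_ops_ok,
       100.0 * (compress_ops - compress_ops_ok) / compress_ops AS failure_pct
  FROM information_schema.INNODB_CMP
 WHERE compress_ops > 0;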
[…] new feature of InnoDB plugin : table compression. I went for a 8k compressed page size according to Venu Anuganti’s tests and compressed a 6GB table which is one of the most used tables in our application. The table […]
[…] reduction in I/O when using the new indexes and compression on a data set like this. (Others sure have.) But I don’t yet have a sense of how stable it […]