March 28, 2010

Dell MD1120 Storage Array Performance

Here is some file IO performance numbers from DELL MD1120 SAS storage array. Last year I did the same test with HP P800 storage array and numbers were impressive. But when it comes to this high end storage array, few surprises.  Before getting into actual details; lets see the test stats and configuration details.

System Configuration:

  1. DELL R710 with CentOS 5.4
  2. NOOP IO Scheduler
  3. MD1120 with 22 10K SAS disks
    • 20 disk RAID-10 (hardware)
    • 2 hot spares
    • Disk Cache disabled
  4. PERC 6/E RAID controller with BBU
    • Connected to DELL MD1120 using SAS
    • Write Back
    • Read Cache Disabled

Test Configuration:

  1. Sysbench fileio test with variable modes and threads
  2. 64 files with 50G total size
  3. All tests ran in un-buffered mode (O_DIRECT) as most of the workload is InnoDB based.

Test Results:

Number of Threads vs Number of Requests/Sec. Every mode ran with 5 iterations and average is taken.

Random IO:

rndio

Sequential IO:

seqio 

HDPARM Test:

[test~]# for i in `seq 1 3`; do hdparm --direct -tT /dev/sdc1; done | grep Timing
 Timing O_DIRECT cached reads:   2068 MB in  2.00 seconds = 1033.21 MB/sec
 Timing O_DIRECT disk reads:  2146 MB in  3.00 seconds = 715.32 MB/sec
 Timing O_DIRECT cached reads:   2020 MB in  2.00 seconds = 1010.26 MB/sec
 Timing O_DIRECT disk reads:  2162 MB in  3.00 seconds = 720.62 MB/sec
 Timing O_DIRECT cached reads:   2052 MB in  2.00 seconds = 1025.90 MB/sec
 Timing O_DIRECT disk reads:  2128 MB in  3.00 seconds = 709.17 MB/sec
 
[test ~]# for i in `seq 1 3`; do hdparm -tT /dev/sdc1; done | grep Timing
 Timing cached reads:   18920 MB in  2.00 seconds = 9475.34 MB/sec
 Timing buffered disk reads:  3442 MB in  3.02 seconds = 1141.44 MB/sec
 Timing cached reads:   19332 MB in  2.00 seconds = 9681.56 MB/sec
 Timing buffered disk reads:  3478 MB in  3.00 seconds = 1159.24 MB/sec
 Timing cached reads:   18012 MB in  2.00 seconds = 9019.50 MB/sec
 Timing buffered disk reads:  3492 MB in  3.02 seconds = 1155.53 MB/sec

Analysis:

  1. Overall the numbers are not bad when it comes to writes, but few surprises when it comes to reads. When compared with HP’s P800 storage array, the numbers still dropped by 20%.
  2. Radon IO:
    • Random write requests ranges from 3200-5000 per sec; due to write back mode (512M cache)
    • Writes are linearly scaling well with the threads, good sign that controller is able to manage the cache efficiently
    • Random reads and writes (rndrw) is also scaling linearly with the threads load, means the IO distribution and cache burst to satisfy reads seems be efficient as it needs to flush the data from controller cache to disk before the read can be satisfied.
  3. Sequential IO:
    • Writes seems to be scaling well even in sequential mode without much overhead
    • When it comes to reads, big surprise is drop from 5626 requests/sec to 615 from one thread to two threads. Which is really odd. Worst case it should be ~2000-3000 requests/sec; not sure where the overhead is. I can’t believe it could be thread scheduling as there is only 2 threads.
  4. During 100% IO, on and off I noticed IO serialization with higher queue waits, which indicates that there is some degree of serialization overhead in OS; but not able to track which layer is triggering this. Tried with cfq/deadline, still the same.
  5. Next attempt will be replacing 3Gb/s SAS to fiber channel HBA or 6Gb/s SAS (PERC H800) to see how it performs along with combination of HW and SW raid instead of only depending on controller.

March 21, 2010

Hyper Threading Performance

Its been a while anyone talked about Intel’s Hyper-Threading performance when it comes to databases. There were enough posts about disabling Hyper-Threading  completely when it comes to MySQL/InnoDB workloads way back when we had enough issues with scalability of InnoDB on multi-core systems. But things has changed quite a bit in the fast year or two in terms of multi-core support (thanks to Innobase/Mark Callaghan/Google and Percona). I still see lot of production servers running with HT disabled completely either in BIOS (append noht to kernel parameter) or manually disabling the CPUs (echo 0 > /sys/devices/system/node/node[0-1]/cpu[2-..]/online ).

Last few weeks or so; I ran quite a few tests on new production launch with 8-core CPUs and decided to go ahead with HT enabled (along with Intel Turbo mode) as it seems to give close to 15-20% performance gain. Ran tests which are completely InnoDB CPU/memory bound and HT seems to enable much better performance in every test mode.

Also tried with different thread schedulers with IO bound loads; still HT seems to give much better results; especially when most of the workload seems to be with 8-50 threads; and HT seems to give much better boost in that range; so no point to disable HT.

I tested both Intel Xeon X5570 and E5530 and here is the test results for CPU bound work load.

e5530

x5570

It is a clear indication that HT is not a bottleneck. In both the cases, turbo boost is enabled though.

As you can also notice; Intel Xeon X series CPUs yield much better results than E series (obvious as X-series is meant for performance where as E is for economy), which is close to $1000 extra, but yields ~15-20% performance for InnoDB workloads. You can find the comparison between these CPUs from Intel. If you are buying any new system and if you can spare extra $$; then always go for X series as its worth the price for performance

March 8, 2010

When indexes are created in internal temporary tables

During my previous post on how to improve derived tables performance, I patched the code to add indexes forcefully on internal derived table results, which made a huge difference in the performance. It was just an experiment and a thought to see if it really works without re-writing the queries, so that the logic can be pushed towards the engine rather than query re-write. \

But I got few emails in my inbox today asking whether MySQL really create any keys on internal temporary tables.

The answer is YES; and MySQL does create two keys on internal temporary tables namely ‘group_key‘ and ‘distinct_key‘ on the following conditions:

  • If there is any aggregate function and/or group-by (group_key)
  • Distinct column name(group_key)
  • Distinct in combination with group-by/aggregation functions (distinct_key)

Provided the query results are yielded in temporary table (Using temporary from the explain), else they get optimized away by the existing indexes from the regular table itself. These keys are added to both memory and disk based (MyISAM) internal temporary tables; so it does not matter if the internal temporary table is in memory or disk.

Here is a simple dump of internal temporary table index stats for some of the basic queries related to Information schema [Warning: these queries are really bad, and can't be used for any production use as they are meant for demonstration of different internal keys ]. This is a patch that I might be using for SHOW TEMPORARY TABLES when internal tables are included in the second version. The first version of the patch is already pushed to Maria branch, hoping that it gets pushed to 5.1.

-----------------------------
 TMP TABLE STATS, SESSION: 1
   temp file  : /tmp/#sqlf90_1_1f
   temp type  : MEMORY
   index count: 1
    key 1-1   : distinct_key
    field     : (null)
    key 1-2   : distinct_key
    field     : ENGINE
 query: select count(distinct engine) from information_schema.tables
-----------------------------
 
-----------------------------
 TMP TABLE STATS, SESSION: 1
   temp file  : /tmp/#sqlf90_1_21
   temp type  : MEMORY
   index count: 1
    key 1-1   : group_key
    field     : TABLE_NAME
 query: select table_name, sum(data_length+index_length) from information_schema.tables 
        where table_schema='mysql' group by 1
-----------------------------
 
-----------------------------
 TMP TABLE STATS, SESSION: 1
   temp file  : /tmp/#sqlf90_1_24
   temp type  : MEMORY
   index count: 1
    key 1-1   : group_key
    field     : TABLE_SCHEMA
    key 1-2   : group_key
    field     : TABLE_NAME
    key 1-3   : group_key
    field     : COLUMN_NAME
 query: select tab.table_schema, tab.table_name, column_name, index_name, seq_in_index 
        from Information_schema.tables tab join information_schema.statistics stast 
        using(table_schema,table_name) group by  1,2,3
-----------------------------
 
-----------------------------
 TMP TABLE STATS, SESSION: 1
   temp file  : /tmp/#sqlf90_1_bd
   temp type  : MEMORY
   index count: 1
    key 1-1   : group_key
    field     : TABLE_NAME
    key 1-2   : group_key
    field     : TABLE_TYPE
    key 1-3   : group_key
    field     : ENGINE
    key 1-4   : group_key
    field     : INDEX_SCHEMA
    key 1-5   : group_key
    field     : INDEX_NAME
 query: select tab.table_schema,  tab.table_name, table_type, engine, index_schema, 
        index_name from information_schema.tables tab join information_schema.statistics
        stats using(table_schema, table_name) where table_schema='mysql' group by 
        1,2,3,4,5, 6 order by 4,3,2,1
-----------------------------
 
-----------------------------
 TMP TABLE STATS, SESSION: 1
   temp file  : /tmp/#sqlf90_1_e0
   temp type  : MEMORY
   index count: 1
    key 1-1   : group_key
    field     : TABLE_NAME
 query: select  table_name, sum(data_length+index_length) from information_schema.tables
        where table_schema='mysql' group by 1
-----------------------------

March 6, 2010

How to improve subqueries derived tables performance

Last week I was working on one of the issue where the sub-query related to OLAP staging was running for about 2+ hours in the production server and finally nailed down to get the query to run in < 7 secs. It was bit interesting and kind of known issue in MySQL sub-queries world and one of the limitation from MySQL on giving more control over derived table results.

Sometimes we can re-write the sub-queries so that there is no derived tables complexity involved; but the bad part is; this particular sub-query is part of an UPDATE statement; so not all sub-queries can be re-written especially when they are part of UPDATE or DELETE statements due to its own limitations.

PROBLEM:

Here is the subset of the problem query and as you can see it runs for about 6 minutes in this small subset of data that I used for testing on Mac. All tables are InnoDB based.

--------------
SELECT
    SUM(aggrpt.imps) as imps,
    SUM(aggrpt.clicks) as clicks,
    SUM(aggrpt.pos) as pos
 
FROM aggrpt
LEFT JOIN
(
    SELECT
    DISTINCT ext_group_id, group_id
    FROM sub
) sub2  ON(sub2.ext_group_id=aggrpt.adgroupid)
 
GROUP BY
aggrpt.report_date,
aggrpt.campaignid,
aggrpt.adgroupid,
aggrpt.keywordid
ORDER BY NULL
INTO OUTFILE '/tmp/test-sub.txt'
-------------- 
 
Query OK, 47827 rows affected (6 min 47.48 sec)

HOW TO GET AROUND – SOLUTIONS:

Moving the derived table (in the above case sub2) to a view did not help; and timings are more or less the same. Here is the two alternative ways, which made the query run in < 10 secs.

  1. By creating external table and adding an index instead of using derived table
  2. Added an index within the mysql code on the derived table temporary results table by adding FORCE INDEX syntax (changed the syntax to support this, so that engine will create an index on temporary derived table results, which in this case happens to fit within heap engine instead of disk based)

CASE 1:

This is a known alternative and lot of people use this in production by avoiding the derived tables and/or sub-queries completely by creating tables for derived tables. The only thing that made the big difference is adding an index on this; without index it takes more or less the same 6 minutes times.

--------------
CREATE TEMPORARY TABLE sub_temp SELECT DISTINCT ext_group_id, group_id FROM sub
--------------
Query OK, 72385 rows affected (0.81 sec)
Records: 72385  Duplicates: 0  Warnings: 0
--------------
alter table sub_temp add index i_ext_group_id(ext_group_id)
--------------
Query OK, 72385 rows affected (0.10 sec)
Records: 72385  Duplicates: 0  Warnings: 0
--------------
SELECT
    SUM(aggrpt.imps) as imps,
    SUM(aggrpt.clicks) as clicks,
    SUM(aggrpt.pos) as pos
 
FROM aggrpt
LEFT JOIN sub_temp sub ON(sub.ext_group_id=aggrpt.adgroupid)
 
GROUP BY
    aggrpt.report_date,
    aggrpt.campaignid,
    aggrpt.adgroupid,
    aggrpt.keywordid
ORDER BY NULL
INTO OUTFILE '/tmp/test-sub-temp.txt'
-------------- 
 
Query OK, 47827 rows affected (10.41 sec)

As you can see its a great improvement from 6minutes to 10secs; but without index i_ext_group_id on sub-temp table; then things will be as usual.

CASE 2:

The engine should be smart enough to add the index on intermediate results temporary table (in this case sub2, index on column i_ext_group_id); but MySQL does not support this or at least should allow people to specify one using FORCE/USE INDEX, so that engine can add one.

For example; I patched MySQL to support this syntax, so that it adds the index on the derived table sub2 results automatically and the query immediately returns in 7 secs.

SELECT
    SUM(aggrpt.imps) as imps,
    SUM(aggrpt.clicks) as clicks,
    SUM(aggrpt.pos) as pos
 
FROM aggrpt
LEFT JOIN
(
    SELECT
    DISTINCT ext_group_id, group_id
    FROM sub
) sub2 <strong>USE INDEX(ext_group_id)</strong> ON(sub2.ext_group_id=aggrpt.adgroupid)
 
GROUP BY
aggrpt.report_date,
aggrpt.campaignid,
aggrpt.adgroupid,
aggrpt.keywordid
ORDER BY NULL
INTO OUTFILE '/tmp/test-sub-force.txt'
-------------- 
 
Query OK, 47827 rows affected (7.18 sec)

This is again a great improvement; but again using USE/FORCE INDEX is a hack here (this can also be implemented as hint) with column name; which acts as a hint to optimizer to create an index on that column (if its hint, then on JOIN column).

CHANGES TO MYSQL OPTIMIZER

But the idea for this post is to show how important the index in the intermediate temporary table results is for the query performance and optimizer should be smart enough to identify and add one automatically. The logic is same as how currently optimizer opts for index vs non-index scan by estimating the cost.