Today evening one of my friend asked me in the IM to look into one of his production server where a query was taking ~11 seconds to run on 20 million row table, even though the query is using the right index and the plan as shown below:

mysql> explain SELECT channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = 1301 GROUP BY channel;
+----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+-----------------------------------------------------------+
| id | select_type | table       | type  | possible_keys      | key                | key_len | ref  | rows     | Extra                                                     |
+----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+-----------------------------------------------------------+
|  1 | SIMPLE      | xxx_sources | index | idx_client_channel | idx_client_channel | 1032    | NULL | 19205420 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+-----------------------------------------------------------+
1 row in set (0.01 sec)
 
mysql> SELECT channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = 1301 GROUP BY channel;
+---------+----------+
| channel | visitors |
+---------+----------+
| NULL    |        0 |
+---------+----------+
1 row in set (11.61 sec)
 
mysql> show table status like 'xxx_sources'\G
*************************** 1. row ***************************
           Name: xxx_sources
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 19882760
 Avg_row_length: 46
    Data_length: 926941184
Max_data_length: 0
   Index_length: 1188233216
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2010-04-15 21:03:37
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment: InnoDB free: 0 kB
1 row in set (0.21 sec)

Quickly looking at the plan; I added client_id in the group by to avoid temporary table, and the new plan looks much better, but still took same time for execution (well, cost of temp and copy is cheap in this case)..

mysql> explain SELECT channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = 1301 GROUP BY client_id, channel;
+----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+--------------------------+
| id | select_type | table       | type  | possible_keys      | key                | key_len | ref  | rows     | Extra                    |
+----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+--------------------------+
|  1 | SIMPLE      | xxx_sources | index | idx_client_channel | idx_client_channel | 1032    | NULL | 19205420 | Using where; Using index |
+----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+--------------------------+
1 row in set (0.00 sec)

and then examined the data and noticed that client_id was declared as VARCHAR(255) even though the client_id data is all int; quickly changing client_id to int made a big difference as the query execution took only ~0.24 secs

mysql> SELECT channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = 1301 GROUP BY channel;
+---------+----------+
| channel | visitors |
+---------+----------+
| NULL    |        0 |
+---------+----------+
1 row in set (0.24 sec)

The performance difference is too big after changing the type to int. This is just an example; but I noticed lot of tables with VARCHAR(64) or VARCHAR(255) or VARCHAR(512) (or even TEXT at times).. as default types even though they store at max of 10-15 bytes of data; not sure why anyone do that; as this is something that must be followed as rule #1 when designing schema. Even if you are not directly querying on that column; it is always better to design a schema with right type and storage so that it is optimal in terms of storage space and performance.

  • Print
  • Twitter
  • Facebook
  • PDF
  • Google Bookmarks

Related posts:

  1. INT and String data comparison, difference in performance because of quotes
  2. MySQL – Simple ways to avoid filesort
  3. When indexes are created in internal temporary tables
  4. MySQL 5.1 – query optimizer regression