Today evening one of my friend asked me in the IM to look into one of his production server where a query was taking ~11 seconds to run on 20 million row table, even though the query is using the right index and the plan as shown below:
mysql> explain SELECT channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = 1301 GROUP BY channel; +----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+-----------------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+-----------------------------------------------------------+ | 1 | SIMPLE | xxx_sources | index | idx_client_channel | idx_client_channel | 1032 | NULL | 19205420 | Using where; Using index; Using temporary; Using filesort | +----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+-----------------------------------------------------------+ 1 row in set (0.01 sec) mysql> SELECT channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = 1301 GROUP BY channel; +---------+----------+ | channel | visitors | +---------+----------+ | NULL | 0 | +---------+----------+ 1 row in set (11.61 sec) mysql> show table status like 'xxx_sources'\G *************************** 1. row *************************** Name: xxx_sources Engine: InnoDB Version: 10 Row_format: Compact Rows: 19882760 Avg_row_length: 46 Data_length: 926941184 Max_data_length: 0 Index_length: 1188233216 Data_free: 0 Auto_increment: NULL Create_time: 2010-04-15 21:03:37 Update_time: NULL Check_time: NULL Collation: latin1_swedish_ci Checksum: NULL Create_options: Comment: InnoDB free: 0 kB 1 row in set (0.21 sec) |
Quickly looking at the plan; I added client_id in the group by to avoid temporary table, and the new plan looks much better, but still took same time for execution (well, cost of temp and copy is cheap in this case)..
mysql> explain SELECT channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = 1301 GROUP BY client_id, channel; +----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+--------------------------+ | 1 | SIMPLE | xxx_sources | index | idx_client_channel | idx_client_channel | 1032 | NULL | 19205420 | Using where; Using index | +----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+--------------------------+ 1 row in set (0.00 sec) |
and then examined the data and noticed that client_id was declared as VARCHAR(255) even though the client_id data is all int; quickly changing client_id to int made a big difference as the query execution took only ~0.24 secs
mysql> SELECT channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = 1301 GROUP BY channel; +---------+----------+ | channel | visitors | +---------+----------+ | NULL | 0 | +---------+----------+ 1 row in set (0.24 sec) |
The performance difference is too big after changing the type to int. This is just an example; but I noticed lot of tables with VARCHAR(64) or VARCHAR(255) or VARCHAR(512) (or even TEXT at times).. as default types even though they store at max of 10-15 bytes of data; not sure why anyone do that; as this is something that must be followed as rule #1 when designing schema. Even if you are not directly querying on that column; it is always better to design a schema with right type and storage so that it is optimal in terms of storage space and performance.


New blog post: http://tinyurl.com/yyj24fl – Choosing the right data type makes a big difference
MySQL: Choosing the right data type makes a big difference: Today evening one of my friend asked me in the IM to l… http://bit.ly/91Dl3C
Planet MySQL. Choosing the right data type makes a big difference: Today evening one of my friend asked me in the … http://bit.ly/dAu7zo
Choosing the right data type makes a big difference http://goo.gl/fb/MvaQw #database #mysql #performance
[...] the last post choosing about the right type; there is a case about quoting the tuple values; that I forgot to mention which is pretty much a [...]
Hi Venu,
As you found, the Using Filesort and Using Temporary were the key bottlenecks. It’s not possible to identify easily if these goto disk, however various status variables, even show profile can help here.
While you created a ‘Using Index’ situation which addressed the above two Extra notes, your SQL is now not from my understanding an ANSI compliant SQL where the GROUP BY must be all non scalar function columns selected. MySQL is lax in this implementation, you need an additional SQL_MODE to be most compatible — http://dev.mysql.com/doc/refman/5.1/en/server-sql-mode.html#sqlmode_only_full_group_by
So, you should add client_id to your SELECT columns, while redundant in output requirements it is better and valid SQL syntax.
Your post is also right about data types a topic I talk about often. For the benefit of your readers I recommend you show a post show table status so you can demonstrate the reduction in index space.
Ronald, yes; that’s a over look without adding client_id in the column list. Normally I never use column names in the group by or order by as I prefer to use numbers; so that it avoids to make mistakes..
In the above case, it should be..
select client_id, channel, count(channel) as visitors from xxx_sources WHERE client_id = 1301 GROUP BY 1,2;
Thanks again
a quick solution is add quotes:
client_id = ’1301′
speed as you expect
Cialis without prescription….
Cialis. Cheapest cialis….
You can use procedure analyse to know the max and minimum values stored in that column before you alter the column type to INT.
http://dev.mysql.com/doc/refman/5.0/en/procedure-analyse.html
While altering the table use ‘ignore’ keyword to remove the rows having text in it.
my 2 cents