In the last post choosing about the right type; there is a case about quoting the tuple values; that I forgot to mention which is pretty much a common mistake when string data types are used for storing int or float/double representation (well sometimes you need to use string due to length or to avoid precision loss); and queries associated with that column does not quote the data to be string when searching…
In the same example; client_id was declared as VARCHAR(255); so without any quotes searching on client_id takes 11 secs:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
mysql> explain SELECT SQL_NO_CACHE channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = 1301 GROUP BY client_id, channel; +----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+--------------------------+ | 1 | SIMPLE | xxx_sources | index | idx_client_channel | idx_client_channel | 1032 | NULL | 20207319 | Using where; Using index | +----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+--------------------------+ 1 row in set (0.00 sec) mysql> SELECT SQL_NO_CACHE channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = 1301 GROUP BY client_id, channel; +---------+----------+ | channel | visitors | +---------+----------+ | NULL | 0 | +---------+----------+ 1 row in set (11.69 sec) |
But if you quote client_id in the search part(client_id=’1301′); then things will run much faster (0.25sec as opposed to 11.69sec) as it does not need to do the conversion, and even the plan uses the direct const checking:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
mysql> explain SELECT SQL_NO_CACHE channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = '1301' GROUP BY client_id, channel; +----+-------------+-------------+------+--------------------+--------------------+---------+-------+--------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------------+------+--------------------+--------------------+---------+-------+--------+--------------------------+ | 1 | SIMPLE | xxx_sources | ref | idx_client_channel | idx_client_channel | 258 | const | 457184 | Using where; Using index | +----+-------------+-------------+------+--------------------+--------------------+---------+-------+--------+--------------------------+ 1 row in set (0.00 sec) mysql> SELECT SQL_NO_CACHE channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = '1301' GROUP BY client_id, channel; +---------+----------+ | channel | visitors | +---------+----------+ | NULL | 0 | +---------+----------+ 1 row in set (0.25 sec) |
Same is the case and performance impact if data is quoted when searching on int/double/float columns. At times its worth to double check column data types and use the same notation when using them (with or without quotes)
I’m having trouble (re)producing the “Same is the case … if data is quoted when searching on int/double/float columns.” Version specific? Can you provide an example?
http://dev.mysql.com/doc/refman/5.1/en/using-explain.html#jointype_index
In short:
‘ref’ is a point lookup on the index using the provided CONST value.
‘index’ is a full-index scan which in this case can be used because the projected values are all contained within an index.
If this were a SELECT *, you would see ‘ALL’ instead of ‘index’.
I meant to say prefix lookup, not point lookup. There appears to be a covering index on (client_id, channel).
Justin
Yes, anyone looking at the plan can guess that without quotes is doing a scan and when it comes to quotes its avoiding that.
I still think (did not stepped through the code to see); sometimes its worth optimizer make a smart choice by converting tuple to column type one-time and doing the compare; so that there is no impact on performance (cases like this); but currently its doing it in reverse way (converting table data to the format of tuple); and that is what is causing the impact.
Hi!
“well sometimes you need to use string due to length or to avoid precision loss” ??? Say what?!!
For numeric data, I’d say you should use a DECIMAL, not a string type.
DECIMAL preserves precision. There is a limit though – If I remember correctly, you can safely use 72 digits total (that’s total adding before and after the comma)
MySQL: INT and String data comparison, difference in performance because of quotes: In the last post choosing abou… http://bit.ly/cNYWDa
INT and String data comparison, difference in performance because of quotes: In the last post choosing about the r… http://bit.ly/9KzNGA
INT and String data comparison, difference in performance because of quotes http://goo.gl/fb/j8Cex #database #mysql
Echoing Roland’s words,
Also noting that DECIMAL has a much more compact storage than VARCHAR.
Venu,
My point was that the performance could be MUCH worse if there was not a covering index, particularly if this table has lots of columns. You are sequentially scanning the index when the type=’index’, but if you changed SELECT clause to * you would get an ‘ALL’ which is MUCH worse.
You basically have the best-case scenario represented here, when in fact often performance could be much worse.
You would see the same plan and exact same performance for:
select channel from xxx_sources
or
select channel from xxx_sources where upper(channel) = ‘XYZ’;
You also get lucky with the covering index because you avoid filesort for the GROUP BY.
I should have said client_id not channel, since client_id is the prefix. index won’t be used for compound index (client_id, channel) if you project out channel. You’ll get an ALL instead.
Oh, and one last thing. I meant my comments to address why MBLR might not have seen the same plan. Maybe he used * or did not have a covering index.