INT and String data comparison, difference in performance because of quotes
In the last post choosing about the right type; there is a case about quoting the tuple values; that I forgot to mention which is pretty much a common mistake when string data types are used for storing int or float/double representation (well sometimes you need to use string due to length or to avoid precision loss); and queries associated with that column does not quote the data to be string when searching…
In the same example; client_id was declared as VARCHAR(255); so without any quotes searching on client_id takes 11 secs:
mysql> explain SELECT SQL_NO_CACHE channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = 1301 GROUP BY client_id, channel; +----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+--------------------------+ | 1 | SIMPLE | xxx_sources | index | idx_client_channel | idx_client_channel | 1032 | NULL | 20207319 | Using where; Using index | +----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+--------------------------+ 1 row in set (0.00 sec) mysql> SELECT SQL_NO_CACHE channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = 1301 GROUP BY client_id, channel; +---------+----------+ | channel | visitors | +---------+----------+ | NULL | 0 | +---------+----------+ 1 row in set (11.69 sec) |
But if you quote client_id in the search part(client_id=’1301′); then things will run much faster (0.25sec as opposed to 11.69sec) as it does not need to do the conversion, and even the plan uses the direct const checking:
mysql> explain SELECT SQL_NO_CACHE channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = '1301' GROUP BY client_id, channel; +----+-------------+-------------+------+--------------------+--------------------+---------+-------+--------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------------+------+--------------------+--------------------+---------+-------+--------+--------------------------+ | 1 | SIMPLE | xxx_sources | ref | idx_client_channel | idx_client_channel | 258 | const | 457184 | Using where; Using index | +----+-------------+-------------+------+--------------------+--------------------+---------+-------+--------+--------------------------+ 1 row in set (0.00 sec) mysql> SELECT SQL_NO_CACHE channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = '1301' GROUP BY client_id, channel; +---------+----------+ | channel | visitors | +---------+----------+ | NULL | 0 | +---------+----------+ 1 row in set (0.25 sec) |
Same is the case and performance impact if data is quoted when searching on int/double/float columns. At times its worth to double check column data types and use the same notation when using them (with or without quotes)
Related posts:
11 Comments





I’m having trouble (re)producing the “Same is the case … if data is quoted when searching on int/double/float columns.” Version specific? Can you provide an example?
Comment :: April 20, 2010 @ 10:47 am
http://dev.mysql.com/doc/refman/5.1/en/using-explain.html#jointype_index
Comment :: April 20, 2010 @ 10:59 am
In short:
‘ref’ is a point lookup on the index using the provided CONST value.
‘index’ is a full-index scan which in this case can be used because the projected values are all contained within an index.
If this were a SELECT *, you would see ‘ALL’ instead of ‘index’.
Comment :: April 20, 2010 @ 11:01 am
I meant to say prefix lookup, not point lookup. There appears to be a covering index on (client_id, channel).
Comment :: April 20, 2010 @ 11:03 am
Justin
Yes, anyone looking at the plan can guess that without quotes is doing a scan and when it comes to quotes its avoiding that.
I still think (did not stepped through the code to see); sometimes its worth optimizer make a smart choice by converting tuple to column type one-time and doing the compare; so that there is no impact on performance (cases like this); but currently its doing it in reverse way (converting table data to the format of tuple); and that is what is causing the impact.
Comment :: April 20, 2010 @ 12:28 pm
Hi!
“well sometimes you need to use string due to length or to avoid precision loss” ??? Say what?!!
For numeric data, I’d say you should use a DECIMAL, not a string type.
DECIMAL preserves precision. There is a limit though – If I remember correctly, you can safely use 72 digits total (that’s total adding before and after the comma)
Comment :: April 20, 2010 @ 3:25 pm
Echoing Roland’s words,
Also noting that DECIMAL has a much more compact storage than VARCHAR.
Comment :: April 20, 2010 @ 8:44 pm
Venu,
My point was that the performance could be MUCH worse if there was not a covering index, particularly if this table has lots of columns. You are sequentially scanning the index when the type=’index’, but if you changed SELECT clause to * you would get an ‘ALL’ which is MUCH worse.
You basically have the best-case scenario represented here, when in fact often performance could be much worse.
You would see the same plan and exact same performance for:
select channel from xxx_sources
or
select channel from xxx_sources where upper(channel) = ‘XYZ’;
Comment :: April 21, 2010 @ 11:08 am
You also get lucky with the covering index because you avoid filesort for the GROUP BY.
Comment :: April 21, 2010 @ 11:09 am
I should have said client_id not channel, since client_id is the prefix. index won’t be used for compound index (client_id, channel) if you project out channel. You’ll get an ALL instead.
Comment :: April 21, 2010 @ 11:11 am
Oh, and one last thing. I meant my comments to address why MBLR might not have seen the same plan. Maybe he used * or did not have a covering index.
Comment :: April 21, 2010 @ 11:13 am