INT and String data comparison, difference in performance because of quotes

In the last post choosing about the right type; there is a case about quoting the tuple values; that I forgot to mention which is pretty much a common mistake when string data types are used for storing int or float/double representation (well sometimes you need to use string due to length or to avoid precision loss); and queries associated with that column does not quote the data to be string when searching…

In the same example; client_id was declared as VARCHAR(255); so without any quotes searching on client_id takes 11 secs:

mysql> explain SELECT SQL_NO_CACHE channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = 1301 GROUP BY client_id, channel;
+----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+--------------------------+
| id | select_type | table       | type  | possible_keys      | key                | key_len | ref  | rows     | Extra                    |
+----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+--------------------------+
|  1 | SIMPLE      | xxx_sources | index | idx_client_channel | idx_client_channel | 1032    | NULL | 20207319 | Using where; Using index | 
+----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+--------------------------+
1 row in set (0.00 sec)
mysql> SELECT SQL_NO_CACHE channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = 1301 GROUP BY client_id, channel;
+---------+----------+
| channel | visitors |
+---------+----------+
| NULL    |        0 | 
+---------+----------+
1 row in set (11.69 sec)

mysql> explain SELECT SQL_NO_CACHE channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = 1301 GROUP BY client_id, channel;

+----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+--------------------------+

+----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+--------------------------+

+----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+--------------------------+

1 row in set (0.00 sec)

mysql> SELECT SQL_NO_CACHE channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = 1301 GROUP BY client_id, channel;

+---------+----------+

| channel | visitors |

+---------+----------+

| NULL | 0 |

+---------+----------+

1 row in set (11.69 sec)

But if you quote client_id in the search part(client_id=’1301′); then things will run much faster (0.25sec as opposed to 11.69sec) as it does not need to do the conversion, and even the plan uses the direct const checking:

mysql> explain SELECT SQL_NO_CACHE channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = '1301' GROUP BY client_id, channel;
+----+-------------+-------------+------+--------------------+--------------------+---------+-------+--------+--------------------------+
| id | select_type | table       | type | possible_keys      | key                | key_len | ref   | rows   | Extra                    |
+----+-------------+-------------+------+--------------------+--------------------+---------+-------+--------+--------------------------+
|  1 | SIMPLE      | xxx_sources | ref  | idx_client_channel | idx_client_channel | 258     | const | 457184 | Using where; Using index | 
+----+-------------+-------------+------+--------------------+--------------------+---------+-------+--------+--------------------------+
1 row in set (0.00 sec)
mysql> SELECT SQL_NO_CACHE channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = '1301' GROUP BY client_id, channel;
+---------+----------+
| channel | visitors |
+---------+----------+
| NULL    |        0 | 
+---------+----------+
1 row in set (0.25 sec)

mysql> explain SELECT SQL_NO_CACHE channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = '1301' GROUP BY client_id, channel;

+----+-------------+-------------+------+--------------------+--------------------+---------+-------+--------+--------------------------+

+----+-------------+-------------+------+--------------------+--------------------+---------+-------+--------+--------------------------+

+----+-------------+-------------+------+--------------------+--------------------+---------+-------+--------+--------------------------+

1 row in set (0.00 sec)

mysql> SELECT SQL_NO_CACHE channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = '1301' GROUP BY client_id, channel;

+---------+----------+

| channel | visitors |

+---------+----------+

| NULL | 0 |

+---------+----------+

1 row in set (0.25 sec)

Same is the case and performance impact if data is quoted when searching on int/double/float columns. At times its worth to double check column data types and use the same notation when using them (with or without quotes)

Tags: how to write effficient queries mysql query optimization mysql slow query optimization quoting mysql data during search string int comparision using string for int data types

14 Comments

MLBR says:

April 20, 2010 at 10:47 am

I’m having trouble (re)producing the “Same is the case … if data is quoted when searching on int/double/float columns.” Version specific? Can you provide an example?
Justin Swanhart says:

April 20, 2010 at 10:59 am

http://dev.mysql.com/doc/refman/5.1/en/using-explain.html#jointype_index
Justin Swanhart says:

April 20, 2010 at 11:01 am

In short:
‘ref’ is a point lookup on the index using the provided CONST value.

‘index’ is a full-index scan which in this case can be used because the projected values are all contained within an index.

If this were a SELECT *, you would see ‘ALL’ instead of ‘index’.
Justin Swanhart says:

April 20, 2010 at 11:03 am

I meant to say prefix lookup, not point lookup. There appears to be a covering index on (client_id, channel).
Venu Anuganti says:

April 20, 2010 at 12:28 pm

Justin

Yes, anyone looking at the plan can guess that without quotes is doing a scan and when it comes to quotes its avoiding that.

I still think (did not stepped through the code to see); sometimes its worth optimizer make a smart choice by converting tuple to column type one-time and doing the compare; so that there is no impact on performance (cases like this); but currently its doing it in reverse way (converting table data to the format of tuple); and that is what is causing the impact.
Roland Bouman says:

April 20, 2010 at 3:25 pm

Hi!

“well sometimes you need to use string due to length or to avoid precision loss” ??? Say what?!!

For numeric data, I’d say you should use a DECIMAL, not a string type.
DECIMAL preserves precision. There is a limit though – If I remember correctly, you can safely use 72 digits total (that’s total adding before and after the comma)
Zuissi says:

April 20, 2010 at 4:31 pm

MySQL: INT and String data comparison, difference in performance because of quotes: In the last post choosing abou… http://bit.ly/cNYWDa
planetmysql says:

April 20, 2010 at 4:32 pm

INT and String data comparison, difference in performance because of quotes: In the last post choosing about the r… http://bit.ly/9KzNGA
ProgramaciÃ³n Web says:

April 20, 2010 at 4:51 pm

INT and String data comparison, difference in performance because of quotes http://goo.gl/fb/j8Cex #database #mysql
Shlomi Noach says:

April 20, 2010 at 8:44 pm

Echoing Roland’s words,

Also noting that DECIMAL has a much more compact storage than VARCHAR.
Justin Swanhart says:

April 21, 2010 at 11:08 am

Venu,

My point was that the performance could be MUCH worse if there was not a covering index, particularly if this table has lots of columns. You are sequentially scanning the index when the type=’index’, but if you changed SELECT clause to * you would get an ‘ALL’ which is MUCH worse.

You basically have the best-case scenario represented here, when in fact often performance could be much worse.

You would see the same plan and exact same performance for:
select channel from xxx_sources

or

select channel from xxx_sources where upper(channel) = ‘XYZ’;
Justin Swanhart says:

April 21, 2010 at 11:09 am

You also get lucky with the covering index because you avoid filesort for the GROUP BY.
Justin Swanhart says:

April 21, 2010 at 11:11 am

I should have said client_id not channel, since client_id is the prefix. index won’t be used for compound index (client_id, channel) if you project out channel. You’ll get an ALL instead.
Justin Swanhart says:

April 21, 2010 at 11:13 am

Oh, and one last thing. I meant my comments to address why MBLR might not have seen the same plan. Maybe he used * or did not have a covering index.

INT and String data comparison, difference in performance because of quotes

14 Comments

Leave a Reply Cancel reply

Recent Posts

Categories

Recent Comments

Archives

Share this:

Related posts:

14 Comments

Leave a Reply Cancel reply

Recent Posts

Categories

Recent Comments

Archives