Since few days I was debugging a case from Yahoo! Japan team (thanks to Akira Kitada) where the query runs fine in about 0 secs with the 5.1.16 build where as the same query takes almost close to 2 minutes with the latest 5.1.22 release.

One can simply identify that this is a optimizer issue as it is not picking the right index. Lets consider the following two tables urls and urls_categories

| urls  | CREATE TABLE `urls` (
  `url_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `url` varbinary(2000) NOT NULL,
  `url_status` enum('EXISTS','DEAD','N/A') NOT NULL,
  `yservice_id` smallint(5) unsigned DEFAULT NULL,
  `created_by` smallint(5) unsigned NOT NULL,
  `modified_by` smallint(5) unsigned NOT NULL,
  `created_on` datetime NOT NULL,
  `modified_on` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `nocheck` tinyint(1) NOT NULL DEFAULT '0',
  `url_md5` char(32) NOT NULL,
  PRIMARY KEY (`url_id`),
  UNIQUE KEY `url_unique` (`url_md5`),
  KEY `modified_on` (`modified_on`),
  KEY `url` (`url`(50)),
  KEY `yservice_id` (`yservice_id`),
  KEY `modified_by` (`modified_by`),
  KEY `created_by` (`created_by`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
 
 
| urls_categories | CREATE TABLE `urls_categories` (
  `url_id` int(10) unsigned NOT NULL,
  `category_id` tinyint(3) unsigned NOT NULL,
  `workflow` enum('NEW','ACTIVATED','DEACTIVATED','REMOVED') NOT NULL,
  `searchtype_id` tinyint(3) unsigned NOT NULL,
  `work_id` tinyint(3) unsigned NOT NULL,
  `created_by` smallint(5) unsigned NOT NULL,
  `modified_by` smallint(5) unsigned NOT NULL,
  `created_on` datetime NOT NULL,
  `modified_on` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `comment` varchar(200) NOT NULL,
  PRIMARY KEY (`url_id`,`category_id`),
  KEY `search_index` (`modified_on`,`workflow`,`category_id`),
  KEY `rd_index` (`url_id`,`category_id`,`workflow`),
  KEY `category_id` (`category_id`),
  KEY `work_id` (`work_id`),
  KEY `modified_by` (`modified_by`),
  KEY `created_by` (`created_by`),
  KEY `searchtype_id` (`searchtype_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |

And here is the explain output for a simple inner join query when ran with MySQL version 5.1.16:

mysql> explain select SQL_NO_CACHE * from urls u inner join urls_categories uc  using (url_id) order by uc.modified_on limit 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: uc
         type: index
possible_keys: PRIMARY,rd_index
          key: search_index
      key_len: 6
          ref: NULL
         rows: 3121058
        Extra:
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: u
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: test.uc.url_id
         rows: 1
        Extra:
2 rows in set (0.00 sec)

and here is what you get when ran with MySQL 5.1.22:

mysql> explain select SQL_NO_CACHE * from urls u inner join urls_categories uc  using (url_id) order by uc.modified_on limit 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: uc
         type: ALL
possible_keys: PRIMARY,rd_index
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3121045
        Extra: Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: u
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: test.uc.url_id
         rows: 1
        Extra:
2 rows in set (0.00 sec)

as you can cleary see, for urls_categories table, the key search_index is missing and optimizer is not picking that. Even if I force to use the key, still no effect:

mysql> explain select SQL_NO_CACHE * from urls u inner join urls_categories uc force key(search_index,PRIMARY,rd_index)  using (url_id) order by uc.modified_on limit 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: uc
         type: ALL
possible_keys: PRIMARY,rd_index
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3121045
        Extra: Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: u
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: test.uc.url_id
         rows: 1
        Extra:
2 rows in set (0.00 sec)
 
mysql> explain select SQL_NO_CACHE * from urls u inner join urls_categories uc force key(search_index)  using (url_id) order by uc.modified_on limit 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: uc
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3121045
        Extra: Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: u
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: test.uc.url_id
         rows: 1
        Extra:
2 rows in set (0.00 sec)

As a result of the above bad index use, the query takes about 1m 20 secs with 5.1.22 where as the same one runs in about 0.0 secs as shown below..

with 5.1.22:

mysql> select SQL_NO_CACHE * from urls u inner join urls_categories uc  using (url_id) order by uc.modified_on limit 10;
...
...
10 rows in set (1 min 14.55 sec)

with 5.1.16:

mysql> select SQL_NO_CACHE * from urls u inner join urls_categories uc  using (url_id) order by uc.modified_on limit 10;
...
...
10 rows in set (0.00 sec)

Here is the bug report to mysql team: MySQL 5.1 optimizer regression.

  • Print
  • Twitter
  • Facebook
  • PDF
  • Google Bookmarks

Related posts:

  1. MySQL – Simple ways to avoid filesort
  2. INT and String data comparison, difference in performance because of quotes
  3. Choosing the right data type makes a big difference
  4. MySQL 5.0 or 5.1 string to longlong regression