It is bit panic to see ‘Using filesort‘ from the extra field when one runs a explain of select query on MySQL server. At times it is bit annoy why MySQL optimizer does not avoid this as you can see from the following cases…

This is the information (Extra field) that scares a lot for many users from Explain output:

1
2
3
4
5
6
7
8
9
10
          id: 1
  select_type: SIMPLE
        table: t1
         type: index
possible_keys: NULL
          key: col1key
      key_len: 5
          ref: NULL
         rows: 1000000
        Extra: Using index; Using temporary; Using filesort

Lets consider the following simple table (MyISAM in this case, does not matter what storage engine it is) populated with distinct random 1000000 (1M) rows:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
Create Table: CREATE TABLE `testtab` (
  `col1` int(11) DEFAULT NULL,
  `col2` text,
  `col3` int(11) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`col3`),
  KEY `col1key` (`col1`)
) ENGINE=MyISAM
 
mysql> select count(*) from testtab;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (0.00 sec)
 
mysql> show table status like 'testtab'\G
*************************** 1. row ***************************
           Name: testtab
         Engine: MyISAM
        Version: 10
     Row_format: Dynamic
           Rows: 1000000
 Avg_row_length: 88
    Data_length: 88000000
Max_data_length: 281474976710655
   Index_length: 26619904
      Data_free: 0
 Auto_increment: 1000001
    Create_time: 2007-11-29 20:32:19
    Update_time: 2007-11-29 20:35:03
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.00 sec)

And lets run a explain for a simple select query from the above table:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
mysql> explain select * from testtab where col1 > 50000 AND col1 < 150000 order by col3\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: testtab
         type: range
possible_keys: col1key
          key: col1key
      key_len: 5
          ref: NULL
         rows: 91609
        Extra: Using where; Using filesort
1 row in set (0.00 sec)
 
mysql> EXPLAIN SELECT * FROM testtab WHERE col1 > 50000  AND col3 < 20000  ORDER BY col3\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: testtab
         type: range
possible_keys: PRIMARY,col1key
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 21209
        Extra: Using where
1 row in set (0.00 sec)
 
mysql> EXPLAIN SELECT * FROM testtab WHERE col3 < 20000 AND col1 > 50000  ORDER BY col3\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: testtab
         type: range
possible_keys: PRIMARY,col1key
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 21209
        Extra: Using where
1 row in set (0.00 sec)

The only difference from the first query (which has filesort) to that of other two are having the col3 in the where clause which appears in the ORDER BY clause…. and another example…

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
mysql> EXPLAIN SELECT * FROM testtab WHERE col3 < 20000 order by col3\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: testtab
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 21209
        Extra: Using where
1 row in set (0.00 sec)
 
mysql> EXPLAIN SELECT * FROM testtab WHERE col3 < 20000 order by col1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: testtab
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 21209
        Extra: Using where; Using filesort
1 row in set (0.00 sec)
 
mysql> EXPLAIN SELECT * FROM testtab WHERE col1 < 20000 order by col1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: testtab
         type: range
possible_keys: col1key
          key: col1key
      key_len: 5
          ref: NULL
         rows: 19892
        Extra: Using where
1 row in set (0.00 sec)
 
mysql> EXPLAIN SELECT * FROM testtab WHERE col1 < 20000 order by col3\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: testtab
         type: range
possible_keys: col1key
          key: col1key
      key_len: 5
          ref: NULL
         rows: 19892
        Extra: Using where; Using filesort
1 row in set (0.00 sec)

The above indicates that any column which appears in the ORDER BY clause should also appear in WHERE clause in order to avoid filesort. Consider yet another query for the same table using a self join this time…

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
mysql> explain select t1.col1, t2.col2 from testtab t1 left outer join testtab t2 on (t1.col1=t2.col1) order by t2.col1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: index
possible_keys: NULL
          key: col1key
      key_len: 5
          ref: NULL
         rows: 1000000
        Extra: Using index; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: ref
possible_keys: col1key
          key: col1key
      key_len: 5
          ref: test.t1.col1
         rows: 11
        Extra:
2 rows in set (0.00 sec)
 
mysql> explain select t1.col1, t2.col2 from testtab t1 left outer join testtab t2 on (t1.col1=t2.col1) order by t1.col1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: index
possible_keys: NULL
          key: col1key
      key_len: 5
          ref: NULL
         rows: 1000000
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: ref
possible_keys: col1key
          key: col1key
      key_len: 5
          ref: test.t1.col1
         rows: 11
        Extra:
2 rows in set (0.00 sec)

The only difference from the above case is “Order by column is from the left side of the join clause”… and here is yet another example..

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
mysql> explain select t1.col1, t2.col2 from testtab t1 left outer join testtab t2 on (t1.col1=t2.col1) order by t2.col1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: index
possible_keys: NULL
          key: col1key
      key_len: 5
          ref: NULL
         rows: 1000000
        Extra: Using index; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: ref
possible_keys: col1key
          key: col1key
      key_len: 5
          ref: test.t1.col1
         rows: 11
        Extra:
2 rows in set (0.00 sec)
 
mysql> explain select t1.col1, t2.col2 from testtab t1 right outer join testtab t2 on (t1.col1=t2.col1) order by t2.col1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1000000
        Extra: Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: ref
possible_keys: col1key
          key: col1key
      key_len: 5
          ref: test.t2.col1
         rows: 11
        Extra: Using index
2 rows in set (0.00 sec)
mysql> explain select t1.col1, t2.col2 from testtab t2 right outer join testtab t1 on (t1.col1=t2.col1) order by t1.col1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: index
possible_keys: NULL
          key: col1key
      key_len: 5
          ref: NULL
         rows: 1000000
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: ref
possible_keys: col1key
          key: col1key
      key_len: 5
          ref: test.t1.col1
         rows: 11
        Extra:
2 rows in set (0.00 sec)
mysql> explain select t1.col1, t2.col2 from testtab t1 left outer join testtab t2 on (t1.col1=t2.col1) order by t1.col1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: index
possible_keys: NULL
          key: col1key
      key_len: 5
          ref: NULL
         rows: 1000000
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: ref
possible_keys: col1key
          key: col1key
      key_len: 5
          ref: test.t1.col1
         rows: 11
        Extra:
2 rows in set (0.00 sec)

So, the conclussion:

  1. You can avoid the filesort by making order by column appear in the where clause
  2. When using join, make sure the left side join table column is used in the ORDER BY clause or change the join type

There are so many other ways to avoid, and the complete order by optimization can be found from the MySQL documentation.