March 12, 2012

PostgreSQL: Slony Replication Lag – Manual Log Tables Cleanup

If you are using Slony for PostgreSQL replication with more than one node (multiple replicas or cascading), you might occasionally see replication delays whenever any one node lags behind.

This usually happens when you add a new node, a big table or a large table set to an existing cluster, or when a node falls behind for some other reason, such as maintenance. Because of this, Slony keeps all log events (in sl_log_1 and sl_log_2) and can't rotate/truncate the log tables periodically as scheduled.

For optimal SYNC performance, keep sl_log_1 and sl_log_2 below roughly 5M rows each (depending on your environment). Otherwise each SYNC takes longer, and the time keeps growing as the log tables grow, eventually slowing everything down to the point where the subscribers can't keep up with the master node.

In general, each SYNC should take less than 1-2 seconds for optimal performance. For example, let's consider the following cluster setup (name: prodcluster) using PostgreSQL 8.4 + Slony 2.0.x:

4  - MASTER
6  - SUBSCRIBER FROM 4 + LOCAL EVENT FORWARD to 7
7  - SUBSCRIBER FROM 4, 6
13 - SUBSCRIBER FROM 4

So any lag on node 4 can make everything go stale. Here are example SYNC events at node 7, where each SYNC from node 4 takes ~10-11 seconds:

grep 'SYNC .* done' /var/log/slony/slon_log|tail
2012-03-11 10:05:13 CDTINFO   remoteWorkerThread_6: SYNC 5004941896 done in 0.024 seconds
2012-03-11 10:05:15 CDTINFO   remoteWorkerThread_6: SYNC 5004941897 done in 0.726 seconds
2012-03-11 10:05:21 CDTINFO   remoteWorkerThread_4: SYNC 5002699968 done in 11.095 seconds
2012-03-11 10:05:29 CDTINFO   remoteWorkerThread_6: SYNC 5004941898 done in 0.008 seconds
2012-03-11 10:05:32 CDTINFO   remoteWorkerThread_4: SYNC 5002699973 done in 10.681 seconds
2012-03-11 10:05:37 CDTINFO   remoteWorkerThread_6: SYNC 5004941899 done in 0.008 seconds
2012-03-11 10:05:42 CDTINFO   remoteWorkerThread_4: SYNC 5002699979 done in 10.590 seconds
2012-03-11 10:05:45 CDTINFO   remoteWorkerThread_6: SYNC 5004941900 done in 0.008 seconds
2012-03-11 10:05:53 CDTINFO   remoteWorkerThread_4: SYNC 5002699984 done in 10.682 seconds
2012-03-11 10:05:59 CDTINFO   remoteWorkerThread_6: SYNC 5004941901 done in 0.013 seconds
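To put numbers on the lag, here is a minimal Python sketch that groups the `SYNC … done in … seconds` lines by remote worker and flags origins whose average SYNC time exceeds the 1-2 second guideline. It assumes the log line format shown above; `slow_origins` and its 2-second default threshold are illustrative choices, not part of Slony.

```python
import re
from collections import defaultdict
from statistics import mean

# Matches e.g. "remoteWorkerThread_4: SYNC 5002699968 done in 11.095 seconds"
SYNC_RE = re.compile(r"remoteWorkerThread_(\d+): SYNC (\d+) done in ([\d.]+) seconds")


def sync_times_by_origin(lines):
    """Group SYNC durations (seconds) by the node number in remoteWorkerThread_N."""
    times = defaultdict(list)
    for line in lines:
        m = SYNC_RE.search(line)
        if m:
            times[int(m.group(1))].append(float(m.group(3)))
    return dict(times)


def slow_origins(lines, threshold_secs=2.0):
    """Return {origin: average SYNC seconds} for origins averaging above the threshold."""
    return {
        origin: round(mean(durations), 3)
        for origin, durations in sync_times_by_origin(lines).items()
        if mean(durations) > threshold_secs
    }


if __name__ == "__main__":
    sample = [
        "2012-03-11 10:05:13 CDTINFO   remoteWorkerThread_6: SYNC 5004941896 done in 0.024 seconds",
        "2012-03-11 10:05:21 CDTINFO   remoteWorkerThread_4: SYNC 5002699968 done in 11.095 seconds",
        "2012-03-11 10:05:32 CDTINFO   remoteWorkerThread_4: SYNC 5002699973 done in 10.681 seconds",
    ]
    print(slow_origins(sample))  # only node 4 exceeds the threshold
```

In practice the lines would come from the `grep 'SYNC .* done'` output rather than a hard-coded list.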

As you can see, each SYNC from node 4 takes ~10-11 seconds. Looking at the slon daemon log files on node 4, we can see that the logs are not getting truncated because of a locking issue (simulated here for demonstration; in the real world you normally run into rotation issues when slaves are lagging):

NOTICE:  Slony-I: could not lock sl_log_1 - sl_log_1 not truncated

So we need to find the root cause: why there is a pending lock, or why the events have not yet been consumed by the subscribers. The slon daemon logs are a good source of information for troubleshooting. An easier way to get around the issue is to delete old log rows for events that have already been SYNCed to all subscribers; this is especially important when adding a new node or big tables to the cluster. You can get the current SYNC information with the following query:

db1=# select ev_origin, ev_seqno, "pg_catalog".txid_snapshot_xmin(ev_snapshot)
      from _prodcluster.sl_event where (ev_origin, ev_seqno) in (select ev_origin, min(ev_seqno)
      from _prodcluster.sl_event where ev_type = 'SYNC' group by ev_origin);
 
 ev_origin |  ev_seqno  | txid_snapshot_xmin
-----------+------------+--------------------
         6 | 5004941853 |          302628906
         7 | 5000000122 |           66475860
         4 | 5002699761 |          277114491
        13 | 5000000010 |            1499335
(4 rows)
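The query above picks, for each origin, its oldest SYNC event still in sl_event. The following Python sketch mirrors that selection on in-memory rows so the logic is easy to follow; the `Event` class, the `deletable` helper and the second node 4 event are hypothetical (only the first node 4 row and the node 6 row come from the output above). The deletion rule compares `log_txid` against the event's snapshot xmin, on the assumption that both are transaction IDs (presumably why the query returns `txid_snapshot_xmin`), rather than against `ev_seqno`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ev_origin: int
    ev_seqno: int
    ev_type: str
    snapshot_xmin: int  # stands in for txid_snapshot_xmin(ev_snapshot)


def cleanup_cutoffs(events):
    """Per origin, the (ev_seqno, snapshot_xmin) of its oldest SYNC event."""
    oldest = {}
    for ev in events:
        if ev.ev_type != "SYNC":
            continue
        current = oldest.get(ev.ev_origin)
        if current is None or ev.ev_seqno < current.ev_seqno:
            oldest[ev.ev_origin] = ev
    return {origin: (ev.ev_seqno, ev.snapshot_xmin) for origin, ev in oldest.items()}


def deletable(log_rows, cutoffs):
    """(log_origin, log_txid) rows older than their origin's cutoff xmin.

    log_txid is a transaction ID, so it is compared with the snapshot xmin,
    never with the event sequence number."""
    return [
        (origin, txid)
        for origin, txid in log_rows
        if origin in cutoffs and txid < cutoffs[origin][1]
    ]


if __name__ == "__main__":
    events = [
        Event(4, 5002699761, "SYNC", 277114491),
        Event(4, 5002699800, "SYNC", 277200000),  # hypothetical newer SYNC
        Event(6, 5004941853, "SYNC", 302628906),
    ]
    cutoffs = cleanup_cutoffs(events)
    print(cutoffs[4])  # (5002699761, 277114491)
    print(deletable([(4, 277114000), (4, 277300000)], cutoffs))  # [(4, 277114000)]
```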

This means every event from node 4 with a sequence number below 5002699761 has been confirmed by all subscribers, so the log rows belonging to those events, if any remain, can be safely deleted from the Slony log tables:

db1=# select count(1) from _prodcluster.sl_log_1;
count
---------
8061398
(1 row)
 
db1=# select count(1) from _prodcluster.sl_log_2;
count
---------
6415322
(1 row)
 
db1=# select count(1) from _prodcluster.sl_log_1 where log_origin=4 and log_txid < 5002699761;
count
---------
8061398
(1 row)
 
db1=# select count(1) from _prodcluster.sl_log_2 where log_origin=4 and log_txid < 5002699761;
count
---------
6433414
(1 row)

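That is essentially all of the rows in both tables (the tables were still growing between queries), and they can be deleted and vacuumed. Be aware that log_txid is a transaction ID while 5002699761 is an event sequence number, so this cutoff matches every row from origin 4; comparing log_txid against the snapshot xmin returned by the query above (277114491) is the tighter, safer cutoff. Deleting and vacuuming: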

db1=# delete from _prodcluster.sl_log_1 where log_origin=4 and log_txid < 5002699761;
DELETE 8061398
db1=# delete from _prodcluster.sl_log_2 where log_origin=4 and log_txid < 5002699761;
DELETE 6533972
db1=# vacuum _prodcluster.sl_log_1;
VACUUM
db1=# vacuum _prodcluster.sl_log_2;
VACUUM

Immediately, SYNC performs much better, and the slave catches up quickly, at least until the log tables start growing again (if they still can't be rotated and truncated):

grep 'SYNC .* done' /var/log/slony/slon_log|tail
2012-03-11 10:45:01 CDTINFO   remoteWorkerThread_4: SYNC 5002701126 done in 1.294 seconds
2012-03-11 10:45:04 CDTINFO   remoteWorkerThread_6: SYNC 5004942153 done in 0.010 seconds
2012-03-11 10:45:06 CDTINFO   remoteWorkerThread_4: SYNC 5002701128 done in 1.583 seconds
2012-03-11 10:45:10 CDTINFO   remoteWorkerThread_4: SYNC 5002701130 done in 1.530 seconds
2012-03-11 10:45:15 CDTINFO   remoteWorkerThread_4: SYNC 5002701133 done in 1.453 seconds
2012-03-11 10:45:16 CDTINFO   remoteWorkerThread_6: SYNC 5004942154 done in 0.010 seconds
2012-03-11 10:45:19 CDTINFO   remoteWorkerThread_4: SYNC 5002701135 done in 1.291 seconds
2012-03-11 10:45:23 CDTINFO   remoteWorkerThread_4: SYNC 5002701137 done in 1.269 seconds
2012-03-11 10:45:26 CDTINFO   remoteWorkerThread_6: SYNC 5004942155 done in 0.010 seconds
2012-03-11 10:45:27 CDTINFO   remoteWorkerThread_4: SYNC 5002701139 done in 1.270 seconds

It's one easy way to get around the problem, especially when you add a new node to the cluster or have a lot of backlog (it's part of what cleanupEvent() does).

December 2, 2010

MySQL At Scale – Zynga Games

I recently joined Zynga's database team, as I was very impressed with the company's database usage. Everyone knows how popular Zynga games like Farmville, Cafe World, Mafia Wars, Poker, FrontierVille, FishVille, PetVille and Treasure Island are. Zynga launched yet another new game today, CityVille, along with a series of acquisitions (the latest, today, is NewToy). You can find current Zynga game stats on appdata.

But a lot of people have asked me why I joined Zynga's database team when no MySQL is used by any of the games, and a lot of articles on the web say the same. For me it does not matter whether it is MySQL, NoSQL or any other system, as long as the data store helps scale the systems (in this case, games).

As a consultant, I help a lot of companies scale using NoSQL systems in addition to MySQL, especially for large data handling, as the data store should help the systems scale to deliver the desired results. In particular, MySQL should be used for typical OLTP workloads, and a combination of MySQL and NoSQL or other data warehouse clusters for analytics and/or OLAP workloads, together with the right application and caching components based on the business model and on how the data is generated, stored, accessed and processed.

If you don't use the right technology for what you are trying to achieve, you can't easily scale, and you end up spending your time fixing performance and scalability issues day to day rather than building the features the business demands.

As a matter of fact, Zynga may be the second-largest MySQL user after Facebook. All Zynga games are currently powered by MySQL as the backend storage, with memcache as the middle caching layer.

Last month we expanded the MySQL shards for one of the popular games due to increased DAU (daily active users), and the whole expansion happened in production without any downtime or taking the game down. That is only possible if the application code is tightly integrated with the caching and backend storage, and if the servers are in the cloud and elastic in nature (unless you have your own private cloud).

July 19, 2010

Random Pauses In MySQL – File Handle Serialization

Last month, I blogged about a case involving InnoDB where all threads acting on InnoDB tables were completely stuck for a few hours doing nothing, until we found a way to get around it and make the threads run and do the actual work.

There are a few more cases where the server can pause without doing anything for a brief time, irrespective of storage engine (well, sort of). The server is still active, but it is blocked on certain conditions and almost every other thread has to wait, mostly because of serialization.

Here is one such case, involving the THR_LOCK_open mutex, which is mainly used when opening and closing file handles. When a table has been flushed or is not in the table cache, the server opens it and caches it (in the open_cache structure) so that subsequent operations can reuse it without reopening. Even if the table is InnoDB or any other storage engine type, the .frm file has to go through the same mechanism. And not just regular tables: implicit file-based temporary tables, all SELECT … INTO OUTFILE statements and LOAD DATA LOCAL use the same lock to open and close files.

For example, here is an aggregated thread dump from a live server, where most of the busy threads are blocked behind THR_LOCK_open:

    129 
     71 pthread_cond_wait@@GLIBC_2.3.2,end_thread,handle_one_connection,start_thread,clone
     25 read,vio_read,my_real_read,my_net_read,handle_one_connection,start_thread,clone
     14 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,os_aio_simulated_handle,fil_aio_wait,io_handler_thread,start_thread,clone
      5 __lll_lock_wait,_L_lock_1233,pthread_mutex_lock,open_table,open_tables,open_and_lock_tables,mysql_execute_command,Prepared_statement::execute,mysql_stmt_execute,dispatch_command,handle_one_connection,start_thread,clone
      1 select,os_thread_sleep,srv_lock_timeout_and_monitor_thread,start_thread,clone
      1 select,os_thread_sleep,srv_error_monitor_thread,start_thread,clone
      1 select,handle_connections_sockets,main
      1 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,srv_master_thread,start_thread,clone
      1 __lll_lock_wait,_L_lock_1233,pthread_mutex_lock,open_table,open_tables,open_and_lock_tables,Prepared_statement::prepare,mysql_stmt_prepare,dispatch_command,handle_one_connection,start_thread,clone
      1 __lll_lock_wait,_L_lock_1233,pthread_mutex_lock,my_close,openfrm,open_unireg_entry,open_table,open_tables,open_and_lock_tables,select_like_stmt_test_with_open_n_lock,Prepared_statement::prepare,mysql_stmt_prepare,dispatch_command,handle_one_connection,start_thread,clone
      1 __lll_lock_wait,_L_lock_1233,pthread_mutex_lock,my_close,MYSQL_LOG::close,MYSQL_LOG::new_file,rotate_relay_log,process_io_rotate,queue_event,handle_slave_io,start_thread,clone
      1 __lll_lock_wait,_L_lock_1233,pthread_mutex_lock,my_close,mi_close,free_tmp_table,close_thread_tables,dispatch_command,handle_one_connection,start_thread,clone
      1 __lll_lock_wait,_L_lock_1233,pthread_mutex_lock,close_thread_tables,Query_log_event::exec_event,handle_slave_sql,start_thread,clone
      1 __lll_lock_wait,_L_lock_1233,pthread_mutex_lock,close_thread_tables,Prepared_statement::cleanup_stmt,Prepared_statement::execute,mysql_stmt_execute,dispatch_command,handle_one_connection,start_thread,clone
      1 do_sigwait,sigwait,signal_hand,start_thread,clone
      1 close,my_close,select_to_file::send_eof,do_select,JOIN::exec,mysql_select,handle_select,mysql_execute_command,Prepared_statement::execute,mysql_stmt_execute,dispatch_command,handle_one_connection,start_thread,clone

In the stack traces above, 11 threads are blocked in open or close paths, waiting directly or indirectly on THR_LOCK_open, because the last thread holds the lock while it closes the OUTFILE file descriptor, which takes time. The following thread states in SHOW PROCESSLIST are a good indication of this symptom:

  • opening tables
  • opening table
  • closing tables
  • query end
  • init
  • end
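The dump above is in an aggregated "thread count, then comma-separated frames" form. A small Python sketch (the helper names are hypothetical) can total the threads sitting in `__lll_lock_wait` and show which function each group was calling `pthread_mutex_lock` from:

```python
from collections import Counter


def _split(line):
    """Split an aggregated line into (thread count, list of frames), or None."""
    parts = line.split(None, 1)
    if len(parts) != 2 or not parts[0].isdigit():
        return None
    return int(parts[0]), parts[1].strip().split(",")


def blocked_threads(dump_lines, wait_frame="__lll_lock_wait"):
    """Total threads whose innermost frame is a mutex wait."""
    total = 0
    for line in dump_lines:
        parsed = _split(line)
        if parsed and parsed[1][0] == wait_frame:
            total += parsed[0]
    return total


def waiters_by_caller(dump_lines, wait_frame="__lll_lock_wait"):
    """Count mutex waiters by the function that called pthread_mutex_lock."""
    callers = Counter()
    for line in dump_lines:
        parsed = _split(line)
        if not parsed or parsed[1][0] != wait_frame:
            continue
        count, frames = parsed
        if "pthread_mutex_lock" in frames:
            i = frames.index("pthread_mutex_lock")
            if i + 1 < len(frames):
                callers[frames[i + 1]] += count
    return dict(callers)
```

Fed every line of the dump above, `blocked_threads` adds up to 11, the figure quoted in the text, and `waiters_by_caller` separates the open_table waiters from the my_close and close_thread_tables ones.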

There is an open scalability bug on this, and hopefully future versions of MySQL will address it. The best fix would be to split this into separate mutexes for regular tables, implicit temporary tables and input/output files, so that regular table operations are not blocked by temporary operations, or to maintain a separate fd list and use atomic operations to toggle its state, removing this lock from regular open/close operations. The same symptom also shows up when deleting a large table or using large IO buffering.
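The lock-splitting idea can be sketched in Python: one mutex per file category instead of a single global lock, so a slow close of an OUTFILE only blocks other I/O-file operations. This is a toy model; `StripedFileRegistry` and `FileKind` are hypothetical names, not MySQL code.

```python
import threading
from enum import Enum


class FileKind(Enum):
    REGULAR_TABLE = "regular"
    TEMP_TABLE = "temp"
    IO_FILE = "io"  # SELECT ... INTO OUTFILE, LOAD DATA


class StripedFileRegistry:
    """Toy registry of open file descriptors with one mutex per file category."""

    def __init__(self):
        self._locks = {kind: threading.Lock() for kind in FileKind}
        self._open = {kind: set() for kind in FileKind}

    def open(self, kind, fd):
        with self._locks[kind]:
            self._open[kind].add(fd)

    def close(self, kind, fd, slow_close=None):
        # Only this category's mutex is held while the (possibly slow) close runs.
        with self._locks[kind]:
            if slow_close is not None:
                slow_close()
            self._open[kind].discard(fd)

    def is_open(self, kind, fd):
        with self._locks[kind]:
            return fd in self._open[kind]

    def is_locked(self, kind):
        return self._locks[kind].locked()
```

With a single shared lock, opening a regular table while an OUTFILE close is in progress would have to wait; here it proceeds, which is the point of splitting the lock.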

June 23, 2010

How read_buffer_size Impacts Write Buffering and Write Performance

Even though the name read_buffer_size implies that the variable controls only read buffering, it actually serves a dual purpose, providing sequential IO buffering for both reads and writes.

For write buffering, sequential writes are grouped in a buffer of read_buffer_size bytes (with a minimum of about 8K), and the physical write happens only once the buffer is full. In most cases, the size used is the value read_buffer_size had when the server started: although it is a dynamic global variable, changing it at run time does not affect the write buffer size (and in some cases the read buffer size as well), because the value is stored once in my_default_record_cache_size (might be a bug?), and that variable is used to initialize IO cache buffers.

Here are some cases where read_buffer_size is actually used to buffer writes:

  • SELECT … INTO OUTFILE 'fileName'
    • Writes to the OUTFILE are buffered before being written to the file
  • When filesort is used, the merged results written to a temporary file while merging buffers are buffered

You will normally see a performance boost from this buffering on slow disks or when IO is saturated, but it does not matter much when you have decent disks or a RAID controller with a battery-backed (BBU) write cache enabled.

Here are some stats on how many physical writes are actually issued for a simple SELECT … INTO OUTFILE (file size 384936838 bytes) with different read_buffer_size values (the server must be restarted to pick up the new value):

read_buffer_size          physical writes   exec time (secs)
0 (defaults to 8200)                23495              28.39
131072 (default)                     2937              27.44
16777216                               23              26.71
33554432                               12              26.00
536870912                               1              26.72

The physical writes were counted with a simple patch I wrote around mysys/my_write.c that exposes them as a global status counter, Write_count:

mysql> show global status like 'Write_count';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Write_count   | 23496 |
+---------------+-------+
1 row in set (0.00 sec)
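The write counts in the table follow from ceiling division of the file size by the effective buffer size. A minimal Python check, assuming each physical write flushes one full buffer; the 16384-byte figure for the read_buffer_size=0 row is an assumption (the 8200-byte minimum rounded up to a multiple of 8192), chosen because it reproduces the measured 23495:

```python
FILE_SIZE = 384_936_838  # bytes written by the SELECT ... INTO OUTFILE test above


def physical_writes(file_size, buffer_size):
    """Writes needed if every write flushes one full buffer, plus one for the remainder."""
    return -(-file_size // buffer_size)  # ceiling division


# Effective buffer sizes for the five table rows (16384 is the assumed rounding of 8200).
EFFECTIVE_BUFFERS = [16_384, 131_072, 16_777_216, 33_554_432, 536_870_912]

if __name__ == "__main__":
    for size in EFFECTIVE_BUFFERS:
        print(f"{size:>11} bytes -> {physical_writes(FILE_SIZE, size)} writes")
```

With a literal 8200-byte buffer, the same arithmetic gives 46944 writes, about twice what was measured, which is why the rounding assumption is needed for the first row.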

As you can see, increasing read_buffer_size can reduce the total number of physical writes and might help to some extent if you have a lot of OUTFILEs or heavy file sorting (though execution time barely changes in the table above). But it can also hurt overall performance because of one of the known bugs, and the buffer is allocated per query, so be careful: allocating and initializing big buffers can cost much more than the real IO.

Either way, it may be worth having read_buffer_size control only sequential read buffering and introducing a new write_buffer_size variable for write buffering, instead of using one variable for both.

May 11, 2010

MySQL Query Engine Scalability Issues

Lately, in the MySQL community, we only hear about scalability and performance improvements in storage engines, but nothing about the query engine itself. The classic example is InnoDB: most of the scalability issues the community reported a year or even a few months ago have been fixed in the forthcoming MySQL 5.5 (or in Percona Server or the Facebook patches).

Even the new storage engines in development are all going to concentrate on the existing scalability issues common to any storage engine, and will address them before they reach beta or production-ready status.

But for the most part, we still have plenty of pending issues in the MySQL query engine, and storage engines can't address them unless the MySQL platform itself is redesigned so that everything is plug-and-play and a storage engine can override any behavior of the query engine (a bit like late binding). That is very unlikely, given the implementation issues of giving storage engines complete control over the query engine (for example, letting them do sorting, plan execution, caching, etc.), and it would also deviate from the main purpose of a storage engine.

Here are some known pending issues in the query engine that make it hard to scale when designing larger systems that demand high queries per second (QPS):

Negative Scalability of Buffers

A few global/session buffer variables show negative scalability (performance drops as the buffer grows), and there are a number of open bugs on this:

  • table_cache
  • read_buffer_size
  • sort_buffer_size
    • Recently there have been a number of community blog posts on sort_buffer_size scalability (here, here and here).

      The main problem is not the memory allocation itself but building the array of string pointers that holds the sorted data: this array initialization (make_char_array() in filesort.cc) is the bottleneck, because its cost depends directly on the buffer size. One option is to skip this char array initialization completely, or to initialize it dynamically as sorting proceeds (which might be a bit slower for small sort keys). InnoDB takes an especially big hit with a larger sort_buffer_size, because its estimated row count (estimate_rows_upper_bound()) is higher than the actual number of rows.

    • Here are sysbench results showing the negative scalability of sort_buffer_size for read-only queries (data completely in memory); throughput drops roughly 6x (16 threads) to 8x (32 threads) going from 2MB to 128MB:
      sort_buffer_size           threads=16 (txn/sec)   threads=32 (txn/sec)
      2097144 (2MB, default)                     6475                   6302
      6291456 (6MB)                              6123                   6018
      33554432 (32MB)                            3481                   1590
      67108864 (64MB)                            2084                   1085
      134217728 (128MB)                          1024                    748
    • If you have a slave server with no other workload, or a server with very few threads (fewer than 5-10), then none of these might cause any major performance impact.
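To see why the cost tracks the buffer size, here is a rough Python model of how many key pointers a filesort-style buffer would set up front: whatever fits in the buffer, capped by the row estimate. It is loosely modeled on the description above, not a copy of filesort.cc; the 8-byte pointer and the 24-byte record length are assumptions.

```python
POINTER_SIZE = 8  # bytes per key pointer on a 64-bit build (assumption)


def sort_key_slots(sort_buffer_size, rec_length, estimated_rows):
    """Key pointers set up front: what fits in the buffer, capped by the row estimate."""
    fits_in_buffer = sort_buffer_size // (rec_length + POINTER_SIZE)
    return min(fits_in_buffer, estimated_rows)


SORT_BUFFER_SIZES = [2_097_144, 6_291_456, 33_554_432, 67_108_864, 134_217_728]

if __name__ == "__main__":
    for size in SORT_BUFFER_SIZES:
        # An upper-bound estimate far above the real row count: the buffer decides.
        inflated = sort_key_slots(size, rec_length=24, estimated_rows=10**9)
        # An accurate estimate of 1000 rows: the estimate decides.
        accurate = sort_key_slots(size, rec_length=24, estimated_rows=1000)
        print(f"{size:>11} bytes: {inflated:>8} slots (inflated), {accurate} slots (accurate)")
```

With an accurate estimate the array stays at 1000 entries whatever the buffer size; with an inflated upper-bound estimate it grows linearly with sort_buffer_size (65535 entries at the 2MB default vs 4194304 at 128MB under these assumptions).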

Query Cache Scalability

On most of the servers I manage, the first thing I do is turn the query cache completely off (query_cache_size=0 and query_cache_type=0) unless the cache hit ratio is above 20%. Performance degrades sharply, especially with a larger cache size and more queries in the cache.
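One common way to compute the hit ratio is Qcache_hits / (Qcache_hits + Com_select), since SELECTs answered from the cache increment Qcache_hits rather than Com_select. A minimal Python sketch over `SHOW GLOBAL STATUS` values loaded into a dict; the 20% threshold comes from the rule of thumb above, and the function names are illustrative:

```python
def query_cache_hit_ratio(status):
    """Qcache_hits / (Qcache_hits + Com_select), from SHOW GLOBAL STATUS values."""
    hits = int(status["Qcache_hits"])
    misses = int(status["Com_select"])  # SELECTs that were not served from the cache
    total = hits + misses
    return hits / total if total else 0.0


def should_disable_query_cache(status, min_hit_ratio=0.20):
    """Rule of thumb: turn the cache off unless the hit ratio exceeds 20%."""
    return query_cache_hit_ratio(status) <= min_hit_ratio
```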

This is something MySQL should address, either with a relaxed caching model (entries expire automatically after x seconds instead of being invalidated on writes) or with a per-table caching model that avoids the contention overhead. Either way, the query cache needs a complete redesign.
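A relaxed, time-based model can be sketched in a few lines of Python: entries expire after a fixed TTL and writes invalidate nothing, so readers may see results up to `ttl_secs` old. `TTLQueryCache` is a hypothetical illustration; a real server cache would also need locking, which is exactly where the contention lives.

```python
import time


class TTLQueryCache:
    """Query results keyed by query text; entries expire after ttl_secs, writes never invalidate."""

    def __init__(self, ttl_secs, clock=time.monotonic):
        self.ttl = ttl_secs
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}  # query text -> (expires_at, result)

    def get(self, query):
        entry = self._entries.get(query)
        if entry is None:
            return None
        expires_at, result = entry
        if self.clock() >= expires_at:
            del self._entries[query]  # expired: drop it and report a miss
            return None
        return result

    def put(self, query, result):
        self._entries[query] = (self.clock() + self.ttl, result)
```

The trade-off is staleness in exchange for never having to scan and invalidate cached entries on every write to a table.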

Key Buffer Size Scalability

Even though the key buffer is used only by MyISAM, it still needs to be addressed, as many servers still use MyISAM where transactions are not required, because of its simple administration and its performance.

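It may be worth supporting multiple key cache instances, similar to the multiple buffer pools InnoDB introduces in 5.5 (named key caches assigned with CACHE INDEX already exist, but they have to be mapped to tables by hand), or even per-table cache buffers.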
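Here is a hypothetical Python sketch of a segmented key cache: blocks are spread over N independent LRU segments by hashing (file_id, block_no), each segment with its own lock, in the spirit of InnoDB's buffer pool instances. The names and the hashing scheme are illustrative, not MySQL's implementation.

```python
import threading
from collections import OrderedDict


class SegmentedKeyCache:
    """N independent LRU segments, each guarded by its own lock."""

    def __init__(self, segments, blocks_per_segment):
        self._segments = [OrderedDict() for _ in range(segments)]
        self._locks = [threading.Lock() for _ in range(segments)]
        self._capacity = blocks_per_segment

    def _index(self, file_id, block_no):
        return hash((file_id, block_no)) % len(self._segments)

    def get(self, file_id, block_no):
        i = self._index(file_id, block_no)
        key = (file_id, block_no)
        with self._locks[i]:
            seg = self._segments[i]
            if key not in seg:
                return None
            seg.move_to_end(key)  # mark as most recently used
            return seg[key]

    def put(self, file_id, block_no, data):
        i = self._index(file_id, block_no)
        key = (file_id, block_no)
        with self._locks[i]:
            seg = self._segments[i]
            seg[key] = data
            seg.move_to_end(key)
            if len(seg) > self._capacity:
                seg.popitem(last=False)  # evict the LRU block of this segment only
```

Threads touching blocks in different segments never contend for the same lock, which is the property a single global key cache mutex lacks.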

ON Duplicate Key Update Performance

Yet another widely used MySQL feature is ON DUPLICATE KEY UPDATE, but it also takes a performance hit on larger tables, especially when it has to update many duplicate rows. One option is to push the logic down to the storage engine so that it updates the duplicate row in place as it finds it, instead of the query engine driving the operation with multiple passes into the storage engine.

There was a recent discussion on this on the internals mailing list.

Status Checking

SHOW SESSION/GLOBAL STATUS and SHOW VARIABLES use a temporary table (in memory or on disk, it does not matter), even though most monitoring tools run them every few seconds, if not minutes. MySQL should avoid the temporary table here, for example by using a pre-allocated heap.

This would also help monitoring tools judge how many temporary tables were actually created by real queries; right now the Created_tmp_tables status variable is incremented by every SHOW STATUS/VARIABLES command.
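Until then, a monitoring tool can compensate. A minimal Python sketch, assuming (per the behavior described above) that every SHOW STATUS/VARIABLES command the tool itself ran between two samples added exactly one to Created_tmp_tables; `real_tmp_tables` is a hypothetical helper, and whether the sampling SHOW itself is counted should be checked on an idle server:

```python
def real_tmp_tables(prev, curr, show_commands_issued):
    """Temporary tables created by application queries between two status samples.

    prev/curr are SHOW GLOBAL STATUS results as dicts; show_commands_issued is how
    many SHOW STATUS/VARIABLES commands the monitor ran in that interval."""
    delta = int(curr["Created_tmp_tables"]) - int(prev["Created_tmp_tables"])
    return max(delta - show_commands_issued, 0)
```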

Mutex Locks

Many functions still share common mutex locks even when there is no dependency between them, which causes contention and unnecessary waits on a busy server. MySQL should split these into individual locks as much as possible; LOCK_thread_count in particular is a nightmare and in many cases even blocks SHOW STATUS.

Pre-allocation of Query Buffers

As memory is cheap, and most large systems have 32/64/128G of memory, it may be worth MySQL supporting pre-allocation of query buffers (like join_buffer_size, sort_buffer_size, etc.) by exposing variables such as preallocate_sort_buffer_size=XX and preallocate_join_buffer_size=XX, so that the pre-allocated memory gets reused by x threads in parallel.

For example, if one sets preallocate_join_buffer_size=256M and join_buffer_size=32M, then 8 parallel queries (x join_threads = preallocate_join_buffer_size/join_buffer_size = 256M/32M) can reuse the same pre-allocated pool memory instead of allocating and de-allocating from the heap; when the pool runs out, a query allocates from the heap and releases that memory back afterwards.
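The proposed pool can be sketched in Python, scaled down to bytes so it runs anywhere: one up-front allocation carved into `pool_size // buffer_size` slots, with a heap fallback when all slots are busy. `preallocate_join_buffer_size` is a proposed variable, not an existing MySQL option, and `PreallocatedBufferPool` is hypothetical.

```python
import threading


class PreallocatedBufferPool:
    """One up-front allocation split into fixed-size slots, with a heap fallback."""

    def __init__(self, pool_size, buffer_size):
        self.buffer_size = buffer_size
        self.slots = pool_size // buffer_size  # e.g. 256M // 32M == 8 concurrent queries
        self._pool = bytearray(self.slots * buffer_size)  # allocated once
        self._free = list(range(self.slots))
        self._lock = threading.Lock()
        self.heap_allocations = 0

    def acquire(self):
        """Return (source, slot, buffer); source is "pool" or "heap"."""
        with self._lock:
            if self._free:
                slot = self._free.pop()
                start = slot * self.buffer_size
                view = memoryview(self._pool)[start:start + self.buffer_size]
                return ("pool", slot, view)
            self.heap_allocations += 1
        # Pool exhausted: allocate from the heap, outside the lock.
        return ("heap", None, bytearray(self.buffer_size))

    def release(self, handle):
        source, slot, _ = handle
        if source == "pool":
            with self._lock:
                self._free.append(slot)
        # Heap buffers are simply dropped and reclaimed by the garbage collector.


if __name__ == "__main__":
    pool = PreallocatedBufferPool(pool_size=256, buffer_size=32)  # 256M/32M, in bytes
    print(pool.slots)
```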

Re-usable Common Code, Distributed and Pluggable Components

One thing that could encourage more plugins, storage engines and developers to support the community is common reusable code: separate lock management, parser management, cache management, user management, parsing, storage management (read/write wrappers), central information schema management, etc., so that anyone writing a new storage engine doesn't need to reinvent the wheel from scratch.

For example: how can one develop or integrate a key/value/NoSQL storage engine with the fewest development cycles by reusing existing components from the MySQL code base?

April 20, 2010

INT and String data comparison, difference in performance because of quotes

In my last post, about choosing the right data type, I forgot to mention a related case that is a pretty common mistake: string data types are used to store int or float/double values (sometimes you do need a string, because of length or to avoid precision loss), but the queries on that column don't quote the values as strings when searching.

In the same example, client_id was declared as VARCHAR(255), so searching on client_id without quotes takes 11 seconds:

mysql> explain SELECT SQL_NO_CACHE channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = 1301 GROUP BY client_id, channel;
+----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+--------------------------+
| id | select_type | table       | type  | possible_keys      | key                | key_len | ref  | rows     | Extra                    |
+----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+--------------------------+
|  1 | SIMPLE      | xxx_sources | index | idx_client_channel | idx_client_channel | 1032    | NULL | 20207319 | Using where; Using index | 
+----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+--------------------------+
1 row in set (0.00 sec)
 
mysql> SELECT SQL_NO_CACHE channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = 1301 GROUP BY client_id, channel;
+---------+----------+
| channel | visitors |
+---------+----------+
| NULL    |        0 | 
+---------+----------+
1 row in set (11.69 sec)

But if you quote the value in the search (client_id='1301'), things run much faster (0.25 sec as opposed to 11.69 sec): MySQL no longer has to convert every client_id value to a number for the comparison, and the plan does a direct const lookup (type ref) instead of scanning the whole index:

mysql> explain SELECT SQL_NO_CACHE channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = '1301' GROUP BY client_id, channel;
+----+-------------+-------------+------+--------------------+--------------------+---------+-------+--------+--------------------------+
| id | select_type | table       | type | possible_keys      | key                | key_len | ref   | rows   | Extra                    |
+----+-------------+-------------+------+--------------------+--------------------+---------+-------+--------+--------------------------+
|  1 | SIMPLE      | xxx_sources | ref  | idx_client_channel | idx_client_channel | 258     | const | 457184 | Using where; Using index | 
+----+-------------+-------------+------+--------------------+--------------------+---------+-------+--------+--------------------------+
1 row in set (0.00 sec)
 
mysql> SELECT SQL_NO_CACHE channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = '1301' GROUP BY client_id, channel;
+---------+----------+
| channel | visitors |
+---------+----------+
| NULL    |        0 | 
+---------+----------+
1 row in set (0.25 sec)

The reverse case, quoting values when searching on int/double/float columns, is usually less costly, because MySQL converts the single constant rather than every row. Still, it is worth double-checking column data types and using the matching notation (with or without quotes) in queries.
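Why the unquoted form can't use the index: many different strings convert to the same number, and they are scattered across an index kept in string order. A small Python illustration, using a simplified string-to-number rule (skip leading spaces, take the longest numeric prefix, else 0) and plain byte order as a stand-in for the column's collation; both are simplifications of what MySQL actually does, and the sample values are made up:

```python
import re

# Simplified string-to-number rule: skip leading spaces, take the longest
# numeric prefix, otherwise 0. MySQL's real conversion has more cases.
NUMERIC_PREFIX = re.compile(r"\s*([+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?)")


def as_number(s):
    m = NUMERIC_PREFIX.match(s)
    return float(m.group(1)) if m else 0.0


# client_id values as an index would keep them: sorted as strings
# (plain byte order stands in for the column collation).
index = sorted(["1301", " 1301", "01301", "1.301e3", "1301.0", "1301abc", "1302", "130", "2"])

# WHERE client_id = '1301': one exact key, a single lookup in the index.
string_matches = [k for k in index if k == "1301"]

# WHERE client_id = 1301: every entry must be converted and compared,
# because the matching strings are not adjacent in string order.
numeric_matches = [k for k in index if as_number(k) == 1301]

if __name__ == "__main__":
    print(index)
    print(string_matches)
    print(numeric_matches)
```

Six different strings compare equal to 1301, and "130" sits between them in index order, so no single index range covers the numeric match.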

April 19, 2010

Choosing the right data type makes a big difference

This evening, one of my friends asked me over IM to look into one of his production servers, where a query was taking ~11 seconds to run on a 20-million-row table, even though the query uses the right index, as the plan below shows:

mysql> explain SELECT channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = 1301 GROUP BY channel;
+----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+-----------------------------------------------------------+
| id | select_type | table       | type  | possible_keys      | key                | key_len | ref  | rows     | Extra                                                     |
+----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+-----------------------------------------------------------+
|  1 | SIMPLE      | xxx_sources | index | idx_client_channel | idx_client_channel | 1032    | NULL | 19205420 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+-----------------------------------------------------------+
1 row in set (0.01 sec)
 
mysql> SELECT channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = 1301 GROUP BY channel;
+---------+----------+
| channel | visitors |
+---------+----------+
| NULL    |        0 |
+---------+----------+
1 row in set (11.61 sec)
 
mysql> show table status like 'xxx_sources'\G
*************************** 1. row ***************************
           Name: xxx_sources
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 19882760
 Avg_row_length: 46
    Data_length: 926941184
Max_data_length: 0
   Index_length: 1188233216
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2010-04-15 21:03:37
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment: InnoDB free: 0 kB
1 row in set (0.21 sec)

Looking quickly at the plan, I added client_id to the GROUP BY to avoid the temporary table. The new plan looks much better, but execution still took the same time (the temporary table and copy are cheap in this case):

mysql> explain SELECT channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = 1301 GROUP BY client_id, channel;
+----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+--------------------------+
| id | select_type | table       | type  | possible_keys      | key                | key_len | ref  | rows     | Extra                    |
+----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+--------------------------+
|  1 | SIMPLE      | xxx_sources | index | idx_client_channel | idx_client_channel | 1032    | NULL | 19205420 | Using where; Using index |
+----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+--------------------------+
1 row in set (0.00 sec)

Then I examined the data and noticed that client_id was declared as VARCHAR(255) even though all the client_id data is integers. Changing client_id to INT made a big difference: the query took only ~0.24 seconds.

mysql> SELECT channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = 1301 GROUP BY channel;
+---------+----------+
| channel | visitors |
+---------+----------+
| NULL    |        0 |
+---------+----------+
1 row in set (0.24 sec)

The performance difference after changing the type to INT is huge. This is just one example, but I have noticed a lot of tables with VARCHAR(64), VARCHAR(255) or VARCHAR(512) (or even TEXT at times) as default types even though they store at most 10-15 bytes of data. I'm not sure why anyone does that; choosing the right type must be rule #1 when designing a schema. Even if you are not directly querying a column, it is always better to design the schema with the right type and storage, so that it is optimal in terms of both storage space and performance.
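As a rough aid for rule #1, here is a hypothetical Python helper that looks at sample values from a column and suggests the narrowest of INT, BIGINT or VARCHAR(n) that holds them all. It treats values with leading zeros as strings, since converting them to INT would lose information, and ignores unsigned, DECIMAL and date types:

```python
import re

INT_RANGE = (-2**31, 2**31 - 1)
BIGINT_RANGE = (-2**63, 2**63 - 1)
# Integers without leading zeros: "01301" stays a string, since INT would drop the zero.
PLAIN_INT = re.compile(r"-?(?:0|[1-9]\d*)")


def suggest_column_type(values):
    """Suggest the narrowest of INT, BIGINT or VARCHAR(n) holding every sample value."""
    present = [v for v in values if v is not None]  # NULLs say nothing about the type
    if present and all(PLAIN_INT.fullmatch(v) for v in present):
        lo = min(int(v) for v in present)
        hi = max(int(v) for v in present)
        if INT_RANGE[0] <= lo and hi <= INT_RANGE[1]:
            return "INT"
        if BIGINT_RANGE[0] <= lo and hi <= BIGINT_RANGE[1]:
            return "BIGINT"
    longest = max((len(v) for v in present), default=1)
    return f"VARCHAR({longest})"
```

Run against the client_id values from the example above, a helper like this would have suggested INT rather than VARCHAR(255).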

April 16, 2010

MySQL 5.5 – A Community Winner

Ever since the MySQL 5.5 beta was announced by Edward Screven, Oracle's chief corporate architect, there has been a lot of positive buzz (here, here, …) about the performance and scalability improvements in this release. We should all be thankful to Mikael Ronström (as most of the other key developers are already working on different forks), who did a great job on the improvements, especially the scalability work that lets MySQL scale beyond 16 cores and improves performance by 2-5X in most common workloads. Not to forget the numerous replication improvements by the replication team.

Even though 5.5 officially brings a lot of new improvements from Sun/Oracle, some of the changes were actually driven by the community (yet another thanks to Google, Mark Callaghan and his team, Percona and its team, Facebook, etc.), and most of the ideas or patches had been floating around for a while and were already used in production (on 5.0 or 5.1). This is a good sign: the community can look forward to 5.5 GA instead of worrying about which patches and builds to use.

This is a clear indication that 5.5 performance and scalability improvements were actually driven by community.

Key improvements in 5.5:

  1. InnoDB changes in 1.1
    • Multiple buffer pools (controlled by innodb_buffer_pool_instances)
    • Multiple rollback segments
    • Splitting of purge operation from main background thread (controlled by innodb_purge_threads)
    • New log_buf mutex now controls the mini transaction writes in buffer pool instead of shared log_sys, reduces the contention on buffer pool
    • Separate mutex for flush list handling, reduces the contention on buffer pool
    • Improved recovery time
  2. Rest of the changes as part of InnoDB plugin 1.0.x
  3. Numerous replication related changes

Even though InnoDB was announced as the default storage engine in 5.5, the latest build still has MyISAM as the default:

Server version: 5.5.4-m3 MySQL Community Server (GPL)
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
mysql> show engines;
+--------------------+---------+----------------------------------------------------------------+--------------+------+------------+
| Engine             | Support | Comment                                                        | Transactions | XA   | Savepoints |
+--------------------+---------+----------------------------------------------------------------+--------------+------+------------+
| InnoDB             | YES     | Supports transactions, row-level locking, and foreign keys     | YES          | YES  | YES        |
| MRG_MYISAM         | YES     | Collection of identical MyISAM tables                          | NO           | NO   | NO         |
| MEMORY             | YES     | Hash based, stored in memory, useful for temporary tables      | NO           | NO   | NO         |
| BLACKHOLE          | YES     | /dev/null storage engine (anything you write to it disappears) | NO           | NO   | NO         |
| CSV                | YES     | CSV storage engine                                             | NO           | NO   | NO         |
| MyISAM             | DEFAULT | Default engine as of MySQL 3.23 with great performance         | NO           | NO   | NO         |
| ARCHIVE            | YES     | Archive storage engine                                         | NO           | NO   | NO         |
| FEDERATED          | NO      | Federated MySQL storage engine                                 | NULL         | NULL | NULL       |
| PERFORMANCE_SCHEMA | YES     | Performance Schema                                             | NO           | NO   | NO         |
+--------------------+---------+----------------------------------------------------------------+--------------+------+------------+
9 rows in set (0.00 sec)