December 2, 2010

MySQL At Scale – Zynga Games

Recently am part of Zynga‘s database team as I was pretty much impressed with company’s database usage. As everyone knows how popular Zynga games like Farmville, Cafe World, Mafia Wars, Poker, FrontierVille, FishVille, PetVille and Treasure Island etc are. Zynga launched yet another new game today called CityVille along with series of acquisitions (latest today is NewToy). You can find current Zynga game stats from appdata.

app_full_proxyBut lot of people asked me why I am part of Zynga database team when there is no MySQL being used by any of the games; and lot of articles on the web also indicate the same. For me it does not matter if it is MySQL or NoSQL or any other system as long as the data store can help to scale the systems and/or games in this case.

As a consultant, I help lot of other companies to scale using NoSQL systems apart from MySQL especially on large data handling; as the data store solution should help to scale the systems to yield the desired results; especially MySQL should be used for typical OLTP workloads and combination of MySQL and NoSQL or any other data warehouse clusters for analytics and/or OLAP workloads by combining with right application and caching components based on the business model and how the data is generated, stored, accessed and processed.

If you don’t use the right technology for what you trying to achieve, then you can’t easily scale, and end up spending time in fixing the performance and scalability issues on day to day basis rather than concentrating on building features that is demanded by the business.

As a matter of fact, Zynga may be the second largest MySQL user after Facebook. All games at Zynga are currently powered by MySQL as the backend storage along with memcache as the middle caching layer.

Last month we expanded the MySQL shards to one of the popular game due to increased DAU (Daily Average Users), and the whole expansion of MySQL shards in production happened without any down-time or taking the game down; which is only possible if the application code is tightly integrated with the caching, backend storage and also if the servers are in the cloud and elastic in nature (unless you have your own private cloud).

July 19, 2010

Random Pauses In MySQL – File Handle Serialization

Last month, I blogged about a case involving InnoDB, where all threads acting on InnoDB tables completely stuck for about few hours doing nothing; until we found a way to get around and make the threads to run and do the actual work.

There are few more cases where the server can get into pause state without doing anything for a brief time; this is irrespective of storage engines (well, sort of). Even though server is still active and doing nothing; but blocked on certain conditions and every other thread for the most part has to wait, mostly because of serialization

Here is one such case that involves with THR_OPEN_lock mutex, which is mainly used for opening and closing file handles. When the table is flushed or when it is not in the table cache, then it opens and caches it(open_cache structure) for further use, so that sub-sequent operations can (re)use it without re-opening. Even if the table is InnoDB or any other storage engine type, the frm file has to go through the same mechanism. Not just the regular tables; but implicit file based temporary tables and all SELECT … OUTFILES and LOAD DATA LOCAL. uses the same lock to open and close.

For example; here is a live server thread dump; where most of the threads are blocked on this THR_OPEN_lock mutex:

    129 
     71 pthread_cond_wait@@GLIBC_2.3.2,end_thread,handle_one_connection,start_thread,clone
     25 read,vio_read,my_real_read,my_net_read,handle_one_connection,start_thread,clone
     14 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,os_aio_simulated_handle,fil_aio_wait,io_handler_thread,start_thread,clone
      5 __lll_lock_wait,_L_lock_1233,pthread_mutex_lock,open_table,open_tables,open_and_lock_tables,mysql_execute_command,Prepared_statement::execute,mysql_stmt_execute,dispatch_command,handle_one_connection,start_thread,clone
      1 select,os_thread_sleep,srv_lock_timeout_and_monitor_thread,start_thread,clone
      1 select,os_thread_sleep,srv_error_monitor_thread,start_thread,clone
      1 select,handle_connections_sockets,main
      1 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,srv_master_thread,start_thread,clone
      1 __lll_lock_wait,_L_lock_1233,pthread_mutex_lock,open_table,open_tables,open_and_lock_tables,Prepared_statement::prepare,mysql_stmt_prepare,dispatch_command,handle_one_connection,start_thread,clone
      1 __lll_lock_wait,_L_lock_1233,pthread_mutex_lock,my_close,openfrm,open_unireg_entry,open_table,open_tables,open_and_lock_tables,select_like_stmt_test_with_open_n_lock,Prepared_statement::prepare,mysql_stmt_prepare,dispatch_command,handle_one_connection,start_thread,clone
      1 __lll_lock_wait,_L_lock_1233,pthread_mutex_lock,my_close,MYSQL_LOG::close,MYSQL_LOG::new_file,rotate_relay_log,process_io_rotate,queue_event,handle_slave_io,start_thread,clone
      1 __lll_lock_wait,_L_lock_1233,pthread_mutex_lock,my_close,mi_close,free_tmp_table,close_thread_tables,dispatch_command,handle_one_connection,start_thread,clone
      1 __lll_lock_wait,_L_lock_1233,pthread_mutex_lock,close_thread_tables,Query_log_event::exec_event,handle_slave_sql,start_thread,clone
      1 __lll_lock_wait,_L_lock_1233,pthread_mutex_lock,close_thread_tables,Prepared_statement::cleanup_stmt,Prepared_statement::execute,mysql_stmt_execute,dispatch_command,handle_one_connection,start_thread,clone
      1 do_sigwait,sigwait,signal_hand,start_thread,clone
      1 close,my_close,select_to_file::send_eof,do_select,JOIN::exec,mysql_select,handle_select,mysql_execute_command,Prepared_statement::execute,mysql_stmt_execute,dispatch_command,handle_one_connection,start_thread,clone

In the above stack trace, close to 11 threads are blocked and waiting on THR_LOCK_open during either open or close of file handles as last thread is taking time in closing the OUTFILE file descriptor, which holds the lock. The following thread states in the "SHOW PROCESSLIST" is a good indication of such a symptom.

  • opening tables
  • opening table
  • closing tables
  • query end
  • init
  • end

There is an open scalability bug on this and hoping that future versions of MySQL will address this. The best bet will be to split and use different mutex locks between regular tables, implicit temporary tables; input/output files, so that main table operations are not blocked by rest of the temporary operations or maintain a separate fd list and use atomic operations to toggle the state by getting rid of this lock on regular open/close operations. The symptom also exhibits when one is deleting a large table or using large IO buffering.

June 23, 2010

How read_buffer_size Impacts Write Buffering and Write Performance

Even though the name read_buffer_size implies that the variable controls only read buffering, but it actually does dual purpose by providing sequential IO buffering for both reads and writes.

In case of write buffering, it groups the sequential writes until read_buffer_size(it is min(read_buffer_size, 8K)); and then actually does the physical write once the buffer is full. In most cases; this value is the initial value of read_buffer_size when server actually started first time; as this is a dynamic global variable; even if you change the value dynamically at run time; it will not affect write buffering size (and in some cases of read buffering as well) as this is stored one-time in my_default_record_cache_size (might be a bug ?); and that variable is used in initializing IO cache buffers.

Here is some use cases where read_buffer_size is actually used for buffering writes:

  • SELECT INTO … OUTFILE ‘fileName
    • When writing to the OUTFILE, the writes are buffered before writing to OUTFILE
  • When filesort is used, during merge buffers and when merged results are written to a temporary file, then writes are buffered

Normally you will see performance boost due to buffering on slower write disks or when you have IO saturation; but it does not matter when you have descent disks or raid controller with write_cache with BBU enabled.

Here is some stats on how many physical writes are actually posted for a simple SELECT … INTO OUTFILE (file size 384936838 bytes) for variable read_buffer_size values (server needs to be restarted in-order to get the new value):

read_buffer_size physical writes exe time in secs
=0 (defaults to 8200) 23495 28.39
=131072 (default) 2937 27.44
=16777216 23 26.71
=33554432 12 26.00
=536870912 1 26.72

Total writes are calculated using simple patch that I wrote around mysys/my_write.c to get the real physical writes posted as a global status counter.

mysql> show global status like 'Write_count';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Write_count   | 23496 |
+---------------+-------+
1 row in set (0.00 sec)

As you can see, increase in read_buffer_size might save total physical writes and might help if you have lot of OUTFILEs or heavy file sorting to some extent; but again this will actually affect overall performance due to one of the known bug and the buffer is also allocated per query based; so be careful as allocation and initialization of big buffers are much costlier than real IO cost.

In either case; may be worth if the sequential read buffering is actually controlled by read_buffer_size and introduce new write_buffer_size that controls the write buffering instead of using the same for both.

May 11, 2010

MySQL Query Engine Scalability Issues

Lately in the MySQL community, we only hear about scalability or performance improvements of storage engines, but nothing about query engine itself. For example, one classic example being InnoDB; if we look back all the scalability issues that community reported a year back or even few months back; most part of those issues have been fixed in forth coming MySQL 5.5 version (or even Percona Server or Facebook patches).

Even if you look at the new storage engines that are in development, they all going to concentrate on existing scalability issues that are common to any storage engine, and they will address it before it gets into beta or production ready.

But most part; we still have enough issues pending in the MySQL query engine; that is something that can’t be addressed by storage engines unless MySQL platform itself gets redesigned such that everything is plug and play and storage engine can override any behavior of the query engine (like how late binding works, but it is very un-likely due to lot of issues in terms of implementation as storage engines can’t have complete control over query engine, like allowing storage engines to do sorting / plan execution / caching etc; but this also deviates the main purpose of storage engine itself)

Here comes some known pending issues within query engine, that makes it hard to scale on designing larger systems where queries per second (QPS) is highly demanded:

Negative Scalability of Buffers

There is a negative performance with few global/session buffer variables; and there are number of open bugs on this.

  • table_cache
  • read_buffer_size
  • sort_buffer_size
    • Recently there are number of community blog posts on this sort_buffer_size scalability (here, here and here).

      The main problem is not because of memory allocation, but its because of making array of string pointers to keep the sorted data, and this array initialization (make_char_array() in filesort.cc) is what is causing the bottleneck as it directly depends on buffer size. One option is to skip this initialization of char array completely or use dynamic initialization as you sort (might slow down a bit for smaller sort keys); and InnoDB especially takes a bigger hit in performance if you have larger sort_buffer_size as it gives estimated rows (estimated_rows_upper_bound) being more than actual rows.

    • Here is sysbench performance results of negative scalability of sort_buffer_size for read-only queries (completely in memory data); and notice the performance drop by ~10 times from 2MB to 128MB
      sort_buffer_size threads=16 (txn/sec) threads=32 (txn/sec)
      2097144 (2MB,default) 6475 6302
      6291456 (6MB) 6123 6018
      33554432 (32MB) 3481 1590
      67108864 (64MB) 2084 1085
      134217728 (128MB) 1024 748
    • If you have a slave server without any other workload or server with very few threads < 5-10; then none of these might cause any major performance impact.

Query Cache Scalability

Most of the servers that I manage; first thing I do is to turn query cache completely off (query_cache_size=0 and query_cache_type=0) as much as possible unless the cache hit ratio is > 20%. The performance really degrades by magnitude especially if you have larger cache size and more queries in the cache.

This is something that MySQL should address either by allowing relaxed caching model (cache expires automatically after x secs instead of write-invalidation) or per table caching model without the overhead of contention. But still this is something that needs to be completely re-designed.

Key Buffer Size Scalability

Even though this is completely used by MyISAM; but still one thing that needs to be addressed as most of the servers still use MyISAM where transactions are not required due to its simplicity in administration and performance.

May be worth to support multiple key cache buffers similar to that of  multiple buffer pools introduced by InnoDB in 5.5 or even per table cache buffers.

ON Duplicate Key Update Performance

Yet another widely used feature from MySQL is ON DUPLICATE KEY UPDATE; but this also takes a hit in performance as it starts working on larger tables especially if it needs to update more number of duplicated rows. One option is to push the logic to storage engine for in-line update of the duplicated row as it searches; instead of query engine controlling this with multi-pass iterations to storage engine.

Recently there was a discussion on this in the internals list

Status Checking

SHOW SESSION/GLOBAL STATUS or VARIABLES uses temporary table(in-memory or disk does not matter); even though this is something that is widely used by most of the monitoring tools every few seconds if not minutes; and MySQL should avoid using temporary table for this by having a pre-allocated heap for this.

This will help lot of monitoring tools to judge how many real temporary tables has been actually created by real queries; right now Created_tmp_tables status variable gets incremented for every SHOW STATUS/VARIABLE command.

Mutex Locks

Most of the functions still use common mutex locks even though there is no dependency; which causes contention and un-necessary waits on busy server; so MySQL should split and use individual locks of their own as much as possible, especially LOCK_thread_count is a nightmare and even causes SHOW STATUS to be blocked in most cases.

Pre-allocation of Query Buffers

As memory is cheap; and most of the large end systems make use of 32/64/128G memory; it may be worth MySQL to consider to support pre-allocation of query buffers (like join_buffer_size, sort_buffer_size etc) by exposing preallocate_sort_buffer_size=XX, preallocate_join_buffer_size=XX, so that  pre-allocation size gets re-used by x-threads in parallel

For example, lets say one sets preallocate_join_buffer_size=256M and join_buffer_size=32M; then 8 parallel queries ( x join_threads = prellocate_join_buffer_size/join_buffer_size) can re-use the same pre-allocated pool memory instead of allocating and de-allocating from the heap; and if it runs out of pre-allocated pool memory, then it gets from the heap and releases back.

Re-usable Common Code, Distributed and Pluggable Components

One thing that could encourage more plugins or storage engines or even more developers to support the community is by having common re-usable code like separate Lock Management, Parser Management, Cache Management, User Management, Parsing, Storage Management (read/write wrappers), Central Information Schema Management.. etc; so that every time some one needs to write a new storage engine, they don’t need to re-invent the wheel from scratch.

Like solving, how one can develop or integrate a key/value/NoSQL storage engine with least development cycles by making use of existing components within MySQL code base.

April 20, 2010

INT and String data comparison, difference in performance because of quotes

In the last post choosing about the right type; there is a case about quoting the tuple values; that I forgot to mention which is pretty much a common mistake when string data types are used for storing int or float/double representation (well sometimes you need to use string due to length or to avoid precision loss); and queries associated with that column does not quote the data to be string when searching…

In the same example; client_id was declared as VARCHAR(255); so without any quotes searching on client_id takes 11 secs:

mysql> explain SELECT SQL_NO_CACHE channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = 1301 GROUP BY client_id, channel;
+----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+--------------------------+
| id | select_type | table       | type  | possible_keys      | key                | key_len | ref  | rows     | Extra                    |
+----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+--------------------------+
|  1 | SIMPLE      | xxx_sources | index | idx_client_channel | idx_client_channel | 1032    | NULL | 20207319 | Using where; Using index | 
+----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+--------------------------+
1 row in set (0.00 sec)
 
mysql> SELECT SQL_NO_CACHE channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = 1301 GROUP BY client_id, channel;
+---------+----------+
| channel | visitors |
+---------+----------+
| NULL    |        0 | 
+---------+----------+
1 row in set (11.69 sec)

But if you quote client_id in the search part(client_id=’1301′); then things will run much faster (0.25sec as opposed to 11.69sec) as it does not need to do the conversion, and even the plan uses the direct const checking:

mysql> explain SELECT SQL_NO_CACHE channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = '1301' GROUP BY client_id, channel;
+----+-------------+-------------+------+--------------------+--------------------+---------+-------+--------+--------------------------+
| id | select_type | table       | type | possible_keys      | key                | key_len | ref   | rows   | Extra                    |
+----+-------------+-------------+------+--------------------+--------------------+---------+-------+--------+--------------------------+
|  1 | SIMPLE      | xxx_sources | ref  | idx_client_channel | idx_client_channel | 258     | const | 457184 | Using where; Using index | 
+----+-------------+-------------+------+--------------------+--------------------+---------+-------+--------+--------------------------+
1 row in set (0.00 sec)
 
mysql> SELECT SQL_NO_CACHE channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = '1301' GROUP BY client_id, channel;
+---------+----------+
| channel | visitors |
+---------+----------+
| NULL    |        0 | 
+---------+----------+
1 row in set (0.25 sec)

Same is the case and performance impact if data is quoted when searching on int/double/float columns. At times its worth to double check column data types and use the same notation when using them (with or without quotes)

April 19, 2010

Choosing the right data type makes a big difference

Today evening one of my friend asked me in the IM to look into one of his production server where a query was taking ~11 seconds to run on 20 million row table, even though the query is using the right index and the plan as shown below:

mysql&gt; explain SELECT channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = 1301 GROUP BY channel;
+----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+-----------------------------------------------------------+
| id | select_type | table       | type  | possible_keys      | key                | key_len | ref  | rows     | Extra                                                     |
+----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+-----------------------------------------------------------+
|  1 | SIMPLE      | xxx_sources | index | idx_client_channel | idx_client_channel | 1032    | NULL | 19205420 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+-----------------------------------------------------------+
1 row in set (0.01 sec)
 
mysql&gt; SELECT channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = 1301 GROUP BY channel;
+---------+----------+
| channel | visitors |
+---------+----------+
| NULL    |        0 |
+---------+----------+
1 row in set (11.61 sec)
 
mysql&gt; show table status like 'xxx_sources'\G
*************************** 1. row ***************************
           Name: xxx_sources
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 19882760
 Avg_row_length: 46
    Data_length: 926941184
Max_data_length: 0
   Index_length: 1188233216
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2010-04-15 21:03:37
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment: InnoDB free: 0 kB
1 row in set (0.21 sec)

Quickly looking at the plan; I added client_id in the group by to avoid temporary table, and the new plan looks much better, but still took same time for execution (well, cost of temp and copy is cheap in this case)..

mysql&gt; explain SELECT channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = 1301 GROUP BY client_id, channel;
+----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+--------------------------+
| id | select_type | table       | type  | possible_keys      | key                | key_len | ref  | rows     | Extra                    |
+----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+--------------------------+
|  1 | SIMPLE      | xxx_sources | index | idx_client_channel | idx_client_channel | 1032    | NULL | 19205420 | Using where; Using index |
+----+-------------+-------------+-------+--------------------+--------------------+---------+------+----------+--------------------------+
1 row in set (0.00 sec)

and then examined the data and noticed that client_id was declared as VARCHAR(255) even though the client_id data is all int; quickly changing client_id to int made a big difference as the query execution took only ~0.24 secs

mysql&gt; SELECT channel, COUNT(channel) AS visitors FROM xxx_sources WHERE client_id = 1301 GROUP BY channel;
+---------+----------+
| channel | visitors |
+---------+----------+
| NULL    |        0 |
+---------+----------+
1 row in set (0.24 sec)

The performance difference is too big after changing the type to int. This is just an example; but I noticed lot of tables with VARCHAR(64) or VARCHAR(255) or VARCHAR(512) (or even TEXT at times).. as default types even though they store at max of 10-15 bytes of data; not sure why anyone do that; as this is something that must be followed as rule #1 when designing schema. Even if you are not directly querying on that column; it is always better to design a schema with right type and storage so that it is optimal in terms of storage space and performance.

April 16, 2010

MySQL 5.5 – A Community Winner

Ever since MySQL 5.5 beta has been announced by Edward Screven, Oracle’s chief corporate architect; there is lot of positive buzz (here, here, …) about the performance and scalability improvements added in this release. We should all be thankful to Michael Ronstrom (as most of the key developers are already working on different forks), who did a great job in the improvements especially scalability related that allows to scale beyond 16 cores by improving the performance by 2-5X in most common workloads. Not to forget about numerous improvements to replication by replication team.

Even though 5.5 has lot of new improvements officially from Sun/Oracle; but some of the changes are actually driven by community (yet another thanks to Google, Mark Callaghan and his team, Percona and his team, Facebook etc) and most of the ideas or patches were already floating for a while and they were used in the production as well (5.0 or 5.1). This is actually a good sign that community can look forward for 5.5 GA instead of worrying about what patches and builds to use.

This is a clear indication that 5.5 performance and scalability improvements were actually driven by community.

Key improvements in 5.5:

  1. InnoDB changes in 1.1
    • Multiple buffer pools (controlled by innodb_buffer_pool_instances)
    • Multiple rollback segments
    • Splitting of purge operation from main background thread (controlled by innodb_purge_threads)
    • New log_buf mutex now controls the mini transaction writes in buffer pool instead of shared log_sys, reduces the contention on buffer pool
    • Separate mutex for flush list handling, reduces the contention on buffer pool
    • Improved recovery time
  2. Rest of the changes as part of InnoDB plugin 1.0.x
  3. Numerous replication related changes

Even though they announced InnoDB as the default storage engine in 5.5; but the latest build still has MyISAM as the default

Server version: 5.5.4-m3 MySQL Community Server (GPL)
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
mysql> show engines;
+--------------------+---------+----------------------------------------------------------------+--------------+------+------------+
| Engine             | Support | Comment                                                        | Transactions | XA   | Savepoints |
+--------------------+---------+----------------------------------------------------------------+--------------+------+------------+
| InnoDB             | YES     | Supports transactions, row-level locking, and foreign keys     | YES          | YES  | YES        |
| MRG_MYISAM         | YES     | Collection of identical MyISAM tables                          | NO           | NO   | NO         |
| MEMORY             | YES     | Hash based, stored in memory, useful for temporary tables      | NO           | NO   | NO         |
| BLACKHOLE          | YES     | /dev/null storage engine (anything you write to it disappears) | NO           | NO   | NO         |
| CSV                | YES     | CSV storage engine                                             | NO           | NO   | NO         |
| MyISAM             | DEFAULT | Default engine as of MySQL 3.23 with great performance         | NO           | NO   | NO         |
| ARCHIVE            | YES     | Archive storage engine                                         | NO           | NO   | NO         |
| FEDERATED          | NO      | Federated MySQL storage engine                                 | NULL         | NULL | NULL       |
| PERFORMANCE_SCHEMA | YES     | Performance Schema                                             | NO           | NO   | NO         |
+--------------------+---------+----------------------------------------------------------------+--------------+------+------------+
9 rows in set (0.00 sec)

April 15, 2010

RAID Controllers Cache Management – Missing Features

PERC4DC_4 We all know how important hardware RAID controllers are in today’s data storage performance especially when dealing with large data sets. If we look at the trend from now to couple of years back; they really evolved rapidly with lot of useful features and their usage also grown as most of the new servers by default has one or two controllers built-in (one for internal and another one for external storage array or for redundancy).

Few popular RAID controller vendors in the market: 

More or less everyone supports all common features and differs in number of ports, protocol support (ISCSI, SATA, SAS, HBA/FB), transfer speed, RAID levels, total disks support, cache size and its management.

Controller Cache – Database Workloads

For database OLTP workloads (IO bound), controller cache plays a crucial role for overall write or read throughput, depending on how the cache is used. Most RAID controllers are equipped with either 128MB or 256 MB or 512MB cache, and newer controllers like HP Smart Array P812 supports 1GB.

Write-back mode improves the writes performance by magnitude as the write request is returned as completed as soon as the data is in the controller cache without actually writing to the disk (that’s why controller needs a BBU, Battery Backup Module so that there is no data loss on power failures)

In case if you enable the read ahead from the controller (sometimes good for OLAP workloads or ETL data warehouse, especially adaptive read ahead due to heavy sequential access); then the same cache is used to store the pre-fetched data that can be satisfied later from the cache without hitting the disk. But in case if the database system does read ahead (like InnoDB), then it is better to turn off read ahead from controller to avoid page trashing.

For some workloads, the controller cache can also cause negative performance if the cache is not properly utilized by the controller.

Missing Cache Management Tools

At present, none of the controllers either supports any cache management tools nor exposes how the cache has been actually used, so that one can adjust the cache according to the workloads for improved performance.

Some of the missing features:

  • A way to flush the data from cache to disks, so that the systems can be taken for offline maintenance. Right now there is no easy way to flush data from cache to disk; other than some of the controllers will indicate through LED whether data is in the cache or not
  • Way to set the cache threshold in time or %, so that it can start flushing to disk once it meets the threshold value. For example; if you notice big spikes from RRD graphs for every few minutes, then one can adjust the threshold to evenly distribute the load.
  • Cache usage statistics (writes data size, read ahead data size etc ), so that workload can be adjusted to yield much better results
  • Splitting of cache between reads and writes either in size or by %; so that they do not overlap and cause performance issues. For example; one prefers to set 20% for read ahead data and 80% for writes. Only HP Smart Array controller supports this feature at present.

As you get more control over the controller cache, the more you can tweak and adjust the workloads to get improved performance. Hopefully one day all vendors will expose more cache management options.