Lately in the MySQL community, we only hear about scalability or performance improvements of storage engines, but nothing about query engine itself. For example, one classic example being InnoDB; if we look back all the scalability issues that community reported a year back or even few months back; most part of those issues have been fixed in forth coming MySQL 5.5 version (or even Percona Server or Facebook patches).
Even if you look at the new storage engines that are in development, they all going to concentrate on existing scalability issues that are common to any storage engine, and they will address it before it gets into beta or production ready.
But most part; we still have enough issues pending in the MySQL query engine; that is something that can’t be addressed by storage engines unless MySQL platform itself gets redesigned such that everything is plug and play and storage engine can override any behavior of the query engine (like how late binding works, but it is very un-likely due to lot of issues in terms of implementation as storage engines can’t have complete control over query engine, like allowing storage engines to do sorting / plan execution / caching etc; but this also deviates the main purpose of storage engine itself)
Here comes some known pending issues within query engine, that makes it hard to scale on designing larger systems where queries per second (QPS) is highly demanded:
Negative Scalability of Buffers
There is a negative performance with few global/session buffer variables; and there are number of open bugs on this.
- table_cache
- read_buffer_size
- sort_buffer_size
- Recently there are number of community blog posts on this sort_buffer_size scalability (here, here and here).
The main problem is not because of memory allocation, but its because of making array of string pointers to keep the sorted data, and this array initialization (make_char_array() in filesort.cc) is what is causing the bottleneck as it directly depends on buffer size. One option is to skip this initialization of char array completely or use dynamic initialization as you sort (might slow down a bit for smaller sort keys); and InnoDB especially takes a bigger hit in performance if you have larger sort_buffer_size as it gives estimated rows (estimated_rows_upper_bound) being more than actual rows.
- Here is sysbench performance results of negative scalability of sort_buffer_size for read-only queries (completely in memory data); and notice the performance drop by ~10 times from 2MB to 128MB
sort_buffer_size threads=16 (txn/sec) threads=32 (txn/sec) 2097144 (2MB,default) 6475 6302 6291456 (6MB) 6123 6018 33554432 (32MB) 3481 1590 67108864 (64MB) 2084 1085 134217728 (128MB) 1024 748 - If you have a slave server without any other workload or server with very few threads < 5-10; then none of these might cause any major performance impact.
- Recently there are number of community blog posts on this sort_buffer_size scalability (here, here and here).
Query Cache Scalability
Most of the servers that I manage; first thing I do is to turn query cache completely off (query_cache_size=0 and query_cache_type=0) as much as possible unless the cache hit ratio is > 20%. The performance really degrades by magnitude especially if you have larger cache size and more queries in the cache.
This is something that MySQL should address either by allowing relaxed caching model (cache expires automatically after x secs instead of write-invalidation) or per table caching model without the overhead of contention. But still this is something that needs to be completely re-designed.
Key Buffer Size Scalability
Even though this is completely used by MyISAM; but still one thing that needs to be addressed as most of the servers still use MyISAM where transactions are not required due to its simplicity in administration and performance.
May be worth to support multiple key cache buffers similar to that of multiple buffer pools introduced by InnoDB in 5.5 or even per table cache buffers.
ON Duplicate Key Update Performance
Yet another widely used feature from MySQL is ON DUPLICATE KEY UPDATE; but this also takes a hit in performance as it starts working on larger tables especially if it needs to update more number of duplicated rows. One option is to push the logic to storage engine for in-line update of the duplicated row as it searches; instead of query engine controlling this with multi-pass iterations to storage engine.
Recently there was a discussion on this in the internals list
Status Checking
SHOW SESSION/GLOBAL STATUS or VARIABLES uses temporary table(in-memory or disk does not matter); even though this is something that is widely used by most of the monitoring tools every few seconds if not minutes; and MySQL should avoid using temporary table for this by having a pre-allocated heap for this.
This will help lot of monitoring tools to judge how many real temporary tables has been actually created by real queries; right now Created_tmp_tables status variable gets incremented for every SHOW STATUS/VARIABLE command.
Mutex Locks
Most of the functions still use common mutex locks even though there is no dependency; which causes contention and un-necessary waits on busy server; so MySQL should split and use individual locks of their own as much as possible, especially LOCK_thread_count is a nightmare and even causes SHOW STATUS to be blocked in most cases.
Pre-allocation of Query Buffers
As memory is cheap; and most of the large end systems make use of 32/64/128G memory; it may be worth MySQL to consider to support pre-allocation of query buffers (like join_buffer_size, sort_buffer_size etc) by exposing preallocate_sort_buffer_size=XX, preallocate_join_buffer_size=XX, so that pre-allocation size gets re-used by x-threads in parallel
For example, lets say one sets preallocate_join_buffer_size=256M and join_buffer_size=32M; then 8 parallel queries ( x join_threads = prellocate_join_buffer_size/join_buffer_size) can re-use the same pre-allocated pool memory instead of allocating and de-allocating from the heap; and if it runs out of pre-allocated pool memory, then it gets from the heap and releases back.
Re-usable Common Code, Distributed and Pluggable Components
One thing that could encourage more plugins or storage engines or even more developers to support the community is by having common re-usable code like separate Lock Management, Parser Management, Cache Management, User Management, Parsing, Storage Management (read/write wrappers), Central Information Schema Management.. etc; so that every time some one needs to write a new storage engine, they don’t need to re-invent the wheel from scratch.
Like solving, how one can develop or integrate a key/value/NoSQL storage engine with least development cycles by making use of existing components within MySQL code base.
New blog post: http://tinyurl.com/2bcgbaz – MySQL Query Engine Scalability Issues
Venu, nice overview. This is something I was hoping Maria or Drizzle will fix in coming days.
Also, we need more enhancements towards plugin API that exposes more features to storage engine.
MySQL Query Engine Scalability Issues – https://venublog.com/2010/05/11/mysql-query-engine-scalability-issues/
MySQL: MySQL Query Engine Scalability Issues: Lately in the MySQL community, we only hear about scalability or per… http://bit.ly/cBBe9M
Planet MySQL. #MySQL Query Engine Scalability Issues: Lately in the MySQL community, we only hear about scalabilit… http://bit.ly/c1BqT8
> May be worth to support multiple key cache buffers similar to that of multiple buffer pools introduced by InnoDB in 5.5 or even per table cache buffers.
http://dev.mysql.com/doc/refman/5.0/en/multiple-key-caches.html
I meant to say, automatic splitting to ‘n’ buffers with fixed size
MySQL Query Engine Scalability Issues
http://bit.ly/aB0OGQ