April 23, 2008

MySQL Server build –without-server

Looks like MySQL build team should add a test scenario to cover this in the automation. Even if you specify –without-server with the latest 5.1.24; it still builds all most all plugins. This is really bad deal. Forget about mandatory plugins like MyISAM, Heap and Merge; it also builds rest of the plugins unless you skip them using –without-<name> or –without-plugin-<name>. The mandatory check should also be relaxed when one uses –without-server. The configure currently throws an error with you try to skip any mandatory plugins with –without-server too.

When one need clients (directory client) and its libraries (directories like libmysql, libmysql_r and mysys, mystrings, dbug) we do not need to build the whole ‘SQL‘ dir and no need to enter the ‘storage‘ directory at all. I patched the configure script to escape all these cases now including skipping of mandatory plugins; and I can see that everything works as expected after the patch. Even though I could not find a easy way to strip ‘ndbclient’ out of the ‘ndb’ engine. But this will allow me to build on systems with gcc2 where we only need client programs : (

Simple way to crash InnoDB plugin 1.0

Now I figured out the reason for MySQL Server 5.1.24 crash when used with InnoDB plugin 1.0. As I had a older my.cnf in the path and it had innodb_flush_method=fdatasync as the default flush method. But from 5.1.24 onwards, fdatasync is not supported as the flush method (not sure why we have such a change in the final stage of RC code, but  …)

Even though I get an error in the mysqld.err log that InnoDB failed to register; but server starts and loads rest of InnoDB information schema plugins without the main InnoDB plugin.

InnoDB: Unrecognized value fdatasync for innodb_flush_method
080423 22:36:04 [ERROR] Plugin 'InnoDB' init function returned error.
080423 22:36:04 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.

As the server started with rest of the InnoDB plugins; upon querying any of them; the server simply crashes…

mysql> show plugins;
+---------------------+--------+--------------------+---------+---------+
| Name                | Status | Type               | Library | License |
+---------------------+--------+--------------------+---------+---------+
| binlog              | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
| partition           | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
| ARCHIVE             | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
| BLACKHOLE           | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
| CSV                 | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
| FEDERATED           | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
| MEMORY              | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
| INNODB_TRX          | ACTIVE | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_LOCKS        | ACTIVE | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_LOCK_WAITS   | ACTIVE | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_CMP          | ACTIVE | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_CMP_RESET    | ACTIVE | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_CMPMEM       | ACTIVE | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_CMPMEM_RESET | ACTIVE | INFORMATION SCHEMA | NULL    | GPL     |
| MyISAM              | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
| MRG_MYISAM          | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
+---------------------+--------+--------------------+---------+---------+
16 rows in set (0.00 sec)
 
mysql> select * from Information_schema.Innodb_locks;
ERROR 2013 (HY000): Lost connection to MySQL server during query

You can crash in the same way if you have any invalid InnoDB variable in the config file. It looks like InnoDB needs to have dependency on the main plugin when it is loading the information schema plugins or at least it should have validation checks up on querying the information schema tables for the main plugin. I will file a bug report on this.

Good thing is I was able to get InnoDB plugin to work with MySQL 5.1.24 and started doing some benchmarks, and results seemed to be very positive. I will post the updated results later tomorrow.

April 21, 2008

MySQL 5.1.24, InnoDB plugin 1.0 failures, server crash

Looks like the InnoDB plugin is completely broken with MySQL-5.1.24; at least there is no luck for me so far to set it up right. First, the Makefile is broken. Second, I tried to do a static build of InnoDB plugin with MySQL server (–enable-innodb and –with-plugin-innobase) resulted in partial InnoDB plugins (Information schema) missing the main one as shown below:

   1:  mysql> show plugins;
   2:  +---------------------+--------+--------------------+---------+---------+
   3:  | Name                | Status | Type               | Library | License |
   4:  +---------------------+--------+--------------------+---------+---------+
   5:  | binlog              | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
   6:  | partition           | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
   7:  | ARCHIVE             | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
   8:  | BLACKHOLE           | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
   9:  | CSV                 | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
  10:  | FEDERATED           | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
  11:  | MEMORY              | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
  12:  | INNODB_TRX          | ACTIVE | INFORMATION SCHEMA | NULL    | GPL     |
  13:  | INNODB_LOCKS        | ACTIVE | INFORMATION SCHEMA | NULL    | GPL     |
  14:  | INNODB_LOCK_WAITS   | ACTIVE | INFORMATION SCHEMA | NULL    | GPL     |
  15:  | INNODB_CMP          | ACTIVE | INFORMATION SCHEMA | NULL    | GPL     |
  16:  | INNODB_CMP_RESET    | ACTIVE | INFORMATION SCHEMA | NULL    | GPL     |
  17:  | INNODB_CMPMEM       | ACTIVE | INFORMATION SCHEMA | NULL    | GPL     |
  18:  | INNODB_CMPMEM_RESET | ACTIVE | INFORMATION SCHEMA | NULL    | GPL     |
  19:  | MyISAM              | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
  20:  | MRG_MYISAM          | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
  21:  +---------------------+--------+--------------------+---------+---------+
  22:  16 rows in set (0.00 sec)

As you can see the main InnoDB plugin is missing and rest of the InnoDB information schema plugins for compression and locks are present. So, I built a dynamically loadable plugin separately and tried to install it using INSTALL PLUGIN, but it fails.

   1:  mysql> INSTALL PLUGIN INNODB SONAME 'ha_innodb.so';
   2:  ERROR 1126 (HY000): Can't open shared library '/home/y/lib64/mysql/plugin/ha_innodb.so' (errno: 2
   3:                                        cannot open shared object file: No such file or directory)
   4:  mysql> INSTALL PLUGIN INNODB SONAME 'ha_innodb.so';
   5:  ERROR 1123 (HY000): Can't initialize function 'INNODB'; Plugin initialization function failed.

You can see the plugin .so should be in lib/mysql/plugin directory as opposed to top lib/mysql directory. And when I tried to select Information_schema.InnoDB_locks on a fresh database, the server crashes..

   1:  mysql> select * from Information_schema.Innodb_locks;
   2:  ERROR 2013 (HY000): Lost connection to MySQL server during query
 
The backtrace looks like ..
/mysql/libexec64/mysqld(get_schema_tables_result(JOIN*, enum_schema_table_state)+0x142) [0x6e9d32]

/mysql/libexec64/mysqld(JOIN::exec()+0x760) [0x64b480]
/mysql/libexec64/mysqld(mysql_select(THD*, Item***, TABLE_LIST*...) [0x64d267]

/mysql/libexec64/mysqld(handle_select(THD*, st_lex*, select_result*, unsigned long)+0x181) [0x64dab1]

I decided not to file any bugs yet as InnoDB team looks like working on getting this to work with 5.1.24. Now, I started a new build with 5.1.23 to see how it goes.

April 20, 2008

Incompatible change in mysql_com.h

I do not know if anyone noticed this; but there was an incompatible change from MySQL 5.1.22 to 5.1.23 in mysql_com.h file for NET structure by renaming the members last_errno and last_error to client_last_errno and client_last_error. This is really annoying as it not just breaks the compilation of lot of depending applications; and functionality will be annoy if one uses the wrong client library as there is no protocol version change. The change should have been done in alpha or beta code but not when its tagged as Release Candidate (RC). The change history does not even have a clue of this change.

Now; I noticed that in 5.1.24 the change is reverted back. I guess someone should be very careful when changing any structures related to the protocol as it could break the things silently.

April 17, 2008

Notes on scaling heavy concurrent writes in real time

Here is the quick notes from the session scaling heavy concurrent writes in real time by Dathan Pattishall. Its bad that he left Flickr i.e. Yahoo. Hopefully they will find a replacement, if not contact me I have few people who are interested.

  • Who am I
    • since 1999 working on mysql
    • scaled many companies (FriendFinder, Friendster, Flickr, now RockYou)
    • Favorites federation, partitioning, shards, RAID-10
  • Requirements
    • scale better
    • store data forever
    • associate time with the data
    • allow for change
    • keep it cheap
    • and downtime is not an option
  • Spread the data around
    • federate
    • all referrers are owned by the page owner
    • spread data out by that
    • but federate in a different direction
    • add a new column to the global account lookup
  • Did not work
    • strings as PK was not good
    • inserts slowed when the table grew larger than memory
    • not enough I/O to handle double the load
  • Start over
    • converted URL into a 64-bit ID - conv(substr(md5(url),1,16),16,10) (Dathan had a typo in the function in his slides, thanks to Patrick who tried it when the session is running)
    • 64-bit number, its unique
    • reduced PK size to 8 bytes + owner, object, object-type
  • InnoDB and strings
    • index on string takes a lot of space
    • each index has its own 16K page
    • fragmentation across pages was hurting, chewing up I/O
  • InnoDB and High Concurrency of string writes
    • requirement: 200 ms for total db access for all apps
    • writes gets slowed down when you cross buffer_pool_size
    • 10 ms to 20 secs sometimes for full transaction
    • replication keeping up
  • Buffer the writes
    • java daemon that buffers up to 4000 messages grouped to a transaction and apply serially
    • does not use much memory or CPU
    • written by Zen Java master
  • Reduce the use of big strings
  • Keep smaller amounts of data
    • Use MyISAM for non-pro users and keep only X weeks of data
    • MyISAM keeps 1/6th the size of InnoDB
    • Migrate the data when they migrate
  • Distributed locks
    • GET_LOCK and RELEASE_LOCK on the same server for which a user operates on

Notes on InnoDB Scale on servers with many cores

Here is the quick notes from the session Helping InnoDB scale on servers with many cores by Mark Callaghan from Google (mcallaghan at google dot com).

  • we have a team now, to help scale MySQL to do the enhancements (9 people,  I hope yahoo management reads this)
  • Overview
    • describe the problems on big servers
    • work done by InnoDB community
    • ask MySQL/InnoDB to fix the problems by taking the patches
  • Community team
    • InnoDB/Oracle
    • Google MySQL team
    • InnoDB community
    • Percona - Peter and Vadim
  • Goal
    • Fix bottlenecks on big SMP
    • utilize servers with many disks
    • support thousands of connections
    • handle corruption in memory and on disk
    • make query plans predictable
    • make thousands of tables and accounts
    • Keep InnoDB beautiful while making these changes
  • Desirable features
    • linear scalability with cores
    • 128GB buffer cache
    • with many disks and remote disks
    • recovering from corruption
  • CPU problems
    • mutex implementation
    • spin lock mutex uses pthreads rather than automatic
    • RW-mutex uses the spin lock mutex
    • Mutex hotspots
      • buffer cache
      • memory allocation
      • transaction log
      • adaptive hash latch
  • Symptoms of CPU problems
    • adaptive hash latch contention
    • excessive mutex contention
      • server has many queries, is slow and is not IO bound
      • vmstat will report lot of idle time
      • oprofile will show a lot of time spent in pthread functions
  • Making InnoDB spin lock mutex fast
    • replace pthread_mutex_trylock with CAS
  • RW-mutex fast
    • use atomic ops to change internal state
    • use separate events to wake readers and writers
  • More work to be done
    • always release adaptive hash latch
    • RW-mutex for transaction log
    • replace malloc heap with scalable malloc (tcmalloc or mtmalloc)
    • use atomix ops for rw-mutex
    • reduce contention for the sync array mutex
    • remove some counters and fields from mutexes
    • platforms with > 8 cores
      • transaction log mutex has contention
      • buffer cache contention
  • To support 128GB buffer cache
    • data structures must scale
      • walking a list with 8M page entries might be slow
    • resources need to be a split
      • more than one mutex might be needed for the buffer cache and LRU chain
    • Detection of corruption is more important
      • memory will be corrupted by software and hardware bugs

Future design hurdles to tackle in MySQL Server

The Future of MySQL by Monty Widenius and Jay Pipes.

  • Why this talk
    • MySQL and Sun should become more transparent
    • Easier to discuss and act when you have facts
    • when user know the limitations, they know how to go around
  • Threads
    • one connection/thread doesn’t work well
    • no priority threads
    • no way to ensure we have X threads
  • Symptoms
    • Too many context switches
    • we are not using multi-core efficiently
    • does not scale that well after 4-8 cores
  • Solution
    • –thread-handling=pool-of-threads (6.0)
  • Lots of work
  • Memory as a resource
    • no single memory allocator
    • sort_buffer_size is not flexible
    • memory engine has limitations (no varchar/blob)
  • Metadata
    • no online operations
    • .frm files are used to generate table info
    • no versioning
    • they are not ACID
    • slow information schema / SHOW command
  • Privileges
    • not modular or pluggable
    • hard to extend
    • Get community to add LDAP support and ask MySQL/Sun to add ROLE as high priority
  • Pluggable storage engine
    • storage engines are depending on internal mysql data structures (table, field)
    • can only be used with the exact MySQL server version that is compiled
    • think as loadable engine
  • Items
    • hold temporary values during evaluation
    • are not re-entrant
    • hard to make parallel execution of statement
    • we cant  cache prepared statements and stored procedures
  • Parser
    • state machine too large
    • not pluggable
    • not cacheable
    • still uses Bison
    • bad error messages
  • Modularity
    • very monolithic
    • few defined interfaces
    • server and libraries are not documented
    • multiple execution paths
    • no rewrite state for optimizer
  • Stored procedures/triggers
    • not cacheable across connections
    • only supports SQL
    • pre-locking for all tables
    • all cursors are materialized
    • trigger code is not shared across opened tables
    • uses more memory
    • hard to debug and profile
  • Replication
    • not fail safe
    • no synchronous option
    • no consistency checking
    • setup and resync is complicated
    • single thread on the slave
    • no multi master
    • only InnoDB synchronizes with the replication binary log
  • Client/Server protocol
    • only one running query/connection
    • not all statements can be prepared
    • not all language support prepared statement protocol
    • multi-statements/sps with multiple results (need to add the support)
  • Table names
    • stored as files (.frm)
    • file system can be a case sensitive or not
    • falcon has its own interpretation of how things should be done
  • Why Falcon and Maria
    • we don’t have any transactional engine
    • if one fails, another will succeed
  • Phone home
    • no clue for mysql/sun what features mysql users are using
    • endless internals discussion
    • what can be deprecated
    • default values for future
  • Open source project ?
    • very hard to contribute code or ideas
    • customers are using their production systems as bug-finders
    • road map and decision making is not transparent
  • Release policy
    • constantly shipping releases
    • 5.1 was declared RC way too early
    • features are removed/added in RC code (hmm?, what’s wrong)
  • Good news
    • we know the problems and have a good clue of how to fix (most) of them
    • sun is more open source / free software friendly than MySQL
    • sun really understands developer