April 23, 2008

Simple way to crash InnoDB plugin 1.0

Now I figured out the reason for MySQL Server 5.1.24 crash when used with InnoDB plugin 1.0. As I had a older my.cnf in the path and it had innodb_flush_method=fdatasync as the default flush method. But from 5.1.24 onwards, fdatasync is not supported as the flush method (not sure why we have such a change in the final stage of RC code, but  …)

Even though I get an error in the mysqld.err log that InnoDB failed to register; but server starts and loads rest of InnoDB information schema plugins without the main InnoDB plugin.

InnoDB: Unrecognized value fdatasync for innodb_flush_method
080423 22:36:04 [ERROR] Plugin 'InnoDB' init function returned error.
080423 22:36:04 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.

As the server started with rest of the InnoDB plugins; upon querying any of them; the server simply crashes…

mysql> show plugins;
+---------------------+--------+--------------------+---------+---------+
| Name                | Status | Type               | Library | License |
+---------------------+--------+--------------------+---------+---------+
| binlog              | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
| partition           | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
| ARCHIVE             | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
| BLACKHOLE           | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
| CSV                 | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
| FEDERATED           | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
| MEMORY              | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
| INNODB_TRX          | ACTIVE | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_LOCKS        | ACTIVE | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_LOCK_WAITS   | ACTIVE | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_CMP          | ACTIVE | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_CMP_RESET    | ACTIVE | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_CMPMEM       | ACTIVE | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_CMPMEM_RESET | ACTIVE | INFORMATION SCHEMA | NULL    | GPL     |
| MyISAM              | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
| MRG_MYISAM          | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
+---------------------+--------+--------------------+---------+---------+
16 rows in set (0.00 sec)
 
mysql> select * from Information_schema.Innodb_locks;
ERROR 2013 (HY000): Lost connection to MySQL server during query

You can crash in the same way if you have any invalid InnoDB variable in the config file. It looks like InnoDB needs to have dependency on the main plugin when it is loading the information schema plugins or at least it should have validation checks up on querying the information schema tables for the main plugin. I will file a bug report on this.

Good thing is I was able to get InnoDB plugin to work with MySQL 5.1.24 and started doing some benchmarks, and results seemed to be very positive. I will post the updated results later tomorrow.

April 21, 2008

MySQL 5.1.24, InnoDB plugin 1.0 failures, server crash

Looks like the InnoDB plugin is completely broken with MySQL-5.1.24; at least there is no luck for me so far to set it up right. First, the Makefile is broken. Second, I tried to do a static build of InnoDB plugin with MySQL server (–enable-innodb and –with-plugin-innobase) resulted in partial InnoDB plugins (Information schema) missing the main one as shown below:

   1:  mysql> show plugins;
   2:  +---------------------+--------+--------------------+---------+---------+
   3:  | Name                | Status | Type               | Library | License |
   4:  +---------------------+--------+--------------------+---------+---------+
   5:  | binlog              | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
   6:  | partition           | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
   7:  | ARCHIVE             | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
   8:  | BLACKHOLE           | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
   9:  | CSV                 | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
  10:  | FEDERATED           | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
  11:  | MEMORY              | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
  12:  | INNODB_TRX          | ACTIVE | INFORMATION SCHEMA | NULL    | GPL     |
  13:  | INNODB_LOCKS        | ACTIVE | INFORMATION SCHEMA | NULL    | GPL     |
  14:  | INNODB_LOCK_WAITS   | ACTIVE | INFORMATION SCHEMA | NULL    | GPL     |
  15:  | INNODB_CMP          | ACTIVE | INFORMATION SCHEMA | NULL    | GPL     |
  16:  | INNODB_CMP_RESET    | ACTIVE | INFORMATION SCHEMA | NULL    | GPL     |
  17:  | INNODB_CMPMEM       | ACTIVE | INFORMATION SCHEMA | NULL    | GPL     |
  18:  | INNODB_CMPMEM_RESET | ACTIVE | INFORMATION SCHEMA | NULL    | GPL     |
  19:  | MyISAM              | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
  20:  | MRG_MYISAM          | ACTIVE | STORAGE ENGINE     | NULL    | GPL     |
  21:  +---------------------+--------+--------------------+---------+---------+
  22:  16 rows in set (0.00 sec)

As you can see the main InnoDB plugin is missing and rest of the InnoDB information schema plugins for compression and locks are present. So, I built a dynamically loadable plugin separately and tried to install it using INSTALL PLUGIN, but it fails.

   1:  mysql> INSTALL PLUGIN INNODB SONAME 'ha_innodb.so';
   2:  ERROR 1126 (HY000): Can't open shared library '/home/y/lib64/mysql/plugin/ha_innodb.so' (errno: 2
   3:                                        cannot open shared object file: No such file or directory)
   4:  mysql> INSTALL PLUGIN INNODB SONAME 'ha_innodb.so';
   5:  ERROR 1123 (HY000): Can't initialize function 'INNODB'; Plugin initialization function failed.

You can see the plugin .so should be in lib/mysql/plugin directory as opposed to top lib/mysql directory. And when I tried to select Information_schema.InnoDB_locks on a fresh database, the server crashes..

   1:  mysql> select * from Information_schema.Innodb_locks;
   2:  ERROR 2013 (HY000): Lost connection to MySQL server during query
 
The backtrace looks like ..
/mysql/libexec64/mysqld(get_schema_tables_result(JOIN*, enum_schema_table_state)+0x142) [0x6e9d32]

/mysql/libexec64/mysqld(JOIN::exec()+0x760) [0x64b480]
/mysql/libexec64/mysqld(mysql_select(THD*, Item***, TABLE_LIST*...) [0x64d267]

/mysql/libexec64/mysqld(handle_select(THD*, st_lex*, select_result*, unsigned long)+0x181) [0x64dab1]

I decided not to file any bugs yet as InnoDB team looks like working on getting this to work with 5.1.24. Now, I started a new build with 5.1.23 to see how it goes.

April 20, 2008

Incompatible change in mysql_com.h

I do not know if anyone noticed this; but there was an incompatible change from MySQL 5.1.22 to 5.1.23 in mysql_com.h file for NET structure by renaming the members last_errno and last_error to client_last_errno and client_last_error. This is really annoying as it not just breaks the compilation of lot of depending applications; and functionality will be annoy if one uses the wrong client library as there is no protocol version change. The change should have been done in alpha or beta code but not when its tagged as Release Candidate (RC). The change history does not even have a clue of this change.

Now; I noticed that in 5.1.24 the change is reverted back. I guess someone should be very careful when changing any structures related to the protocol as it could break the things silently.

April 17, 2008

Notes on scaling heavy concurrent writes in real time

Here is the quick notes from the session scaling heavy concurrent writes in real time by Dathan Pattishall. Its bad that he left Flickr i.e. Yahoo. Hopefully they will find a replacement, if not contact me I have few people who are interested.

  • Who am I
    • since 1999 working on mysql
    • scaled many companies (FriendFinder, Friendster, Flickr, now RockYou)
    • Favorites federation, partitioning, shards, RAID-10
  • Requirements
    • scale better
    • store data forever
    • associate time with the data
    • allow for change
    • keep it cheap
    • and downtime is not an option
  • Spread the data around
    • federate
    • all referrers are owned by the page owner
    • spread data out by that
    • but federate in a different direction
    • add a new column to the global account lookup
  • Did not work
    • strings as PK was not good
    • inserts slowed when the table grew larger than memory
    • not enough I/O to handle double the load
  • Start over
    • converted URL into a 64-bit ID - conv(substr(md5(url),1,16),16,10) (Dathan had a typo in the function in his slides, thanks to Patrick who tried it when the session is running)
    • 64-bit number, its unique
    • reduced PK size to 8 bytes + owner, object, object-type
  • InnoDB and strings
    • index on string takes a lot of space
    • each index has its own 16K page
    • fragmentation across pages was hurting, chewing up I/O
  • InnoDB and High Concurrency of string writes
    • requirement: 200 ms for total db access for all apps
    • writes gets slowed down when you cross buffer_pool_size
    • 10 ms to 20 secs sometimes for full transaction
    • replication keeping up
  • Buffer the writes
    • java daemon that buffers up to 4000 messages grouped to a transaction and apply serially
    • does not use much memory or CPU
    • written by Zen Java master
  • Reduce the use of big strings
  • Keep smaller amounts of data
    • Use MyISAM for non-pro users and keep only X weeks of data
    • MyISAM keeps 1/6th the size of InnoDB
    • Migrate the data when they migrate
  • Distributed locks
    • GET_LOCK and RELEASE_LOCK on the same server for which a user operates on

Notes on InnoDB Scale on servers with many cores

Here is the quick notes from the session Helping InnoDB scale on servers with many cores by Mark Callaghan from Google (mcallaghan at google dot com).

  • we have a team now, to help scale MySQL to do the enhancements (9 people,  I hope yahoo management reads this)
  • Overview
    • describe the problems on big servers
    • work done by InnoDB community
    • ask MySQL/InnoDB to fix the problems by taking the patches
  • Community team
    • InnoDB/Oracle
    • Google MySQL team
    • InnoDB community
    • Percona - Peter and Vadim
  • Goal
    • Fix bottlenecks on big SMP
    • utilize servers with many disks
    • support thousands of connections
    • handle corruption in memory and on disk
    • make query plans predictable
    • make thousands of tables and accounts
    • Keep InnoDB beautiful while making these changes
  • Desirable features
    • linear scalability with cores
    • 128GB buffer cache
    • with many disks and remote disks
    • recovering from corruption
  • CPU problems
    • mutex implementation
    • spin lock mutex uses pthreads rather than automatic
    • RW-mutex uses the spin lock mutex
    • Mutex hotspots
      • buffer cache
      • memory allocation
      • transaction log
      • adaptive hash latch
  • Symptoms of CPU problems
    • adaptive hash latch contention
    • excessive mutex contention
      • server has many queries, is slow and is not IO bound
      • vmstat will report lot of idle time
      • oprofile will show a lot of time spent in pthread functions
  • Making InnoDB spin lock mutex fast
    • replace pthread_mutex_trylock with CAS
  • RW-mutex fast
    • use atomic ops to change internal state
    • use separate events to wake readers and writers
  • More work to be done
    • always release adaptive hash latch
    • RW-mutex for transaction log
    • replace malloc heap with scalable malloc (tcmalloc or mtmalloc)
    • use atomix ops for rw-mutex
    • reduce contention for the sync array mutex
    • remove some counters and fields from mutexes
    • platforms with > 8 cores
      • transaction log mutex has contention
      • buffer cache contention
  • To support 128GB buffer cache
    • data structures must scale
      • walking a list with 8M page entries might be slow
    • resources need to be a split
      • more than one mutex might be needed for the buffer cache and LRU chain
    • Detection of corruption is more important
      • memory will be corrupted by software and hardware bugs

Future design hurdles to tackle in MySQL Server

The Future of MySQL by Monty Widenius and Jay Pipes.

  • Why this talk
    • MySQL and Sun should become more transparent
    • Easier to discuss and act when you have facts
    • when user know the limitations, they know how to go around
  • Threads
    • one connection/thread doesn’t work well
    • no priority threads
    • no way to ensure we have X threads
  • Symptoms
    • Too many context switches
    • we are not using multi-core efficiently
    • does not scale that well after 4-8 cores
  • Solution
    • –thread-handling=pool-of-threads (6.0)
  • Lots of work
  • Memory as a resource
    • no single memory allocator
    • sort_buffer_size is not flexible
    • memory engine has limitations (no varchar/blob)
  • Metadata
    • no online operations
    • .frm files are used to generate table info
    • no versioning
    • they are not ACID
    • slow information schema / SHOW command
  • Privileges
    • not modular or pluggable
    • hard to extend
    • Get community to add LDAP support and ask MySQL/Sun to add ROLE as high priority
  • Pluggable storage engine
    • storage engines are depending on internal mysql data structures (table, field)
    • can only be used with the exact MySQL server version that is compiled
    • think as loadable engine
  • Items
    • hold temporary values during evaluation
    • are not re-entrant
    • hard to make parallel execution of statement
    • we cant  cache prepared statements and stored procedures
  • Parser
    • state machine too large
    • not pluggable
    • not cacheable
    • still uses Bison
    • bad error messages
  • Modularity
    • very monolithic
    • few defined interfaces
    • server and libraries are not documented
    • multiple execution paths
    • no rewrite state for optimizer
  • Stored procedures/triggers
    • not cacheable across connections
    • only supports SQL
    • pre-locking for all tables
    • all cursors are materialized
    • trigger code is not shared across opened tables
    • uses more memory
    • hard to debug and profile
  • Replication
    • not fail safe
    • no synchronous option
    • no consistency checking
    • setup and resync is complicated
    • single thread on the slave
    • no multi master
    • only InnoDB synchronizes with the replication binary log
  • Client/Server protocol
    • only one running query/connection
    • not all statements can be prepared
    • not all language support prepared statement protocol
    • multi-statements/sps with multiple results (need to add the support)
  • Table names
    • stored as files (.frm)
    • file system can be a case sensitive or not
    • falcon has its own interpretation of how things should be done
  • Why Falcon and Maria
    • we don’t have any transactional engine
    • if one fails, another will succeed
  • Phone home
    • no clue for mysql/sun what features mysql users are using
    • endless internals discussion
    • what can be deprecated
    • default values for future
  • Open source project ?
    • very hard to contribute code or ideas
    • customers are using their production systems as bug-finders
    • road map and decision making is not transparent
  • Release policy
    • constantly shipping releases
    • 5.1 was declared RC way too early
    • features are removed/added in RC code (hmm?, what’s wrong)
  • Good news
    • we know the problems and have a good clue of how to fix (most) of them
    • sun is more open source / free software friendly than MySQL
    • sun really understands developer

Notes from Social graph and the Database

From Jeff Rothschild, VP of technology, Facebook

  1. The power of connectedness
  2. Impact on the database
  3. Our challenge 
  • The power of connectedness
    • photo tagging
    • getting an email that someone tagged a photo in Facebook
    • inter tagging between friends and it continues as a network
    • outstanding growth because of the photo tagging
    • 26B photos in archive now
    • Most trafficked photos applications than anybody else
    • events, invites impact on social graph
    • Opening up the social graph to outside and create a platform and API, so other developers can make use of it (28K applications so far)
    • Hypergrowth because of the platform (1M users in four days)
    • 200 applications that has over million users
    • circle of connectivity can make the globe (with positive feedback, performance etc)
    • the social graph links everything
    • and that means lot of work for database
  • Impact on the database
    • user -> connections -> objects
    • interact with people, people has lot of data
    • 100 friends, 100 friends will have 100’s of objects/friend…10’s of thousands of possible objects
    • Horizontal partitioning supports parallel queries
    • 50K+ requests per second now
    • memcached can match speeds with the application
    • multi-threading with multi-core to take advantage with 1G cards
    • New C client which enhances run-time performance for PHP
    • New binary protocol for efficiency both ends (app and memcached)
    • Introduced memcached proxy to purge dirty records
    • MySQL replication to replicate data between data centers
    • Proxy is not a solution here due to replication
    • Changed Insert statements with MEMCACHE_DIRTY key1, keys …
    • Put more RAM in cache (95% hit rate is in cache)
    • 20M requests and only 500K requests goes to MySQL server
  • Optimizations
    • memory + servers (reduce to save $$$)
    • Tuning (moderate query scope, aggregate related data)
    • Partitioning (selective caching, archiving stale data)
  • Our Challenge
    • MySQL and memcached is a powerful combination
    • MySQL at the speed of memcached
    • memcache performance for simple tables
    • flash memory storage
    • dramatic improvement in I/O
    • supports higher miss rate
    • allows higher data/ram ratio
    • persistent “Cache”
      • flash for memcache
      • persistent distributed hash table could be a good fit
    • structured storage
      • not all data benefits
      • integrated cache and persistent replication
      • unmanaged distribution
    • A strong community is our challenge