July 19, 2010

MapReduce – DBInputFormat – Serialization on readers

Last week I was working on EC2 MySQL server where one of the slave is taking lot of time to catch-up; and only job that is running on that server is mapreduce job to access InnoDB tables for read-only meta data. And debugging it further, noticed that every access to database server is serialized with read lock; so slave has to wait until the locks are released (bad part is, if the resource is not released in-time or if it takes lot of time to process and cleanup, then things will be really bad on server front).

As am new to hadoop/mapreduce world; after pulling mapreduce code, noticed that DbInputFormat class by default uses SERIALIZABLE isolation level in getConnection() internally; which is kind of odd and can cause lot of locking and performance issues especially if multiple threads are acting on the same table.

Surprisingly, Sqoop, a database import tool for Hadoop also uses the same DBInputFormat class, so not sure if anyone actually tested for performance especially on live server with multiple threads where table is frequently updated. Even though the reader thread won’t take any performance hit, but other threads has to wait for any write operation as SERIALIZABLE by default acquires read lock.

I patched the mapreduce code to use READ COMMITTED isolation, and things started working back; even though the scenario is bit different here due to the nature of server workload, but this is not something that is un-common in most of the production environment.

Random Pauses In MySQL – File Handle Serialization

Last month, I blogged about a case involving InnoDB, where all threads acting on InnoDB tables completely stuck for about few hours doing nothing; until we found a way to get around and make the threads to run and do the actual work.

There are few more cases where the server can get into pause state without doing anything for a brief time; this is irrespective of storage engines (well, sort of). Even though server is still active and doing nothing; but blocked on certain conditions and every other thread for the most part has to wait, mostly because of serialization

Here is one such case that involves with THR_OPEN_lock mutex, which is mainly used for opening and closing file handles. When the table is flushed or when it is not in the table cache, then it opens and caches it(open_cache structure) for further use, so that sub-sequent operations can (re)use it without re-opening. Even if the table is InnoDB or any other storage engine type, the frm file has to go through the same mechanism. Not just the regular tables; but implicit file based temporary tables and all SELECT … OUTFILES and LOAD DATA LOCAL. uses the same lock to open and close.

For example; here is a live server thread dump; where most of the threads are blocked on this THR_OPEN_lock mutex:

    129 
     71 pthread_cond_wait@@GLIBC_2.3.2,end_thread,handle_one_connection,start_thread,clone
     25 read,vio_read,my_real_read,my_net_read,handle_one_connection,start_thread,clone
     14 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,os_aio_simulated_handle,fil_aio_wait,io_handler_thread,start_thread,clone
      5 __lll_lock_wait,_L_lock_1233,pthread_mutex_lock,open_table,open_tables,open_and_lock_tables,mysql_execute_command,Prepared_statement::execute,mysql_stmt_execute,dispatch_command,handle_one_connection,start_thread,clone
      1 select,os_thread_sleep,srv_lock_timeout_and_monitor_thread,start_thread,clone
      1 select,os_thread_sleep,srv_error_monitor_thread,start_thread,clone
      1 select,handle_connections_sockets,main
      1 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,srv_master_thread,start_thread,clone
      1 __lll_lock_wait,_L_lock_1233,pthread_mutex_lock,open_table,open_tables,open_and_lock_tables,Prepared_statement::prepare,mysql_stmt_prepare,dispatch_command,handle_one_connection,start_thread,clone
      1 __lll_lock_wait,_L_lock_1233,pthread_mutex_lock,my_close,openfrm,open_unireg_entry,open_table,open_tables,open_and_lock_tables,select_like_stmt_test_with_open_n_lock,Prepared_statement::prepare,mysql_stmt_prepare,dispatch_command,handle_one_connection,start_thread,clone
      1 __lll_lock_wait,_L_lock_1233,pthread_mutex_lock,my_close,MYSQL_LOG::close,MYSQL_LOG::new_file,rotate_relay_log,process_io_rotate,queue_event,handle_slave_io,start_thread,clone
      1 __lll_lock_wait,_L_lock_1233,pthread_mutex_lock,my_close,mi_close,free_tmp_table,close_thread_tables,dispatch_command,handle_one_connection,start_thread,clone
      1 __lll_lock_wait,_L_lock_1233,pthread_mutex_lock,close_thread_tables,Query_log_event::exec_event,handle_slave_sql,start_thread,clone
      1 __lll_lock_wait,_L_lock_1233,pthread_mutex_lock,close_thread_tables,Prepared_statement::cleanup_stmt,Prepared_statement::execute,mysql_stmt_execute,dispatch_command,handle_one_connection,start_thread,clone
      1 do_sigwait,sigwait,signal_hand,start_thread,clone
      1 close,my_close,select_to_file::send_eof,do_select,JOIN::exec,mysql_select,handle_select,mysql_execute_command,Prepared_statement::execute,mysql_stmt_execute,dispatch_command,handle_one_connection,start_thread,clone

In the above stack trace, close to 11 threads are blocked and waiting on THR_LOCK_open during either open or close of file handles as last thread is taking time in closing the OUTFILE file descriptor, which holds the lock. The following thread states in the "SHOW PROCESSLIST" is a good indication of such a symptom.

  • opening tables
  • opening table
  • closing tables
  • query end
  • init
  • end

There is an open scalability bug on this and hoping that future versions of MySQL will address this. The best bet will be to split and use different mutex locks between regular tables, implicit temporary tables; input/output files, so that main table operations are not blocked by rest of the temporary operations or maintain a separate fd list and use atomic operations to toggle the state by getting rid of this lock on regular open/close operations. The symptom also exhibits when one is deleting a large table or using large IO buffering.