April 13, 2009

Merge and Truncate Problems

Last few weeks I noticed a problem in the replication, where on and off the replication SQL thread fails to truncate a table with the following error:

ERROR 1105 (HY000): MyISAM table 'XXXX' is in use (most likely by a MERGE table). Try FLUSH TABLES.

So, I was keep skipping the error, as it was fuzzy why when only one thread is actually executing (no other threads on DB other than slave SQL thread); and today it happened, and I got a simple repro:

 
mysql> drop table if exists t1, t2, merge;
Query OK, 0 rows affected, 3 warnings (0.05 sec)
 
mysql> create table t1(c1 int)Engine=MyISAM;
Query OK, 0 rows affected (0.03 sec)
 
mysql> create table t2(c1 int)Engine=MyISAM;
Query OK, 0 rows affected (0.06 sec)
 
mysql> insert into t1 values(10);
Query OK, 1 row affected (0.04 sec)
 
mysql> insert into t2 values(20);
Query OK, 1 row affected (0.03 sec)
 
mysql> create table merge(c1 int)Engine=MERGE union(t1,t2);
Query OK, 0 rows affected (1.03 sec)
 
mysql> select * from merge;
+------+
| c1   |
+------+
|   10 | 
|   20 | 
+------+
2 rows in set (0.03 sec)
 
mysql> truncate t1;
ERROR 1105 (HY000): MyISAM table 't1' is in use (most likely by a MERGE table). Try FLUSH TABLES.

Well, but the DROP works on the same session:

mysql> drop table t1;
Query OK, 0 rows affected (0.03 sec)

So, this looks like a bug to me as I expect truncate to work even if the table handle is open and it is not in transaction and not explicitly locked by the user.

I tested this with latest 5.0.77 release and the same truncate works fine with 5.1.32 release, so must be a bug in 5.0 branch.

April 4, 2009

HP P800 Smart Array Performance

I had a chance to test HP Smart Array P800 SAS RAID controller with HP Proliant DL380 G5 server with the following configurations:

  • Dual Quad Core 2.5GhzHP DL380
  • 32-GB RAM
  • P800 Smart array with SAS interface
    • 25 disks, 10K RPM, 146G each
    • 512M cache
    • Battery backed

Here is the sysbench fileio random write performance numbers  for RAID-10 with the following configuration:

  • 18 disks, ext3 noatime, nodiratime (mirrored)
  • 0% read, 100% write cache
  • 64K strip sizep800 Array Controller style=
  • Battery enabled
  • 16K block size (sysbench)
threads sync - rndwr (reqs/sec) direct - rndwr(reqs/sec)
1 4690 4502
4 5556 5203
16 5560 5233
32 5559 5204

Overall the numbers seems to be reasonable for pure random writes with 64K strip size. The same configuration with RAID-5 drops the write performance by 1/4th. The controller does not provide a way to configure the cache burst time, and it automatically controls the cache and evenly distributes to other volumes.

Overall HP P800 might be a good choice for database workloads as it has 25 SAS disks support (depending on enclosure ) in a affordable price, even though 15K disks price to 10K price is wide; and people can simply add more 10K disks than buying fewer 15K

March 16, 2009

MySQL 5.1 Mix Of Handlers In Partitions

One good feature in MySQL 5.1 is the support of partitions which is built into the MySQL server. Lot of applications that I noticed use their own partition schemes by splitting the large table data to multiple tables on a desired key and wrapper table or application logic that identifies what table to use.

Here comes MySQL to rescue all those applications to push the partition logic to MySQL server itself without needing a application specific logic; which maintains partitions underneath the covers (each partition is a sub-table) on a single large table as shown below:

partition

The partition key is specific to table and/or application. If the partition key is account_id or site_id or anything that takes both active writes and reads; it is easy to make the partition table as InnoDB; so that they can really scale well.

If the partitions purpose is to serve reads; then making it MyISAM makes much better sense. Again, MyISAM with partitions does not support KEY CACHE or LOAD INDEX on partitioned table (hopefully in the future versions we can expect), that leaves to use only global key_buffer_size.

 

 

 

By the way, all the tests that I did so far, proves that MyISAM works well with partitions when compared to InnoDB due to stats that InnoDB provides to optimizer on partitions (not always) seems to be wrong especially when you have a partition on a range which is not unique (unless the application logic is forced to use INDEX or have the right join order and use STRAIGHT_JOIN).

The real problem comes when you have a table where active writes happens on a particular range (like current month) and rest of them are used as logging or serving for reports as shown below:

DROP TABLE IF EXISTS test_part;
 
CREATE TABLE test_part(
date DATE NOT NULL DEFAULT '0000-00-00',
comment VARCHAR(20) DEFAULT NULL
)ENGINE=MyISAM
PARTITION BY RANGE (to_days(date))
(
PARTITION nov08 VALUES LESS THAN(TO_DAYS('2008-12-01')),
PARTITION dec08 VALUES LESS THAN(TO_DAYS('2009-01-01')),
PARTITION jan09 VALUES LESS THAN(TO_DAYS('2009-02-01')),
PARTITION feb09 VALUES LESS THAN(TO_DAYS('2009-03-01')),
PARTITION mar09 VALUES LESS THAN(TO_DAYS('2009-04-01')),
PARTITION unpart VALUES LESS THAN MAXVALUE
);

In this case, choosing a right table type is more complex as making it MyISAM blocks the reads on active writes to current month; and choosing InnoDB makes it use more tablespace and memory along with performance penalty.

The one solution for this is to make active writes partition (current month, partition mar09 in the above example) as InnoDB and rest of the partitions as MyISAM. Unfortunately MySQL does not yet support mixing of table types within the partitions.

mysql> 
mysql> CREATE TABLE test_part(
    -> date DATE NOT NULL DEFAULT '0000-00-00',
    -> comment VARCHAR(20) DEFAULT NULL
    -> )ENGINE=MyISAM
    -> PARTITION BY RANGE (to_days(date))
    -> (
    -> PARTITION nov08 VALUES LESS THAN(TO_DAYS('2008-12-01')),
    -> PARTITION dec08 VALUES LESS THAN(TO_DAYS('2009-01-01')),
    -> PARTITION jan09 VALUES LESS THAN(TO_DAYS('2009-02-01')),
    -> PARTITION feb09 VALUES LESS THAN(TO_DAYS('2009-03-01')),
    -> PARTITION mar09 VALUES LESS THAN(TO_DAYS('2009-04-01')) ENGINE=InnoDB,
    -> PARTITION unpart VALUES LESS THAN MAXVALUE
    -> );
ERROR 1497 (HY000): The mix of handlers in the partitions is not allowed in this version of MySQL
mysql>

So, usage of partitions on date ranges with writes happening in the current month, will have a tough decision to make until MySQL supports mix of table handlers within the partitioned table.

March 12, 2009

SSD Peformance With Different File Systems

I just tested the new Intel X25-E Solid State Drive on RHEL-4 64-bit with three file intel-ssd-gallerysystems (ext2, ext3 and XFS) using the sysbench fileio test.

The test is done to gauge the performance of SSD as a temporary solution to act as tmpdir for MySQL server where result set temporary files are heavily created.

Here is the test results:

ssd-test

Overall, ext2 yields much better results on random IO along with sequential read. All file systems are mounted with noatime and XFS is mounted with noatime + nobarrier

So, even if one decides to use SSD as MySQL data storage; it will be nice to avoid it for any log storage  (error log, slow and general query logs, bin-logs and innodb transaction log) due to slower sequential writes. Sequential write on ext2 with O_DIRECT seems to be better than any other file system.

December 3, 2008

Transcend SSD 32GB Express Card

Yesterday I got the new Transcend SSD Express Card with 32GB that fits nicely with the Macbook Pro express card slot.

Even though the write speed is very low (8MB/s) for any standard express card; but read speed seems to be 18MB/s; so decided opt for this as it can serve as ready boost to my Windows operating system that is running under Vmware Fusion along with storing all my music files. I needed one, so that I can move all virtual machine files to this card; but due to the slow write speed; its hard at this time; and I may need to wait till we get the true SSD performance on Express card.

One funny thing is; Macbook seems to eject the card few hours once automatically; which seems to be odd to me and search does reveal that lot of other people also has the same issue. Not sure if its a bug from Macbook or Transcend.

Anyway, I will wait for the true SSD speed on express cards.

November 4, 2008

InnoDB show table status gets blocked

Yesterday I was going through the locks related code; and found that in 5.0; when you have a global read lock; then the SHOW TABLE STATUS actually gets blocked when it had its own read lock on the new session; and this is not the case with MySQL 5.1

For example; lets execute the following set of statements in two sessions with MySQL 5.0 (latest bazaar version with InnoDB enabled):

Session 1:

mysql> use test;
Database changed
mysql> drop table if exists t1;
Query OK, 0 rows affected (0.07 sec)
 
mysql> create table test.t1(c1 int)Engine=InnoDB;
Query OK, 0 rows affected (0.00 sec)
 
mysql> flush tables with read lock;
Query OK, 0 rows affected (0.00 sec)

Session 2:

mysql> use test
Database changed
mysql> lock table t1 read;
Query OK, 0 rows affected (0.01 sec)
 
mysql> show table status like 't1'\G

And you will notice that the last SHOW TABLE STATUS command actually gets blocked until you release the lock in session 1 or rollback the transaction or until lock wait timeout.

And when you execute the same set of statements with 5.1; then it won’t get blocked; and that is supposed to be what one expects as read-lock(global or session) shouldn’t block read-only query from any session.

Here is the code snippet in MySQL 5.0; which is causing the problem as it internally calls ha_commit_trans on the statement, which internally gets blocked on global read lock when closing the tables (sql_base.cc:close_thread_tables) where as in 5.1; this is entirely taken care; and ha_commit_stmt is not actually called from this function at all..

if (thd->locked_tables || prelocked_mode)
{
/*
Let us commit transaction for statement. Since in 5.0 we only have
one statement transaction and don't allow several nested statement
transactions this call will do nothing if we are inside of stored
function or trigger (i.e. statement transaction is already active and
does not belong to statement for which we do close_thread_tables()).
TODO: This should be fixed in later releases.
*/
ha_commit_stmt(thd);

I am not sure if this is a bug or not; but does not harm anybody; other than if one has mysqldump running in –single-transaction mode; then if your application is using individual READ LOCKs and processing any SHOW commands;  then it may be a problem. And in case if you already have READ LOCKS; then FLUSH TABLES WITH READ LOCK will anyway block till you release it explicitly

September 25, 2008

Setting Incremental Backup using ZRM

Its been a while I wrote anything on MySQL and/or Database related things as I was on a long two month vacation and immediately had a job change.

Today I was playing a bit in trying to setup up the ZRM community backup on Cent OS 5 to have a incremental backup mechanism; to see how it works as I never had a chance to use it so far. In the process, I could not find a single source of document which explains the basic steps needed on first-hand setup (or my search did not yield the right docs), so thought of writing and noting it down myself for any future reference. More or less, the instructions should be same for any Linux favor.

Here are the steps:

  • First install perl-DBI and perl-XML-Parser modules
    sudo yum install perl-DBI perl-XML-Parser
  • Download and install ZRM package.
    sudo rpm -ivh MySQL-zrm-2.0-1.noarch.rpm
    Password:
    warning: MySQL-zrm-2.0-1.noarch.rpm: Header V3 DSA signature: NOKEY, key ID 3c5d1c92
    Preparing...                ########################################### [100%]
       1:MySQL-zrm              ########################################### [100%]
    Checking and Migrating old ZRM data
  • Create user called `backup_user` identified by `backup_user` in MySQL
    mysql> GRANT ALL ON *.* TO 'backup_user'@localhost IDENTIFIED BY 'backup_user';
    Query OK, 0 rows affected (0.00 sec)
    mysql> flush privileges;
    Query OK, 0 rows affected (0.00 sec)

  • Add a new directory under /etc/mysql-zrm/ as ‘dailyrun’ and copy the mysql-zrm.conf to this directory
    sudo mkdir -p /etc/mysql-zrm/dailyrun
    sudo cp /etc/mysql-zrm/mysql-zrm.conf /etc/mysql-zrm/dailyrun/
  • Start editing the file /etc/mysql-zrm/dailyrun/mysql-zrm.conf and fill some of basic information and leave backup-level=1 and backup-mode=raw. The conf-file self-explains on every parameter, so it becomes easy for anyone to setup even for first time.
  • Start testing the backup using mysql-zrm-scheduler by issuing:
    sudo mysql-zrm-scheduler --now --backup-set dailyrun
  • Once tested, add it to the standard daily scheduler to backup at 1 AM:
    mysql-zrm-scheduler --add --interval daily --start 01:00 --backup-set dailyrun

I am not going further to explain on how to verify the backup and restore it back, even though they are straight forward by using the ZRM utilities provided by the Zmanda; and complete information can be found from here.