April 14, 2008

Notes from MySQL Cluster

Here is the quick notes from MySQL Cluster by Stewart Smith session that I attended today in the MySQL conference 2008.

  • What it is
    • Clustering of in memory databases in a shared-nothing system
    • Designed for HA, 99.9% Uptime (not really) and sub-second failover
    • Supports Hot (online) consistent backup along with compression
    • No locks are used during the backup as NDB uses a global counters
  • Think as virtual file system
  • Redundancy
    • NoOfReplicas (1,2,3,4), For production use 1 or 2 but don’t use any other as it may have bugs (2 is preferred and well tested mode)
    • 1 means no redundancy (a node fails and cluster fails)
    • 2 means two copies
  • High performance (due to parallelism)
  • In memory only
    • Both data and index needs to fit in memory
    • 5.1 optionally supports to have non-index stuff to disk
    • Rows will have in-memory part and disk part
  • Durability
    • Checkpoint to disk (frequency is configurable, default 2 secs?)
    • Rate is limited to avoid IO
    • Cluster will restore from last known Global Check Point (GCP)
    • Any comitted transactions that are not part of checkpoint will be lost
  • Node Types
    1. Data Nodes (ndbd)
      • Grouped into nodegroups
      • NoOfReplicas is number of nodes in nodegroup
      • Up2 48 in one cluster
      • Interact using regular SQL or C++ NDBAPI or using other langauge binders
      • For each transaction, one is used in Transaction Coordinator and load balanced around the cluster
      • Node failure, application has to retry the transaction
      • All MySQL servers connected to server have the same view of the database
      • Replication between data nodes is synchrounous as opposed to standard asynchrounous MySQL replication, uses two-phase commit
      • ndbd is single threaded (thats why it is not CPU bound), multithreading coming soon
      • Compressed backups, checkpoint might use other CPUs
      • Multiple nodes for SMP
      • Every table should have a Primary key, else 64-bit int will be internally used (same as InnoDB)
      • Uses hashkey(PK) to find the right partition
    2. Management Server (ndb_mgmd)
      • Interact with management client (ndb_mgm) or C management API (mgmapi)
      • distributes configuration to other nodes
      • Backup is started by management server
      • Can have more than one for redundancy
      • Stopping management server will not break the existing cluster
      • It has to be running when nodes join the cluster
      • Backing up of config.ini is very much recommended
      • Issues any commands to nodes (start, stop, restart)
    3. SQL Node (mysqld)
      • MySQL Server
      • Many API/SQL nodes are needed to load storage nodes
      • dependent on application and types of queries
      • Is multi-threaded, takes advantage of SMP
  • Collection of nodes constitute Cluster
  • Requirements
    • Node is a process not a physical computer
    • Atleast 3 physical machines needed for HA
      • Avoid split brain problem (network down, node down etc)
      • In case of split brain problem, nodes will be shutdown forecefully and all comitted transacations will be lost and will be restored only to last known checkpoint.
      • With only two, its hard to know if it has relly failed or network connection has failed
      • You probably need more than 3 machines though
    • ndbd is single threaded, and future version might be multi-threaded
    • SQL node is multi-threaded (ofcourse)
    • Management server is not CPU intensive (almost 0 CPU)
    • Data Nodes need lots of memory, Disk I/O can be calculated and generally not CPU bound (as its single threaded)
    • Memory needed for Index and Data can be calculated using a tool provided (ndb_size.pl)
  • Setup
    • Configure the nodes
    • Management server needs to be started first (ndbm_mgmd -f config.ini)
    • Data Nodes
      • on each storage node ndbd -c IP
      • Can have more than one on the same physical box
    • SQL Nodes
      • Make sure NDBCLUSTER option is enabled (command line or config)
      • Make sure the connect string is specified (command line or config)
      • start the server
    • Monitor (using ndb_mgb -c localhost:9350 -e ’show’)
  • Using the Cluster
    • Tables can be stored in cluster with ENGINE=NDBCLUSTER
    • Supports views, stored procedur, triggers etc
    • Permissions needs to exists in each SQL Node (mysql.user*)
  • Limitations:
    • prior to 5.1; everything in memory (both index and data)
    • 5.1, non-index can be to disk
    • Foreignkey constraints are not yet supported
    • maximum of 255 nodes can belong to a single MySQL Cluster with 48 of being data nodes
    • No dynamic expansion and needs to be restarted to introduce new nodes in the cluster
    • ndbd is single threaded

April 12, 2008

Solution to most common db problems ?

Today I just happen to read May 2008 edition of Dr.Dobb’s Journal printed edition and the article “Kernel-mode Databases” written by Andrel and Alexander from McObject really excited me.

The basic principle of any database engine is its performance and scalability and success of any database in most part relays on these two concepts. To achieve this, operating system kernel plays a major role in the form of …

  • resource allocation
  • thread scheduling
  • low-level hardware access
  • network
  • security

So, looks like eXtremeDB try to overcome from this by releasing kernel-mode database. The kernel mode package actually overrides most of the kernel access calls that database engine expects by replacing with optimized direct calls and exposing them to the top layer. And the run-time engine of eXtremeDB actually linked with the kernel mode to avoid any overhead from the RPC.

Interesting quotes:

“Kernel-mode threads have direct access to kernel-mode databases; concurrent access is coordinated by the database run-time”

“Applications performing computer security functions performed an order of magnitude faster”

It will be really nice if McObject can release benchmark results between regular kernel threads and the optimized one, along with conducting some more tests to prove the benefit from their kernel mode. Even though they found a solution for part of the common problems (kernel overhead, security, thread scheduling) but still there is a overhead from regular IO and exclusive lock time when doing the updates. Looks like some of their locks are using “Compare-and-swap“; but could not find much reference to where they were actually used within the code base.

April 11, 2008

SQL Server Data Services - Invitation to attend the event

After Microsoft released its beta version of SQL Server Data Services; I was more interested to know what kind of considerations were taken care in the design and architecture of SSDS in order to get performance, distribution of data and scalability as it is developed on top of SQL Server.

To my surprise, I noticed Microsoft is conducting the live event to answer some of these questions by following up Ryan Dunn blog; Today I also got the confirmation to attend the event and I look forward to get some of my general questions answered.

It will be interesting to see the pricing model of SSDS in the coming days, and might even attract if they make it FREE for initial customers and/or by setting the limit on the data usage by distributing it as part of OS. It will be really cool if they could also integrate some kind of access mechanism as part of the windows operating system itself to access/store the data.

I hope Microsoft will also change the name of SSDS to something else before the final release as Data Services is not catchy and does not sound right (check yet another post on the similar lines).

February 26, 2008

Query Cache - Prepared Statements

For years, MySQL never supported query cache with prepared statements. When I initially implemented MySQL prepared statements and its associated C API in MySQL 4.1 version; prepared statements were just saving the parsing cost of the query and nothing much. But now, the things are totally different.

Lot of developers stayed away from prepared statements due to this limitation (atleast for select statements). But now the latest MySQL 5.1 server does support the caching of prepared statements with some limitations.

I just wrote a simple C API test which uses mysql_prepare, mysql_execute and mysql_real_query and all seemed to work as expected and the cache works even with intermixed (prepare and non-prepare) on the same binary protocol. Here is a simple example how the QCache works with MySQL command line client when queries were intermixed with prepared and non-prepare execution on 5.1.23 server.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
mysql> show status like '%cache%';
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
| Qcache_hits             | 0        |
| Qcache_inserts          | 0        |
| Qcache_not_cached       | 1        |
| Qcache_queries_in_cache | 0        |
+-------------------------+----------+
11 rows in set (0.01 sec)
 
mysql> create table t1(c1 int);
Query OK, 0 rows affected (0.00 sec)
 
mysql> insert into t1 values(10),(20),(30);
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0
 
mysql> select * from t1;
+------+
| c1   |
+------+
|   10 |
|   20 |
|   30 |
+------+
3 rows in set (0.00 sec)
 
mysql> prepare stmt1 from 'select * from t1 where c1=?';
Query OK, 0 rows affected (0.00 sec)
Statement prepared
 
mysql> set @a=10;
Query OK, 0 rows affected (0.00 sec)
 
mysql> execute stmt1 using @a;
+------+
| c1   |
+------+
|   10 |
+------+
1 row in set (0.00 sec)
 
mysql> show status like '%Qcache%';
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
| Qcache_hits             | 0        |
| Qcache_inserts          | 1        |
| Qcache_not_cached       | 1        |
| Qcache_queries_in_cache | 1        |
+-------------------------+----------+
8 rows in set (0.00 sec)
 
mysql> select * from t1 where c1=10;
+------+
| c1   |
+------+
|   10 |
+------+
1 row in set (0.00 sec)
 
mysql> show status like '%Qcache%';
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
| Qcache_hits             | 1        |
| Qcache_inserts          | 1        |
| Qcache_not_cached       | 2        |
| Qcache_queries_in_cache | 1        |
+-------------------------+----------+
8 rows in set (0.00 sec)
 
mysql> select * from t1 where c1=20;
+------+
| c1   |
+------+
|   20 |
+------+
1 row in set (0.00 sec)
 
mysql> show status like '%Qcache%';
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
| Qcache_hits             | 1        |
| Qcache_inserts          | 2        |
| Qcache_not_cached       | 2        |
| Qcache_queries_in_cache | 2        |
+-------------------------+----------+
8 rows in set (0.01 sec)
 
mysql> set @a=20;
Query OK, 0 rows affected (0.02 sec)
 
mysql> execute stmt1 using @a;
+------+
| c1   |
+------+
|   20 |
+------+
1 row in set (0.00 sec)
 
mysql> show status like '%Qcache%';
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
| Qcache_hits             | 2        |
| Qcache_inserts          | 2        |
| Qcache_not_cached       | 2        |
| Qcache_queries_in_cache | 2        |
+-------------------------+----------+
8 rows in set (0.00 sec)
 
mysql> select * from t1 where c1=20;
+------+
| c1   |
+------+
|   20 |
+------+
1 row in set (0.00 sec)
 
mysql> show status like '%Qcache%';
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
| Qcache_hits             | 3        |
| Qcache_inserts          | 2        |
| Qcache_not_cached       | 2        |
| Qcache_queries_in_cache | 2        |
+-------------------------+----------+
8 rows in set (0.00 sec)
 
mysql> select * from t1 where c1=30;
+------+
| c1   |
+------+
|   30 |
+------+
1 row in set (0.00 sec)
 
mysql> show status like '%Qcache%';
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
| Qcache_hits             | 3        |
| Qcache_inserts          | 3        |
| Qcache_not_cached       | 2        |
| Qcache_queries_in_cache | 3        |
+-------------------------+----------+
8 rows in set (0.00 sec)
 
mysql> set @a=30;
Query OK, 0 rows affected (0.00 sec)
 
mysql> execute stmt1 using @a;
+------+
| c1   |
+------+
|   30 |
+------+
1 row in set (0.00 sec)
 
mysql> show status like '%Qcache%';
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
| Qcache_hits             | 4        |
| Qcache_inserts          | 3        |
| Qcache_not_cached       | 2        |
| Qcache_queries_in_cache | 3        |
+-------------------------+----------+
8 rows in set (0.00 sec)

December 22, 2007

Sysbench - test bails on duplicate key entries

Since few days am doing a benchmark of mysql 4.1 and 5.1 on different hardware using sysbench so that I can test MySQL along with threads, cpu and IO related. But most of the time, when I mix the OLTP test with mysql to have writes; then the test bails when there is a duplicate key entry:

1
2
3
Threads started!
ALERT: failed to execute mysql_stmt_execute(): Err1062 Duplicate entry 'XXXXX' for key X
FATAL: database error, exiting...

as I run with multiple threads using a shell script; which runs for about 12-16 hrs…so I did not wanted some tests to bail in middle…as having a duplicate key for the updates is fine … so applied a simple patch by having extra argument –mysql-ignore-duplicates, which makes the test to continue even on the duplicate key errors … another option is to reissue the query with different numbers, but thought this is enough for base testing.

Here is the patch file for mysql-ignore-duplicates and can also be downloaded from here.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
-- sysbench-0.5.0/sysbench/drivers/mysql/drv_mysql.c   2006-10-10 12:22:51.000000000 -0700
+++ venu-patch/sysbench/drivers/mysql/drv_mysql.c       2007-12-22 02:30:36.492383000 -0800
@@ -65,6 +65,7 @@
    SB_ARG_TYPE_STRING, "auto"},
   {"mysql-ssl", "use SSL connections, if available in the client library", SB_ARG_TYPE_FLAG, "off"},
   {"myisam-max-rows", "max-rows parameter for MyISAM tables", SB_ARG_TYPE_INT, "1000000"},
+  {"mysql-ignore-duplicates", "Ignore the duplicate key errors", SB_ARG_TYPE_FLAG, "off"},
 
   {NULL, NULL, SB_ARG_TYPE_NULL, NULL}
 };
@@ -87,6 +88,7 @@
   unsigned int       myisam_max_rows;
   mysql_drv_trx_t    engine_trx;
   unsigned int       use_ssl;
+  unsigned int       ignore_duplicates;
 } mysql_drv_args_t;
 
 #ifdef HAVE_PS
@@ -232,6 +234,10 @@
   args.db = sb_get_value_string("mysql-db");
   args.myisam_max_rows = sb_get_value_int("myisam-max-rows");
   args.use_ssl = sb_get_value_flag("mysql-ssl");
+  args.ignore_duplicates = sb_get_value_flag("mysql-ignore-duplicates");
+
+  if (args.ignore_duplicates)
+       log_text(LOG_ALERT, "WARNING: Duplicate Key errors will be ignored");
 
   use_ps = 0;
 #ifdef HAVE_PS
@@ -640,6 +646,12 @@
       if (rc == ER_LOCK_DEADLOCK || rc == ER_LOCK_WAIT_TIMEOUT ||
           rc == ER_CHECKREAD)
         return SB_DB_ERROR_DEADLOCK;
+         if (args.ignore_duplicates && rc == ER_DUP_ENTRY) {
+            log_text(LOG_ALERT, "WARNING: Ignoring the duplicate error on execute mysql_stmt_execute(): Err%d %s",
+                    mysql_errno(con->ptr),
+                    mysql_error(con->ptr));
+            return SB_DB_ERROR_NONE;
+         }
       log_text(LOG_ALERT, "failed to execute mysql_stmt_execute(): Err%d %s",
                mysql_errno(con->ptr),
                mysql_error(con->ptr));

December 13, 2007

MySQL 5.1 - LOAD DATA INFILE

Today to debug one of the issue related to InnoDB where the INSERT INTO .. SELECT simply bails with an error as “ERROR 1105 (HY000): Unknown error”; I simply started loading the data onto 5.1 server as it was consistent in 4.1 version (even though it was noted as fixed). So, LOAD DATA INFILE suddenly pops with an error as ..

1
2
3
4
5
6
7
8
9
mysql> LOAD DATA LOCAL INFILE 'venudata-all.txt' INTO TABLE DOCUMENT_DESC FIELDS TERMINATED BY ',';
ERROR 1148 (42000): The used command is not allowed with this MySQL version
mysql> show variables like '%infile%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| local_infile  | ON    |
+---------------+-------+
1 row in set (0.00 sec)

As you can see I have the local_infile enabled in server. And further reading enabled me that I need to start my mysql command line utility also with –local-infile option… well…looks like time for me to read more documentation and feature changes.

December 5, 2007

MySQL 5.1 - query optimizer regression

Since few days I was debugging a case from Yahoo! Japan team (thanks to Akira Kitada) where the query runs fine in about 0 secs with the 5.1.16 build where as the same query takes almost close to 2 minutes with the latest 5.1.22 release.

One can simply identify that this is a optimizer issue as it is not picking the right index. Lets consider the following two tables urls and urls_categories

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
| urls  | CREATE TABLE `urls` (
  `url_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `url` varbinary(2000) NOT NULL,
  `url_status` enum('EXISTS','DEAD','N/A') NOT NULL,
  `yservice_id` smallint(5) unsigned DEFAULT NULL,
  `created_by` smallint(5) unsigned NOT NULL,
  `modified_by` smallint(5) unsigned NOT NULL,
  `created_on` datetime NOT NULL,
  `modified_on` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `nocheck` tinyint(1) NOT NULL DEFAULT '0',
  `url_md5` char(32) NOT NULL,
  PRIMARY KEY (`url_id`),
  UNIQUE KEY `url_unique` (`url_md5`),
  KEY `modified_on` (`modified_on`),
  KEY `url` (`url`(50)),
  KEY `yservice_id` (`yservice_id`),
  KEY `modified_by` (`modified_by`),
  KEY `created_by` (`created_by`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
 
 
| urls_categories | CREATE TABLE `urls_categories` (
  `url_id` int(10) unsigned NOT NULL,
  `category_id` tinyint(3) unsigned NOT NULL,
  `workflow` enum('NEW','ACTIVATED','DEACTIVATED','REMOVED') NOT NULL,
  `searchtype_id` tinyint(3) unsigned NOT NULL,
  `work_id` tinyint(3) unsigned NOT NULL,
  `created_by` smallint(5) unsigned NOT NULL,
  `modified_by` smallint(5) unsigned NOT NULL,
  `created_on` datetime NOT NULL,
  `modified_on` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `comment` varchar(200) NOT NULL,
  PRIMARY KEY (`url_id`,`category_id`),
  KEY `search_index` (`modified_on`,`workflow`,`category_id`),
  KEY `rd_index` (`url_id`,`category_id`,`workflow`),
  KEY `category_id` (`category_id`),
  KEY `work_id` (`work_id`),
  KEY `modified_by` (`modified_by`),
  KEY `created_by` (`created_by`),
  KEY `searchtype_id` (`searchtype_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |

And here is the explain output for a simple inner join query when ran with MySQL version 5.1.16:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
mysql> explain select SQL_NO_CACHE * from urls u inner join urls_categories uc  using (url_id) order by uc.modified_on limit 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: uc
         type: index
possible_keys: PRIMARY,rd_index
          key: search_index
      key_len: 6
          ref: NULL
         rows: 3121058
        Extra:
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: u
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: test.uc.url_id
         rows: 1
        Extra:
2 rows in set (0.00 sec)

and here is what you get when ran with MySQL 5.1.22:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
mysql> explain select SQL_NO_CACHE * from urls u inner join urls_categories uc  using (url_id) order by uc.modified_on limit 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: uc
         type: ALL
possible_keys: PRIMARY,rd_index
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3121045
        Extra: Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: u
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: test.uc.url_id
         rows: 1
        Extra:
2 rows in set (0.00 sec)

as you can cleary see, for urls_categories table, the key search_index is missing and optimizer is not picking that. Even if I force to use the key, still no effect:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
mysql> explain select SQL_NO_CACHE * from urls u inner join urls_categories uc force key(search_index,PRIMARY,rd_index)  using (url_id) order by uc.modified_on limit 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: uc
         type: ALL
possible_keys: PRIMARY,rd_index
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3121045
        Extra: Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: u
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: test.uc.url_id
         rows: 1
        Extra:
2 rows in set (0.00 sec)
 
mysql> explain select SQL_NO_CACHE * from urls u inner join urls_categories uc force key(search_index)  using (url_id) order by uc.modified_on limit 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: uc
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3121045
        Extra: Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: u
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: test.uc.url_id
         rows: 1
        Extra:
2 rows in set (0.00 sec)

As a result of the above bad index use, the query takes about 1m 20 secs with 5.1.22 where as the same one runs in about 0.0 secs as shown below..

with 5.1.22:

1
2
3
4
mysql> select SQL_NO_CACHE * from urls u inner join urls_categories uc  using (url_id) order by uc.modified_on limit 10;
...
...
10 rows in set (1 min 14.55 sec)

with 5.1.16:

1
2
3
4
mysql> select SQL_NO_CACHE * from urls u inner join urls_categories uc  using (url_id) order by uc.modified_on limit 10;
...
...
10 rows in set (0.00 sec)

Here is the bug report to mysql team: MySQL 5.1 optimizer regression.