Here is the quick notes from MySQL Cluster by Stewart Smith session that I attended today in the MySQL conference 2008.
- What it is
- Clustering of in memory databases in a shared-nothing system
- Designed for HA, 99.9% Uptime (not really) and sub-second failover
- Supports Hot (online) consistent backup along with compression
- No locks are used during the backup as NDB uses a global counters
- Think as virtual file system
- Redundancy
- NoOfReplicas (1,2,3,4), For production use 1 or 2 but don’t use any other as it may have bugs (2 is preferred and well tested mode)
- 1 means no redundancy (a node fails and cluster fails)
- 2 means two copies
- High performance (due to parallelism)
- In memory only
- Both data and index needs to fit in memory
- 5.1 optionally supports to have non-index stuff to disk
- Rows will have in-memory part and disk part
- Durability
- Checkpoint to disk (frequency is configurable, default 2 secs?)
- Rate is limited to avoid IO
- Cluster will restore from last known Global Check Point (GCP)
- Any comitted transactions that are not part of checkpoint will be lost
- Node Types
- Data Nodes (ndbd)
- Grouped into nodegroups
- NoOfReplicas is number of nodes in nodegroup
- Up2 48 in one cluster
- Interact using regular SQL or C++ NDBAPI or using other langauge binders
- For each transaction, one is used in Transaction Coordinator and load balanced around the cluster
- Node failure, application has to retry the transaction
- All MySQL servers connected to server have the same view of the database
- Replication between data nodes is synchrounous as opposed to standard asynchrounous MySQL replication, uses two-phase commit
- ndbd is single threaded (thats why it is not CPU bound), multithreading coming soon
- Compressed backups, checkpoint might use other CPUs
- Multiple nodes for SMP
- Every table should have a Primary key, else 64-bit int will be internally used (same as InnoDB)
- Uses hashkey(PK) to find the right partition
- Management Server (ndb_mgmd)
- Interact with management client (ndb_mgm) or C management API (mgmapi)
- distributes configuration to other nodes
- Backup is started by management server
- Can have more than one for redundancy
- Stopping management server will not break the existing cluster
- It has to be running when nodes join the cluster
- Backing up of config.ini is very much recommended
- Issues any commands to nodes (start, stop, restart)
- SQL Node (mysqld)
- MySQL Server
- Many API/SQL nodes are needed to load storage nodes
- dependent on application and types of queries
- Is multi-threaded, takes advantage of SMP
- Data Nodes (ndbd)
- Collection of nodes constitute Cluster
- Requirements
- Node is a process not a physical computer
- Atleast 3 physical machines needed for HA
- Avoid split brain problem (network down, node down etc)
- In case of split brain problem, nodes will be shutdown forecefully and all comitted transacations will be lost and will be restored only to last known checkpoint.
- With only two, its hard to know if it has relly failed or network connection has failed
- You probably need more than 3 machines though
- ndbd is single threaded, and future version might be multi-threaded
- SQL node is multi-threaded (ofcourse)
- Management server is not CPU intensive (almost 0 CPU)
- Data Nodes need lots of memory, Disk I/O can be calculated and generally not CPU bound (as its single threaded)
- Memory needed for Index and Data can be calculated using a tool provided (ndb_size.pl)
- Setup
- Configure the nodes
- Management server needs to be started first (ndbm_mgmd -f config.ini)
- Data Nodes
- on each storage node ndbd -c IP
- Can have more than one on the same physical box
- SQL Nodes
- Make sure NDBCLUSTER option is enabled (command line or config)
- Make sure the connect string is specified (command line or config)
- start the server
- Monitor (using ndb_mgb -c localhost:9350 -e ‘show’)
- Using the Cluster
- Tables can be stored in cluster with ENGINE=NDBCLUSTER
- Supports views, stored procedur, triggers etc
- Permissions needs to exists in each SQL Node (mysql.user*)
- Limitations:
- prior to 5.1; everything in memory (both index and data)
- 5.1, non-index can be to disk
- Foreignkey constraints are not yet supported
- maximum of 255 nodes can belong to a single MySQL Cluster with 48 of being data nodes
- No dynamic expansion and needs to be restarted to introduce new nodes in the cluster
- ndbd is single threaded
Few little notes:
– think of the mysql storage engine interface as a VFS layer (just plug in storage engines.. but different engines have different properties)
– we really design for 99.999% (5 nines) uptime…. but you may not get this due to things like human error
– committed trxns since the last checkpoint are only lost on cluster failure, they survive node failure.
hope you enjoyed the session!
Thanks Stewart for clarifying it.
Typically, a storage area network (SAN) is part of the overall network of computing resources for an enterprise. Storage Area Network (SAN) allows a network of systems to access the storage over a dedicated storage