Here is the quick notes from MySQL Cluster by Stewart Smith session that I attended today in the MySQL conference 2008.

  • What it is
    • Clustering of in memory databases in a shared-nothing system
    • Designed for HA, 99.9% Uptime (not really) and sub-second failover
    • Supports Hot (online) consistent backup along with compression
    • No locks are used during the backup as NDB uses a global counters
  • Think as virtual file system
  • Redundancy
    • NoOfReplicas (1,2,3,4), For production use 1 or 2 but don’t use any other as it may have bugs (2 is preferred and well tested mode)
    • 1 means no redundancy (a node fails and cluster fails)
    • 2 means two copies
  • High performance (due to parallelism)
  • In memory only
    • Both data and index needs to fit in memory
    • 5.1 optionally supports to have non-index stuff to disk
    • Rows will have in-memory part and disk part
  • Durability
    • Checkpoint to disk (frequency is configurable, default 2 secs?)
    • Rate is limited to avoid IO
    • Cluster will restore from last known Global Check Point (GCP)
    • Any comitted transactions that are not part of checkpoint will be lost
  • Node Types
    1. Data Nodes (ndbd)
      • Grouped into nodegroups
      • NoOfReplicas is number of nodes in nodegroup
      • Up2 48 in one cluster
      • Interact using regular SQL or C++ NDBAPI or using other langauge binders
      • For each transaction, one is used in Transaction Coordinator and load balanced around the cluster
      • Node failure, application has to retry the transaction
      • All MySQL servers connected to server have the same view of the database
      • Replication between data nodes is synchrounous as opposed to standard asynchrounous MySQL replication, uses two-phase commit
      • ndbd is single threaded (thats why it is not CPU bound), multithreading coming soon
      • Compressed backups, checkpoint might use other CPUs
      • Multiple nodes for SMP
      • Every table should have a Primary key, else 64-bit int will be internally used (same as InnoDB)
      • Uses hashkey(PK) to find the right partition
    2. Management Server (ndb_mgmd)
      • Interact with management client (ndb_mgm) or C management API (mgmapi)
      • distributes configuration to other nodes
      • Backup is started by management server
      • Can have more than one for redundancy
      • Stopping management server will not break the existing cluster
      • It has to be running when nodes join the cluster
      • Backing up of config.ini is very much recommended
      • Issues any commands to nodes (start, stop, restart)
    3. SQL Node (mysqld)
      • MySQL Server
      • Many API/SQL nodes are needed to load storage nodes
      • dependent on application and types of queries
      • Is multi-threaded, takes advantage of SMP
  • Collection of nodes constitute Cluster
  • Requirements
    • Node is a process not a physical computer
    • Atleast 3 physical machines needed for HA
      • Avoid split brain problem (network down, node down etc)
      • In case of split brain problem, nodes will be shutdown forecefully and all comitted transacations will be lost and will be restored only to last known checkpoint.
      • With only two, its hard to know if it has relly failed or network connection has failed
      • You probably need more than 3 machines though
    • ndbd is single threaded, and future version might be multi-threaded
    • SQL node is multi-threaded (ofcourse)
    • Management server is not CPU intensive (almost 0 CPU)
    • Data Nodes need lots of memory, Disk I/O can be calculated and generally not CPU bound (as its single threaded)
    • Memory needed for Index and Data can be calculated using a tool provided (ndb_size.pl)
  • Setup
    • Configure the nodes
    • Management server needs to be started first (ndbm_mgmd -f config.ini)
    • Data Nodes
      • on each storage node ndbd -c IP
      • Can have more than one on the same physical box
    • SQL Nodes
      • Make sure NDBCLUSTER option is enabled (command line or config)
      • Make sure the connect string is specified (command line or config)
      • start the server
    • Monitor (using ndb_mgb -c localhost:9350 -e ‘show’)
  • Using the Cluster
    • Tables can be stored in cluster with ENGINE=NDBCLUSTER
    • Supports views, stored procedur, triggers etc
    • Permissions needs to exists in each SQL Node (mysql.user*)
  • Limitations:
    • prior to 5.1; everything in memory (both index and data)
    • 5.1, non-index can be to disk
    • Foreignkey constraints are not yet supported
    • maximum of 255 nodes can belong to a single MySQL Cluster with 48 of being data nodes
    • No dynamic expansion and needs to be restarted to introduce new nodes in the cluster
    • ndbd is single threaded
  • Print
  • Twitter
  • Facebook
  • PDF
  • Google Bookmarks

Related posts:

  1. Notes from Online MySQL Backup in 6.0
  2. Notes from Falcon from the beginning
  3. Notes on InnoDB Scale on servers with many cores
  4. Notes from InnoDB status, architecture and new features