Apart from my consulting work at ScaleIn, I also invest in bootstrapped companies with genuinely disruptive ideas, and in the process I have met a few database-focused companies that are already at the launch stage.
I was a little surprised to see people still spending so much time and effort evaluating, implementing, and stabilizing distributed clustering solutions on their own instead of using existing services in the public domain (especially those under the Apache foundation); there is no need to reinvent the wheel (well, that's my personal take).
Both Zookeeper and the Spread toolkit are actively used by a number of popular projects: HBase, Solr, Neo4j, Storm, Kafka, Norbert, and others use Zookeeper as their cluster coordination service, while Vertica Analytics DB uses Spread as its default clustering service. It is worth investigating and extending these existing open-source services, supporting whichever one fits best, instead of developing new custom solutions.
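To make the coordination role concrete, here is a minimal sketch of the idea behind Zookeeper's well-known leader-election recipe: each node creates an ephemeral sequential znode, and the node holding the lowest sequence number is the leader. The `FakeZk` class below is an illustrative in-process stand-in, not the real Zookeeper API (with a real cluster you would use a client library such as kazoo and its election recipe).

```python
# Sketch of Zookeeper's leader-election recipe, simulated in-process.
# FakeZk is an illustrative stand-in for a Zookeeper namespace.

class FakeZk:
    def __init__(self):
        self.seq = 0
        self.znodes = {}  # znode path -> owning session

    def create_sequential(self, prefix, owner):
        # Zero-padded sequence numbers keep lexical order == numeric order.
        path = f"{prefix}{self.seq:010d}"
        self.seq += 1
        self.znodes[path] = owner
        return path

    def leader(self):
        # The session holding the lowest sequence number wins the election.
        return self.znodes[min(self.znodes)]

    def session_expired(self, owner):
        # Ephemeral znodes vanish with their session; the surviving
        # nodes re-evaluate, so failover is automatic.
        self.znodes = {p: o for p, o in self.znodes.items() if o != owner}

zk = FakeZk()
for node in ("node-a", "node-b", "node-c"):
    zk.create_sequential("/election/guid-n", node)

print(zk.leader())            # node-a holds the lowest sequence number
zk.session_expired("node-a")  # leader crashes
print(zk.leader())            # node-b takes over automatically
```

The point is that the election, failure detection, and failover logic all fall out of two Zookeeper primitives (ephemeral and sequential znodes), which is exactly the kind of machinery teams keep rebuilding by hand.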
Even if you don't want to work at the Zookeeper level by extending its services directly, solutions like Norbert (used by LinkedIn) provide an integrated client/server stack by combining Zookeeper with Netty and Protocol Buffers (or you can extend it with Thrift), enabling easy adoption and compatibility with all the leading language bindings.
Apart from Zookeeper and Spread, a number of distributed services are available to support clustering and a quorum mechanism for electing the primary candidate (master/writer or slave/reader), along with various sync services that isolate functionality with redundancy. Alternatively, you can opt for a simple publish-subscribe model such as Kafka, RabbitMQ, or any other messaging service, chosen based on ordering, priority, guaranteed delivery, acknowledgements, and so on, as your needs dictate.
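The essence of the quorum mechanism mentioned above can be sketched in a few lines: a candidate becomes the primary only if a strict majority of the cluster acknowledges it, which is what prevents two primaries from coexisting during a network partition (split brain). The function names here are illustrative, not taken from any particular system.

```python
# Sketch of quorum-based primary election: a strict majority is
# required, so at most one side of a partition can elect a primary.

def has_quorum(acks, cluster_size):
    # Strict majority: 3 of 5, 2 of 3, etc.
    return acks > cluster_size // 2

def elect_primary(votes, cluster_size):
    """votes: mapping of candidate -> number of acknowledgements."""
    for candidate, acks in votes.items():
        if has_quorum(acks, cluster_size):
            return candidate
    return None  # no majority; retry after a randomized timeout

# A 5-node cluster partitioned 3/2: only the 3-node side can elect.
print(elect_primary({"node-a": 3}, 5))  # node-a
print(elect_primary({"node-b": 2}, 5))  # None
```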
Currently, any OLTP system (like MySQL, PostgreSQL, or any non-clustered RDBMS for that matter) can use these services to provide high availability (HA) and load balancing (LB); but they can't guarantee data consistency and replication on their own (they can with proper scripting or manual intervention). For example, here is a simple wrapper for a Redis HA solution using Zookeeper; pretty much the same approach applies to any client/server architecture.
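The pattern behind such a wrapper is simple: clients never hardcode the master address; they read it from a znode in the coordination service and re-read whenever a watch fires. The sketch below simulates the watch mechanism in-process; with a real Zookeeper you would use a client library's watch facility (e.g. kazoo's DataWatch), and the class and address values here are purely illustrative.

```python
# Sketch of the Redis-HA wrapper pattern: the coordination service
# holds a pointer to the current master, and clients follow it.

class MasterPointer:
    """Illustrative stand-in for a znode holding the master address."""
    def __init__(self, address):
        self.address = address
        self.watchers = []

    def watch(self, callback):
        self.watchers.append(callback)

    def failover(self, new_address):
        # The HA wrapper promotes a replica and updates the znode;
        # every watching client is notified of the new master.
        self.address = new_address
        for cb in self.watchers:
            cb(new_address)

class RedisClient:
    def __init__(self, pointer):
        self.master = pointer.address
        pointer.watch(self.on_master_change)

    def on_master_change(self, address):
        self.master = address  # reconnect to the new master here

pointer = MasterPointer("10.0.0.1:6379")
client = RedisClient(pointer)
pointer.failover("10.0.0.2:6379")
print(client.master)  # 10.0.0.2:6379
```

Because the master address lives in the coordination service rather than in client configuration, failover becomes a single atomic znode update instead of a rolling reconfiguration of every client.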
I am still waiting for a simple distributed coordination service that truly supports atomicity and consistency across distributed nodes via segmentation (where segmentation is controlled by a replication factor, user-defined per table or globally across the cluster) without a write-ahead log (transactional redo log). Within the cluster, all nodes would use a multi-phase commit, user-controlled beyond one node: either persist on the segmented nodes independently, or ACK all at once if the business logic depends on every segmented node holding the latest copy due to the distribution of requests. Any OLTP system could then make use of these services, with a little overhead on performance, though user-controlled.
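The multi-phase commit described above boils down to the classic two-phase shape: the coordinator asks every segmented node to prepare, and only if all vote yes does it tell them to commit; otherwise everyone aborts. The node names and failure flag below are illustrative, and a real implementation would also need timeouts and coordinator recovery.

```python
# Sketch of a two-phase commit across segmented nodes: all-or-nothing
# persistence without any node unilaterally committing.

class Node:
    def __init__(self, name, healthy=True):
        self.name, self.healthy = name, healthy
        self.state = "idle"

    def prepare(self, txn):
        # Phase 1: stage the pending change durably, then vote.
        self.state = "prepared" if self.healthy else "aborted"
        return self.healthy

    def commit(self):
        self.state = "committed"  # Phase 2: make the change visible

    def abort(self):
        self.state = "aborted"    # Phase 2: roll back the staged change

def two_phase_commit(nodes, txn):
    if all(n.prepare(txn) for n in nodes):
        for n in nodes:
            n.commit()
        return True
    for n in nodes:
        n.abort()
    return False

nodes = [Node("seg-1"), Node("seg-2"), Node("seg-3")]
print(two_phase_commit(nodes, {"key": "v1"}))   # True, all committed
nodes[1].healthy = False
print(two_phase_commit(nodes, {"key": "v2"}))   # False, all aborted
```

The "user-controlled" knob in the text maps naturally onto how many of the segmented replicas must vote yes before the coordinator acknowledges the write.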
A redo log, on the other hand, would then be needed mainly for a redundant cluster.