The other day we had a small discussion about data stores and hardware, and about which one drives the other in a data storage solution. It is a hard question, because each is a large topic in its own right and the answer depends on the use case; in practice, the limitations of a data store drive the need for more powerful hardware as scalability demands grow.
We all know how important hardware is to data scalability today, especially when dealing with large data sets. Without capable hardware it is hard to scale even a powerful data store, whether it is SQL (row or columnar), NoSQL (key/value or otherwise), or any other storage solution, because every store is limited by its data structures and their implementation, and its performance ultimately depends on the hardware.
At times, data store vendors claim a scalable, high-performance architecture; that means the solution is built directly on hardware scalability and performance, taking advantage of today's evolving hardware technology. Hardware has also evolved far more aggressively than data store software in recent years, driven by a much larger market: hardware is everywhere, not just in storage solutions.
In short, when a data store's performance is directly proportional to hardware performance, the store has effectively surpassed all of its software performance bottlenecks (algorithms, decision making, data structures, and so on). Overcoming software bottlenecks is not easy, because requirements change day by day and depend on the data size and on how the data is actually stored and retrieved:
If data is stored and retrieved from memory or another non-persistent medium, there is little to worry about performance-wise, as that yields the best throughput; but an in-memory or non-persistent solution only works for smaller data sets, not for large ones dealing with terabytes of data.
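To make the throughput gap concrete, here is a minimal, hypothetical micro-benchmark: the same key/value lookups served from memory (a plain dict) versus a disk-backed store (SQLite stands in here for any persistent engine; the table name and value format are made up for the sketch).

```python
import sqlite3
import tempfile
import time

N = 10_000
keys = list(range(N))

# In-memory store: a plain dict keyed by integer.
mem = {k: f"value-{k}" for k in keys}

# Disk-backed store: SQLite in a temporary file, standing in for any
# persistent engine in this illustration.
path = tempfile.NamedTemporaryFile(suffix=".db", delete=False).name
db = sqlite3.connect(path)
db.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v TEXT)")
db.executemany("INSERT INTO kv VALUES (?, ?)", mem.items())
db.commit()

def time_lookups(fn):
    # Time N point lookups through the given accessor.
    start = time.perf_counter()
    for k in keys:
        fn(k)
    return time.perf_counter() - start

t_mem = time_lookups(lambda k: mem[k])
t_disk = time_lookups(
    lambda k: db.execute("SELECT v FROM kv WHERE k = ?", (k,)).fetchone()[0]
)
print(f"memory: {t_mem:.4f}s  disk-backed: {t_disk:.4f}s")
```

On any machine the in-memory path wins by a wide margin, which is exactly why memory-only solutions look so good on small data sets; the approach simply stops being an option once the working set no longer fits in RAM.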
Other than the newly evolving columnar data stores (none of which has yet achieved the universal acceptance of Oracle, SQL Server, or MySQL), NoSQL stores, and big data warehouse solutions (such as Aster Data and Greenplum), few existing solutions really take advantage of the latest hardware, or even of modern data structures, since most data store kernels were written years ago. In today's world, the only option for scalability is to depend on the hardware and to distribute the load across multiple systems (whether shared-nothing, shared-common, or even the "cloud" way…).
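The shared-nothing idea mentioned above can be sketched in a few lines: route each key to one of several independent shards by a stable hash, so data and load are spread across nodes that share no state. The shard count, helper names, and key format are all assumptions for illustration; real systems add replication and rebalancing on top.

```python
import hashlib

# Hypothetical shared-nothing sketch: four independent shards, each dict
# standing in for one node's local store.
NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]

def shard_for(key: str) -> int:
    # Stable hash so the same key always routes to the same shard.
    digest = hashlib.sha1(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def put(key: str, value) -> None:
    shards[shard_for(key)][key] = value

def get(key: str):
    return shards[shard_for(key)].get(key)

for i in range(1000):
    put(f"user:{i}", i)

# Keys end up roughly evenly spread across the shards.
print([len(s) for s in shards])
```

Because each request touches exactly one shard, adding shards adds capacity; the cost is that cross-shard queries and transactions become the application's problem, which is why this is a workaround for store-level scalability rather than a replacement for it.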
I hope to see a solution one day that actually bridges the gap between data store, hardware, and scalability: one that can be universally adopted for common use cases instead of requiring multiple technologies. Brian Aker, in his recent interview, and Baron have expressed the same thought.