Today I happened to read the May 2008 print edition of Dr. Dobb’s Journal, and the article “Kernel-mode Databases” by Andrei and Alexander from McObject really excited me.
The foundation of any database engine is its performance and scalability, and the success of any database largely relies on these two qualities. The operating system kernel plays a major role in achieving them, in the form of:
- resource allocation
- thread scheduling
- low-level hardware access
- network
- security
So it looks like eXtremeDB tries to overcome this kernel overhead by releasing a kernel-mode database. The kernel-mode package replaces most of the kernel access calls that the database engine expects with optimized direct calls and exposes them to the upper layer. The eXtremeDB run-time engine is linked directly with the kernel-mode package to avoid any RPC overhead.
Interesting quotes:
It would be really nice if McObject released benchmark results comparing regular kernel threads with the optimized ones, along with some more tests to demonstrate the benefit of their kernel mode. Although they have found a solution for some of the common problems (kernel overhead, security, thread scheduling), there is still overhead from regular I/O and from exclusive lock time during updates. It looks like some of their locks use “compare-and-swap”, but I could not find much reference to where they are actually used within the code base.
I could not find any benchmark results from them. It would be really nice if they could post some interesting numbers.
The kernel-mode package is open source, but it is not available for download. We need to contact their sales team to get an evaluation copy (and the source seems to be part of that).
If anybody finds the source, could you please post a link?
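For readers unfamiliar with compare-and-swap, here is a minimal user-space sketch of the general technique using C11 atomics. It is not taken from eXtremeDB; the names `counter`, `counter_add`, and `counter_demo` are made up for illustration, and kernel code would use the kernel's own atomic primitives rather than `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared counter; not part of any eXtremeDB API. */
atomic_int counter = 0;

/* Add delta with a CAS retry loop; returns the value after the update. */
int counter_add(int delta)
{
    int seen = atomic_load(&counter);

    /* If another thread changed the counter since we read it, the CAS
       fails and refreshes `seen` with the current value, so we retry. */
    while (!atomic_compare_exchange_weak(&counter, &seen, seen + delta)) {
        /* retry with the refreshed value in `seen` */
    }
    return seen + delta;
}

/* Small single-threaded self-check of the update logic. */
int counter_demo(void)
{
    int a = counter_add(5);
    int b = counter_add(-2);
    assert(a == 5 && b == 3);
    return atomic_load(&counter);
}
```

The point of the loop is that no thread ever blocks: an update either succeeds atomically or is retried against the latest value.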
Thanks for your interest in McObject’s kernel mode database. Let me clarify. The eXtremeDB kernel-mode edition is a C-language library that is built for use in the Linux kernel or in any OS that supports the concept of separate kernel- and user-mode spaces. As such, it does not use any of the user-mode synchronization primitives (mutexes, semaphores, etc) and instead uses the kernel flavor of primitives (spinlocks); it also does not use the C runtime library. Because eXtremeDB-KM is a main-memory database, it does not have any of the disk I/O overhead inherent in disk-based DBMSs, it eliminates caching logic, and it provides direct access to managed data (your kernel driver uses a handle to read/write the data into the database).
If your application is a driver that requires a database, the KM edition provides a tool that enables you to create and manage a database inside the kernel. For applications that also include user-mode components, the kernel mode database can be accessed from outside the kernel via a special set of interfaces provided by eXtremeDB-KM. The mechanism of these interfaces is similar to the well-understood network RPC with the proxy (character device, connected to the database) and user-mode stubs.
With regard to exclusive locks: the current release of eXtremeDB-KM uses an exclusive lock (built upon a kernel spinlock) for its update operations. The time required by a database transaction to update the database is very short (as you’ve mentioned yourself, there are no layers between the driver and the database, also the data is in memory, all the time). That justifies the exclusive locking (as opposed to having a special “lock arbiter” that requires some sort of message passing).
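As a rough illustration of the idea (not eXtremeDB-KM code), an exclusive lock built on a spinlock around a short in-memory update might look like the following user-space analog. It uses C11 `atomic_flag` in place of the kernel's spinlock primitives, and `demo_spinlock`, `db_lock`, `db_value`, and `db_update` are hypothetical names chosen for this sketch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* User-space stand-in for a kernel spinlock (assumption for this sketch). */
typedef struct {
    atomic_flag held;
} demo_spinlock;

demo_spinlock db_lock = { ATOMIC_FLAG_INIT };
long db_value = 0;   /* stand-in for data held in the in-memory database */

/* Returns true if the lock was free and is now held by the caller. */
bool demo_trylock(demo_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Busy-wait until the lock is acquired. */
void demo_lock(demo_spinlock *l)
{
    while (!demo_trylock(l)) {
        /* spin: acceptable only because the critical section is very short */
    }
}

void demo_unlock(demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* An "update transaction": take the exclusive lock, modify memory, release. */
long db_update(long delta)
{
    demo_lock(&db_lock);
    db_value += delta;
    long result = db_value;
    demo_unlock(&db_lock);
    return result;
}
```

Spinning is only reasonable here because the work done under the lock is a few memory writes; with disk I/O inside the critical section, waiters would burn CPU for far too long.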
The KM package includes a sample program that could be viewed as a benchmark: this is the same application described in the DDJ article. Running it in your chosen environment is probably the best way to generate the performance numbers that interest you.
However, to clear up any misunderstanding, I should state here that McObject’s eXtremeDB embedded database software is not open-source. It is distributed under a commercial license. It is true that the kernel mode product is always sold with a license for full source code; the user typically builds the database library and samples for a particular version of the kernel. The evaluation might be made available in binary form as well — please contact McObject (sales@mcobject.com) for details. And if you have questions of a technical nature, feel free to e-mail McObject’s CTO (and author of the DDJ article), Andrei Gorine, gor@mcobject.com.
Thanks, Ted, for clarifying things; I do appreciate that. You can reach me at venu at venublog dot com at any time.