VoltDB – DIY OLTP. Open Source. Win.
May 26, 2010
In a seemingly perfect marriage of product and target market, database pioneer Mike Stonebraker’s new in-memory database company VoltDB has emerged from stealth mode using the open source model, soon to be open core. Its first release, the GPL-licensed Community Edition, will appeal to developers who need blindingly fast transaction processing and are willing to do a lot of the work themselves to get there: the do-it-yourself (DIY) database. What better audience than the Gluecon community? Gluecon was the perfect place for the formal rollout, filled as it is with hands-on folks looking to work with NoSQL products (such as Cassandra, CouchDB, MongoDB, Riak and Voldemort).
In such an environment, VoltDB’s case for ACID compliance was a differentiator, and it still met the demands for high performance that echoed through the halls. Conference organizers describe the event this way: “Glue is about all of the bits and pieces, APIs and meta-data, standards and connectors that will help us to glue together the varying applications of a post-cloud world.” Who better to work with a product that requires you to write the functions your application needs yourself, in Java, and compile them in advance? What you get in return is, in effect, a customized database server that does exactly and only what you want it to do, eliminating the overhead of everything else.
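To make the “only what you want” idea concrete, here is a minimal Java sketch, assuming a toy in-memory engine whose catalog of procedures is fixed when it is built. The class and procedure names (`Engine`, `Procedure`, `Deposit`, `Balance`) are hypothetical, and this is not VoltDB’s actual API; it only illustrates why an engine that accepts nothing but precompiled calls has no SQL to parse or plan at run time.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: an engine that can only run procedures fixed at build time.
 * Not VoltDB's API; all names are illustrative.
 */
public class Engine {
    /** A stored procedure: application logic run directly against the engine's data. */
    interface Procedure {
        Object run(Map<String, Long> data, Object[] args);
    }

    private final Map<String, Long> data = new HashMap<>();
    private final Map<String, Procedure> catalog;

    /** The catalog is copied into an immutable map, so nothing can be added later. */
    Engine(Map<String, Procedure> catalog) {
        this.catalog = Map.copyOf(catalog);
    }

    /** Invoke a procedure by name; there is no ad hoc query text to parse or plan. */
    Object call(String name, Object... args) {
        Procedure p = catalog.get(name);
        if (p == null) {
            throw new IllegalArgumentException("No such procedure: " + name);
        }
        return p.run(data, args);
    }

    public static void main(String[] args) {
        Engine engine = new Engine(Map.of(
                "Deposit", (data, a) -> data.merge((String) a[0], (Long) a[1], Long::sum),
                "Balance", (data, a) -> data.getOrDefault((String) a[0], 0L)));
        engine.call("Deposit", "alice", 100L);
        engine.call("Deposit", "alice", 50L);
        System.out.println(engine.call("Balance", "alice")); // 150
    }
}
```

Calling a name that was not compiled in fails immediately, which is the trade-off described above: speed and simplicity in exchange for doing the up-front work yourself.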
Stonebraker has been driving and publishing research on the H-Store project for years, exploring ways to leverage advances in the systems we run on to strip away decades’ worth of accreted workarounds in RDBMS code lines. (One especially useful piece on the “One Size Fits All” (OSFA) question can be found here.) Curt Monash does his usual fine job describing some of the concepts and issues around VoltDB on his blog. To net it out: the system overhead of logging to disk, managing buffers, parsing potentially complex SQL, and building and optimizing query plans that must account for physical I/O slows things down dramatically. Multicore systems allow a new way of thinking about handling the work, but abandoning the ACID properties (atomicity, consistency, isolation and durability), as some of the new NoSQL alternatives do, compromises transaction integrity. VoltDB keeps them. It partitions data and distributes work to every CPU core on commodity servers or clusters. As with other MPP products such as Kognitio’s WX2, there is no dedicated head node; VoltDB replicates data automatically and uses the replicas to recover when failures occur. And for some workloads it is an order of magnitude faster than traditional RDBMSs. The claim: “Five times faster than Cassandra and 45 times faster than Oracle on an Intel Xeon X5550-based Dell PowerEdge R610 cluster.” There are no audited benchmarks yet, though, and few named customers. But it’s early days.
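The partitioning and replication described above can be sketched in a few lines of Java. This is an illustration under stated assumptions, not VoltDB’s implementation: `PartitionedStore` is a hypothetical name, keys are assigned to partitions by hashing, each partition gets its own single-threaded worker (in the spirit of the serial, lock-free execution explored in the H-Store research), and each write is copied to a replica that can serve reads when the primary is treated as failed. A real cluster would keep replicas on other machines, not in the same process.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Hypothetical sketch of hash partitioning with one worker thread per partition
 * and a replica per partition. Not VoltDB's implementation.
 */
public class PartitionedStore {
    private final int partitions;
    private final List<Map<String, Long>> primaries = new ArrayList<>();
    private final List<Map<String, Long>> replicas = new ArrayList<>();
    private final List<ExecutorService> workers = new ArrayList<>();

    PartitionedStore(int partitions) {
        this.partitions = partitions;
        for (int i = 0; i < partitions; i++) {
            primaries.add(new HashMap<>());
            replicas.add(new HashMap<>());
            // Work on a partition runs serially on its own thread, so its maps need no locks.
            workers.add(Executors.newSingleThreadExecutor());
        }
    }

    /** Map a key to a partition (and thus a core) by hashing. */
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    /** Write to the primary copy and its replica on the owning partition's worker. */
    CompletableFuture<Void> put(String key, long value) {
        int p = partitionFor(key);
        return CompletableFuture.runAsync(() -> {
            primaries.get(p).put(key, value);
            replicas.get(p).put(key, value);
        }, workers.get(p));
    }

    /** Read from the primary, or from the replica if the primary is treated as failed. */
    Long get(String key, boolean primaryFailed) {
        int p = partitionFor(key);
        Map<String, Long> source = primaryFailed ? replicas.get(p) : primaries.get(p);
        return CompletableFuture.supplyAsync(() -> source.get(key), workers.get(p)).join();
    }

    /** Stop the worker threads so the JVM can exit. */
    void shutdown() {
        workers.forEach(ExecutorService::shutdown);
    }

    public static void main(String[] args) {
        PartitionedStore store = new PartitionedStore(4);
        store.put("alice", 100L).join();
        System.out.println("alice lives in partition " + store.partitionFor("alice"));
        System.out.println(store.get("alice", false)); // 100
        System.out.println(store.get("alice", true));  // 100, served by the replica
        store.shutdown();
    }
}
```

Because each partition’s work runs on one thread, operations on a partition never contend for locks; the cost is that a transaction touching several partitions needs coordination, which this sketch does not attempt.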
There’s always a catch, of course, and in this case it’s the amount of work programmers have to do themselves. Monash points out VoltDB’s limited SQL. In my conversation with VoltDB’s marketing VP Andy Ellicott and “field engineer” Tim Callaghan, we discussed the Java stored procedures (SPs) developers must write: even a ROUND function, for example, is DIY. You put all your SPs into a project file and compile it, and voilà: you have your own engine. The resulting engine uses asynchronous connections: fire off your (compiled) SP and immediately go on to the next one. There are no connection pools to manage, which removes another piece of overhead.
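Two points from that conversation lend themselves to a sketch: a DIY ROUND and fire-and-forget invocation. The Java below is hypothetical (`AsyncClient`, `roundHalfUp` and `callRound` are illustrative names, not VoltDB’s client library); a single-threaded executor stands in for the server, and `CompletableFuture` callbacks stand in for asynchronous responses. Rounding halves away from zero is one possible convention for ROUND, assumed here.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Hypothetical sketch: a DIY ROUND plus fire-and-forget calls.
 * The executor stands in for the database server; this is not VoltDB's client API.
 */
public class AsyncClient {
    private final ExecutorService server = Executors.newSingleThreadExecutor();

    /** DIY ROUND: round to the given number of decimal places, halves away from zero. */
    static double roundHalfUp(double value, int places) {
        return BigDecimal.valueOf(value).setScale(places, RoundingMode.HALF_UP).doubleValue();
    }

    /** Submit a call and return at once; the result arrives later through the future. */
    CompletableFuture<Double> callRound(double value, int places) {
        return CompletableFuture.supplyAsync(() -> roundHalfUp(value, places), server);
    }

    void close() {
        server.shutdown();
    }

    public static void main(String[] args) {
        AsyncClient client = new AsyncClient();
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        for (double v : new double[] {2.345, -2.345, 10.0}) {
            // Attach a callback and move straight on to the next call without waiting.
            pending.add(client.callRound(v, 2).thenAccept(r -> System.out.println(v + " -> " + r)));
        }
        // Wait once, at the end, for every outstanding call.
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        client.close();
    }
}
```

The loop never blocks on an individual call; it waits only once, at the end, for all outstanding results. Converting through `BigDecimal.valueOf` rounds the decimal form of the value, which sidesteps some of the surprises of multiplying a binary `double` by 100 and calling `Math.round`.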
This is a refactoring of work in modern architectures, but in many ways it’s regressive, pointing back to earlier processing models. (One could say the same about the “one file, one program” model often used in the NoSQL community.)
[The following two paragraphs have been edited to clarify some details.] I had the chance to check in with another pioneer, Don Haderle, often called the “father of DB2” (and now associated with Vertica and ANTS Software, the source of IBM’s new Sybase-to-DB2 play), to chat about some of the architectural issues. Don noted: “This is a restrictive, yet useful model. The transaction/application can only access the data in the container/partition and may only use services provided by the application (in this case database) server. This is very useful, especially when tied to extended transactional capabilities. (As background, an extended transaction system provides compensation transactions to reverse previously committed changes. This is popular in messaging and BPE. For example, you book an airline trip and at some later point undo a leg of the journey. The software system has a script to provide compensating transactions to reverse the changes associated with that leg.)
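Haderle’s airline example maps onto a simple pattern: every committed step records the action that reverses it, and undoing a step means running that compensating action rather than rolling back. The Java below is a toy, in-memory sketch with hypothetical names (`TripBooking`, `bookLeg`, `cancelLeg`); a real extended transaction system would presumably need to store its compensations durably and run each one as a transaction of its own.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch of compensating transactions: each committed step records
 * the action that reverses it. Toy, in-memory, and not tied to any product.
 */
public class TripBooking {
    private final Set<String> seatsHeld = new TreeSet<>();
    // Compensating action for each committed leg, recorded at booking time.
    private final Map<String, Runnable> compensations = new HashMap<>();

    /** Commit a booking for one leg and register how to reverse it. */
    void bookLeg(String leg) {
        seatsHeld.add(leg);
        compensations.put(leg, () -> seatsHeld.remove(leg));
    }

    /** Undo one previously committed leg by running its compensating action. */
    void cancelLeg(String leg) {
        Runnable compensate = compensations.remove(leg);
        if (compensate == null) {
            throw new IllegalStateException("No committed booking for " + leg);
        }
        compensate.run();
    }

    /** A sorted snapshot of the legs currently booked. */
    Set<String> seats() {
        return new TreeSet<>(seatsHeld);
    }

    public static void main(String[] args) {
        TripBooking trip = new TripBooking();
        trip.bookLeg("BOS-ORD");
        trip.bookLeg("ORD-SFO");
        trip.bookLeg("SFO-BOS");
        // Later, undo just the middle leg; the other two commitments stand.
        trip.cancelLeg("ORD-SFO");
        System.out.println(trip.seats()); // [BOS-ORD, SFO-BOS]
    }
}
```

Unlike a rollback, which discards uncommitted work, compensation leaves the other committed legs in place and reverses only the one being cancelled.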
“It hearkens back to days of yore. CICS provided wrappers around all of the system services that the transaction could invoke. This could guard against WAITing while holding key resources, and could include the actions of those services in the transaction context. In CICS talk, this is a pseudo-conversational transaction (full conversation mode allows for a dialogue between the user (terminal) and the application).” Similarly, he noted, in IBM’s IMS you can only invoke services it “knows about already.” SOA works in a very different way, and IBM has found interesting ways to partially reconcile the two models. But in the end, neither the old nor the new approach supports OSFA transactional application environments. As Haderle put it, “There’s significant technology in the messaging space to choreograph business transactions, including compensating transactions to reverse previously committed effects.”
Time will tell whether the developer community finds the required development restrictions worth the benefits; they will impose costs in maintenance and skills, and will demand especially good documentation and governance practices. VoltDB hopes to build a community that will share useful functions as developers create them; again, the open source model’s history suggests that’s possible. In fact, VoltDB already has a contributed Erlang driver. But setting up such a sharing mechanism is not a trivial task, and it will take time to build and harden. Models exist, though, like the Eigenbase project, and with a modest investment of time and people, VoltDB could well go viral.