In a seemingly perfect marriage of product and target market, database pioneer Mike Stonebraker’s new in-memory database company VoltDB has emerged from stealth mode using the open source model, soon to be open core. Its first release, the GPL-licensed Community Edition, will appeal to developers who need blindingly fast transaction processing and are willing to do a lot of the work themselves to get there – the do-it-yourself (DIY) database. Who better than the Gluecon community? Gluecon was the perfect place for the formal rollout, filled as it is with hands-on folks looking to work with NoSQL products (like Cassandra, CouchDB, MongoDB, Riak, Voldemort, etc.).
The case for ACID-compliant databases was a differentiator in such an environment, while at the same time meeting the demands for high performance that echoed through the halls. Conference organizers describe it this way: “Glue is about all of the bits and pieces, APIs and meta-data, standards and connectors that will help us to glue together the varying applications of a post-cloud world.” Who better to work with a product that requires you to build the functions your application needs yourself, in Java, and compile them in advance? What you get in return is, in effect, a customized database server that does exactly and only what you want it to do, eliminating the overhead of everything else.
Stonebraker has been driving and publishing research on the H-Store project for years, exploring ways to leverage advances in the systems we run on to strip away decades’ worth of accreted workarounds in the code lines of RDBMSs. (One especially useful piece on the “One Size Fits All” (OSFA) question can be found here.) Curt Monash does his usual fine job describing some of the concepts and issues around VoltDB in his blog. To net it out: traditional RDBMSs are slowed dramatically by system overhead – logging to disk, buffer management, and plan building and optimization that must concern itself with physical I/O, parse potentially complex SQL, and build appropriate strategies. Multicore systems allow a new way of thinking about handling the work, but abandoning ACID properties (atomicity, consistency, isolation and durability), as some of the new alternative NoSQL offerings do, compromises transaction integrity. VoltDB does not. It partitions data and distributes work to every CPU core on commodity servers or clusters. Like other MPP products such as Kognitio’s WX2, there is no dedicated head node; it automatically replicates data and uses the replicas to recover automatically when failures occur. And it is literally an order of magnitude faster for some things than traditional RDBMSs are. The claim: five times faster than Cassandra and 45 times faster than Oracle on an Intel Xeon X5550-based Dell PowerEdge R610 cluster. No audited benchmarks yet, though, or many named customers. But it’s early days.
There’s always a catch, of course, and in this case it’s the amount of work programmers have to do themselves. Monash points out VoltDB’s limited SQL. In my conversation with VoltDB’s marketing VP Andy Ellicott and “field engineer” Tim Callaghan, we discussed how Java stored procedures (SPs) must be created – a ROUND function, for example, is DIY. You put all your SPs into a project file and compile it, and voilà – you have your own engine. The resulting engine uses asynchronous connections: fire off your (compiled) SP and immediately go on to the next one. No connection pools to manage – another piece of overhead removed.
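To make the “DIY function” point concrete, here is a minimal, hypothetical sketch of the kind of helper a developer would write and compile into the project ahead of time – plain Java, not VoltDB’s actual stored-procedure API, and the class and method names are my own illustration:

```java
// Hypothetical sketch: a DIY ROUND helper of the sort a developer would
// write themselves and compile into the procedure project, since (per the
// conversation above) the engine doesn't supply one out of the box.
public final class MathFunctions {

    // Round a value to the given number of decimal places.
    public static double round(double value, int places) {
        if (places < 0) {
            throw new IllegalArgumentException("places must be non-negative");
        }
        double factor = Math.pow(10, places);
        return Math.round(value * factor) / factor;
    }

    public static void main(String[] args) {
        System.out.println(MathFunctions.round(3.14159, 2)); // prints 3.14
    }
}
```

Once compiled into the project, a helper like this could be reused by any stored procedure that needs it – which is exactly the kind of shared-function library a developer community could grow around.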
This is a refactoring of work in modern architectures, but in many ways it’s regressive, pointing back to earlier processing models. (One could say the same about the “one file, one program” model often used in the NoSQL community.)
[The following 2 paragraphs edited to clarify some details] I had the chance to check in with another pioneer, Don Haderle, the “father of DB2” as he is often called (and now associated with Vertica and ANTS Software, the source of IBM’s new Sybase-to-DB2 play), to chat about some of the architectural issues. Don noted that “This is a restrictive, yet useful model. The transaction/application can only access the data in the container/partition and may only use services provided by the application (in this case database) server. This is very useful, especially when tied to extended transactional capabilities. (As background, an extended transaction system provides compensation transactions to reverse previously committed changes. This is popular in messaging and business process execution (BPE). For example, you book an airline trip and at some later point undo a leg of the journey. The software system has a script to provide compensating transactions to reverse the changes associated with that leg.)
“It hearkens back to days of yore. CICS provided wrappers around all of the system services that the transaction could invoke. This guarded against WAITing while holding key resources, and allowed the actions of those services to be included in the transaction context. In CICS talk, this is a pseudo-conversational transaction (full conversation mode allows for a dialogue between the user (terminal) and the application).” Similarly, he noted, in IBM’s IMS you can only invoke services it “knows about already.” SOA works in a very different way, and IBM has found interesting ways to partially reconcile the two models. But in the end, neither the old nor the new approach supports OSFA transactional application environments. There is significant technology in the messaging space to choreograph business transactions, including compensating transactions to reverse previously committed effects.
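The compensation pattern Haderle describes can be sketched in a few lines – this is a generic, hypothetical illustration of the idea (register an undo action for each committed step, replay them in reverse to back out), not any particular product’s API:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of an extended-transaction compensation log:
// each committed step registers an action that can reverse it later,
// as in undoing one leg of an already-booked trip.
public final class CompensationLog {

    private final Deque<Runnable> compensations = new ArrayDeque<>();

    // Record how to reverse a step that has already committed.
    public void register(Runnable undo) {
        compensations.push(undo);
    }

    // Run compensating actions in reverse order of the original steps.
    public void compensate() {
        while (!compensations.isEmpty()) {
            compensations.pop().run();
        }
    }
}
```

The reverse ordering matters: later steps may depend on earlier ones, so they must be backed out first – which is why a stack, not a queue, is the natural structure here.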
Time will tell if the developer community will find the required development restrictions worth the benefits; they will impose costs for maintenance, skills requirements, and especially good documentation and governance practices. VoltDB is hopeful it can build a community that will share useful functions as developers create them; again, the open source model’s history suggests that’s possible. In fact, VoltDB already has a contributed Erlang driver. But setting the mechanism up is not a trivial task, and it will take time to build and harden. Models exist, though, like the Eigenbase project, and with a little investment of time and people, VoltDB could well go viral.
Thanks for the write-up, Merv. One question I have relates to your assertion that using VoltDB places more burden on the app developer.
Do you think it would have been more accurate to contrast it with other data management solutions for high-scalability database applications?
1. Anyone serious about getting better OLTP performance out of an existing traditional DBMS will likely move to stored procs, if they haven’t already, to get rid of network overhead.
2. Key-Value stores force people to write application code to pull the data out and do the manipulation that the DBMS normally does.
3. Sharding a database also requires the movement of code from the DBMS to the application – logic to figure out which shard to access and to perform cross-shard manipulation, etc.
Talking about VoltDB development in relationship to the alternatives above puts it in a more accurate context: database application developers and architects will have to pay a price somewhere to get better scalability. By sticking with SQL and automating partitioning and cross-partition data access and manipulation, we feel VoltDB simplifies work for the developer.
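[Editor’s illustration of point 3 above: the routing logic that migrates from the DBMS into the application when a database is sharded by hand. This is a generic, hypothetical sketch – the class name and scheme are mine, not any product’s API:]

```java
// Hypothetical sketch of application-side shard routing: with a
// hand-sharded database, code like this lives in the application,
// not the DBMS.
public final class ShardRouter {

    private final int shardCount;

    public ShardRouter(int shardCount) {
        this.shardCount = shardCount;
    }

    // Map a key to a shard; Math.floorMod keeps negative hash codes in range.
    public int shardFor(String key) {
        return Math.floorMod(key.hashCode(), shardCount);
    }
}
```

And this simple modulo scheme is only the start – cross-shard joins, fan-out queries, and rebalancing when the shard count changes all become application code too, which is exactly Andy’s point.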
Andy, thanks for reading! These are all good points, worth airing. A few thoughts back at you. I don’t know that I was saying a burden is being thrown on the app developer – in fact, I think it’s a positive for the audience you targeted to take control, and they certainly want to do so. And yes, at the edge, the truly high-performance apps are very developer-driven and they do a great deal of coding.
1) is fairly stated, but of course it will take a bit more work, at least the first time some functions that are “atomic” are built. I’m assuming that having built and compiled a ROUND function, say, I could reuse it in another function that wanted it. From a “level of work” perspective, I can imagine new functions – especially in your target market like finance, where new algorithms are the stuff of life itself – will mean a lot of recompiles, quite often. And I do worry some about the degree of testing that will be needed in a complex environment of interacting hand-coded functions you’re not supporting because you didn’t build them.
2) is dead on – and Volt’s key-value blog post about benchmarking is a good statement of the situation. I encourage people to read it.
3) this one I have to spend more time thinking about – I’m looking at the caching platforms (not just memcache) and that will be a good topic for more dialogue. But your comments are, again, spot on.
Overall, I certainly don’t question your assertion about the benefits. But I do believe that in the field the amount of work is substantially shifted from vendor to developer in direct proportion to how aggressively the customer tweaks, adjusts, and improves their apps. In your highest value target market, I would assert that is very high. The market will tell us if the benefit is enough to warrant that change. I’m not betting against you on that – I think you’ve raised the bar. Congratulations on shipping, and let the games begin!