Analytic Event Processing (AEP) is hot. But does it mean the RDBMS is beginning to decline in importance? Charles Brett of C3B Consulting and I recently had a quick dialogue about it and came up with different conclusions. That conversation is reproduced here. It’s only the beginning – I hope you will weigh in with your thoughts.
Over the past 2 weeks I have been traveling in California, in part to attend the High Performance Transaction Systems Workshop (an institution founded by the late Jim Gray who, alas, disappeared with his yacht off San Francisco some 3 years ago) and in part to visit a number of customer and vendor organizations. I am still musing on the implications and will be writing more. There were, however, two strong inputs that have impressed themselves on me.
The first is that the relational database (RDBMS) is a much more endangered species than I had thought. This is not to say that RDBMSs will disappear, or that their revenues will diminish soon. But it is to say that future growth in the storage and processing of data is going to come from non-RDBMS solutions. Hadoop and MapReduce techniques are prime candidates for processing very large datasets (which may in turn feed conventional RDBMSs). Specialized columnar databases – for example from Vertica – are acquiring their own following. In addition, for the huge datasets (measured in quintillions of bytes) that you find in ‘big’ science, Michael Stonebraker and others are working on SciDB.
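The MapReduce pattern mentioned above can be illustrated with a minimal sketch – a toy, single-process word count in Python. This is purely illustrative (real Hadoop jobs distribute the map, shuffle and reduce phases across a cluster of machines), but it shows the division of labor that makes the technique scale:

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit (key, value) pairs from each input record independently."""
    for record in records:
        for word in record.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group values by key (Hadoop does this across the network)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate each key's values independently of other keys."""
    return {key: sum(values) for key, values in groups.items()}

# Toy input standing in for a massive event log.
logs = ["ticker up", "ticker down", "ticker up"]
counts = reduce_phase(shuffle(map_phase(logs)))
print(counts)  # {'ticker': 3, 'up': 2, 'down': 1}
```

Because each map call and each reduce call touches only its own slice of the data, both phases can be farmed out to thousands of nodes – which is what makes the approach attractive for datasets that overwhelm a single RDBMS.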
I’ve been on the road too: at Oracle Open World, Teradata Partners, IBM IoD, and the most recent Data Warehousing Institute (TDWI) event. (Quick blog post about that at http://mervadrian.wordpress.com/2009/10/29/a-tale-of-three-cities-and-oracle-teradata-and-ibm-databases/ ) And I couldn’t disagree more with you about the future of RDBMS – while it is going to be supplemented by products highly tuned for some specific use cases, its ongoing growth will continue to be healthy. RDBMS has done well in the downturn, with growth in license revenues alone measured in the $billions, and there is no evidence of a slowdown, although price pressure from open source alternatives will continue.
Merv’s view is that the RDBMS is safe. Financially, it probably is. But the RDBMS will become a specialty tool and lose its dominant position. The database storage world is changing fast, and customers will have many alternatives to choose from – often far less expensive than conventional RDBMSs, and offering faster performance as well as greater capabilities. Instead of a world where the traditional, transactional RDBMS has ruled supreme for three decades, we will likely see the RDBMS become a tool for specific tasks, no longer deployed for generalized consumption.
In the near term, the value proposition of RDBMS remains sound – it’s a good general purpose engine for persisting data to capture transactions, to use it more than once for routine reporting and ad hoc analysis, etc. Use cases for more complex, high-performance, and special purpose applications have always existed, and in the past, such pretenders as object databases served them. They were zero billion dollar markets, and remained so, driving those vendors out of business.
While it’s likely to be different this time, alternatives don’t mean a market goes away. In fact, there are several RDBMS use cases to consider in this context: columnar stores for BI; streaming data, as Charles notes; scientific applications; and text analytics, which is getting its own play. Stonebraker is (surprise) active in most of them, as well as in a new venture called VoltDB (some wags have called it Horizontica) that will challenge the OLTP performance of “conventional” RDBMSs.
New workloads mean new sales, and new revenue, and maybe some new vendors will survive the shakeout – or become part of big vendor portfolios, as many already have. It is precisely for “generalized consumption,” for mainstream workloads at moderate cost, with existing skills in place, that RDBMS will – excuse the pun – persist.
We concur with what Merv says in the two paragraphs above. In part confirming this impression [that “the RDBMS will become a specialized tool”] have been discussions with various vendors about analyzing events. In financial markets, huge numbers of events are generated via data feeds (about stock, currency and commodity prices and their movements). On the shop floor in manufacturing, the same happens: most modern automated industrial devices produce events about what is happening. In medicine, it is similar: ECG/EKG machines and numerous other devices all supply data points. In logistics, location information (for example, from GPS units installed in thousands of vehicles) is copious.
What these four instances (and there are many more) have in common is the need for analysis of the ‘event’ data – whether in real time, near real time or soon after. Large internet firms (Google, eBay, Yahoo and many others) generate vast amounts of information from users visiting their sites. This data is proving to be a highly valuable resource for prediction, and for determining and then optimizing around what customers want.
A new class of event processing, or complex event processing (CEP), platform is emerging. Oracle, Starview Technology and Sybase are three companies focusing not only on capturing events but also on deploying specialist analytical platforms to accelerate the delivery of information obtained from event data. Sybase has its feet firmly in the financial sector, where its Real-time Analytical Platform is a complete (and expensive) solution. Oracle has combined its own CEP approach with much of the CEP infrastructure it acquired from BEA to create a highly efficient platform. Starview, by contrast, has an industrial background and so is concentrating on what you can find out from the machines on the shop floor. All three merit additional description (but that is for subsequent discussions). Suffice it to say for now that each is distinctly different from the others and demonstrably superior to other competitors trying to occupy the same space.
I agree that CEP, or stream-based data processing, is an exciting new class of applications that will create new business for the ultimately successful suppliers. But the mainstream firms like Oracle, and, by the way, IBM, whose InfoSphere Streams I wrote about in http://mervadrian.wordpress.com/2009/05/22/infosphere-streams-is-a-game-changer/ , see these opportunities as likely to be connected to other databases for follow-on processing. These workloads will drive new RDBMS (or other new database engine) revenues as well as their own.
One great example is clickstream data. The problem of “sessionization” is a complex, multipass one – interpret a set of clicks of variable length, changing direction (back to a previous page, off in a different direction) and differing type (read/transact/cancel) – and classify the resulting sessions. Then do analytics to determine what kinds of users had what kinds of sessions. Although a good deal of work can be done as the data flows by, much of the value-add is downstream, operating on persisted data – often in an RDBMS. And lower-end alternatives like SQLStream show promise and are taking their own approach to the issue.
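To make the sessionization problem concrete, here is a minimal sketch of the two passes described above: split a user’s time-ordered clicks into sessions by inactivity gap, then classify each session from the click types it contains. The 30-minute timeout and the labels are illustrative assumptions, not any vendor’s implementation:

```python
SESSION_TIMEOUT = 30 * 60  # seconds of inactivity that ends a session (assumed)

def sessionize(clicks):
    """Pass 1: split one user's (timestamp, click_type) events into sessions."""
    sessions, current = [], []
    for ts, click_type in clicks:
        # A long gap since the previous click starts a new session.
        if current and ts - current[-1][0] > SESSION_TIMEOUT:
            sessions.append(current)
            current = []
        current.append((ts, click_type))
    if current:
        sessions.append(current)
    return sessions

def classify(session):
    """Pass 2: label a session by the click types it contains (illustrative rules)."""
    types = {click_type for _, click_type in session}
    if "transact" in types:
        return "buyer"
    if "cancel" in types:
        return "abandoner"
    return "browser"

# One user's clicks: a purchase, then (after a long gap) an abandoned visit.
clicks = [(0, "read"), (60, "read"), (120, "transact"),
          (4000, "read"), (4100, "cancel")]
labels = [classify(s) for s in sessionize(clicks)]
print(labels)  # ['buyer', 'abandoner']
```

In practice the first pass can run on the stream as the data flows by, while the classification and the follow-on analytics (“what kinds of users had what kinds of sessions?”) operate downstream on the persisted sessions – which is exactly where the RDBMS, or its successors, earn their keep.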
What do the evolution of data storage tools and Analytic Event Processing (AEP) have in common? Both need copious data, as well as the means to analyze more than the streams of data generated by ticker feeds, GPS, machine tools, medical equipment or whatever. In different ways, in the analysis of C3B Consulting, both will act as change agents for how data is used and stored. The old paradigm, where the RDBMS (and before that the transactional database) reigned supreme, is breaking down under the colossal volumes of data now being captured.
At HPTS, Wayne Duquaine observed in his presentation that there are circa 6.7 billion people on the planet, and already 25 billion MCUs installed (plus 1 billion WiFi devices and 5 billion sensors). These are already generating data points, which can now be captured for processing to improve almost anything – from potentially predicting heart attacks a day early through to better utilization of power. According to Wayne, by adding sensors to many mechanical devices and analyzing the output, energy efficiency can rise from today’s 55% to more than 90%. For this to happen, however, AEP has to come out of the shadows, and data stores will change.
We agree on this. Where we differ is on whether it implies that the RDBMS is going to see a relative decline in importance (the revenue streams will probably continue, however). At Teradata Partners, I heard about important new opportunities customers are tackling, with data volumes that were unheard of just a few years ago. And they, like IBM, Oracle, Netezza, Aster Data and others, are adding support for new tools like MapReduce into their portfolios for hybrid applications that use both approaches.
In addition, moving logic into the (existing) database is seen as a way of tackling other problems that have suffered because of I/O bottlenecks – and the resulting cost in latency, as well as the redundancy of moving data to other platforms. Several vendors have added direct support for SAS inside the database as a counter to IBM’s expected similar efforts with SPSS. However, using more specialty data copies, with multiple different engines, can lead down a slippery slope to chaotic, ungoverned data proliferation.
What we are watching is a refactoring of the software portfolio. Specialty engines have taken the place of the monolithic transaction monitor, offering specialized services like rules processing, master data management, application services, etc. “One layer back” in the architecture, the persistence layer is undergoing a similar transformation, evolving engines that support specialized needs. As a result, rather than RDBMS shrinking, we believe that with the addition of AEP, the whole pie will grow.
© 2009 C3B Consulting and IT Market Strategy. All rights reserved.