Analytic Event Processing (AEP) is hot. But does it mean the RDBMS is beginning to decline in importance? Charles Brett of C3B Consulting and I recently had a quick dialogue about it and came up with different conclusions. That conversation is reproduced here. It’s only the beginning – I hope you will weigh in with your thoughts.
Over the past 2 weeks I have been traveling in California, in part to attend the High Performance Transaction Systems Workshop (an institution founded by the late Jim Gray who, alas, disappeared with his yacht off San Francisco some 3 years ago) and in part to visit a number of customer and vendor organizations. I am still musing on the implications and will be writing more. There were, however, two strong inputs that have impressed themselves on me.
The first is that the relational database (RDBMS) is a much more endangered species than I had thought. This is not to say that the RDBMS will disappear, or that its revenues will diminish soon. But it is to say that future growth in the storage and processing of data is going to come from non-RDBMS storage solutions. Hadoop and MapReduce techniques are prime candidates for processing very large datasets (which may in turn feed conventional RDBMS). Specialized columnar databases – for example from Vertica – are acquiring their own following. In addition, for the huge datasets (measured in quintillions of bytes) that you find in ‘big’ science, Michael Stonebraker and others are working on SciDB.
I’ve been on the road too: at Oracle Open World, Teradata Partners, IBM IoD, and most recently the Data Warehousing Institute (TDWI) event. (Quick blog post about that at http://mervadrian.wordpress.com/2009/10/29/a-tale-of-three-cities-and-oracle-teradata-and-ibm-databases/ ) And I couldn’t disagree more with you about the future of RDBMS – while it is going to be supplemented by products highly tuned for some specific use cases, its ongoing growth will continue to be healthy. RDBMS has done well in the downturn, with growth in license revenues alone measured in the $billions, and there is no evidence of a slowdown, although price pressure from open source alternatives will continue.
Merv’s view is that the RDBMS is safe. Financially, it probably is. But the RDBMS will become a specialty and lose its dominant position. The database storage world is changing fast and there will be many alternatives, often far less expensive than conventional RDBMS, for customers to choose from, offering faster performance as well as greater capabilities. Instead of a world where the traditional, transactional RDBMS has ruled supreme for three decades, we will likely see the RDBMS become a tool for specific tasks, no longer deployed for generalized consumption.
In the near term, the value proposition of RDBMS remains sound – it’s a good general purpose engine for persisting data to capture transactions, to use it more than once for routine reporting and ad hoc analysis, etc. Use cases for more complex, high-performance, and special purpose applications have always existed, and in the past, such pretenders as object databases served them. They were zero billion dollar markets, and remained so, driving those vendors out of business.
While it’s likely to be different this time, alternatives don’t mean a market goes away. In fact, there are several database use cases to consider in this context: columnar stores for BI; streaming data, as Charles notes; scientific applications; and text analytics, which is getting its own play. Stonebraker is (surprise) active in most of them, as well as in a new venture called VoltDB (some wags called it Horizontica) that will challenge the OLTP performance of “conventional” RDBMS.
New workloads mean new sales, and new revenue, and maybe some new vendors will survive the shakeout – or become part of big vendor portfolios, as many already have. It is precisely for “generalized consumption,” for mainstream workloads at moderate cost, with existing skills in place, that RDBMS will – excuse the pun – persist.
We concur with what Merv says in the two paragraphs above. In part confirming this impression [that “the RDBMS will become a specialized tool”] have been discussions with various vendors about analyzing events. In financial markets, huge numbers of events are generated via data feeds (about stock, currency and commodity prices and their movements). On the shop floor in manufacturing, the same happens: most modern automated industrial devices produce events about what is happening. In medicine, it is similar: ECGs, EKGs and numerous other devices all supply data points. In logistics, location information (for example from GPS units installed in thousands of vehicles) is copious.
What these four instances, and there are many more, have in common is the need for analysis of the ‘event’ data – whether in real time, near real-time or soon after. Large internet business firms (Google, eBay, Yahoo and many others) generate vast amounts of information from users visiting their sites. This data is proving to be a highly valuable resource for prediction and for determining and then optimizing around what customers want.
A new class of event processing, or complex event processing (CEP), platform is emerging. Oracle, Starview Technology and Sybase are three companies focusing not only on capturing events but also on deploying specialist analytical platforms to accelerate the delivery of information obtained from the event data. Sybase has its feet firmly in the financial sector, where its Real-time Analytical Platform is a complete (and expensive) solution. Oracle has combined its own CEP approach to data with much of the CEP infrastructure it acquired from BEA to create a highly efficient platform. Starview, by contrast, has an industrial background and so is concentrating on what you can find out from the machines on the shop floor. All three merit additional description (but that is for subsequent discussions). Suffice it to say for now that each is distinctly different from the others and demonstrably superior to other competitors trying to occupy the same space.
I agree that CEP, or stream-based data processing, is an exciting new class of applications that will create new business for the ultimately successful suppliers. But the mainstream firms like Oracle, and, by the way, IBM, whose InfoSphere Streams I wrote about in http://mervadrian.wordpress.com/2009/05/22/infosphere-streams-is-a-game-changer/ , see these opportunities as likely to be connected to other databases for follow-on processing. These workloads will drive new RDBMS (or other new database engine) revenues as well as their own.
One great example is clickstream data. The problem of “sessionization” is a complex, multipass one – interpret a set of clicks of variable length, changing direction (back to a previous page, off in a different direction) and different type (read/transact/cancel) – and classify the resulting sessions. Then do analytics to determine what kinds of users had what kinds of sessions. Although a good deal of work can be done as the data flows by, much of the value add is downstream, operating on persisted data – often in an RDBMS. And lower-end alternatives like SQLStream show promise and are taking their own approach to the issue.
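The first pass of that multipass problem can be sketched in a few lines of Python. This is only an illustration: the 30-minute inactivity cutoff and the click fields are assumptions of mine, not any vendor’s implementation.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity cutoff

def sessionize(clicks):
    """Group (user, timestamp, page) clicks into sessions.

    Clicks are assumed pre-sorted by user and time; a gap longer
    than SESSION_TIMEOUT starts a new session for that user.
    """
    sessions = {}   # user -> list of sessions (each a list of clicks)
    last_seen = {}  # user -> timestamp of that user's previous click
    for user, ts, page in clicks:
        if user not in sessions or ts - last_seen[user] > SESSION_TIMEOUT:
            sessions.setdefault(user, []).append([])  # open a new session
        sessions[user][-1].append((ts, page))
        last_seen[user] = ts
    return sessions

t0 = datetime(2009, 11, 1, 9, 0)
clicks = [
    ("alice", t0,                        "/home"),
    ("alice", t0 + timedelta(minutes=5), "/product"),
    ("alice", t0 + timedelta(hours=2),   "/home"),  # gap > 30 min: new session
]
result = sessionize(clicks)
print(len(result["alice"]))  # 2 sessions
```

The classification and analytics steps would then run over these grouped sessions – in-stream for the simple cases, downstream on persisted data for the rest.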
What do the evolution of data storage tools and Analytical Event Processing (AEP) have in common? They both need copious data as well as the means to analyze more than the streams of data generated by ticker feeds, GPS, machine tools, medical equipment or whatever. In different ways, in the analysis of C3B Consulting, both will act as change agents for how data is used and stored. The old paradigm where the RDBMS (and before that the transactional database) reigned supreme is breaking down under the colossal volumes of data that are now being captured.
At HPTS, Wayne Duquaine observed in his presentation that there are about 6.7 billion people on the planet and already 25 billion MCUs installed (plus 1 billion WiFi devices and 5 billion sensors). These are already generating data points. What is happening is that these can now be captured for processing to improve almost anything – from potentially predicting heart attacks a day early through to better utilization of power. According to Wayne, by adding sensors to many mechanical devices and analyzing the output, energy efficiency can rise from today’s 55% to more than 90%. For this to happen, however, AEP has to come out of the shadows and data stores will change.
We agree on this. Where we differ is on whether it implies that the RDBMS will see a relative decline in importance (the revenue streams will probably continue, however). At Teradata Partners, I heard about important new opportunities customers are tackling with data volumes that were unheard of just a few years ago. And they, like IBM, Oracle, Netezza, Aster Data and others, are adding support for new tools like MapReduce into their portfolios for hybrid applications that use both approaches.
In addition, moving logic into the (existing) database is seen as a way of tackling other problems that have suffered because of IO bottlenecks and the resulting cost in latency as well as the redundancy of moving data to other platforms. Several vendors have added direct support for SAS inside the database as a counter to IBM’s expected similar efforts with SPSS. However, using more specialty data copies, with multiple different engines, can lead down a slippery slope resulting in chaotic, ungoverned data proliferation.
What we are watching is a refactoring of the software portfolio. Specialty engines have taken the place of the monolithic transaction monitor, offering specialized services like rules processing, master data management, application services, etc. “One layer back” in the architecture, the persistence layer is undergoing a similar transformation, evolving engines that support specialized needs. As a result, rather than RDBMS shrinking, we believe that with the addition of AEP, the whole pie will grow.
© 2009 C3B Consulting and IT Market Strategy. All rights reserved.
23 thoughts on “Will AEP Replace RDBMS? A Dialogue With Charles Brett”
A very interesting dialogue Merv, I like the format.
Analytic Event Processing is a new term for me, though not a new concept. Do you see this emerging as a category on its own? Where do the existing Event Processing vendors like StreamBase fit in?
In my experience with StreamBase, most applications of event processing have a significant analytic component. Algorithmic trading, for example, is all about analyzing the market (often in very complex ways) and immediately reacting to the results of analysis. I think you’ll find that, as usual, financial applications are at the leading edge of any trend around real time data analysis. Market Data Management and Liquidity Detection are all about sophisticated analysis. Outside of capital markets, clickstream and monitoring applications actually use less sophisticated (though still quite interesting) analytics.
Real time analysis is only one part of these applications though. Applications that focus on analytics still require connectivity to data sources, rapid development tools, business rules, performance, scalability, reliability, visualization of results, and all the other capabilities of StreamBase.
So I guess I’ll come back to the original question. Is Analytical Event Processing really different, or just one aspect of what customers should expect from any Event Processing Platform?
Delighted to hear from one of the pioneers, Richard. StreamBase was out there ahead of most, and you have clearly defined the early market. The term is Charles’ and I find it very useful as a way to think about the use cases. Some streams, it seems to me, will have more immediacy than others in control applications connected to process engines of one sort or another, while others will clearly be devoted to analysis, either for decisioning in real or near-real time or for longer-term analytics in the “classic BI” sense.
So is this a separate market yet? My opinion is that we’re a bit ahead of that, but can see it coming. That said, I need to prompt Charles to jump in. Hopefully, we’ll see his reply soon too.
I have been talking about the concept of event analytics for some time. These types of analytics are produced by analyzing events as they stream from sensors and across networks and as they flow through and between business processes. These analytics are produced by analyzing moving data, unlike traditional DW-based data analytics, which analyze data at rest.
There are several different approaches to creating event analytics. When analyzing events flowing through a single business process, the BAM component of an application server may be used. These analytics report what is happening right now. If we want to analyze and correlate several different event streams and predict possible outcomes then a CEP-based approach can be used. I sense the discussion above on AEP is focused more on this approach. It is important to point out that CEP engines are used for tasks other than helping produce analytics. Financial and web marketing applications are currently the main users of these two latter approaches.
For the majority of applications that don’t need real-time analytics, events can be captured into a data store for subsequent creation of data analytics. Capturing and analyzing CDRs is an example here. This data store can be an event store or a data warehouse. The problem with capturing straight into a data warehouse is increasing event data volumes. Think of web logs. Events have to be filtered before they can be moved into a data warehouse. This filtering can be done using an event store as a data source (MapReduce is good for this) or dynamically by BAM/CEP engines.
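To make that filtering step concrete, here is a toy map-and-reduce pass in plain Python. The event shapes are invented for illustration; a real job would run on something like Hadoop against the event store rather than over an in-memory list.

```python
from itertools import groupby
from operator import itemgetter

raw_events = [
    ("2009-11-01", "pageview", "/home"),
    ("2009-11-01", "heartbeat", None),   # noise: filtered out
    ("2009-11-01", "pageview", "/buy"),
    ("2009-11-02", "pageview", "/home"),
]

# "Map": keep only the events worth warehousing, keyed by date
mapped = [(day, 1) for day, kind, _ in raw_events if kind == "pageview"]

# "Reduce": aggregate per key before loading into the warehouse
mapped.sort(key=itemgetter(0))
daily_counts = {day: sum(n for _, n in group)
                for day, group in groupby(mapped, key=itemgetter(0))}

print(daily_counts)  # {'2009-11-01': 2, '2009-11-02': 1}
```

Only the small, aggregated result would then be moved into the warehouse, which is exactly the volume problem Colin describes.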
Most people that employ event analytics want to use them in conjunction with other data. Web event analytics are not sufficient for web marketing by themselves. It is important to relate these analytics to customer profile data and possibly customer data analytics.
Sorry, but it is ridiculous to suggest AEP will replace relational DBMSs. There are some exciting new technologies that extend existing approaches, AEP is one, MapReduce is another. But we need evolution not revolution. Just my 2 cents. Colin.
Thanks, Colin – it’s great to have you here. And thanks for the time to add all that content about use cases. This is an exciting new adjacent space, I agree.
Interesting “stream” of thought, lots of issues to dissect here. But I’ll just attack a couple.
The SQL database is just as likely to die as the mainframe. We all know the mainframe is “dead” but yet it continues to attract more workloads. Too entrenched to dislodge. Ditto with SQL, though ironically evidence of its continued life comes from the low end: popularity of the LAMP stack and emergence of RESTful services proves that 80% or whatever percent (it will be large) of the workloads are still based around access to structured data.
Similarly in the analytic space, just because there are new forward-facing, real-time predictive analytics doesn’t mean that the need for SQL-based historical analytics goes away. All too often in business, politics and life we fail to learn lessons from history.
That doesn’t minimize impact of advanced real-time or predictive analytics. Just the opposite, it complements and reinforces the need for historical. All that said, I’m quite jazzed about two trends in the CEP space that have incredible potential for BI/analytics. The first is commoditization at the low end, as Msft does to CEP what it did with OLAP; any SMB with a website will have use for commodity CEP. Secondly, at the other end is what IBM, Tibco, Progress, and now Oracle are saying, which is the need for an array of analytic techniques for parsing incredibly huge torrents of real time data coming from people, things, devices, webclicks, you name it. It will both leverage SQL and the rules-based stream analyses around which modern CEP emerged.
CEP and BI, or AEP or whatever you call it, represents a pretty exciting new frontier for BI. It’s an area we’re busily researching at Ovum, and you’ll see a major report on it in Q1 next year.
Thanks, Tony. And great to have you here too. I’m increasingly convinced that there is an in-stream and a post-stream way of thinking about this. In-stream processes react and classify, and post-stream processes depend on persisting the results of that analysis. Simply operating on “raw” stream data dumped into some file somewhere is the simplest case, and won’t drive the real value we’ll see going forward. Creation of classification metadata and other derived information is a key in-stream process, and some analytics will happen then. And while we don’t change the stream data itself later – as Colin White likes to point out, it is not changed once stored – we will enhance it post-stream by storing other things “next to” it as various analytic processes leave their own footprints.
I wish I could claim that the term ‘AEP’ was mine but that would be incorrect and unfair. I have heard it used loosely and specifically by both Sybase and by Starview Technology. Like Merv, I think it is a term which rings descriptive positive bells.
I would, and will comment more. However, I am at EWR about to board (I hope) a flight to London. More later (but it may be a day or two …)
At Starview, we talk a lot about Analytical Event Processing (AEP) as opposed to CEP when we’re talking about a class of stream analytic problems that don’t fall into the general category of “discrimination / decision networks” but still need to operate in a real-time, stream-driven fashion.
Bayesian Belief networks, Principal Component Analysis (PCA), Schedule Optimization and other space-searching algorithms are all examples that fall into what we call Analytical Event Processing. In order to do their work, they need to operate within the context of a realtime event stream environment. This means they also, as Richard mentioned earlier, still require data source inputs, output adapters, management and all the other necessities of life of any stream processing application.
While somewhat esoteric from a conventional enterprise software point of view, these algorithms have realtime application in mission-critical use cases as diverse as fraud detection, factory automation, dynamic powergrid balancing and root-cause analysis.
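A trivial member of that family – not Starview’s technology, just an illustrative sketch – is an online mean/variance (Welford’s algorithm) that flags anomalous readings as events arrive, doing O(1) work per event and keeping no history. The sensor values and the 3-sigma threshold are invented for the example.

```python
import math

class StreamAnomalyDetector:
    """Flag readings more than `threshold` standard deviations from
    the running mean, using Welford's online mean/variance update."""

    def __init__(self, threshold=3.0):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations
        self.threshold = threshold

    def observe(self, x):
        # Check the new reading against the statistics accumulated so far
        anomalous = False
        if self.n >= 2:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) > self.threshold * std:
                anomalous = True
        # Welford update: constant work per event, no stored history
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous

detector = StreamAnomalyDetector(threshold=3.0)
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 50.0]  # last one is a spike
flags = [detector.observe(r) for r in readings]
print(flags[-1])  # True: the spike is flagged in-stream
```

The same loop structure – update a compact model incrementally as each event passes – is what lets far richer algorithms (PCA, belief networks) run inside a stream environment rather than over a database.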
One of the most enjoyable things about having a blog that people read is that I learn things I didn’t know. Thanks, Mack, for a good reminder about these use cases. Look forward to talking with you about them.
Good discussion; I’ll add a couple of points.
1. We must separate disk-based data management from in-memory data management. Most of the CEP and event-stream analysis apps we’re seeing (at Forrester Research) occur in memory. To Charles’ point, these apps don’t use RDBMS, even in-memory RDBMS. So there’s got to be some impact on the demand for RDBMS as these apps take off.
2. We must separate the OLTP/query scenarios that are RDBMS’ sweet spot from the event-processing, classification, search, etc. scenarios that are finding a home in CEP, Hadoop, etc. Event stream processing involves query and so is a gray area. But in general, the two groups of technologies you discuss are used for different apps. RDBMS growth is not at issue if the products expand to embrace the “new” scenarios. If not, growth will be an issue.
3. In-memory architectures challenge today’s roles. DBAs play less of a role in the in-memory architectures than they do with disk-based processing. Developers and architects have more control.
Hope this helps.
John, it’s great to have you here, and your points are well made (no surprise). As regards your first point, I believe the downstream use of the original stream data, or of data derived from it, will often drive new RDBMS instances for further work, and different work.
I believe many apps will be used against the streams, some in real time, and some later, even if they are persisted in stores other than RDBMS at first, and when they are used in those other apps, the power of in-database analytics, especially on new MPP machines and memory-based systems, will be the right vehicle for many of them.
Keep ’em coming!
Pardon my being late for this great discussion. Obviously, I have a problem with my real-time detection and processing.
I’m with Tony on the unlikelihood of the RDBMS going away. It’s too entrenched. However, it’s no longer the only game in town, as everyone here has acknowledged.
What caught my attention — similar to Richard Tibbets — was the term “Analytic Event Processing”. I think we only do the industry (read customers) a disservice by creating new product categories for use cases.
I suggest sticking to the umbrella term (category) of “Event Processing”, and then drop into scenarios and solutions, such as Event Processing in Operations, Event Processing for Analytics, Complex Event Processing and such.
The more clarity we can provide, the sooner customers can attain value.
My 2 cents,
Very interesting point – AEP was not first coined by Charles, but he’s certainly an advocate of it as a meaningful term and I am supportive. Still, as you suggest, label proliferation is an issue. It merits further thought. It’s great to have you here.
We blogged about what do we mean by AEP at http://starviewtech.wordpress.com/ .
Senior Director, Product Management
Starview Technology, Inc
Thanks, Debu, for jumping in here. It’s a worthwhile conversation and it’s good to have your point of view directly available.
I agree with Brenda – why keep reinventing the wheel and creating new terms? We want to process operational events in motion in the same fashion as operational transactions. In fact, many events generate operational transactions. There is nothing new here. Event processing is the right term for this.
Often we want to also analyze events. To date we filter, transform and consolidate events in a data store (e.g. a CDR data store) before analyzing them. Here we are analyzing data at rest. For more realtime work we may want to analyze (and possibly correlate) the events in motion, display the results in a UI and then store the analytical results in a data store so that they can be compared with other analytics. This is an analyze-and-store model for creating event analytics (as opposed to the traditional store-and-analyze model for creating data analytics).
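Colin’s analyze-and-store model can be sketched in a few lines of Python. The event shape, the one-minute window, and the windowed average are illustrative assumptions; the point is the order of operations: compute the analytic while the event is in motion, then persist the result rather than the raw stream.

```python
from collections import deque

WINDOW = 60  # seconds of events kept "in motion" (assumed)

window = deque()      # events currently in the analysis window
analytics_store = []  # persisted event analytics, for later comparison

def on_event(ts, value):
    """Analyze the event in motion, then store the derived analytic."""
    window.append((ts, value))
    while window and ts - window[0][0] > WINDOW:
        window.popleft()  # expire events that fell out of the window
    avg = sum(v for _, v in window) / len(window)  # the in-motion analytic
    analytics_store.append({"ts": ts, "window_avg": avg})  # analyze-and-store
    return avg

for ts, v in [(0, 10.0), (30, 20.0), (90, 30.0)]:
    on_event(ts, v)

print(analytics_store[-1]["window_avg"])  # 25.0: the ts=0 event has expired
```

In a store-and-analyze model the loop would instead write the raw events and compute the average later, at rest – the contrast Colin draws.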
The bottom line is all we need are the terms event processing and event analytics. There are a variety of different ways of supporting these two tasks. All have confusing marketing buzzwords (BAM, CEP, AEP, etc.). All these terms do is confuse people. They are basically marketing ploys for making something look new that isn’t.
Well, better late than never. Great debate going on here. I agree with Colin and Brenda. Let’s keep the terminology simple and clear.
One thing not yet raised here is that in order to analyse events in motion, I believe we need more than the data in the event message. For this reason I believe event-driven data integration is key, and many DI vendors can support this today. But there is a problem: invoking event-driven data integration thousands of times a second is not going to scale. Hence the need for in-memory data that can be integrated at memory speed (no need for I/O). There is no doubt also that columnar data integration will be adopted to minimise the cost even further. Even better is if that data has been pre-integrated and held in-memory, or if in-memory data could be integrated in parallel (relevant columns only) entirely in memory when an event, or an instance of an event correlation, is detected. In my opinion, therefore, we will see in-memory data fabrics appearing to make this possible, plus columnar in-memory event-driven data integration. I know of four emerging already and I am writing an article about this. Event processing needs access to in-memory data to speed data integration (if integration is needed), in my opinion.
The SAS strategy to push models into DBMSs assumes the data is all in the DBMS. That is not necessarily true. Colin referred to the need to analyse and then store. How does having models in the database help here? I don’t see how it helps unless a DBMS structure can overlay data in motion to allow event-driven analytics from within the DBMS to kick in.
I see the answer in Agent technology. Deployed look-outs all over the enterprise on top of an in-memory data fabric that sucks data in from the DBMS and can be used by event processing agents that exploit columnar event-driven data integration. Is relational going away? Not a chance. Not a chance. RDBMS vendors consumed object-oriented DB, consumed XML DB and are likely to add columnar as another string to their bow. I agree with Colin wholeheartedly on this. RDBMS vendors and other 3rd party vendors will fuel the in-memory caching space. It is already happening.
Just my 2 cents. Mike
P.S. Great debate Merv / Charles.
Mike, we heard some discussion of that very topic from the InfoSphere Streams team here at the IBM Connect event this week; they discussed how their system distributes work in a fabric of processing nodes, not unlike the model you allude to. More to come in coverage as I digest it all. But I could not agree more that the value of the information in the incoming data in motion will often not be enough and that look-aside will be needed, as well as scoring and tagging for subsequent processing when some data is persisted – or temporarily persisted, perhaps in an in-memory database that can be joined to. That model is not unlike how scoring models are done in data mining and statistical applications, sometimes in conjunction with MapReduce, as we’ve seen in some of the recent announcements from SAS with a variety of database partners in the last few weeks.
Merv and I are going to write a follow up dialog – probably to appear in about 10-12 days. One of the issues that is being raised (or hijacked) is the relationship between BI and ‘AEP’ and what we think this may mean. I am starting to put some ideas down on this on my blog (see http://www.charlesbrett.wordpress.com). More anon.