Strata Standards Stories: Different Stores For Different Chores

Has HDFS joined MapReduce in the emerging “legacy Hadoop project” category, continuing the swap-out of components that formerly answered the question “what is Hadoop?” Stores for data were certainly a focus at Strata/Hadoop World in NY, O’Reilly’s well-run, well-attended, and always impactful fall event. The limitations of HDFS, including its append-only nature, have become inconvenient enough to push the community to “invent” something DBMS vendors like Oracle did decades ago: a bypass. After some pre-event leaks about its arrival, Cloudera chose its Strata keynote to announce Kudu, a new columnstore written in C++, bypassing HDFS entirely. Kudu will use an Apache license and will be submitted to the Apache process at some undetermined future time.


Handicapping Hadoop Helpers – Unsupported Projects

The Apache Software Foundation has over 350 projects underway, and others are being developed in the open source community for use with (and without) the Hadoop stack. In recent posts “Now, What is Hadoop? And What’s Supported?” and “Hadoop Projects Supported by Only One Distribution,” I mapped those supported by commercial distributors. But there’s another group – a dozen (so far) not supported by any of them.


Hadoop Projects Supported By Only One Distribution

The Apache Software Foundation has succeeded admirably in becoming a place where new software ideas are developed: today over 350 projects are underway. The challenges for the Hadoop user are twofold: trying to decide which projects might be useful in big data-related cases, and determining which are supported by commercial distributors. In “Now, What is Hadoop? And What’s Supported?” I list 10 supported by only one: Atlas, Calcite, Crunch, Drill, Falcon, Kite, LLAMA, Lucene, Phoenix and Presto. Let’s look at them a little more.


Now, What is Hadoop?

This perennial question resurfaced recently in a thoughtful blog post by Andreas Neumann, Chief Architect of Cask, called “What is Hadoop, anyway?” Ultimately, after a careful deconstruction of the terms in the question, Andreas concludes with:

“Does it really matter to agree on the answer to that question? In the end, everybody who builds an application or solution on Hadoop must pick the technologies that are right for the use case.”

We’ve agreed from the beginning – that is the only answer that really matters. Still, the question continues to come up for end users of the stack and for vendors like Cask (it helps them think about what to support in their application development offering, Cask Data App Platform (CDAP)).

Analysts too: I’ve discussed it several times, including a post a year ago called “What Is Hadoop….Now?” tracking the path from 6 commonly supported projects in 2012 to 15 in June 2014, across a set of distributors that included Cloudera, Hortonworks, MapR and IBM. “Support” here means you pay for a subscription that explicitly includes the named project.

This year, the expansion process has continued – and it does matter.

–more on Gartner blog–



Perspectives on Hadoop Part Two: Pausing Plans

By Merv Adrian and Nick Heudecker 

In the first post in this series, I looked at the size of revenue streams for RDBMS software and maintenance/support and noted that they amount to $33B, pointing out that pure play Hadoop vendors had a high hill to climb. (I didn’t say so specifically, but in 2014, Gartner estimates that the three leading vendors generated less than $150M.)

In this post, Nick and I turn from Procurement to Plans and examine the buying intentions uncovered in Gartner surveys.


–more in Gartner blog–

Perspectives on Hadoop: Procurement, Plans, and Positioning

I have the privilege of working for the world’s leading information technology research and advisory company, covering information management with a strong focus for the past few years on an emerging software stack called Hadoop. In the early part of 2015, that particular technology is moving from early adopter status to early majority in its marketplace adoption. The discussions and published work around it have been exciting and controversial, so in this post (and a couple to follow) I describe three interlocking research perspectives on Hadoop: procurement (counting real money actually spent); plans (surveys of intentions to invest) and positioning (subjective interpretations of what the first two mean.)

Procurement Perspective: Hadoop is a (Very) Small Market Today

–more on Gartner blog–



Hadoop Questions from Recent Webinar Span Spectrum

This is a joint post authored with Nick Heudecker.
Many questions came in after the last quarterly Hadoop webinar, and Nick and I have picked a few that were asked several times to answer here.

–More on my Gartner blog

Which SQL on Hadoop? Poll Still Says “Whatever” But DBMS Providers Gain

Since Nick Heudecker and I began our quarterly Hadoop webinars, we have asked our audiences what they expected to do about SQL several times, first in January 2014. With 164 respondents in that survey, 32% said “we’ll use what our existing BI tool provider gives us,” reflecting the fact that most adopters seem not to want to concern themselves overmuch with the details.

–More on my Gartner blog

Who Asked for an Open Data Platform?

This is a joint blog post between Nick Heudecker and Merv Adrian.

It’s Strata week here in San Jose, and with that comes a flood of new announcements on products, partners and funding. Today’s big announcement came in the form of the Open Data Platform (ODP). A number of companies have signed on, but in short, it’s got some Hadoopers, some service providers and systems integrators, as well as some analytics apps vendors.

–more on my Gartner blog

Hadoop Adoption? Moving, But Not Necessarily Forward

Gartner’s quarterly Hadoop webinar in February 2015 showed that adoption of Hadoop is not rising quite as dramatically as some might believe. It’s flat compared to Q4 2014. Of nearly 1200 attendees, 465 shared their thinking with us via the usual polling, and the Deployed percentage was the same. That is not surprising with only 3 months between polls. And Q1 is not a big quarter for most software, especially a category that is at best generating a few hundred million dollars in revenues.

–more on my Gartner blog

