Strata Standards Stories: Different Stores For Different Chores

Has HDFS joined MapReduce in the emerging “legacy Hadoop project” category, continuing the swap-out of components that formerly answered the question “what is Hadoop?” Stores for data were certainly a focus at Strata/Hadoop World in NY, O’Reilly’s well-run, well-attended, and always impactful fall event. The limitations of HDFS, including its append-only nature, have become inconvenient enough to push the community to “invent” something DBMS vendors like Oracle did decades ago: a bypass. After some pre-event leaks about its arrival, Cloudera chose its Strata keynote to announce Kudu, a new columnstore written in C++, bypassing HDFS entirely. Kudu will use an Apache license and will be submitted to the Apache process at some undetermined future time.


Hadoop Projects Supported By Only One Distribution

The Apache Software Foundation has succeeded admirably in becoming a place where new software ideas are developed: today over 350 projects are underway. The challenges for the Hadoop user are twofold: trying to decide which projects might be useful in big data-related cases, and determining which are supported by commercial distributors. In Now, What is Hadoop? And What’s Supported? I list 10 supported by only one: Atlas, Calcite, Crunch, Drill, Falcon, Kite, LLAMA, Lucene, Phoenix and Presto. Let’s look at them a little more.


Now, What is Hadoop?

This perennial question resurfaced recently in a thoughtful blog post by Andreas Neumann, Chief Architect of Cask, called What is Hadoop, anyway?. Ultimately, after a careful deconstruction of the terms in the question, Andreas concludes with

“Does it really matter to agree on the answer to that question? In the end, everybody who builds an application or solution on Hadoop must pick the technologies that are right for the use case.”

We’ve agreed from the beginning – that is the only answer that really matters. Still, the question continues to come up for  end users of the stack and for vendors like Cask (it helps them think about what to support in their application development offering Cask Data App Platform (CDAP).

Analysts too: I’ve discussed it several times, including a post a year ago called What Is Hadoop….Now? tracking the path from 6 commonly supported projects in 2012 to 15 in June 2014, across a set of distributors that included Cloudera, Hortonworks, MapR and IBM. “Support” here means you pay for subscription that explicitly includes the named project.

This year, the expansion process has continued – and it does matter.

–more on Gartner blog–



Hortonworks IPO – Why Now?

Last week, many observers were surprised when Hortonworks’ S1 for an initial public offering (IPO) was filed. And there are good reasons to be surprised. Why now? CEO Rob Bearden told VentureWire not long ago that he expected to exit 2014 “at a strong $100 million run rate” in preparation for a 2015 IPO. What changed? Perhaps one answer to that question might be answered by asking another question: for whom?

— for more, see my Gartner blog post

Strata Spark Tsunami – Hadoop World, Part One

New York’s Javits Center is a cavernous triumph of form over function. Giant empty spaces were everywhere at this year’s empty-though-sold-out Strata/Hadoop World, but the strangely-numbered, hard to find, typically inadequately-sized rooms were packed. Some redesign will be needed next year, because the event was huge in impact and demand will only grow. A few of those big tent pavilions you see at Oracle Open World or Dreamforce would drop into the giant halls without a trace – I’d expect to see some next year to make some usable space available.

So much happened, I’ll post a couple of pieces here. Last year’s news was all about promises: Hadoop 2.0 brought the promise of YARN enabling new kinds of processing, and there was promise in the multiple emerging SQL-on-HDFS plays. The Hadoop community was clearly ready to crown a new hype king for 2014.

This year, all that noise had jumped the Spark.

— This post is continued on my Gartner blog —

What Is Hadoop….Now?

In February 2012, Gartner published How to Choose The Right Apache Hadoop Distribution (available to clients). At the time, the leading distributors were Cloudera, EMC (now Pivotal), Hortonworks (pre-GA), IBM, and MapR. These players all supported six Apache projects: HDFS, MapReduce, Pig, Hive, HBase, and Zookeeper. Things have changed.


Hadoop Summit Recap Part Two – SELECT FROM hdfs WHERE bigdatavendor USING SQL

Probably the most widespread, and commercially imminent, theme at the Summit was “SQL on Hadoop.” Since last year, many offerings have been touted, debated, and some have even shipped. In this post, I offer a brief look at where things stood at the Summit and how we got there. To net it out: offerings today range from the not-even-submitted to GA – if you’re interested, a bit of familiarity will help. Even more useful: patience.


Hadoop 2013 – Part Three: Platforms

In the first two posts in this series, I talked about performance and projects as key themes in Hadoop’s watershed year. As it moves squarely into the mainstream, organizations making their first move to experiment will have to make a choice of platform. And – arguably for the first time in the early mainstreaming of an information technology wave – that choice is about more than who made the box where the software will run, and the spinning metal platters the bits will be stored on.There are three options, and choosing among them will have dramatically different implications on the budget, on the available capabilities, and on the fortunes of some vendors seeking to carve out a place in the IT landscape with their offerings.

— more —

Hadoop Distributions And Kids’ Soccer

The big players are moving in for a piece of the big data action.  IBM, EMC, and NetApp have stepped up their messaging, in part to prevent startup upstarts like Cloudera from cornering the Apache Hadoop distribution market. They are all elbowing one another to get closest to “pure Apache” while still “adding value.” Numerous other startups have emerged, with greater or lesser reliance on, and extensions or substitutions for, the core Apache distribution. Yahoo! has found a funding partner and spun its team out, forming a new firm called Hortonworks, whose claim to fame begins with an impressive roster responsible for most of the code in the core Hadoop projects. Think of the Doctor Seuss children’s book featuring that famous elephant, and you’ll understand the name.

While we’re talking about kids – ever watch young kids play soccer? Everyone surrounds the ball. It takes years to learn their position on the field and play accordingly. There are emerging alphas, a few stragglers on the sidelines hoping for a chance to play, community participants – and a clear need for governance. Tech markets can be like that, and with 1600 attendees packing late June’s Hadoop Summit event, all of those scenarios were playing out. Leaders, new entrants, and the big silents, like the absent Oracle and Microsoft.


Cloudera-Informatica Deal Opens Broader Horizons for Both

Cloudera‘s continuing focus on the implications of explosive data growth has led it to another key partnership, this time with Informatica. Connecting to the dominant player in data integration and data quality expands the opportunity for Cloudera dramatically; it enables the de facto commercial Hadoop leader to find new ways to empower the “silent majority” of data. The majority of data is outside; not just outside enterprise data warehouses, but outside RDBMS instances entirely. Why? Because it doesn’t need all the management features database management software provides – it doesn’t get updated regularly, for example. In fact, it may not be used very often at all, though it does need to be persisted for a variety of reasons. I recently mentioned Cloudera’s success of late; it’s going to be challenged by some big players in 2011, notably IBM, whose recent focus on Hadoop has been remarkably nimble. So these deals matter. A lot. The Data Management function is being refactored before our eyes; both these vendors will play in its future. Read more of this post


Get every new post delivered to your Inbox.

Join 21,542 other followers