Hadoop Summit Recap Part Two – SELECT FROM hdfs WHERE bigdatavendor USING SQL

Probably the most widespread, and commercially imminent, theme at the Summit was “SQL on Hadoop.” Since last year, many offerings have been touted, debated, and some have even shipped. In this post, I offer a brief look at where things stood at the Summit and how we got there. To net it out: offerings today range from the not-even-submitted to GA – if you’re interested, a bit of familiarity will help. Even more useful: patience.


Apache Hadoop 1.0 Doesn’t Clear Up Trunks and Branches Questions. Do Distributions?

In early January 2012, the world of big data was treated to an interesting series of product releases, press announcements, and blog posts about Hadoop versions. To begin with, we had the announcement, at long last, of Apache Hadoop version 1.0 in a press release. Although there were grumblings here and there in the twittersphere that changes to release numbers are meaningless, my discussions with Gartner’s enterprise customers indicate otherwise. Products with release numbers like 0.20.2 make the hair on Procurement’s neck stand on end, and as Hadoop begins to get mainstream attention (Gartner’s clients, see Hype Cycle for Data Management 2011), IT architects and executives find such optics quite important. Hadoop is moving beyond pioneers like Amazon, Yahoo! and LinkedIn into shops like JP Morgan Chase, and they pay attention to such things.


Hadoop Distributions And Kids’ Soccer

The big players are moving in for a piece of the big data action. IBM, EMC, and NetApp have stepped up their messaging, in part to prevent startup upstarts like Cloudera from cornering the Apache Hadoop distribution market. They are all elbowing one another to get closest to “pure Apache” while still “adding value.” Numerous other startups have emerged, with greater or lesser reliance on, and extensions or substitutions for, the core Apache distribution. Yahoo! has found a funding partner and spun its team out, forming a new firm called Hortonworks, whose claim to fame begins with an impressive roster responsible for most of the code in the core Hadoop projects. Think of the Dr. Seuss children’s book featuring that famous elephant, and you’ll understand the name.

While we’re talking about kids – ever watch young kids play soccer? Everyone surrounds the ball. It takes them years to learn their positions on the field and play accordingly. There are emerging alphas, a few stragglers on the sidelines hoping for a chance to play, community participants – and a clear need for governance. Tech markets can be like that, and with 1600 attendees packing late June’s Hadoop Summit event, all of those scenarios were playing out: leaders, new entrants, and the big silents, like the absent Oracle and Microsoft.


Cloudera-Informatica Deal Opens Broader Horizons for Both

Cloudera’s continuing focus on the implications of explosive data growth has led it to another key partnership, this time with Informatica. Connecting to the dominant player in data integration and data quality expands the opportunity for Cloudera dramatically; it enables the de facto commercial Hadoop leader to find new ways to empower the “silent majority” of data. The majority of data is outside: not just outside enterprise data warehouses, but outside RDBMS instances entirely. Why? Because it doesn’t need all the management features database management software provides – it doesn’t get updated regularly, for example. In fact, it may not be used very often at all, though it does need to be persisted for a variety of reasons. I recently mentioned Cloudera’s success of late; it’s going to be challenged by some big players in 2011, notably IBM, whose recent focus on Hadoop has been remarkably nimble. So these deals matter. A lot. The Data Management function is being refactored before our eyes; both these vendors will play in its future.

Cloudera Convenes Colleagues to Crunch Content (Make Mine Membase)

Over the past two years, Cloudera has demonstrated the power of surrounding emerging open source software with support services, expertise and its own IP. The firm has racked up over 30 customers since its founding in late 2008, and emerged as the leading source of Apache Hadoop. Cloudera’s recent C round of financing brought its funding to $36 million, and it has been investing aggressively, with 45 employees, a very visible voice on the Big Data circuit and a stellar, experienced leadership team. It evangelizes through training, thought leadership, and increasingly through a growing sales and marketing team. Cloudera deserves a full post of its own; I hope to get to that before year-end.

One indicator of Cloudera’s precocity has been its prioritization of key alliances – higher than many firms its size – and that strategy is likely to have a big payoff if the partnerships are well executed and bring the marketplace momentum and the value they promise to fruition. Two key recent announcements involved Membase and Informatica. I’ll discuss the latter in another post – here I’ll talk about why the Membase deal makes so much sense.