What, Exactly, Is “Proprietary Hadoop”? Proposed: “distribution-specific.”

Many things have changed in the software industry in an era when the use of open source software has pervaded the mainstream IT shop. One of them is the significance – and descriptive adequacy – of the word “proprietary.” Merriam-Webster defines it as “something that is used, produced, or marketed under exclusive legal right of the inventor or maker.” In the Hadoop marketplace, it has come to be used – even by me, I must admit – to mean “not Apache, even though it’s open source.”

—more–

Hadoop Summit Recap Part Two – SELECT FROM hdfs WHERE bigdatavendor USING SQL

Probably the most widespread, and commercially imminent, theme at the Summit was “SQL on Hadoop.” Since last year, many offerings have been touted and debated, and some have even shipped. In this post, I offer a brief look at where things stood at the Summit and how we got there. To net it out: offerings today range from the not-even-submitted to GA – if you’re interested, a bit of familiarity will help. Even more useful: patience.

–more–

That Exciting New Stuff? Yeah… Wait Till It Ships.

A brief rant here: I am asked with great frequency how this RDBMS will hold off that big data play, how data warehouses will survive in a world where Hadoop exists, or whether Apple is done now that Android is doing well. There is a fundamental fallacy implicit in these questions.

–more–

Hadoop Summit Recap Part One – A Ripping YARN

I had the privilege of keynoting this year’s Hadoop Summit, so I may be a bit prejudiced when I say the event confirmed my assertion that we have arrived at a turning point in Hadoop’s maturation. The large number of attendees (2500, a big increase – and more “suits”) and sponsors (70, also a significant uptick) made it clear that the growth is continuing apace. Gartner’s data confirms this – my inquiry rate continues to grow, and my colleagues covering big data and Hadoop are all seeing steady growth too. But it’s not all sweetness and light. There are issues – and here we’ll look at the centerpiece of the technical messaging: YARN. Much is expected – and we seem to be doomed to wait a while longer.

— more — 

Open Source “Purity,” Hadoop, and Market Realities

I don’t often do a pure opinion piece, but I feel compelled to weigh in on a question I’ve been asked several times since EMC released its Pivotal HD recently. The question is whether it is somehow inappropriate, even “evil,” for EMC to enter the market without having “enough” committers to open source Apache projects. More broadly, it’s about whether other people can use, incorporate, add to and profit from Apache Hadoop.

–more–

Hadoop 2013 – Part Four: Players

The first three posts in this series talked about performance, projects, and platforms as key themes in what is beginning to feel like a watershed year for Hadoop. All three are reflected in the surprising emergence of a number of new players on the scene, as well as some new offerings from additional ones, which I’ll cover in another post. Intel, WANdisco, and Data Delivery Networks recently entered the distribution game, making it clear that capitalizing on potential differentiators (real or perceived) in a hot market is still a powerful magnet. And in a space where much of the IP in the stack is open source, why not go for it? These introductions could all fall into the performance theme as well – they are all driven by innovations intended to improve Hadoop speed.

– more — 

Hadoop 2013 – Part Three: Platforms

In the first two posts in this series, I talked about performance and projects as key themes in Hadoop’s watershed year. As it moves squarely into the mainstream, organizations making their first move to experiment will have to make a choice of platform. And – arguably for the first time in the early mainstreaming of an information technology wave – that choice is about more than who made the box where the software will run, and the spinning metal platters the bits will be stored on. There are three options, and choosing among them will have dramatically different implications for the budget, for the available capabilities, and for the fortunes of some vendors seeking to carve out a place in the IT landscape with their offerings.

– more –

Hadoop 2013 – Part Two: Projects

In Part One of this series, I pointed out that significant attention is being lavished on performance in 2013. In this installment, the topic is projects, which are proliferating precipitously. One of my most frequent client inquiries is “which of these pieces make Hadoop?” As recently as a year ago, the question was pretty simple for most people: MapReduce, HDFS, maybe Sqoop and even Flume, Hive, Pig, HBase, Lucene/Solr, Oozie, Zookeeper. When I published the Gartner piece How to Choose the Right Apache Hadoop Distribution, that was pretty much it.

–more–

Hadoop 2013 – Part One: Performance

It’s no surprise that we’ve been treated to many year-end lists and predictions for Hadoop (and everything else IT) in 2013. I’ve never been that much of a fan of those exercises, but I’ve been asked so much lately that I’ve succumbed. Herewith, the first of a series of posts on what I see as the 4 Ps of Hadoop in the year ahead: performance, projects, platforms and players.

– more –

Hadoop and DI – A Platform Is Not A Solution

“Hadoop people” and “RDBMS people” – including some DBAs who have contacted me recently – clearly have different ideas about what Data Integration is. And both may differ from what Ted Friedman and I were talking about in our Gartner research note Hadoop Is Not a Data Integration Solution, although I think the DBAs’ concept is far closer to ours.

- more -
