This post was jointly authored by Merv Adrian (@merv) and Nick Heudecker (@nheudecker) and appears on both of our Gartner blogs. In the early days of Hadoop (versions up through 1.x), the project consisted of two primary components: HDFS and MapReduce. One thing to store the data in an append-only file model, distributed across an… Continue reading “Hadoop is in the Mind of the Beholder”
Tag Archives: MapReduce
BYOH – Hadoop’s a Platform. Get Used To It.
When is a technology offering a platform? Arguably, when people build products assuming it will be there. Or extend their existing products to support it, or add versions designed to run on it. Hadoop is there. The age of Bring Your Own Hadoop (BYOH) is clearly upon us. Specific support for components such as Pig… Continue reading “BYOH – Hadoop’s a Platform. Get Used To It.”
What, Exactly, Is “Proprietary Hadoop”? Proposed: “distribution-specific.”
Many things have changed in the software industry in an era when the use of open source software has pervaded the mainstream IT shop. One of them is the significance – and descriptive adequacy – of the word “proprietary.” Merriam-Webster defines it as “something that is used, produced, or marketed under exclusive legal right of the… Continue reading “What, Exactly, Is ‘Proprietary Hadoop’? Proposed: ‘distribution-specific.’”
Hadoop Summit Recap Part Two – SELECT FROM hdfs WHERE bigdatavendor USING SQL
Probably the most widespread, and commercially imminent, theme at the Summit was “SQL on Hadoop.” Since last year, many offerings have been touted, debated, and some have even shipped. In this post, I offer a brief look at where things stood at the Summit and how we got there. To net it out: offerings today… Continue reading “Hadoop Summit Recap Part Two – SELECT FROM hdfs WHERE bigdatavendor USING SQL”
Hadoop Summit Recap Part One – A Ripping YARN
I had the privilege of keynoting this year’s Hadoop Summit, so I may be a bit prejudiced when I say the event confirmed my assertion that we have arrived at a turning point in Hadoop’s maturation. The large number of attendees (2500, a big increase – and more “suits”) and sponsors (70, also a significant uptick) made… Continue reading “Hadoop Summit Recap Part One – A Ripping YARN”
Hadoop 2013 – Part Four: Players
The first three posts in this series talked about performance, projects and platforms as key themes in what is beginning to feel like a watershed year for Hadoop. All three are reflected in the surprising emergence of a number of new players on the scene, as well as some new offerings from additional ones, which I’ll cover in another post. Intel, WANdisco,… Continue reading “Hadoop 2013 – Part Four: Players”
Hadoop 2013 – Part Three: Platforms
In the first two posts in this series, I talked about performance and projects as key themes in Hadoop’s watershed year. As it moves squarely into the mainstream, organizations making their first move to experiment will have to make a choice of platform. And – arguably for the first time in the early mainstreaming of an information technology wave – that… Continue reading “Hadoop 2013 – Part Three: Platforms”
Hadoop 2013 – Part One: Performance
It’s no surprise that we’ve been treated to many year-end lists and predictions for Hadoop (and everything else IT) in 2013. I’ve never been that much of a fan of those exercises, but I’ve been asked so much lately that I’ve succumbed. Herewith, the first of a series of posts on what I see as… Continue reading “Hadoop 2013 – Part One: Performance”
Stack Up Hadoop to Find Its Place in Your Architecture
2013 promises to be a banner year for Apache Hadoop, platform providers, related technologies – and analysts who try to sort it out. I’ve been wrestling with ways to make sense of it for Gartner clients bewildered by a new set of choices, and for them and myself, I’ve built a stack diagram that describes… Continue reading “Stack Up Hadoop to Find Its Place in Your Architecture”
Amazon Redshift Disrupts DW Economics – But Nothing Comes Without Costs
At its first re:Invent conference in late November, Amazon announced Redshift, a new managed service for data warehousing. Amazon also offered details and customer examples that made AWS’ steady inroads toward enterprise, mainstream application acceptance very visible. Redshift is made available via MPP nodes of 2TB (XL) or 16TB (8XL), running ParAccel’s high-performance columnar, compressed… Continue reading “Amazon Redshift Disrupts DW Economics – But Nothing Comes Without Costs”