Hadoop is in the Mind of the Beholder

This post was jointly authored by Merv Adrian (@merv) and Nick Heudecker (@nheudecker) and appears on both of our Gartner blogs.

In the early days of Hadoop (versions up through 1.x), the project consisted of two primary components: HDFS and MapReduce. One stored the data in an append-only file model, distributed across an arbitrarily large number of inexpensive nodes with disk and processing power; the other processed it, in batch, with a relatively small number of available function calls. A set of shared utilities called Hadoop Common handled bits of the plumbing. But early adopters demanded more functionality, so the Hadoop footprint grew. The result is an identity crisis that grows progressively more challenging for decision makers with almost every new announcement.
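To make the "small number of function calls" concrete, here is a toy, single-process Python sketch of the map/shuffle/reduce pattern that MapReduce batch jobs follow, using the classic word count. It is an illustration of the model only, under the assumption that one machine and an in-memory list stand in for HDFS and a cluster; it is not Hadoop's Java API, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator

def map_phase(record: str) -> Iterator[tuple[str, int]]:
    # Hypothetical mapper: emit (word, 1) for every word in one input line.
    for word in record.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group every emitted value by key, as the framework does between phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Hypothetical reducer: collapse all values for one key into a total.
    return key, sum(values)

def run_job(lines: Iterable[str]) -> dict[str, int]:
    # One batch job: map every record, shuffle by key, then reduce each group.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "The fox"]
    print(run_job(sample))
```

Everything beyond this pattern (interactive SQL, streaming, search) is what the later projects added, which is where the identity question begins.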


Aspirational Marketing and Enterprise Data Hubs

In the Hadoop community there has been a great deal of talk lately about positioning Hadoop as an Enterprise Data Hub. My description of this is “aspirational marketing”: it expresses the ambition its advocates have for how Hadoop will be used once it delivers capabilities that are still in early development. There’s nothing wrong with this, but it does need to be kept in perspective. It’s a long way off.


AAA is Not Enough Security in the Big Data Era

Talk to security folks, especially network ones, and AAA will likely come up. It stands for authentication, authorization and accounting (sometimes audit). There are even protocols such as RADIUS (Remote Authentication Dial-In User Service, much evolved from its first uses) and Diameter, its significantly expanded (and punnily named) newer cousin; both are implemented in commercial and open source versions and built into network and storage hardware. AAA is and will remain a key foundation of security in the big data era, but as a longtime information management person, I believe it’s time to acknowledge that it’s not enough, and that we need a new A: anonymization.


Hadoop 2013 – Part Four: Players

The first three posts in this series talked about performance, projects and platforms as key themes in what is beginning to feel like a watershed year for Hadoop. All three are reflected in the surprising emergence of a number of new players on the scene, as well as some new offerings from additional ones, which I’ll cover in another post. Intel, WANdisco, and Data Delivery Networks recently entered the distribution game, making it clear that capitalizing on potential differentiators (real or perceived) in a hot market is still a powerful magnet. And in a space where much of the IP in the stack is open source, why not go for it? These introductions could all fall into the performance theme as well, since they are all driven by innovations intended to improve Hadoop speed.


Hadoop 2013 – Part Three: Platforms

In the first two posts in this series, I talked about performance and projects as key themes in Hadoop’s watershed year. As it moves squarely into the mainstream, organizations making their first move to experiment will have to choose a platform. And, arguably for the first time in the early mainstreaming of an information technology wave, that choice is about more than who made the box where the software will run and the spinning metal platters the bits will be stored on. There are three options, and choosing among them will have dramatically different implications for the budget, for the available capabilities, and for the fortunes of some vendors seeking to carve out a place in the IT landscape with their offerings.


Hadoop and DI – A Platform Is Not A Solution

“Hadoop people” and “RDBMS people”, including some DBAs who have contacted me recently, clearly have different ideas about what Data Integration is. And both may differ from what Ted Friedman and I were talking about in our Gartner research note Hadoop Is Not a Data Integration Solution, although I think the DBAs’ concept is far closer to ours.


Stack Up Hadoop to Find Its Place in Your Architecture

2013 promises to be a banner year for Apache Hadoop, platform providers, related technologies – and analysts who try to sort it out. I’ve been wrestling with ways to make sense of it for Gartner clients bewildered by a new set of choices, and for them and myself, I’ve built a stack diagram that describes the functional layers of a Hadoop-based model.


2013 Data Resolution: Avoid Architectural Cul-de-Sacs

I had an inquiry today from a client using packaged software for a business system that is built on a proprietary, non-relational datastore (in this case, an object-oriented DBMS). They are running an older version of the product, having “failed” in a recent upgrade attempt.

The client contacted me to ask about ways to integrate this OODBMS-based system with others in their environment. They said the vendor-provided utilities were poor and hard to use, and the vendor has not given them any confidence that this will improve. The few staff programmers who have learned enough about the internals have already built a number of one-off connections using multiple methods, and they are looking for a more generalizable way to create a layer that other systems can use when they need data from the underlying database. They expect more such requests, and foresee chaos, challenges hiring and retaining people with the right skills, and cycles of increasing cost and operational complexity.

My reply: “You’re absolutely right.”

Diary of an Asian Swing: Day 3

This was a day of transition. No meetings in Hong Kong, so after a leisurely breakfast and a look at the news, I settled down for a rare session of uninterrupted writing. It was still Sunday back home, so my email was relatively caught up and I could focus. Finished first drafts of some Gartner Magic Quadrant DW DBMS content and sent them off to colleagues for review and assembly into our eventual document.

This MQ is my second, and I’m really enjoying the process this time, now that I’m not trying to figure out what happens next. I’m especially pleased with how we combined data from customer interviews and analysis of our inquiry traffic (hundreds of inquiries for each of the four authors) with surveys we conducted specifically for the report.

Mark Beyer built a fantastic link that feeds customers’ numeric scores on the survey criteria directly into the relevant cells of our underlying spreadsheet. We had already done some collective scoring of our own in those cells, and the new exercise showed us how customers read the same issues. It moved some of the scores significantly, with some vendors doing better than we expected in some areas, and others getting hammered. When a sizable number of survey respondents highlight an issue like support as a serious weakness, one has to take notice.

Several hours of uninterrupted time, a luxury that made the work move quickly, gave way to a decision about what to do with a free afternoon. I decided to use it for more work, so instead of an excursion I headed to the airport hours ahead of schedule to work in the attractive Cathay Pacific lounge. But I was surprised by a helpful check-in agent who told me there was an earlier flight I could get onto. As a result, I arrived in spectacular Singapore late in the evening instead of well into the night, and was in my hotel for a good night’s rest before early morning meetings the next day.

And of course, working on the plane, even without wi-fi, was just as good as working in the lounge. So I had the chance to complete a new draft of a forthcoming Hadoop Pilot Best Practices piece and send it off to a collaborator. A good day indeed.

Diary of an Asian Swing: Day 1

I’ve never been a diarist. But as an exercise, I’m going to document this trip: two weeks on the road to Asia and Australia. Almost all work, though there is one day of weekend and recovery time built in.

Friday, Nov 2. Cathay Pacific flight to Hong Kong. Business class. Comfortable, well-appointed cabin. Friendly, courteous staff.

Learned system, table, storage, seat. NO WIFI! OMG. 14 unconnected hours – that’s what stimulated the idea to blog. Otherwise I would have been tweeting a lot. Which is fun, but this will be a change. Let’s start with some entertainment and settle in….

Video presentation of Sir Paul McCartney’s Kisses on the Bottom. Diana Krall. John Pizzarelli. Joe Walsh! It’s delightful: commentary, moments of studio play, tenderness. No real edge, but that’s not what it’s about. Luscious melody, beautiful jazz harmonies. Diana is a sideman here, and a brilliant one. McCartney is who he is, always a touch too smooth and a touch too sweet, but it works so well here. Wonderful selection of songs, and I love his own “My Valentine.”

Next up is a video piece about the Who’s Quadrophenia. Clearly part of Townshend’s current campaign to remind the world of his brilliant work – and well done. A good narrative sprinkled with revelatory bits about Moon, and Daltrey, and the terrible chaos they endured. Live performance footage and some nice stuff at the control board with Pete and the engineers highlighting dimensions of the music.

Now, it’s time to get to work – pick some music for background, get the computer out. Found Xuefei Yang – guitarist, playing Bach. Lovely tone, tempos, clarity. Don’t know her. Will fix that. And now to work – Magic Quadrant writing.

Wait. First a survey by Cathay Pacific. Why not?

A few Magic Quadrant hours reviewing interviews, surveys, briefing content, and our own scoring, and it’s on to writing up a draft for one of the vendors and sending it off for my colleagues to comment and collaborate on.

Now, a break. I’ve earned one. Stretch the legs, a few exercises…

Treated to a reboot of the airplane’s computer system. Red Hat, it turns out. A parade of system-level messages marched across the screen, unfathomable to anyone unfamiliar with Linux. Then darkness, and eventually a blessed progress bar, followed by the return of the flight map. We’ve passed over Siberia and the Shelikhov Gulf; now we’re over the Sea of Okhotsk, with only six and a half hours to go.

A moment to update this diary and a little reading – music choice Wynton Marsalis & Eric Clapton. Really? Didn’t know about this one. Would love to look it up on the net, but NO WIFI. Oh well, a little Time magazine, and then some Hadoop Operations by Eric Sammer.

Eric’s book is highly recommended – lucid, well written and doesn’t demand extraordinary technical depth to understand. Find it here: http://www.amazon.com/Hadoop-Operations-Eric-Sammer/dp/1449327052/ref=sr_1_1?s=books&ie=UTF8&qid=1351944059&sr=1-1&keywords=hadoop+operations 

Arrived on time. Hong Kong continues to impress: efficient, clean, modern as I remembered. Nice hotel. Good wifi. Sent off MQ writeups to colleagues and synchronized all the email I did while offline. Time to crash. Day One is done.

