Hadoop Summit Recap Part One – A Ripping YARN

I had the privilege of keynoting this year’s Hadoop Summit, so I may be a bit prejudiced when I say the event confirmed my assertion that we have arrived at a turning point in Hadoop’s maturation. The large number of attendees (2500, a big increase – and more “suits”) and sponsors (70, also a significant uptick) made it clear that the growth is continuing apace. Gartner’s data confirms this – my inquiry rate continues to grow, and my colleagues covering big data and Hadoop are all seeing steady growth too. But it’s not all sweetness and light. There are issues – and here we’ll look at the centerpiece of the technical messaging: YARN. Much is expected – and we seem to be doomed to wait a while longer.

— more — 

Hadoop 2013 – Part Four: Players

The first three posts in this series talked about performance, projects and platforms as key themes in what is beginning to feel like a watershed year for Hadoop. All three are reflected in the surprising emergence of a number of new players on the scene, as well as some new offerings from additional ones, which I’ll cover in another post. Intel, WANdisco, and Data Delivery Networks recently entered the distribution game, making it clear that capitalizing on potential differentiators (real or perceived) in a hot market is still a powerful magnet. And in a space where much of the IP in the stack is open source, why not go for it? These introductions could all fall into the performance theme as well – they are all driven by innovations intended to improve Hadoop speed.

– more – 

Hadoop 2013 – Part Three: Platforms

In the first two posts in this series, I talked about performance and projects as key themes in Hadoop’s watershed year. As it moves squarely into the mainstream, organizations making their first move to experiment will have to make a choice of platform. And – arguably for the first time in the early mainstreaming of an information technology wave – that choice is about more than who made the box where the software will run, and the spinning metal platters the bits will be stored on. There are three options, and choosing among them will have dramatically different implications for the budget, for the available capabilities, and for the fortunes of some vendors seeking to carve out a place in the IT landscape with their offerings.

– more –

Hadoop 2013 – Part Two: Projects

In Part One of this series, I pointed out that significant attention is being lavished on performance in 2013. In this installment, the topic is projects, which are proliferating precipitously. One of my most frequent client inquiries is “which of these pieces make Hadoop?” As recently as a year ago, the question was pretty simple for most people: MapReduce, HDFS, maybe Sqoop and even Flume, Hive, Pig, HBase, Lucene/Solr, Oozie, Zookeeper. When I published the Gartner piece How to Choose the Right Apache Hadoop Distribution, that was pretty much it.

–more–

Hadoop 2013 – Part One: Performance

It’s no surprise that we’ve been treated to many year-end lists and predictions for Hadoop (and everything else IT) in 2013. I’ve never been that much of a fan of those exercises, but I’ve been asked so much lately that I’ve succumbed. Herewith, the first of a series of posts on what I see as the 4 Ps of Hadoop in the year ahead: performance, projects, platforms and players.

– more –

Diary of an Asian Swing: Day 4

Halfway across the world you go to breakfast and see a neighbor is in your hotel too. How often does it happen? Today I saw an SAP colleague I worked with two decades ago at Sybase – and his colleague, with whom I’ll meet while in Singapore. Great start to the day.

This day was all business. Met several Gartner clients to talk Big Data (since that was my billing.) Interest is high, and as with North American firms, one of the key questions, as always, is Value. “What are people doing? What is proving useful from a business perspective?”

Gartner’s local office is beautiful – two floors in a thriving business neighborhood in one of the world’s most vibrant cities. I was told per capita income here is the second highest in the world, and the way the city is kept continues to impress: clean, efficient, beautifully designed and planted with fabulous flora everywhere. Our people here are professional, motivated, friendly and prepared for all our meetings, making sure I know who we’re meeting with and why.

It was a busy, stimulating day capped with dinner with my colleague Arun Chandrasekaran in the Pan Pacific Hotel’s restaurant. Multiple serving stations with different cuisines: Indian, Cantonese, Japanese…. that marvelous Singaporean polyglot cuisine I love. And if the food was good, the conversation was even better. Arun and I talked about how his infrastructure research and my software focus converged in big data and what our next collaboration should be after the Hadoop pilots piece we’re nearing completion on now.

Closing the day with a little BBC World in my room, I watched the pre-election coverage, amused by the overloading of the “battleground states” metaphor when I switched to CNN. They even referred to reporters “embedded” there. Please. Thank goodness this overpriced, overheated exercise will soon be complete. And after all the sound and fury, I don’t expect much will have changed.

Diary of an Asian Swing: Day 2

(Written on iPhone) Typical Eurostyle breakfast. Learn at desk that I DO have a TV, but have to slide picture over to reveal it. Feel remarkably stupid.
A little work, then onto subway. Like everything else here, clean, efficient, modern and packed with people – most playing with their smartphones. I’m the only Anglo in my car. Signs in cars that illuminate stops, connections, which side door will open. Off toward Victoria Peak, near Sheung Wan station at end of line. But think better of it: Macau ferry is here. Change of plans and 10 minutes later I’m on an equally crowded ferry (but with a [lousy] reserved seat.) It’s very hazy and I can’t get to an open window or outside anyway, so no pictures.

(Back on computer) Lovely ride, but a disappointment when I got there. Entering Macau through the ferry terminal is like going to NY and arriving via the Port Authority Bus Terminal. You won’t form a good impression. There is no nice harborside park. The place seems designed to draw the gambling crowd – the elevated walkway I took after the 45 minute immigration process took me right onto a casino floor. (I could have taken a cab, or a courtesy bus to the Wynn or one of the other new splashy joints, but that’s not what I was there for.) After a few minutes determining there was nothing within reasonable walking distance I wanted to see, I turned around and went back to Hong Kong. Another nice boat ride, easy transfer back to the subway and the hotel. Not much of an adventure. But I enjoyed myself nonetheless.

Next up, a few hours of work – more Magic Quadrant stuff. Then a light dinner and an early night. Feeling a little jet lagged in that disconnected sort of way, although I’m not tired.

Guest Post: Leading the Logical Data Warehouse Charge Has its Challenges

From my colleague Mark Beyer, who speculates about how leadership in moving toward the logical data warehouse (LDW) will be received: 

The logical data warehouse is already creating a stir in the traditional data warehouse market space. Less than 5% of clients with implemented warehouses that we speak with are pursuing three or more of the six aspects of a logical warehouse: 

  • repositories
  • data virtualization
  • distributed processes
  • active auditing and optimization
  • service level negotiation
  • ontological and taxonomic metadata

That means we are in a very early stage regarding the adoption trend, and vendors who are aggressively moving toward it are ahead of their customers.

..more…

Apache Hadoop 1.0 Doesn’t Clear Up Trunks and Branches Questions. Do Distributions?

In early January 2012, the world of big data was treated to an interesting series of product releases, press announcements, and blog posts about Hadoop versions. To begin with, we had the announcement of Apache version 1.0 at long last, in a press release. Although there were grumblings here and there in the twittersphere that changes to release numbers are meaningless, my discussions with Gartner’s enterprise customers indicate otherwise. Products with release numbers like 0.20.2 make the hair on Procurement’s neck stand on end, and as Hadoop begins to get mainstream attention (Gartner’s clients, see Hype Cycle for Data Management 2011), IT architects and executives find such optics quite important. Hadoop is moving beyond pioneers like Amazon, Yahoo! and LinkedIn into shops like JP Morgan Chase, and they pay attention to such things.

…more…

Mark Beyer, Father of the Logical Data Warehouse, Guest Post

Another guest post, this time from my colleague and friend Mark Beyer.

My name is Mark Beyer, and I am the “father of the logical data warehouse”. So, what does that mean? First, like any father, if you are not willing to address your ancestry with full candor, you will lose your place in the universe and wither away without making a meaningful contribution. As an implementer in the field, I was a student and practitioner of both Inmon and Kimball. I learned as much or more from my clients and my colleagues during multiple implementations as I did from studying any methodology. My Gartner colleagues challenged my concepts and helped hammer them into a comprehensive and complete concept. Simply put, I was willing to consider DNA contributions from anyone and anywhere, but through a form of unnatural selection, persisted in choosing to include the good genes and actively removing the undesirable elements.

more…
