Hadoop 2013 – Part Four: Players

The first three posts in this series talked about performance, projects and platforms as key themes in what is beginning to feel like a watershed year for Hadoop. All three are reflected in the surprising emergence of a number of new players on the scene, as well as some new offerings from other players, which I’ll cover in another post. Intel, WANdisco, and Data Delivery Networks recently entered the distribution game, making it clear that capitalizing on potential differentiators (real or perceived) in a hot market is still a powerful magnet. And in a space where much of the IP in the stack is open source, why not go for it? These introductions could all fall under the performance theme as well: they are all driven by innovations intended to improve Hadoop’s speed.

Hadoop 2013 – Part Three: Platforms

In the first two posts in this series, I talked about performance and projects as key themes in Hadoop’s watershed year. As Hadoop moves squarely into the mainstream, organizations making their first move to experiment will have to choose a platform. And, arguably for the first time in the early mainstreaming of an information technology wave, that choice is about more than who made the box where the software will run and the spinning metal platters the bits will be stored on. There are three options, and choosing among them will have dramatically different implications for the budget, for the available capabilities, and for the fortunes of some vendors seeking to carve out a place in the IT landscape with their offerings.

Hadoop 2013 – Part One: Performance

It’s no surprise that we’ve been treated to many year-end lists and predictions for Hadoop (and everything else IT) in 2013. I’ve never been much of a fan of those exercises, but I’ve been asked so often lately that I’ve succumbed. Herewith, the first of a series of posts on what I see as the 4 Ps of Hadoop in the year ahead: performance, projects, platforms and players.

Stack Up Hadoop to Find Its Place in Your Architecture

2013 promises to be a banner year for Apache Hadoop, platform providers, related technologies, and the analysts who try to sort it all out. I’ve been wrestling with ways to make sense of it for Gartner clients bewildered by a new set of choices. For them, and for myself, I’ve built a stack diagram that describes the functional layers of a Hadoop-based model.

Amazon Redshift Disrupts DW Economics – But Nothing Comes Without Costs

At its first re:Invent conference in late November, Amazon announced Redshift, a new managed service for data warehousing. Amazon also offered details and customer examples that made AWS’ steady progress toward enterprise, mainstream application acceptance very visible.

Redshift is made available via MPP nodes of 2TB (XL) or 16TB (8XL), running ParAccel’s high-performance columnar, compressed DBMS, and scales to 100 8XL nodes, or 1.6PB of compressed data. XL nodes have 2 virtual cores and 15GB of memory, while 8XL nodes have 16 virtual cores and 120GB of memory and operate on 10 Gigabit Ethernet.

Reserved pricing (the more likely scenario, involving a commitment of 1 or 3 years) is set at “under $1000 per TB per year” for a 3-year commitment, combining upfront and hourly charges. Continuous, automated backup for up to 100% of the provisioned storage is free. Amazon does not charge for data transfer into or out of the data clusters. Network connections, of course, are not free; see Doug Henschen’s InformationWeek story for details.

This is a dramatic thrust in pricing, but it does not come without giving up some things.
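
The capacity and price figures above lend themselves to a quick back-of-envelope check. The sketch below is a minimal, illustrative Python calculation, not Amazon’s pricing tool: it uses only the node capacities, the 100-node ceiling for 8XL clusters, and the “under $1000 per TB per year” 3-year reserved figure quoted above, treats that $1000 as an upper bound, uses decimal units (1PB = 1000TB, consistent with 100 x 16TB = 1.6PB), and ignores network connection charges. The helper names are hypothetical.

```python
# Back-of-envelope Redshift sizing and cost ceiling (illustrative only).
# Assumptions (hypothetical helper names; figures taken from the post):
#   - XL node = 2 TB, 8XL node = 16 TB of compressed storage
#   - 8XL clusters scale to at most 100 nodes
#   - 3-year reserved pricing is "under $1000 per TB per year", used here
#     as an upper bound; network connection charges are NOT included
NODE_CAPACITY_TB = {"XL": 2, "8XL": 16}
MAX_8XL_NODES = 100
MAX_PRICE_PER_TB_YEAR = 1000  # USD, upper bound


def cluster_capacity_tb(node_type: str, nodes: int) -> int:
    """Total compressed capacity in TB for a cluster of a single node type."""
    if node_type == "8XL" and nodes > MAX_8XL_NODES:
        raise ValueError("8XL clusters are quoted as scaling to 100 nodes")
    return NODE_CAPACITY_TB[node_type] * nodes


def annual_cost_ceiling(capacity_tb: int) -> int:
    """Upper bound on yearly cost (USD) at the quoted reserved rate."""
    return capacity_tb * MAX_PRICE_PER_TB_YEAR


if __name__ == "__main__":
    tb = cluster_capacity_tb("8XL", MAX_8XL_NODES)
    print(f"{tb} TB = {tb / 1000} PB")                     # 1600 TB = 1.6 PB
    print(f"under ${annual_cost_ceiling(tb):,} per year")  # under $1,600,000 per year
```

Even at the largest quoted configuration, the ceiling works out to well under $2 million a year for 1.6PB of compressed data, which is what makes the pricing so disruptive, before accounting for what is given up.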

Diary of an Asian Swing: Day 3

This was a day of transition. No meetings in Hong Kong, so after a leisurely breakfast and a look at the news, I settled down for a rare session of uninterrupted writing. It was still Sunday back home, so the email was relatively caught up and I could focus. Finished first drafts of some Gartner Magic Quadrant DW DBMS content and sent them off to colleagues for review and assembly into our eventual document.

This MQ is my second, and I’m really enjoying the process this time, now that I’m not trying to figure out what happens next. I’m especially pleased with how we combined data from customer interviews and analysis of our inquiry traffic (hundreds of inquiries for each of the four authors) with surveys we conducted specifically for the report.

Mark Beyer built a fantastic link for feeding survey criteria measured by numeric scores from customers directly into relevant cells on our underlying spreadsheet. We had already done some collective scoring of our own in those cells, and the new exercise showed us how customers read the same issues. And it moved some of the scores significantly, with some vendors doing better than we expected in some areas, and others getting hammered. When a sizable number of survey respondents highlight an issue like support as a serious weakness, one has to take notice.

Several hours of uninterrupted time, a luxury that made the work move quickly, gave way to a decision about what to do with a free afternoon. I decided to use it for more work, so instead of an excursion I headed to the airport hours ahead of schedule to work in the attractive Cathay Pacific lounge. But I was surprised by a helpful check-in agent who told me there was an earlier flight I could get onto. As a result, I arrived in spectacular Singapore late in the evening instead of well into the night, and was in my hotel for a good night’s rest before early morning meetings the next day.

And of course, working on the plane, even without Wi-Fi, was just as good as working in the lounge. So I had the chance to complete a new draft of a forthcoming Hadoop Pilot Best Practices piece and send it off to a collaborator. A good day indeed.

Apache Hadoop 1.0 Doesn’t Clear Up Trunks and Branches Questions. Do Distributions?

In early January 2012, the world of big data was treated to an interesting series of product releases, press announcements, and blog posts about Hadoop versions. To begin with, we had the announcement of Apache Hadoop version 1.0 at long last, in a press release. Although there were grumblings here and there in the twittersphere that changes to release numbers are meaningless, my discussions with Gartner’s enterprise customers indicate otherwise. Products with release numbers like 0.20.2 make the hair on Procurement’s neck stand on end, and as Hadoop begins to get mainstream attention (Gartner clients: see Hype Cycle for Data Management, 2011), IT architects and executives find such optics quite important. Hadoop is moving beyond pioneers like Amazon, Yahoo! and LinkedIn into shops like JP Morgan Chase, and they pay attention to such things.

Hadoop Distributions And Kids’ Soccer

The big players are moving in for a piece of the big data action. IBM, EMC, and NetApp have stepped up their messaging, in part to prevent startup upstarts like Cloudera from cornering the Apache Hadoop distribution market. They are all elbowing one another to get closest to “pure Apache” while still “adding value.” Numerous other startups have emerged, with greater or lesser reliance on, and extensions or substitutions for, the core Apache distribution. Yahoo! has found a funding partner and spun its team out, forming a new firm called Hortonworks, whose claim to fame begins with an impressive roster of engineers responsible for most of the code in the core Hadoop projects. Think of the Dr. Seuss children’s book featuring that famous elephant, and you’ll understand the name.

While we’re talking about kids: ever watch young kids play soccer? Everyone surrounds the ball. It takes years for them to learn their positions on the field and play accordingly. There are emerging alphas, a few stragglers on the sidelines hoping for a chance to play, community participants, and a clear need for governance. Tech markets can be like that, and with 1600 attendees packing late June’s Hadoop Summit, all of those scenarios were playing out: leaders, new entrants, and the big silent ones, like the absent Oracle and Microsoft.

IBM Fills Out Netezza Lineup With High Capacity Appliance

In the months since IBM closed its Netezza acquisition, the data warehouse appliance pioneer has been busy, if the announcements at this week’s Enzee are any indication. An enthusiastic crowd, 1000 strong, heard CEO Jim Baum deliver the news: new hardware, software and partnerships. The biggest news was The Appliance Formerly Known As Cruiser, now known as the Netezza High Capacity Appliance (HCA). A wag made up some t-shirts bearing the acronym TAFKAC and did quite well. IBM is aiming to push perceptions of Netezza’s scale higher. How high? Half a PB in a rack, scalable to 10PB.

Cloudera-Informatica Deal Opens Broader Horizons for Both

Cloudera’s continuing focus on the implications of explosive data growth has led it to another key partnership, this time with Informatica. Connecting to the dominant player in data integration and data quality expands the opportunity for Cloudera dramatically; it enables the de facto commercial Hadoop leader to find new ways to empower the “silent majority” of data. Most data lives outside: not just outside enterprise data warehouses, but outside RDBMS instances entirely. Why? Because it doesn’t need all the management features database management software provides; it doesn’t get updated regularly, for example. In fact, it may not be used very often at all, though it does need to be persisted for a variety of reasons. I recently mentioned Cloudera’s success; it’s going to be challenged by some big players in 2011, notably IBM, whose recent focus on Hadoop has been remarkably nimble. So these deals matter. A lot. The data management function is being refactored before our eyes; both these vendors will play in its future.
