In the Hadoop community there is a great deal of talk of late about its positioning as an Enterprise Data Hub. My description of this is “aspirational marketing;” it addresses the ambition its advocates have for how Hadoop will be used, when it realizes the vision of capabilities currently in early development. There’s nothing wrong with this, but it does need to be kept in perspective. It’s a long way off.


Talk to security folks, especially network ones, and AAA will likely come up. It stands for authentication, authorization and accounting (sometimes audit). There are even protocols such as RADIUS (Remote Authentication Dial In User Service, much evolved from its first uses) and Diameter, its significantly expanded (and punnily named) newer cousin, implemented in commercial and open source versions, included in hardware for networks and storage. AAA is and will remain a key foundation of security in the big data era, but as a longtime information management person, I believe it’s time to acknowledge that it’s not enough, and we need a new A – anonymization.


When is a technology offering a platform? Arguably, when people build products assuming it will be there. Or extend their existing products to support it, or add versions designed to run on it. Hadoop is there. The age of Bring Your Own Hadoop (BYOH) is clearly upon us. Specific support for components such as Pig and Hive varies, as do capabilities and levels of partnership in development, integration and co-marketing. Some vendors are in many categories – for example, Pentaho and IBM, at opposite ends of the size spectrum, interact with Hadoop in development tools, data integration, BI, and other ways. A few category examples, by no means exhaustive:


It’s no surprise that we’ve been treated to many year-end lists and predictions for Hadoop (and everything else IT) in 2013. I’ve never been that much of a fan of those exercises, but I’ve been asked so much lately that I’ve succumbed. Herewith, the first of a series of posts on what I see as the 4 Ps of Hadoop in the year ahead: performance, projects, platforms and players.

I had an inquiry today from a client using packaged software for a business system that is built on a proprietary, non-relational datastore (in this case an object-oriented DBMS.) They have an older version of the product – having “failed” with a recent upgrade attempt.

The client contacted me to ask about ways to integrate this OODBMS-based system with others in their environment. They said the vendor-provided utilities were not very good and hard to use, and the vendor has not given them any confidence it will improve. The few staff programmers who have learned enough internals have already built a number of one-off connections using multiple methods, and were looking for a more generalizable way to create a layer for other systems to use when they need data from the underlying database. They expect more such requests, and foresee chaos, challenges hiring and retaining people with the right skills, and cycles of increasing cost and operational complexity.
My reply: “you’re absolutely right.”

At its first re:Invent conference in late November, Amazon announced Redshift, a new managed service for data warehousing. Amazon also offered details and customer examples that made AWS’ steady inroads toward enterprise, mainstream application acceptance very visible.

Redshift is made available via MPP nodes of 2TB (XL) or 16TB (8XL), running ParAccel’s high-performance columnar, compressed DBMS, scaling to 100 8XL nodes, or 1.6PB of compressed data. XL nodes have 2 virtual cores with 15GB of memory, while 8XL nodes have 16 virtual cores and 120GB of memory and operate on 10 Gigabit Ethernet.
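The capacity figures above are simple multiplication, and a quick sketch makes them easy to check. The node sizes and the 100-node limit are the ones quoted in the announcement; the function name and the decimal conversion (1PB = 1000TB) are my own assumptions for illustration, not anything from Amazon.

```python
# Node sizes (TB of compressed data) and the 8XL node limit, as quoted above.
NODE_TB = {"XL": 2, "8XL": 16}
MAX_8XL_NODES = 100


def cluster_capacity_tb(node_type: str, node_count: int) -> int:
    """Return compressed capacity in TB for a cluster of identical nodes.

    Hypothetical helper for illustration only; not part of any AWS API.
    """
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    return NODE_TB[node_type] * node_count


max_tb = cluster_capacity_tb("8XL", MAX_8XL_NODES)
# Assumes decimal units: 1 PB = 1000 TB.
print(max_tb, "TB =", max_tb / 1000, "PB")  # 1600 TB = 1.6 PB
```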

Reserved pricing (the more likely scenario, involving a commitment of 1 year or 3 years) is set at “under $1000 per TB per year” for a 3-year commitment, combining upfront and hourly charges. Continuous, automated backup for up to 100% of the provisioned storage is free. Amazon does not charge for data transfer into or out of the data clusters. Network connections, of course, are not free - see Doug Henschen’s InformationWeek story for details.
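To put the headline number in perspective, here is a rough ceiling calculation. It treats the quoted “under $1000 per TB per year” as an upper bound and assumes you pay for the full provisioned capacity; the actual split between upfront and hourly charges is not given here, so this is a sketch of the bound, not a price quote. The function name is hypothetical.

```python
# Quoted ceiling for a 3-year reserved commitment, in USD per TB per year.
# The real figure is described only as "under" this amount.
PRICE_CEILING_PER_TB_YEAR = 1000


def reserved_cost_ceiling(capacity_tb: int, years: int = 3) -> int:
    """Upper bound on reserved cost in USD (assumes full capacity is billed)."""
    return capacity_tb * PRICE_CEILING_PER_TB_YEAR * years


# One 16TB 8XL node over the 3-year term.
print(reserved_cost_ceiling(16))  # 48000
```

By that bound, a single 8XL node comes in below $48,000 over three years, before any network connection charges.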

This is a dramatic thrust in pricing, but it does not come without giving up some things.


This was a day of transition. No meetings in Hong Kong, so after a leisurely breakfast and a look at the news, I settled down for a rare session of uninterrupted writing. It was still Sunday back home, so the email was relatively caught up and I could focus. Finished first drafts of some Gartner Magic Quadrant DW DBMS content and sent them off to colleagues for review and assembly into our eventual document.

This MQ is my second, and I’m really enjoying the process this time now that I’m not trying to figure out what happens next. I’m especially pleased with the process of combining data from customer interviews and analysis of our inquiry traffic – hundreds for each of the four authors – with surveys we conducted specifically for the report.

Mark Beyer built a fantastic link for feeding survey criteria measured by numeric scores from customers directly into relevant cells on our underlying spreadsheet. We had already done some collective scoring of our own in those cells, and the new exercise showed us how customers read the same issues. And it moved some of the scores significantly, with some vendors doing better than we expected in some areas, and others getting hammered. When a sizable number of survey respondents highlight an issue like support as a serious weakness, one has to take notice.

Several hours of uninterrupted time, a luxury that made the work move quickly, gave way to a decision about what to do with a free afternoon. I decided to use it for more work, so instead of an excursion I headed to the airport hours ahead of schedule to work in the attractive Cathay Pacific lounge. But I was surprised by a helpful check-in agent who told me there was an earlier flight I could get onto. As a result, I arrived in spectacular Singapore late in the evening instead of well into the night, and was in my hotel for a good night’s rest before early morning meetings the next day.

And of course, working on the plane – even without wi-fi – was just as good as working in the lounge. So I had the chance to complete a new draft of a forthcoming Hadoop Pilot Best Practices piece and send it off to a collaborator. A good day indeed.

I’ve never been a diarist. But as an exercise, I’m going to document this trip: two weeks on the road to Asia and Australia. Almost all work, though there is one day of weekend and recovery time built in.

Friday, Nov 2. Cathay Pacific flight to Hong Kong. Business class. Comfortable, well-appointed cabin. Friendly, courteous staff.

Learned system, table, storage, seat. NO WIFI! OMG. 14 unconnected hours – that’s what stimulated the idea to blog. Otherwise I would have been tweeting a lot. Which is fun, but this will be a change. Let’s start with some entertainment and settle in….

Video presentation of Sir Paul McCartney’s Kisses on the Bottom. Diana Krall. John Pizzarelli. Joe Walsh! It’s delightful – commentary, moments of studio play, tenderness. No real edge, but that’s not what it’s about. Luscious melody, beautiful jazz harmonies. Diana is a sideman here and a brilliant one. McCartney is who he is – always a touch too smooth and a touch too sweet, but it works so well here. Wonderful selection of songs and I love his own Valentine one.

Next up is a video piece about the Who’s Quadrophenia. Clearly part of Townshend’s current campaign to remind the world of his brilliant work – and well done. A good narrative sprinkled with revelatory bits about Moon, and Daltrey, and the terrible chaos they endured. Live performance footage and some nice stuff at the control board with Pete and the engineers highlighting dimensions of the music.

Now, it’s time to get to work – pick some music for background, get the computer out. Found Xuefei Yang – guitarist, playing Bach. Lovely tone, tempos, clarity. Don’t know her. Will fix that. And now to work – Magic Quadrant writing.

Wait. First a survey by Cathay Pacific. Why not?

A few Magic Quadrant hours reviewing interviews, surveys, briefing content, and our own scoring and it’s on to writing up a draft for one of the vendors and sending it off for my colleagues to comment and collaborate on the content.

Now, a break. I’ve earned one. Stretch the legs, a few exercises…

Treated to a reboot of the airplane’s computer system. Red Hat, it turns out. A parade of system level messages marched across the screen, unfathomable to anyone unfamiliar with Linux. Then darkness, and eventually, a blessed progress bar, followed by the return of the flight map. We’ve passed over Siberia and the Gulf of Shelekhova; now we’re over the Sea of Okhotsk, only 6 and a half hours to go.

A moment to update this diary and a little reading – music choice Wynton Marsalis & Eric Clapton. Really? Didn’t know about this one. Would love to look it up on the net, but NO WIFI. Oh well, a little Time magazine, and then some Hadoop Operations by Eric Sammer.

Eric’s book is highly recommended – lucid, well written and doesn’t demand extraordinary technical depth to understand. Find it here: 

Arrived on time. Hong Kong continues to impress: efficient, clean, modern as I remembered. Nice hotel. Good wifi. Sent off MQ writeups to colleagues and synchronized all the email I did while offline. Time to crash. Day One is done.

In the months since IBM closed its Netezza acquisition, the data warehouse appliance pioneer has been busy, if the announcements at this week’s Enzee are any indication. An enthusiastic crowd – 1000 strong – heard CEO Jim Baum deliver the news: new hardware, software and partnerships. The biggest news was The Appliance Formerly Known As Cruiser, now known as the Netezza High Capacity Appliance (HCA). A wag made up some t-shirts bearing the acronym TAFKAC and did quite well. IBM is aiming to push the size perception for Netezza higher. How high? Half a PB in a rack. You can scale it to 10PB.


It was hard to decide where to look first in Las Vegas this year at IBM’s flagship information management event. Coming as it did on the heels of a massive, sprawling Oracle Open World, it was also overwhelming, but distinguished itself immediately by its focus. Whereas Oracle has smashed together hardware systems, apps, middleware, java and development, systems management and database into a bewildering multi-site show, IBM continues to run separate events for Websphere, Rational, Tivoli, and Lotus. No single IBM event trumpets “we’re the biggest,” and they don’t take over the towns they’re in; the content seems a bit more manageable. And as an attendee who hopes to get a broad view, I’m happy with that. However, as I’ll discuss below, Oracle is winning the messaging war nonetheless.

There was indeed talk of systems at IoD this year, as Smart Analytics Systems got a refresh and some added units on x-based platforms. Flash memory additions to the x-based 5600, bundling InfoSphere and Cognos along with an updated Linux release, provide the basis for a good story along with more cores, memory and storage. A similar story is possible for the POWER-based 7700, which also added the new Blue Darter solid state disk (SSD.) And the z audience gets the 9600, with its sidecar, the transparent offload to the Smart Analytics Optimizer. Yes, IBM has a column-based database, with innovative storage tweaks and an optimizer that knows when to use it and when not to. Great promise there.

So what’s wrong with this picture? Try this: ask 10 IT people what Exadata is, and what Smart Analytics Systems are. Ask them who makes the offerings, and what they do. Go ahead…I’ll wait….

Back? OK. Here’s what I learned, after doing that experiment at 3 events attended by IT people (data people, in fact.) 8 of 10 I asked knew Oracle makes Exadata and it’s a wicked fast platform for data. 4 of 10 knew who makes the other one, and fewer knew why. On visibility and buzz, game Oracle.

There is much more to talk about, and visibility and buzz are not everything. IBM’s numbers continue to be good, and nobody in Armonk is complaining. But the IBM Software brand needs to get more attention, more investment, and a tighter, more focused story. The good news? Conversations I’ve been having suggest that it will in 2011, and it’s about time.


