Hadoop is in the Mind of the Beholder

This post was jointly authored by Merv Adrian (@merv) and Nick Heudecker (@nheudecker) and appears on both of our Gartner blogs.

In the early days of Hadoop (versions up through 1.x), the project consisted of two primary components: HDFS and MapReduce. One thing to store the data in an append-only file model, distributed across an arbitrarily large number of inexpensive nodes with disk and processing power; another to process it, in batch, with a relatively small number of available function calls. And some other stuff called Commons to handle bits of the plumbing. But early adopters demanded more functionality, so the Hadoop footprint grew. The result was an identity crisis that grows progressively more challenging for decisionmakers with almost every new announcement.

–more–

Data Security for Hadoop – Add-on Choices Proliferating

In my post about the BYOH market last October, I noted that increasing numbers of existing players are connecting their offerings to Apache Hadoop, even as upstarts enter their markets with a singular focus. And last month, I pointed out that Nick Heudecker and I detected a surprising lack of concern about security in a recent Hadoop webinar. Clearly, these two topics have an important intersection – both Hadoop specialists (including distribution vendors) and existing security vendors will need to expand their efforts to drive awareness if they are to capture an opportunity that is clearly going begging today. Security for big data will be a key issue in 2014 and beyond.

 

–more–

Aspirational Marketing and Enterprise Data Hubs

In the Hadoop community there is a great deal of talk of late about its positioning as an Enterprise Data Hub. My description of this is “aspirational marketing;” it addresses the ambition its advocates have for how Hadoop will be used, when it realizes the vision of capabilities currently in early development. There’s nothing wrong with this, but it does need to be kept in perspective. It’s a long way off.

–more–

AAA is Not Enough Security in the Big Data Era

Talk to security folks, especially network ones, and AAA will likely come up. It stands for authentication, authorization and accounting (sometimes audit). There are even protocols such as Radius (Remote Authentication Dial In User Service, much evolved from its first uses) and Diameter, its significantly expanded (and punnily named) newer cousin, implemented in commercial and open source versions, included in hardware for networks and storage. AAA is and will remain a key foundation of security in the big data era, but as a longtime information management person, I believe it’s time to acknowledge that it’s not enough, and we need a new A – anonymization.

More

BYOH – Hadoop’s a Platform. Get Used To It.

When is a technology offering a platform? Arguably, when people build products assuming it will be there. Or extend their existing products to support it, or add versions designed to run on it. Hadoop is there. The age of Bring Your Own Hadoop (BYOH) is clearly upon us.  Specific support for components such as Pig and Hive vary, as do capabilities and levels of partnership in development, integration and co-marketing. Some vendors are in many categories – for example, Pentaho and IBM at opposite ends of the size spectrum interact with Hadoop in development tools, data integration, BI, and other ways. A few category examples, by no means exhaustive:

—more—

Hadoop Summit Recap Part Two – SELECT FROM hdfs WHERE bigdatavendor USING SQL

Probably the most widespread, and commercially imminent, theme at the Summit was “SQL on Hadoop.” Since last year, many offerings have been touted, debated, and some have even shipped. In this post, I offer a brief look at where things stood at the Summit and how we got there. To net it out: offerings today range from the not-even-submitted to GA – if you’re interested, a bit of familiarity will help. Even more useful: patience.

–more–

That Exciting New Stuff? Yeah… Wait Till It Ships.

A brief rant here: I am asked with great frequency how this RDBMS will hold off that big data play, how data warehouses will survive in a world where Hadoop exists, or whether Apple is done now that Android is doing well. There is a fundamental fallacy implicit in these questions.

–more–

Hadoop Summit Recap Part One – A Ripping YARN

I had the privilege of keynoting this year’s Hadoop Summit, so I may be a bit prejudiced when I say the event confirmed my assertion that we have arrived at a turning point in Hadoop’s maturation. The large number of attendees (2500, a big increase – and more “suits”) and sponsors (70, also a significant uptick) made it clear that the growth is continuing apace. Gartner’s data confirms this – my inquiry rate continues to grow, and my colleagues covering big data and Hadoop are all seeing steady growth too. But it’s not all sweetness and light. There are issues – and here we’ll look at the centerpeice of the technical messaging: YARN. Much is expected – and we seem to be doomed to wait a while longer.

— more — 

Open Source “Purity,” Hadoop, and Market Realities

I don’t often do a pure opinion piece but I feel compelled to weigh in on a queston I’ve been asked several times since EMC released its Pivotal HD recently. The question is whether it is somehow inappropriate, even “evil,” for EMC to enter the market without having “enough” committers to open source Apache projects. More broadly, it’s about whether other people can use, incorporate, add to and profit from Apache Hadoop.

–more–

Hadoop 2013 – Part Four: Players

The first three posts in this series talked about performance projects and platforms as key themes in what is beginning to feel like a  watershed year for Hadoop. All three are reflected in the surprising emergence of a number of new players on the scene, as well as some new offerings from additional ones, which I’ll cover in another post. Intel, WANdisco, and Data Delivery Networks recently entered the distribution game, making it clear that capitalizing on potential differentiators (real or perceived)  in a hot market is still a powerful magnet. And in a space where much of the IP in the stack is open source, why not go for it? These introductions could all fall into the performance theme as well – they are all driven by innovations intended to improve Hadoop speed.

– more – 

Follow

Get every new post delivered to your Inbox.

Join 134 other followers