Hadoop 2013 – Part One: Performance

It’s no surprise that we’ve been treated to many year-end lists and predictions for Hadoop (and everything else IT) in 2013. I’ve never been much of a fan of those exercises, but I’ve been asked so often lately that I’ve succumbed. Herewith, the first of a series of posts on what I see as the 4 Ps of Hadoop in the year ahead: performance, projects, platforms and players.


Cloudera-Informatica Deal Opens Broader Horizons for Both

Cloudera’s continuing focus on the implications of explosive data growth has led it to another key partnership, this time with Informatica. Connecting to the dominant player in data integration and data quality expands the opportunity for Cloudera dramatically; it enables the de facto commercial Hadoop leader to find new ways to empower the “silent majority” of data. The majority of data is outside; not just outside enterprise data warehouses, but outside RDBMS instances entirely. Why? Because it doesn’t need all the management features database management software provides – it doesn’t get updated regularly, for example. In fact, it may not be used very often at all, though it does need to be persisted for a variety of reasons. I recently mentioned Cloudera’s success of late; it’s going to be challenged by some big players in 2011, notably IBM, whose recent focus on Hadoop has been remarkably nimble. So these deals matter. A lot. The Data Management function is being refactored before our eyes; both these vendors will play in its future.

Cloudera Convenes Colleagues to Crunch Content (Make Mine Membase)

Over the past two years, Cloudera has demonstrated the power of surrounding emerging open source software with support services, expertise and its own IP. The firm has racked up over 30 customers since its founding in late 2008, and emerged as the leading source of Apache Hadoop. Cloudera’s recent C round of financing brought its funding to $36 million, and it has been investing aggressively, with 45 employees, a very visible voice on the Big Data circuit and a stellar, experienced leadership team. It evangelizes through training, thought leadership, and increasingly through a growing sales and marketing team. Cloudera deserves a full post of its own; I hope to get to that before year-end.

One indicator of Cloudera’s precocity has been its prioritization of key alliances – higher than many firms its size – and that strategy is likely to have a big payoff if the partnerships are well executed and bring the marketplace momentum and the value they promise to fruition. Two key recent announcements involved Membase and Informatica. I’ll discuss the latter in another post – here I’ll talk about why the Membase deal makes so much sense.

EMC Jumps Into ADBMS Appliance Game

The Data Computing Appliance, the first deliverable from EMC’s acquisition of Greenplum, was announced last month, only 75 days after the acquisition closed, and it doesn’t lack for ambition. Pat Gelsinger, President and Chief Operating Officer, EMC Information Infrastructure, pointed to the high-level opportunity: unlocking the “hidden value” of the enormous and growing data assets every company is increasingly holding, and often failing to leverage. The appliance will reach many hitherto untapped resources in the data centers that EMC occupies. Adding EMC’s manufacturing, sales and marketing, and reference architectures to the Greenplum IP brings what Gelsinger calls Greenplum’s “first phase” to its completion. And begins what is likely to be a sizable battle with Oracle, Teradata and IBM, if EMC mounts campaigns and spending to match its ambitious vision.

EMC Buys Greenplum – Big Data Realignment Continues

EMC’s acquisition of Greenplum, announced today as a cash transaction, reaffirms the obvious: the Big Data tsunami upends conventional wisdom. It has already reshaped the market, spawning the most ferment in the RDBMS (and non-R DBMS via the noSQL players) space in years. When I first posted on Greenplum over a year ago, I said that

“Open source + capital has created an intriguing new model of rapid innovation in “mature” markets, and the database space – like BI – is not a done deal. It is indeed possible to escape the gravity well, if you execute. Greenplum is getting it done, and is among the new stars to watch.”

Why the open source reference? Greenplum uses a parallelization layer atop PostgreSQL (like Aster, another of the new breed of ADBMS.)

Now EMC has written the next chapter in that story. In the process, it adds a new piece (after literally dozens of others in the past few years) to its own portfolio, which already includes unstructured data (via Documentum) and virtualization (via VMware), layered in among the industry-leading storage and information management pieces. Disruptive? You bet. Is EMC finished? I doubt it. Candidates? BI tools, ETL, MDM, data integration come to mind. Losers? At least one big one. Read on.

Oracle Exadata: Early Signs Promising

Exadata is looking good. In the past few months, I’ve had the chance to talk to several early adopters of Oracle Exadata V2, some in connection with a sponsored white paper Oracle has just published. It’s still early, but I see this product as a milestone, regardless of its commercial success. That is still to be determined, although I wouldn’t bet against it. How it will be affected by Oracle’s execution of the Sun acquisition is another open question, and the recent surprise layoffs, which showed that either the announced expectations were laughably off base or Ellison’s early announcements about hiring plans were less than candid, don’t bode too well for the near term. Rob Enderle made some strong and provocative points in his guest post here.

VoltDB – DIY OLTP. Open Source. Win.

In a seemingly perfect marriage of product and target market, database pioneer Mike Stonebraker’s new in-memory database company VoltDB has emerged from stealth mode using the open source model, soon to be open core. Its first release, the GPL-licensed Community Edition, will appeal to developers who need blindingly fast transaction processing and are willing to do a lot of work themselves to get there – the do-it-yourself (DIY) database. Who better than the Gluecon community? Gluecon was the perfect place to do the formal rollout, filled as it is with hands-on folks looking to work with NoSQL products (like Cassandra, CouchDB, MongoDB, Riak, Voldemort, etc.)


IBM Gets Feisty — Mobilizes Analytics for Oracle Battle

In July 2009, IBM announced the Smart Analytics System 7600, a workload-optimized, pre-integrated bundle of hardware and software targeted at the business analytics market. Included in that package are an IBM POWER 550 running AIX, storage, plus InfoSphere Warehouse Enterprise Edition (which consists of DB2, Warehouse design and management tools + Cubing, Data Mining and Text Analytics services), and Cognos 8 Business Intelligence, configured and tuned, and “health check” features. Accommodations are made if the customer already has licensed some of the software and wants to use it on the platform; in this sense, the software is described as “optional.” This month, IBM broadened the story and upped the ante, making Smart Analytics System a key weapon in its widening battle with Oracle.

This post is a slightly updated version of a piece that appeared in the PUND-IT newsletter.

Sybase Rolls On – Make Some Noise!

Sybase has announced yet another record revenue result for the third quarter of 2009. Like other leading data management firms, its database business demonstrated continuing vitality in a difficult economic period. With 32% growth in database licensing revenues against a strong year-over-year comparison, the venerable DBMS provider continued a string of recent strong results.


Oracle on Database: It’s On. And They’re Not Kidding.

Oracle is the company that led the industry into making RDBMS the data persistence vehicle of choice, and though its flagship is still Number One, many other topics floated around as 35,000 people attended Oracle Open World (OOW) in San Francisco recently. The spotlight stayed firmly planted: “What will Larry say about clouds/IBM/Fusion apps?”; Marc Benioff and Larry; Arnold and Larry. But if there’s anything Larry Ellison is passionate about, even as he sets his sights on IBM (hardware) and SAP (apps) – his two most important competitors, he said at the Churchill Club recently – it’s database, and he’s energized by the appliance opportunity. Andy Mendelsohn, SVP of Database Server Technologies, put it simply in a conversation: “the only product Larry has spoken of in the last 3 earnings calls is Exadata.” He is more involved than in recent years, and that means one thing: everyone else had better watch out. What analysts learned about the new release makes that very clear: Oracle has been busy, and there is a lot of exciting new technology coming.
