Apache Hadoop 1.0 Doesn’t Clear Up Trunks and Branches Questions. Do Distributions?

In early January 2012, the world of big data was treated to an interesting series of product releases, press announcements, and blog posts about Hadoop versions. To begin with, we had the announcement of Apache Hadoop version 1.0 at long last, in a press release. Although there were grumblings here and there in the twittersphere that changes to release numbers are meaningless, my discussions with Gartner’s enterprise customers indicate otherwise. Products with release numbers like 0.20.2 make the hair on Procurement’s neck stand on end, and as Hadoop begins to get mainstream attention (Gartner clients: see Hype Cycle for Data Management 2011), IT architects and executives find such optics quite important. Hadoop is moving beyond pioneers like Amazon, Yahoo! and LinkedIn into shops like JP Morgan Chase, and they pay attention to such things.

Hadoop Distributions And Kids’ Soccer

The big players are moving in for a piece of the big data action. IBM, EMC, and NetApp have stepped up their messaging, in part to prevent startup upstarts like Cloudera from cornering the Apache Hadoop distribution market. They are all elbowing one another to get closest to “pure Apache” while still “adding value.” Numerous other startups have emerged, with greater or lesser reliance on, and extensions or substitutions for, the core Apache distribution. Yahoo! has found a funding partner and spun its team out, forming a new firm called Hortonworks, whose claim to fame begins with an impressive roster responsible for most of the code in the core Hadoop projects. Think of the Dr. Seuss children’s book featuring that famous elephant, and you’ll understand the name.

While we’re talking about kids – ever watch young kids play soccer? Everyone surrounds the ball. It takes them years to learn their positions on the field and play accordingly. There are emerging alphas, a few stragglers on the sidelines hoping for a chance to play, community participants – and a clear need for governance. Tech markets can be like that, and with 1600 attendees packing late June’s Hadoop Summit event, all of those scenarios were playing out: leaders, new entrants, and the big silents, like the absent Oracle and Microsoft.

IBM Fills Out Netezza Lineup With High Capacity Appliance

In the months since IBM closed its Netezza acquisition, the data warehouse appliance pioneer has been busy, if the announcements at this week’s Enzee are any indication. An enthusiastic crowd – 1000 strong – heard CEO Jim Baum deliver the news: new hardware, software and partnerships. The biggest news was The Appliance Formerly Known As Cruiser, now known as the Netezza High Capacity Appliance (HCA). A wag made up some t-shirts bearing the acronym TAFKAC and did quite well. IBM is aiming to push the size perception for Netezza higher. How high? Half a PB in a rack. You can scale it to 10PB.

Cloudera-Informatica Deal Opens Broader Horizons for Both

Cloudera’s continuing focus on the implications of explosive data growth has led it to another key partnership, this time with Informatica. Connecting to the dominant player in data integration and data quality expands the opportunity for Cloudera dramatically; it enables the de facto commercial Hadoop leader to find new ways to empower the “silent majority” of data. The majority of data is outside: not just outside enterprise data warehouses, but outside RDBMS instances entirely. Why? Because it doesn’t need all the management features database management software provides – it doesn’t get updated regularly, for example. In fact, it may not be used very often at all, though it does need to be persisted for a variety of reasons. I recently mentioned Cloudera’s success of late; it’s going to be challenged by some big players in 2011, notably IBM, whose recent focus on Hadoop has been remarkably nimble. So these deals matter. A lot. The Data Management function is being refactored before our eyes; both these vendors will play in its future.

Cloudera Convenes Colleagues to Crunch Content (Make Mine Membase)

Over the past two years, Cloudera has demonstrated the power of surrounding emerging open source software with support services, expertise and its own IP. The firm has racked up over 30 customers since its founding in late 2008, and emerged as the leading source of Apache Hadoop. Cloudera’s recent C round of financing brought its funding to $36 million, and it has been investing aggressively, with 45 employees, a very visible voice on the Big Data circuit and a stellar, experienced leadership team. It evangelizes through training, thought leadership, and increasingly through a growing sales and marketing team. Cloudera deserves a full post of its own; I hope to get to that before year-end.

One indicator of Cloudera’s precocity has been its prioritization of key alliances – higher than many firms its size – and that strategy is likely to have a big payoff if the partnerships are well executed and deliver the marketplace momentum and value they promise. Two key recent announcements involved Membase and Informatica. I’ll discuss the latter in another post – here I’ll talk about why the Membase deal makes so much sense.

Calpont’s InfiniDB – Another ADBMS Insurgent Arises

Calpont, rapidly emerging as yet another contender in the ADBMS sweepstakes, has announced version 2.0 of InfiniDB, its columnar MPP offering over shared storage. The value proposition hits now-familiar themes: high-performance query, fast data loading, data compression, and parallelized user defined functions (UDFs), all of which are becoming key checkoff capabilities. InfiniDB also hits hard on pricing, which it says dramatically undercuts that of its competitors. And a 30-day free trial of the enterprise edition sweetens the offer. For those comfortable with open source, the 2.0 release of the community edition is available as well. Calpont says the community edition (which is limited to a single server but is otherwise database feature-complete) has had 15,000 downloads. But the company’s relationship with Oracle for its MySQL components must be considered a risk going forward.

InfiniDB, like Infobright, is built atop Oracle’s MySQL. (I posted about Infobright last year, and it also has made significant progress, drawing favorable comment in the open source community for its continuing maturation.) Calpont’s relationship with Oracle must be seen as a risk factor. Oracle’s recent decisions about support raise questions about its interest in supporting anyone who is not an enterprise-class user of the Oracle-branded MySQL offering. Calpont has a deal through 2012 that includes an OEM license to integrate and use MySQL as the InfiniDB branded solution, and access to the MySQL channel. What will happen beyond that is clearly a concern.

IBM Acquires Netezza – ADBMS Consolidation Heats Up

IBM’s bid to acquire Netezza makes it official; the insurgents are at the gates. A pioneering and leading ADBMS player, Netezza is in play for approximately $1.7 billion or 6 times revenues. [Edited 9/30; previously said "earnings," which is incorrect.] When it entered the market in 2001, it catalyzed an economic and architectural shift with an appliance form factor at a dramatically different price point. Titans like Teradata and Oracle (and yes, IBM) found themselves outmaneuvered as Netezza mounted a steadily improving business, adding dozens of new names every quarter, continuing to validate its market positioning as a dedicated analytic appliance. It’s no longer alone there; some analytic appliance play is now in the portfolio of most sizable vendors serious about the market.

Aster Data Adds Columnar Storage, Puts Stake in Ground for Hybrid Multistores

Aster Data has announced its new version, nCluster 4.6, which now includes a column data store, staking a claim as the first ADBMS to combine SQL and MapReduce on a hybrid row and column MPP system. While its R&D has hitherto been focused on enabling advanced in-database analytic processing in its flagship “Data-Analytics Server,” Aster has clearly had other irons in the fire. CTO Tasso Argyros tells me that the column store is entirely new, written from scratch to ensure that Aster’s SQL-MR is a universal programming layer atop storage, and that its 1000+ MapReduce-ready analytic functions (and UDFs) will run on both row- and column-based data.

Wrapping Up TDWI – Agile? You Bet. And You Should.

I’ve posted twice about TDWI’s San Diego event, and I still haven’t exhausted the thoughts I wanted to share. That’s a measure of just how important and successful I think the show was. Three things jumped out at me:

  • The audience is back, and it’s ready to spend. The event was buzzing; I was told by organizers that the numbers significantly exceeded expectations. That was easy to see; speeches, booths, and hallways were packed. Vendors told me booth traffic was great, and that visitors (although typically not budget holders) were in or preparing for projects and product acquisitions.
  • The hunger for content continues. In my session and in others, I saw show-of-hands responses to questions like “how many of you have been here before?” “How many of you have built this kind of system?” “How many of you have been trained on [pick a DW-related topic]?” The responses made it clear that, like other TDWI events I’ve been to, this one was packed with people who were new or intermediate users with training in mind. TDWI’s basic training mission has never been healthier.
  • Agile matters. A lot. My first post on the event was put up rather quickly and as the event progressed, I heard the theme flesh out well, with real stories from users who applied the techniques to their projects. My initial impression that we might be looking at another buzzword poorly applied was wrong. Agile’s real, and TDWI’s coverage and guidance are rich and well worth investigating. The vendors? Well, they’re doing what they always do. Caveat emptor. I repeat: it’s not an adjective. Learn what it means and apply it. You can’t buy it.

More TDWI Notes – ParAccel Rolling On, HP Stalled, Vertica Leading Insurgents

On my second day at TDWI, I was in meetings all day – events like this are a great opportunity for analysts to catch up with many of the companies they follow at one time, and this particular one was packed with sponsors. Congrats to the folks who sell sponsorships – they had a packed exhibit hall, and a lot of very interested attendees. I got a chance to chat at a few booths (all buzzing), ask a few attendees some real-world questions (and was asked some surprising ones myself), and get a sense of the workload in the trenches (heavy and growing).
