Apache Hadoop 1.0 Doesn’t Clear Up Trunks and Branches Questions. Do Distributions?

In early January 2012, the world of big data was treated to an interesting series of product releases, press announcements, and blog posts about Hadoop versions. To begin with, we had the announcement, at long last, of Apache Hadoop version 1.0 in a press release. Although there were grumblings here and there in the twittersphere that changes to release numbers are meaningless, my discussions with Gartner’s enterprise customers indicate otherwise. Products with release numbers like 0.20.2 make the hair on Procurement’s neck stand on end, and as Hadoop begins to get mainstream attention (Gartner’s clients, see Hype Cycle for Data Management 2011), IT architects and executives find such optics quite important. Hadoop is moving beyond pioneers like Amazon, Yahoo! and LinkedIn into shops like JP Morgan Chase, and they pay attention to such things.

Hadoop Distributions And Kids’ Soccer

The big players are moving in for a piece of the big data action. IBM, EMC, and NetApp have stepped up their messaging, in part to prevent startup upstarts like Cloudera from cornering the Apache Hadoop distribution market. They are all elbowing one another to get closest to “pure Apache” while still “adding value.” Numerous other startups have emerged, with greater or lesser reliance on, and extensions or substitutions for, the core Apache distribution. Yahoo! has found a funding partner and spun its team out, forming a new firm called Hortonworks, whose claim to fame begins with an impressive roster responsible for most of the code in the core Hadoop projects. Think of the Dr. Seuss children’s book featuring that famous elephant, and you’ll understand the name.

While we’re talking about kids – ever watch young kids play soccer? Everyone surrounds the ball. It takes them years to learn their positions on the field and play accordingly. There are emerging alphas, a few stragglers on the sidelines hoping for a chance to play, community participants – and a clear need for governance. Tech markets can be like that, and with 1600 attendees packing late June’s Hadoop Summit event, all of those scenarios were playing out. Leaders, new entrants, and the big silents, like the absent Oracle and Microsoft.

Microsoft Leaps Late, Lags with SQL Server PDW

Microsoft chose a user group meeting of the Professional Association for SQL Server (PASS) for the rollout of its long-awaited, and late, SQL Server 2008 R2 Parallel Data Warehouse (note, yet again, how foolish it is for vendors to trap themselves with dates in product names). PDW is late to market; there are other MPP DBMS players there already, and Microsoft is behind in functionality compared to some of them. Some of the most eagerly awaited features are evidently not slated for the first release. It’s also far behind its originally planned ship date following the acquisition of DatAllegro in 2008.

Calpont’s InfiniDB – Another ADBMS Insurgent Arises

Calpont, rapidly emerging as yet another contender in the ADBMS sweepstakes, has announced version 2.0 of InfiniDB, its columnar MPP offering over shared storage. The value proposition hits now-familiar themes: high-performance query, fast data loading, data compression, and parallelized user-defined functions (UDFs), all of which are becoming key checkoff capabilities. InfiniDB also hits hard on pricing, which it says dramatically undercuts that of its competitors. And a 30-day free trial of the enterprise edition sweetens the offer. For those comfortable with open source, the 2.0 release of the community edition is available as well. Calpont says the community edition (which is limited to a single server but is otherwise database feature-complete) has had 15,000 downloads. But the company’s relationship with Oracle for its MySQL components must be considered a risk going forward.

InfiniDB, like Infobright, is built atop Oracle’s MySQL. (I posted about Infobright last year, and it also has made significant progress, drawing favorable comment in the open source community for its continuing maturation.) Calpont’s relationship with Oracle must be seen as a risk factor. Oracle’s recent decisions about support raise questions about its interest in supporting anyone who is not an enterprise-class user of the Oracle-branded MySQL offering. Calpont has a deal through 2012 that includes an OEM license to integrate and use MySQL as the InfiniDB branded solution, and access to the MySQL channel. What will happen beyond that is clearly a concern.

EMC Jumps Into ADBMS Appliance Game

The Data Computing Appliance, the first deliverable from EMC’s acquisition of Greenplum, was announced last month, only 75 days after the acquisition closed, and it doesn’t lack for ambition. Pat Gelsinger, President and Chief Operating Officer, EMC Information Infrastructure, pointed to the high-level opportunity: unlocking the “hidden value” of the enormous and growing data assets that every company increasingly holds and often fails to leverage. The appliance will reach many hitherto untapped resources in the data centers that EMC occupies. Adding EMC’s manufacturing, sales and marketing, and reference architectures to the Greenplum IP brings what Gelsinger calls Greenplum’s “first phase” to its completion. And it begins what is likely to be a sizable battle with Oracle, Teradata and IBM, if EMC mounts campaigns and spending to match its ambitious vision.

EMC Buys Greenplum – Big Data Realignment Continues

EMC’s acquisition of Greenplum, announced today as a cash transaction, reaffirms the obvious: the Big Data tsunami upends conventional wisdom. It has already reshaped the market, spawning the most ferment in the RDBMS (and non-R DBMS via the noSQL players) space in years. When I first posted on Greenplum over a year ago, I said that

“Open source + capital has created an intriguing new model of rapid innovation in “mature” markets, and the database space – like BI – is not a done deal. It is indeed possible to escape the gravity well, if you execute. Greenplum is getting it done, and is among the new stars to watch.”

Why the open source reference? Greenplum uses a parallelization layer atop PostgreSQL (like Aster, another of the new breed of ADBMS.)

Now EMC has written the next chapter in that story. In the process, it adds a new piece (after literally dozens of others in the past few years) to its own portfolio, which already includes unstructured data (via Documentum) and virtualization (via VMware), layered in among the industry-leading storage and information management pieces. Disruptive? You bet. Is EMC finished? I doubt it. Candidates? BI tools, ETL, MDM, data integration come to mind. Losers? At least one big one. Read on.

Microsoft’s Parallel DW – Still Waiting

Microsoft’s SQL Server Parallel Data Warehouse (PDW) has been eagerly awaited for a long time. It still is. Though much of the news at the BI Conference running in parallel with TechEd in New Orleans (discussed here) was generally quite good, the PDW story was much less so. It’s late, and it’s not all there.

RainStor Ramp Rolls On

When I last spoke to RainStor, a new round of funding had just come in and prospects seemed bright. It could hardly have happened at a better time. A recent Information Week study of 437 business technology professionals showed that more than half are managing over 10 TB of data, with 7% managing 201-500 TB and 8% more than 500 TB. The study says that “for the first time enterprise storage architects are more worried about meeting capacity demands than they are about data security.” On the heels of the Economist’s recent assertion that more data is being generated than storage is being built to contain it, the issue is more critical than ever.

EMC World 2010 and IT Vendor Evolution

From Charles King, Pund-IT, Inc.

IT vendor conferences offer a variety of amusements and educational opportunities, and EMC World 2010 was no exception. But the most interesting aspect of this year’s event focused on how things have changed for EMC during the past year. Consider this: EMC World 2009 kicked off with a keynote co-hosted by company President and CEO Joe Tucci and VMware President and CEO Paul Maritz, emphasizing the companies’ common vision of virtualization as the foundation for cloud computing. Last week in Boston, Tucci used his solo keynote to highlight EMC’s notion of private cloud computing as the rightful future of enterprise datacenters and discussed the partnerships EMC is pursuing to make that vision a reality.

VDI Market Heats Up – and So Do Vendor Rivalries

I’m pleased to welcome Laura DiDio of ITIC as a contributor. ITIC is a rich source of data and insightful commentary. This piece originally appeared in the PUND-IT newsletter.

There’s no hotter market in high tech this year than Virtual Desktop Infrastructure (VDI), and you don’t need sales and unit shipment statistics to prove it. No, the best measurement of VDI’s hotness is the sudden flurry of vendor announcements accompanied by a concomitant rise in vitriol. The main players in the VDI market are actually two pairs of vendors. It’s Citrix and Microsoft lining up against VMware and EMC for Round 2 in the ongoing virtualization wars. On March 18, Citrix and Microsoft came out swinging, landing the first potent, preemptive punches right where they hope it will hurt VMware the most: in its pocketbook.
