Hadoop Summit Recap Part Two – SELECT FROM hdfs WHERE bigdatavendor USING SQL

Probably the most widespread, and commercially imminent, theme at the Summit was “SQL on Hadoop.” Since last year, many offerings have been touted and debated, and some have even shipped. In this post, I offer a brief look at where things stood at the Summit and how we got there. To net it out: offerings today range from the not-even-submitted to GA – if you’re interested, a bit of familiarity will help. Even more useful: patience.

Calpont’s InfiniDB – Another ADBMS Insurgent Arises

Calpont, rapidly emerging as yet another contender in the ADBMS sweepstakes, has announced version 2.0 of InfiniDB, its columnar MPP offering over shared storage. The value proposition hits now-familiar themes: high-performance query, fast data loading, data compression, and parallelized user-defined functions (UDFs), all of which are becoming key checkoff capabilities. InfiniDB also hits hard on pricing, which it says dramatically undercuts that of its competitors. And a 30-day free trial of the enterprise edition sweetens the offer. For those comfortable with open source, the 2.0 release of the community edition is available as well. Calpont says the community edition (which is limited to a single server but is otherwise database feature-complete) has had 15,000 downloads. But the company’s relationship with Oracle for its MySQL components must be considered a risk going forward.

InfiniDB, like Infobright, is built atop Oracle’s MySQL. (I posted about Infobright last year, and it also has made significant progress, drawing favorable comment in the open source community for its continuing maturation.) Calpont’s relationship with Oracle must be seen as a risk factor. Oracle’s recent decisions about support raise questions about its interest in supporting anyone who is not an enterprise-class user of the Oracle-branded MySQL offering. Calpont has a deal through 2012 that includes an OEM license to integrate and use MySQL as the InfiniDB branded solution, and access to the MySQL channel. What will happen beyond that is clearly a concern.

EMC Jumps Into ADBMS Appliance Game

The Data Computing Appliance, the first deliverable from EMC’s acquisition of Greenplum, was announced last month, only 75 days after the acquisition closed, and it doesn’t lack for ambition. Pat Gelsinger, President and Chief Operating Officer, EMC Information Infrastructure, pointed to the high-level opportunity: unlocking the “hidden value” of the enormous and growing data assets that every company increasingly holds and often fails to leverage. The appliance will reach many hitherto untapped resources in the data centers that EMC occupies. Adding EMC’s manufacturing, sales and marketing, and reference architectures to the Greenplum IP brings what Gelsinger calls Greenplum’s “first phase” to its completion. And begins what is likely to be a sizable battle with Oracle, Teradata and IBM, if EMC mounts campaigns and spending to match its ambitious vision.

Aster Data Adds Columnar Storage, Puts Stake in Ground for Hybrid Multistores

Aster Data has announced its new version, nCluster 4.6, which now includes a column data store, staking a claim as the first ADBMS to combine SQL and MapReduce on a hybrid row and column MPP system. While its R&D has hitherto been focused on enabling advanced in-database analytic processing in its flagship “Data-Analytics Server,” Aster has clearly had other irons in the fire. CTO Tasso Argyros tells me that the column store is entirely new, written from scratch to ensure that Aster’s SQL-MR is a universal programming layer atop storage, and that its 1000+ MapReduce-ready analytic functions (and UDFs) will run on both row- and column-based data.

More TDWI Notes – ParAccel Rolling On, HP Stalled, Vertica Leading Insurgents

On my second day at TDWI, I was in meetings all day – events like this are a great opportunity for analysts to catch up, all at one time, with many of the companies they follow, and this particular one was packed with sponsors. Congrats to the folks who sell sponsorships – they had a packed exhibit hall, and a lot of very interested attendees. I got a chance to chat at a few booths (all buzzing), ask a few attendees some real-world questions (and was asked some surprising ones myself), and get a sense of the workload in the trenches (heavy and growing).

EMC Buys Greenplum – Big Data Realignment Continues

EMC’s acquisition of Greenplum, announced today as a cash transaction, reaffirms the obvious: the Big Data tsunami upends conventional wisdom. It has already reshaped the market, spawning the most ferment in the RDBMS (and non-R DBMS, via the NoSQL players) space in years. When I first posted on Greenplum over a year ago, I said that

“Open source + capital has created an intriguing new model of rapid innovation in “mature” markets, and the database space – like BI – is not a done deal. It is indeed possible to escape the gravity well, if you execute. Greenplum is getting it done, and is among the new stars to watch.”

Why the open source reference? Greenplum uses a parallelization layer atop PostgreSQL (like Aster, another of the new breed of ADBMS).

Now EMC has written the next chapter in that story. In the process, it adds a new piece (after literally dozens of others in the past few years) to its own portfolio, which already includes unstructured data (via Documentum) and virtualization (via VMware), layered in among the industry-leading storage and information management pieces. Disruptive? You bet. Is EMC finished? I doubt it. Candidates? BI tools, ETL, MDM, data integration come to mind. Losers? At least one big one. Read on.

Microsoft’s Parallel DW – Still Waiting

Microsoft’s SQL Server Parallel Data Warehouse (PDW) has been eagerly awaited for a long time. It still is. Though much of the news at the BI Conference running in parallel with TechEd in New Orleans (discussed here) was generally quite good, the PDW story was much less so. It’s late, and it’s not all there.

VoltDB – DIY OLTP. Open Source. Win.

In a seemingly perfect marriage of product and target market, database pioneer Mike Stonebraker’s new in-memory database company VoltDB has emerged from stealth mode using the open source model, soon to be open core. Its first release, the GPL-licensed Community Edition, will appeal to developers who need blindingly fast transaction processing and are willing to do a lot of work themselves to get there – the do-it-yourself (DIY) database. Who better than the Gluecon community? Gluecon was the perfect place to do the formal rollout, filled as it is with hands-on folks looking to work with NoSQL products (like Cassandra, CouchDB, MongoDB, Riak, Voldemort, etc.).

Microsoft and HP Announce New Application-to-Infrastructure Model/Partnership [Yawn]

(Co-authored with Charles King of PUND-IT, Inc.)

Microsoft and HP announced a new investment of $250M into their Frontline Partnership, designed to deliver integrated stacks supporting applications from Microsoft’s Exchange and SQL Server and beyond into the cloud. As part of this effort, the companies plan to deliver solutions built on what they defined as a “next generation infrastructure-to-application model” which will help speed implementation, eliminate IT management complexities and lower overall costs by automating manual processes. With this strategic partnership, HP and Microsoft will also collaborate on an engineering road map for joint products including data management machines using the new SQL Server MPP database option when it is announced, pre-packaged application solution bundles, comprehensive virtualization offerings and integrated management tools.

Dataupia – Optimism for 2009

I recently had the chance to chat with John O’Brien, CTO and co-founder of MPP data warehouse appliance vendor Dataupia (pronounced like “utopia”). He was in an upbeat mood, as the company leverages the recent addition to its B round of financing, secured late last year, to drive business to the next level. With a new CEO (former Cognos senior vice president of world operations Tony Sirianni), a growing number of references, prospects turning into customers, and OEM partners supplementing its growing direct sales force, the outlook appears good. Now fielding some 60 employees in Cambridge, Massachusetts, Dataupia can press its value proposition of being “well matched to prospects’ needs for lower price, flexibility, and minimal execution costs for changing or supplementing existing architectures.”

Dataupia is climbing the scale lists – its largest install is 150 TB at Subex, hosting an OSS system for British Telecom. Marketing VP Samantha Stone has begun to push out press releases touting customer wins, always an encouraging sign. The wins in telecom are being supplemented by opportunities in other spaces such as the intriguing traffic information analysis system at ITIS (details on the company’s web site). New solution categories highlight the emerging opportunities that follow an economic change like the one appliances are driving. “We’ve taken another zero off the cost,” says O’Brien. “Now it’s a matter of only a few tens of thousands to get started on applications that seemed out of reach before for many firms.”

O’Brien believes that a key differentiator is that customers don’t connect to Dataupia directly, but through their primary platform: Microsoft SQL Server, Oracle, and now IBM DB2 (although no production references for the latter are available yet). “When customers hit a pain point, architecture is a constraint for other DBMSs. We appear to be a data store for that database, so the style of application design and usage doesn’t need to change.” Dataupia can use transactional tables from the primary DBMS – their optimizers sit atop its added one. Multidimensional aggregates can replace materialized views. So, “agility” becomes a key message. Teardown and reconfiguration are easier, hence faster and cheaper. Less DBA optimization time and quick install are powerful value propositions.

To get to the next level, Dataupia will have to add some features: replication, disaster recovery and internationalization top customer wish lists. As Dataupia turns its sights from getting early reference customers to using its improving finances to drive growth, its more formalized structure and sales processes should help it move towards another financing round as the economy turns upward next year. Then, O’Brien asserts, the firm will aspire to much more rapid expansion.