Hadoop Summit Recap Part Two – SELECT FROM hdfs WHERE bigdatavendor USING SQL

Probably the most widespread, and commercially imminent, theme at the Summit was “SQL on Hadoop.” Since last year, many offerings have been touted, debated, and some have even shipped. In this post, I offer a brief look at where things stood at the Summit and how we got there. To net it out: offerings today range from the not-even-submitted to GA – if you’re interested, a bit of familiarity will help. Even more useful: patience.

Calpont’s InfiniDB – Another ADBMS Insurgent Arises

Calpont, rapidly emerging as yet another contender in the ADBMS sweepstakes, has announced version 2.0 of InfiniDB, its columnar MPP offering over shared storage. The value proposition hits now-familiar themes: high-performance query, fast data loading, data compression, and parallelized user defined functions (UDFs), all of which are becoming key checkoff capabilities. InfiniDB also hits hard on pricing, which it says dramatically undercuts that of its competitors. And a 30-day free trial of the enterprise edition sweetens the offer. For those comfortable with open source, the 2.0 release of the community edition is available as well. Calpont says the community edition (which is limited to a single server but is otherwise database feature-complete) has had 15,000 downloads. But the company’s relationship with Oracle for its MySQL components must be considered a risk going forward.

InfiniDB, like Infobright, is built atop Oracle’s MySQL. (I posted about Infobright last year, and it also has made significant progress, drawing favorable comment in the open source community for its continuing maturation.) Calpont’s relationship with Oracle must be seen as a risk factor. Oracle’s recent decisions about support raise questions about its interest in supporting anyone who is not an enterprise-class user of the Oracle-branded MySQL offering. Calpont has a deal through 2012 that includes an OEM license to integrate and use MySQL as the InfiniDB branded solution, and access to the MySQL channel. What will happen beyond that is clearly a concern.

EMC Jumps Into ADBMS Appliance Game

The Data Computing Appliance, first deliverable from EMC’s acquisition of Greenplum, was announced last month, only 75 days after the acquisition closed, and it doesn’t lack for ambition. Pat Gelsinger, President and Chief Operating Officer, EMC Information Infrastructure, pointed to the high level opportunity: unlocking the “hidden value” of enormous and growing data assets every company is increasingly holding, and often failing to leverage. The appliance will reach many hitherto untapped resources in the data centers that EMC occupies. Adding EMC’s manufacturing, sales and marketing, and reference architectures to the Greenplum IP brings what Gelsinger calls Greenplum’s “first phase” to its completion. And begins what is likely to be a sizable battle with Oracle, Teradata and IBM, if EMC mounts campaigns and spending to match its ambitious vision.

EMC Buys Greenplum – Big Data Realignment Continues

EMC’s acquisition of Greenplum, announced today as a cash transaction, reaffirms the obvious: the Big Data tsunami upends conventional wisdom. It has already reshaped the market, spawning the most ferment in the RDBMS (and non-R DBMS via the noSQL players) space in years. When I first posted on Greenplum over a year ago, I said that

“Open source + capital has created an intriguing new model of rapid innovation in “mature” markets, and the database space – like BI – is not a done deal. It is indeed possible to escape the gravity well, if you execute. Greenplum is getting it done, and is among the new stars to watch.”

Why the open source reference? Greenplum uses a parallelization layer atop PostgreSQL (as does Aster, another of the new breed of ADBMS).

Now EMC has written the next chapter in that story. In the process, it adds a new piece (after literally dozens of others in the past few years) to its own portfolio, which already includes unstructured data (via Documentum) and virtualization (via VMware), layered in among the industry-leading storage and information management pieces. Disruptive? You bet. Is EMC finished? I doubt it. Candidates? BI tools, ETL, MDM, data integration come to mind. Losers? At least one big one. Read on.

Vertica Projects Leadership, Embraces MapReduce (Sorta)

With the August announcement of Vertica Analytic Database 3.5, Vertica is laying claim to leadership of the new ADBMS vendors. With its most recent numbers – several dozen customers are now in production and the company expects to pass 100 this year – the assertion bears thinking about. Driving forward with an aggressive release strategy, Vertica is showing its maturity and increasing ability to challenge the old school leaders like Teradata and Netezza – but with a software-only strategy. This agility allowed it to offer early support for release 3.5 in quick succession after its last release, with GA scheduled for later this year.

GoldenGate Software Buy a Win for Oracle

Oracle today announced it is buying GoldenGate Software for an undisclosed sum, likely a couple of hundred million dollars. To revisit some facts from an earlier post, GoldenGate had been in business 15 years, with some 500 customers, 4000 solutions deployed, and strong partnerships with Oracle, Teradata and Ingres on the database side, and MicroStrategy and Amdocs in the app and BI space. Their message revolved around 3 key attributes of their changed-data-based replication technology: heterogeneity, real-time (log-based) performance, and high-volume transactional support.

Aster Appliance Elevates MapReduce Chatter, ADBMS Visibility

Since my last post about Aster, the analytic DBMS (ADBMS) vendor has added another arrow to its quiver. Its new MapReduce Data Warehouse Appliance Express Edition starts at $50,000, and includes Aster nCluster on Dell hardware and a copy of MicroStrategy BI software for up to 1 TB of user data, which Aster clearly sees as a sweet spot. (MicroStrategy has been doing a lot of seeding with the ADBMSs lately; it also has an introductory bundling deal with Sybase IQ.) Delivering a ‘compute rich’ appliance on commodity hardware, with reduced operating costs, certainly hits all the right notes. But is 1 TB the sweet spot for MapReduce? I think not – although it makes a great starting point, and that may be Aster’s real opportunity – give ‘em a taste of what SQL plus MapReduce can do, and watch them demand more and more. And sell it to them. Dell and MicroStrategy should love this strategy – if it works.

ParAccel Secures $22 Million – The Game’s Afoot

Recently, ParAccel published a TPC-H benchmark, and I said here that it was a coup that ought to get them significant attention. The blizzard of discussion that ensued was no doubt gratifying for ParAccel – Google reported 182 hits for “the past week” for them as of 6/28.

Now, Google hits – and visibility in general – aren’t everything. In a relatively crowded field, ParAccel will need more than just a fairly well-received press release – they will need money. Money to drive marketing, money to turn interest into leads, and money to fund a sales and field force to convert those leads into business. The good news? They just got some. On June 29th the firm announced that it had secured a C round of venture capital, to the tune of $22 million, led by Menlo Ventures; ParAccel’s previous investors participated as well.

ParAccel Rocks the TPC-H – Will See Added Momentum

ParAccel, another of the analytic database upstarts, has weighed in on Sun hardware with a record-shattering benchmark that its competitors have thus far avoided – the 30 TB TPC-H. It’s been two years since anyone has published a 30 TB TPC-H, and only 10 of any size (all smaller) have been published in the past year. One can scoff (many do) at this venerable institution, but TPC benchmarks are a rite of passage, and a badge of engineering prowess. The ParAccel Analytic Database (PADB) has set new records, raising its profile dramatically in one fell swoop. PADB came in at 16x the price/performance of Oracle, the prior leader (and the only other vendor willing to tackle the 30 TB benchmark to date). PADB, running on Sun Opteron 2356 servers, Sun Fire™ X4540 storage servers and OpenSolaris™, was 7x faster on queries and 4.6x faster loading the data than the two-year-old Oracle result. And because of its architecture, the construction and tuning of indexes and partitioning strategies were not needed. TPC rules are specific about having product in GA within 90 days, so one can expect to see PADB version 2.0, on which the benchmark was based, out in Q3.

ParAccel has seen some skepticism in the analyst community because of its relatively small published number of customers. It claims a dozen, and half are listed on its web site. Other vendors, like Vertica and Greenplum, have been very forthcoming promoting theirs, but both have more time in the market. PADB was released in Q4 2007 and really began its arc in 2008; Vertica has a year head start, and Greenplum even more. Rumors have also floated about whether CTO and founder Barry Zane was leaving. I had a conversation with Barry in late June to discuss the business and the benchmarks. He was clearly excited about the benchmarks, in which he was very involved, even working on the full disclosure report personally – “It got to be like a hobby for me,” he said – and he was quite clear that he is not going anywhere.

Can GoldenGate Software Continue to Grow Transactional Replication?

GoldenGate Software may not be a well-known name, except in circles where transactional replication is a hot topic, but after 15 years in business, they have assembled a sizable base of some 500 customers, with 4000 solutions deployed, and partnerships with vendors as diverse as Teradata and Ingres on the database side, and MicroStrategy and Amdocs in the app and BI space. Their message revolves around 3 key attributes of their changed-data-based replication technology: heterogeneity, real-time (log-based) performance, and high-volume transactional support (committed only). And despite their notoriously closed-mouthed approach to their finances, it’s fair to say that they are generating tens of millions of dollars in revenue yearly (Hoover’s says $9.7M in 2007, but I believe that’s very low), so it’s evident the marketplace is interested. The big question is whether GoldenGate will invest to sustain and grow sales, or watch larger competitors take their market away, now that they’re on the radar.
