Hadoop Summit Recap Part Two – SELECT FROM hdfs WHERE bigdatavendor USING SQL

Probably the most widespread, and commercially imminent, theme at the Summit was “SQL on Hadoop.” Since last year, many offerings have been touted and debated, and some have even shipped. In this post, I offer a brief look at where things stood at the Summit and how we got there. To net it out: offerings today range from the not-even-submitted to GA – if you’re interested, a bit of familiarity will help. Even more useful: patience.


EMC Buys Greenplum – Big Data Realignment Continues

EMC’s acquisition of Greenplum, announced today as a cash transaction, reaffirms the obvious: the Big Data tsunami upends conventional wisdom. It has already reshaped the market, spawning the most ferment in the RDBMS (and non-R DBMS via the NoSQL players) space in years. When I first posted on Greenplum over a year ago, I said that

“Open source + capital has created an intriguing new model of rapid innovation in “mature” markets, and the database space – like BI – is not a done deal. It is indeed possible to escape the gravity well, if you execute. Greenplum is getting it done, and is among the new stars to watch.”

Why the open source reference? Greenplum uses a parallelization layer atop PostgreSQL (like Aster, another of the new breed of ADBMS).

Now EMC has written the next chapter in that story. In the process, it adds a new piece (after literally dozens of others in the past few years) to its own portfolio, which already includes unstructured data (via Documentum) and virtualization (via VMware), layered in among the industry-leading storage and information management pieces. Disruptive? You bet. Is EMC finished? I doubt it. Candidates? BI tools, ETL, MDM, data integration come to mind. Losers? At least one big one. Read on.

ParAccel Rocks the TPC-H – Will See Added Momentum

ParAccel, another of the analytic database upstarts, has weighed in on Sun hardware with a record-shattering benchmark that its competitors have thus far avoided – the 30 TB TPC-H. It’s been two years since anyone has published a 30 TB TPC-H, and only 10 of any size (all smaller) have been published in the past year. One can scoff (many do) at this venerable institution, but TPC benchmarks are a rite of passage, and a badge of engineering prowess. The ParAccel Analytic Database (PADB) has set new records, raising its profile dramatically in one fell swoop. PADB came in at 16x the price/performance of Oracle, the prior leader (and only other vendor willing to tackle the 30 TB benchmark to date). PADB, running on Sun Opteron 2356 servers, Sun Fire™ X4540 storage servers and OpenSolaris™, was 7x faster on queries and 4.6x faster loading the data than the two-year-old Oracle result. And because of its architecture, the construction and tuning of indexes and partitioning strategies were not needed. TPC rules are specific about having product in GA within 90 days, so one can expect to see PADB version 2.0, on which the benchmark was based, out in Q3.

ParAccel has seen some skepticism in the analyst community because of its relatively small published number of customers. It claims a dozen, and half are listed on its web site. Other vendors, like Vertica and Greenplum, have been very forthcoming promoting theirs, but both have more time in the market. PADB was released in Q4 2007 and really began its arc in 2008; Vertica has a year’s head start, and Greenplum even more. Rumors have also floated about whether CTO and founder Barry Zane was leaving. I had a conversation with Barry in late June to discuss the business and the benchmarks. He was clearly excited about the benchmarks, in which he was very involved, even working on the full disclosure report personally – “It got to be like a hobby for me,” he said – and he was quite clear that he is not going anywhere.

Greenplum – Reaching Escape Velocity

Greenplum is one of several companies that have defied the notion that “RDBMS has been done,” and one of the most successful of late on the high end (of scale, but not necessarily price). The argument goes that it’s a waste of time to build a new enterprise class RDBMS – kernel, optimizer, and associated feature set – because there is no room left for real innovation. It takes years, deep engineering expertise, and money – and when you’re done, your reward is to enter a crowded market dominated by players who have multi-billion dollar deep pockets, massive sales and engineering teams, and legions of loyal customers. A losing proposition. And yet, Greenplum has done it, and is winning deals. Regularly, and at an increasing rate.

EnterpriseDB’s Big Boost From IBM Only Part of the Story

EnterpriseDB has had a steady build as an Oracle-compatible alternative DBMS. IT Market Strategy had a chance to catch up with Andy Astor, co-founder and EVP of business development, in the midst of the frenzy around the launch of IBM’s DB2 version 9.7 (discussed here). Andy was gracious enough to make himself available late (very late) in the evening to answer a few questions about IBM’s licensing and use of EnterpriseDB’s technology, and he cleared up a few points of confusion we had.

DB2 Runs PL/SQL. Say WHAT?

Today IBM announced new features, products, and solution packages in its DB2 9.7 (Cobra) release. And a new version of InfoSphere, including Informix and z versions. I’ll post about those later, but here I’d like to just highlight a buried item that got little play: DB2 can now run PL/SQL.


In the engine.