Database Benchmarks – The Gift That Keeps on Giving

Yes, I know – not everyone believes database benchmarks are useful. My position is that benchmarks are valuable because they help engineers wring out bottlenecks, bugs and performance impediments in their products. Berni Schiefer, Technical Executive, Information Management Performance and Benchmarks for DB2, MDM and SolidDB, recently told me that "every time we run [TPC-C] we are astonished at how effectively it hammers every element of the system. We always find bugs, room for tuning. It's the nastiest, most punishing combination there is."

EMC Buys Greenplum – Big Data Realignment Continues

EMC’s acquisition of Greenplum, announced today as a cash transaction, reaffirms the obvious: the Big Data tsunami upends conventional wisdom. It has already reshaped the market, spawning the most ferment in the RDBMS (and non-R DBMS via the NoSQL players) space in years. When I first posted on Greenplum over a year ago, I said that

“Open source + capital has created an intriguing new model of rapid innovation in “mature” markets, and the database space – like BI – is not a done deal. It is indeed possible to escape the gravity well, if you execute. Greenplum is getting it done, and is among the new stars to watch.”

Why the open source reference? Greenplum uses a parallelization layer atop PostgreSQL (like Aster, another of the new breed of ADBMS).

Now EMC has written the next chapter in that story. In the process, it adds a new piece (after literally dozens of others in the past few years) to its own portfolio, which already includes unstructured data (via Documentum) and virtualization (via VMware), layered in among the industry-leading storage and information management pieces. Disruptive? You bet. Is EMC finished? I doubt it. Candidates? BI tools, ETL, MDM, data integration come to mind. Losers? At least one big one.

New TPC-H Record – Virtualized by ParAccel, VMware

You can set performance records in a virtualized environment – that's the message of the new 1 TB TPC-H benchmark record (scroll down to see the 1 TB results) just released by ParAccel and VMware. Running on VMware's vSphere 4, the ParAccel Analytic Database (PADB) delivered a one-two punch: not only the top performance number for a 1 terabyte (TB) benchmark, but the top price-performance number as well. The results in a nutshell: 1,316,882 Composite Queries per Hour (QphH), a price/performance of 70 cents/QphH, and a data load rate of over 3.5 TB per hour. ParAccel moved quickly to promote the result; oddly, VMware seems to have been asleep at the switch, with no promotion on its site as the release hit the wires, and a bland quote from a partner exec in the release itself.
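
For a sense of scale, the two headline numbers can be turned around. TPC-H price/performance is the total priced system cost divided by QphH, so multiplying the published figures gives a rough implied system price. The Python sketch below assumes the rounded 70 cents figure, so the result is only approximate; the function and variable names are mine, for illustration only, and the actual priced configuration is in the full disclosure report rather than here.

```python
def implied_system_price(qphh: float, dollars_per_qphh: float) -> float:
    """Back out an approximate system price from TPC-H headline numbers.

    Price/performance is defined as total system price / QphH,
    so price is roughly price/performance * QphH.
    """
    return qphh * dollars_per_qphh


# Figures quoted in the post; 0.70 is a rounded value, so treat the
# result as a ballpark, not the audited price.
qphh = 1_316_882
price_perf = 0.70

approx_price = implied_system_price(qphh, price_perf)
print(f"Implied system price: about ${approx_price:,.0f}")
```

In other words, the record-setting configuration would have been priced somewhere a little over $900,000, if the rounding holds.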

ParAccel Rocks the TPC-H – Will See Added Momentum

ParAccel, another of the analytic database upstarts, has weighed in on Sun hardware with a record-shattering benchmark that its competitors have thus far avoided – the 30 TB TPC-H. It’s been two years since anyone has published a 30 TB TPC-H, and only 10 of any size (all smaller) have been published in the past year. One can scoff (many do) at this venerable institution, but TPC benchmarks are a rite of passage, and a badge of engineering prowess. The ParAccel Analytic Database (PADB) has set new records, raising its profile dramatically in one fell swoop. PADB came in at 16x the price/performance of Oracle, the prior leader (and the only other vendor willing to tackle the 30 TB benchmark to date). PADB, running on Sun Opteron 2356 servers, Sun Fire™ X4540 storage servers and OpenSolaris™, was 7x faster on queries and 4.6x faster loading the data than the two-year-old Oracle result. And because of its architecture, the construction and tuning of indexes and partitioning strategies were not needed. TPC rules are specific about having product in GA within 90 days, so one can expect to see PADB version 2.0, on which the benchmark was based, out in Q3.

ParAccel has seen some skepticism in the analyst community because of its relatively small number of published customers. It claims a dozen, and half are listed on its web site. Other vendors, like Vertica and Greenplum, have been very forthcoming in promoting theirs, but both have had more time in the market. PADB was released in Q4 2007 and really began its arc in 2008; Vertica has a year's head start, and Greenplum even more. Rumors have also floated about whether CTO and founder Barry Zane was leaving. I had a conversation with Barry in late June to discuss the business and the benchmarks. He was clearly excited about the benchmarks, in which he was very involved, even working on the full disclosure report personally – "It got to be like a hobby for me," he said – and he was quite clear that he is not going anywhere.