Hadoop Summit Recap Part Two – SELECT FROM hdfs WHERE bigdatavendor USING SQL

Probably the most widespread, and commercially imminent, theme at the Summit was “SQL on Hadoop.” Since last year, many offerings have been touted, debated, and some have even shipped. In this post, I offer a brief look at where things stood at the Summit and how we got there. To net it out: offerings today range from the not-even-submitted to GA – if you’re interested, a bit of familiarity will help. Even more useful: patience.


Hadoop 2013 – Part Three: Platforms

In the first two posts in this series, I talked about performance and projects as key themes in Hadoop’s watershed year. As it moves squarely into the mainstream, organizations making their first move to experiment will have to make a choice of platform. And – arguably for the first time in the early mainstreaming of an information technology wave – that choice is about more than who made the box where the software will run, and the spinning metal platters the bits will be stored on. There are three options, and choosing among them will have dramatically different implications for the budget, for the available capabilities, and for the fortunes of some vendors seeking to carve out a place in the IT landscape with their offerings.


Aster Data Adds Columnar Storage, Puts Stake in Ground for Hybrid Multistores

Aster Data has announced its new version, nCluster 4.6, which now includes a column data store, staking a claim as the first ADBMS to combine SQL and MapReduce on a hybrid row and column MPP system. While its R&D has hitherto been focused on enabling advanced in-database analytic processing in its flagship “Data-Analytics Server,” Aster has clearly had other irons in the fire. CTO Tasso Argyros tells me that the column store is entirely new, written from scratch to ensure that Aster’s SQL-MR is a universal programming layer atop storage, and that its 1000+ MapReduce-ready analytic functions (and UDFs) will run on both row- and column-based data.

Wrapping Up TDWI – Agile? You Bet. And You Should.

I’ve posted twice about TDWI’s San Diego event, and I still haven’t exhausted the thoughts I wanted to share. That’s a measure of just how important and successful I think the show was. Three things jumped out at me:

  • The audience is back, and it’s ready to spend. The event was buzzing; I was told by organizers that the numbers significantly exceeded expectations. That was easy to see; speeches, booths, and hallways were packed. Vendors told me booth traffic was great, and that visitors (although typically not budget holders) were in or preparing for projects and product acquisitions.
  • The hunger for content continues. In my session and in others, I saw show-of-hands responses to questions like “how many of you have been here before?” “How many of you have built this kind of system?” “How many of you have been trained on [pick a DW-related topic]?”  The responses made it clear that like other TDWI events I’ve been to, this one was packed with people who were new or intermediate users with training in mind. TDWI’s basic training mission has never been healthier.
  • Agile matters. A lot. My first post on the event was put up rather quickly and as the event progressed, I heard the theme flesh out well, with real stories from users who applied the techniques to their projects. My initial impression that we might be looking at another buzzword poorly applied was wrong. Agile’s real, and TDWI’s coverage and guidance is rich and well worth investigating. The vendors? Well, they’re doing what they always do. Caveat emptor. I repeat: it’s not an adjective. Learn what it means and apply it. You can’t buy it.

TDWI Event Focuses on Agile BI. What’s That?

I’m at the Data Warehouse Institute’s San Diego conference this week, and experimenting with an incremental approach to blogging for this event; I’ll try to get on a few times in the next 2 days (unfortunately that’s all the time I’ll have here) and communicate some quick thoughts, as opposed to my more typical style, which is longer and more in depth. That will no doubt follow on many of the topics later.

I begin with the keynote from Wayne Eckerson this morning, where he offered his thoughts on Agile BI. Agile is a loaded word; for developers it means a very specific set of techniques and methodologies. Data folk are not part of that culture in most cases, and they use the word as an adjective. Wayne attempted to bridge the gap in a few places, but by and large, his hints at best practices were not particularly new, or surprising, or tied closely to the Agile playbook.

EMC Buys Greenplum – Big Data Realignment Continues

EMC’s acquisition of Greenplum, announced today as a cash transaction, reaffirms the obvious: the Big Data tsunami upends conventional wisdom. It has already reshaped the market, spawning the most ferment in the RDBMS (and non-R DBMS via the noSQL players) space in years. When I first posted on Greenplum over a year ago, I said that

“Open source + capital has created an intriguing new model of rapid innovation in ‘mature’ markets, and the database space – like BI – is not a done deal. It is indeed possible to escape the gravity well, if you execute. Greenplum is getting it done, and is among the new stars to watch.”

Why the open source reference? Greenplum uses a parallelization layer atop PostgreSQL (like Aster, another of the new breed of ADBMS.)

Now EMC has written the next chapter in that story. In the process, it adds a new piece (after literally dozens of others in the past few years) to its own portfolio, which already includes unstructured data (via Documentum) and virtualization (via VMware), layered in among the industry-leading storage and information management pieces. Disruptive? You bet. Is EMC finished? I doubt it. Candidates? BI tools, ETL, MDM, data integration come to mind. Losers? At least one big one. Read on.

Will AEP Replace RDBMS? A Dialogue With Charles Brett

Analytic Event Processing (AEP) is hot. But does it mean RDBMS begins to decline in importance? Charles Brett of C3B Consulting and I recently had a quick dialogue about it and came up with different conclusions. That conversation is reproduced here. It’s only the beginning – I hope you will weigh in with your thoughts.

Vertica Projects Leadership, Embraces MapReduce (Sorta)

With the August announcement of Vertica Analytic Database 3.5, Vertica is laying claim to leadership of the new ADBMS vendors. With its most recent numbers – several dozen customers are now in production and the company expects to pass 100 this year – the assertion bears thinking about. Driving forward with an aggressive release strategy, Vertica is showing its maturity and increasing ability to challenge the old school leaders like Teradata and Netezza – but with a software-only strategy. This agility allowed it to offer early support for release 3.5 in quick succession after its last release, with GA scheduled for later this year.

Aster Appliance Elevates MapReduce Chatter, ADBMS Visibility

Since my last post about Aster, the analytic DBMS (ADBMS) vendor has added another arrow to its quiver. Its new MapReduce Data Warehouse Appliance Express Edition starts at $50,000, and includes Aster nCluster on Dell hardware and a copy of MicroStrategy BI software for up to 1 TB of user data, which Aster clearly sees as a sweet spot. (MicroStrategy has been doing a lot of seeding with the ADBMSs lately; it also has an introductory bundling deal with Sybase IQ.) Delivering a ‘compute rich’ appliance on commodity hardware, with reduced operating costs, certainly hits all the right notes. But is 1 TB the sweet spot for MapReduce? I think not – although it makes a great starting point, and that may be Aster’s real opportunity – give ‘em a taste of what SQL plus MapReduce can do, and watch them demand more and more. And sell it to them. Dell and MicroStrategy should love this strategy – if it works.

ParAccel Rocks the TPC-H – Will See Added Momentum

ParAccel, another of the analytic database upstarts, has weighed in on Sun hardware with a record-shattering benchmark that its competitors have thus far avoided – the 30 TB TPC-H. It’s been two years since anyone has published a 30 TB TPC-H, and only 10 of any size (all smaller) have been published in the past year. One can scoff (many do) at this venerable institution, but TPC benchmarks are a rite of passage, and a badge of engineering prowess. The ParAccel Analytic Database (PADB) has set new records, raising its profile dramatically in one fell swoop. PADB came in at 16x the price/performance of Oracle, the prior leader (and the only other vendor willing to tackle the 30 TB benchmark to date). PADB, running on Sun Opteron 2356 servers, Sun Fire™ X4540 storage servers and OpenSolaris™, was 7x faster on queries and 4.6x faster loading the data than the 2-year-old Oracle result. And because of its architecture, the construction and tuning of indexes and partitioning strategies were not needed. TPC rules are specific about having product in GA within 90 days, so one can expect to see PADB version 2.0, on which the benchmark was based, out in Q3.

ParAccel has seen some skepticism in the analyst community because of its relatively small published number of customers. It claims a dozen, and half are listed on its web site. Other vendors, like Vertica and Greenplum, have been very forthcoming in promoting theirs, but both have had more time in the market. PADB was released in Q4 2007 and really began its arc in 2008; Vertica has a year’s head start, and Greenplum even more. Rumors have also floated about whether CTO and founder Barry Zane was leaving. I had a conversation with Barry in late June to discuss the business and the benchmarks. He was clearly excited about the benchmarks, in which he was very involved, even working on the full disclosure report personally – “It got to be like a hobby for me,” he said – and he was quite clear that he is not going anywhere.

