Strata Standards Stories: Different Stores For Different Chores

Has HDFS joined MapReduce in the emerging “legacy Hadoop project” category, continuing the swap-out of components that once answered the question “what is Hadoop?” Data stores were certainly a focus at Strata/Hadoop World in New York, O’Reilly’s well-run, well-attended, and always impactful fall event. The limitations of HDFS, including its append-only nature, have become inconvenient enough to push the community to “invent” something DBMS vendors like Oracle built decades ago: a bypass. After some pre-event leaks about its arrival, Cloudera chose its Strata keynote to announce Kudu, a new column store written in C++ that bypasses HDFS entirely. Kudu will use an Apache license and will be submitted to the Apache process at some undetermined future time.


Hadoop Projects Supported By Only One Distribution

The Apache Software Foundation has succeeded admirably in becoming a place where new software ideas are developed: today over 350 projects are underway. The challenges for the Hadoop user are twofold: deciding which projects might be useful in big data use cases, and determining which are supported by commercial distributors. In Now, What is Hadoop? And What’s Supported? I list ten supported by only one distribution: Atlas, Calcite, Crunch, Drill, Falcon, Kite, LLAMA, Lucene, Phoenix and Presto. Let’s look at them a little more closely.


Strata Spark Tsunami – Hadoop World, Part One

New York’s Javits Center is a cavernous triumph of form over function. Giant empty spaces were everywhere at this year’s empty-though-sold-out Strata/Hadoop World, but the strangely numbered, hard-to-find, typically undersized rooms were packed. Some redesign will be needed next year, because the event was huge in impact and demand will only grow. A few of those big tent pavilions you see at Oracle OpenWorld or Dreamforce would drop into the giant halls without a trace – I’d expect to see some next year to make more usable space available.

So much happened, I’ll post a couple of pieces here. Last year’s news was all about promises: Hadoop 2.0 brought the promise of YARN enabling new kinds of processing, and there was promise in the multiple emerging SQL-on-HDFS plays. The Hadoop community was clearly ready to crown a new hype king for 2014.

This year, all that noise had jumped the Spark.

— This post is continued on my Gartner blog —

Amazon Redshift Disrupts DW Economics – But Nothing Comes Without Costs

At its first re:Invent conference in late November, Amazon announced Redshift, a new managed service for data warehousing. Amazon also offered details and customer examples that made AWS’s steady inroads toward enterprise, mainstream application acceptance very visible.

Redshift is made available via MPP nodes of 2TB (XL) or 16TB (8XL), running ParAccel’s high-performance columnar, compressed DBMS, and scaling to 100 8XL nodes, or 1.6PB of compressed data. XL nodes have two virtual cores and 15GB of memory, while 8XL nodes have 16 virtual cores and 120GB of memory and operate on 10 Gigabit Ethernet.
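As a quick sanity check on those numbers (an illustrative back-of-envelope sketch, not anything from the announcement itself), the 1.6PB ceiling follows directly from the node sizes:

```python
# Back-of-envelope arithmetic for the Redshift figures quoted above.
node_storage_tb = {"XL": 2, "8XL": 16}

max_8xl_nodes = 100
max_capacity_tb = max_8xl_nodes * node_storage_tb["8XL"]

print(max_capacity_tb)         # 1600 TB of compressed data
print(max_capacity_tb / 1000)  # 1.6 PB, matching the announced ceiling
```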

Reserved pricing (the more likely scenario, involving a commitment of one or three years) is set at “under $1000 per TB per year” for a three-year commitment, combining upfront and hourly charges. Continuous, automated backup for up to 100% of the provisioned storage is free. Amazon does not charge for data transfer into or out of the data clusters. Network connections, of course, are not free – see Doug Henschen’s InformationWeek story for details.
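To make the pricing concrete, here is a hypothetical example using the quoted “under $1000 per TB per year” figure as an upper bound (the cluster size is my invention for illustration; actual reserved rates combine upfront and hourly charges and are Amazon’s to set):

```python
# Hypothetical cost ceiling for a 10-node 8XL reserved cluster at the
# quoted "under $1,000 per TB per year" three-year rate. Illustrative only.
price_per_tb_per_year = 1000   # upper bound from the announcement
nodes = 10                     # hypothetical cluster size
tb_per_node = 16               # 8XL node capacity

provisioned_tb = nodes * tb_per_node
annual_ceiling = provisioned_tb * price_per_tb_per_year
print(provisioned_tb)   # 160 TB provisioned
print(annual_ceiling)   # under $160,000 per year, before network charges
```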

This is a dramatic thrust in pricing, but it does not come without giving up some things.


Cloudera-Informatica Deal Opens Broader Horizons for Both

Cloudera’s continuing focus on the implications of explosive data growth has led it to another key partnership, this time with Informatica. Connecting to the dominant player in data integration and data quality expands the opportunity for Cloudera dramatically; it enables the de facto commercial Hadoop leader to find new ways to empower the “silent majority” of data. The majority of data is outside; not just outside enterprise data warehouses, but outside RDBMS instances entirely. Why? Because it doesn’t need all the management features database management software provides – it doesn’t get updated regularly, for example. In fact, it may not be used very often at all, though it does need to be persisted for a variety of reasons. I recently mentioned Cloudera’s success of late; it’s going to be challenged by some big players in 2011, notably IBM, whose recent focus on Hadoop has been remarkably nimble. So these deals matter. A lot. The Data Management function is being refactored before our eyes; both these vendors will play in its future. Read more of this post

Living in the Present is SO Yesterday

It’s an occupational hazard of living in the future that analysts can begin to ignore the present – unless we make it a practice to seek it out. Here in the Valley, that can be difficult, when being a week behind the latest version of something the rest of the world hasn’t heard of yet equates to being a Luddite. That can lead to AADD (analyst attention deficit disorder). Read more of this post

How the Cloud Will Lead Us to Industrial Computing

From Judith Hurwitz, president, Hurwitz & Associates

I spent the other week at a new conference called Cloud Connect. Being able to spend four days immersed in an industry discussion about cloud computing really allows you to step back and think about where we are with this emerging industry. While it would be possible to write endlessly about all the meetings and conversations I had, you probably wouldn’t have enough time to read all that. So, I’ll spare you and give you the top four things I learned at Cloud Connect. I recommend that you also take a look at Brenda Michelson’s blogs from the event for a lot more detail. I would also refer you to Joe McKendrick’s blog from the event. Read more of this post

Judith Hurwitz Comments on Cloud Impact on HW Biz

My longtime friend and colleague Judith Hurwitz and I have decided to cross-post on one another’s blogs, and I’m delighted to have her here. For me, this is another step in the continuing evolution of the loosely coupled independent analyst collaborations I find myself participating in more and more, and a very exciting development. Welcome, Judith!


I am thrilled to be contributing my “cloudy” observations to your blog. I have been an analyst and consultant focusing on distributed software; I look at everything from service-oriented architecture and service management to information management. My philosophy is that cloud computing, in all its iterations, is the future of a significant portion of enterprise software. – Judith Hurwitz, President, Hurwitz & Associates

I thought I would provide my thoughts on the future of hardware in the context of where software is headed.

It is easy to assume that the excitement around cloud computing would put a damper on the hardware market. But I have news for you: I am predicting that over the next few years hardware will be front and center. Why would I make such a wild prediction? Here are my three reasons: Read more of this post

Informatica Passes Half-Billion Mark, Buys Siperian, Targets Cloud

Informatica has announced another, long-rumored acquisition: Siperian, thus continuing a steady march toward a comprehensive portfolio play. In 2009, its strong growth path made it the clear independent leader in data integration. With Release 9, its vision of a data integration platform grew to providing a comprehensive approach to everything from data discovery services to data quality. While growth slowed during a tough year for the economy overall, Informatica grew revenue in every quarter, made key acquisitions in three successive quarters (Applimation, AddressDoctor and Agent Logic), and began to make significant moves into the cloud via partnerships with Amazon and others. Agent Logic added event detection and processing to support real-time alerting and response. As 2010 begins, this latest move is synergistic from the outset; Rob Karel points out in his excellent blog post that “Siperian MDM technology…already is deeply integrated with Informatica’s identity resolution and postal address technology. In addition…Siperian MDM customers [are] using Informatica for data integration and data quality, meaning there is a lot of existing experience and know-how on integrating Informatica’s portfolio with Siperian.” Read more of this post

Xkoto’s Database Virtualization Expands Cloud Opportunities

Xkoto, the database virtualization pioneer, has generated substantial interest since its first deployments in 2006. Still privately held and in investment mode, Xkoto sees profitability on the horizon, but offers no target date, and appears in no hurry. Its progress has been steady: in early 2008, a B round of financing led by GrandBanks Capital allowed a step up to 50 employees as the company crossed the 50-customer mark. 2008 also saw Xkoto adding support for Microsoft SQL Server to its IBM DB2 base. Charlie Ungashick, VP of marketing for Xkoto, says that 2009 has been going well, and the third quarter was quite strong. And at the end of September 2009, Xkoto announced GRIDSCALE version 5.1, which adds new cluster management capabilities to its active-active configuration model, as well as Amazon EC2 availability. Read more of this post

