Symposium Notes – Day Four Returns to Data Security, and to Hadoop

Thursday, the final day, reinforced a theme of the week: data security is heating up, and organizations are not ready. It came up in five of the final day’s 10 meetings.

“Is my data more secure, or less, in the cloud?”

“Does using open source software for data management compromise how well I can protect it?”

“I’m a public utility – can I put meter data in the cloud safely? What if it’s used to drive actions at the edge?”

“I’m using drones for mapping and the data is in the cloud – am I exposed?”

–more–

Symposium Notes – Day Three Features Data Assembly

With 24 meetings under my belt from the first two days at Orlando Symposium, Wednesday’s 13 (and a presentation) didn’t look quite as daunting. It began well, with enough time for a muffin and some tea at 7:30 AM in the analyst workroom near the cubicle I’d spend the day in. Then I launched right into a couple of predictive analytics discussions.

–more–

Symposium Notes – Day Two Jumps in the (Data) Lake

My second day of Symposium 1:1 meetings continued the “security of big data” theme (4 of the day’s 15 conversations – usually, but not always, about HDFS-based data), with a data lake flavor. The concerns were retroactive, often driven by an internal audit: “We built it – now how do we secure it?” is a common question. Clients also confirm “it’s almost all structured data so far,” consistent with what Gartner found in its 2016 big data survey. Vendor conversations (4 of the day’s 1:1s) also touched on security – “How much is this going to matter to my customers? Who can I partner with?” was a typical thread – and I met with a security consultancy whose practice seems to be ramping up rapidly.
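
What does retrofitting security onto an HDFS-based lake actually involve? As one small illustration – a minimal sketch, not a checklist – here is what a Kerberos-secured client interaction can look like in Hadoop’s Java API. The principal, keytab path, and directory are hypothetical, and a real cluster would need matching core-site.xml/hdfs-site.xml settings:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.security.UserGroupInformation;

public class SecureLakeClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Require Kerberos authentication and service-level authorization,
        // mirroring what the cluster's own configuration would declare.
        conf.set("hadoop.security.authentication", "kerberos");
        conf.set("hadoop.security.authorization", "true");
        UserGroupInformation.setConfiguration(conf);

        // Hypothetical principal and keytab for an ETL service account.
        UserGroupInformation.loginUserFromKeytab(
                "etl@EXAMPLE.COM", "/etc/security/keytabs/etl.keytab");

        // Tighten permissions on a (hypothetical) lake directory:
        // owner rwx, group r-x, no world access.
        FileSystem fs = FileSystem.get(conf);
        fs.setPermission(new Path("/data/lake"), new FsPermission((short) 0750));
    }
}
```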

–more–

Prediction Is Hard – Especially About the Future

OK, I admit it – I stole the title from a much smarter man. I thought that man was Yogi Berra, but maybe not – more about that at the end of this post.

Every year, Gartner issues a series of Predicts documents. This year I had the pleasure of doing one for my team on Information Infrastructure Technology. Now, I’m a software guy, and the team I’m on is all software people, so a document assigned to our team would typically be about – well, information software technology. But that would have missed the point rather dramatically, so I connected with a few colleagues and got their OK to use some of their predictions in the small set any document can include.

— more on Gartner blog —

Hadoop Investments Continue: Teradata, HP Jockey For Position

Interest from the leading players continues to drive investment in the Hadoop marketplace. This week Teradata made two acquisitions – Revelytix and Hadapt – that enrich its already sophisticated big data portfolio, while HP made a $50M investment in, and joined the board of, Hortonworks. 4 of the top 5 DBMS players (Oracle, Microsoft, IBM, SAP and Teradata) and 3 of the top 7 IT companies (Samsung, Apple, Foxconn, HP, IBM, Hitachi, Microsoft) have now made direct moves into the Hadoop space. Oracle’s recent Big Data Appliance and Big Data SQL and Microsoft’s HDInsight represent substantial moves to target Hadoop opportunities, and this week’s Teradata and HP announcements show neither wants to be left behind.

–more–

Aspirational Marketing and Enterprise Data Hubs

In the Hadoop community there is a great deal of talk of late about its positioning as an Enterprise Data Hub. I call this “aspirational marketing”: it describes the ambition its advocates have for how Hadoop will be used once it realizes a vision of capabilities still in early development. There’s nothing wrong with that, but it does need to be kept in perspective. It’s a long way off.

–more–

Hadoop Summit Recap Part Two – SELECT FROM hdfs WHERE bigdatavendor USING SQL

Probably the most widespread, and commercially imminent, theme at the Summit was “SQL on Hadoop.” Since last year, many offerings have been touted, debated, and some have even shipped. In this post, I offer a brief look at where things stood at the Summit and how we got there. To net it out: offerings today range from the not-even-submitted to GA – if you’re interested, a bit of familiarity will help. Even more useful: patience.
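
For readers who haven’t watched this space, here is a minimal sketch of what “SQL on Hadoop” looks like from the client side, using Hive’s JDBC driver as one example that has actually shipped; the host, credentials, table, and query are hypothetical:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SqlOnHadoop {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC endpoint; host, port, user, and table
        // names here are all hypothetical.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hive-host:10000/default", "analyst", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT vendor, COUNT(*) AS mentions " +
                 "FROM summit_sessions GROUP BY vendor")) {
            while (rs.next()) {
                System.out.printf("%s: %d%n", rs.getString(1), rs.getLong(2));
            }
        }
    }
}
```

The appeal of the category is exactly this familiarity: standard JDBC, standard SQL, with HDFS underneath.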

–more–

That Exciting New Stuff? Yeah… Wait Till It Ships.

A brief rant here: I am asked with great frequency how this RDBMS will hold off that big data play, how data warehouses will survive in a world where Hadoop exists, or whether Apple is done now that Android is doing well. There is a fundamental fallacy implicit in these questions.

–more–

2013 Data Resolution: Avoid Architectural Cul-de-Sacs

I had an inquiry today from a client using packaged software for a business system built on a proprietary, non-relational datastore (in this case, an object-oriented DBMS). They have an older version of the product, having “failed” with a recent upgrade attempt.

The client contacted me to ask about ways to integrate this OODBMS-based system with others in their environment. They said the vendor-provided utilities were poor and hard to use, and the vendor has given them no confidence that this will improve. The few staff programmers who have learned enough of the internals have already built a number of one-off connections using multiple methods, and the client was looking for a more generalizable way to create a layer other systems can use when they need data from the underlying database. They expect more such requests, and foresee chaos, challenges in hiring and retaining people with the right skills, and cycles of increasing cost and operational complexity.
My reply: “You’re absolutely right.”
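
What would that generalizable layer look like? A minimal sketch, with all names hypothetical: a single facade contract that other systems code against, so OODBMS expertise is needed only behind the interface rather than in every consumer.

```java
import java.util.List;
import java.util.Map;

/**
 * A sketch of the kind of access layer the client described: one facade
 * that other systems call, hiding the OODBMS internals behind a small,
 * stable contract. All names here are hypothetical.
 */
public interface BusinessDataGateway {

    /** Fetch one business object by key, as a neutral key/value map. */
    Map<String, Object> findById(String entityType, String id);

    /** Run a named, pre-approved query rather than exposing raw internals. */
    List<Map<String, Object>> runNamedQuery(String queryName,
                                            Map<String, Object> params);
}
```

Consolidating the existing one-off connections behind a contract like this means only the gateway’s single implementation needs the scarce internals expertise.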

Amazon Redshift Disrupts DW Economics – But Nothing Comes Without Costs

At its first re:Invent conference in late November, Amazon announced Redshift, a new managed service for data warehousing. Amazon also offered details and customer examples that made AWS’s steady inroads into mainstream enterprise application acceptance very visible.

Redshift is made available via MPP nodes of 2 TB (XL) or 16 TB (8XL), running ParAccel’s high-performance columnar, compressed DBMS, and scaling to 100 8XL nodes, or 1.6 PB of compressed data. XL nodes have 2 virtual cores and 15 GB of memory; 8XL nodes have 16 virtual cores and 120 GB of memory and operate on 10 Gigabit Ethernet.

Reserved pricing (the more likely scenario, involving a commitment of 1 or 3 years) is set at “under $1000 per TB per year” for a 3-year commitment, combining upfront and hourly charges. Continuous, automated backup for up to 100% of the provisioned storage is free. Amazon does not charge for data transfer into or out of the data clusters. Network connections, of course, are not free – see Doug Henschen’s InformationWeek story for details.
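
To make the arithmetic concrete, a quick back-of-envelope using only the figures quoted above (a sketch, not a quote from Amazon):

```java
public class RedshiftBackOfEnvelope {
    public static void main(String[] args) {
        // Figures from the announcement: 16 TB per 8XL node, up to 100
        // nodes, and "under $1000 per TB per year" on a 3-year reservation.
        int nodes = 100;
        int tbPerNode = 16;
        int usdPerTbYear = 1000;          // upper bound from the quote

        int totalTb = nodes * tbPerNode;  // 1600 TB = 1.6 PB
        int usdPerYear = totalTb * usdPerTbYear;

        System.out.printf("Capacity: %d TB (%.1f PB)%n", totalTb, totalTb / 1000.0);
        System.out.printf("Ceiling cost: under $%,d per year%n", usdPerYear);
    }
}
```

In other words, a maxed-out 100-node 8XL cluster holds 1.6 PB of compressed data and, at the quoted ceiling, would run under $1.6M per year on a 3-year reservation.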

This is a dramatic thrust in pricing, but it does not come without trade-offs.

–more–