Hadoop Distributions And Kids’ Soccer

The big players are moving in for a piece of the big data action. IBM, EMC, and NetApp have stepped up their messaging, in part to keep upstart startups like Cloudera from cornering the Apache Hadoop distribution market. They are all elbowing one another to get closest to “pure Apache” while still “adding value.” Numerous other startups have emerged, each relying to a greater or lesser degree on the core Apache distribution, and each extending or substituting parts of it. Yahoo! has found a funding partner and spun its team out, forming a new firm called Hortonworks, whose claim to fame begins with an impressive roster of engineers responsible for most of the code in the core Hadoop projects. Think of the Dr. Seuss children’s books featuring that famous elephant, and you’ll understand the name.

While we’re talking about kids – ever watch young kids play soccer? Everyone surrounds the ball. It takes them years to learn their positions on the field and play accordingly. There are emerging alphas, a few stragglers on the sidelines hoping for a chance to play, community participants – and a clear need for governance. Tech markets can be like that, and with 1,600 attendees packing late June’s Hadoop Summit, all of those scenarios were playing out: leaders, new entrants, and the big silent players, like the absent Oracle and Microsoft.


Programmers: Pervasive’s Parallelization Provides Punch, Profit

After 27 years of steady growth, Austin, Texas-based Pervasive (PVSW) has become a software provider with a $47M annual run rate. Its portfolio includes a “zero admin, light footprint database” (the former Btrieve, now PervasiveSQL), data integration software (for SaaS and on-premises applications), and data synchronization products for such apps as salesforce.com, QuickBooks and Microsoft Dynamics CRM. In 2009, it began offering its DataRush processing engine as a product for companies that want to take advantage of multicore architectures to get dramatically better performance from much smaller hardware footprints. DataRush targets data services programming tasks such as aggregation, de-duplication, cleansing, integration, matching and sorting, as well as data mining and predictive analytics.