IBM Ends Hadoop Distribution, Hortonworks Expands Hybrid Open Source

by Merv Adrian, Research VP, Gartner | June 21, 2017

IBM has followed Intel and EMC/Pivotal in abandoning efforts to make a business of Hadoop distributions, and followed Microsoft in making Hortonworks its supplying partner. At the former Hadoop Summit, now called Dataworks (itself a sign of the shift from Hadoop-centric positioning), IBM announced it will discontinue its IBM Open Platform/BigInsights offering, and will instead OEM Hortonworks’ HDP. In a 7-year agreement, IBM will provide its Data Science Experience to Hortonworks, which will make it a part of the marketed HDP distribution. Hortonworks CEO Rob Bearden noted in a discussion with analysts that “we were already shipping the same bits” and that the ODPi relationship they share had helped to make the commonalities more obvious and easily rationalized (though the ODPi platform is still a small subset of the typical stack used by customers).

Like Hortonworks’ earlier Enterprise Data Warehouse Optimization offering, this new packaging strategy will permit the firm to combine its open source components with partner pieces that are not open source. Hybrid open source combines “the free stuff” everyone shares with proprietary bits that help differentiate and monetize the whole package, the theory goes, and clearly both vendors hope the go-to-market strategy will play in Peoria as well as it does in the Silicon Valley venture capital community.

The extra-cost, non-Apache piece here is IBM’s Big SQL, which IBM calls the ultimate hybrid engine, permitting concurrent, optimized use of Hive, HBase, Spark and other sources using a single database connection. This echoes the Pivotal exit last year – Pivotal’s HAWQ was supposed to blunt the performance advantage of Cloudera’s Impala over Hive. It didn’t happen. Hortonworks has had to keep plowing R&D into Hive. Whether Big SQL will be different is hardly clear – IBM has not made convincing progress in the market with it.

The two firms will also up the ante on the data governance front, which both believe will be a key driver of demand in mainstream firms in the months ahead. They will advance the integration of IBM BigIntegrate, BigQuality and Information Governance Catalog with the Apache Atlas project Hortonworks has spearheaded. Being the go-to providers for a large base of IBM customers who are already relying on IBM for much of their governance stack and hoping to extend its coverage to new data appears promising. The deal signals that the elevation of governance and security as issues will not only continue, but be increasingly pursued by players with real credibility and experience with both. The huge majority of Hadoop adopters who are currently stalled in attempting to get to broad production use will need to deal with those issues, as well as the serious skills and performance gaps they struggle with today, if and when their pilots and first projects gain internal acceptance.

For IBM, the partnership will permit redirecting its internal resources from BigInsights to work on machine learning, Spark and governance. IBM likely had nearly as many developers as customers on BigInsights, and many of its users were apparently given the offering as part of larger deals – usage stories have been few and far between, despite some suggestions in the press (though not directly from IBM) of hundreds of users. Both companies can focus their story higher up the stack and continue the evolution of the Hadoop narrative from “components and platforms” to “use cases and solutions.” Will that drive footprint expansion in today’s customer base, and break down the barriers Hadoop has been bouncing off? It remains to be seen.

Such packages demonstrate what Gartner calls the “disaggregation” of the Hadoop stack – the recognition that building a solution requires both more and less than the Hadoop distribution itself. This is a commercial dilemma for the pure plays – and is one of the reasons their numbers are shrinking even as they move away from being “Hadoop companies” to something else.

There are other challenges ahead for Hortonworks – they’ll have to accommodate and convert some IBM customers, and maybe convince them to pay. Moreover, the IBM deal does not help them in the cloud much. Their Microsoft partnership will continue to do that, but the choice between those two partners will be another interesting challenge for the Hortonworks sales force to navigate in the months ahead.

Merv Adrian is an Analyst on the Data Management team following operational DBMS, Hadoop, Spark, NoSQL and adjacent technologies. Mr. Adrian also tracks the increasing impact of open source on data management software and monitors the changing requirements for data security in information platforms.

Originally published here.
