Looking back at Big Data in 2015
By Cynthia M. Saracco, IBM Senior Solution Architect
As 2015 draws to a close, I find myself occasionally looking back at the year’s highlights and reflecting on what’s changed — and what hasn’t — in the world of Big Data. Here, in no particular order, are several conclusions I’ve reached based on my work at IBM and my interactions with customers, partners, and other third parties this year. While my views are admittedly biased by these activities, perhaps one or more of these topics will resonate with you.
- IBM bets big on Big Data and analytics. Starting with a major corporate reorganization in January that created a new Analytics business unit, IBM began boosting its investment in Big Data and analytics through acquisitions, alliances, and technology development. Examples include its plans to acquire The Weather Company and use weather data for industry-specific analytic solutions, its delivery of a new cloud service for Twitter data (which can be leveraged from Watson Analytics), and a new architecture for BigInsights, its Big Data platform based on Apache Hadoop, which now features separate offerings for data analysts, data scientists, and system administrators atop a foundation of common open source components.
- Solution users and providers form alliance to advance Hadoop. While initial adopters of Hadoop tolerated the technology’s inevitable growing pains in its early days, this year brought increased emphasis on stability and compatibility through the formation of ODPi, an industry effort to define, test, and certify a core set of Big Data open source projects. The initial focus of this effort involves Apache Hadoop (including HDFS, YARN, and MapReduce) and Apache Ambari. ODPi’s membership spans both end users and solution providers, IBM among them.
- The allure of Spark grows among potential users. Apache Spark isn’t new, but interest in it rose noticeably this year.
Spark’s performance characteristics and popular built-in libraries (for machine learning, streams computing, SQL access, and other areas) haven’t escaped prospective users. Indeed, at nearly every customer briefing and workshop I delivered, people wanted to hear something about Spark. Some members of the trade press and analyst community noticed this trend, too. Spark became a key Big Data initiative for IBM this year, as evidenced by the recent opening of its Spark Technology Center in San Francisco, active technical participation in Spark-related efforts, and recent donation of its SystemML technology to Spark.
- Hadoop integration emerges in enterprise data strategies. Perhaps it’s just a by-product of the maturation of Hadoop use within various enterprises, but more and more firms are assessing — and implementing — technologies for integrating Hadoop with enterprise software. Data movement (ranging from simple data transfers to large-scale extract/transform/load jobs) and dynamic data retrieval are two broad areas where customers often focus.
Indeed, at the recent Insight 2015 conference, popular sessions included a customer presentation on query federation and data movement between Hadoop and non-Hadoop data sources, a collection of use cases on integrating mainframe data with BigInsights, an introduction to Big SQL (which offers SQL query federation, data access to Spark programmers, and support for popular business intelligence tools), and demos of IBM BigInsights BigIntegrate.
- Big Data skills remain in short supply. Finding data scientists and Big Data specialists (including Hadoop and Spark programmers) continues to be a tough task for hiring managers. Some firms are investing in internal training efforts (e.g., publishing best practices and sample code on intranet sites), while others are turning to competitive hires or service providers to fill the gaps. Fortunately, the gap hasn’t gone unnoticed by students and software professionals, who seem enthusiastic about enrolling in online courses, attending MeetUps and industry conferences, and experimenting with free Hadoop or Spark downloads. For example, membership in Big Data Developer MeetUps more than tripled this year, and the average number of daily visits to IBM’s web site for Hadoop developers increased by more than 50%.
What’s in store for Big Data? Most likely, technologies in this space will continue to evolve rapidly, and early adopters will continue to push the boundaries of what’s possible, driving vendors to do the same. Ultimately, though, the most effective Big Data technologies will be those that enable their users to quickly and easily analyze and derive value from the wide range of internal and public domain data available today.