RIP Big Data
By Carl Olofson, Research Vice President, Data Management Software Research, IDC
When the design of computers and networks reached a critical state in terms of memory size, processor capacity, and speed of inter-system communication, and when the cost of hardware had dropped to a point where most enterprises could afford systems that could, collectively, process petabytes of data, the door was opened for a new era in data collection, management, and analysis. As a result, a wave of technologies appeared, some new, some enhancements of older forms, that enabled us to collect and process data affordably at higher volume, velocity, and variety than ever before (IDC added a fourth “v”, “value”, to highlight the affordability aspect). We needed a term for this set of technologies: Big Data.
Over the past 4-5 years, as this technology has matured, developers have come up with an amazing variety of applications. Some of these have been quite practical. Others fanciful. Still others have been proof-of-concept applications that demonstrate the power of the technology without doing anything actually useful. We have treated Big Data technologies as an undifferentiated hodge-podge, and the distributors and lead committers to the projects (most of the new Big Data technologies are open source) have made exotic claims of being all things for all people. If you took seriously what many vendors in this space were saying, you would think Oracle, IBM, Microsoft, SAP, and Teradata should all just pack it in now.
Over the past couple of years, and especially during 2015, however, this has all changed. As users start applying the technologies to practical tasks, the vendors of these technologies have become more business-focused. As they have done so, they have concentrated more on the core strengths of their products, rather than attempting to displace existing DBMSs, especially where well-established workloads, such as back-office transaction processing or strategic decision support through data warehousing, are concerned.
The truth is that most of the opportunities for the success of these emerging Big Data technology vendors lie in new workloads that didn’t exist, or at least were not so sophisticated and demanding, a few years ago. These include Web-based retail at a much higher level of sophistication in terms of product recommendation and user profiling than in the past. They also include online gaming, machine-generated data analysis, social media data analysis, and management of data across millions of instances of apps on smart hand-held devices, including smart phones and tablets.
So, if Big Data is getting more practical and valuable, why RIP? Not because there’s anything wrong with the technology it describes, but because the term itself is becoming obsolete. A few years ago, these technologies and use cases were so new that we needed an umbrella term to capture them all. Now, we are getting down to cases, and the term is less useful. For instance, document-oriented DBMSs like MongoDB and Couchbase are commonly used for operational data management involving Web retail, online gaming, and similar applications; wide column stores like Cassandra are commonly used for scalable and highly variable analytics; and Hadoop is used for collecting, curating (or “wrangling”), and distributing extracts of large amounts of highly variable data. These products are no longer lumped together as Big Data products, but are viewed as products that solve real business problems, and are classified by the problems they solve.
Users and implementation consultants are increasingly combining these products with each other, and with “conventional” database technologies such as transactional and analytic relational DBMSs, to create end-to-end solutions. In 2016, we can expect this trend to accelerate. We can also expect, and this trend is already underway, that NoSQL vendors will extend the functionality of their existing products. The term du jour for this is multi-modal operation, and all the leading vendors in this space are claiming this capability. At the same time, however, they are also touting their integration capabilities with other technologies, especially for data warehousing.
It is likely that, in the end, enterprises will turn to well-managed combinations that include Hadoop (or something like it; Apache Spark is increasingly favored for this work, and Spark can run on Hadoop, but does not require it), document-oriented DBMS, graph DBMS, and wide column stores, together with scalable, memory-optimized RDBMS.
Some examples of how these technologies may be used include the following:
• Hadoop will be used for initial large scale data ingestion, data organization, filtering, and distribution.
• Document-oriented databases will handle Web-based operational data for cloud applications, especially those that are customer facing or user-intimate, as well as coordination of data from the Internet of Things (IoT).
• Graph databases will be used for relationship pattern discovery (leading to predictive analytics), cognitive computing, and fraud detection.
• Wide column stores will be used for dynamic operational decision support, especially with respect to marketing campaigns and customer trend analysis.
• Memory-optimized RDBMSs will be used for combined analytical transaction processing (transactions that are driven by the results of analytic queries against live operational data) as well as classic data warehousing.
These technologies will be knit together in solution frameworks for practical use. As we focus more and more on these discrete technologies and their roles in the big picture, the term Big Data will become a quaint relic of a simpler time.
Carl Olofson has performed research and analysis for IDC since 1997, and manages IDC’s Database Management Software service, as well as advising and guiding the Data Integration Software service. Mr. Olofson’s research involves following sales and technical developments in the structured data management (SDM) markets, including database management systems (DBMS), database development and management software, and data integration and access software, including the vendors of related tools and software systems. Mr. Olofson also contributes to the Big Data Overview report series and provides specialized coverage of Hadoop and other Big Data technologies. Mr. Olofson advises clients on market and technology directions as well as performing supply and demand-side primary research to size, forecast, and segment the database and related software markets. In 2000, Mr. Olofson received IDC’s highest award, the James Peacock Memorial Award for professional excellence in market research.
Mr. Olofson has worked in the IT industry since 1978, including two years of application development consulting, 10 years of database and tools software development, four years of product consulting, and three years as a senior product manager.
Mr. Olofson has been quoted in a variety of publications, including The New York Times, The Wall Street Journal, San Francisco Chronicle, Investor’s Business Daily, San Jose Mercury-News, ZDNet, USA Today, and Computerworld. He has presented at numerous conferences, including Teradata PARTNERS in 2013 and 2014, Percona LIVE 2014, IBM Insight 2014, IBM’s Information On Demand 2008 and 2013, and numerous others. Mr. Olofson received a B.S. from Boston University.