When the dust clears… Is it clearing, by the way?
Authors:
Gilbert Harrus, Founder, Antegria (gilbert.harrus@antegria.com),
Laurence Hubert, CEO, Hurence (laurence.hubert@hurence.com)
We will not try here to define “Big Data”, nor to explore the various aspects the term encompasses.
The Wikipedia article [https://en.wikipedia.org/wiki/Big_data] gives a perfect entry point to the topic, also putting it in a historical context (by history, we mean 10-15 years!).
The hype about “Big Data” started a few years back: six years at least, maybe more, since the authors have known the term at least that long. Gartner’s Hype Cycle for Emerging Technologies started listing Big Data in July 2011, with an estimated 2 to 5 years to mainstream adoption.
At some point, several critical masses were reached that could change the rules of the game: storage capacity and architecture (Hadoop-based solutions, NoSQL approaches to databases), and the availability of data and computing power to apply predictive analytics with reasonable effort. This period opened the door to the usual extreme claims and inflated expectations; as usual with hype, a race started among established providers, new entrants, start-ups, etc., whether they offered software solutions, services, consulting, or the design of new solutions.
Since then, a lot has happened: many companies have invested significant time and effort in Big Data related domains, and some experiments or full-scale deployments have delivered results (ROI).
The latest Hype Cycle, from 2014, places Big Data in the Trough of Disillusionment, with an estimated 5 to 10 years to adoption [http://www.gartner.com/newsroom/id/2819918].
While companies are adopting Big Data or increasing their efforts in it, the technology offering continues to evolve. For example, the term NoSQL covers a large number of DBMSs that relax the traditional constraints of relational DBMSs, among them HBase, Cassandra, MongoDB, DynamoDB, Couchbase, and many others.
Moreover, traditional DBMS providers are extending their own offerings toward Big Data.
In an attempt to keep data processing on their appliances (rather than see Hadoop win it all), DBMS providers have opened their SQL engines to the NoSQL world.
In particular, they now allow queries on Hadoop data from their traditional distributed databases (IBM BigSQL, Oracle Big Data SQL, Pivotal HAWQ, Teradata QueryGrid: SQL-H). Some extend their federation beyond Hadoop (for example, Teradata QueryGrid with MongoDB).
Reading the Wikipedia article on NoSQL quickly gives a sense of how complex the NoSQL ecosystem is.
The careful reader of the previous paragraphs will also object that not all of these names designate the same category of tools: for example, how would one compare IBM BigSQL with Cassandra? The marketing hype around Big Data does not help clarify matters.
On the architectural side, things have evolved too: beyond Hadoop MapReduce with Java and Pig, which process batch jobs only, new architectural approaches open the way to stream processing, e.g., with Spark, an in-memory massively parallel processing framework, or with Storm, a distributed real-time computation system.
This evolution makes it possible to dream of real-time BI dashboards that would integrate all of a company’s indicators. We are not there yet, though: the technical complexity of Big Data tools does not help; new specialized professionals are needed; and, moreover, companies still have to build their capabilities in that domain (internal processes and supporting workflows).
Since the Big Data trend emerged, a number of large companies have created high-level Big Data positions. Will CIOs become CDOs (Chief Data Officers)? Maybe, but not so fast. The CIO role emerged in the 1990s, during the transformation of the IT industry, when companies realized that information and knowledge management could give them a competitive advantage; as usual, that advantage then became a “must have”. What looked like hype eventually proved to be a good, solid initiative with a proven ROI. Will the same happen with this new trend?
The authors believe so, the first step being a radical, albeit slow, evolution of the concept of the data warehouse.
We are going through a transformation similar to the one of the 1990s. One aspect of it is that open, non-proprietary solutions will play an even larger role than before. Complexity is much higher than in the past, so relying on a set of proprietary solutions will be even riskier than it was ten years ago.
Today’s CIOs are also more aware of this risk. Finally, a recent strategy+business study [The Imagination Gap, 2015/04/20] points out that another, seemingly simpler transformation has not yet been fully embraced by large and important industry sectors.
It is also extremely difficult to get a clear view of the pace of adoption of Big Data stacks. Some companies already benefit from these technologies but, because of the type of data being processed, little or no information gets out.
Not only the type of data being processed, but also the kind of processing applied and the expected results, are trade secrets. From that point of view, the dust will not clear soon: Big Data is, above all, sensitive data.
So, to answer the question in the title of this note: the dust may be clearing, but a lot will happen before it has cleared completely. Big Data will evolve again and again. Do not expect convergence or standardization yet, but do expect many calls to rally behind one standardization effort or another (as usual, several will emerge). Technologies will keep evolving, through either new implementations or “Darwinian” evolution of existing ones. Many companies will compete to bring solutions to their customers, while also learning about newly emerging capabilities.
In the meantime, the best approach for companies is to learn, plan, and possibly act to prepare for the change. As in previous IT transformations, the potential ROI for the company or organization should drive any large expenditure, while failing to invest in the learning and planning phases could be very detrimental.