From Classical Analytics to Big Data Analytics
by Peter Weidl, IT-Architect, Zürcher Kantonalbank
The longer you live in the past, the less future you have to enjoy.
This sentence summarizes the current dilemma of many companies regarding their capability to analyze and manage data. Huge amounts of money have been invested, for example, in data warehouse and business intelligence solutions. The discussions about Big Data tell us that there is a joyful future with many new business opportunities, but also new investments. This article gives an overview of the major gaps between classical analytics and analytics based on Big Data technology. Additionally, it lists some ideas on how to start living in the Big Data future.
The gaps are analyzed along five dimensions: Data, Compliance, Process, People and Technology. For each dimension, a brief description of the new challenges with Big Data follows.
Data: There are at least three new aspects of data in the Big Data era. The first one is described by the well-known three Vs: Velocity, Variety and Volume. Thinking in terms of classic data warehousing, we are not prepared for these Vs, especially the "bigger" they get.
Secondly, having enough data about data, i.e. metadata, is the next demanding issue when you want to use data correctly. Many efforts have been made so far to establish common semantic layers within companies or industries. These layers help business and IT staff to understand and find the data they need. Clearly, the more data comes from different sources, the more important common semantic layers become. And the number of sources keeps growing when you look at mobile devices, sensors embedded in things, cloud solutions or social media.
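To make the idea of a common semantic layer concrete, here is a minimal sketch in Python: business terms are mapped to the technical locations that carry them, so that business and IT staff can look data up by its business meaning. All system, table and field names are invented for illustration.

```python
# Minimal sketch of a semantic layer: each business term points to the
# technical fields that carry it in different source systems.
# All names are illustrative, not taken from a real landscape.
SEMANTIC_LAYER = {
    "customer_revenue": [
        {"system": "dwh", "table": "fact_sales", "column": "net_revenue"},
        {"system": "crm_cloud", "object": "Account", "field": "AnnualRevenue"},
    ],
    "customer_id": [
        {"system": "dwh", "table": "dim_customer", "column": "cust_id"},
        {"system": "crm_cloud", "object": "Account", "field": "Id"},
    ],
}

def find_sources(business_term):
    """Return every technical location that carries a business term."""
    return SEMANTIC_LAYER.get(business_term, [])

print(find_sources("customer_revenue"))  # two sources carry this term
```

The more sources feed the analytics landscape, the more valuable such a lookup becomes, because analysts no longer have to know every system's naming conventions.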
Finally, data has to be categorized with respect to trust. Each use case needs data that satisfies a certain quality level. This categorization is not new, but it is a lot more demanding with Big Data.
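One way to operationalize such trust categorization is an ordered trust scale attached to each source, against which a use case declares the minimum quality level it can accept. The categories and source names below are assumptions for illustration only.

```python
from enum import IntEnum

class Trust(IntEnum):
    """Illustrative trust categories, ordered from lowest to highest."""
    SOCIAL_MEDIA = 1   # unverified external data
    SENSOR = 2         # raw but machine-generated
    CURATED_DWH = 3    # cleansed, governed warehouse data

# Hypothetical source registry with one trust label per source.
SOURCES = [
    {"name": "twitter_feed", "trust": Trust.SOCIAL_MEDIA},
    {"name": "iot_gateway", "trust": Trust.SENSOR},
    {"name": "sales_mart", "trust": Trust.CURATED_DWH},
]

def sources_for_use_case(min_trust):
    """Return the sources whose trust level satisfies a use case's minimum."""
    return [s["name"] for s in SOURCES if s["trust"] >= min_trust]

print(sources_for_use_case(Trust.SENSOR))  # ['iot_gateway', 'sales_mart']
```

A sentiment analysis might accept Trust.SOCIAL_MEDIA, while a regulatory report would insist on Trust.CURATED_DWH; the label makes that decision explicit.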
Compliance: First of all, it is indispensable to comply with the law when you use data about persons or institutions. By combining data from many internal and external sources with ever more "clever" analytical tools, Big Data technologies threaten privacy in general. Existing regulations on the use of customer data must be extended to protect privacy; the exploitation of such private data is far ahead of the applicable regulations. But this is not only a gap in regulation: each company needs some kind of committee to define and control ethical principles for the use of customer data.
Another aspect of compliance is satisfying the growing number of regulations. In the financial industry, for example, national and international regulations imply increasingly complex data and product management: offer a product only to customers who satisfy certain preconditions, and make sure that the customer receives comprehensible and complete information before a contract is agreed. How can one guarantee that all data important for a customer's decision is available to that customer? Checklists and disclaimers are a growing market.
Process: The traditional process of analyzing data, based on a data warehouse infrastructure, is mainly batch-oriented and not very tolerant of change. New business requirements, such as providing additional data to answer a certain question, need strong support from IT staff. The new process can be called explorative data analysis. In simple terms: first identify the relevant data sources, then bring the data needed onto an appropriate analysis platform, and finally let the business people analyze the data autonomously. My assumption is that at most 20% of analysis use cases demand an explorative analysis; the traditional analysis process still covers most business requirements. In addition, established disciplines like enterprise information management and information governance must be adjusted to handle Big Data.
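The three steps of the explorative process — identify sources, bring the data onto an analysis platform, let business people query it autonomously — can be sketched with an in-memory database. The table and figures are invented for illustration; in practice the platform would be a sandbox or data lab rather than SQLite.

```python
import sqlite3

# Steps 1 and 2: data from the identified sources is loaded onto a
# throwaway analysis platform, here an in-memory SQLite database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE payments (customer TEXT, channel TEXT, amount REAL)")
con.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)",
    [("A", "mobile", 120.0), ("A", "branch", 80.0), ("B", "mobile", 200.0)],
)

# Step 3: business analysts explore the data with ad-hoc queries,
# without waiting for an IT change request.
rows = con.execute(
    "SELECT channel, SUM(amount) FROM payments GROUP BY channel ORDER BY channel"
).fetchall()
print(rows)  # [('branch', 80.0), ('mobile', 320.0)]
```

The point is the short feedback loop: a new question becomes a new query, not a new requirement handed to the warehouse team.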
People: Good change management is necessary to establish the new roles that come with Big Data. Most remarkable is a new C-level role, the "CDO" or Chief Data Officer. Some estimates say that there are now about 250 CDOs globally, and analysts from Gartner predict that by 2015, 25% of large global organizations will have appointed a CDO. Further new roles are, for example, data innovator, data scientist and Big Data developer. The data innovator looks for new business models based on Big Data; selling, buying and helping to find or exchange data has enormous market potential. A data scientist explores data for new findings. Statisticians and data miners have done this so far, but self-learning machines, better data exploration tools and more calculations in a shorter time on a much bigger data basis justify this new role. Because of the many Big Data technologies, there is also a need for Big Data developers. All these new roles imply high investments in people, not only in technology.
Technology: In the technology sector there are some newer and some older, revitalized concepts for handling Big Data issues. The newer ones are Hadoop, building on improvements in parallel computing, the rise of self-learning algorithms and better semantic analytics. Concepts like streaming data or calculating in memory are not really new, but improvements in hardware and software make them applicable to a lot more data. The number of new products and tools for working with Big Data is overwhelming: there are specialized products to visualize, analyze, query, store, integrate or govern data in the Big Data world. With the help of Big Data taxonomies it is easier to identify the gaps between classical analytics and advanced (Big Data) analytics. But even once you have identified the gaps in your own IT landscape, it is still tricky to find the right product(s) for a specific business use case and to ensure that these products ultimately work together properly.
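Hadoop's contribution mentioned above rests on the MapReduce programming model. Its core idea can be sketched in plain Python — sequentially here, whereas Hadoop would distribute the map and reduce tasks across many nodes; this is a conceptual sketch, not Hadoop's actual API.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit (word, 1) pairs; Hadoop runs many mappers in parallel."""
    return [(word, 1) for word in document.split()]

def reduce_phase(pairs):
    """Shuffle and reduce: sum the emitted counts per key."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# The classic word-count example over a tiny, invented corpus.
documents = ["big data big plans", "big future"]
all_pairs = [pair for doc in documents for pair in map_phase(doc)]
print(reduce_phase(all_pairs))  # {'big': 3, 'data': 1, 'plans': 1, 'future': 1}
```

Because each map call is independent and each key is reduced independently, the same program scales from one laptop to a cluster — which is exactly the improvement in parallel computing the text refers to.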
After the gap analysis is done, the question is how to evolve from classical analytics to Big Data analytics. There is hardly any methodology available on the market, and in the dimensions mentioned above, many departments are involved whenever a gap has to be closed. The following recommendations are primarily for those who want to start a Big Data initiative; those already experienced in applying Big Data analytics can use the suggestions for review purposes.
A restricted Big Data potential analysis seems helpful as a first step. Restricted here means that we look for stakeholders and departments who are familiar with analyzing data and who already have some Big Data use cases in mind. For these use cases, identify the Big Data tools necessary for implementation with the help of Big Data experts. Then check whether a proof of concept is easier to realize on-premise, as a cloud solution or in a vendor's laboratory. If the results convince senior management, a broader collection of Big Data use cases should take place. Based on the prioritized list of use cases, a Big Data roadmap has to be defined along the gap dimensions above. The dimensions are clearly not independent of each other.
Here are some ideas for each dimension as a list of steps into a joyful future.
– Identify relevant data sources for your use cases
– Establish common semantic layers also for Big Data
– Add a trust label to data sources and data
– Establish an ethics committee on the use of private data
– Define guidelines about the use of customer related data (master data, transaction data, communication data, etc.)
– Find agreements with customers that their data can be used in Big Data analytics
– Write a concept for satisfying the duties of disclosure with Big Data
– Establish an explorative analysis process of data
– Extend data life cycle guidelines to cover Big Data
– Extend data governance guidelines to cover Big Data
Initiate a change management process:
– Identify high potentials among your staff who are capable of working with Big Data
– Define an education path from classical analyst or developer to one of the new roles
– Run recruitment campaigns to attract Big Data experts
– Establish the new roles and make them attractive
– Define a target architecture combining the classical data warehouse and business intelligence solutions with the new Big Data technologies (concepts like data lake, first stream then save the data, data virtualization, in-memory computing, etc.)
– Look for reference architectures, e.g. Gartner's "Logical Data Warehouse"
– Think about new sourcing strategies (BI in the cloud) instead of doing everything on-premise