by Carlos Andre Reis Pinheiro, Data Scientist, Teradata
Over the past few years, analytics has evolved into a major discipline for many corporations, particularly those exposed to a highly competitive marketplace. Analytics is composed of a set of distinctive approaches that use technologies and methods to describe business processes, to support decision making, to solve issues, and to help with improving performance.
Analytics can range from simple business intelligence applications, such as querying predefined multidimensional databases, to more advanced interactive visualizations of data across different dimensions, which allow analysts to glean insight from large volumes of data. These types of business analytics tools can be applied to anything from single, built-for-purpose applications to large data marts used to examine questions in particular departments. They can also be associated with a corporate data warehouse, either as a collection of multidimensional databases or as a single, high-performance business intelligence platform from which multipurpose tools are used to examine business problems across the whole company. Typically, these types of applications work with raw data collected from transactional systems, aggregated according to a particular business perspective, and examined through a user-friendly front end that allows analysts to execute several types of queries in a short period of time. Business insights very often come from such multidimensional examinations of data, which amount to deep inquiries into operational inputs formatted in a digestible, managed layout.
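As a rough illustration of aggregating transactional data along business dimensions, the following Python sketch rolls revenue up by one or more dimensions. The records, the dimension names, and the `roll_up` helper are all made up for this example; a real business intelligence platform would do this inside a multidimensional database rather than in application code.

```python
from collections import defaultdict

# Made-up transactional records: (region, product, month, revenue).
transactions = [
    ("North", "Prepaid", "2012-01", 120.0),
    ("North", "Postpaid", "2012-01", 300.0),
    ("South", "Prepaid", "2012-01", 80.0),
    ("North", "Prepaid", "2012-02", 150.0),
    ("South", "Postpaid", "2012-02", 210.0),
]
# Hypothetical dimension names, in the same order as the record fields.
DIMENSIONS = ("region", "product", "month")

def roll_up(rows, by):
    """Sum revenue over the chosen dimensions, e.g. by=("region", "month")."""
    positions = [DIMENSIONS.index(name) for name in by]
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]  # revenue is the last field
    return dict(totals)

by_region = roll_up(transactions, ("region",))
by_region_month = roll_up(transactions, ("region", "month"))
print(by_region)
print(by_region_month)
```

The same raw records answer different questions depending on which dimensions are chosen, which is the essence of a multidimensional query.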
Analytics also goes a step further to include statistical and mathematical analyses. These types of analytic endeavors are often completed on a more ad hoc basis, demanding timely examination tied to a particular business purpose. Usually, this type of analysis is not periodic and does not demand frequent reprocessing. It often does, however, require output monitoring to detect changes that call for adjustments to the model over time. Such a model could be developed to examine the overall marketplace, to take a snapshot of customer behavior, or to forecast costs, sales, or market growth, for example. This sort of analysis is often performed using statistical and/or mathematical procedures and is commonly done in a stand-alone environment. As an ad hoc process, it usually does not require the same infrastructure as a production environment. It is a set of tasks and procedures used to understand a particular business question and, once deployed, to produce useful knowledge that can change operations and activities.
Finally, there is a third layer of analytics that is composed of data mining models. The data mining discipline includes artificial intelligence algorithms such as neural networks, decision trees, association rules, and genetic algorithms, among others. These artificial intelligence methods often use a more mathematical approach and are well suited to specific business issues, like predicting a particular business event from large volumes of inputs. Association rules, also known as induction rules, are very good at describing relationships within a particular subject area, such as retail. They are used to understand which products sell together, highlighting the correlations between products to identify groupings, or sell-with relationships. Consumers who buy bread, cheese, and ham usually buy butter as well. Consumers who buy red wine and grana padano cheese commonly also buy honey and cinnamon. These rules highlight product correlations, some of which might be quite useful, others not. Two metrics associated with association rules, namely support and confidence, help us understand how relevant and strong a rule is. High values on both do not guarantee usefulness, however. If you run an association algorithm over a hospital database, particularly using data from the obstetric department, a rule would probably emerge that women have babies, with high levels of both support and confidence. Of course, that rule has no business value: although true, it is not useful to any application.
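To make support and confidence concrete, here is a minimal Python sketch. The baskets are made-up illustrative data, and `support` and `confidence` are hypothetical helper names rather than functions from any library. Support is taken here as the share of all transactions that contain every item in the rule; confidence is the share of transactions containing the left-hand side of the rule that also contain the right-hand side.

```python
# Illustrative (made-up) market baskets; each basket is a set of products.
baskets = [
    {"bread", "cheese", "ham", "butter"},
    {"bread", "cheese", "ham", "butter"},
    {"bread", "cheese", "ham"},
    {"bread", "milk"},
    {"red wine", "grana padano", "honey"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Among transactions containing `antecedent`, the fraction also containing `consequent`."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule is never triggered
    both = support(set(antecedent) | set(consequent), transactions)
    return both / base

# Rule under test: {bread, cheese, ham} -> {butter}
rule_support = support({"bread", "cheese", "ham", "butter"}, baskets)
rule_confidence = confidence({"bread", "cheese", "ham"}, {"butter"}, baskets)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

Both numbers only measure how frequent and how reliable a rule is in the data. Whether the rule is worth acting on remains a business judgment, as the obstetrics example shows.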
This brings to light a very common requirement in analytics: you have to build a bridge between the technical procedure and the business results. Outcomes from analytics should increase knowledge about the market, the customers, the products, and so forth, enabling companies to deploy practical, well-informed actions that improve their businesses. The beneficial results are associations and rules that are not only informative but can also be applied to business actions. The business value derived from analytics-based knowledge should always be validated, and that validation encompasses the measured effect that analytics has on activities, events, and processes.
Artificial intelligence (AI) techniques are often grouped into methods that focus on classification and methods that focus on prediction. Commonly, the predictive AI methods are referred to as supervised learning models, given that they demand historical data and a target variable. The target variable is what the model is trying to predict, and the historical information is used to train the model to predict it. This historical information drives the model’s learning process, correlating past behavior with the target. This learning process allows the model to foresee possible values for the target variable in the future. Artificial neural networks, decision trees, and regression equations are the most commonly used data mining techniques of this kind. These types of models usually require less business knowledge validation because they are trained to recognize patterns in the data. So, once this type of model has its premise, which is the target variable, the outcomes are based on the pattern recognized in the historical information rather than on some particular type of business knowledge. AI tasks are more about the training process and pattern recognition than about an overall analysis as in traditional statistics (which doesn’t have previous history or a historical premise).
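The supervised learning loop described above (historical data plus a target variable, then prediction on new cases) can be sketched in a few lines of Python. This is a deliberately simplified stand-in, a one-level decision tree (a "decision stump") on a single input, not a full decision tree or neural network. The data, the churn scenario, and the names `train_stump` and `predict` are assumptions made for this illustration.

```python
# Made-up historical records: (monthly_usage_hours, churned).
# `churned` is the target variable: 1 = customer left, 0 = customer stayed.
history = [(2, 1), (3, 1), (5, 1), (8, 0), (12, 0), (15, 0), (4, 1), (10, 0)]

def train_stump(records):
    """Find the threshold on the single input that best separates the target classes."""
    values = sorted({x for x, _ in records})
    best = None
    # Candidate thresholds are midpoints between consecutive observed values.
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        for below_label in (0, 1):
            # Predict `below_label` at or below the threshold, the other class above it.
            correct = sum(
                1 for x, y in records
                if (below_label if x <= threshold else 1 - below_label) == y
            )
            if best is None or correct > best[0]:
                best = (correct, threshold, below_label)
    _, threshold, below_label = best
    return threshold, below_label

def predict(model, x):
    """Apply the learned pattern to a case that was not in the training data."""
    threshold, below_label = model
    return below_label if x <= threshold else 1 - below_label

model = train_stump(history)
new_customers = [1, 6, 20]
predictions = [predict(model, x) for x in new_customers]
print(model, predictions)
```

Note that no business rule was coded by hand: the threshold comes entirely from the pattern in the historical records, which is why this kind of model leans less on business knowledge validation.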
AI classification methods are also pattern recognition procedures that, instead of predicting a target variable, focus on delineating patterns that describe the data. This type of modeling includes clustering, k-means, and self-organizing map methods, all of which require business validation because there is no premise for the model training; that is, no target variable. Therefore, business expertise is needed to determine whether the results are sound, which again demonstrates the bridge between the technical process and the business knowledge. A classification model essentially looks at the past, learns from it (creating a pattern), and applies that pattern to data not involved in the model training process. A clustering model usually requires a calculation that identifies the correlation between the main characteristics of the clusters found and the business threats and opportunities. Each cluster holds a set of characteristics that describes it and that can lead to business actions, like protecting the company from threats or exploring hidden opportunities. Customers can also be stimulated to migrate from one cluster to another, say from a medium-value cluster to a high-value cluster, or from a disloyal cluster to a loyal one.
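As a minimal sketch of the clustering idea, the following Python code runs a plain k-means on made-up customer data. The customers, the two features, and the choice of two starting centroids are all assumptions made for illustration. The starting centroids are fixed here so the result is reproducible; practical implementations usually choose them at random or with a smarter seeding scheme.

```python
import math

# Made-up customers: (monthly_spend, purchases_per_month). There is no target variable.
customers = [(20, 1), (25, 2), (22, 1), (90, 8), (95, 9), (100, 10)]

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # New centroid = per-feature mean of its members; keep the old one if the cluster is empty.
        centroids = [
            tuple(sum(feature) / len(members) for feature in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Fixed starting centroids (an assumption made for reproducibility).
centroids, clusters = kmeans(customers, [customers[0], customers[3]])
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

The algorithm only returns groups and their centers. Deciding that one group is "low value" and the other "high value", and what action to take for each, is the business validation step the text describes.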
Many business actions can benefit from the insight gleaned from clustering techniques, particularly those that involve understanding groups of customers, products, and services. Distinct marketing campaigns can be based on the characteristics associated with individual clusters, giving different incentives and alternate messages to each. A cluster, in and of itself, highlights some trend, type of behavior, or particular need: a specific grouping driven by the data itself. An identified trend can be used to create a new product, a type of behavior can be used to establish an incentive, and a need can be used to adjust a service. There are many interactions companies can deploy and benefit from by using the subjective and descriptive information that emerges from the clustering process.