Heuristics and Randomness in Analytics
BY Carlos Andre Reis Pinheiro, Data Scientist at EMC2, Brazil, visiting professor at KU Leuven, Belgium, and lecturer and researcher at Getulio Vargas Foundation (FGV), Brazil.
Analytics can be used to address many different types of business problems in many different industries. It is used to better understand the behavior of particular processes, identify patterns, and predict and classify events. It is also used to describe different marketplace scenarios and their possible future impacts. Analytics may be used to describe competitors’ movements and trends. With predictive analytics, it is possible to foresee potential future revenues, detect risk, uncover fraud, reduce bad debt, anticipate collections, optimize operational processes, and much more.
In business problems in which customers are involved, it is important to observe that consumers present different types of behavior, depending on the market they are interacting with. As a customer, I can be very aggressive in terms of purchasing high-tech products, often buying cutting-edge gadgets. However, I am quite conservative in terms of investing, putting my money into low-risk accounts. There is no single, overall behavior for any customer. We each behave in different ways depending upon the situation in which we find ourselves.
Essentially, we wear different hats, exhibiting distinct, observable behaviors, each in relation to the distinct roles we play. And sometimes, even in similar scenarios, we may play different roles and exhibit different behaviors, depending on the other actors involved.
All analytical models, whether they are supervised (classification and prediction), semi-supervised (text mining and network analysis), or unsupervised (clustering and association rules), take into consideration most of the structured information that companies currently hold in their databases.
These databases include information about customer characteristics, the products and services that the company offers, and how customers interact with them. They include financial inputs such as credit rating, payment history, late payment durations, and so forth. All of this information, however, describes only a limited part of the end-consumer’s behavior. In other words, we really can’t say much about an individual customer’s overall profile, but we can describe how they behave in a single scenario, such as when using the company’s products and services.
You could say that, based on my historical data, I am an aggressive buyer of high-tech gadgets. And it is just as possible to state, based on my buying history, that I work hard to purchase high-tech products in advance. But this behavior doesn’t carry over to other situations, like my conservative financial investment behavior. I also might be very price sensitive regarding telecommunications services, but on the other hand, I may not be sensitive to aspects such as quality when it comes to hotel rooms. The important thing to keep in mind is that there isn’t an overall understanding of behavior. Instead, behavior is always in relation to something: hotels, financial investments, or telecommunication preferences.
Consider, for a moment, understanding and even predicting behavior in a telecommunications scenario. Most analytical models consider call frequency and duration, demographic information about the consumer, billing and payment history, and (when available and of reasonable quality) historical contacts with customer care channels. Based on such data, companies are able to build the majority of the analytical models used to examine common business issues such as product/service cross-sell or up-sell, churn, revenue leakage, risk specification, or fraud detection.
Furthermore, for classification problems (that is, the ones that use a target class for training), historical information is crucial in that it teaches the model which behavior is most relevant to that particular event. What are the main characteristics of all customers who bought some particular product? How did they behave before this purchasing event? Which variables were most relevant in describing the business event, or in triggering it? Historical data, when it relates to a particular business event, teaches the analytical model to estimate how likely each customer is to act when exposed to a similar event.
However, this is a purely mathematical approach. Even more specifically, it is a purely statistical approach. Teaching the analytical model, also called the training process, is based on the average customer’s behavior. However, not all customers with similar past behavior will proceed in the same way, purchase the same product, consume the same service in the same way, and so on.
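To make the idea of training on average behavior concrete, here is a minimal sketch in Python. It is not a real modeling technique from any particular product: the profile names, the tiny history, and the train function are all hypothetical, and the "model" is nothing more than the share of past customers in each profile who bought.

```python
from collections import defaultdict

# Hypothetical history: (behavior profile, did the customer buy the bundle?)
history = [
    ("heavy_data_user", True),
    ("heavy_data_user", True),
    ("heavy_data_user", True),
    ("heavy_data_user", False),
    ("light_user", False),
    ("light_user", False),
    ("light_user", True),
    ("light_user", False),
]


def train(records):
    """Estimate purchase likelihood per profile as the share of past buyers."""
    counts = defaultdict(lambda: [0, 0])  # profile -> [buyers, total]
    for profile, bought in records:
        counts[profile][0] += int(bought)
        counts[profile][1] += 1
    return {p: buyers / total for p, (buyers, total) in counts.items()}


model = train(history)

# Every customer with the same profile receives the same score,
# even though one of the four past "heavy_data_user" customers did not buy.
print(model["heavy_data_user"])  # 0.75
print(model["light_user"])       # 0.25
```

The score describes the group, not the individual: it says that three out of four similar customers bought, but not which of them will.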
For example, according to my past behavior, as represented in my historical data, I might be about to purchase a particular bundle of telecommunications services because customers who have been behaving like me have bought this bundle in the past, after a similar sequence of events. So it is quite reasonable for any company to think that now it is my turn.
Then, the week that I’m going to buy that bundle approaches. Most unfortunately, one special Sunday afternoon, my soccer team lost the derby. It was the final match of the championship, and we lost to our biggest rival.
So instead, my forthcoming week is a sequence of five long days of frustration from the loss, and I’m certainly not in the mood to buy anything. Instead, I hide myself and simply wait for time to move on. This completely external event was not considered by the model, and yet it has changed everything about the accuracy of my predicted behavior. Statistically, I should have purchased the bundle that week; the model put the likelihood at around 87 percent. Unfortunately, the analytical model didn’t take into account that possible result for the final match. And with great sadness, this particular variable, the result of the final match, was indeed the most relevant to my actual behavior. It is the single factor that made all the difference in whether I bought something or not.
These external influences happen all the time in our lives. Very often they impact analytical models, especially those that are defined for business purposes. It is not possible to consider all variables, all attributes, all information required to create a particular inference. Everything in modeling is about an approximation. As my historical behavior was quite similar to other customers who did buy that particular bundle, my likelihood of purchasing the same bundle might also be very high. But it isn’t a definite or a sure thing that I will buy it at all. It is just an approximation. It might be a high and accurate approximation, but in the end, it is just a simple approximation. The likelihood assigned to each customer is simply an approximation of how they might eventually behave in relation to a particular business event.
This fact shouldn’t push us to give up on analytical endeavors. As a matter of fact, it should do just the opposite.
It hopefully brings us even closer to understanding the true value of analytics. Unexpected events will always take place, and they will always impact predicted outcomes. Analytical models work well for the majority of cases, for most of the observations, and in most applications. Unexpected or uncontrolled events will always occur and typically affect a few observations within the entire modeling scenario. However, there are some events that will impact the entire analysis population, like a war, an earthquake, or a hurricane, and as such, a new historical behavior is built.
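One way to read the 87 percent figure from the bundle example is as a statement about many similar customers rather than about any one of them. The short Python simulation below sketches that reading. The simulate_purchases function, the population size, and the fixed random seed are illustrative assumptions, not part of any real model.

```python
import random


def simulate_purchases(likelihood, n_customers, seed=0):
    """Count how many of n_customers buy when each buys with the given likelihood."""
    rng = random.Random(seed)  # fixed seed so the illustration is repeatable
    return sum(rng.random() < likelihood for _ in range(n_customers))


n = 10_000  # hypothetical number of customers who all scored 0.87
buyers = simulate_purchases(0.87, n)
non_buyers = n - buyers

# Most customers behave as predicted, but a sizeable minority does not,
# and the score alone cannot say which individuals those will be.
print(f"{buyers} bought, {non_buyers} did not ({buyers / n:.1%} purchase rate)")
```

The overall purchase rate lands close to 87 percent, which is why the model remains useful for the population as a whole even though it is wrong about particular people, such as a fan whose team has just lost the final.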
Analytical methods that understand the past and that are prepared to explain present circumstances do provide forecasts of the future that improve business decisions. However, there are many unimagined and unforeseen events that affect analytic results. These events create the boundary between art and science in analytical modeling.
The formula to predict a particular event works a lot like the standard conditions for temperature and pressure in chemistry. If everything is right, if the temperature and the pressure are both in the expected range, then the formula forecasts the outcome quite well. While there can certainly be exceptions, the formula is just a way to model a particular scenario and be aware of what is coming next, and of what could be expected under certain conditions. As in science and several other disciplines, this approach is the closest we can get to being in touch with reality. It is much more enlightened than doing nothing. The key is to properly understand what is happening in order to dramatically increase your model’s usefulness.