On Fraud Analytics and Fraud Detection. Interview with Bart Baesens
“Many companies don’t use analytical fraud detection techniques yet. In fact, most still rely on an expert based approach, meaning that they build upon the experience, intuition and business knowledge of the fraud analyst.” –Bart Baesens
Q1. What is exactly Fraud Analytics?
Good question! First of all, in our book we define fraud as an uncommon, well-considered, imperceptibly concealed, time-evolving and often carefully organized crime which appears in many types of forms. The idea of using analytics for fraud detection is catalyzed by the enormous amount of data which is currently being generated in any business process. Think about insurance claim handling, credit card transactions, cash transfers, tax payments, etc. to name a few. In our book, we discuss various ways of analyzing these massive data sets in a descriptive, predictive or social network way to come up with new analytical fraud detection models.
Q2. What are the main challenges in Fraud Analytics?
The definition we gave above highlights the 5 key challenges in fraud analytics. The first one concerns the fact that fraud is uncommon. Independent of the exact setting or application, only a minority of the involved population of cases typically concerns fraud, of which furthermore only a limited number will be known to concern fraud. This seriously complicates the estimation of analytical models.
Fraudsters try to blend into the environment and not behave different from others in order not to get noticed and to remain covered by non-fraudsters. This effectively makes fraud imperceptibly concealed, since fraudsters do succeed in hiding by well considering and planning how to precisely commit fraud.
Fraud detection systems improve and learn by example. Therefore the techniques and tricks fraudsters adopt evolve in time along with, or better ahead of fraud detection mechanisms. This cat and mouse play between fraudsters and fraud fighters may seem to be an endless game, yet there is no alternative solution so far. By adopting and developing advanced analytical fraud detection and prevention mechanisms, organizations do manage to reduce losses due to fraud since fraudsters, like other criminals, tend to look for the easy way and will look for other, easier opportunities.
Fraud is typically a carefully organized crime, meaning that fraudsters often do not operate independently, have allies, and may induce copycats. Moreover, several fraud types such as money laundering and carousel fraud involve complex structures that are set up in order to commit fraud in an organized manner. This makes fraud not to be an isolated event, and as such in order to detect fraud the context (e.g., the social network of fraudsters) should be taken into account. This is also extensively discussed in our book.
A final element in the description of fraud provided in our book indicates the many different types of forms in which fraud occurs. This both refers to the wide set of techniques and approaches used by fraudsters as well as to the many different settings in which fraud occurs or economic activities that are susceptible to fraud.
Q3. What is the current state of the art in ensuring early detection in order to mitigate fraud damage?
Many companies don’t use analytical fraud detection techniques yet. In fact, most still rely on an expert based approach, meaning that they build upon the experience, intuition and business knowledge of the fraud analyst. Such an expert-based approach typically involves a manual investigation of a suspicious case, which may have been signaled for instance by a customer complaining of being charged for transactions he did not do. Such a disputed transaction may indicate a new fraud mechanism to have been discovered or developed by fraudsters, and therefore requires a detailed investigation for the organization to understand and subsequently address the new mechanism.
Comprehension of the fraud mechanism or pattern allows extending the fraud detection and prevention mechanism which is often implemented as a rule base or engine, meaning in the form of a set of IF-THEN rules, by adding rules that describe the newly detected fraud mechanism. These rules, together with rules describing previously detected fraud patterns, are applied to future cases or transactions and trigger an alert or signal when fraud is or may be committed by use of this mechanism. A simple, yet possibly very effective example of a fraud detection rule in an insurance claim fraud setting goes as follows:
- Amount of claim is above threshold OR
- Severe accident, but no police report OR
- Severe injury, but no doctor report OR
- Claimant has multiple versions of the accident OR
- Multiple receipts submitted
- Flag claim as suspicious AND
- Alert fraud investigation officer
Such an expert approach suffers from a number of disadvantages. Rule bases or engines are typically expensive to build, since requiring advanced manual input by the fraud experts, and often turn out to be difficult to maintain and manage. Rules have to be kept up to date and only or mostly trigger real fraudulent cases, since every signaled case requires human follow-up and investigation. Therefore the main challenge concerns keeping the rule base lean and effective, in other words deciding upon when and which rules to add, remove, update, or merge.
By using data-driven analytical models such as descriptive, predictive or social network analytics in a complimentary way, we can improve the performance of our fraud detection approaches in terms of precision, cost efficiency and operational effectiveness.
Q4. Is early detection all that can be done? Are there any other advanced techniques that can be used?
You can do more than just detection. More specifically, two components that are essential parts of almost any effective strategy to fight fraud concern fraud detection and fraud prevention. Fraud detection refers to the ability to recognize or discover fraudulent activities, whereas fraud prevention refers to measures that can be taken aiming to avoid or reduce fraud. The difference between both is clear-cut, the former is an ex post approach whereas the latter an ex ante approach. Both tools may and likely should be used in a complementary manner to pursue the shared objective, being fraud reduction. However, as also discussed in our book, preventive actions will change fraud strategies and consequently impact detection power. Installing a detection system will cause fraudsters to adapt and change their behavior, and so the detection system itself will impair eventually its own detection power. So although complementary, fraud detection and prevention are not independent and therefore should be aligned and considered a whole.
Q5. How do you examine fraud patterns in historical data?
You can examine it in two possible ways: descriptive or predictive. Descriptive analytics or unsupervised learning aims at finding unusual anomalous behavior deviating from the average behavior or norm. This norm can be defined in various ways. It can be defined as the behavior of the average customer at a snapshot in time, or as the average behavior of a given customer across a particular time period, or as a combination of both. Predictive analytics or supervised learning assumes the availability of a historical data set with known fraudulent transactions. The analytical models built can thus only detect fraud patterns as they occurred in the past. Consequently, it will be impossible to detect previously unknown fraud. Predictive analytics can however also be useful to help explain the anomalies found by descriptive analytics.
Q6. How do you typically utilize labeled, unlabeled, and networked data for fraud detection?
Labeled observations or transactions can be analyzed using predictive analytics. Popular techniques here are linear/logistic regression, neural networks and ensemble methods such as random forests. These techniques can be used to predict both fraud incidence, which is a classification problem, as well as fraud intensity, which is a classical regression problem. Unlabeled data can be investigated using descriptive analytics. As said, the aim here is to detect anomalies deviating from the norm. Popular techniques here are: break point analysis, peer group analysis, association rules and clustering. Networked data can be analyzed using social network techniques. We found those to be very useful in our research. Popular techniques here are community detection and featurization. In our research, we developed GOTCHA!, a supervised social network learner for fraud detection. This is also extensively discussed in our book.
Q6. Fraud techniques change over time. How do you handle this?
Good point! A key challenge concerns the dynamic nature of fraud. Fraudsters try to constantly out beat detection and prevention systems by developing new strategies and methods. Therefore adaptive analytical models and detection and prevention systems are required, in order to detect and resolve fraud as soon as possible. Detecting fraud as early as possible is crucial. Hence, we also discuss how to continuously backtest analytical fraud detection models. The key idea here is to verify whether the fraud model still performs satisfactory. Changing fraud tactics creates concept drift implying that the relationship between the target fraud indicator and the data available changes on an on-going basis. Hence, it is important to closely follow-up the performance of the analytical model such that concept drift and any related performance deviation can be detected in a timely way. Depending upon the type of model and its purpose (e.g. descriptive or predictive), various backtesting activities can be undertaken. Examples are backtesting data stability, model stability and model calibration.
Q7. What are the synergies between Fraud Analytics and CyberSecurity?
Fraud analytics creates both opportunities as well as threats for cybersecurity. Think about intrusion detection as an example Predictive methods can be adopted to study known intrusion patterns, whereas descriptive methods or anomaly detection can identify emerging cyber threats. The emergence of the Internet of Things (IoT) will certainly exacerbate the importance of fraud analytics for cybersecurity. Some examples of new fraud treats are:
- Fraudsters might force access to web configurable devices (e.g. Automated Teller Machines (ATMs)) and set up fraudulent transactions;
- Device hacking whereby fraudsters change operational parameters of connected devices (e.g. smart meters are manipulated to make them under register actual usage);
- Denial of Service (DoS) attacks whereby fraudsters massively attack a connected device to stop it from functioning;
- Data breach whereby a user’s log in information is obtained in a malicious way resulting into identity theft;
- Gadget fraud also referred to as gadget lust whereby fraudsters file fraudulent claims to either obtain a new gadget or free upgrade;
- Cyber espionage whereby exchanged data is eavesdropped by an intelligence agency or used by a company for commercial purposes.
More than ever before, fraud will be dynamic and continuously changing in an IoT context. From an analytical perspective, this implies that predictive techniques will continuously lag behind since they are based on a historical data set with known fraud patterns. Hence, as soon as the predictive model has been estimated, it will become outdated even before it has been put into production. Descriptive methods such as anomaly detection, peer group and break point analysis will gain in importance. These methods should be capable of analyzing evolving data streams and perform incremental learning to deal with concept drift. To facilitate (near) real-time fraud detection, the data and algorithms should be processed in-memory instead of relying on slow secondary storage. Furthermore, based upon the results of these analytical models, it should be possible to take fully automated actions such as the shutdown of a smart meter or ATM.
Qx Anything else you wish to add?
We are happy to refer to our book for more information. We also value your opinion and look forward to receiving any feedback (both positive and negative)!
Professor Bart Baesens is a professor at KU Leuven (Belgium), and a lecturer at the University of Southampton (United Kingdom). He has done extensive research on big data & analytics, customer relationship management, web analytics, fraud detection, and credit risk management. His findings have been published in well-known international journals and presented at international top conferences. He is also author of the books Analytics in a Big Data World (goo.gl/k3kBrB), and Fraud Analytics using Descriptive, Predictive and Social Network Techniques (http://goo.gl/nlCjUr). His research is summarised at www.dataminingapps.com. He is also teaching the E-learning course, Advanced Analytics in a Big Data World, see http://goo.gl/WibNPF. He also regularly tutors, advises and provides consulting support to international firms with respect to their analytics and credit risk management strategy.
–Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques: A Guide to Data Science for Fraud Detection (Wiley and SAS Business Series). Authors: Bart Baesens ,Veronique Van Vlasselaer,Wouter Verbeke.
Series: Wiley and SAS Business Series, Hardcover: 400 pages. Publisher: Wiley; 1 edition, September 2015. ISBN-10: 1119133122
–Fraud Analytics:Using Supervised, Unsupervised and Social Network Learning Techniques. Authors: Bart Baesens, Véronique Van Vlasselaer, Wouter Verbeke
Publisher: Wiley 256 pages
ISBN-13: 978-1119133124 | ISBN-10: 1119133122
– Critical Success Factors for Analytical Models: Some Recent Research Insights. Bart Baesens, ODBMS.org,
27 APR, 2015
– Analytics in a Big Data World: The Essential Guide to Data Science and its Applications. Bart Baesens, ODBMS.org, 30 APR, 2014
–The threat from AI is real, but everyone has it wrong, Robert Munro, CEO Idibon. ODBMS.org
Follow ODBMS.org on Twitter: @odbmsorg