AI, Machine Learning and the GDPR. Are the Wild West Days of Advanced Analytics Over?
By Steven Finlay
The General Data Protection Regulation (GDPR) came into effect on 25th May 2018. The primary focus for many organizations has been on how personal data should be collected and maintained in a GDPR compliant way, and the subsequent uses to which the data is put.
However, the GDPR also contains specific articles about automated decision-making, which have received less attention. This matters because automated decision-making systems, based on AI/Machine learning, are now in widespread use in many walks of life. They are used to predict how people are likely to behave and hence how an organization should treat them. Examples include pre-screening job applicants, setting insurance premiums and identifying fraudsters and terrorist suspects.
So, what does the GDPR say about automated decision-making? To summarize, the key elements of the regulation are:
- Using an automated decision-making system that has a legal or similarly significant impact on someone is not permitted unless they have given consent that such a system can be used. The main exception to this is where automated processing is necessary to allow a legal contract to be enacted.
- An individual has the right to be provided with an explanation as to how an automated decision has been arrived at and the consequence of that decision.
- Automated decision-making systems must not display unfair discrimination; i.e. treating people differently on the basis of their race, sexual orientation and so on.
On the first point, if a decision is going to have a significant impact on someone, then they have the right to demand that the decision is reviewed manually. The law is clear that this review needs to be fully independent and undertaken by someone empowered to make a different decision to the one made by the system. It can’t just be a case of clerical staff rubber stamping the original decision.
Things then start to get tricky when we try to understand what the legislation means by significant. The GDPR does not provide a clear definition and there is little case law to examine. Most importantly, what may be insignificant for one person may be very significant for another. The assessment of significance has to be made on a case-by-case basis. Blanket assumptions can’t be made. Imagine a supermarket which applies a differential pricing strategy. Customers it believes can’t or won’t shop elsewhere are charged 5% more for their groceries than more fickle customers who shop around. For most customers this won’t be more than a slight inconvenience, but for those on the poverty line it may represent a very significant impact on their quality of life.
What this means in practice is that it is prudent to always begin from the position that all decisions are potentially significant, and hence require consent, unless there is sufficient evidence to the contrary.
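The default position described above can be sketched as a simple routing rule. This is an illustration only, not a statement of what the regulation requires: the class, field and function names are all hypothetical, and the rule that a requested review always goes to a person is a simplifying assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTOMATED = "automated"
    MANUAL_REVIEW = "manual_review"


@dataclass
class DecisionRequest:
    # All field names are hypothetical and chosen for this sketch.
    has_consent: bool                     # subject agreed to automated decision-making
    necessary_for_contract: bool = False  # the contract exception
    shown_insignificant: bool = False     # evidence that the impact is not significant
    review_requested: bool = False        # subject has asked for a manual review


def route_decision(req: DecisionRequest) -> Route:
    """Decide whether a case may be handled automatically."""
    # Assumption: a requested review is always passed to a person.
    if req.review_requested:
        return Route.MANUAL_REVIEW
    # Default position: significant unless there is evidence to the contrary.
    significant = not req.shown_insignificant
    if not significant:
        return Route.AUTOMATED
    # Significant decisions need consent or the contract exception.
    if req.has_consent or req.necessary_for_contract:
        return Route.AUTOMATED
    return Route.MANUAL_REVIEW
```

The point of the design is that `shown_insignificant` defaults to `False`, so a case is only treated as insignificant when someone has actively recorded evidence for it.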
The second point, the right to an explanation of how a decision was arrived at, requires an understanding of how the prediction used to decide how to treat someone was produced: for instance, why someone’s insurance premiums are a certain amount or why they were declined for a loan. As a minimum, an organization needs to tell an individual making such an enquiry:
- Which data items are used in the decision-making process. For example: “when assessing your application for a new phone contract, we use information about your age, income and marital status, together with information about your credit history to make a decision.”
- The source of those data items. This could be direct from the individual via an app or online form, information that is publicly available such as voter records, or information that the individual has given other organizations, such as social media platforms and credit reference agencies, permission to share.
- Details of which data items contributed to the decision and in what way. In deciding whether or not to accept someone’s application for a phone contract, one might say that decisions are based on a credit score. The score uses information about people’s credit history, and the number and recency of missed payments negatively impact the score. The applicant missed two mortgage payments last year resulting in a low credit score. Therefore, their application for a new contract has been declined.
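The phone contract example can be illustrated with a small additive scorecard. This is a minimal sketch: every weight, the cut-off and the field names are invented, and real credit scores are considerably more involved. It lists the data items that lowered the score and then checks, for each one, whether the application would have been accepted had that item contributed nothing.

```python
# Hypothetical scorecard: all numbers and names below are invented.
BASE_SCORE = 600
CUTOFF = 550
POINTS_PER_MISSED_PAYMENT = -40
POINTS_PER_YEAR_OF_HISTORY = 5


def contributions(applicant: dict) -> dict:
    """Points each data item adds to or removes from the base score."""
    return {
        "missed_payments": POINTS_PER_MISSED_PAYMENT * applicant["missed_payments"],
        "years_of_history": POINTS_PER_YEAR_OF_HISTORY * applicant["years_of_history"],
    }


def score(applicant: dict) -> int:
    return BASE_SCORE + sum(contributions(applicant).values())


def explain(applicant: dict) -> dict:
    """Return the decision plus the items that drove a decline."""
    total = score(applicant)
    accepted = total >= CUTOFF
    # Items that reduced the score, most damaging first.
    negative = sorted(
        (item for item in contributions(applicant).items() if item[1] < 0),
        key=lambda item: item[1],
    )
    # Counterfactual check: would the applicant have been accepted
    # if this item had contributed zero points?
    decisive = [
        name for name, points in negative
        if not accepted and total - points >= CUTOFF
    ]
    return {
        "score": total,
        "accepted": accepted,
        "negative_items": [name for name, _ in negative],
        "decisive_items": decisive,
    }


result = explain({"missed_payments": 2, "years_of_history": 4})
```

With these made-up figures the applicant scores 540 against a cut-off of 550 and is declined; without the missed payments the score would be 620, so `missed_payments` is reported as decisive. If `decisive_items` came back empty for a declined applicant, no single item would explain the outcome on its own and further reasons would be needed.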
In some cases, a full technical explanation may also be necessary, although at the time of writing the exact circumstances when this would be required are unclear. One might assume that it will only be in relatively few specific cases that reach the courts, rather than being something that is provided whenever a customer makes an enquiry. However, it’s pretty clear under GDPR that: “Complexity is no excuse for failing to provide information to the data subject.” In the UK, the Information Commissioner’s Office has made the following statement with regard to model explicability and the GDPR:
“Big data organisations therefore need to exercise caution before relying on machine learning decisions that cannot be rationalised in human understandable terms. If an insurance company cannot work out the intricate nuances that cause their online application system to turn some people away but accept others (however reasonable those underlying reasons may be), how can it hope to explain this to the individuals affected?”
This creates a potential barrier to using deep learning and other complex “black box” types of AI/Machine Learning unless a suitable explanatory mechanism exists. Such mechanisms can be developed but may add considerably to the overall time and cost of developing a solution.
Finally, there is the issue of discrimination. All decision-making systems (automated or manual) discriminate – that’s the nature of what they do. The question is whether unfair discrimination occurs. The GDPR is clear that specific items such as race, sexual orientation and so on, must not normally be incorporated into automated decision-making systems. However, one must also be able to identify and correct for indirect bias, which is no easy task.
The classic example of indirect bias is gender discrimination and income. You may exclude gender as a variable in your analysis, but if you include income then gender bias is likely to result. This is because, all other things being equal, women on average earn less than men, even in those regions where such discrimination is technically illegal.
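A small sketch shows how this can happen. The applicant figures and the income threshold are made up for illustration, and comparing approval rates between groups is only one simple way of looking for indirect bias. The decision rule never sees gender, yet the outcomes differ by gender because income differs.

```python
from collections import defaultdict

# Hypothetical decision rule: approve on income alone.
INCOME_THRESHOLD = 30_000


def approve(income: int) -> bool:
    return income >= INCOME_THRESHOLD


def approval_rates(applicants):
    """Approval rate per group; gender is used only to audit the outcome."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for gender, income in applicants:
        totals[gender] += 1
        if approve(income):
            approved[gender] += 1
    return {group: approved[group] / totals[group] for group in totals}


# Invented sample in which women earn less on average.
APPLICANTS = [
    ("F", 24_000), ("F", 28_000), ("F", 31_000), ("F", 36_000),
    ("M", 27_000), ("M", 32_000), ("M", 35_000), ("M", 41_000),
]

rates = approval_rates(APPLICANTS)
# Ratio of the lowest to the highest group approval rate.
ratio = min(rates.values()) / max(rates.values())
```

In this sample half of the women and three quarters of the men are approved, giving a ratio of about 0.67. Note that the audit needs the gender field even though the decision rule does not, which is part of what makes identifying indirect bias difficult in practice.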
Take all these things together and the good old “Wild West” days of AI and Machine Learning are pretty much over when it comes to personal data. The practice of just letting a data scientist knock up a “quick and dirty” predictive model for your business function to use is all but gone. Instead, organizations which are investing in automated decision-making systems need to:
- Assess the risks and issues associated with machine learning and automated decision-making up-front. It’s not an add-on at the end of a project.
- Not expect to do away with all human expertise entirely. There must be a suitably trained operational function that can process decisions manually when required.
- Have suitable governance and audit processes in place to manage and provide oversight of their decision-making systems. In part, this is to ensure compliance with the law, but also to identify and mitigate risks that can arise from reputational damage if an organization is deemed to be acting unethically, even if it is acting legally.
The requirement for governance and manual intervention can have a significant impact on the costs of developing and maintaining automated decision-making systems. Therefore, if you are considering using automation within your organization, this additional overhead needs to be included within the cost-benefit case undertaken before the project begins.
This article is an abridged version of a chapter from Steven Finlay’s latest book: Artificial Intelligence and Machine Learning for Business. A No-Nonsense Guide to Data Driven Technologies. Third Edition. Published by Relativistic Books in July 2018.
 GDPR Article 22.
 GDPR Article 22 gives individual EU governments the ability to define further exceptions. For example, automated decision-making for the purposes of detecting tax evasion or other types of fraud.
 Council of Europe (2016). CONVENTION FOR THE PROTECTION OF INDIVIDUALS WITH REGARD TO AUTOMATIC PROCESSING OF PERSONAL DATA [ETS No. 108.]. Draft Explanatory Report. Para 75.
 Data Protection Working Party (2017). ‘Guidelines on Automated individual decision-making and Profiling for the purposes of Regulation 2016/679’. p. 11.
 One would probably need to also support this statement by calculating the individual’s score assuming that there had been no missed payments and confirming, that in that situation, the application would have been accepted. If the applicant would also have been declined even if there were no missed payments, then additional reasons for the decision would need to be provided.
 Data Protection Working Party (2017). ‘Guidelines on Automated individual decision-making and Profiling for the purposes of Regulation 2016/679’. p. 14.
 Information Commissioner’s Office (2017). ‘Big data, artificial intelligence, machine learning and data protection.’ ICO.
 Directive (EU) 2016/680 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL of 27 April 2016, para 38.