The Importance of Ethics in Data Science
By Steven Finlay. 8/2/2015.
One of the biggest mistakes a data scientist can make is to assume that legal compliance with privacy and data protection laws mitigates all risks associated with the use of personal data. Simply maximizing the power of a predictive model using all (legally available) data does not necessarily lead to optimal business outcomes in the long run. It’s easy to forget that there is also an ethical dimension to consider when deciding how people are going to be treated. This applies even when people have given their consent for you to hold and use their data in an agreed way.
Why is having an ethical perspective important for a data scientist? The first thing to appreciate is that legal does not equal ethical. It’s true that the law seeks to define behaviours which society deems to be right or wrong (i.e. ethical or unethical). However, there are always situations not covered by existing laws, and there are often loopholes that can be used to circumvent legal barriers. This is what we mean when we talk about the letter of the law as opposed to the spirit of the law. The evidence supporting ethical behaviour in business is also pretty clear (Orlitzky, Schmidt and Rynes, 2003). Dealing with customers in an ethical way is worthwhile because, in the long term, it improves the bottom line.
One reason for thinking about ethical issues when developing automated decision-making systems is the risk of reputational damage. Negative public sentiment about how you are using data can devalue a brand immensely. Simply arguing that your predictive model is statistically valid isn’t enough. For instance, if it comes to light that your model puts women, ethnic minorities and the poor at the back of the queue for medical treatment (even though sex, race and income are not explicit variables in your model), then you are going to be challenged about that, even if the decision-making process that uses your predictive model generates optimal patient outcomes measured across the population as a whole. Another way to think about this is that ethical considerations impose constraints on decision-making systems, and those constraints need to be built in when the system is designed.
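The point about proxies can be made concrete with a simple outcome audit. The sketch below assumes you have the model’s decisions plus a sensitive attribute (such as sex) that was deliberately kept out of the model but retained separately for monitoring. The function names `approval_rate_by_group` and `disparity_ratio` are hypothetical, and comparing the lowest group rate with the highest is only one of many possible disparity measures, not a method the article prescribes.

```python
from collections import defaultdict


def approval_rate_by_group(decisions, groups):
    """Return the share of favourable decisions for each group label.

    decisions: sequence of bools (True = favourable, e.g. prioritised for treatment)
    groups: sequence of group labels of the same length; assumed to be held
            separately for monitoring and NOT used as a model input
    """
    if len(decisions) != len(groups):
        raise ValueError("decisions and groups must have the same length")
    totals = defaultdict(int)
    favourable = defaultdict(int)
    for decision, group in zip(decisions, groups):
        totals[group] += 1
        if decision:
            favourable[group] += 1
    return {group: favourable[group] / totals[group] for group in totals}


def disparity_ratio(rates):
    """Lowest group rate divided by highest group rate (1.0 means parity)."""
    highest = max(rates.values())
    if highest == 0:
        # No group receives any favourable decisions, so there is no disparity.
        return 1.0
    return min(rates.values()) / highest


decisions = [True, True, False, True, False, False]
groups = ["A", "A", "A", "B", "B", "B"]
rates = approval_rate_by_group(decisions, groups)
print(rates, disparity_ratio(rates))
```

Here group A receives a favourable decision two times in three and group B one time in three, giving a disparity ratio of 0.5. A low ratio doesn’t by itself prove the model is unfair, but it flags exactly the kind of pattern you would be asked to explain, even though the group label never entered the model.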
So how should a data scientist seek to incorporate an ethical perspective into their work? The answer, I’m afraid, is not straightforward. One problem is that what counts as acceptable is often problem- and domain-specific. For example, I don’t think many people would have an issue with information about their DNA, sexual orientation, gender or race being used by their doctor to help diagnose a medical condition. However, using these same data items to offer differential pricing of products is a far more questionable and risky proposition.
A second difficulty is the subjective nature of ethics. It’s very much a personal thing. You and I may hold very different, but equally valid, opinions as to what constitutes acceptable behaviour. Likewise, different legislative regimes adopt different approaches to personal data and how it should be used. In the USA, for example, the starting point when it comes to personal data is very much along utilitarian lines. Personal data is there to be harvested and used to maximize organizational goals (e.g. maximize profit or minimize cost). If there is a problem using a specific type of data, or there is unacceptable bias against a specific group, then legislation is enacted to address that particular concern.
This is very different to the European Union perspective, where a rights-based approach is taken. The foundation stone of EU data protection law is personal ownership of one’s data. Data about me is mine: you have no right to hold or use my data unless I give you permission to do so. If I don’t want my data to be used for a given purpose, then that’s my decision, even if that leads to sub-optimal outcomes.
These two different perspectives are one reason why US companies such as Google and Facebook struggle to find common ground with regulators over how personal data can be gathered and used in EU countries.
In my next article I’ll discuss these ideas further, and describe a pragmatic approach that a data scientist can employ to gauge the risks that different types of data and decision-making represent.
• Orlitzky, M., Schmidt, F. L., Rynes, S. L. (2003). Corporate Social and Financial Performance: A Meta-analysis. Organization Studies, volume 24, number 3, pages 403-441.