“We Can’t Rely on Machines”
Michael L. Brodie, Research Scientist at the Massachusetts Institute of Technology (MIT), is convinced that Big Data has more potential than the hype suggests, but also more risks. An interview about opportunities and threats.
Interview by Patricia Faller, Editor in Chief of ZHAW-Impact, the campus magazine of ZHAW Zurich University of Applied Sciences. ZHAW is one of the leading universities of applied sciences in Switzerland.
Q. Everybody is talking about Big Data. Is it more than hype?
Michael L. Brodie: It is a little bit like at the beginning of the internet. The hype was largely based on people who tried to make money selling their products. But the internet has changed our world in ways the hype couldn’t conceive. So at the moment the hype for Big Data comes from IBM, SAS, SAP – the large vendors of these solutions. Forecasts of the Big Data Market show a huge market growth from 7.6 billion dollars in 2011 to 84.6 billion dollars in 2026.
Q. So yes, there is a lot of hype?
Michael L. Brodie: But I actually think it is far more profound and powerful than most people conceive at the moment. It has already changed a very large number of operating processes in health care, manufacturing, marketing and stock markets. However, it is not as widely used as one might think. Big Data and Big Data Analytics are in their infancy with respect to operational deployment and our understanding of it.
Q. So Big Data isn’t promoted beyond its value?
Michael L. Brodie: No, these methods are actually seen as the fourth scientific paradigm, meaning that they have the potential of a completely different and faster way of solving many challenging problems of humanity, of health care and poverty. Gartner, one of the world’s leading information technology research companies, estimates that 80 percent of all business processes worldwide will change within the next five to ten years, all based on Big Data Analytics.
Gartner also predicts that 85 percent of the Fortune 500 will be unable to exploit Big Data in 2015. The much bigger impact will be over the next decades.
Q. Sometimes it seems that there is a blind faith in Big Data.
Michael L. Brodie: Right. My experience shows that people who are very excited about Big Data may not be very familiar with forecasts or statistically based prognoses. In order to use statistical techniques to analyse data you have to understand the power and the limits of statistics. And almost every statistical outcome is probabilistic.
That means it may happen only within the predicted error bounds and confidence level. When you get an answer you still can’t say what SAP’s stock price will be next Wednesday. You say with a probability of 0.75 the stock price will be this or that amount of money. So it is always qualified by some sort of error bars. Error bars give a general idea of how precise a measurement is. But 80 percent of the people I interact with and who want to consume Big Data results don’t know what an error bar or probabilistic answer is.
Q. What does this mean from a customer’s point of view?
Michael L. Brodie: Most customers of data analysis want to know something like: “Will I sell more strawberry Pepsi than vanilla? Because I have to tell my manufacturing line how to switch.” And you say to them: “Well, with this or that probability you are likely to sell more strawberry in New Jersey over these weeks, and in California you are more likely to sell vanilla over the same period.” So they have to understand that the answers can only be probabilistic.
They never get a complete certified answer. Almost every measurement one makes in the world is probabilistic.
That means every computational answer made based on data is also probabilistic. So obviously education is going to be critical. Not only in understanding it from a customer’s point of view but also in expressing it from the point of view of a Data Scientist who produces these answers.
Q. Some people are warning that Big Data threatens humanity. What is your opinion?
Michael L. Brodie: There is an organization called “Future of Life”. It has just recently been created by very famous scientists and entrepreneurs with a strong commitment to technology like Stephen Hawking, the famous physicist from Cambridge, Bill Gates from Microsoft and Elon Musk, the CEO of Tesla. Their vision is to limit the risks of automation and Big Data so that they won’t negatively impact humanity.
The media have dramatized that by stating that these people were saying “Artificial Intelligence may end life as we know it on the planet”. But of course they didn’t say that. Their objective is to safeguard life and develop optimistic visions of the future in order to mitigate existential risks facing humanity from Artificial Intelligence. The nature of the threat is rather that we in artificial intelligence don’t really understand what it does. We see the outcomes and they seem in many cases to be quite positive. But when the most advanced research institutes, like those at MIT, talk about automating thinking, they are talking about automating relatively simple human activities.
Sure, there is a lot of success, absolutely. But it is like climbing a mountain to get to the moon. When you reach the top of the mountain you see you are closer to the moon but you need to find a new way.
Q. So there is nothing to be afraid of?
Michael L. Brodie: Those of us working in Big Data should think about how we can improve the things that we do. Do we know that the results we get are correct or complete or efficient? I’m concerned most about correctness and completeness. How do we understand what a machine really does if we can only keep less than ten ideas in our head at one time? A machine can handle billions of variables. I was at the White House last month, where one of the predominant policies of the US government and of 45 other governments around the world is Big Data. How can it be a government policy in England, in America, when we really don’t understand it? If machines and algorithms are making important decisions like running trains or airplanes, choosing medicines to prescribe for patients, do we understand the potential for bad behaviour? At the moment in the Big Data analysis field the vast majority of the practitioners and consumers don’t even realize the nature of the threat. But I, like many people in artificial intelligence, understand the threat and believe that it can be managed.
Q. How can we manage it?
Michael L. Brodie: So far we haven’t seen a big focus on addressing risks. Errors haven’t been in areas that are very important. For example Big Data and Big Data Analytics have been used in marketing and language translation a lot. If you use Google on a daily basis, you notice advertisements that refer to something you searched for previously.
If Google serves you the wrong ad that’s not going to change the world. If Google Translate misinterprets a phrase it may annoy the customer but no specific harm may come. But there are more significant actions that might threaten an individual or a company.
Q. What are you thinking of?
Michael L. Brodie: For example, what if you get an automated report that says “a company is doing very well, you should invest in it” and you do but then lose your money? If “Algorithmic Responsibility” were a reality you could probably take legal action against whoever sold you the report, whether a machine generated it or not.
Currently, that may not be possible. A more severe case would be if a medical treatment plan produced by automated personalized medicine caused harm rather than resulting in a cure. That is why such results are considered as advisory to doctors.
Many people think numbers don’t lie and that algorithms are neutral. But they aren’t. It depends on the kind of data you use and how you do the modeling.
The simplest way to characterize the significance of algorithms is by noting that the risks they pose have moved the international legal community, and specifically the US legal community, to propose a set of laws that’s called “Algorithmic Responsibility”. This means: I don’t care how you came to the decision you’re providing me, whether you used a machine or a human or both – you’re still liable for the answer. For example, when a self-driving car has an accident, who is liable? Our society now realizes that as it becomes more data-driven we can’t rely on machines.
It’s the human who has to take the responsibility. In my many years of experience in this area I have seldom seen an application where the machine can uniformly produce a concrete answer that the human completely accepts.
We need a balance of human and machine intelligence. Let me conclude with a warning: Big Data is an example of the increasing use of algorithms in our lives. They are used to make decisions about what products we buy, the jobs we get, the people we meet, the loans we get, and much more. Algorithms are merely code written mostly by people but, with machine learning and Big Data, increasingly by machines. Do algorithms discriminate? Cynthia Dwork, a researcher at Microsoft, has voiced the concern that algorithms learn to discriminate, and asks who is responsible when they do and what the trade-offs between fairness and privacy are.
Q. Others see the benefits, especially the potential impact on the quality of life and health care. How can Big Data help in personalized medicine?
Michael L. Brodie: Now that we are beginning to collect data in a massive and consistent way we can tell more and more about what really happened to patients: what drugs they took, what the impact of the drug was and so on.
That’s on the very individual personal level. However, there is even a much greater possibility: If we collect detailed medical data like DNA, medical procedures and their outcomes, prescriptions by doctors and so forth from millions of people, we not only get the information of one patient – let’s call him Fred. But within the millions you can probably find thousands of people that are similar to Fred. That’s called population health. And if they had a disease that Fred now has, you can look at their behaviour and make recommendations like: “Fred, you should probably do this because people like you have been successful with this in the past.” Without personalized medicine doctors say: “In general this treatment has had this outcome on the general population.” But that’s not Fred. And Fred has diabetes, he is 46 and he has only one leg and so he may have a very different prognosis from all the rest of the people who take that drug.
The ideal or ethical outcome of personalized medicine is to improve the health care of people by prescribing better treatments for them. So that’s the good side of the US government’s Precision-Medicine Initiative “Delivering the right treatments at the right time to the right person” for individuals. The side that the government certainly sees is that each of the four leading chronic diseases in America costs us approximately 200 billion dollars a year.
So if you can increase the health of those patients you can reduce these costs dramatically, at least by half.
Q. Are the big pharma industries really interested in personalized medicine? Aren’t they more interested in big profits?
Michael L. Brodie: There is a disruption coming in pharmaceuticals: their customer base is changing from mass markets for big ticket drugs that are increasingly saturated to more focused markets using Big Data. I have advised some big pharma companies, some here in Switzerland. They are going into micro-markets which could be a new source of income. But it would mean changing manufacturing, testing or clinical trials – a lot of things would change.
But Big Data can also help them to discover new drugs in ways that are dramatically more effective, faster in turnaround and cheaper to produce. An example: The IBM Watson Program was used by Baylor Medicine. They discovered two potential cancer drugs. These drugs must still undergo clinical trials. But in months, rather than years, they found two drugs to stimulate what they call kinases that might cure cancer. Typically those kinds of discoveries take five to ten years and cost billions of dollars.
Michael L. Brodie, Research Scientist at MIT.
Michael L. Brodie is a Research Scientist in the Computer Science and Artificial Intelligence Laboratory at the Massachusetts Institute of Technology (MIT). With over 40 years of experience he advises startups and is a member of Advisory Boards of national and international research organizations. He is an adjunct professor at the National University of Ireland. For more than 20 years he was Chief Scientist of IT at the US telecom company Verizon, a Fortune 20 company. There he was responsible for advanced technologies, architectures, and methodologies for Information Technology strategies and for guiding industrial scale deployments of emergent technologies. He is concerned with the Big Picture aspects of information ecosystems including business, economic, social and technical applications.
His current research and applied interests include Big Data and Data Science. Brodie holds a PhD in Databases from the University of Toronto and a Doctor of Science from the National University of Ireland. Recently Michael L. Brodie was invited as keynote speaker at the 2nd «Swiss Conference on Data Science» organized by the ZHAW Datalab. The topic of his talk: “The Emerging Discipline of Data Science: Principles and Techniques for Data-Intensive Analysis”.
The ZHAW Datalab
The Data Science Laboratory (Datalab) at ZHAW is an interdisciplinary platform of five institutes and centres to transform deep data science know-how into innovative research projects and vibrant teaching in Switzerland.
Recently at the “Second Swiss Conference on Data Science” with 190 participants organized by ZHAW, Jean-Marc Piveteau, President of ZHAW, acknowledged its role: “The Datalab is an important player in Switzerland at the interface of applied research and innovation. Data Science is a primary field for Universities of Applied Sciences like ZHAW”, he said. The Mission of ZHAW is the transfer of knowledge into applications, to support the innovation process and the success of new technologies on the market. “In line with our strategy, in line with our mission, Datalab is present in the field of Data Science – a separate discipline but interdisciplinary”.