The other side of Big Data. Interview with Michael L. Brodie.
“So it is not even the volume of data that imputes political or economic value. Hence, it is clear that data has enormous political and economic value. Given the increasing digitization of our world it seems inevitable that our legal, economic, and political systems, amongst others, will ascribe to formal measures of value for data.” –Michael L. Brodie.
What is the other side of Big Data? What are the societal benefits, risks, and values of Big Data? These are difficult questions to answer.
On this topic, I have interviewed Dr. Michael L. Brodie, Research Scientist at MIT Computer Science and Artificial Intelligence Laboratory. Dr. Brodie has over 40 years experience in research and industrial practice.
RVZ
Q1. You recently wrote [7] that “we are in the midst of two significant shifts – the shift to Big Data requiring new computational solutions, and the more profound shift in societal benefits, risks, and values”. Can you please elaborate on this?
Michael L. Brodie: The database world deals with data that is bounded, even if vast and growing beyond belief, and used for known, discrete models of our world most of which support a single version of truth. While Big Data expands the existing scale (volume, velocity, variety) it does far more as it takes us into a world that we experience in life but not in computing. I call the vision, the direction that Big Data is taking us, Computing Reality. A simple explanation is that in the database world, we work top-down with schemas that define how the data should behave. For example, Telecom billing systems are essentially all in an equivalence class of the same billing model and require that billing data conform. Billing databases have a single version of truth so that telecom bills have justifiable charges. Not so with Big Data.
If we impose a model or our biases on the data we may prelude the very value that we are trying to discover.
In Big Data worlds, as in life, there is not a single version of truth over the data but multiple perspectives each with a probability of being true or reasonable. We are probably not looking for one likely model but an ensemble of models each of which provides a different perspective and discloses some discoveries in the data that we otherwise would not have found.
So the one paradigm shift is from small data that involve discrete, bounded, top-down approaches to computing to big data that require bottom-up approaches that tend to be vague or probabilistic, unbounded, and provide support multiple perspectives. I call this latter approach Computing Reality, reflecting the vagueness and unboundedness of reality.
A second, related shift – from why to what – can be understood in terms of Scientific Discovery. The history of scientific and Western thought, starting before Aristotle and Plato, has matured into what we know today as the Scientific Method in which one makes observations about a phenomenon, e.g., sees some data, hypothesizes a model, and determines if the model makes sense over the observed data.
This process is What: What are the correlations in the data that might explain the phenomenon.
A reasonable model over the data leads to Why: the Holy Grail of Science – causation – Why does the phenomenon occur.
For over 2,000 years a little What has guided Why – Scientific Discovery through empiricism.
Big Data has the potential of turning scientific discovery on its ear. Big Data is leading to a shift from Why to What.
The value of Big Data and the emergence of Big Data analytics may shift the preponderance of scientific discovery to What, since it is so much cheaper that Why – clinical studies that take vast resources and years of careful work. Here is the challenge.
Why – causation – cannot be deduced from What. It is not clear that Big Data practitioners understand the tenuous link between What and Why. Massive Big data blunders [1, 2] suggest that this is the case.
My research into Computing Reality explores this link with the hopes of providing guidance for Big Data tools and techniques. And even cooler than that to accelerate Scientific Discovery by adding mechanisms and metrics of veracity to Big Data and its symbiosis with empericism
Q2. You also wrote [7] that with Big Data “more than ever before, technology is far ahead of the law”. What do you mean with this?
Michael L. Brodie: The Forth Amendment of the United States Constitution was tested many times by technology including when electronic techniques could be used to determine activities inside a citizen’s home. When the constitution was written in 1787 electronic surveillance could never have been anticipated.
Today, the laws of search and seizure, based on the Fourth Amendment, permit those with a warrant to acquire all of your electronic devices so the government can examine everything on those devices although it appears that the intent of the law was to permit search and seizure of evidence relative to the suspected offence. That is, the current laws were perfectly rational when written; however, technology has so changed the world we live in that the law, interpreted simply allows the government to look at your entire digital life, which for many of us is much of our lives thus minimizing or eliminating the protections of the Fourth Amendment. The simple matter is that technology will always be ahead of the law.
So we must constantly balance current and unforeseen consequences of technology advance on our lives and societies.
Since time immemorial, and as observed by Benjamin Franklin, we must always judiciously balance freedom and security; you can’t have both. Technology more than many domains push this balance.
Q3. John Podesta, Obama’s Counselor and study lead, asked the following question during a workshop[4]: „Does our legal privacy framework support and balance safety and freedom?“ What is your personal view on this especially related to the ongoing discussion on an open and free Internet and Big Data?
Michael L. Brodie: What a great question, worth of serious pursuit, more than I will pursue here. A fundamental part of your question is of a free and open Internet. While it is debatable as to whether computing or the Internet has created economic growth and increased productivity, it is fair to say that our economies have become so dependent on computing and that Balkanizing the Internet, as exemplified recently by Turkey, China, Brasil, and even Switzerland, will surely cause major economic disruption.
Not only does a significant portion of our existing economy ride on an currently open and free Internet, that platform has been and will continue to be a fountain of innovation and potential economic growth, and, ideally, increased productivity; not to mention the daily lives of billions of people on the planet. As we have seen in Tunisia, Egypt, North Korea, China, Syria, and other constrained countries, an open and free Internet, e.g., Twitter, is becoming a means for democratic expression and constraint on totalitarian behaviour. Much is at stake to maintain an open and free Internet.
This should encourage a robust debate of the various Internet Bill of Rights [8]currently on offer. Clearly the Snowden-NSA incidents and the resulting events in the White House, the Supreme Court, and the US Congress clearly indicate that our legal privacy framework is inadequate. The more interesting question is what changes are required to permit a balance of freedom and safety. Such a framework should result from a robust, informed public debate on the issues. Hopefully these discussions will start in earnest. The workshop is an example of the White House’s commitment to such a discussion.
Q4. What would be the discussion on an open and free Internet, while balancing safety with freedom, if Edward Snowden had not disclosed the NSA surveillance?
Michael L. Brodie: What great questions with profound implications, clearly beyond my skills, but fun to poke at. Let me add to the question: Is Snowden a Whistle Blower or terrorist? Is he working to uphold the constitution or undermine it?
I happen to have had some direct experience on this issue. From April 2006 to January 2008 I served on the United States of America National Academies Committee on Technical and Privacy Dimensions of Information for Terrorism Prevention and other National Goals, co-chaired by Dr. Charles Vest, president of the National Academy of Engineering and Dr. William Perry, former US Secretary of Defense, that was commissioned by the Department of Homeland Security and the National Science Foundation.
The recent White House Investigation prompted by Snowden’s disclosures heavily cited the commission’s report [3].
The 21-month investigation by 20 experts chosen by the academy uncovered some aspects of what Snowden’s disclosures led to, it did not uncover the scope and scale of the NSA actions that emerged from Snowden’s disclosures.
It is not until you discover the actions that you question the relevant laws or as the White House justifiably asked, the legal privacy framework to support and balance safety and freedom.
As I said in the piece that you reference the White House and Snowden are asking exactly the same questions. Snowden has said that he saw it as his obligation to do what he did given his oath to uphold the constitution. Hence, such a discussion could emerge without Snowden in the next decade, but it would not have emerged at the moment without his actions.
Would that it had emerged in 2006 or as a consequence of the many other similarly intended investigations.
It seems to me that Snowden blew the whistle on NSA.
Q5. De facto, the Internet is becoming a new battlefield among different political and economic systems in the world. What is the political and economic value of data?
Michael L. Brodie: Again a grand question for my betters. This is another profound question that I am not skilled to answer. But why let that stop me?
Our economic system is based on commodities, goods and services, with almost no means of attributing economic value to data. Indirectly, data is valued at inconceivably high values according to many Internet company acquisitions, especially Facebook’s recent $16 Billion acquisition of Whatsapp that appears to be acquiring people and their data by the network effect.
How do you ascribe value to data? Who owns data?
Does it age and does time reduce or raise its value? If it has economic value, then what legal jurisdiction governs data? What is the political value of data?
For one example look at Europe’s solicitation of business away from the United States based on data, data ownership, and data governance.
Another example is that President Lyndon Johnson achieved the US Civil Rights Bill because of data – he knew where all the bodies were buried. What is the value of data there?
So it is not even the volume of data that imputes political or economic value. Hence, it is clear that data has enormous political and economic value. Given the increasing digitization of our world it seems inevitable that our legal, economic, and political systems, amongst others, will ascribe to formal measures of value for data.
Q6. There has been a claim that “Big data” has rendered obsolete the current approach to protecting privacy and civil liberties [5]. Is this really so?
Michael L. Brodie: Without question expanding beyond bounded, discrete, top-down models of the world, to a vastly larger, more complex digital version of the world, requires a reevaluation of previous approaches to computing problems, including privacy and civil liberties. The quote is from Craig Mundie [5] who makes the observation for a policy and strategy point of view. A recent report on machine learning and curly fries claims that organizations, e.g., marketing, can create complete profiles of individuals without their permission and presumably use it in many ways, e.g., refuse providing a loan? Does that threaten privacy and civil liberties?
While I quoted Mundie concerning civil liberties, my knowledge is in computing and databases. My reference concerns the fact that current solutions will simply not scale to the world of Big Data and Computing Reality. It seems a safe statement since Butler Lampson and Mike Stonebraker have both said the same thing. Simply stated, we cannot anticipate every attack, what combination of data accesses could be used to deduce private information. A famous case is to use Netflix movie selection data to identify private patient information from anonymized Medicare data. So while you may do a top-down job applying existing protection mechanisms, your only hope is to detect violations and stop further such attacks, as has been claimed for Heartbleed.
As Butler Lampson said [6]
“It’s time to change the way we think about computer security: instead of trying to prevent security breaches, we should focus on dealing with them after they happen.
Today computer security depends on access control, and it’s been a failure. Real world security, by contrast, is mainly retroactive: the reason burglars don’t break into my house is that they are afraid of going to jail, and the financial system is secure mainly because almost any transaction can be undone.
There are many ways to make security retroactive:
• Track down and punish offenders.
• Selectively undo data corruption caused by malware.
Require applications and online services to respect people’s ownership of their personal data.
Access control is still needed, but it can be much more coarse-grained, and therefore both more reliable and less intrusive. Authentication and auditing are the most important features. Retroactive security will not be perfect, but perfect security is not to be had, and it will be much better than what we have now.”
References
[2] G. Marcus and E. Davis. Eight (no, nine!) problems with big data. New York Times, August 2014.
[4] John Podesta, White House Counselor, White House-MIT Big Data Privacy Workshop: Advancing the State of the Art in Technology and Practice, March 4, 2014, MIT, Cambridge, MA http://web.mit.edu/bigdata-priv/agenda.html
[7] White House-MIT Big Data Privacy Workshop A Personal View, Dr. Michael L. Brodie , Computer Science and Artificial Intelligence Laboratory, MIT , March 24, 2014 http://www.odbms.org/2014/04/white-house-mit-big-data-privacy-workshop/
[8] Tim Berners-Lee, Online Magna Carta, aka Internet Bill of Rights, aka a Global Constitution.
Dr. Michael L. Brodie
Dr. Brodie has over 40 years experience in research and industrial practice in databases, distributed systems, integration, artificial intelligence, and multi-disciplinary problem solving. He is concerned with the Big Picture aspects of information ecosystems including business, economic, social, application, and technical.
Dr. Brodie is a Research Scientist, MIT Computer Science and Artificial Intelligence Laboratory; advises startups; serves on Advisory Boards of national and international research organizations; and is an adjunct professor at the National University of Ireland, Galway. For over 20 years he served as Chief Scientist of IT, Verizon, a Fortune 20 company, responsible for advanced technologies, architectures, and methodologies for Information Technology strategies and for guiding industrial scale deployments of emergent technologies, most recently Cloud Computing and Big Data and start ups Jisto.com and data-tamer.com. He has served on several National Academy of Science committees.
Dr. Brodie holds a PhD in Databases from the University of Toronto
and a Doctor of Science (honoris causa) from the National University of Ireland.
Resources
Follow ODBMS.org on Twitter: @odbmsorg