The Truth is Out There
By Ken Birman, N. Rama Rao Professor of Computer Science, Cornell University. January 2015.
Up to the present, Big Data has been dominated by a cloud computing model: data is collected by web sites, then aggregated into very large data sets that can be mined for insights. In effect, we’ve been uploading the truth to the cloud, and our willingness to do so triggered the greatest revolution in modern technology history.
Yet the cloud computing infrastructure glimpses only a minuscule fraction of the information that existing devices capture, and with the rollout of billions of new networked devices per year, that fraction is dropping. The really big data is out at the edge, and for a great many reasons is likely to stay there.
Consider the smart power grid. The importance of reducing energy consumption is front and center for most governments, so it should be a no-brainer to deploy systems that support a dialog between power generating companies and the consumer aimed at balancing generation and demand. Yet this has been very slow to occur.
The need is for a system that would preserve privacy, yet allow the grid operator to issue basic queries: “How much power do you need for the next few hours? Could you defer some of that load until later in the afternoon?”
With better information, we could slash waste and reduce harmful CO2 emissions. But an unscrupulous operator might try to abuse the query system to violate personal privacy. The truth really is out there, so unless we can offer privacy-preserving ways to answer a query without revealing sensitive personal data, that risk would be a significant barrier to deploying smart demand/response control solutions.
What we need is a privacy-preserving data aggregation technology. With my colleagues (Edward Tremel, Mark Jelasity, and Bobby Kleinberg), I’ve been looking at this problem, and we have a novel solution that can support a wide range of queries, protecting the individual contributions towards the answer while still letting the system operator learn enough to balance power production against demand. Our approach is tolerant of Byzantine faults and guarantees differential privacy, which is the most powerful model known for the protection of sensitive information. Further, our solution is very practical, which matters because the main criticism of prior work has been that its costs were high and its scalability poor.
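To make the idea of differential privacy concrete, here is a minimal sketch of the textbook Laplace mechanism applied to a "total demand" query. It is not our protocol: it assumes a single trusted aggregator, and it has none of the decentralization or Byzantine fault tolerance described above. The function name `private_total_demand` and the parameters `max_kw` and `epsilon` are hypothetical, chosen only for illustration.

```python
import random


def private_total_demand(demands_kw, max_kw, epsilon, rng):
    """Return a noisy sum of household demands (illustrative sketch only).

    Assumptions, none of which come from the article: a trusted aggregator
    sees the raw values, each household contributes one value, and each
    value is clipped to the range [0, max_kw].
    """
    # Clip each contribution so one household can change the sum by at most max_kw.
    clipped = [min(max(d, 0.0), max_kw) for d in demands_kw]
    true_total = sum(clipped)

    # Laplace mechanism: noise scale = sensitivity / epsilon.
    scale = max_kw / epsilon

    # The difference of two independent exponential samples with mean `scale`
    # is distributed as Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_total + noise


if __name__ == "__main__":
    rng = random.Random(2015)          # seeded only so runs are repeatable
    demands = [1.2, 3.4, 0.8, 2.5]     # made-up demands in kW
    print(private_total_demand(demands, max_kw=5.0, epsilon=0.5, rng=rng))
```

A smaller `epsilon` adds more noise and gives stronger privacy. The operator still gets a usable estimate when many households contribute, because the noise does not grow with the number of participants.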
We’re looking for a partner who might be interested in carrying out some real-world experiments. Drop me a note to learn more about how our solution works.
With the approaching Web of Things, more and more data will be captured at the edge, much of it quite sensitive. Making use of all this “truth” is going to force us to develop efficient ways to perform decentralized data mining without compromising privacy. This isn’t easy, but the payoff could be enormous, including, quite possibly, huge improvements in the efficiency of the power grid (which would bring correspondingly big reductions in greenhouse gas emissions). Wouldn’t it be great to use data mining to save the environment without intruding on personal privacy? Until recently, that combination might have sounded unattainable, but today, it could well be within reach!