Q&A with Kenneth P. Birman, Rama Rao Professor of Computer Science, Cornell University
RZ: Ken, I wonder if you could tell our readers a little about the work you’ve been doing on the smart power grid.
KPB: Roberto, this would be a pleasure. My group at Cornell has been working to show that cloud computing technology can be used to build very robust Internet of Things systems, and for us, the smart power grid is an ideal test case. The goals of the power community won’t surprise anyone. Every country in the world wants to bring more renewable power and microgeneration (solar, wind, etc.) into the grid. We also have a huge wave of new technologies, such as wall-size rechargeable storage batteries and controllable heating, cooling, and hot water units, and there are new ways to dynamically shunt power around within a region or even between states or countries. The control systems for these new options are what we mean by a “smart” grid. We’ll need a wide range of optimization-based planners, expert systems, and constraint solvers just to configure the platforms to operate safely. Beyond that, we want to maximize the use of renewables, lower consumer prices, ensure that power is steady and reliable, and even protect against mishaps caused by major storms or terrorist attacks.
We’re finding that this is the world’s most convincing Internet of Things scenario. Everyone talks about the Internet of Things, but when you look at the smart grid, you see a sound economic model for why such an Internet might be created, how it would work, and what its concrete goals might be. In contrast, for ideas that just scatter sensors everywhere, I often ask myself who would pay the bill and what incentive they would have to keep the thing running. And projects of this kind are underway broadly. Germany, Denmark, and the Netherlands are widely cited as leaders, but in the United States, Switzerland, Italy, everywhere you look, there is an explosion of research to create smart grid solutions of this kind.
RZ: But is this a big data scenario? How much data is really generated in a smart grid? What platforms are used to manage it?
KPB: Absolutely! We’ve done all sorts of paper-and-pencil analyses, and what you quickly discover is that while the data rates from individual devices like the ones I’ve mentioned are low, the number of units one might want to monitor and control is immense. So when you add this up at the scale of a city or some larger region, you are suddenly looking at a very big data scenario. This is what drove us toward cloud computing models: they are the most cost-effective options for capturing and working with big data today. My group’s approach has been to harden the cloud infrastructure so that it can offer the required real-time, reliability, and security properties.
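The arithmetic behind “low per-device rates, immense unit counts” is easy to sketch. The device count, record size, and reporting rate below are illustrative assumptions, not figures from the Cornell analyses, and the helper name `aggregate_rate` is hypothetical.

```python
def aggregate_rate(devices: int, bytes_per_reading: int, readings_per_sec: float) -> float:
    """Aggregate ingest rate, in bytes per second, across all devices."""
    return devices * bytes_per_reading * readings_per_sec


SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical city-scale deployment: every number here is an assumption.
devices = 1_000_000        # meters, home batteries, smart thermostats, ...
bytes_per_reading = 100    # one small telemetry record
readings_per_sec = 1.0     # one reading per device per second

rate = aggregate_rate(devices, bytes_per_reading, readings_per_sec)
per_day = rate * SECONDS_PER_DAY
print(f"{rate / 1e6:.0f} MB/s, {per_day / 1e12:.2f} TB/day")  # prints: 100 MB/s, 8.64 TB/day
```

Each device contributes a trivial 100 bytes per second under these assumptions, yet the total is about 100 MB/s, or roughly 8.6 TB per day before any replication.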
So, for example, we’ve created the Vsync software library (your readers can download it from vsync.codeplex.com), and we are now working on extensions that will leverage RDMA (remote direct memory access) transfers over optical networks to do reliable data replication at amazing speeds, maybe 10,000x faster than what we’ve seen in the past using file-copying tools on clusters of Linux servers. Another new system, SST, will let us track the state of small groups of cooperating servers in real time.
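The interview doesn’t describe SST’s interface, so the following is only a hypothetical, single-process model, assuming SST behaves like a shared state table: each member owns one row that only it writes, every member can read the whole table, and decisions come from evaluating predicates over all rows. The class name, the `received` column, and the stability rule are illustrative assumptions; a real system would push each row to the other members over the network (for example with RDMA writes) instead of keeping everything in one dict.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStateTable:
    """Hypothetical in-process model of a shared state table (not SST's API).

    Each group member owns exactly one row and is its only writer; any member
    may read every row. Here all rows live in one local dict for illustration.
    """
    members: list[str]
    rows: dict[str, dict[str, int]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Every member starts with a row whose "received" counter is zero.
        for m in self.members:
            self.rows[m] = {"received": 0}

    def update_own_row(self, member: str, column: str, value: int) -> None:
        # Single-writer rule: a member only ever changes its own row.
        self.rows[member][column] = value

    def min_over(self, column: str) -> int:
        # A predicate evaluated across every row of the table.
        return min(row[column] for row in self.rows.values())


sst = SharedStateTable(["a", "b", "c"])
sst.update_own_row("a", "received", 5)
sst.update_own_row("b", "received", 3)
sst.update_own_row("c", "received", 4)

# Assumed rule: messages 1..k are held by every member once min(received) >= k.
stable = sst.min_over("received")
print(f"messages 1..{stable} are on every replica")  # prints: messages 1..3 are on every replica
```

A min-over-rows predicate like this is one plausible way a replication layer could decide that a message has reached every replica; it is shown here only to make the idea of tracking group state concrete.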
Then we’re integrating these low-level tools into higher-level software. For example, our new Freeze Frame File System (FFFS) lets us track streaming data in real time and can materialize very accurate slices of the data on demand, or large numbers of slices if desired, with optimal temporal guarantees and also logical consistency: you never see a mashup of data from different periods somehow combined in one slice. FFFS will use RDMA replication so that if you access it from Hadoop or Spark, the actual movement of data benefits from those amazing speeds I mentioned.
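The “no mashup” property can be illustrated with a simple time-indexed store. If every sensor’s updates are kept in timestamp order, a slice at time t takes, for each sensor, the newest value at or before t and nothing later. This is a hypothetical sketch (`TemporalStore` and `slice_at` are not the FFFS API), and it ignores clock skew, replication, and persistence, which a real file system must handle.

```python
import bisect
from collections import defaultdict


class TemporalStore:
    """Hypothetical time-indexed store (illustration only, not FFFS).

    Each sensor keeps its updates sorted by timestamp, so a slice at time t
    can be answered by binary search for the latest value at or before t.
    """

    def __init__(self) -> None:
        self._times: dict[str, list[float]] = defaultdict(list)
        self._values: dict[str, list[float]] = defaultdict(list)

    def append(self, sensor: str, ts: float, value: float) -> None:
        # Assumes each sensor's updates arrive in timestamp order.
        times = self._times[sensor]
        if times and ts < times[-1]:
            raise ValueError("out-of-order update")
        times.append(ts)
        self._values[sensor].append(value)

    def slice_at(self, t: float) -> dict[str, float]:
        # For every sensor, the newest value whose timestamp is <= t.
        # Nothing written after t is included, so one slice never mixes
        # data from different periods.
        snapshot: dict[str, float] = {}
        for sensor, times in self._times.items():
            i = bisect.bisect_right(times, t)
            if i > 0:
                snapshot[sensor] = self._values[sensor][i - 1]
        return snapshot


store = TemporalStore()
store.append("bus1", 0.0, 120.0)
store.append("bus1", 2.0, 119.5)
store.append("bus2", 1.0, 240.0)
store.append("bus2", 3.0, 241.0)
print(store.slice_at(2.5))  # prints: {'bus1': 119.5, 'bus2': 240.0}
```

Binary search keeps each slice cheap, which matters if, as the text suggests, many slices are materialized at once.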
And our CloudMake software manager, a tool built on the Vsync software library, will do 24×7 system management in a style that looks to the developer just like writing old-fashioned makefiles.
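CloudMake’s actual rule syntax isn’t described here, so this sketch shows only the general make-style idea applied to system state instead of files: each rule names a target, its inputs, and an action, and the action runs when some input is newer than the target. `MiniMake`, `load_estimate`, and `replica_plan` are hypothetical names, and a logical counter stands in for file modification times.

```python
from typing import Callable


class MiniMake:
    """Hypothetical make-style evaluator over system state (not CloudMake)."""

    def __init__(self) -> None:
        self._clock = 0                    # logical clock replacing file mtimes
        self._stamp: dict[str, int] = {}   # last-modified stamp per name
        self._rules: dict[str, tuple[list[str], Callable[[], None]]] = {}

    def touch(self, name: str) -> None:
        # Like touching a file: this piece of state is now the newest.
        self._clock += 1
        self._stamp[name] = self._clock

    def rule(self, target: str, deps: list[str], action: Callable[[], None]) -> None:
        self._rules[target] = (deps, action)

    def make(self, target: str) -> None:
        # Bring rule-backed dependencies up to date first (no cycle detection).
        deps, action = self._rules[target]
        for d in deps:
            if d in self._rules:
                self.make(d)
        newest_dep = max((self._stamp.get(d, 0) for d in deps), default=0)
        # A never-built target has stamp -1, so it is always out of date.
        if self._stamp.get(target, -1) < newest_dep:
            action()
            self.touch(target)


log: list[str] = []
mm = MiniMake()
mm.rule("replica_plan", ["load_estimate"], lambda: log.append("replan"))
mm.touch("load_estimate")
mm.make("replica_plan")   # runs: input is newer than target
mm.make("replica_plan")   # no-op: target is already up to date
mm.touch("load_estimate")
mm.make("replica_plan")   # runs again after the input changed
print(log)  # prints: ['replan', 'replan']
```

A 24×7 manager built this way would presumably re-evaluate rules whenever monitored state changes, rather than waiting for a developer to type `make`.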
Our platform wraps these together – we call it GridCloud. Everything is open source, so if any of your readers sees a possible fit, they are welcome to use our solutions. We would be happy to help them if they have questions.
RZ: So tell me, what are the big challenges? Is there a need for much more research, or can companies already think about products for this kind of setting?
KPB: We do see many research challenges. For example, we would want to preserve privacy even as we automate power management in your house. The tensions there are obvious and pose big puzzles, but we think they can be solved. Another area for research is understanding how to use the cloud as a reactive control system. When you think about smart grid settings, or for that matter about smart cars that depend on help from the cloud, or smart cities that coordinate traffic flow using cloud resources, you suddenly see a rather tight round-trip loop: data is acquired, we run these machine learning and optimization codes, and then we want to act promptly on what they recommend.
The tighter that loop can be, the better.
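The round-trip loop described above (acquire data, run the optimizer, act on its recommendation) can be sketched as a single control step with a freshness deadline. Everything here is hypothetical: the function names, the policy of simply dropping late recommendations, and the toy battery-discharge rule. The injectable `clock` parameter exists only so the example runs deterministically.

```python
import time
from typing import Callable


def control_step(
    sense: Callable[[], dict],
    plan: Callable[[dict], dict],
    act: Callable[[dict], None],
    deadline_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """One pass of a sense -> plan -> act loop with a freshness deadline.

    Returns True if the action was applied. If the time from data capture
    to decision exceeds the deadline, the recommendation rests on stale
    data and is dropped instead of being acted upon.
    """
    captured_at = clock()
    reading = sense()
    decision = plan(reading)
    elapsed = clock() - captured_at
    if elapsed > deadline_s:
        return False
    act(decision)
    return True


applied: list[dict] = []
ok = control_step(
    sense=lambda: {"load_mw": 42.0},
    plan=lambda r: {"discharge_mw": max(0.0, r["load_mw"] - 40.0)},
    act=applied.append,
    deadline_s=0.1,
    clock=iter([0.0, 0.05]).__next__,   # fake clock: 50 ms round trip
)
print(ok, applied)  # prints: True [{'discharge_mw': 2.0}]
```

Dropping late decisions is only one possible policy; a real controller might instead fall back to a safe default, but either way the shorter the loop, the more often fresh recommendations can be used.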
Speed is a big issue because, ultimately, the rate-limiting steps come down to moving big data sets from place to place. In fact, this is why we are so focused now on RDMA (and on NVRAM, 3D XPoint, and other similar solutions too). The key for us is to integrate these ultra-fast hardware options into high-value software platforms.
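A back-of-envelope formula shows why data movement dominates: ideal transfer time is size divided by bandwidth. The 10 GB data set and the link speeds are assumptions chosen for illustration, and the formula deliberately ignores latency, protocol overhead, and CPU copy costs; avoiding those copies is part of what makes RDMA attractive.

```python
def transfer_seconds(size_bytes: float, link_bits_per_sec: float) -> float:
    """Ideal transfer time: size divided by bandwidth.

    Ignores latency, protocol overhead, and copy costs in the software stack.
    """
    return size_bytes * 8 / link_bits_per_sec


data_set = 10e9  # hypothetical 10 GB slice of grid state
for label, bw in [("1 Gb/s", 1e9), ("10 Gb/s", 10e9), ("100 Gb/s", 100e9)]:
    print(f"{label:>8}: {transfer_seconds(data_set, bw):.1f} s")
```

Under these assumptions the same data set takes 80 s, 8 s, and 0.8 s respectively, which is why faster data movement translates directly into a tighter control loop.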
And my specialty is strong reliability guarantees, so I go further and ask how to provide a rich formal model and correctness proofs for everything we create.
But today’s cloud is really not optimized for this kind of tight, reactive loop. In fact, as many readers will know, the cloud today is designed around the CAP theorem and basically favors weak consistency at the edge, specifically to allow for long data transit times between when data is captured and when the system acts upon it. So cutting that delay down and finding consistency and reliability models that scale are all important research agendas. And the longer-term agenda is to shift from a CAP cloud to a consistent, real-time, reactive, privacy-preserving cloud!
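The staleness that weak consistency permits can be shown with a toy replicated store. In this hypothetical sketch, an asynchronous write returns as soon as the first replica has it, so a reader at another replica can see an old value until propagation runs; a synchronous write updates every replica before returning. It does not model network partitions, so it illustrates only the consistency side of the CAP trade-off, not the theorem itself.

```python
from typing import Optional


class ToyReplicatedStore:
    """Hypothetical multi-replica key-value store (illustration only)."""

    def __init__(self, n_replicas: int) -> None:
        self.replicas: list[dict[str, str]] = [{} for _ in range(n_replicas)]
        self._pending: list[tuple[str, str]] = []

    def write_async(self, key: str, value: str) -> None:
        # Weak consistency: acknowledge after updating replica 0 only;
        # the other replicas catch up later in propagate().
        self.replicas[0][key] = value
        self._pending.append((key, value))

    def write_sync(self, key: str, value: str) -> None:
        # Strong consistency: update every replica before returning.
        for r in self.replicas:
            r[key] = value

    def propagate(self) -> None:
        # Deliver queued asynchronous writes to the remaining replicas.
        for key, value in self._pending:
            for r in self.replicas[1:]:
                r[key] = value
        self._pending.clear()

    def read(self, replica: int, key: str) -> Optional[str]:
        return self.replicas[replica].get(key)


store = ToyReplicatedStore(2)
store.write_sync("breaker_7", "closed")
store.write_async("breaker_7", "open")
print(store.read(1, "breaker_7"))  # prints: closed  (stale, not yet propagated)
store.propagate()
print(store.read(1, "breaker_7"))  # prints: open
```

A controller acting on replica 1 during that window would be making decisions about a breaker state that is no longer true, which is exactly the kind of gap a consistent, reactive cloud would need to close.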
RZ: Well, I want to thank you very much. Where could my readers look in order to learn more about this area?
KPB: One collection of very recent videos I would recommend is the set created at the ACM SIGOPS History Day event in Monterey, California, last fall, just before the 25th Symposium on Operating Systems Principles, which, as you know, is the top systems conference. Your readers can find the videos at http://sigops.org/sosp/sosp15/history/index.html, and some of the talks have accompanying essays too; I just posted a 30-page essay on the history of fault tolerance to go with my talk. I felt that the talks gave a wonderful sense of how the systems community is shifting its goals to match this new era of big data and big systems.
You see that in the conference papers too, which are also online; a great many were on these kinds of topics.
For our GridCloud platform, which is what I was talking about above, they could look at http://www.cs.cornell.edu/Projects/gridcontrol/.
We’ve linked some technical reports there, and are also working on full papers on the system. All of my papers are available at http://www.cs.cornell.edu/projects/quicksilver/pubs.html.