Skip to content

On Video Analytics for Smart Cities. Interview with Ganesh Ananthanarayanan

by Roberto V. Zicari on September 7, 2019

“Cameras are now everywhere. Large-scale video processing is a grand challenge representing an important frontier for analytics, what with videos from factory floors, traffic intersections, police vehicles, and retail shops. It’s the golden era for computer vision, AI, and machine learning – it’s a great time now to extract value from videos to impact science, society, and business!” — Ganesh Ananthanarayanan

I have interviewed Ganesh Ananthanarayanan. We talked about his projects at Microsoft Research.


Q1. What is your role at Microsoft Research?

Ganesh Ananthanarayanan: I am a Researcher at Microsoft Research. Microsoft Research is a research wing within Microsoft, and my role is to watch out for key technology trends and work on large scale networked-systems.

Q2. Your current research focus is to democratize video analytics. What is it?

Ganesh Ananthanarayanan:  Cameras are now everywhere. Large-scale video processing is a grand challenge representing an important frontier for analytics, what with videos from factory floors, traffic intersections, police vehicles, and retail shops. It’s the golden era for computer vision, AI, and machine learning – it’s a great time now to extract value from videos to impact science, society, and business!

Project Rocket‘s goal is to democratize video analytics: build a system for real-time, low-cost, accurate analysis of live videos. This system will work across a geo-distributed hierarchy of intelligent edges and large clouds, with the ultimate goal of making it easy and affordable for anyone with a camera stream to benefit from video analytics. Easy in the sense that any non-expert in AI should be able to use video analytics and derive value. Affordable because the latest advances in CV are still very resource intensive and expensive to use.

Q3. What are the main technical challenges of large-scale video processing?

Ganesh Ananthanarayanan: In the hotly growing “Internet of Things” domain, cameras are the most challenging of “things” in terms of data volume, (vision) processing algorithms, response latencies, and security sensitivities. They dwarf other sensors in data sizes and analytics costs, and analyzing videos will be a key workload in the IoT space. Consequently, we believe that large-scale video analytics is a grand challenge for the research community representing an important and exciting frontier for big data systems.

Unlike text or numeric processing, videos require high bandwidth (e.g., up to 5 Mbps for HD streams), need fast CPUs and GPUs, richer query semantics, and tight security guarantees. Our goal is to build and deploy a highly efficient distributed video analytic  system. This will entail new research on (1) building a scalable, reliable and secure systems framework for capturing and processing video data from geographically distributed cameras; (2) efficient computer vision algorithms for detecting objects, performing analytics and issuing alerts on streaming video; and (3) efficient monitoring and management of computational and storage resources over a hybrid cloud computing infrastructure by reducing data movement, balancing loads over multiple cloud instances, and enhancing data-level parallelism.

Q4. What are the requirements posed by video analytics queries for systems such as IoT and edge computing?

Ganesh Ananthanarayanan: Live video analytics pose the following stringent requirements:

1) Latency: Applications require processing the video at very low latency because the output of the analytics is used to interact with humans (such as in augmented reality scenarios) or to actuate some other system (such as intersection traffic lights).

2) Bandwidth: High-definition video requires large bandwidth (5Mbps or even 25Mbps for 4K video) and streaming large number of video feeds directly to the cloud might be infeasible. When cameras are connected wirelessly, such as inside a car, the available uplink bandwidth is very limited.

3) Provisioning: Using compute at the cameras allows for correspondingly lower provisioning (or usage) in the cloud. Also, uninteresting parts of the video can be filtered out, for example, using motion-detection techniques, thus dramatically reducing the bandwidth that needs to be provisioned.

Besides low latency and efficient bandwidth usage, another major consideration for continuous video analytics is the high compute cost of video processing. Because of the high data volumes, compute demands, and latency requirements, we believe that largescale video analytics may well represent the killer application for edge computing.

Q5. Can you explain how Rocket allows programmers to plug-in vision algorithms while scaling across a hierarchy of intelligent edges and the cloud?

Ganesh Ananthanarayanan: Rocket ( is an extensible software stack for democratizing video analytics: making it easy and affordable for anyone with a camera stream to benefit from computer vision and machine learning algorithms. Rocket allows programmers to plug-in their favorite vision algorithms while scaling across a hierarchy of intelligent edges and the cloud.

The figure above shows our video analytics stack, Rocket, that supports multiple applications including traffic camera analytics for smart cities, retail store intelligence scenarios, and home assistants. The “queries” of these applications are converted into a pipeline of vision modules by the video pipeline optimizer to process live video streams. The video pipeline consists of multiple modules including the decoder, background subtractor, and deep neural network (DNN) models.

Rocket partitions the video pipeline across the edge and the cloud. For instance, it is preferable to run the heavier DNNs on the cloud where the resources are plentiful. Rocket’s edge-cloud partitioning ensures that: (i) the compute (CPU and GPU) on the edge device is not overloaded and only used for cheap filtering, and (ii) the data sent between the edge and the cloud does not overload the network link. Rocket also periodically checks the connectivity to the cloud and falls back to an “edge-only” mode when disconnected. This avoids any disruption to the video analytics but may produce outputs of lower accuracy due to relying only on lightweight models. Finally, Rocket piggybacks on the live video analytics to use its results as an index for after-the-fact interactive querying on stored videos.

More details can be found in our recent MobiSys 2019 work.

Q6. One of the verticals your project is focused on is video streams from cameras at traffic intersections. Can you please tell us more how this works in practice?

Ganesh Ananthanarayanan: As we embarked on this project, two key trends stood out: (i) cities were already equipped with a lot of cameras and had plans to deploy many more, and (ii) traffic related fatalities were among the top-10 causes of deaths worldwide, which is terrible! So, in partnership with my colleague (Franz Loewenherz) at the City of Bellevue, we asked the question: can we use traffic video analytics to improve traffic safety, traffic efficiency, and traffic planning? We understood that most jurisdictions have little to no data on the continuous trends on directional traffic volumes; accident near-misses; pedestrian, bike & multi-modal volumes, etc. Data on these is usually got by commissioning an agency to count vehicles once or twice a year for a day.

We have built technology that analyzes traffic camera feeds 24X7 at low cost to power a dashboard of directional traffic volumes. The dashboard raises alerts on traffic congestion & conflicts. Such a capability can be vital towards traffic planning (of lanes), traffic efficiency (light durations), and safety (unsafe intersections).
A key aspect is that we do our video analytics using existing cameras and consciously decided to shy away from installing our own cameras. Check out this project video on Video Analytics for Smart Cities.

Q7. What are the lessons learned so far from your on-going pilot in Bellevue (Washington) for active traffic monitoring of traffic intersections live 24X7?  Does it really help preventing traffic-related accidents?  Does the use of your technology help your partners with jurisdictions to identify traffic details that impact traffic planning and safety?

Ganesh Ananthanarayanan: Our traffic analytics dashboard runs 24X7 and accumulates data non-stop that the officials didn’t have access to before. It helps them understand instances of unexpectedly high traffic volumes in certain directions. It also generates alerts on traffic volumes to help dispatch personnel accordingly. We also used the technology for planning a bike corridor in Bellevue. The objective was to do a before/after study of the bike corridor to help understand the impact of the corridor on driver behavior. The City plans to use the results, to decide on bike corridor designs.

Our goal is to make the roads considerably safer & efficient with affordable video analytics. We expect that video analytics will be able to drive decisions of cities precisely in these directions towards how they manage their lights, lanes, and signs. We also believe that data regarding traffic volumes from a dense network of cameras will be able to power & augment routing applications for better navigation.

As the number of cities that start to deploy the solution increase, it will only increase the accuracy of the computer vision models with better training data, thus leading to a nice virtuous cycle.

Qx Anything else you wish to add?

Ganesh Ananthanarayanan: So far I’ve described our video analytics solution on how it uses video cameras to continuously analyze and get data. One thing I am particularly excited to make happen is to “complete the loop”. That is, take the output from the video analytics and in real-time actuate it on the ground to users. For instance, if we predict an unsafe interaction between a bicycle & car, send a notification to one or both of them. Pedestrian lights can be automatically activated and even extended for people with disabilities (e.g., in a wheelchair) to enable them to safely cross the road (see demo). I believe that the infrastructure will be sufficiently equipped for this kind of communication in a few years. Another example of this is warning approaching cars when they cannot spot pedestrians between parked cars on the road.

I am really excited about the prospect of the AI analytics interacting with the infrastructure and people on the ground and I believe we are well on track for it!


Ganesh Ananthanarayanan is a Researcher at Microsoft Research. His research interests are broadly in systems & networking, with recent focus on live video analytics, cloud computing & large scale data analytics systems, and Internet performance. His work on “Video Analytics for Vision Zero” on analyzing traffic camera feeds won the Institute of Transportation Engineers 2017 Achievement Award as well as the “Safer Cities, Safer People” US Department of Transportation Award. He has published over 30 papers in systems & networking conferences such as USENIX OSDI, ACM SIGCOMM and USENIX NSDI. He has collaborated with and shipped technology to Microsoft’s cloud and online products like the Azure Cloud, Cosmos (Microsoft’s big data system) and Skype. He is a member of the ACM Future of Computing Academy. Prior to joining Microsoft Research, he completed his Ph.D. at UC Berkeley in Dec 2013, where he was also a recipient of the UC Berkeley Regents Fellowship. For more details:


Rocket (

Related Posts

– On Amundsen. Q&A with Li Gao tech lead at Lyft, Expert Article, JUL 30, 2019

–  On IoT, edge computing and Couchbase Mobile. Q&A with Priya Rajagopal, Expert article, JUL 25, 2019


From → Uncategorized

No comments yet

Leave a Reply

Note: HTML is allowed. Your email address will not be published.

Subscribe to this comment feed via RSS