Our ability to collect videos en masse can revolutionize how we interact with the world by enabling powerful virtual reality (VR) video applications in education, tourism, tele-presence and other areas. Many of these new applications involve processing and serving 360-degree stereoscopic videos, which require dramatic improvements in technology to manage and process the massive-scale visual data necessary for truly immersive experiences.
At SIGMOD ’17, we demonstrated the first prototype of VisualCloud, a new database management system (DBMS) designed to efficiently ingest, store, and deliver VR content at scale.
VisualCloud currently targets 360-degree videos, which allow a user, through a VR head-mounted display or mobile device, to observe a scene from a fixed position at any angle. These videos are captured using multiple cameras and produced using software that stitches together parts of each frame to produce an approximate (potentially stereoscopic) spherical representation. Devices that can record and view VR video have become increasingly popular, and efficiently managing this type of data has become correspondingly important.
Read the full publication: “VisualCloud Demonstration: A DBMS for Virtual Reality.” Brandon Haynes, Artem Minyaylov, Magdalena Balazinska, Luis Ceze, and Alvin Cheung. SIGMOD 2017 *** Best Demonstration Honorable Mention Award ***
VisualCloud reduces the data volume required to stream 360-degree VR video. This is particularly important for mobile viewers, who are subject to bandwidth and battery power constraints. The reduction matters because the data sizes involved in streaming and storing 360-degree videos far exceed those of ordinary 2D videos. A single frame of uncompressed 2D video at 4K resolution (3840 x 2160 pixels) requires approximately 24MB to store. In contrast, to render a 4K 360-degree video on a headset with a 120-degree field of view, we need a much larger frame (approximately 3x the resolution in each dimension), since only a portion of the spherical projection is in view at any time. Persisting each such frame therefore requires more than 9x the space of its 2D counterpart. For stereoscopic videos, this requirement doubles because one video per eye is needed.
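The arithmetic above can be checked with a quick back-of-the-envelope sketch (the 3-bytes-per-pixel and 3x-per-dimension figures are assumptions drawn from the text, not VisualCloud internals):

```python
# Back-of-the-envelope storage arithmetic for the figures above.
# Assumes uncompressed 8-bit RGB, i.e. 3 bytes per pixel.
BYTES_PER_PIXEL = 3

def frame_bytes(width, height):
    """Uncompressed size in bytes of a single video frame."""
    return width * height * BYTES_PER_PIXEL

# A 4K 2D frame: 3840 x 2160 pixels -> roughly 24 MB.
flat_4k = frame_bytes(3840, 2160)
print(f"4K 2D frame: {flat_4k / 1e6:.1f} MB")

# A 360-degree frame at ~3x the resolution in each dimension, so the
# in-view portion still renders at 4K quality -> 9x the pixels.
sphere = frame_bytes(3 * 3840, 3 * 2160)
print(f"360-degree frame: {sphere / 1e6:.1f} MB ({sphere // flat_4k}x)")

# Stereoscopic video needs one such frame per eye, doubling the total.
print(f"Stereo pair: {2 * sphere / 1e6:.1f} MB")
```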
Existing streaming VR video platforms (e.g., YouTube) treat VR video the same way as ordinary 2D video, and are therefore poorly suited for the massive quantities of data that high-resolution VR video streaming requires. When 360-degree video is delivered over a network, these platforms reduce bandwidth only in the face of network congestion, and do so by reducing video quality. This technique, called adaptive streaming, is used by many video-streaming websites. With adaptive streaming, rather than streaming all video frames at the same quality, the server temporally segments an input video and encodes each fragment at various qualities. A client then requests fragments at the appropriate quality based on currently available network capacity. The fragments are concatenated on the client before being rendered.
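The client side of adaptive streaming can be sketched in a few lines. The quality ladder and function names below are illustrative assumptions, not part of any real player: the server has encoded each temporal fragment at several bitrates, and the client picks the highest bitrate its measured bandwidth can sustain.

```python
# Hypothetical quality ladder: bitrates (kbit/s) at which the server
# has encoded every temporal fragment of the video.
QUALITY_LADDER_KBPS = [500, 1500, 4000, 8000]

def pick_quality(measured_bandwidth_kbps, ladder=QUALITY_LADDER_KBPS):
    """Return the highest bitrate not exceeding current bandwidth,
    falling back to the lowest rung under heavy congestion."""
    affordable = [q for q in ladder if q <= measured_bandwidth_kbps]
    return max(affordable) if affordable else ladder[0]

# As bandwidth fluctuates, consecutive fragments are fetched at
# different qualities and concatenated on the client before rendering.
for bw in [6000, 1200, 300]:
    print(f"bandwidth {bw} kbps -> fragment at {pick_quality(bw)} kbps")
```

Note that this strategy degrades the *entire* frame uniformly; the spatial tiling described next lets VisualCloud instead degrade only the parts of the frame the user is unlikely to see.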
In contrast, VisualCloud aims to dramatically cut bandwidth requirements without significantly impacting quality. We currently focus on 360-degree videos, but plan to generalize to other kinds of videos (e.g., augmented reality) in the future. The initial design and prototype aim to reduce the amount of data that must be transferred to the viewer without degrading the immersive experience of 360-degree videos. This matters because reducing the amount of data streamed to viewers has been shown to reduce both network traffic and battery consumption.
As illustrated in Figure 1, VisualCloud stores, retrieves, and streams both archived and live VR data. To reduce the amount of data that needs to be transferred to viewers, VisualCloud segments 360-degree videos both in time and in space; the process is illustrated in Figure 2. This design is inspired by recent work that demonstrated substantial savings from degrading the out-of-view portions of each frame. Since out-of-view segments can be delivered at lower quality, VisualCloud prefetches and prioritizes the spatiotemporal segments that are most likely to be viewed. It identifies such tiles by jointly (i) applying dead reckoning to predict a user's future orientation and (ii) for archived videos, increasing the priority of tiles that are frequently viewed by other users. It transfers the prioritized segments at the highest resolution and the remaining segments at lower resolutions. Additionally, VisualCloud implements in-memory, near real-time 360-degree video partitioning and preprocessing to generate multi-resolution data segments and reduce bandwidth utilization, even for live streams.
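The two prioritization signals above can be sketched as follows. Everything here, including the constant-velocity predictor and the quality tiers, is our illustrative assumption rather than VisualCloud's actual implementation:

```python
def predict_orientation(yaw, yaw_velocity, lookahead_s):
    """Dead reckoning: extrapolate the user's yaw angle (degrees),
    assuming constant angular velocity over the lookahead window."""
    return (yaw + yaw_velocity * lookahead_s) % 360

def tile_quality(tile_center_yaw, predicted_yaw, popularity, fov=120):
    """Assign a resolution tier to a spatial tile: highest quality if
    the tile falls in the predicted field of view; otherwise boosted
    when other viewers frequently looked at it (archived videos)."""
    # Smallest angular distance between tile center and predicted gaze.
    delta = abs((tile_center_yaw - predicted_yaw + 180) % 360 - 180)
    if delta <= fov / 2:
        return "high"
    return "medium" if popularity > 0.5 else "low"

# A viewer at yaw 90, turning at 30 deg/s, one second ahead -> yaw 120.
predicted = predict_orientation(yaw=90, yaw_velocity=30, lookahead_s=1.0)
for center, popularity in [(120, 0.1), (300, 0.8), (240, 0.2)]:
    print(f"tile at {center}: {tile_quality(center, predicted, popularity)}")
```

The in-view tile streams at full resolution; the out-of-view but historically popular tile gets an intermediate tier, and the rest stream at low resolution.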
VisualCloud builds on recent work in multidimensional array processing, including the SciDB and TileDB systems produced previously in the context of the ISTC for Big Data. It also introduces techniques for VR data storage and retrieval and near real-time in-memory processing of VR videos. Our system combines the state of the art in array-oriented systems (e.g., an efficient multidimensional array representation, tiling, prefetching) with the ability to apply recently introduced optimizations by the multimedia community (e.g., motion-constrained tile sets) and the machine learning community (e.g., path prediction). As a result, VisualCloud reduces bandwidth and power consumption on client devices, while at the same time scaling to support many concurrent streaming connections.
“VisualCloud combines the state of the art in array-oriented systems…with the ability to apply recently introduced optimizations by the multimedia and machine learning communities.”
More information about the project is available on our project website. Please bookmark the site and check back for availability of code on GitHub.
A. Zare, A. Aminlou, M. M. Hannuksela, and M. Gabbouj. “HEVC-compliant tile-based streaming of panoramic video for virtual reality applications.” In Proceedings of the 2016 ACM Multimedia Conference (MM ’16), pages 601–605.