The success of streaming video has generated interest in newer forms of multi-perspective video content, such as those generated by 360-degree cameras, multi-angle camera arrays, or light-field cameras. The immersive experience provided by these cameras can enhance user satisfaction for broadcast performances (e.g., theater, concerts) and events (e.g., graduation ceremonies), as well as online meetings. With content from these cameras, users do not just passively consume content; they may interactively traverse it along many different paths from a perspective of their choice, with different users observing different perspectives of the same content. Supporting this at Internet scale is challenging: client players must be able to download perspectives on demand, and content distribution networks (CDNs) must be able to generate them on demand, all while ensuring low latency and supporting a variety of devices for capturing and consuming this content.

This project explores architectural enhancements, algorithms, and techniques to deliver multi-perspective video at Internet scale. It couples delivery optimization with video coding and human-computer interaction. At its core, the project will develop interactivity abstractions by which content publishers can specify the range of perspectives users are permitted to choose from at each point in the video. The interactivity specified for a video will drive perspective coding and novel dynamic perspective generation algorithms. It will also enable infrastructure provisioning that meets CDN storage and cost constraints, and will guide adaptive perspective retrieval from CDN servers or from nearby caches. Finally, drawing on user studies, the project will explore methods to predict and guide user behavior in order to improve delivery quality.