During the Roland-Garros 2020 tournament, France TV presented several demonstrations of augmented tennis. This blog describes the technical details of the project.
This blog was written by France TV with the support of Thierry Fautier, vice president of video strategy at Harmonic; Jean-Pierre Navarro, global account manager, Software Division at Intel; Christophe Massiot, president at Easy tools; Stephane Desproges, director of application engineering at VisualOn; and Morten Kolle, CEO at Pixop.
A press release announcing the project can be found here. A video of the Multiview demo can be found here, and a video of the AV1 and AI upconversion demos can be found here.
A French version of the Roland-Garros 2020 Lab story, for a broader audience, can be found here.
Genesis of the project
In March 2020, before the COVID-19 pandemic caused global lockdowns, Vincent Nalpas, director of services innovations at France Télévisions, met with us and expressed his wish to see some immersive experiences delivered in 2020.
France Télévisions has been a pioneer in video, with the first live 4K demonstration in 2014 and the first live 8K demonstration in 2019. Now it wanted to explore more augmented experiences. We came up with lots of ideas, such as VR or volumetric capture, but had to narrow down the choice to scenarios that could be implemented within the constraints imposed by the French government. Then the pandemic hit with full force. The first impact was that the event was moved from May to September. The second was that no equipment could be installed in the France TV compound, which led us to build a cloud-based solution in less than 30 days.
This blog will cover the following applications:
- Increased interactivity: Make the user an actor in their viewing experience
- Increased quality: Revisit images of the past in UHD thanks to artificial intelligence
- Increased reach: Use the new AV1 codec to reach more people and more devices
Introduction
In this blog we present the Multiview project: how the content was captured, kept synchronized, delivered using a cloud-based solution and played back on several Android devices. We paid specific attention to measuring the end-to-end delay, as we made use of cloud technology coupled with SRT for contribution and low-latency CMAF DASH for OTT distribution.
The second demonstration shows how SD content can be upconverted to UHD on the headend side, and what level of quality is now possible using AI versus the classical upconversion filters developed for real-time applications.
The last demonstration shows a cloud-to-PC live AV1 chain: compression and transmission of 1080p50 to the latest generation of Intel processors (CPU and GPU).
We provide in this blog the details of the solution deployed during the two weeks of the tournament, as well as the measurements we made.
We believe all of these demonstrations are world firsts.
Multiview concept
Multiview is a concept that has been deployed for many years, either for multigame or multicamera applications. For multigame, only one HD stream is sent, and the decoder has to tune to another HD stream when another game is selected. This is treated as a channel change, so the switching delay is three to five seconds, which is not acceptable for multicamera applications.
For multicamera, a simple and reliable way is to send a mosaic of 4xHD composed as a UHD video, present the four quadrants and let users pick the view they want. The player then plays the selected HD view upconverted to the UHD display. A drawback of this approach is that it is difficult to handle the four quadrants independently: platforms' UI tools are optimized to handle full video views, not portions of a video view, which currently limits creativity in UI design. The solution we decided to implement was to build a UHD stream made of the four quadrants and work on the UI side to enable all the desired UI options. This kept the four views fully synchronized and let the player present them on screen, with a "stream picker" made of a low-resolution view of the four different views (coming from the ABR ladder). Figure 1 shows the layout on mobile devices, and Figure 2 shows the layout on TVs. As 4K-capable devices are now widespread, including smartphones, tablets and STBs, we believe we can reach a broad audience in developed countries with this solution. (source: Stephane Desproges)
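To make the quadrant handling concrete, here is a minimal sketch of how a player might map a selected view to its window in the decoded UHD mosaic. The coordinates are illustrative; the actual VisualOn player logic is proprietary.

```python
# Minimal sketch: mapping a selected view to its quadrant in the UHD mosaic.
# View numbering (row-major) and coordinates are illustrative assumptions.

UHD_W, UHD_H = 3840, 2160            # mosaic resolution
HD_W, HD_H = UHD_W // 2, UHD_H // 2  # each quadrant is 1920x1080

def quadrant_rect(view_index: int) -> tuple[int, int, int, int]:
    """Return (x, y, width, height) of quadrant 0..3 in the decoded UHD frame."""
    col, row = view_index % 2, view_index // 2
    return (col * HD_W, row * HD_H, HD_W, HD_H)

# Example: view 2 (bottom-left) -> (0, 1080, 1920, 1080); the display
# process crops this window from the frame buffer and upscales it.
print(quadrant_rect(2))
```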
Another alternative for a seamless switch is to send four HD streams, have the decoder decode all of them and present only the selected HD video on the UHD screen. In terms of network traffic, this is more efficient than sending one UHD stream, as we have measured better CAE performance on individual HD streams in the current state of the algorithm. The drawbacks of this approach are that it is not possible to synchronize the four streams instantly, and that in case of DRM use, four licenses have to be managed. This option needs further investigation and more time to become fully fledged.
Some other implementations use either proprietary protocols or proprietary hardware on the headend side to perform the same functionality on a selected set of devices. In our case, the encoder/packager and player/decoder were standards-based. Moreover, we also supported low latency through the DASH low-latency standard based on CMAF segments. On the device side, we were able to demonstrate on any type of 4K-capable Android device, including mobile, tablet, Android STB, Android TV STB and connected TVs.
The low-latency extensions to DASH with CMAF segments were deployed for two main reasons. The first is that, for an at-home experience, people do not accept OTT streams being 30 seconds behind broadcast. The second is that when we deploy in a stadium over 5G, consumers will not tolerate a long delay. CMAF DASH is now a mature technology that can deliver, in the best cases, five to seven seconds behind the live action.
Last but not least, the VisualOn technology allowed us to align the playback of different DASH streams in the same LAN. Relying on this synchronization mechanism and a simple control protocol, the app on the mobile device can let users take control of the TV experience.
This demonstration was a world first in the sense that we demonstrated a seamless transition of several HD views, on any Android device, with low latency and control over the TV experience from mobile devices.
Overall architecture
As we could not bring any equipment in the France TV compound at Roland-Garros, we had to use a cloud-based solution. Figure 3 describes the overall multiview architecture.
In the selected architecture, the mosaic encoder was located at the production site at Roland-Garros, where the SDI inputs were received; since there was a single server, this was not a problem. For internet distribution, a UHD AVC TS output was transmitted using SRT (Secure Reliable Transport). The video distribution processing workflow, made of encoding, packaging and catch-up functionalities, was a natural candidate to be put in a private cloud powered by Intel servers. As the number of clients was limited, there was no need to use a CDN. The content was made available directly from the origin to clients located in Paris (France TV), Rennes (Harmonic France), Toulouse (VisualOn France), San Jose, CA, USA (Harmonic and VisualOn headquarters), Houston, TX, USA (VisualOn U.S.) and China (VisualOn R&D).
Access to the content was secured and restricted through the use of HTTPS and a whitelist of OTT clients' IP addresses.
SRT transmission
SRT is an open-source technology that was released in 2017, together with the creation of the SRT Alliance.
SRT provides connection, control and reliable transmission similar to TCP; however, it does so at the application layer, using UDP as the underlying transport. It supports packet recovery while maintaining low latency (default: 120 ms). SRT also supports encryption using AES.
The protocol was derived from the UDT project, which was designed for fast file transmission. UDT provided the reliability mechanism through similar methods for connection, sequence numbers, acknowledgements and retransmission of lost packets, utilizing selective and immediate (NAK-based) retransmission.
SRT added several features on top of that in order to support live streaming mode:
- Controlled latency, with source-time transmission (timestamp-based packet delivery)
- Relaxed sender speed control
- Conditional "too late" packet dropping (prevents head-of-line blocking caused by a lost packet that wasn't recovered in time)
- Eager packet retransmission (periodic NAK report)
During the event we measured a packet drop rate of 0% and did not see any impact on the video quality.
The SRT settings were a 500 ms latency, a 4096-packet buffer and AES-128 encryption, with an 18 ms RTT from OVH to the Harmonic cloud.
RTT between Roland-Garros and OVH was 5.5 ms on average, with a peak of up to 66 ms. Retransmission was between 0-5 packets, with a peak of up to 700.
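For illustration, here is a hedged sketch of what an SRT contribution leg looks like when driven from FFmpeg's srt protocol; the host, port and passphrase are placeholders, and the actual Easy tools setup differs. Note that FFmpeg expresses SRT latency in microseconds, so the 500 ms used at the event becomes 500000.

```python
# A hedged sketch of an SRT contribution leg using FFmpeg's srt protocol.
# Receiver address and passphrase are placeholders, not the event's values.
import subprocess

cmd = [
    "ffmpeg",
    "-re", "-i", "mosaic.ts",            # pre-encoded UHD AVC transport stream
    "-c", "copy",                        # pass through, no re-encode
    "-f", "mpegts",
    "srt://receiver.example.com:9000"
    "?latency=500000"                    # 500 ms, matching the event settings
    "&pbkeylen=16&passphrase=CHANGE_ME"  # AES-128 encryption
]
subprocess.run(cmd, check=True)
```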
Content production workflow
The goal was to reuse the existing feeds produced for the event to power our demonstrations. On the production side, the 1080i feeds were produced by France Télévisions and were composed of the on-air signal and three camera views we were able to select out of the ones produced.
Figure 4 describes the production workflow.
Mosaic content preparation
The mosaic took the four selected 1080i SDI signals and created a 2160p50 video in AVC format at 50 Mbps, with LC-AAC at 192 kbps for audio. The mosaic was based on Easy tools technology and used open-source FFMPEG software, with deinterlacing and a France TV logo overlay, deployed on a virtual machine on an Intel Core™ i9-9900K CPU @ 3.60GHz server located at the France TV production facility.
Since the feed needed to be sent over the internet, it was transmitted using the SRT (Secure Reliable Transport) protocol to the OVH data center.
Figure 5 describes the mosaic generation.
The mosaic output was encoded in AVC in CBR at 50 Mbps.
We also had the capability to record the TS file before it was sent over the internet. This can be used for multiple purposes: debugging, further processing in the future, as well as comparing the source and the destination after SRT delivery.
The PC acquires the four HD-SDI inputs (1080i50) from its Blackmagic DeckLink Duo 2 card and performs the following video processing (a condensed FFmpeg sketch follows the list):
- Deinterlacing of the 1080i50 videos to 1080p50 using the yadif algorithm in “send_field” mode, which occasionally creates artefacts, for instance with thin lines.
- Overlaying of the france.tv lab logo in the upper right corner on all videos, and optionally of a timecode in the upper left corner of the first video to determine the end-to-end latency.
- Creation of the 3840x2160p50 mosaic using the 4 videos, and re-embedding of the audio of the first video input.
- Encoding of the resulting mosaic to MPEG-4 AVC at 50 Mb/s / LC-AAC at 192 kb/s, and transmission to the Easy tools cloud via SRT.
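Below is a condensed sketch of this pipeline using open-source FFmpeg, assuming four 1080i50 recordings as input files; the real system captured live HD-SDI via the DeckLink card and overlaid the france.tv lab logo on each quadrant, a step omitted here for brevity.

```python
# A condensed sketch of the mosaic build with open-source FFmpeg.
# Input file names are placeholders for the four camera feeds.
import subprocess

fc = (
    "[0:v]yadif=1[v0];[1:v]yadif=1[v1];"  # yadif mode 1 = send_field,
    "[2:v]yadif=1[v2];[3:v]yadif=1[v3];"  # 1080i50 -> 1080p50
    "[v0][v1][v2][v3]xstack=inputs=4:"
    "layout=0_0|w0_0|0_h0|w0_h0[mosaic]"  # 2x2 grid -> 3840x2160p50
)
cmd = [
    "ffmpeg", "-i", "cam1.ts", "-i", "cam2.ts", "-i", "cam3.ts", "-i", "cam4.ts",
    "-filter_complex", fc,
    "-map", "[mosaic]", "-map", "0:a",    # audio re-embedded from input 1
    "-c:v", "libx264", "-b:v", "50M",     # MPEG-4 AVC at 50 Mb/s
    "-c:a", "aac", "-b:a", "192k",        # LC-AAC at 192 kb/s
    "-f", "mpegts", "mosaic.ts",          # sent onward via SRT in production
]
subprocess.run(cmd, check=True)
```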
Video delivery processing workflow
The video delivery processing consisted of different elements:
- SRT reception of the UHD mosaic
- Recording of content after SRT reception on a NAS
- Transcoding from UHD AVC to HEVC multirate
- Packaging in CMAF DASH low-latency format
- Live-to-VOD for catch-up applications (VOD assets recorded on the NAS)
Figure 6 captures the delivery workflow in CMAF DASH low latency.
Live transcoding was performed by Harmonic’s EyeQ® Content Aware Encoding (CAE) technology with HEVC, which encodes at the lowest possible bit rate based on the content complexity while always preserving the highest quality.
The average bit rate transmitted was about 8.5% lower than the max bit rate used, meaning that in good reception conditions the average consumed top UHD bit rate was 18.3 Mbps (implying a top rung of roughly 20 Mbps). The lower savings versus what can be expected from CAE can be explained by the fact that the encoder sees an unnatural scene made of four quadrants. When we encode each quadrant separately, meaning we send four different HD streams or encode four tiles in a UHD frame, we measure an average savings of 20%. Thus, there is still room to improve compression efficiency.
The video quality on a 4K TV was judged good by France TV on the decoded HD streams.
The complete Harmonic solution was a VOS cloud-native offering, based on Docker containers orchestrated by Kubernetes. For this demonstration, the VOS solution was deployed on Intel dual socket server modules equipped with Intel Xeon® Gold 6252 CPU @ 2.10GHz (2RU Buchanan pass frames).
The NAS was used to store content after SRT reception and after packaging. The NAS total storage capacity was set to 14 TB with a RAID 1 redundancy scheme.
Player
The player was provided by VisualOn and powered a stand-alone Android app on the 4K-capable devices demonstrated at Roland-Garros, listed in Table 1.
As can be seen in Table 1, on the tested devices we did not see any performance limitations in switching from one view to another while keeping all the views fully in sync. Porting to Android TV is currently ongoing, but as the performance depends on the chipset and the memory used, the results will vary widely.
The stand-alone Android app is based on the VisualOn player and streams the following in CMAF:
- The UHD mosaic
- A low-resolution version of the mosaic, used for the stream picker
For the demo, the app had two players instantiated. The app received and decoded all the streams. Once a view was selected, the display process extracted an HD window out of the decoded UHD frame in the frame buffer, upconverted it and composed it with the stream picker on the device screen. See Figure 1 and Figure 2 for details.
The interface of the app is completely programmable, and it is different for mobile and TV devices. At any time during the demo, the user could remove the stream picker to watch in full-screen mode.
From a user-experience standpoint, the user saw a smooth transition between views, without any drop in video or audio. You can watch a demo of the user experience here.
The solution developed by VisualOn also enables one to control the Android STB with the tablet/mobile, which gives an even more intuitive experience. This is made possible by synchronizing the different players when located in the same LAN.
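VisualOn's control protocol is proprietary, but as a purely hypothetical illustration of the idea, a minimal LAN remote control could look like this: the mobile app sends a small JSON command to the player on the STB, which then switches its selected view.

```python
# Purely hypothetical illustration of a LAN control message; the actual
# VisualOn protocol and message format are proprietary and not public.
import json
import socket

STB_ADDR = ("192.168.1.42", 5005)   # hypothetical STB address and port

def select_view(view_index: int) -> None:
    """Send a 'switch to this camera view' command to the STB player."""
    msg = json.dumps({"cmd": "select_view", "view": view_index}).encode()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(msg, STB_ADDR)

select_view(3)  # e.g. switch the TV to camera 4 from the mobile UI
```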
Delay
Since we decided to use a low-latency technology for the final distribution phase, it is all the more interesting to measure the end-to-end delay from camera to device, given that we use a cloud-based mechanism to deliver the content.
The delay for the whole chain is the sum of several components: capture and mosaic encoding, SRT contribution, transcoding and packaging, origin delivery, and player buffering.
The measured end-to-end delay using a wall clock was between 10s (aggressive setting on the player side) and 15s (non-aggressive setting on the player side), with a nominal value of 10.5 seconds with clients in Paris (France), Rennes (France) and Texas (US), which is aligned with theoretical values.
Further improvements can be achieved by:
- Reducing from two hops to one (removing the OVH intermediate stage)
- Putting the encoder in low-delay mode, at the cost of increased bandwidth
- Using a more aggressive player strategy, reducing the buffer size but also increasing the likelihood of rebuffering events
This could bring the delay to around seven to eight seconds in production.
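As a purely illustrative exercise (only the 10-15 s end-to-end figure was actually measured; every per-stage value below is an assumption), here is how such a delay budget can be summed:

```python
# Illustrative only: per-stage values are assumptions, not measurements.
# The only measured figures are the 10-15 s end-to-end delay (nominally
# 10.5 s) and the 500 ms configured SRT latency per hop.
budget_s = {
    "capture + mosaic encode": 1.0,     # assumption
    "SRT Roland-Garros -> OVH": 0.5,    # 500 ms configured latency
    "SRT OVH -> cloud": 0.5,            # 500 ms configured latency
    "transcode + CMAF packaging": 3.0,  # assumption
    "origin -> player delivery": 0.5,   # assumption
    "player buffer (aggressive)": 5.0,  # assumption
}
print(f"total ~{sum(budget_s.values()):.1f} s")  # ~10.5 s
```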
Super Resolution
Super Resolution is a technique used to upconvert lower-resolution images or video to a higher resolution. Until now, Super Resolution techniques were used for SD-to-HD and HD-to-UHD conversions. France TV, which has a large library of SD tennis archives, was interested in seeing how Super Resolution could solve the problem of SD-to-UHD conversion. The starting point was the performance of the real-time system implemented in Harmonic's encoders, using a Lanczos filter and implemented in a commercial product.
Figure 7 describes the content preparation workflow for the different upconversion techniques tested.
The Super Resolution solution selected was provided by Pixop, a Danish startup, and can upscale SD to UHD offline using a cloud-based solution built on machine learning, a subset of AI. The algorithm is very sensitive to the quality of the source, especially for SD, which in some cases can come from an analog broadcast. In that case, a deep restoration process making use of convolutional networks is needed prior to Super Resolution processing. The performance is much better than the upconversion done on standard 4K TVs, and better than best-in-class, real-time upconversions using a Lanczos filter.
Pixop uses two different algorithms in a sequence for getting all the way from SD up to 4K:
1. Pixop Deep Restoration (a deep learning neural network technique that does most of the magic in terms of deblurring, mitigating the effects of digital compression, and restoring details in textures) for upscaling from SD to HD
2. Pixop Super Resolution (the company’s own variant of a Super Resolution technique called RAISR that was originally invented by Google researchers) for converting the HD video produced into 4K
The second step is currently necessary due to a lack of GPU memory; it will not be required in the future, when better hardware is available and/or tweaks have been made in the implementation so that upscaling from SD to 4K can be done in a single step/algorithm.
The Pixop processing makes use of both GPU-based (four GPUs) cloud instances for the content restoration and CPU-based (48 cores) cloud instances for the Super Resolution. The overall processing runs 10x slower than real time for SD-to-UHD conversion, meaning the conversion needs to be applied before loading the file on the playout server.
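Pixop's restoration and Super Resolution models are proprietary; as an open-source stand-in that illustrates the same two-stage idea (SD to HD, then HD to 4K-class), the sketch below chains two 2x passes with OpenCV's dnn_superres module. It assumes opencv-contrib-python and the pretrained ESPCN_x2.pb model file; exact sizing and aspect handling are omitted.

```python
# Open-source stand-in for the two-stage upscaling idea; this is NOT
# Pixop's pipeline, just an illustration with OpenCV's dnn_superres.
import cv2

sr = cv2.dnn_superres.DnnSuperResImpl_create()
sr.readModel("ESPCN_x2.pb")   # pretrained 2x ESPCN model file
sr.setModel("espcn", 2)       # algorithm name and scale factor

frame_sd = cv2.imread("frame_sd.png")  # placeholder SD source frame
frame_hd = sr.upsample(frame_sd)       # stage 1: SD -> HD-class
frame_uhd = sr.upsample(frame_hd)      # stage 2: HD -> 4K-class
cv2.imwrite("frame_uhd.png", frame_uhd)
```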
The UHD 4K outputs of the different processing steps were encoded by Harmonic either in a 4K split-screen mode in HEVC, in a TS mezzanine format at 50 Mbps, or in an HD split-screen mode (a cropped area from the 4K output) in AVC, in an MP4 format at 15 Mbps.
Figure 8 shows the system used to compare the different types of upconversion.
We were able to test the different approaches on a 4K TV, as well as on a PC, where we looked at an HD window extracted from the 4K decoded video.
Figure 9 provides a 4K image extracted from one decoded frame in split-screen mode.
We also provide an HD video, the HD window extracted from the 4K decoded video, comparing the Lanczos and Pixop upconversions in split screen.
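For reference, here is a hedged sketch of how such a split-screen comparison clip can be assembled with FFmpeg; the file names are placeholders, with the left half taken from the Lanczos upconversion and the right half from the Pixop output.

```python
# A hedged sketch of a split-screen comparison build; input file names
# are placeholders, not the actual assets produced for the demo.
import subprocess

fc = (
    "[0:v]crop=iw/2:ih:0:0[left];"      # left half of the Lanczos 4K output
    "[1:v]crop=iw/2:ih:iw/2:0[right];"  # right half of the Pixop 4K output
    "[left][right]hstack[out]"          # side-by-side 4K comparison
)
subprocess.run([
    "ffmpeg", "-i", "lanczos_4k.mp4", "-i", "pixop_4k.mp4",
    "-filter_complex", fc, "-map", "[out]",
    "-c:v", "libx265", "-b:v", "50M",   # HEVC TS mezzanine at 50 Mbps
    "split_screen.ts",
], check=True)
```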
Overall, France TV was impressed by the first results and plans to compare different Super Resolution technologies on a broader range of content.
AV1 codec
AV1 is a codec that was developed by the Alliance for Open Media, a group of which Intel is a member. The codec was released in April 2018. Intel has developed its SVT-AV1 open-source codec (https://01.org/svt), which is now the official reference code base for the AV1 project. Intel has also been working on PC chipsets that can decode AV1 streams in hardware at up to 4Kp60 resolution.
The goal of the 2020 Roland-Garros demonstration was to show a live 1080p50 AV1 streaming chain to a PC client, which was a world first.
Figure 11 provides details on the workflow used for the AV1 stream generation.
An Easy tools encoder provided a 1080p50 AVC stream over SRT, which was transcoded to AV1 on an Intel Xeon 8280 server. A live 1080p50 encode fits in a dual-socket server, which is a big improvement compared with the initial encoding speed results when AV1 was released in 2018.
Figure 12 shows a screen capture of one live encoding.
The Intel SVT-AV1 encoder was set at 2 Mbps in CBR mode, and the output was packaged in HLS. We also tested several other configurations of the Intel SVT-AV1 encoder and provide the subjective evaluation results on a PC (15") and a 4K TV (65").
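Here is a hedged sketch of what the live AV1 encode leg could look like using FFmpeg's libsvtav1 wrapper; the demo used Intel's SVT-AV1 encoder on a Xeon 8280, and the exact rate-control flags and HLS settings shown are illustrative and version-dependent.

```python
# A hedged sketch of a live AV1 encode leg via FFmpeg's libsvtav1 wrapper;
# the SRT source URL and HLS output settings are placeholders.
import subprocess

subprocess.run([
    "ffmpeg",
    "-i", "srt://contribution.example.com:9000?latency=500000",  # 1080p50 AVC feed
    "-c:v", "libsvtav1",
    "-preset", "8",                  # speed/quality trade-off suited to live
    "-b:v", "2M", "-maxrate", "2M",  # ~2 Mbps, near-constant bit rate
    "-bufsize", "4M",
    "-f", "hls", "-hls_time", "4",
    "-hls_segment_type", "fmp4",     # fMP4 segments for AV1 in HLS
    "stream.m3u8",
], check=True)
```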
This shows that to address both PC and TV screens, an AV1 HLS stream will need at least two profiles, at 3 Mbps and 4 Mbps.
Overall, AV1 delivered superior visual quality to HEVC when reducing the bit rate below 4 Mbps, using Harmonic's live HEVC encoder as a reference.
In conclusion, AV1 is competitive with HEVC and is the only UHD codec that can be played in web browsers.
This study only relates to the tennis content we processed during the two weeks of Roland-Garros.
On the decoding side, Intel has developed its 11th Gen Intel Core i7 processors with Intel Iris Xe graphics, which bring native AV1 playback support to consumer laptops. This will improve the quality of experience on a PC versus a software-based approach, where the CPU is always shared with other tasks. Figure 13 shows the system used for the playback of the AV1 stream.
Playback is done on a PC by VLC, which uses Intel GPU acceleration to decode the AV1 stream. The decoder was demonstrated at 1080p50 but can support up to 4Kp60.
The GPU load measured during the decoding of a 1080p50 AV1 sequence at 2 Mbps was between 5% and 10%.
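For reference, a hedged way to launch VLC with hardware decoding enabled; the stream URL is a placeholder, and --avcodec-hw=any lets VLC pick the available GPU decode path (such as D3D11VA on Intel Iris Xe graphics).

```python
# Launch VLC with hardware-accelerated decoding; the HLS URL is a
# placeholder, not the actual demo endpoint.
import subprocess

subprocess.run([
    "vlc", "--avcodec-hw=any",                      # let VLC pick the GPU path
    "https://origin.example.com/av1/stream.m3u8",   # hypothetical AV1 HLS stream
])
```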
For the record, Harmonic demonstrated live AV1 1080p60 on an STB at IBC 2019. The point is that AV1 must be consumable on all devices, so PC support is of paramount importance. With this demonstration, Intel proves that AV1 can now be supported in hardware on new PC platforms, which will incentivize OTT service providers like Netflix, Amazon and YouTube to deploy AV1 faster.
What does this mean for the future of live sports delivery?
The combined efforts of Intel, Easy tools, Harmonic, VisualOn and Pixop resulted in world-class demonstrations that will enable an increase in interactivity, an increase in quality when consuming archived content and an increase in reach with AV1.
We worked virtually the entire time and could not be at Roland-Garros in person, so we share a Zoom photo of all the people who contributed to the success of this Roland-Garros 2020 demonstration.
Present in the photo: Vincent Nalpas (France TV), Jean Paul Chevreux (France TV), Yves-Marie Poirier (France TV), Christophe Massiot (Easy tools), Thierry Fautier (Harmonic), Patrick Gendron (Harmonic), Xavier Ducloux (Harmonic), Christophe Coquerel (VisualOn), Stephane Desproges (VisualOn), Francois Hannebicq (Intel), Jean-Pierre Navarro (Intel), Pierre Vallée (Intel)