uvgRTP 3.0: Towards V3C Volumetric Video Communication

Low-latency volumetric video transport is a key enabling technology for more immersive communication applications. This paper presents the latest release of our open-source Real-time Transport Protocol (RTP) library called uvgRTP 3.0 that has been upgraded to support Visual Volumetric Video-based Coding (V3C) transmission in Video-based Point Cloud Compression (V-PCC) and MPEG Immersive Video (MIV) formats. uvgRTP 3.0 introduces 1) the V3C atlas RTP payload format; 2) two multiplexing methods to reduce port reservations during V3C transmission; and 3) improved packet reception with multithreading. Our performance results show that uvgRTP 3.0 can encrypt and transmit V3C bitstreams at 106 Mbit/s with a CPU core utilization of 26% and a round-trip latency of 2 ms in a local area network. Support for high-speed, encrypted V3C communication, together with the permissive BSD license, makes uvgRTP 3.0 a potential transmission library for any industrial or academic volumetric communication system.


Introduction
Visual volumetric media technologies lay the foundation for deploying extended reality (XR) in applications like live communication, remote supervision, and remote piloting. However, this implies data rates that are an order of magnitude higher, which challenges communication applications to meet the required transmission speed with low latency.
The Moving Picture Experts Group (MPEG) has addressed the transmission and storage needs of volumetric visual data by introducing the Visual Volumetric Video-based Coding (V3C) standard suite, specified as ISO/IEC 23090-5 [1]. As shown in Figure 1, V3C contains three standard specifications called Video-based Point Cloud Compression (V-PCC) [1], MPEG Immersive Video (MIV) [2], and Video-based Dynamic Mesh Compression (V-DMC) [3], of which V-PCC and MIV have already been published. V-PCC is designed for dynamic point clouds, MIV for multi-plane images as well as for multi-view and depth videos, and V-DMC for dynamic meshes. All of them are built using established 2D video compression techniques while conforming to the common V3C bitstream structure.
In ISO/IEC 23090-10 [4], MPEG specifies a container format for V3C that can be used by Dynamic Adaptive Streaming over HTTP (DASH) for carriage. However, the transmission delay of DASH is not ideal for applications with stringent latency requirements. Instead, Real-time Transport Protocol (RTP), specified in Request for Comments (RFC) 3550 [5], is commonly used for low-latency streaming, such as communication. An RFC draft [6] describes how V3C content should be streamed using RTP.
uvgRTP [7] is our low-latency open-source RTP streaming library that supports a wide range of codec-specific payload formats, such as Advanced Video Coding (AVC) [8], High Efficiency Video Coding (HEVC) [9], and Versatile Video Coding (VVC) [10]. In its third major release, uvgRTP 3.0 becomes the first open-source RTP library to support low-latency transmission of V3C bitstreams over the network, making it a promising component for future volumetric video communication applications. For practical and efficient V3C streaming, the following features have been implemented:
• support for the V3C atlas RTP payload format;
• new multiplexing methods to reduce port reservations during V3C transmission;
• introduction of multithreading to enhance the packet reception process.
Furthermore, IPv6 compatibility was introduced to meet modern networking requirements. The uvgRTP library is available at: https://github.com/ultravideo/uvgRTP
The remainder of the paper is structured as follows. Section 2 provides an overview of the V3C bitstream structure, describes V3C transmission over RTP, and goes over existing open-source libraries for RTP streaming. Section 3 presents our uvgRTP 3.0 library by focusing on its new features designed for low-latency V3C streaming, Section 4 reports the performance results, and Section 5 addresses the usage and applications of the uvgRTP library. Finally, Section 6 discusses the limitations of V3C bitstream formats in real-time applications and Section 7 concludes the paper.

V3C Bitstream Formats
The V3C standard [1] defines two V3C bitstream formats: V3C unit stream and V3C sample stream. Both of them can use up to seven V3C unit types. Table 1 tabulates these unit types along with their sub-bitstream types and use in standardized V3C codecs. All V3C units contain a V3C unit header that specifies the unit type. All sub-bitstream types are composed of Network Abstraction Layer (NAL) units. Video sub-bitstreams contain video data encoded with an AVC [11], HEVC [12], or VVC [13] video codec, whereas atlas sub-bitstreams contain atlas data.
The overall structure of the V3C unit stream [1] is illustrated in Figure 2. It contains V3C units segmented into groups called group of frames (GOF). A single GOF contains one V3C unit of each type required by the codec. The V3C units can contain data to reconstruct one or more frames, depending on the GOF size. The V3C unit stream cannot be used in applications without any additional parsing information.
A V3C sample stream [1] is a V3C unit stream, where each V3C unit is prepended by its size. A V3C sample stream header is added at the beginning of the bitstream, specifying the number of bytes used to represent the V3C unit sizes. The V3C sample stream is the format used by V3C codec test models.
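The sample stream layout described above can be illustrated with a short parser. This is a minimal sketch, not code from uvgRTP or a test model. It assumes that the one-byte sample stream header stores the size precision minus one in its three most significant bits, and that the unit type occupies the five most significant bits of the first V3C unit header byte with the numeric codes listed in the table below. Both assumptions reflect our reading of ISO/IEC 23090-5 and should be checked against the standard text. The function name is hypothetical.

```python
# Assumed numeric codes for the V3C unit type field (verify against ISO/IEC 23090-5).
V3C_UNIT_TYPES = {0: "VPS", 1: "AD", 2: "OVD", 3: "GVD", 4: "AVD", 5: "PVD", 6: "CAD"}


def parse_v3c_sample_stream(data: bytes):
    """Split a V3C sample stream into a list of (unit_type, unit_bytes) tuples."""
    if not data:
        raise ValueError("empty sample stream")
    # Assumption: the top 3 bits of the header byte hold (size precision - 1).
    precision = (data[0] >> 5) + 1
    pos = 1
    units = []
    while pos < len(data):
        if pos + precision > len(data):
            raise ValueError("truncated size field")
        size = int.from_bytes(data[pos:pos + precision], "big")
        pos += precision
        if size == 0 or pos + size > len(data):
            raise ValueError("invalid or truncated V3C unit")
        unit = data[pos:pos + size]
        pos += size
        # Assumption: the unit type is the top 5 bits of the first header byte.
        unit_type = V3C_UNIT_TYPES.get(unit[0] >> 3, "reserved")
        units.append((unit_type, unit))
    return units
```

Once the units are separated this way, the size fields are no longer needed. This matches the later observation that such fields are not part of the transmitted RTP payloads.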

V3C Transmission over RTP
In V3C bitstream transmission, the first step is to parse it into V3C units. Then, the sub-bitstreams inside the V3C units are parsed into NAL units, which are then transmitted via RTP. V3C Parameter Set (VPS) and the information contained in V3C unit headers are conveyed outside of RTP transmission via other means, such as Session Description Protocol (SDP) [14].
For video sub-bitstreams, existing RTP payload formats, AVC [8], HEVC [9], or VVC [10], are used. For atlas sub-bitstreams, a V3C atlas RTP payload format has been defined in an RFC draft [6]. The draft specifies rules for packetization of NAL units in atlas sub-bitstreams. To avoid Internet Protocol (IP) layer fragmentation, the fragmentation process is specified for NAL units larger than the maximum transmission unit (MTU) of the network. In addition, the draft includes instructions for mapping the information from VPS and V3C unit headers into SDP messages.

Existing RTP Streaming Libraries
Table 2 characterizes the existing open-source RTP libraries based on their support for video and atlas RTP payload formats. None of them natively supports the V3C atlas RTP payload format, which is a prerequisite for V3C transmission over RTP, together with at least one of the tabulated video RTP payload formats.
Nokia Technologies has created a plugin [19] for GStreamer, which takes V3C bitstreams packaged into ISO base media file format (ISOBMFF) [20] as input and outputs RTP packets. However, there are currently no open-source tools available for packaging V3C bitstreams into the ISOBMFF format. Moreover, ISOBMFF is a storage format and is inherently not designed for low-latency communication. As described next, our uvgRTP 3.0 library is developed for real-time V3C transmission.
When streaming between two V3C applications, both of them create a session module for connections, as depicted in Figure 4. A separate media streamer is applied for each type of V3C unit with RTP payload. In addition, another media streamer is created for audio. Outside of the uvgRTP library, V3C applications can use SDP for transmission of VPS and V3C unit header information.

Extended RTP Payload Format Support
The RTP payload formats module takes care of operations specific to the chosen RTP payload format. It can support various video RTP payload formats such as AVC, HEVC, or VVC. uvgRTP 3.0 extends it with support for the V3C atlas RTP payload format. In addition, the generic RTP payload format can be used for other types of RTP payload that do not require any format-specific features.

Port Multiplexing and Demultiplexing
Transporting a V3C bitstream with the uvgRTP library requires multiple media streamers. uvgRTP 2.0 required one network port for each media streamer and another for each associated RTCP module, so a V3C application would have needed to reserve up to 14 ports for each session. This increases the probability of port collisions with other applications. Furthermore, it raises the complexity of required firewall and router configurations. To that end, uvgRTP 3.0 introduces two multiplexing techniques to minimize the number of ports required: 1) synchronization source (SSRC) multiplexing [21] and 2) RTCP to RTP port multiplexing [22].
Using SSRC multiplexing, as described in RFC 8872 [21], all media streamers within one session share a single network socket module for all transmissions. This network socket module binds to a specific port number. With multiple peers, the socket factory module distributes the network sockets for different sessions. Packets are forwarded to correct media streamers based on a unique 32-bit SSRC identifier. The SSRC values of media streamers are communicated outside of the uvgRTP library, for example via SDP. For sent packets, the SSRC field in the RTP packet header is set to the SSRC identifier of the media streamer module. Packets from multiple media streamer modules are then multiplexed together and transmitted through a single port. On the receiving end, the packet reception module uses the SSRC identifier in the headers of received packets to distribute them to corresponding media streamers.
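The SSRC-based distribution can be sketched as follows. The example is illustrative only and does not reproduce the uvgRTP implementation. The class and function names are hypothetical, and the packets carry only the fixed 12-byte RTP header of RFC 3550 without CSRC entries or header extensions.

```python
import struct

RTP_HEADER_LEN = 12  # fixed RTP header in RFC 3550, without CSRC entries


def build_rtp_packet(ssrc, seq, timestamp, payload_type, payload):
    """Build a minimal RTP packet: version 2, no padding, extension, or CSRCs."""
    first_byte = 2 << 6
    header = struct.pack(
        "!BBHII",
        first_byte,
        payload_type & 0x7F,      # marker bit left at zero
        seq & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )
    return header + payload


def read_ssrc(packet):
    """Return the 32-bit SSRC stored in bytes 8..11 of the RTP header."""
    if len(packet) < RTP_HEADER_LEN:
        raise ValueError("packet shorter than the fixed RTP header")
    return struct.unpack_from("!I", packet, 8)[0]


class SsrcDemultiplexer:
    """Forward payloads from one shared socket to per-stream handlers."""

    def __init__(self):
        self._handlers = {}

    def register(self, ssrc, handler):
        self._handlers[ssrc] = handler

    def dispatch(self, packet):
        """Deliver the payload to the matching handler; False if SSRC is unknown."""
        handler = self._handlers.get(read_ssrc(packet))
        if handler is None:
            return False
        # Assumes no CSRC entries or header extension precede the payload.
        handler(packet[RTP_HEADER_LEN:])
        return True
```

In this sketch, one handler would be registered per media streamer, for example one for the atlas payload and one for each video payload, using SSRC values agreed on beforehand.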
In addition, the usage of RTCP to RTP port multiplexing, as specified in RFC 5761 [22], enables the use of the same port for both RTCP and RTP transmissions, with RTCP and RTP packets being demultiplexed based on packet header structure.This technique is compatible with SSRC multiplexing, as RTCP packet headers also include the SSRC field.
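A common way to separate the two packet kinds on a shared port is to inspect the second octet of each packet. The sketch below is an assumption-laden illustration, not uvgRTP's actual logic. It treats values 192 to 223 as RTCP packet types, which works only if RTP payload types 64 to 95 are not used, as RFC 5761 requires. The SSRC offset used for RTCP applies to the common packet types whose first SSRC follows the 4-byte RTCP header.

```python
import struct


def is_rtcp(packet: bytes) -> bool:
    """Classify a packet by its second octet (heuristic based on RFC 5761)."""
    if len(packet) < 2:
        raise ValueError("packet too short to classify")
    # RTCP packet types such as SR (200) and RR (201) fall in 192..223.
    # An RTP packet reaches this range only with payload types 64..95.
    return 192 <= packet[1] <= 223


def packet_ssrc(packet: bytes) -> int:
    """Return the SSRC of an RTP packet or the sender SSRC of an RTCP packet."""
    # RTP stores the SSRC at byte offset 8, common RTCP packets at offset 4.
    offset = 4 if is_rtcp(packet) else 8
    if len(packet) < offset + 4:
        raise ValueError("packet too short to contain an SSRC")
    return struct.unpack_from("!I", packet, offset)[0]
```

Because both packet kinds expose an SSRC, the same identifier can route RTP data and RTCP reports to the media streamer they belong to.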

Multithreaded Packet Reception
In uvgRTP 2.0, a single thread handles reading and processing of received packets, causing reliability issues and increased latency. To address this, uvgRTP 3.0 improves the reception process with multithreading. One thread is dedicated to reading packets from a network socket module and another one processes them. These threads communicate through a ring buffer, the size of which can be configured by the user. Multithreading in packet reception improves the processing capacity and reduces latency with large bitstreams.
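The two-thread arrangement can be sketched with a fixed-capacity ring buffer shared by a reader and a processor. This is a simplified illustration under our own assumptions, not the uvgRTP design. In particular, the writer blocks when the buffer is full, whereas a real UDP receiver might instead drop or overwrite packets. A list of byte strings stands in for the network socket, and all names are hypothetical.

```python
import threading


class RingBuffer:
    """Fixed-capacity FIFO shared between one reader and one processor thread."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self._slots = [None] * capacity
        self._head = 0    # index of the oldest stored item
        self._count = 0   # number of stored items
        self._cond = threading.Condition()

    def put(self, item):
        with self._cond:
            while self._count == len(self._slots):
                self._cond.wait()          # buffer full: wait for the processor
            tail = (self._head + self._count) % len(self._slots)
            self._slots[tail] = item
            self._count += 1
            self._cond.notify_all()

    def get(self):
        with self._cond:
            while self._count == 0:
                self._cond.wait()          # buffer empty: wait for the reader
            item = self._slots[self._head]
            self._slots[self._head] = None
            self._head = (self._head + 1) % len(self._slots)
            self._count -= 1
            self._cond.notify_all()
            return item


_STOP = object()  # sentinel that tells the processor to finish


def receive_all(packets, capacity=4):
    """Pass packets through the ring buffer and return their processed sizes."""
    ring = RingBuffer(capacity)
    processed = []

    def reader():
        for packet in packets:   # stands in for reading from a socket
            ring.put(packet)
        ring.put(_STOP)

    def processor():
        while True:
            item = ring.get()
            if item is _STOP:
                break
            processed.append(len(item))   # stands in for real packet handling

    threads = [threading.Thread(target=reader), threading.Thread(target=processor)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return processed
```

The capacity argument plays the role of the user-configurable buffer size: a larger value lets the reader absorb longer bursts before the processor has to catch up.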

End-to-end Workflow
Figure 5 illustrates the end-to-end workflow of uvgRTP 3.0 for encrypted transmission of a V3C bitstream. The sender V3C application gives NAL units as input to the media streamer modules inside a session. Fragmentation is performed for large NAL units by the RTP payload formats module. This produces smaller fragmentation units (FU) for transportation. Encryption is carried out using the Crypto++ library [23], and the resulting units are multiplexed and sent through the network socket. At the receiving end, these operations are reversed and uvgRTP 3.0 outputs NAL units to the receiver V3C application for V3C bitstream reconstruction.
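The fragmentation step and its reversal at the receiver can be illustrated in a simplified form. The sketch below only shows how a large NAL unit is cut into chunks that fit a payload budget and how the chunks are joined again. It does not reproduce the header layout of any RTP payload format, where start and end flags are carried in a dedicated FU header. The function names and the 1400-byte budget in the usage line are hypothetical.

```python
def fragment(nal_unit: bytes, max_payload: int):
    """Split a NAL unit into (is_first, is_last, chunk) tuples."""
    if max_payload < 1:
        raise ValueError("max_payload must be positive")
    if len(nal_unit) <= max_payload:
        return [(True, True, nal_unit)]    # fits into a single packet
    chunks = [
        nal_unit[i:i + max_payload]
        for i in range(0, len(nal_unit), max_payload)
    ]
    last = len(chunks) - 1
    return [(i == 0, i == last, chunk) for i, chunk in enumerate(chunks)]


def reassemble(fragments):
    """Join fragments back into a NAL unit, checking the start and end flags."""
    if not fragments or not fragments[0][0] or not fragments[-1][1]:
        raise ValueError("first or last fragment is missing")
    return b"".join(chunk for _, _, chunk in fragments)


# Usage: a 5120-byte unit and an assumed 1400-byte payload budget.
pieces = fragment(bytes(5120), 1400)
restored = reassemble(pieces)
```

A receiver built this way can only hand a NAL unit to the application once the fragment marked as last has arrived, which is why lost fragments affect the whole unit.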

Experimental Setup
uvgRTP 3.0 was benchmarked by transmitting an encoded V-PCC bitstream between two desktop computers, as illustrated in Figure 6. The computers were equipped with Intel Core i9-7980XE and Intel Xeon W-2145 processors along with Linux kernel versions 5.15.0 and 6.2.0, respectively. The network consisted of a 10-Gbit Cisco SG350XG switch between the 10-Gbit network cards in both computers.
The benchmarks were performed using the longdress sequence from common test conditions (CTC) for V3C and V-PCC [24]. Table 4 tabulates the sequence characteristics. It was encoded using the reference V-PCC encoder test model category 2 (TMC2) version 21 [25] with low-delay configuration at the rate R3 as defined in CTC [24]. The V3C bitstream includes parts that are not transmitted with RTP, such as V3C VPS units or fields denoting V3C or NAL unit sizes. These parts were subtracted from the total bitstream to obtain the total size of transmitted RTP payloads.
As our V-PCC test bitstream contained V3C units of types AD, OVD, GVD, and AVD, four uvgRTP media streamer modules were created in the session module: one for the atlas payload and one for each video payload. The traffic of all media streamers was multiplexed through a single network socket.
Although the tests were performed only with V-PCC bitstreams, similar results can be expected for other V3C-compliant bitstreams like MIV. Indeed, RTP streaming performance is only affected by the bitrate, regardless of the type of payload.
Figure 6 also introduces the two separate benchmarks used to evaluate our solution. In the goodput benchmark, the CPU core utilization values of the sender and receiver were measured at different bitrates. The bitrate was virtually increased by sending the same V-PCC bitstream n times, in 30 fps increments. The structure of the session remained the same. In a peer-to-peer conferencing use case, n represents the number of participants in a call at 30 fps. The round-trip latency benchmark measured the round-trip latency of V3C streaming at a constant bitrate of 10.1 Mbit/s. Both encrypted SRTP and unencrypted RTP streaming modes for V3C were tested. All tests were performed 100 times, and the results were averaged.
In both RTP and SRTP modes, the goodput benchmark shows that uvgRTP 3.0 is able to stream the V-PCC sequence at a bitrate of 105.9 Mbit/s. At this bitrate, the CPU core utilization values of the sender and receiver averaged 18.9% and 5.0% for RTP streaming and 26.2% and 5.4% for SRTP streaming, respectively. These results amount to peer-to-peer V-PCC two-way communication at 30 fps with 10 other peers simultaneously.

Performance Evaluation
To conclude, our results show that the efficient CPU core utilization of uvgRTP 3.0 leaves computing power for other processes, such as V3C encoding and decoding. Moreover, the low round-trip latency addresses the needs of low-latency communication in various volumetric video applications.

Usage and Examples
The uvgRTP repository [26] on GitHub has several resources to facilitate the adoption of the library for streaming applications. Instructions for building uvgRTP 3.0 with CMake [27] can be found in /BUILDING.md. A step-by-step tutorial and more in-depth documentation can be found in /USAGE.md and /docs/README.md, respectively. The uvgRTP library also provides documentation for its public API, which has been generated using Doxygen [28]
To help users develop applications built on V3C bitstream transmission, uvgRTP 3.0 includes a new example program, which demonstrates the implementation of an end-to-end workflow for transmitting a V3C bitstream. For straightforward testing, an encoded V-PCC test sequence is also provided alongside. This example program is composed of the following two parts:
1) Sender V3C application that takes as input a V-PCC bitstream and maps the locations and sizes of its NAL units into memory. One session is created for V3C transmission, in which media streamers are initialized for each type of sub-bitstream. When the program is running, media streamers send their corresponding data in parallel threads.
2) Receiver V3C application that creates one session, initializes the media streamers for sub-bitstreams, and begins reception. When a GOF is received, the respective part of the V3C bitstream is reconstructed. Finally, the GOFs are combined to form a complete V-PCC bitstream after the whole sequence is received.

Applications and Impact
After its release, uvgRTP 2.0 became a popular solution for real-time streaming thanks to features such as low latency, high performance, and ease of use. It has gathered multiple contributors and, with its permissive BSD license, the library is used by both industrial and academic applications. With the introduction of V3C transmission in uvgRTP 3.0, solutions taking advantage of both the V-PCC and the MIV standards can be developed.
V-PCC streaming could be used as part of visual volumetric communication applications. In immersive video conferencing, users could interact as life-like virtual avatars. In addition to communication, real-time V-PCC streaming would enable additional applications for remote object manipulation such as surgery, pottery, and art, where low latency is either an advantage or a prerequisite.
MIV streaming could be used for virtual reality (VR) applications such as virtual architectural tours. Each participant could view the scene in real time from their own location and move around freely. Furthermore, the applications of MIV could include gaming, sports, and entertainment in visual volumetric format.
Thanks to visual volumetric content and more immersive applications, the low-latency streaming capabilities of uvgRTP 3.0 have the potential to reduce global plane travel by bringing forth new ways of accomplishing tasks remotely.As plane travel alone is responsible for 4% of total global warming [30], the impact of transporting bits instead of people can be significant.

Compatibility for Live Communication
As depicted in Figure 2, V3C units of the current V3C bitstream formats encapsulate the complete sub-bitstream of a GOF. GOF size specifies the number of frames in a single GOF, with 32 being the value used in CTC for V3C and V-PCC [24]. With a GOF size of 32 and visual volumetric content recorded at 30 fps, the encoder introduces a latency of more than one second.
G.114 [31] defines latencies above 400 ms as generally unacceptable for audio communication, whereas user satisfaction starts to decline after 150 ms. Because the visual information needs to be synchronized with the audio conversation, visual latency should also be minimized to meet acceptable user satisfaction in live communication applications.
The encoder latency induced by the current V3C bitstream formats could be tackled by setting GOF size to 1, but it would lead to all-intra coding with a significant coding overhead. This limitation motivates us to propose that this latency aspect be addressed in the upcoming updates to the V3C bitstream formats.
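The latency figures in this section follow from simple arithmetic, sketched below. The calculation assumes that the encoder must collect a full GOF before it can output the corresponding V3C units, and it ignores every other delay in the pipeline. The rating labels are our own shorthand for the two G.114 thresholds quoted above, and the function names are hypothetical.

```python
def gof_buffering_latency_ms(gof_size: int, fps: float) -> float:
    """Time needed to capture one full GOF, in milliseconds."""
    if gof_size < 1 or fps <= 0:
        raise ValueError("gof_size and fps must be positive")
    return 1000.0 * gof_size / fps


def rate_latency(latency_ms: float) -> str:
    """Compare a latency with the 150 ms and 400 ms thresholds quoted from G.114."""
    if latency_ms > 400:
        return "unacceptable"
    if latency_ms > 150:
        return "declining satisfaction"
    return "acceptable"


ctc_latency = gof_buffering_latency_ms(32, 30)     # about 1067 ms
intra_latency = gof_buffering_latency_ms(1, 30)    # about 33 ms
```

Under these assumptions, a GOF size of 32 at 30 fps exceeds the 400 ms limit by a wide margin, whereas a GOF size of 1 stays well below 150 ms at the cost of all-intra coding.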

Conclusion
In this paper, we presented the latest version of our RTP library called uvgRTP 3.0. The introduced features make it the first open-source RTP library for low-latency transmission of V3C encoded visual volumetric content, such as V-PCC and MIV. Our measurements show that uvgRTP 3.0 can transfer an encrypted V3C bitstream at 106 Mbit/s with CPU core utilization of 26% and with a low round-trip latency of 2 ms. Given that our library is distributed under a permissive license and also supports popular AVC, HEVC and VVC formats, it is a potential candidate for a broad range of real-time communication applications.
For future research, a complete open-source end-to-end V3C pipeline still calls for practical solutions for volumetric video acquisition and coding. In addition to that, making V3C bitstream compatible with low-latency scenarios warrants further exploration. This work creates a solid foundation for these follow-up activities in unlocking the potential of low-latency V3C volumetric video communication.

Figure 1: Overview of the V3C standard suite.

Figure 7: CPU core utilization of uvgRTP; n indicates the corresponding number of V-PCC streams at 30 fps.

Figure 7 illustrates the results of the goodput benchmark. In addition, results of the average round-trip latency benchmark are the following:
• RTP streaming mode: 1.7 ms
• SRTP streaming mode: 2.2 ms


Table 1: V3C unit types, their sub-bitstream types and use in standardized V3C codecs.
Figure 3 illustrates the software architecture of uvgRTP 3.0 that is upgraded from uvgRTP 2.0 by adding a new module (in red) and updating three modules (in green). Context is a singleton module created by a V3C application. It contains one or more sessions, whereas a session represents connections between two IP addresses. A session may contain multiple media streamers through which media is transmitted. An RTP Control Protocol (RTCP) module is associated with each media streamer. It monitors the data delivery and provides feedback on the quality of service (QoS) of the media stream. A new Socket factory module manages network sockets with support for both IPv4 and the newly added IPv6. The packet reception module reads incoming packets from the network socket and distributes them for processing in other modules. The Secure RTP (SRTP) and RTP modules process

Table 4: Test sequence characteristics.
and is hosted on GitHub [29]. Additionally, the /examples folder contains programs that demonstrate different usage scenarios of the uvgRTP library. Instructions for running these programs are given in /examples/README.md.