Open Access

An Adaptive Bitrate Switching Algorithm for Speech Applications in Context of WebRTC

Published: 12 November 2021


Abstract

Web Real-Time Communication (WebRTC) combines a set of standards and technologies to enable high-quality audio, video, and auxiliary data exchange in web browsers and mobile applications. It enables peer-to-peer multimedia sessions over IP networks without the need for additional plugins. The Opus codec, which is deployed as the default audio codec for speech and music streaming in WebRTC, supports a wide range of bitrates. This range of bitrates covers narrowband, wideband, and super-wideband up to fullband bandwidths. Users of IP-based telephony always demand high-quality audio. In addition to users’ expectations, their emotional state, content type, and many other psychological factors, as well as network quality of service and distortions introduced at the end terminals, can determine their quality of experience. To measure the quality experienced by the end user for voice transmission service, the E-model standardized in the ITU-T Rec. G.107 (a narrowband version), ITU-T Rec. G.107.1 (a wideband version), and the most recent ITU-T Rec. G.107.2 extension for the super-wideband E-model can be used. In this work, we present a quality of experience model built on the E-model to measure the impact of coding and packet loss to assess the quality perceived by the end user in WebRTC speech applications. Based on the computed Mean Opinion Score, a real-time adaptive codec parameter switching mechanism is used to switch to the optimal codec bitrate under the present network conditions. We present the evaluation results to show the effectiveness of the proposed approach when compared with the default codec configuration in WebRTC.


1 INTRODUCTION

Web Real-Time Communication (WebRTC) [7] enables web browsers to provide audio, video, and auxiliary data exchange without the need for additional plugins. The standards used in WebRTC make it possible for users to establish peer-to-peer multimedia sessions over IP networks. The Internet Engineering Task Force defines the communication protocols for the communication channel of WebRTC. The World Wide Web Consortium (W3C) defined the HTML5 and JavaScript APIs to enable transmission between the supported devices, which makes it easier for developers to build interactive web and mobile applications using WebRTC. The multimedia communication added by WebRTC enables web browsers to interact using the so-called Triangle Architecture shown in Figure 1. WebRTC has rapidly evolved since it was released in May 2011 by Google. Shortly after the initial release, Ericsson Labs implemented the first development toolkit built on top of WebRTC. In 2013, the first cross-browser audio and video call was supported between Google Chrome and Mozilla Firefox, followed by support for the Android operating system in 2014. The first Google RTC application, known as Google Hangouts, was partially built on WebRTC, whereas the more recent Google Duo is fully based on the technologies provided by WebRTC. As of now, WebRTC is supported by Microsoft Edge, Apple Safari, and Opera in addition to the Chrome and Firefox web browsers, as well as mobile operating systems like iOS and Android.

Fig. 1.

Fig. 1. WebRTC Triangle Architecture.

With increasing bandwidth capacity in IP networks, great effort is being placed on the Quality of Experience (QoE) for multimedia applications. For Voice over Internet Protocol (VoIP) applications, this is evident by support for a wide range of codecs, from the traditional narrowband (NB) telephony with a bandwidth of 300 to 3,400 Hz and wideband (WB) of 100 to 7,000 Hz, up to super-wideband (SWB) that offers a bandwidth of 50 to 14,000 Hz, enabling high-quality voice transmission. However, in public networks including wireless (WiFi) and cellular environments, network impairments such as delay and packet loss persist. Therefore, call quality assessment is key to enabling current VoIP systems to provide high-quality call services. To evaluate the degradation introduced by such impairments, subjective listening quality measurement is used to assess the quality perceived by the end user under various network conditions [39]. Such a method is known to be very accurate; however, it can also be time-consuming and expensive to conduct. As such, instrumental models have been developed to evaluate the quality using objective methods, which include intrusive models such as Perceptual Evaluation of Speech Quality (PESQ) [44] and Perceptual Objective Listening Quality Assessment (POLQA) [46]; non-intrusive models, e.g., ITU-T P.563 [38] and ANIQUE+ [26]; and parametric models like the E-model [12].

The E-model is a simple computational model that was designed for transmission planning purposes. It considers a wide range of telephony-band impairments, particularly the impairment due to low bitrate codecs, one-way delay, packet loss, and the “classical” telephony impairments of loss, noise, and echo. The E-model computes an overall rating of the expected conversational quality, termed a transmission rating factor and denoted R, by combining effects of the different impairments on conversational speech quality. It is worth noting here that the transmission rating factor can be calculated in real time due to the E-model’s computational simplicity. For the NB E-model, the transmission rating factor scale ranges from 0 (poor) to 100 (excellent) and can be calculated as follows:

R = R_0 - I_s - I_d - I_e,eff + A    (1)

where R_0 describes the basic signal-to-noise ratio, I_s combines the simultaneous distortions, I_d represents the impairment factor due to delay, I_e,eff is the effective equipment impairment factor covering the impact of speech coding in the presence of packet loss, and A is the advantage factor that can be used to compensate for the impairments through other quality advantage factors. This work focuses on the impact of speech coding and packet loss and therefore considers no other distortions, using the default values defined in reference [15] for the I_s, I_d, and A parameters. Thus, Equation (1) can be re-written as follows:
R = R_max - I_e,eff    (2)

where R_max = 93.2 is the maximum value that can be used for NB conditions. The I_e,eff can be calculated using Equation (3) from the ITU-T Rec. G.107 as follows:
I_e,eff = I_e + (95 - I_e) · P_pl / (P_pl / BurstR + B_pl)    (3)

where I_e is the equipment impairment factor that represents the impairments caused by the speech codec, P_pl is the percentage of packet loss, and B_pl represents the packet loss robustness. Both I_e and B_pl are codec-specific values and can be found in ITU-T Rec. G.113 [15]. The burst ratio BurstR was introduced by McGowan [30] as the ratio of the average number of consecutively lost packets to the average burst length expected under random loss and can be measured using
BurstR = (average length of observed loss bursts) / (average burst length expected under random loss)    (4)

When BurstR > 1, the loss is considered bursty, and when BurstR = 1, it is considered random.

The resulting R factor from Equation (1) can be translated into the Mean Opinion Score (MOS) using the following relation:

MOS = 1    for R < 0    (5a)
MOS = 1 + 0.035 · R + R · (R - 60) · (100 - R) · 7 · 10^-6    for 0 ≤ R ≤ 100    (5b)
MOS = 4.5    for R > 100    (5c)

As the E-model was designed for NB transmission systems, the scale for the maximum R was extended to cover WB speech codecs to reflect the improvement in quality over NB speech codecs [34, 35, 48]. As a result, the R_max for WB, standardized in reference [13], is scaled up to 129. Therefore, Equation (2) can be re-written as follows:

R_WB = 129 - I_e,eff,WB    (6)

Similarly, recent contributions have suggested an extension to the R scale to cover SWB speech codecs [63]. As a result, the proposed maximum R scale for SWB is 148, and the R factor for the SWB E-model can be calculated using

R_SWB = 148 - I_e,eff,SWB    (7)
The following modification to Equation (3) is used for deriving the effective equipment impairment factor and packet loss robustness values for SWB codec conditions according to Mittag et al. [32]:

I_e,eff,SWB = I_e,SWB + (148 - I_e,SWB) · P_pl / (P_pl / BurstR + B_pl)    (8)
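As a concrete illustration, the chain from Equations (1) through (8) can be sketched in JavaScript. The helper names are ours, and the I_e = 10, B_pl = 20 inputs are hypothetical placeholders, not derived Opus factors:

```javascript
// Sketch of the E-model chain: Equation (3)/(8) -> Equation (2)/(6)/(7)
// -> Equation (5). Helper names are ours; the Ie/Bpl inputs below are
// illustrative placeholders, not the derived Opus factors.

// Effective equipment impairment under packet loss (Equation (3));
// `lossCeil` is 95 for the NB/WB formulas and is raised for the SWB form.
function ieEff(ie, bpl, ppl, burstR = 1, lossCeil = 95) {
  return ie + (lossCeil - ie) * ppl / (ppl / burstR + bpl);
}

// R factor with only coding and loss impairments considered
// (Equations (2), (6), (7)): Rmax is 93.2 (NB), 129 (WB), or 148 (SWB).
function rFactor(rMax, ieEffValue) {
  return rMax - ieEffValue;
}

// R-to-MOS conversion (Equation (5)).
function rToMos(r) {
  if (r < 0) return 1.0;   // (5a)
  if (r > 100) return 4.5; // (5c)
  return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6; // (5b)
}

// Hypothetical NB condition: Ie = 10, Bpl = 20, 2% random loss (BurstR = 1).
const mosExample = rToMos(rFactor(93.2, ieEff(10, 20, 2)));
```

With these placeholder inputs the chain yields a MOS of roughly 3.8; a real deployment would look up I_e and B_pl per Opus bitrate from the derived tables instead.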

The Internet Engineering Task Force standardized Opus codec [60] is the default speech codec used by WebRTC speech and audio applications. It is a multifunctional open source codec developed to support both interactive speech and music over the internet, with a wide range of bitrates from 6 kbit/s up to 512 kbit/s, supporting both constant bitrate (CBR) and variable bitrate (VBR) modes. It supports audio bandwidths from NB to fullband with sampling rates ranging from 8 to 48 kHz as well as frame sizes of 2.5 to 60 ms. The support for packet loss concealment and forward error correction makes it suitable for many usage scenarios over the internet. Opus combines two previously known codecs, i.e., SILK [62] and CELT [59]. The two codecs are employed by Opus to provide three operation modes: the LP mode (SILK only), the MDCT mode (CELT only) [50], and the hybrid mode. The LP mode is optimized for low bitrate speech signals and operates from NB to WB bandwidths. The MDCT mode is optimized for higher-quality speech and stereo music at higher bitrates. In the hybrid mode, the Opus coder combines both modes by using LP mode for lower frequencies up to 8 kHz and MDCT mode for higher frequencies and switches dynamically between the two modes in real time. It is worth noting here that the low complexity of the Opus codec makes it suitable for mobile phones and low-end embedded devices that usually have limited processing resources [5].

With the emergence of web technologies, there is a growing move toward web-based applications. Instead of relying on software tied to a specific operating system and environment, the availability of web browsers on every computer and mobile device makes it easy for a single web application to be accessible to users anywhere. WebRTC takes advantage of such technology by enabling browser-to-browser real-time communication (RTC). Such applications are mostly deployed in public best-effort networks, and the quality experienced by the users has led researchers to develop many improvement strategies in recent years. The Opus codec is primarily used in WebRTC for encoding audio data; its low complexity, range of bitrates, and features make it suitable for such applications. We take advantage of the functionalities offered by WebRTC to make the most out of the features offered by this codec to improve the quality perceived in WebRTC audio applications.

In this article, we build on earlier work and propose an adaptive codec bitrate switching mechanism for peer-to-peer audio applications using WebRTC. It includes a QoE model that receives as input the rate of packet loss at the receiving end. The model uses the NB, WB, and SWB E-model to output the expected degradation for different Opus codec conditions. This then informs a mechanism that selects optimal codec parameters.

The rest of the article is structured as follows. Section 2 reviews related research on the QoE strategies dealing with network impairments including codec switching and WebRTC QoE research. Section 3 details our proposed adaptive algorithm. Section 4 describes the experimental test-bed designed to evaluate the performance of the proposed approach. In Section 5, we summarize the results presented in this article. Finally, Section 6 concludes the article and details the planned future work.


2 RELATED WORK

This section includes a review of the literature for the areas involved in this work. Mainly, it is divided into four parts focusing on quality of service (QoS) and QoE strategies, QoE in WebRTC, codec and codec parameter switching strategies, and Opus E-model and parameters.

2.1 QoS and QoE Strategies

The quality of speech in VoIP varies depending on the performance of the underlying IP networks, which may introduce impairments like delay and distortions including packet loss, speech coding, and many other factors [22]. To cope with such impairments in packet-switched networks, researchers have been developing different QoS control strategies and QoE methods to improve users’ experience [9]. A large body of work has focused on strategies to establish a relationship between QoS metrics and QoE in multimedia applications. A comprehensive review by De Carvalho and De Souza Mota [9] discussed the early strategies deploying source-based adaptive adjustment of application-layer parameters. Those adjustments include switching the codec type, encoding rate, and/or packetization size at the transmission ends. Some of the strategies rely on only one of the objective models, i.e., intrusive or non-intrusive, and some use hybrid approaches to benefit from both methods. More recent work by Han and Muntean [18] proposed a hybrid call quality assessment model for the NB Opus codec that combines an intrusive and a non-intrusive objective evaluation method: the E-model, i.e., the non-intrusive method, runs in real time and is slowly corrected using PESQ, i.e., the intrusive method, by considering the packet Loss Rate (LR) in the link. The model obtains the network statistics at the start of the call and then repeatedly corrects itself by recording a short period of voice and running an offline assessment using PESQ at each end.

2.2 WebRTC QoE Strategies

There has been an extensive amount of work in the literature dealing with the performance delivered by WebRTC audio and video applications [17]. The performance of the Opus codec in WebRTC can achieve similar results to the quality delivered by the stand-alone Opus coder according to other works [23, 28]. Maruschke et al. [28] have identified a number of Opus key parameters that can be modified by changing the SDP messages to get the best performance out of the Opus codec that is usually limited by the default configuration in WebRTC. The same authors have recently proposed a tool to assess the quality of audio in the secured non-browser-based WebRTC applications [31]. The proposed method is deployed in multiple measurement points to capture and extract the audio payload carried by the Secure Real-Time Transport Protocol (SRTP) to process through POLQA and other objective models with some limitations due to the nature of the SDP protocol when dealing with encrypted multimedia streams.

2.3 Codec and Codec Parameter Switching Strategies

The switching of codec type in VoIP applications is usually done before or during the session in real time [10, 37]. Aktas et al. [1] compared the quality of standard NB speech codecs under varying network conditions and proposed an adaptive end-to-end codec switching scheme to adapt to changes in the available bandwidth during the call. This was done by continuously monitoring the network conditions between the two ends and using the re-INVITE message defined in the Session Initiation Protocol standard [51] to modify the session, i.e., to switch to the most suitable codec. Assem et al. [6] evaluated the impact of adaptive codec switching algorithms from an end-user perception point of view and proposed an algorithm to minimize such impacts. Compared to Aktas et al. [1], the study included both NB and WB codecs, i.e., G.711, iLBC, GSM, SPEEX, and SILK. The authors showed that the algorithm can produce a significant improvement in voice quality compared to using only the codec selected at the start of the call. However, switching codec types can introduce silent gaps that vary in length depending on the prevailing packet loss. Therefore, the number of codec switches within a time interval should be limited to minimize the degradation in the quality perceived by the end user. Other strategies consider switching the encoding rate, i.e., the bitrate [33, 49, 64]. Once the encoder re-configures the bitrate, the RTP header carrying the payload can inform the decoder about the new bitrate by changing the field containing the payload information. Thus, the bitrate adaptation is done seamlessly by a codec, when the codec supports it. It is worth noting here that seamless adaptation of codec parameters is supported by the Opus codec [24, 29, 60], the codec deployed in this work, as will be discussed in Section 3.4 of this article.
More about codec and codec parameters adaptation strategies can be found in the work of De Carvalho and De Souza Mota [9].

2.4 Opus I_e and B_pl Factors

To deploy the E-model in NB and WB networks, ITU-T has standardized a list of the required I_e and B_pl values for a number of speech codecs in recommendation G.113 [15]. The recommendation does not provide those factors for some of the most recent speech codecs, including the Opus codec. However, to derive the equipment impairment and packet loss robustness factors for any codec condition, the methodologies defined in the ITU-T recommendations in references [40] and [41] can be used for the NB and WB codec conditions, respectively, using subjective methods. As objective models have improved over time, they can predict the quality degradation of speech with a closer correlation to subjective results. The ITU-T standardized a similar methodology in ITU-T Rec. P.834 [42] and ITU-T Rec. P.834.1 [43] to derive the I_e and B_pl for new conditions using objective methods. For the Opus codec, as the main codec considered in this work, we have followed the method recommended in ITU-T Rec. P.834 to derive the I_e and B_pl factors for the NB bitrates [56]. Similarly, in reference [55], we have followed recommendation ITU-T Rec. P.834.1 to derive the I_e and B_pl for the WB bitrates using POLQA (v.3) as the instrumental model. There has not been a standardized method to derive the equipment impairment and packet loss robustness factors for SWB conditions. However, the method proposed by Mittag et al. [32] can be used to derive I_e and B_pl for SWB speech codecs. The same method was used to derive the impairment factors for the 3GPP Enhanced Voice Services (EVS) codec [8], currently listed in the recent amendment to the ITU-T G.113 recommendation [16]. In recent work, we followed the same method to derive the equipment impairment and packet loss robustness factors for the SWB Opus codec [2, 54]. A list of all of the derived factors considered in this work can be found in the appendix to this article.


3 THE PROPOSED MODEL

This section details our proposed QoE model, which includes a quality estimation module consisting of the NB, WB, and SWB E-model versions combined with a database of the required equipment impairment factors for the Opus codec, the default codec deployed in WebRTC. The model benefits from the getStats API offered by WebRTC to extract the relevant QoS metrics taken from the Real-Time Transport Control Protocol sender and receiver reports for each RTC session. With the help of the real-time session re-negotiation functionality offered by WebRTC, we developed an adaptive algorithm that selects the optimal Opus codec bitrate and operating mode based on the quality scores coming from the E-model. The proposed adaptive algorithm employs an adaptive VoIP architecture following the concept of a closed feedback loop defined elsewhere [57]. The concept simply considers measurable values in every system state as variables. The majority of adaptive implementations in VoIP applications deal with four key variables, described in the work of De Carvalho and De Souza Mota [9] as follows:

  • Observation parameters: Variables that can be measured to assess the performance of the network. These include QoS parameters like delay, packet loss, and other impairments that are not controlled by the sender or receiver.

  • Decision metrics: Parameters used to evaluate the characteristics of the audio session over a period of time or a number of samples. These include QoE parameters like the equipment impairment factors used by the E-model and the resulting MOS, i.e., an indicator of the quality perceived by the end user.

  • Performance reference: A variable that can be used as a reference by one of the session endpoints to make decisions. This can be a threshold of a minimum or maximum MOS used to decide how to modify any parameters related to the audio session.

  • Adjustable parameters: The parameters that can be adjusted based on the three variables mentioned previously. A model can change any codec-related parameters at the sender side or the selection of the de-jitter buffer strategy at the receiver side to improve the quality.

The observation parameters in our QoE model are the packet loss metrics and the impact of coding using the Opus codec. The MOS provided by the E-model is considered the decision metric. The Opus codec configuration, including encoding bitrate and operating mode, that results in the maximum MOS for the measured LR and loss type is the performance reference. Finally, the configurable Opus bitrates and operating modes are considered the adjustable parameters. A flow diagram of the higher-level components of our adaptive algorithm is illustrated in Figure 2. The QoE model can dynamically perform the following functions at a fixed interval in real time:

Fig. 2.

Fig. 2. Flowchart diagram of the adaptive algorithm.

(1) Monitor and detect the presence of packet loss, and calculate the resulting LR.

(2) Evaluate the impact of the loss condition on each of the available NB, WB, and SWB Opus bitrates.

(3) Compare the resulting MOS across all available bitrates to suggest the optimal Opus codec condition.

(4) Re-negotiate the session using the optimal Opus bitrate and operating mode by modifying the SDP message.
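The four functions above can be sketched as a single loop iteration. The function and parameter names here are ours; `estimateMos` stands in for the E-model, the condition list for Table 1, and `applySdp` for the SDP re-negotiation step:

```javascript
// One iteration of the adaptive loop (steps (1)-(4)). The names are
// illustrative: estimateMos stands in for the E-model and applySdp for
// the SDP re-negotiation module.
function adaptiveStep(lossRate, conditions, current, estimateMos, applySdp) {
  // Step (2): evaluate every available Opus condition under the measured LR.
  const scored = conditions.map(c => ({ ...c, mos: estimateMos(c, lossRate) }));
  // Step (3): pick the condition with the highest expected MOS.
  const best = scored.reduce((a, b) => (b.mos > a.mos ? b : a));
  // Step (4): re-negotiate only when the configuration actually changes.
  if (best.bitrate !== current.bitrate || best.mode !== current.mode) {
    applySdp(best);
    return best;
  }
  return current;
}
```

Guarding the re-negotiation on an actual configuration change mirrors the goal of limiting switch frequency discussed in Section 2.3.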

3.1 WebRTC getStats API

The getStats API described in the work of Bergkvist et al. [7] provides all QoS metrics taken from the Real-Time Transport Control Protocol sender’s and receiver’s reports for each peer participating in any WebRTC audio and video session. The metrics provided by the getStats API are built into the Google Chrome and Mozilla Firefox web browsers. For example, Google Chrome offers a dashboard to visualize the QoS metrics for every participant using an audio and/or video WebRTC application. This feature can be accessed by navigating to the following address: chrome://webrtc-internals. Similarly, “about:webrtc” can be used in the Mozilla Firefox web browser. The getStats API can also be called by web applications running within the browsers to return specific statistics to monitor the performance of the underlying network for each session. A definition of all metrics provided by the getStats API is given in the work of Alvestrand and Singh [3]. The following are the QoS metrics used by our QoE model in WebRTC deployed in the Chrome web browser:

  • packetsLost: The number of lost RTP packets for each SSRC, according to RFC 3550.

  • totalSamplesDuration: The duration in seconds of the session since the first RTP packet was transmitted.

  • bytesReceived: The total number of bytes of audio data received, including packet overhead.
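The three metrics can be pulled from the standard stats report roughly as follows; `pc` is assumed to be an established RTCPeerConnection, and collectAudioStats() is our helper name:

```javascript
// Sketch: extract the three QoS metrics from the inbound audio RTP stats.
// Assumes `pc` is an established RTCPeerConnection.
async function collectAudioStats(pc) {
  const report = await pc.getStats();
  const metrics = {};
  report.forEach(stat => {
    if (stat.type === "inbound-rtp" && stat.kind === "audio") {
      metrics.packetsLost = stat.packetsLost;
      metrics.bytesReceived = stat.bytesReceived;
      metrics.totalSamplesDuration = stat.totalSamplesDuration;
    }
  });
  return metrics;
}
```

Depending on the browser version, the media-type field may be exposed as the older `mediaType` attribute rather than `kind`.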

3.2 Packet Loss Calculation

The packet LR is calculated by calling the packetsLost metric from the getStats API. We set the LR to be calculated every 10 seconds to collect enough information about the loss present. As a result, we lower the number of codec configuration switches that may occur during brief packet loss events. Computing and detecting bursty packet loss conditions requires the calculation of the average number of consecutively lost packets. However, this average and the loss distribution information are not provided by the current version of the getStats API. Therefore, our adaptive algorithm cannot detect bursty loss behavior in the network. Instead, we calculate the LR and consider the loss type as random. The impact of bursty loss and the performance of the adaptive algorithm under bursty loss conditions are evaluated in Section 4. The experimental test-bed described in Section 4 provides a controlled testing environment. Thus, we can specify the type of loss pattern used and then inform the E-model to operate accordingly.
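A minimal sketch of the 10-second windowed LR computation, assuming the previous tick’s cumulative counters are retained between calls (the counter field names follow getStats; the windowing itself is our illustration):

```javascript
// Loss rate (percent) over one 10 s window, computed from the cumulative
// packetsLost/packetsReceived counters sampled at the previous and
// current tick.
function lossRateSince(prev, curr) {
  const lost = curr.packetsLost - prev.packetsLost;
  const received = curr.packetsReceived - prev.packetsReceived;
  const expected = lost + received;
  return expected > 0 ? (100 * lost) / expected : 0;
}
```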

3.3 E-model Implementation

The quality estimation aspect of our QoE model uses the NB, WB, and SWB E-model versions as described in Section 1. To implement the E-model in our QoE model in WebRTC, a database of the I_e and B_pl factors from references [54, 55, 56] for Opus codec version 1.2 is used. Table 1 lists all of the bitrates covered by the NB and WB bandwidths, as well as a selected number of bitrates that cover the SWB bandwidth up to 40 kbps, which is considered the maximum encoding bitrate for speech applications according to Valin et al. [60]. It is worth noting here that Opus can encode at lower bitrates in WB and SWB when operating in VBR mode. This is because VBR mode in the Opus codec is more efficient and can produce better quality compared to encoding at the same bitrates in CBR mode [60]. The quality estimation part of the QoE model takes the LR as an input. Each implementation of the E-model calculates the I_e,eff for every Opus condition listed in Table 1. Equation (3) is used for calculating the I_e,eff factor for the NB and WB bitrates. When it comes to SWB mode, the I_e,eff values are calculated using Equation (8) instead. The NB E-model uses the I_e and B_pl values to calculate the R factor with a maximum value of 94.15 according to reference [14]. Similarly, the WB E-model calculates R using a maximum value of 129 for the WB bitrates, and a maximum value of 148 is used to calculate R for the SWB bitrates. The resulting R factors from each implementation of the E-model are then transformed into the MOS scale using Equation (5). The BitRateSelector() module compares all provided MOS values and selects the codec condition resulting in the highest MOS, as shown in Figure 3. It is worth noting here that the equations deployed to get the MOS, i.e., Equations (2) through (8), involve rather simple mathematical operations, and the corresponding comparison and selection processes are equally simple, so the computational cost is small, i.e., a few milliseconds, making the approach suitable for real-time applications. To get an exact value of the computational cost, the JavaScript built-in function performance.now() was utilized to evaluate the processing time taken by the algorithm; the resulting computation time was between 2 and 3 ms when tested on a 2.5-GHz dual-core Intel i5 machine. The module checks whether the selected codec condition is currently being used; if another codec condition is suggested, it outputs the bitrate and operating mode with the expected MOS for the new codec configuration.
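The timing measurement can be reproduced with a small wrapper of the following kind (our helper; the 2 to 3 ms figure is the paper’s measurement, not something this sketch guarantees):

```javascript
// Measure the wall-clock cost of one run of the quality estimation.
// performance.now() is available in browsers and in modern Node.js.
function timed(fn) {
  const t0 = performance.now();
  const result = fn();
  return { result, elapsedMs: performance.now() - t0 };
}
```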

Fig. 3.

Fig. 3. Implementation of the NB, WB, and SWB E-model.

Table 1.
Bandwidth   Bitrate   Mode
SWB         14 kbps   vbr
SWB         15 kbps   vbr
SWB         16 kbps   vbr/cbr
SWB         19 kbps   vbr/cbr
SWB         22 kbps   vbr/cbr
SWB         25 kbps   vbr/cbr
SWB         28 kbps   vbr/cbr
SWB         31 kbps   vbr/cbr
SWB         34 kbps   vbr/cbr
SWB         37 kbps   vbr/cbr
SWB         40 kbps   vbr/cbr
WB          11 kbps   vbr
WB          12 kbps   vbr/cbr
WB          13 kbps   vbr/cbr
WB          14 kbps   cbr
WB          15 kbps   cbr
NB          6 kbps    vbr/cbr
NB          7 kbps    vbr/cbr
NB          8 kbps    vbr/cbr
NB          9 kbps    vbr/cbr
NB          10 kbps   cbr
NB          11 kbps   cbr

Table 1. Bitrates and Operating Modes Supported by the Opus Codec (v. 1.2.1) for Speech Applications

3.4 Session Re-negotiation

As discussed in Section 1, WebRTC peers use the SDP offer/answer inspired by Session Initiation Protocol session negotiation flow. The negotiation of multimedia sessions in WebRTC is done by exchanging the SDP messages between the peers in a text-based format over the signaling channels. An SDP message carries information of the media types and other parameters related to each peer to establish an audio and/or video session. An example of an “audio-only” session exchange message is illustrated in Figure 4.

Fig. 4.

Fig. 4. Sample of an SDP message for a WebRTC audio session.

Each line in the SDP message carries information of the type of media (audio, video), transport protocol (RTP/UDP), and the format of the media type including audio- and video-related parameters. A definition for all parameters used by SDP is described in the work of Handley et al. [19].

SDP munging, or SDP modification according to Levent-Levi [27], is used to modify any parameter within the SDP messages before they are exchanged between the session endpoints. The WebRTC createOffer() and createAnswer() functions use the SDP messages to set up and establish audio/video sessions. The sessions can be re-negotiated while running to add, remove, or change any media-related parameters. Nandakumar and Jennings [36] describe all possible scenarios and SDP formats that can be used in JSEP to set up WebRTC sessions. For the Opus codec, a list of codec-specific parameters defined in other works [58, 60] can be modified before and during codec operation in WebRTC. Spittka et al. [58] detail all parameters that can be modified using the SDP messages for both sender and receiver. In our QoE model, we use the following receiver-only parameters that can be requested by the receiver and affect one direction of the transmission:

  • maxaveragebitrate: The maximum average coding bitrate supported by the decoder. The receiver can use this parameter to inform the sender, i.e., the encoder, to limit the encoding bitrate to save resources. When a frame size of 20 ms is used, Opus supports bitrates from 6,000 bit/s up to 40,000 bit/s for speech and up to 128,000 bit/s for fullband stereo music. Table 1 lists the maxaveragebitrate values for Opus (version 1.2) and the default operating sampling rate based on the findings in other works [2, 54, 55].

  • isCBR: The Opus codec operates in VBR mode by default. In this mode, the encoder can deliver the highest quality by encoding the audio data depending on the available audio signals at the selected maxaveragebitrate. In CBR mode, the encoder encodes each packet with a fixed amount of data regardless of the content of the audio signal. According to the work of Valin et al. [60], the CBR mode delivers lower quality and adds more processing complexity compared to VBR mode.

We have implemented the Re-negotiate() module to handle the adaptive codec parameter negotiation during any active audio session in WebRTC. The SDP line starting with “a=fmtp:111” can be used as described by Spittka et al. [58] to adjust Opus parameters. The parameters “maxaveragebitrate” and “isCBR” are the ones adjusted to change the codec configuration depending on the output of the BitRateSelector() module in our implementation. Once the codec configuration with the highest MOS is returned by the BitRateSelector() module, the Re-negotiate() module modifies the SDP by adding the recommended Opus bitrate and operating mode to the “a=fmtp:111” line. Table 2 includes examples of how this can be achieved. The bitrate used by “maxaveragebitrate” is specified in bits per second. The parameter “isCBR” takes a Boolean value, where 1 indicates the use of CBR mode and 0 indicates VBR mode. If the parameter “isCBR” is not specified, the encoder will use the VBR mode by default.

Table 2.
Bitrate   Mode   SDP Line Example
9 kbps    cbr    a=fmtp:111 maxaveragebitrate=9216; isCBR=1
12 kbps   vbr    a=fmtp:111 maxaveragebitrate=12288; isCBR=0
32 kbps   cbr    a=fmtp:111 maxaveragebitrate=32768; isCBR=1
40 kbps   vbr    a=fmtp:111 maxaveragebitrate=40960

Table 2. Examples of Switching Bitrate and Mode Using SDP Lines
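The SDP rewriting illustrated in Table 2 can be sketched as a plain string transform. rewriteOpusFmtp() is our helper name, payload type 111 follows the paper’s examples, and for brevity the sketch overwrites rather than merges any existing fmtp parameters:

```javascript
// Rewrite the Opus "a=fmtp:111" line to request a new bitrate/mode.
// Simplification: existing fmtp parameters (e.g., useinbandfec) are
// replaced rather than merged; a production version would merge them.
function rewriteOpusFmtp(sdp, bitrateBps, isCbr) {
  return sdp
    .split("\r\n")
    .map(line => {
      if (!line.startsWith("a=fmtp:111")) return line;
      let params = "maxaveragebitrate=" + bitrateBps;
      if (isCbr !== undefined) params += "; isCBR=" + (isCbr ? 1 : 0);
      return "a=fmtp:111 " + params;
    })
    .join("\r\n");
}
```

Omitting the isCBR argument leaves the parameter out of the line, so the encoder falls back to its VBR default, matching the 40 kbps row of Table 2.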


4 EXPERIMENTS

Once the components of our adaptive QoE model were operational, we designed a test-bed to evaluate the performance of the model’s quality estimation and the adaptive codec bitrate switching mechanism, as illustrated in Figure 5. A sender and receiver are connected while running an audio-only WebRTC session over an RTP stream. A simple script deployed at the sender side acts as a controller that plays the reference speech signals at the sender’s side and simultaneously records the degraded signals at the receiver’s side. The QoS metrics are monitored in real time by the model as the RTP packets arrive at the receiving end. The adaptive bitrate switching mechanism can be enabled or disabled to assess the quality resulting from the adaptive codec switching independently. The most recent version of POLQA (v.3) was trained to predict the quality degradation for Opus-coded speech signals [53]. In the final phase of each experiment, we obtain the MOS-LQO quality score from the POLQA model to evaluate the performance of the codec bitrate switching mechanism. The main goals of building this experimental test-bed are as follows:

Fig. 5.

Fig. 5. Experimental Test-bed Architecture.

  • To quantify the impact of random and bursty loss conditions on the performance delivered by Opus codec in WebRTC.

  • To compare the quality resulting from different packet loss conditions for the proposed adaptive bitrate switching mechanism versus the default WebRTC codec configuration.

Table 3 lists all of the experiments conducted to assess the performance of the adaptive algorithm. Experiments 1.1 and 1.2 evaluate the quality of the AdaptiveOFF scenario, i.e., when the default Opus codec configuration under WebRTC is used and the adaptive algorithm is disabled, under random and bursty loss conditions. Experiments 1.3 and 1.4 assess the quality of using AdaptiveON, i.e., when the adaptive algorithm is enabled.

Table 3.

Experiment    Description                    No. of Files
1.1           Random loss and AdaptiveOFF    1,024
1.2           Bursty loss and AdaptiveOFF    1,024
1.3           Random loss and AdaptiveON     1,024
1.4           Bursty loss and AdaptiveON     1,024
Total                                        4,096

Table 3. Adaptive Algorithm and Packet Loss Experiments

The following sections detail the functionalities of each module implemented in the experimental test-bed to evaluate quality for the goals listed previously.

4.1 Audio Injection and Recording

The test-bed uses the reference-based assessment model POLQA to evaluate the performance of the codec bitrate switching mechanism in our adaptive algorithm. To ensure reproducibility and to eliminate the degradation introduced by input and output equipment, prerecorded speech signals are injected. ITU-T Rec. P.863.1 describes three methods for audio injection [47]:

  • Acoustically: Injecting the reference signal using an artificial mouth by connecting a head and torso simulator to the input of the end terminal [45].

  • Electrically: By connecting an audio cable directly from the output device to the input of the end terminal through the 3.5-mm jack.

  • Digitally: Reference signals are injected locally using software to map the output audio device to the input device and act as the microphone and vice versa.

We chose the digital injection and recording approach in this experimental test setup. The Linux PulseAudio Volume Control application pavucontrol [25] was used for this purpose. We play the reference audio signals into the sender's WebRTC application, bypassing the microphone, and record the audio at the receiver's side, bypassing the speakers. A digital player module implemented at the sender's side injects the reference signals taken from a database of 32 test signals from ITU-T Rec. P.501 Annex C [4] in eight different languages: English, German, Italian, French, Finnish, Chinese, Japanese, and Dutch. Each reference speech signal consists of two short sentences separated by a short period of silence, resulting in 8-second-long speech samples as described in reference [44]. The output of the player module feeds the audio signals into WebRTC to be processed by the encoder. On the other end, a recording module implemented at the receiver's side digitally records the degraded speech signals to be processed by the objective model described in Section 4.3.

4.2 Packet Loss Module

To focus on the impact of packet loss in a controlled testing environment, the network emulation tool NetEm [21] is deployed at the sender side to introduce degradation at eight packet loss rates (LRs), i.e., 1%, 3%, 5%, 10%, 15%, 20%, 25%, and 30%, to the outgoing RTP packets, using the Bernoulli model [52] to generate random loss patterns and the Gilbert-Elliott model [20] to generate bursty loss patterns. These LRs were selected to cover the broad range of packet loss experienced by end users in current telecommunication networks, especially in the case of Over-the-Top VoIP services. For the bursty loss patterns, we used the restricted loss space method described in the work of Varela et al. [61] to create a list of LR and mean loss burst size (MLBS) values that ensure the stability of the Gilbert loss model when generating loss patterns for the rather short speech signals used in this experiment. The combinations of the used LR and MLBS values are listed in Table 4. It is worth noting here that the impact on quality depends greatly on the location of the dropped frames in any of the test signals, more precisely on whether a frame contains silence or voiced data. Therefore, we have repeated the process four times for every test condition, resulting in 1,024 test cases per experiment. Applying bursty loss patterns using NetEm is less straightforward. Although NetEm supports most of the loss models found in transmission networks, it requires two additional parameters, p and r, relating to the bursty loss behavior: p is the probability of a transition from state G (the Good state), where no packets are lost, to state B (the Bad state), where all packets are lost, while r denotes the probability of going from state B back to state G. This corresponds to the Simple Gilbert loss model.
As only the LR and MLBS are required to generate the bursty loss behavior, we used the Simple Gilbert model implementation in NetEm for this purpose. According to the inspection by Fosser and Nedberg [11] of NetEm's behavior when applying bursty loss patterns using different loss models, Equation (6) can be used to apply bursty loss for a given combination of the required LR and MLBS values.

(6)    p = (LR × r) / (100 − LR),    with r = 100 / MLBS

For instance, if an LR of 5% and an MLBS of 4 are required, then r = 100/4 = 25% and p can be calculated as follows: p = (5 × 25) / (100 − 5) ≈ 1.32%.

Table 4.

LR     MLBS    p (%)    r (%)
1%     2       0.51     50
3%     2       1.55     50
5%     4       1.32     25
10%    4       2.8      25
15%    5       2.3      13
20%    6       3.25     12
25%    6       4        12
30%    6       5.15     12

Table 4. List of LR and MLBS Values and Their Corresponding p and r Values for NetEm

Combinations of the used LR and MLBS values, along with the corresponding p and r values required by NetEm for the LRs used in the validation test-bed experiments, are listed in Table 4. Under both random and bursty loss, the impact on quality depends greatly on the location of the dropped frames in any of the test signals, more precisely on whether a frame contains silence or voiced data. Therefore, we have repeated the process four times for every test condition. This procedure resulted in 4,096 test cases in total.
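As a worked sketch of Equation (6), the hypothetical helper below derives the NetEm gemodel transition probabilities from a target LR and MLBS under the Simple Gilbert assumption r = 100/MLBS. Note that the 15% to 30% rows of Table 4 use r values adjusted via the restricted loss space of Varela et al. [61], which this simple rule does not reproduce.

```javascript
// Hypothetical helper implementing Equation (6) for the Simple Gilbert
// model: derive the NetEm transition probabilities p (Good -> Bad) and
// r (Bad -> Good), both in percent, from a target loss rate LR (percent)
// and mean loss burst size MLBS.
function gilbertParams(lrPercent, mlbs) {
  const r = 100 / mlbs;                          // leave the Bad state: r = 100/MLBS
  const p = (lrPercent * r) / (100 - lrPercent); // enter the Bad state
  return { p, r };
}
```

For example, gilbertParams(5, 4) yields r = 25 and p ≈ 1.32, matching the 5% row of Table 4; the resulting values would be passed to NetEm as "tc qdisc ... netem loss gemodel p r".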

4.3 Perceptual Speech Quality

The degraded and decoded audio data delivered by the RTP stream, recorded using the digital recording module at the receiver's side, was stored in a database to be used later as input to the instrumental model, i.e., POLQA, to evaluate the degradation introduced by the packet loss conditions. The POLQA model can deal with the change of bandwidth during instantaneous codec bitrate switching [47], which makes it suitable for this experimental setup.

5 EXPERIMENTAL RESULTS

Next, we report the results obtained from the evaluation test-bed for the packet loss conditions selected in Section 4.2, using the Bernoulli model for random loss and the Gilbert-Elliott model for bursty loss. Figure 6 compares the quality degradation on the MOS-LQO scale under both loss conditions. It is clear that bursty loss yields a lower MOS than the random loss condition at the same LR. This concurs with the results reported in reference [54] for the impact of loss conditions and the resulting impairment factors arising from random and bursty loss patterns.

Fig. 6.

Fig. 6. Random vs. bursty loss MOS quality. The vertical bars show a 95% CI computed over 128 MOS-LQO values.

Figure 7 shows the quality degradation on the MOS-LQO scale for AdaptiveON, i.e., when the adaptive algorithm is enabled (solid blue line), against the quality for AdaptiveOFF, i.e., when the adaptive algorithm is disabled (dashed black line). AdaptiveOFF corresponds to the default WebRTC configuration of the Opus codec. Detailed MOS values and their confidence intervals (CI 95) are listed in Table 5. According to the reported MOS values, AdaptiveON has a clear advantage over AdaptiveOFF. There is an improvement of the MOS for AdaptiveON at LRs from 3% to 20%; however, users are unlikely to tolerate quality degradation below an MOS of 3, which makes the adaptive algorithm usable for LRs up to 15%.

Fig. 7.

Fig. 7. Adaptive algorithm under random loss. The vertical bars show a 95% CI computed over 128 MOS-LQO values.

Table 5.

       AdaptiveOFF        AdaptiveON
LR     MOS     CI 95      MOS     CI 95
1%     4.14    0.06       4.18    0.07
3%     3.76    0.18       4.03    0.07
5%     3.28    0.20       4.04    0.08
10%    2.70    0.16       3.62    0.12
15%    2.39    0.11       3.36    0.14
20%    2.08    0.11       2.52    0.15
25%    1.92    0.11       1.98    0.10
30%    1.71    0.11       1.72    0.08

Table 5. MOS and CI 95 Under Random Loss

In Figure 8, the MOS for AdaptiveON (solid blue line) shows an improvement over the default Opus configuration used in WebRTC, i.e., AdaptiveOFF, in particular for the 1% and 3% LRs. There is no significant improvement for AdaptiveON at LRs beyond 3%. Even though an improvement of the MOS can be seen at 20%, the actual MOS is 2.21, as listed in Table 6, which is considered poor and is unlikely to improve the quality perceived by users.

Fig. 8.

Fig. 8. Adaptive algorithm under bursty loss. The vertical bars show a 95% CI computed over 128 MOS-LQO values.

Table 6.

       AdaptiveOFF        AdaptiveON
LR     MOS     CI 95      MOS     CI 95
1%     4.02    0.16       4.39    0.10
3%     3.18    0.25       3.76    0.19
5%     2.77    0.31       2.99    0.27
10%    2.24    0.26       2.49    0.28
15%    1.78    0.24       2.08    0.29
20%    1.56    0.16       2.21    0.32
25%    1.56    0.20       1.53    0.10
30%    1.37    0.09       1.41    0.12

Table 6. MOS and CI 95 Under Bursty Loss

Table 7.

Bitrate    VBR       CBR
14         38.1      -
15         34.1      -
16         30.16     36.88
19         23.80     29.58
22         20.22     23.38
25         20.20     22.64
28         18.48     23.65
31         16.82     21.74
34         14.76     20.62
37         12.08     16.23
40         10.67     15.90

Table 7. Ie for VBR and CBR SWB Opus

Table 8.

Table 8. Bpl for VBR and CBR SWB Opus Under Random and Bursty Packet Loss

Table 9.

Bitrate    VBR      CBR
11         28.4     -
12         23.3     29.8
13         20       26.2
14         -        20.8
15         -        -

Table 9. Ie for VBR and CBR WB Opus

Table 10.

Table 10. Bpl for VBR and CBR WB Opus Under Random and Bursty Packet Loss

Table 11.

Bitrate    VBR      CBR
6          23       46.3
7          21.3     31.7
8          16       19.2
9          11.7     13.2
10         6.2      -
11         2        -

Table 11. Ie for VBR and CBR NB Opus Modes

Table 12.

Table 12. Bpl for VBR and CBR NB Opus Under Random and Bursty Packet Loss

It is worth noting that the results listed in Tables 5 and 6 still show a higher degradation introduced by the bursty loss conditions for both the AdaptiveON and AdaptiveOFF test cases.

6 CONCLUSION AND FUTURE WORK

In this article, an adaptive codec bitrate switching algorithm is proposed for speech applications in WebRTC. The adaptive algorithm employs a QoE model deploying the NB, WB, and SWB E-model to assess the combined impact of speech coding and packet loss. The implemented E-model utilizes a database containing the equipment impairment factors that had already been derived instrumentally for all bitrates supported by the Opus codec for speech applications. Based on the calculated LR and the estimated quality degradation, the model then informs a codec parameter switching mechanism to select the optimal bitrate and operating mode for the Opus codec. The experimental results show that the proposed approach can effectively lower the impact of random packet loss and, to some extent, bursty loss conditions. In the future, we would like to extend the implementation of the proposed adaptive approach by adding the following functionalities:

  • Include the delay impairment factor Id in Equation (1) to consider the impact of end-to-end delay when calculating the R factor, to estimate the conversational quality experienced by the end user. The following QoS metrics are returned by the getStats API and can be used for this purpose:

    • googCurrentDelayMs: Delay time at the receiving side in milliseconds.

    • googJitterBufferMs: The size of the jitter buffer measured in milliseconds.

  • When dealing with the issue addressed in Section 3.2 regarding the automatic detection of the packet loss type, i.e., random or bursty, we consider using third-party tools like tcpdump or Tshark. Such tools are not browser based; therefore, we will implement an additional script that bridges the QoS statistics taken from such tools in order to make a decision regarding the type of loss present in the network.
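As a sketch of how the delay impairment could be folded in, the snippet below applies the widely used simplified approximation of the E-model delay term, Id = 0.024·d + 0.11·(d − 177.3)·H(d − 177.3), where d is the one-way mouth-to-ear delay in milliseconds and H is the Heaviside step function. Treating the sum of googCurrentDelayMs and googJitterBufferMs as that delay is our assumption, not part of the getStats specification.

```javascript
// Sketch under stated assumptions: estimate the E-model delay impairment
// Id from the two legacy getStats metrics listed above, using the
// simplified approximation Id = 0.024*d + 0.11*(d - 177.3)*H(d - 177.3).
function delayImpairment(googCurrentDelayMs, googJitterBufferMs) {
  // Assumption: the sum approximates the one-way mouth-to-ear delay d (ms).
  const d = googCurrentDelayMs + googJitterBufferMs;
  const excess = d > 177.3 ? d - 177.3 : 0; // Heaviside step at 177.3 ms
  return 0.024 * d + 0.11 * excess;
}
```

The resulting Id would then be subtracted in the R-factor computation alongside the coding and packet loss impairments already handled by the model.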

Appendix

A Opus Codec Ie and Bpl Factors

The following sections of this appendix list all of the equipment impairment factors used by the QoE model in the proposed adaptive algorithm. The presented factors cover the Ie and Bpl values for the narrowband, wideband, and super-wideband Opus codec, as discussed at the end of Section 2.4. The factors cover both the VBR and CBR operating modes of the Opus codec for mono speech.

A.1 SWB

A.2 WB

A.3 NB

REFERENCES

  1. [1] Aktas Ismet, Schmidt Florian, Weingärtner Elias, Schnelke Cai-Julian, and Wehrle Klaus. 2012. An adaptive codec switching scheme for SIP-based VoIP. In Internet of Things, Smart Spaces, and Next Generation Networking. Springer, 347–358.
  2. [2] AlAhmadi Mohannad, Pocta Peter, and Melvin Hugh. 2019. Instrumental estimation of E-model equipment impairment factor parameters for super-wideband Opus codec. In Proceedings of the 2019 30th Irish Signals and Systems Conference (ISSC'19). IEEE, Los Alamitos, CA, 1–6.
  3. [3] Alvestrand Harald and Singh Varun. 2018. Identifiers for WebRTC's Statistics API. Retrieved September 15, 2021 from https://www.w3.org/TR/2018/CR-webrtc-stats-20180703/.
  4. [4] ITU-T P.501 Amendment 1. 2017. Test signals for use in telephonometry. ITU-T Recommendation (2017).
  5. [5] Ashara Amit. 2016. Implementing Opus Voice Codec for TM4C129x Device. Application Report. Texas Instruments.
  6. [6] Assem Haytham, Adel Mohamed, Jennings Brendan, Malone David, Dunne Jonathan, and O'Sullivan Pat. 2013. A generic algorithm for mid-call audio codec switching. In Proceedings of the 2013 IFIP/IEEE International Symposium on Integrated Network Management (IM'13). IEEE, Los Alamitos, CA, 1276–1281.
  7. [7] Bergkvist Adam, Burnett Daniel, Jennings Cullen, Narayanan Anant, Aboba Bernard, Brandstetter Taylor, and Bruaroey Jan-Ivar. 2018. WebRTC 1.0: Real-time Communication Between Browsers. Retrieved September 15, 2021 from https://www.w3.org/TR/2018/CR-webrtc-20180927/.
  8. [8] Bruhn Stefan, Pobloth Harald, Schnell Markus, Grill Bernhard, Gibbs Jon, Miao Lei, Järvinen Kari, et al. 2015. Standardization of the new 3GPP EVS codec. In Proceedings of the 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'15). IEEE, Los Alamitos, CA, 5703–5707.
  9. [9] Carvalho Leandro Silva Galvão De and Mota Edjair De Souza. 2013. Survey on application-layer mechanisms for speech quality adaptation in VoIP. ACM Computing Surveys 45, 3 (2013), Article 36.
  10. [10] Casetti Claudio, Martin Juan Carlos De, and Meo Michela. 2000. A framework for the analysis of adaptive voice over IP. In Proceedings of the 2000 IEEE International Conference on Communications (ICC'00), Vol. 2. IEEE, Los Alamitos, CA, 821–826.
  11. [11] Fosser Eirik and Nedberg Lars Olav D. 2016. Quality of Experience of WebRTC Based Video Communication. Master's Thesis. NTNU.
  12. [12] ITU-T G.107. 2009. The E-model: A computational model for use in transmission planning. ITU-T Recommendation (2009).
  13. [13] ITU-T G.107.1. 2019. Wideband E-model. ITU-T Recommendation (2019).
  14. [14] ITU-T G.108. 1999. Application of the E-model: A planning guide. ITU-T Recommendation (1999).
  15. [15] ITU-T G.113. 2007. Transmission impairments due to speech processing. ITU-T Recommendation (2007).
  16. [16] ITU-T G.113.2. 2019. New Appendix V: Provisional planning values for the fullband equipment impairment factor and the fullband packet loss robustness factor. ITU-T Recommendation (2019).
  17. [17] García Boni, Gallego Micael, Gortázar Francisco, and Bertolino Antonia. 2018. Understanding and estimating quality of experience in WebRTC applications. Computing 101 (2018), 1585–1607.
  18. [18] Han Yi and Muntean Gabriel-Miro. 2015. Hybrid real-time quality assessment model for voice over IP. In Proceedings of the 2015 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting. IEEE, Los Alamitos, CA, 1–6.
  19. [19] Handley Mark, Jacobson Van, and Perkins Colin. 2006. SDP: Session Description Protocol. IETF RFC 4566 (2006).
  20. [20] Haßlinger Gerhard and Hohlfeld Oliver. 2008. The Gilbert-Elliott model for packet loss in real time services on the Internet. In Proceedings of the 14th GI/ITG Conference on Measurement, Modelling, and Evaluation of Computer and Communication Systems. 1–15.
  21. [21] Hemminger Stephen. 2005. Network emulation with NetEm. In Proceedings of the Linux Conference. 18–23.
  22. [22] Janssen Jan, Vleeschauwer Danny De, Buchli Maarten, and Petit Guido H. 2002. Assessing voice quality in packet-based telephony. IEEE Internet Computing 6, 3 (2002), 48–56.
  23. [23] Jokisch Oliver, Maruschke Michael, Meszaros Martin, and Iaroshenko Viktor. 2016. Audio and speech quality survey of the Opus codec in web real-time communication. In Proceedings of the 27th Conference on Electronic Speech Signal Processing (ESSV'16). 254–262.
  24. [24] Jokisch Oliver, Maruschke Michael, Meszaros Martin, and Iaroshenko Viktor. 2016. Audio and speech quality survey of the Opus codec in web real-time communication. In Proceedings of the 27th Conference on Electronic Speech Signal Processing (ESSV'16). 254–262.
  25. [25] Kaskinen Tanu. 2019. PulseAudio Volume Control 4.0. Retrieved September 15, 2021 from https://freedesktop.org/software/pulseaudio/pavucontrol/.
  26. [26] Kim Doh-Suk and Tarraf Ahmed. 2007. ANIQUE+: A new American national standard for non-intrusive estimation of narrowband speech quality. Bell Labs Technical Journal 12, 1 (2007), 221–236.
  27. [27] Levent-Levi Tsahi. 2019. SDP Munging. WebRTC Glossary. Retrieved September 15, 2021 from https://webrtcglossary.com/sdp-munging/.
  28. [28] Maruschke Michael, Jokisch Oliver, Meszaros Martin, and Iaroshenko Viktor. 2015. Review of the Opus codec in a WebRTC scenario for audio and speech communication. In Proceedings of the International Conference on Speech and Computer. 348–355.
  29. [29] Maruschke Michael, Jokisch Oliver, Meszaros Martin, and Iaroshenko Viktor. 2015. Review of the Opus codec in a WebRTC scenario for audio and speech communication. In Proceedings of the International Conference on Speech and Computer. 348–355.
  30. [30] McGowan James William. 2005. Burst ratio: A measure of bursty loss on packet-based networks. US Patent 6,931,017.
  31. [31] Meszaros Martin, Trojahn Franziska, Maruschke Michael, and Jokisch Oliver. 2018. QuARTCS: A tool enabling end-to-any speech quality assessment of WebRTC-based calls. In Proceedings of the International Conference on Speech and Computer. 408–418.
  32. [32] Mittag Gabriel, Möller Sebastian, Barriac Vincent, and Ragot Stéphane. 2018. Quantifying quality degradation of the EVS super-wideband speech codec. In Proceedings of the 2018 10th International Conference on Quality of Multimedia Experience (QoMEX'18). IEEE, Los Alamitos, CA, 1–6.
  33. [33] Mkwawa Is-Haka, Jammeh Emmanuel, Sun Lingfen, and Ifeachor Emmanuel. 2010. Feedback-free early VoIP quality adaptation scheme in next generation networks. In Proceedings of the 2010 IEEE Global Telecommunications Conference (GLOBECOM'10). IEEE, Los Alamitos, CA, 1–5.
  34. [34] Möller Sebastian, Côté Nicolas, Gautier-Turbin Valérie, Kitawaki Nobuhiko, and Takahashi Akira. 2010. Instrumental estimation of E-model parameters for wideband speech codecs. EURASIP Journal on Audio, Speech, and Music Processing 2010, 1 (2010), Article 782731.
  35. [35] Möller Sebastian, Raake Alexander, Kitawaki Nobuhiko, Takahashi Akira, and Wältermann Marcel. 2006. Impairment factor framework for wide-band speech codecs. IEEE Transactions on Audio, Speech, and Language Processing 14, 6 (2006), 1969–1976.
  36. [36] Nandakumar Suhas and Jennings Cullen. 2018. Annotated example SDP for WebRTC. Draft-ietf-rtcweb-sdp-11, IETF (2018).
  37. [37] Ng See Leng, Hoh Simon, and Singh Devinder. 2005. Effectiveness of adaptive codec switching VoIP application over heterogeneous networks. In Proceedings of the 2005 2nd Asia Pacific Conference on Mobile Technology, Applications, and Systems.
  38. [38] ITU-T P.563. 2004. Single-ended method for objective speech quality assessment in narrow-band telephony applications. ITU-T Recommendation (2004).
  39. [39] ITU-T P.800. 1996. Methods for subjective determination of transmission quality. ITU-T Recommendation (1996).
  40. [40] ITU-T P.833. 2001. Methodology for derivation of equipment impairment factors from subjective listening-only tests. ITU-T Recommendation (2001).
  41. [41] ITU-T P.833.1. 2009. Methodology for the derivation of equipment impairment factors from subjective listening-only tests for wideband speech codecs. ITU-T Recommendation (2009).
  42. [42] ITU-T P.834. 2015. Methodology for the derivation of equipment impairment factors from instrumental models. ITU-T Recommendation (2015).
  43. [43] ITU-T P.834.1. 2015. Extension of the methodology for the derivation of equipment impairment factors from instrumental models for wideband speech codecs. ITU-T Recommendation (2015).
  44. [44] ITU-T P.862. 2001. Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs. ITU-T Recommendation (2001).
  45. [45] ITU-T P.862.3. 2007. Application guide for objective quality measurement based on Recommendations P.862, P.862.1 and P.862.2. ITU-T Recommendation (2007).
  46. [46] ITU-T P.863. 2011. Perceptual objective listening quality assessment (POLQA). ITU-T Recommendation (2011).
  47. [47] ITU-T P.863.1. 2019. Application guide for Recommendation ITU-T P.863. ITU-T Recommendation (2019).
  48. [48] Raake Alexander, Möller Sebastian, Wältermann Marcel, Côté Nicolas, and Ramirez J.-P. 2010. Parameter-based prediction of speech quality in listening context: Towards a WB E-model. In Proceedings of the 2010 2nd International Workshop on Quality of Multimedia Experience (QoMEX'10). IEEE, Los Alamitos, CA, 182–187.
  49. [49] Rabassa Abdel, St.-Hilaire Marc, Lung Chung-Horng, Lambadaris Ioannis, Goel Nishith, and Zaman Marzia. 2010. New speech traffic background simulation models for realistic VoIP network planning. In Proceedings of the 2010 International Symposium on Performance Evaluation of Computer and Telecommunication Systems (SPECTS'10). IEEE, Los Alamitos, CA, 364–371.
  50. [50] Rämö Anssi and Toukomaa Henri. 2011. Voice quality characterization of IETF Opus codec. In Proceedings of the 12th Annual Conference of the International Speech Communication Association.
  51. [51] Rosenberg Jonathan, Schulzrinne Henning, Camarillo Gonzalo, Johnston Alan B., Peterson Jon, Sparks Robert, Handley Mark, and Schooler Eve. 2002. SIP: Session Initiation Protocol. IETF RFC 3261 (2002).
  52. [52] Salsano Stefano, Ludovici Fabio, Ordine Alessandro, and Giannuzzi D. 2012. Definition of a general and intuitive loss model for packet networks and its implementation in the NetEm module in the Linux kernel. University of Rome "Tor Vergata", Version 3 (2012).
  53. [53] ITU-T SG12-C.22. 2017. A subjective ACR LOT testing fullband speech coding and prediction by P.863. Geneva, Switzerland (2017).
  54. [54] ITU-T SG12-C334. 2019. Instrumental estimation of E-model equipment impairment factor parameters for super-wideband Opus codec. Geneva, Switzerland (2019).
  55. [55] ITU-T SG12-C335. 2019. Instrumental estimation of E-model equipment impairment factor parameters for wideband Opus codec. Geneva, Switzerland (2019).
  56. [56] ITU-T SG12-C336. 2019. Instrumental estimation of E-model equipment impairment factor parameters for narrowband Opus codec. Geneva, Switzerland (2019).
  57. [57] Shaw Mary. 1995. Beyond objects: A software design paradigm based on process control. ACM Software Engineering Notes 20, 1 (1995), 27–39.
  58. [58] Spittka Julian, Vos Koen, and Valin Jean-Marc. 2015. RTP payload format for the Opus speech and audio codec. IETF RFC 7587 (2015).
  59. [59] Valin Jean-Marc, Terriberry Timothy B., Maxwell Gregory, and Montgomery Christopher. 2010. Constrained-energy lapped transform (CELT) codec. Draft-valin-celt-codec-02, IETF (2010).
  60. [60] Valin Jean-Marc, Vos Koen, and Terriberry Timothy B. 2012. Definition of the Opus audio codec. IETF RFC 6716 (2012).
  61. [61] Varela Martín, Marsh Ian, and Grönvall Björn. 2006. A systematic study of PESQ's behavior (from a networking perspective). In Proceedings of Measurement of Speech and Audio Quality in Networks (2006).
  62. [62] Vos Koen, Jensen Soeren Skak, and Soerensen Karsten Vandborg. 2010. SILK speech codec. Draft-vos-silk-02, IETF (2010).
  63. [63] Wältermann Marcel, Tucker Izabela, Raake Alexander, and Möller Sebastian. 2010. Extension of the E-model towards super-wideband speech transmission. In Proceedings of the 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing. IEEE, Los Alamitos, CA, 4654–4657.
  64. [64] Zhou Juejia, She Xiaoming, and Chen Lan. 2010. Source and channel coding adaptation for optimizing VoIP quality of experience in cellular systems. In Proceedings of the 2010 IEEE Wireless Communication and Networking Conference. IEEE, Los Alamitos, CA, 1–6.


    • Published in

      ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 4 (November 2021), 529 pages.
      ISSN: 1551-6857. EISSN: 1551-6865. DOI: 10.1145/3492437.

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 12 November 2021
      • Accepted: 1 March 2021
      • Revised: 1 April 2020
      • Received: 1 September 2019
      Published in TOMM Volume 17, Issue 4
