Abstract
Quality assessment of audiovisual (AV) signals is important for the design, optimization, and management of modern multimedia communication systems. However, automatic prediction of AV quality with computational models remains challenging. In this context, machine learning (ML) appears to be an attractive alternative to traditional approaches, especially when the assessment must be made in a no-reference fashion (i.e., when the original signal is unavailable). While the development of ML-based quality predictors is desirable, we argue that proper assessment and validation of such predictors is also crucial before they can be deployed in practice. To this end, we raise some fundamental questions about the current approach to ML-based model development for AV quality assessment and for signal processing in multimedia communication in general. We also identify specific limitations of the current validation strategy that have implications for the analysis and comparison of ML-based quality predictors. These include a lack of consideration of: (a) data uncertainty, (b) domain knowledge, (c) explicit learning ability of the trained model, and (d) interpretability of the resultant model. The primary goal of this article is therefore to shed some light on these factors. Our analysis and proposed recommendations are of particular importance in light of the significant interest in ML methods for multimedia signal processing (specifically in cases where human-labeled data is used) and the lack of discussion of these issues in the existing literature.
Assessment of Machine Learning-Based Audiovisual Quality Predictors: Why Uncertainty Matters