research-article

Assessment of Machine Learning-Based Audiovisual Quality Predictors: Why Uncertainty Matters

Published: 21 April 2021

Abstract

Quality assessment of audiovisual (AV) signals is important for the design, optimization, and management of modern multimedia communication systems. However, automatic prediction of AV quality with computational models remains challenging. In this context, machine learning (ML) appears to be an attractive alternative to traditional approaches, especially when the assessment must be made in a no-reference fashion (i.e., when the original signal is unavailable). While the development of ML-based quality predictors is desirable, we argue that proper assessment and validation of such predictors is also crucial before they can be deployed in practice. To this end, we raise some fundamental questions about the current approach to ML-based model development for AV quality assessment and for signal processing in multimedia communication in general. We also identify specific limitations of the current validation strategy that have implications for the analysis and comparison of ML-based quality predictors. These include a lack of consideration of (a) data uncertainty, (b) domain knowledge, (c) the explicit learning ability of the trained model, and (d) the interpretability of the resultant model. The primary goal of this article is therefore to shed some light on these factors. Our analysis and recommendations are of particular importance in light of the significant interest in ML methods for multimedia signal processing (specifically in cases where human-labeled data is used) and the lack of discussion of these issues in the existing literature.

Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 2
May 2021
410 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3461621

Copyright © 2021 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 21 April 2021
• Accepted: 1 October 2020
• Revised: 1 August 2020
• Received: 1 January 2020

Qualifiers

• Research
• Refereed