research-article

Spott: On-the-Spot e-Commerce for Television Using Deep Learning-Based Video Analysis Techniques

Published: 28 June 2017

Abstract

Spott is a second-screen mobile multimedia application that offers viewers relevant information on objects (e.g., clothing, furniture, food) they see and like on their television screens. The application enables interaction between TV audiences and brands, so producers and advertisers can offer potential consumers tailored promotions, e-shop items, and/or free samples. In line with current views on innovation management, the technological development of the Spott application is coupled with iterative user involvement throughout the entire development process. This article discusses both aspects and how they affect each other. First, we focus on the technological building blocks that facilitate the (semi-)automatic interactive tagging of objects in the video streams. Most of these building blocks make extensive use of state-of-the-art deep learning concepts and methodologies. We show how these deep learning-based video analysis techniques facilitate video summarization, semantic keyframe clustering, and (similar) object retrieval. Second, we provide insights into the user tests that were performed to evaluate and optimize the application’s user experience. The lessons learned from these open field tests have already been an essential input to the technology development and will further shape future modifications to the Spott application.

