Abstract
Spott is an innovative second-screen mobile multimedia application that offers viewers relevant information on objects (e.g., clothing, furniture, food) they see and like on their television screens. The application enables interaction between TV audiences and brands, so producers and advertisers can offer potential consumers tailored promotions, e-shop items, and/or free samples. In line with current views on innovation management, the technological excellence of the Spott application is coupled with iterative user involvement throughout the entire development process. This article discusses both of these aspects and how they affect each other. First, we focus on the technological building blocks that facilitate the (semi-)automatic interactive tagging of objects in the video streams. The majority of these building blocks make extensive use of novel, state-of-the-art deep learning concepts and methodologies. We show how these deep learning-based video analysis techniques facilitate video summarization, semantic keyframe clustering, and (similar) object retrieval. Second, we provide insights into the user tests that were performed to evaluate and optimize the application's user experience. The lessons learned from these open field tests have already been an essential input to the technology development and will continue to shape future modifications to the Spott application.
Index Terms
Spott: On-the-Spot e-Commerce for Television Using Deep Learning-Based Video Analysis Techniques