
A Tucker Deep Computation Model for Mobile Multimedia Feature Learning

Published: 10 August 2017

Abstract

Recently, the deep computation model, a tensor-based deep learning model, has achieved superior performance in multimedia feature learning. However, the conventional deep computation model involves a large number of parameters. Training a deep computation model with millions of parameters typically requires high-performance servers with large-scale memory and powerful computing units, which limits the growth of the model size for multimedia feature learning on common devices such as portable CPUs and conventional desktops. To tackle this problem, this article proposes a Tucker deep computation model that uses the Tucker decomposition to compress the weight tensors in the fully connected layers for multimedia feature learning. Furthermore, a learning algorithm based on the back-propagation strategy is devised to train the parameters of the Tucker deep computation model. Finally, the Tucker deep computation model is evaluated against the conventional deep computation model on two representative multimedia datasets, CUAVE and SNAE2, in terms of accuracy drop, parameter reduction, and speedup. The results indicate that the Tucker deep computation model can achieve a large parameter reduction and speedup with a small accuracy drop for multimedia feature learning.
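
The abstract does not spell out how the weight tensors are factorized, so the following is only a minimal sketch of the general idea, not the article's algorithm. It assumes a third-order weight tensor and uses a truncated higher-order SVD (HOSVD) as a stand-in way to obtain a Tucker core and factor matrices. The tensor shape, the ranks, and all function names are hypothetical choices made for illustration, and no back-propagation training is shown.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_dot(tensor, matrix, mode):
    """Mode-n product: multiply `matrix` (J x I_mode) along axis `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)


def tucker_hosvd(weight, ranks):
    """Truncated HOSVD (an assumed stand-in): returns (core, factors)."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of each unfolding form the factor.
        u, _, _ = np.linalg.svd(unfold(weight, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = weight
    for mode, factor in enumerate(factors):
        # Project the tensor onto each factor's column space.
        core = mode_dot(core, factor.T, mode)
    return core, factors


def tucker_reconstruct(core, factors):
    """Rebuild the full tensor from the core and the factor matrices."""
    out = core
    for mode, factor in enumerate(factors):
        out = mode_dot(out, factor, mode)
    return out


def parameter_count(core, factors):
    """Number of stored values in the Tucker format."""
    return core.size + sum(f.size for f in factors)


# Hypothetical example: a 20 x 30 x 10 weight tensor built to have
# multilinear rank (4, 5, 3), so the truncated HOSVD recovers it exactly
# up to floating-point error.
rng = np.random.default_rng(0)
true_core = rng.standard_normal((4, 5, 3))
true_factors = [rng.standard_normal((20, 4)),
                rng.standard_normal((30, 5)),
                rng.standard_normal((10, 3))]
weight = tucker_reconstruct(true_core, true_factors)

core, factors = tucker_hosvd(weight, ranks=(4, 5, 3))
approx = tucker_reconstruct(core, factors)

print("full parameters:  ", weight.size)
print("Tucker parameters:", parameter_count(core, factors))
print("max abs error:    ", np.abs(approx - weight).max())
```

For these assumed shapes, the full tensor stores 20 × 30 × 10 = 6000 values, while the Tucker format stores 4 × 5 × 3 + 20 × 4 + 30 × 5 + 10 × 3 = 320. On real trained weights the multilinear rank is generally not exactly low, so truncation introduces an approximation error, which is the kind of trade-off between parameter reduction and accuracy drop that the abstract refers to.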



Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 13, Issue 3s
Special Section on Deep Learning for Mobile Multimedia and Special Section on Best Papers from ACM MMSys/NOSSDAV 2016
August 2017
258 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3119899

Copyright © 2017 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 10 August 2017
• Revised: 1 February 2017
• Accepted: 1 February 2017
• Received: 1 October 2016


Qualifiers

• research-article
• Research
• Refereed
