research-article

Scalable multimedia content analysis on parallel platforms using python

Published: 14 February 2014

Abstract

In this new era dominated by consumer-produced media, there is high demand for web-scalable solutions to multimedia content analysis. A compelling approach to making applications scalable is to explicitly map their computation onto parallel platforms. However, developing efficient parallel implementations that fully utilize the available resources remains a challenge because of increased code complexity, limited portability, and the low-level knowledge of the underlying hardware that is required. In this article, we present PyCASP, a Python-based framework that automatically maps computation from Python application code onto a variety of parallel platforms. PyCASP is designed using a systematic, pattern-oriented approach to offer a single software development environment for multimedia content analysis applications. Using PyCASP, applications can be prototyped in a couple of hundred lines of Python code and automatically scale to modern parallel processors. Applications written with PyCASP are portable to a variety of parallel platforms and scale efficiently from a single desktop Graphics Processing Unit (GPU) to an entire cluster with a small change to application code. To illustrate our approach, we present three multimedia content analysis applications that use our framework: a state-of-the-art speaker diarization application, a content-based music recommendation system based on the Million Song Dataset, and a video event detection system for consumer-produced videos. We show that across this wide range of applications, our approach achieves the goal of automatic portability and scalability while allowing easy prototyping in a high-level language and delivering the efficient performance of low-level optimized code.


Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 10, Issue 2
February 2014
142 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/2579228

          Copyright © 2014 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 14 February 2014
          • Accepted: 1 August 2013
          • Revised: 1 June 2013
          • Received: 1 January 2013
          Qualifiers

          • research-article
          • Research
          • Refereed
