Abstract
In this new era dominated by consumer-produced media, there is high demand for web-scalable solutions to multimedia content analysis. A compelling approach to making applications scalable is to explicitly map their computation onto parallel platforms. However, developing efficient parallel implementations and fully utilizing the available resources remains a challenge due to increased code complexity, limited portability, and the low-level knowledge of the underlying hardware that is required. In this article, we present PyCASP, a Python-based framework that automatically maps computation from Python application code onto a variety of parallel platforms. PyCASP is designed using a systematic, pattern-oriented approach to offer a single software development environment for multimedia content analysis applications. Using PyCASP, applications can be prototyped in a couple hundred lines of Python code and automatically scale to modern parallel processors. Applications written with PyCASP are portable to a variety of parallel platforms and efficiently scale from a single desktop Graphics Processing Unit (GPU) to an entire cluster with a small change to application code. To illustrate our approach, we present three multimedia content analysis applications that use our framework: a state-of-the-art speaker diarization application, a content-based music recommendation system based on the Million Song Dataset, and a video event detection system for consumer-produced videos. We show that across this wide range of applications, our approach achieves the goal of automatic portability and scalability while at the same time allowing easy prototyping in a high-level language and delivering the efficient performance of low-level optimized code.
Index Terms
Scalable multimedia content analysis on parallel platforms using python