Abstract
We present a method for foreground/background separation of audio using a background modelling technique. The technique models the background in an online, unsupervised, and adaptive fashion, and is designed for long-term surveillance and monitoring problems. The background is determined using a statistical method that models the states of the audio over time. In addition, three methods are used to increase the accuracy of background modelling in complex audio environments, where the statistical model alone can fail to capture the background states accurately. First, an entropy-based approach unifies background representations that are fragmented over multiple states of the statistical model, resulting in a more robust background model. Second, the number of states considered background is adjusted adaptively according to background complexity, resulting in more accurate classification of background models. Finally, an auxiliary model cache retains potential background states in the system, preventing their deletion during the rapid influx of observed states that can occur in highly dynamic sections of the audio signal. The separation algorithm was successfully applied to a number of audio environments representing monitoring applications.
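The abstract does not give the model's details, so the following is only a minimal sketch of the general idea, under stated assumptions: each audio frame is reduced to a single scalar feature, the "statistical method" is stood in for by a simple online Gaussian mixture in the style of adaptive background mixture models, and the background is taken to be the fewest highest-weight states covering a fixed share of the total weight. The class name `OnlineBackgroundModel`, all parameter names, and all default values are hypothetical. The entropy-based unification and the adaptive choice of the number of background states described above are not reproduced here; only the online update and an eviction cache are illustrated.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(eq=False)  # compare states by identity, not by field values
class State:
    weight: float
    mean: float
    var: float


class OnlineBackgroundModel:
    """Hypothetical sketch: 1-D online mixture with an eviction cache."""

    def __init__(self, max_states=3, alpha=0.05, match_sigmas=2.5,
                 background_mass=0.7, init_var=1.0, min_var=0.01,
                 cache_size=4):
        self.max_states = max_states
        self.alpha = alpha                      # learning rate (assumed)
        self.match_sigmas = match_sigmas        # match radius in std devs
        self.background_mass = background_mass  # weight share called background
        self.init_var = init_var
        self.min_var = min_var                  # floor so variance cannot collapse
        self.states = []
        self.cache = deque(maxlen=cache_size)   # evicted states kept for reuse

    def _matches(self, state, x):
        return (x - state.mean) ** 2 <= self.match_sigmas ** 2 * state.var

    def background_states(self):
        """Fewest highest-weight states whose weight share reaches the threshold."""
        ranked = sorted(self.states, key=lambda s: s.weight, reverse=True)
        total = sum(s.weight for s in ranked)
        chosen, mass = [], 0.0
        for s in ranked:
            chosen.append(s)
            mass += s.weight / total
            if mass >= self.background_mass:
                break
        return chosen

    def update(self, x):
        """Absorb one scalar feature; return True if it is classed as background."""
        matched = next((s for s in self.states if self._matches(s, x)), None)
        if matched is None:
            # Look in the cache before creating a brand-new state.
            matched = next((s for s in self.cache if self._matches(s, x)), None)
            if matched is not None:
                self.cache.remove(matched)
            else:
                matched = State(self.alpha, x, self.init_var)
            if len(self.states) >= self.max_states:
                # Move the weakest active state to the cache instead of deleting it.
                weakest = min(self.states, key=lambda s: s.weight)
                self.states.remove(weakest)
                self.cache.append(weakest)
            self.states.append(matched)
        # Raise the matched state's weight and decay all the others.
        for s in self.states:
            s.weight = (1 - self.alpha) * s.weight + self.alpha * (s is matched)
        delta = x - matched.mean
        matched.mean += self.alpha * delta
        matched.var = max((1 - self.alpha) * matched.var + self.alpha * delta ** 2,
                          self.min_var)
        return any(s is matched for s in self.background_states())
```

In this sketch a persistent sound builds up weight and is classed as background, while a sudden, unfamiliar observation starts a new low-weight state and is classed as foreground. When a burst of new states would push an established state out, that state moves to the cache and is restored with its learned parameters if the sound returns.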
Index Terms
Online audio background determination for complex audio environments