Abstract
Listening to audio is now a matter of flexibility and personal freedom. Digital audio content can be acquired from many sources, and wireless networking frees digital media devices and their peripherals from cables. However, despite recent improvements in capacity and quality of service, wireless networks remain inherently unreliable channels for streaming audio because they are susceptible to the effects of range, interference, and occlusion. This time-varying reliability introduces data corruption and loss, with unpleasant audible effects that can be profound and prolonged. Traditional error-mitigation techniques from communications perform poorly, and use bandwidth inefficiently, when a digital audio stream contains such large-scale defects. A novel solution that can complement existing techniques takes account of the semantics and natural repetition of music. Through the use of self-similarity metadata, missing or damaged audio segments can be seamlessly replaced with similar undamaged segments that have already been received. We propose a technology that generates self-similarity metadata for arbitrary audio material and uses this metadata within a wireless audio receiver to correct large-scale errors in real time. The primary objectives are to match the section of a song currently being received with previous sections, to identify incomplete sections, and to determine replacements from previously received portions of the song. This article outlines our approach to Forward Error Correction (FEC) technology for “repairing” a bursty dropout when listening to time-dependent media on a wireless network. Using self-similarity analysis of a music file, we can automatically repair the dropout with a similar portion of the music already received, thereby minimizing the listener's discomfort.
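The repair idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the article: the function names `build_metadata` and `repair` are hypothetical, each segment is represented by a small made-up feature vector, similarity is measured with cosine similarity, and the 0.9 threshold is arbitrary. The sender-side step records, for every segment, the index of its most similar earlier segment; the receiver-side step uses that record to fill a dropout with audio it already holds.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_metadata(features, threshold=0.9):
    """For each segment, return the index of the most similar EARLIER
    segment, or None if no earlier segment reaches the threshold.
    (Hypothetical helper; the threshold value is an assumption.)"""
    metadata = []
    for i, feat in enumerate(features):
        best_j, best_sim = None, threshold
        for j in range(i):
            sim = cosine(feat, features[j])
            if sim >= best_sim:
                best_j, best_sim = j, sim
        metadata.append(best_j)
    return metadata


def repair(received, metadata):
    """Replace lost segments (None) with the earlier segment named in the
    metadata, when that earlier segment is available at the receiver."""
    output = list(received)
    for i, segment in enumerate(output):
        if segment is None:
            j = metadata[i]
            if j is not None and output[j] is not None:
                output[i] = output[j]
    return output


# Toy song with the structure verse, chorus, verse, chorus, bridge.
features = [
    [1.0, 0.0, 0.0],   # verse
    [0.0, 1.0, 0.0],   # chorus
    [1.0, 0.05, 0.0],  # verse, slightly varied
    [0.0, 1.0, 0.05],  # chorus, slightly varied
    [0.0, 0.0, 1.0],   # bridge (no earlier match)
]
metadata = build_metadata(features)

# Segments 2 and 3 are lost in a bursty dropout.
received = ["verse-1", "chorus-1", None, None, "bridge"]
repaired = repair(received, metadata)
```

In this toy run the two lost segments are filled with the earlier verse and chorus. A segment with no sufficiently similar predecessor, such as the bridge, has no replacement and would remain a gap if lost, which is why an approach of this kind complements other error-mitigation techniques instead of replacing them.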
A self-similarity approach to repairing large dropouts of streamed music