Exploring many-to-one speech-to-text correlation for web-based language learning

Published: 01 August 2007
Abstract

This article investigates the correlations between multimedia objects (particularly speech and text) in language lectures in order to design an effective presentation mechanism for web-based learning. The cross-media correlations are classified into implicit relations (retrieved by computation) and explicit relations (recorded during the preprocessing stage). The implicit temporal correlation between speech and text is used primarily to drive supplementary lecture navigation aids such as tele-pointer movement, lip-sync movement, and content scrolling. We propose a speech-text alignment framework that uses an iterative algorithm based on local alignment to probe many-to-one temporal correlations rather than one-to-one correlations only. The proposed framework is a more practical method for analyzing general language lectures, and the algorithm's time complexity matches the best-possible computation cost, O(nm), without introducing additional computation. In addition, we show the feasibility of creating vivid presentations by exploiting implicit relations and artificially simulating some explicit media. To facilitate the navigation of integrated multimedia documents, we develop several visualization techniques for describing media correlations, including guidelines for speech-text correlations, visible-automatic scrolling, and levels of detail of the timeline, to provide intuitive and easy-to-use random access mechanisms. We evaluated the performance of the analysis method and human perception of the synchronized presentation. About 99.5% of the analyzed words have a temporal error within 0.5 sec, and the subjective evaluation shows that the synchronized presentation is highly acceptable to human viewers.
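The idea of probing many-to-one correlations with repeated local alignment can be illustrated with a minimal sketch. It is not the authors' algorithm: it uses the standard Smith-Waterman recurrence, and the scoring constants, the `None` masking scheme, the `min_score` threshold, and the function names `local_align` and `many_to_one` are all assumptions made for illustration. Each pass costs O(nm) for a text of n symbols and a speech transcription of m symbols; the sketch simply repeats the pass once per occurrence found, so it does not reproduce the article's claim of no additional computation.

```python
from typing import List, Optional, Sequence, Tuple

# Illustrative scores only; the article does not specify these values.
MATCH, MISMATCH, GAP = 2, -1, -1


def local_align(text: Sequence[str],
                speech: Sequence[Optional[str]]) -> Tuple[int, int, int]:
    """Smith-Waterman local alignment in O(n*m) time.

    Returns (best_score, start, end) such that speech[start:end] is the
    region of `speech` that best matches some part of `text`.
    Entries of `speech` equal to None are masked and act as barriers.
    """
    n, m = len(text), len(speech)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_cell = 0, (0, 0)

    def diag(i: int, j: int) -> int:
        sub = MATCH if text[i - 1] == speech[j - 1] else MISMATCH
        return score[i - 1][j - 1] + sub

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if speech[j - 1] is None:
                continue  # masked column stays at zero
            score[i][j] = max(0, diag(i, j),
                              score[i - 1][j] + GAP,
                              score[i][j - 1] + GAP)
            if score[i][j] > best:
                best, best_cell = score[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_cell
    end = j
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == diag(i, j):
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + GAP:
            i -= 1
        else:
            j -= 1
    return best, j, end


def many_to_one(text: Sequence[str], speech: Sequence[str],
                min_score: int) -> List[Tuple[int, int]]:
    """Find every speech region that aligns well with the same text.

    Repeats local alignment, masking each matched speech region so the
    next pass finds a different occurrence (hypothetical scheme).
    """
    masked: List[Optional[str]] = list(speech)
    spans: List[Tuple[int, int]] = []
    while True:
        best, start, end = local_align(text, masked)
        if best < min_score or start == end:
            break
        spans.append((start, end))
        for k in range(start, end):
            masked[k] = None
    return sorted(spans)
```

In a lecture setting, `text` might hold the phoneme sequence of one sentence and `speech` the phoneme sequence recognized from the audio, where the lecturer reads the sentence more than once; each returned span is then one spoken occurrence of the same text.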

References

  1. Abowd, G. D. 1999. Classroom 2000: An experiment with the instrumentation of a living educational environment. IBM Syst. J. 38, 4, 508--530.
  2. Bregler, C., Covell, M., and Slaney, M. 1997. Video rewrite: Driving video speech with audio. In Proceedings of the 24th Annual ACM-SIGGRAPH Conference on Computer Graphics and Interactive Techniques. 353--360.
  3. Chen, H. Y., Chen, G. Y., and Hong, J. S. 1999. Design of a web-based synchronized multimedia lecture system for distance education. In Proceedings of the IEEE International Conference on Multimedia Computing and Systems, vol. 2 (Jun. 7--11). 887.
  4. Cheng, W. H., Chu, W. T., and Wu, J. L. 2003. Semantic context detection based on hierarchical audio models. In Proceedings of the 5th ACM-SIGMM International Workshop on Multimedia Information Retrieval. 109--115.
  5. Chu, W. T. 2001. Exploring computed synchronization and its application for navigated hypermedia documents. Master's thesis.
  6. Chu, W. T. and Chen, H. Y. 2004. Toward better retrieval and presentation by exploring cross-media correlations. Multimedia Syst. 10, 3 (Mar.), 183--198.
  7. Chu, W. T. and Chen, H. Y. 2002. Cross-Media correlations: A case study of navigated hypermedia documents. In Proceedings of the 10th ACM International Conference on Multimedia (Juan-les-Pins, France). 57--66.
  8. Digital Signal Processing Committee. 1979. Programs for Digital Signal Processing. IEEE Press, Piscataway, NJ.
  9. Dohi, H. and Ishizuka, M. 1997. Visual software agent: A realistic face-to-face style interface connected with www/Netscape. In Proceedings of the IJCAI Workshop on Intelligent Multimodal Systems. 17--22.
  10. Gadd, T. N. 1988. ‘Fisching fore weds’: Phonetic retrieval of written text in information system. Program 22, 3 (Jul.), 222--237.
  11. Gusfield, D. 1997. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, New York.
  12. Hall, P. and Dowling, G. 1980. Approximate string matching. ACM Comput. Surv. 12, 4, 381--402.
  13. Huang, X., Alleva, F., Hon, H. W., Hwang, M. Y., and Rosenfeld, R. 1993. The SPHINX II speech recognition system: An overview. Comput. Speech Lang. 2, 7, 137--148.
  14. Lopresti, D. and Wilfong, G. 1999. Cross-Domain approximate string matching. In Proceedings of the 6th International Symposium on String Processing and Information Retrieval. IEEE Computer Society Press, Los Alamitos, CA. 120--127.
  15. Moreno, P. J., Joerg, C., Van Thong, J. M., and Glickman, O. 1998. A recursive algorithm for the forced alignment of very long audio segments. In International Conference on Spoken Language Processing (ICSLP) (Sydney, Australia).
  16. Muller, R. and Ottmann, T. 2000. The authoring on the fly system for automated recording and replay of (tele)presentations. ACM Multimedia Syst. J. 8, 3, 158--176.
  17. Nitta, N. and Babaguchi, N. 2002. Automatic story segmentation of closed-caption text for semantic content analysis of broadcasted sports video. In Proceedings of the 8th International Workshop on Multimedia Information Systems (MIS). 110--116.
  18. Okimi, K. and Fukinuki, H. 1981. Master-Slave synchronization techniques. IEEE Commun. Mag. 19, 12--21.
  19. Owen, C. B. 1998. Computed Synchronization for Multimedia Applications. Kluwer Academic, Norwell, MA.
  20. Philip, L. 1990. Hanging on the metaphone. Comput. Lang. Mag. 7, 12, 38--43.
  21. Pratt, W. K. 1978. Digital Image Processing. Wiley, New York.
  22. Richter, H. A., Brotherton, J. A., Abowd, G. D., and Khai, N. T. 1999. A multi-scale timeline slider for stream visualization and control. Tech. Rep. GIT-GVU-99-30, Georgia Institute of Technology.
  23. Steinmetz, R. 1996. Human perception of jitter and media synchronization. IEEE J. Selected Areas Commun. 14, 1, 61--72.
  24. VOA. 2005. Voice of America. http://www.voanews.com/.
  25. Wagner, R. A. and Fischer, M. J. 1974. The string-to-string correction problem. J. ACM 21, 1, 168--178.
  26. Waterman, M. S. and Eggert, M. 1987. A new algorithm for best subsequence alignments with application to tRNA-rRNA comparisons. J. Molecular Biol. 197, 723--728.
  27. Waters, K. and Levergood, T. M. 1993. DECface: An automatic lip-synchronization algorithm for synthetic faces. In Proceedings of the 2nd ACM International Conference on Multimedia. 149--156.
  28. Weide, R. 1998. The CMU pronunciation dictionary, release 0.6. Carnegie Mellon University, http://www.speech.cs.cmu.edu/cgi-bin/cmudict.
  29. WSML. 1997. NCNU multimedia English classroom. http://english.csie.ncnu.edu.tw.
  30. Wu, H. L. 2002. A synchronization framework for navigated hypermedia document presentation. Master's thesis, Taiwan University.

Reviews

Fjodor J. Ruzic

The wealth of information on the Web gives language teachers and learners unprecedented access to resources. The authors of this paper explore research on Web-based language learning, starting from the presumption that incorporating multimedia into Web documents is a direct and efficient means of conveying knowledge. This paper is a continuing effort that builds on results from Chen’s prior work [1]. In the introduction, the authors state that advances in multimedia and Web technologies make online language learning easier and more efficient. Since the integration of text and speech is an important factor for success in language learning, the authors look for correlations between speech and text in order to design more effective learning tools. They examine cross-media correlations as a prerequisite for exploring cross-domain local similarities between speech and text, and propose a heuristic-based local alignment algorithm as a novel method for exploring local and cross-media similarities. They then explain how the proposed heuristic serves as a tool for exploring many-to-one correlations once the prepared data has been transcribed into the phonetic domain. The results are explained in the section on cooperative cross-media synchronization, where the elementary semantic structures are incorporated. The overall value of this work is presented in the section on experimental results, which concisely reports the objective evaluation of the iterative algorithm and the evaluation of the integrated cross-media presentation model. Since this paper deals with the synchronization problem that inhibits Web-based language learning, the findings will help teachers and learners with new tool designs and will serve to provide a fine-grained interactive access environment with full control functions for online learners of foreign languages.
Online Computing Reviews Service
