research-article

Recurrent Neural Hidden Markov Model for High-order Transition

Published: 31 October 2021

Abstract

We propose a method that attends to high-order relations among latent states, improving on conventional HMMs, which condition only on the most recent latent state because of the Markov assumption. To capture these high-order relations, we apply an RNN to each sequence of latent states, since an RNN can represent an arbitrary-length sequence with its cell, a fixed-size vector. However, the simplest approach, which feeds every latent sequence to the RNN explicitly, is intractable due to the combinatorial explosion of the search space of latent states.

We therefore modify the RNN so that the history of latent states from the beginning of the sequence up to the current state is represented by a fixed number of RNN cells, equal to the number of possible states. We conduct experiments on unsupervised POS tagging and on synthetic datasets. The experimental results show that the proposed method outperforms previous methods, and the results on the synthetic datasets indicate that it can indeed capture high-order relations.
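To make the fixed-cell idea concrete, here is a minimal sketch (not the authors' code): it keeps exactly K RNN summaries, one per latent state, and advances them with a Viterbi-style recursion so each cell approximates the history of the best-scoring state sequence ending in that state. The GRU cell, the linear transition scorer, and the max-over-predecessors approximation are assumptions made for illustration; the paper's actual parameterization may differ.

    # Hypothetical sketch: one RNN summary per latent state, updated with a
    # Viterbi-style recursion so the number of cells stays fixed at K instead
    # of growing with all K^t latent histories.
    import torch
    import torch.nn as nn

    K, D, E = 5, 16, 8                  # latent states, RNN hidden size, state embedding size
    state_emb = nn.Embedding(K, E)      # embedding of each latent state
    rnn_cell = nn.GRUCell(E, D)         # shared RNN cell over latent-state histories
    trans = nn.Linear(D, K)             # transition scores conditioned on the RNN summary

    def step(scores, hidden, next_states):
        # scores: (K,) log-score of the best latent history ending in each state.
        # hidden: (K, D) RNN summary of that history, one vector per state.
        trans_scores = torch.log_softmax(trans(hidden), dim=-1)   # (K, K): prev j -> next k
        total = scores.unsqueeze(1) + trans_scores                # (K, K)
        # (In a full model, emission scores for the observed token would be added here.)
        # Keep only the best predecessor for each next state (the approximation).
        best_score, best_prev = total.max(dim=0)                  # (K,), (K,)
        # Advance the chosen predecessor's summary with the new state's embedding.
        new_hidden = rnn_cell(state_emb(next_states), hidden[best_prev])
        return best_score, new_hidden

    scores = torch.zeros(K)             # initial scores
    hidden = torch.zeros(K, D)          # initial RNN summaries
    next_states = torch.arange(K)       # candidate states 0..K-1 at every step
    for _ in range(3):                  # toy sequence of length 3
        scores, hidden = step(scores, hidden, next_states)

The point of the sketch is that memory stays at K hidden vectors per time step rather than growing with the K^t possible latent histories.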



Published in
ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 2 (March 2022), 413 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3494070


Publisher
Association for Computing Machinery, New York, NY, United States

      Publication History

      • Published: 31 October 2021
      • Accepted: 1 July 2021
      • Revised: 1 May 2021
      • Received: 1 October 2020
