Recurrent Neural Hidden Markov Model for High-order Transition

Abstract
We propose a method that attends to high-order relations among latent states, improving on conventional HMMs, which, owing to the Markov assumption, condition only on the most recent latent state. To capture these high-order relations, we apply an RNN to each sequence of latent states, since an RNN can encode a sequence of arbitrary length in its cell, a fixed-size vector. However, the simplest approach, which feeds every latent sequence explicitly to the RNN, is intractable because the search space of latent sequences grows combinatorially.
We therefore modify the RNN to represent the history of latent states, from the beginning of the sequence to the current position, with a fixed number of RNN cells: one per possible state. We conduct experiments on unsupervised POS tagging and on synthetic datasets. Experimental results show that the proposed method achieves better performance than previous methods. In addition, the results on the synthetic datasets indicate that the proposed method can capture high-order relations.
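The key idea above, as we read it, can be sketched as a forward-style recursion that keeps one RNN cell per latent state instead of one cell per latent history. The sketch below is a minimal illustration under our own assumptions: the parameter names (`W`, `U`, `V`), the softmax transition scoring, and the alpha-weighted merge rule are hypothetical stand-ins, not the paper's actual parameterization, and emission probabilities are omitted for brevity.

```python
import numpy as np

def forward_with_state_cells(obs, K, D, rng):
    """Sketch: forward pass that keeps one RNN cell per latent state.

    Instead of one cell per latent *sequence* (exponentially many),
    we keep exactly K cells; cell k summarizes all histories that end
    in state k. All parameters and the merge rule are hypothetical.
    Emission terms are omitted to keep the sketch short.
    """
    W = rng.standard_normal((D, D)) * 0.1   # recurrent weights (hypothetical)
    U = rng.standard_normal(D) * 0.1        # input weights (hypothetical)
    V = rng.standard_normal((K, D)) * 0.1   # maps a cell to transition scores

    cells = np.zeros((K, D))                # one fixed-size cell per state
    alpha = np.full(K, 1.0 / K)             # forward state probabilities

    for x in obs:
        # Transition distribution conditioned on each previous state's
        # history cell: row i gives p(next state | history ending in i).
        logits = cells @ V.T                            # (K_prev, K_next)
        trans = np.exp(logits - logits.max(axis=1, keepdims=True))
        trans /= trans.sum(axis=1, keepdims=True)

        # Standard forward recursion over states.
        new_alpha = alpha @ trans
        new_alpha /= new_alpha.sum()

        # Merge rule: cell j becomes a weighted mix of the updated cells,
        # so the cell count stays K regardless of sequence length.
        updated = np.tanh(cells @ W + U * x)            # (K_prev, D)
        weights = alpha[:, None] * trans                # (K_prev, K_next)
        weights /= weights.sum(axis=0, keepdims=True)
        cells = weights.T @ updated                     # (K_next, D)
        alpha = new_alpha

    return alpha, cells

rng = np.random.default_rng(0)
alpha, cells = forward_with_state_cells([0.5, -1.0, 2.0], K=3, D=4, rng=rng)
print(alpha.shape, cells.shape)  # (3,) (3, 4)
```

The point of the merge step is that memory stays O(K * D) for any sequence length, whereas enumerating latent sequences would require O(K^T) cells after T steps.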