Abstract
Phrase representation, an important step in many NLP tasks, involves representing phrases as continuous-valued vectors. This article presents detailed comparisons of the effects of word vectors, training data, and the composition and objective functions used in a composition model for phrase representation. Specifically, we first discuss how augmented word representations affect the performance of the composition model. Then, we investigate whether different types of training data influence the performance of the composition model and, if so, how they influence it. Finally, we evaluate combinations of different composition and objective functions and discuss the factors related to composition model performance. All evaluations were conducted in both English and Chinese. Our main findings are as follows: (1) The Additive model with semantically enhanced word vectors performs comparably to the state-of-the-art model; (2) The Additive model that updates augmented word vectors and the Matrix model with semantically enhanced word vectors systematically outperform the state-of-the-art model on bigram and multi-word phrase similarity tasks, respectively; (3) Representing high-frequency phrases by estimating their surrounding contexts is a good training objective for bigram phrase similarity tasks; and (4) The performance gain of the composition model with semantically enhanced word vectors is due to the composition function and the greater weight attached to important words. Previous work focuses on the composition function; however, our findings indicate that other components of the composition model (especially word representation) make a critical difference in phrase representation.
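To make the Additive model concrete: it composes a phrase vector as a (possibly weighted) sum of its word vectors, and phrase similarity is then measured by cosine similarity between composed vectors. The sketch below illustrates this with tiny hypothetical 4-dimensional embeddings (real models use pre-trained vectors of hundreds of dimensions); the vocabulary, dimensions, and weights are illustrative assumptions, not the article's actual data.

```python
import numpy as np

# Hypothetical toy word vectors (4-d for readability; real embeddings
# are pre-trained and, in this article's setting, semantically enhanced).
word_vecs = {
    "red":  np.array([0.9, 0.1, 0.0, 0.2]),
    "car":  np.array([0.1, 0.8, 0.3, 0.0]),
    "auto": np.array([0.2, 0.7, 0.4, 0.1]),
}

def additive_compose(words, weights=None):
    """Additive composition: the phrase vector is a (weighted) sum of
    its word vectors. Uniform weights give the plain Additive model;
    non-uniform weights attach more importance to key words, one of the
    factors the abstract identifies as driving the performance gain."""
    if weights is None:
        weights = [1.0] * len(words)
    return sum(w * word_vecs[t] for t, w in zip(words, weights))

def cosine(u, v):
    """Cosine similarity, the standard metric in phrase similarity tasks."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Two bigram phrases that share a head-like word sense ("car" vs. "auto")
# should compose to highly similar phrase vectors.
p1 = additive_compose(["red", "car"])
p2 = additive_compose(["red", "auto"])
sim = cosine(p1, p2)
```

The Matrix model discussed in the article instead transforms each word vector with a learned matrix before combining, trading the simplicity of addition for a parameterized composition function.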
Comparison Study on Critical Components in Composition Model for Phrase Representation