
Comparison Study on Critical Components in Composition Model for Phrase Representation

Published: 20 January 2017

Abstract

Phrase representation, an important step in many NLP tasks, involves representing phrases as continuous-valued vectors. This article presents detailed comparisons of the effects of word vectors, training data, and the composition and objective functions used in a composition model for phrase representation. Specifically, we first discuss how augmented word representations affect the performance of the composition model. Then, we investigate whether different types of training data influence the performance of the composition model and, if so, how they influence it. Finally, we evaluate combinations of different composition and objective functions and discuss the factors related to composition model performance. All evaluations were conducted in both English and Chinese. Our main findings are as follows: (1) the Additive model with semantically enhanced word vectors performs comparably to the state-of-the-art model; (2) the Additive model that updates augmented word vectors and the Matrix model with semantically enhanced word vectors systematically outperform the state-of-the-art model on bigram and multi-word phrase similarity tasks, respectively; (3) representing high-frequency phrases by estimating their surrounding contexts is a good training objective for bigram phrase similarity tasks; and (4) the performance gain of the composition model with semantically enhanced word vectors is due to the composition function and the greater weight attached to important words. Previous work has focused on the composition function; our findings, however, indicate that other components of the composition model (especially word representation) make a critical difference in phrase representation.
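The Additive model mentioned in finding (1) composes a phrase vector as a (possibly weighted) sum of its word vectors, with phrase similarity then measured by cosine distance. The sketch below illustrates this idea in Python with NumPy; the toy three-dimensional vectors and the uniform weighting are illustrative assumptions, not the paper's actual pretrained embeddings or learned weights.

```python
import numpy as np

# Toy word vectors; in practice these would be pretrained embeddings
# (e.g. word2vec), possibly semantically enhanced as in the article.
vectors = {
    "heavy":    np.array([0.9, 0.1, 0.3]),
    "rain":     np.array([0.2, 0.8, 0.5]),
    "strong":   np.array([0.8, 0.2, 0.4]),
    "downpour": np.array([0.3, 0.9, 0.4]),
}

def additive_compose(words, weights=None):
    """Additive model: the phrase vector is a weighted sum of its
    word vectors; uniform weights recover plain vector addition."""
    if weights is None:
        weights = [1.0] * len(words)
    return sum(w * vectors[t] for t, w in zip(words, weights))

def cosine(u, v):
    """Cosine similarity between two phrase vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

p1 = additive_compose(["heavy", "rain"])
p2 = additive_compose(["strong", "downpour"])
print(cosine(p1, p2))  # near-synonymous phrases score close to 1
```

Finding (4) suggests that attaching greater weight to semantically important words (e.g. the head noun) in this sum is one source of the performance gain, which the `weights` argument above makes explicit.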

