Abstract
Word embeddings, which map words into a unified vector space, capture rich semantic information. From a linguistic point of view, words have two carriers: speech and writing. Yet most recent word embedding models focus only on the writing carrier and ignore the role of the speech carrier in semantic expression. In the development of language, however, speech appears before writing and plays an important role in shaping it; for phonetic language systems, the written forms are secondary symbols of the spoken ones. Based on this idea, we proposed double-carrier word embedding (DCWE). We used DCWE to simulate the order in which speech and writing are generated: written embeddings were trained on the basis of phonetic embeddings, and the final word embedding fuses the written and phonetic embeddings. To illustrate that our model can be applied to most languages, we selected Chinese, English, and Spanish as examples and evaluated the resulting models through word similarity and text classification experiments.
Enhanced Double-Carrier Word Embedding via Phonetics and Writing