Abstract
WordNets for low-resource languages such as Assamese are often built using the expansion methodology, which can leave lexical entries and synonymy relations missing. The Assamese WordNet was built by expansion from the Hindi WordNet and consequently has missing synonymy relations. Because a WordNet can be viewed as a network of unique words connected by synonymy relations, link prediction from complex network analysis is an effective way of predicting the missing relations. In the current work, link prediction methods were used to predict missing synonyms in the Assamese WordNet, and they proved effective. It is also observed that, for discovering missing relations in the Assamese WordNet, simple local proximity-based methods might be more effective than global methods and complex supervised models that use network embeddings. Further, although some of the retrieved words are not synonyms per se, they are semantically related to the target word and may be categorized as semantic cohorts.
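As a rough illustration of what a local proximity-based link predictor looks like, the sketch below scores unlinked word pairs in a small synonymy network. It is not the method or data used in the work: the edge list is a hypothetical toy example with English placeholder words, the function names are made up for illustration, and the three scores (common neighbors, Jaccard coefficient, Adamic-Adar index) are standard neighborhood measures chosen here as an assumption about what "local proximity-based" covers.

```python
from itertools import combinations
from math import log

# Hypothetical toy synonymy network (English placeholders, NOT Assamese WordNet data).
EDGES = [
    ("big", "large"), ("big", "huge"), ("large", "huge"),
    ("large", "vast"), ("huge", "vast"), ("small", "tiny"),
]


def build_graph(edges):
    """Return an undirected adjacency map: word -> set of linked synonyms."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def common_neighbors(graph, u, v):
    """Number of words linked to both u and v."""
    return len(graph[u] & graph[v])


def jaccard(graph, u, v):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = graph[u] | graph[v]
    return len(graph[u] & graph[v]) / len(union) if union else 0.0


def adamic_adar(graph, u, v):
    """Shared neighbors, each weighted by 1 / log(degree)."""
    return sum(
        1.0 / log(len(graph[w]))
        for w in graph[u] & graph[v]
        if len(graph[w]) > 1  # avoids division by log(1) = 0
    )


def rank_candidates(graph, score):
    """Score every unlinked pair and return the nonzero ones, best first."""
    pairs = [
        (u, v)
        for u, v in combinations(sorted(graph), 2)
        if v not in graph[u]
    ]
    scored = [(score(graph, u, v), u, v) for u, v in pairs]
    return sorted((s for s in scored if s[0] > 0), key=lambda t: -t[0])
```

In this toy network, "big" and "vast" are not linked but share the neighbors "large" and "huge", so `rank_candidates(build_graph(EDGES), common_neighbors)` proposes that pair as the only candidate link. A high-scoring pair is only a candidate: as the abstract notes, retrieved words may turn out to be semantically related rather than true synonyms.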
Synonymy Expansion Using Link Prediction Methods: A Case Study of Assamese WordNet