Abstract
In this study, a novel confidence indexing algorithm is proposed to minimize the human labor needed to verify the reliability of synsets extracted automatically from a non-machine-readable monolingual dictionary. The Contemporary Turkish Dictionary of the Turkish Language Association is used as the monolingual dictionary data. First, synonym relations are extracted from the dictionary definitions by traditional text processing methods, and a graph is built in a Lemma-Sense network architecture. After each synonym relation is labeled with a confidence index, synonym pairs with the desired confidence indexes are analyzed to detect synsets with a spanning tree-based method. This approach labels each synset with one of three cumulative confidence levels (CL-1, CL-2, and CL-3). At each confidence level, the synsets are compared with KeNet, the only open-access Turkish wordnet. Most matches with the KeNet synsets are found at the CL-1 and CL-2 confidence levels, whereas the synsets detected at the CL-3 level reveal errors in the dictionary definitions. The approach therefore not only determines the reliability of automatically detected synsets but can also point out errors in the synsets detected from the dictionary.
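To make the idea of cumulative confidence levels concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each synonym relation between two senses already carries an integer level (1, 2, or 3), that "cumulative" means level k also admits every relation labeled below k, and that a synset is one tree of the spanning forest over the admitted relations. The function names, the `lemma#sense` labels, and the example relations are all hypothetical.

```python
from collections import defaultdict

# Hypothetical synonym relations: (sense_a, sense_b, confidence_level).
# The labels and levels are illustrative only, not data from the study.
EXAMPLE_EDGES = [
    ("ev#1", "konut#1", 1),
    ("konut#1", "mesken#1", 2),
    ("mesken#1", "yuva#2", 3),
    ("ak#1", "beyaz#1", 1),
]


def find(parent, x):
    """Return the root of x's tree, halving the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def synsets_at_level(edges, max_level):
    """Group senses into synsets using relations with level <= max_level.

    Assumption: levels are cumulative, so CL-2 also uses CL-1 relations.
    Each resulting group is one tree of the spanning forest built by
    union-find over the admitted relations.
    """
    parent = {}
    for a, b, level in edges:
        if level > max_level:
            continue  # relation is not trusted enough at this level
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_b] = root_a  # this relation joins two trees
    groups = defaultdict(set)
    for sense in parent:
        groups[find(parent, sense)].add(sense)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    for level in (1, 2, 3):
        print(f"CL-{level}:", synsets_at_level(EXAMPLE_EDGES, level))
```

Under these assumptions, raising the level can only merge or enlarge synsets, which is one way to read why the loosest level would be where questionable dictionary definitions surface.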
Index Terms
Confidence Indexing of Automated Detected Synsets: A Case Study on Contemporary Turkish Dictionary