Abstract
End-to-end data-driven approaches have led to rapid development of language generation and dialogue systems. Despite the need for large amounts of well-organized data, these approaches jointly learn multiple components of the traditional generation pipeline without requiring costly human intervention. End-to-end approaches also enable the use of loosely aligned parallel datasets in system development by relaxing the degree of semantic correspondence between training data representations and text spans. However, their potential in Turkish language generation has not yet been fully exploited. In this work, we apply sequence-to-sequence (Seq2Seq) neural models to Turkish data-to-text generation, where the input data, given in the form of a meaning representation, is verbalized. We explore encoder-decoder architectures with an attention mechanism in unidirectional, bidirectional, and stacked recurrent neural network (RNN) models. Our models generate one-sentence biographies and dining venue descriptions using a crowdsourced dataset where all field-value pairs that appear in meaning representations are fully captured in reference sentences. To support this work, we also explore the performance of our models on a more challenging dataset, where the content of a meaning representation is too large to fit into a single sentence, and hence content selection and surface realization need to be learned jointly. This dataset is retrieved by coupling introductory sentences of person-related Turkish Wikipedia articles with their contained infobox tables. Our empirical experiments on both datasets demonstrate that Seq2Seq models are capable of generating coherent and fluent biographies and venue descriptions from field-value pairs. We argue that the wealth of knowledge residing in our datasets and the insights obtained from this study hold the potential to give rise to the development of new end-to-end generation approaches for Turkish and other morphologically rich languages.
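As a rough illustration of the input side of this setup, the sketch below flattens a meaning representation made of field-value pairs into a single token sequence that a Seq2Seq encoder could consume. The abstract does not specify the linearization scheme, so the `<field>` marker tokens, the whitespace tokenization, the `linearize` helper, and the sample record are all assumptions made for illustration only.

```python
from typing import Dict, List


def linearize(meaning_representation: Dict[str, str]) -> List[str]:
    """Flatten field-value pairs into one token sequence for a Seq2Seq encoder.

    The "<field>" marker format is a hypothetical choice, not the paper's.
    """
    tokens: List[str] = []
    for field, value in meaning_representation.items():
        tokens.append(f"<{field}>")   # marker token announcing the field
        tokens.extend(value.split())  # naive whitespace tokenization of the value
    return tokens


# Hypothetical biography record; field names and values are illustrative only.
mr = {"name": "Ayşe Yılmaz", "birth_year": "1975", "occupation": "yazar"}
source_tokens = linearize(mr)
print(source_tokens)
```

In a complete system, a sequence like this would be mapped to embeddings and passed to the encoder, with the decoder attending over the encoded positions while producing the target sentence.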
Index Terms
Turkish Data-to-Text Generation Using Sequence-to-Sequence Neural Networks