Abstract
Extracting character names and nominals is indispensable for identifying the speakers of quoted speech and for extracting social networks from literature. However, detecting proper nouns in novels written in or translated into Korean is harder than in English because Korean has no capitalization. In addition, it is almost impossible for any proper noun dictionary to include every character name that authors have created or will create. Fortunately, a previous study shows that postpositions for animate nouns are a simple and effective tool for character identification in Korean novels, requiring neither a proper noun dictionary nor a training corpus. In this article, we propose a character identification method that utilizes the semantic relation with known animate nouns. On 80 novels in Korean, the proposed method increases the micro- and macro-average recall by 13.68% and 11.86%, respectively, compared to the previous study, while decreasing the micro-average precision by 0.28% and increasing the macro-average precision by 0.07%. For characters that account for more than 1% of the character name mentions in each novel, the micro- and macro-average F-measures of the proposed method are 96.98% and 97.32%, respectively.
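The abstract reports both micro- and macro-averaged precision, recall, and F-measure over a set of novels. The following minimal Python sketch illustrates how the two kinds of average differ. It is not the authors' evaluation code: the `NovelCounts` type, the function names, and the example counts are hypothetical, and the macro-average F-measure is assumed here to be the mean of per-novel F-measures, which the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass
class NovelCounts:
    """Hypothetical per-novel counts of character identification outcomes."""
    tp: int  # true positives: correctly identified characters/mentions
    fp: int  # false positives: non-characters reported as characters
    fn: int  # false negatives: characters that were missed


def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if tp + fn else 0.0


def f_measure(p: float, r: float) -> float:
    # Harmonic mean of precision and recall (F1).
    return 2 * p * r / (p + r) if p + r else 0.0


def micro_average(novels):
    """Pool the counts of all novels first, then compute the metrics once."""
    tp = sum(n.tp for n in novels)
    fp = sum(n.fp for n in novels)
    fn = sum(n.fn for n in novels)
    p, r = precision(tp, fp), recall(tp, fn)
    return p, r, f_measure(p, r)


def macro_average(novels):
    """Compute the metrics per novel, then take the unweighted mean.

    Assumption: macro F is the mean of per-novel F-measures.
    """
    ps = [precision(n.tp, n.fp) for n in novels]
    rs = [recall(n.tp, n.fn) for n in novels]
    fs = [f_measure(p, r) for p, r in zip(ps, rs)]
    k = len(novels)
    return sum(ps) / k, sum(rs) / k, sum(fs) / k


# Made-up counts for two novels of very different size.
novels = [NovelCounts(tp=90, fp=10, fn=10), NovelCounts(tp=5, fp=5, fn=0)]
micro_p, micro_r, micro_f = micro_average(novels)
macro_p, macro_r, macro_f = macro_average(novels)
print(f"micro: P={micro_p:.4f} R={micro_r:.4f} F={micro_f:.4f}")
print(f"macro: P={macro_p:.4f} R={macro_r:.4f} F={macro_f:.4f}")
```

Micro-averaging lets large novels dominate the result, whereas macro-averaging weights every novel equally. This is why a method can lower the micro-average precision slightly while raising the macro-average precision, as reported above.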
Index Terms
Novel Character Identification Utilizing Semantic Relation with Animate Nouns in Korean