Abstract
Temporal annotation of plain text is considered a useful component of modern information retrieval tasks. In this work, different approaches for the identification and classification of temporal expressions in Hindi are developed and analyzed. First, a rule-based approach is developed that takes plain text as input and, based on a set of hand-crafted rules, produces tagged output with the identified temporal expressions. This approach achieves a strict F1-measure of 0.83. In a second approach, a CRF-based classifier is trained on human-tagged data and then evaluated on a test dataset. The trained classifier identifies time expressions in plain text and further classifies them into various classes. This approach achieves a strict F1-measure of 0.78. Next, the CRF is replaced by an SVM-based classifier, and the same experiment is repeated with the same features. This approach performs comparably to the CRF, with a strict F1-measure of 0.77. Using information from the rule base as an additional feature improves performance to 0.86 for the CRF and 0.84 for the SVM. With three comparable systems performing the extraction task, the next step is to merge them to exploit their individual strengths. In the first merge experiment, rule-based tagged data is fed to the CRF and SVM classifiers as additional training data. Evaluation shows an increase in the F1-measure of the CRF from 0.78 to 0.80. Second, a voting-based approach is implemented, which chooses the best class for each token from the outputs of the three approaches. This approach gives the best performance for this task, with a strict F1-measure of 0.88. In the process, a reusable gold-standard dataset for temporal tagging in Hindi is also developed. Named the ILTIMEX2012 corpus, it consists of 300 manually tagged Hindi news documents.
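The abstract does not say how the voting step resolves disagreements among the three systems. The Python sketch below shows one plausible reading: per-token majority voting over the rule-based, CRF, and SVM outputs. It rests on two assumptions that do not come from the source. First, the labels are BIO-style tags such as B-DATE (hypothetical names). Second, when all three systems disagree, the rule-based tag is kept. That fallback was picked only because the rule-based system had the highest standalone F1 of the three baseline systems.

```python
from collections import Counter
from typing import List, Sequence


def vote_tags(rule_tags: Sequence[str],
              crf_tags: Sequence[str],
              svm_tags: Sequence[str]) -> List[str]:
    """Merge three per-token tag sequences by majority vote.

    Assumption: if all three systems disagree on a token, the rule-based
    tag is kept (a hypothetical tie-break; the abstract does not specify one).
    """
    if not len(rule_tags) == len(crf_tags) == len(svm_tags):
        raise ValueError("tag sequences must be aligned token by token")
    merged: List[str] = []
    for rule, crf, svm in zip(rule_tags, crf_tags, svm_tags):
        # With three voters, at most one tag can receive two or more votes.
        tag, votes = Counter((rule, crf, svm)).most_common(1)[0]
        merged.append(tag if votes >= 2 else rule)
    return merged


if __name__ == "__main__":
    # Hypothetical BIO-style labels for a four-token sentence.
    rule = ["O", "B-DATE", "I-DATE", "O"]
    crf = ["O", "B-DATE", "O", "O"]
    svm = ["B-TIME", "B-DATE", "I-DATE", "O"]
    print(vote_tags(rule, crf, svm))  # ['O', 'B-DATE', 'I-DATE', 'O']
```

The vote is taken on each token independently, so the merged sequence can contain an I- tag that has no preceding B- tag. A practical system would probably need a small repair pass after voting, and this sketch does not include one.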
Supplemental Material
The proof is given in an electronic appendix, available online in the ACM Digital Library.
Approaches to Temporal Expression Recognition in Hindi