Approaches to Temporal Expression Recognition in Hindi

Published: 30 January 2015

Abstract

Temporal annotation of plain text is considered a useful component of modern information retrieval tasks. This work develops and analyzes several approaches to identifying and classifying temporal expressions in Hindi. First, a rule-based approach takes plain text as input and, using a set of hand-crafted rules, produces tagged output with the temporal expressions identified; it achieves a strict F1-measure of 0.83. Second, a CRF-based classifier is trained on human-tagged data and evaluated on a test dataset. The trained classifier identifies time expressions in plain text and classifies them into various classes, achieving a strict F1-measure of 0.78. Third, the CRF is replaced by an SVM-based classifier and the same experiment is repeated with the same features; this approach is comparable to the CRF, with a strict F1-measure of 0.77. Adding the rule-base information as a feature raises performance to 0.86 for the CRF and 0.84 for the SVM. With three comparable systems performing the extraction task, the next step is to merge them to exploit their respective strengths. In the first merge experiment, rule-based tagged data is fed to the CRF and SVM classifiers as additional training data; the evaluation reports an increase in the CRF's F1-measure from 0.78 to 0.80. In the second, a voting-based approach chooses the best class for each token from the outputs of the three approaches. This yields the best performance on the task, a strict F1-measure of 0.88. In the process, a reusable gold-standard dataset for temporal tagging in Hindi is also developed. Named the ILTIMEX2012 corpus, it consists of 300 manually tagged Hindi news documents.
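
The abstract does not say how the voting step picks the "best class" for each token. As a rough illustration only, the sketch below assumes simple majority voting over the three systems' token-level labels, with ties broken in favour of the system listed first. The function name `vote_per_token`, the tag names, and the tie-breaking rule are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def vote_per_token(outputs: Sequence[Sequence[str]]) -> List[str]:
    """Pick one label per token by simple majority across systems.

    `outputs` holds one label sequence per system, all of equal length.
    Ties go to the system listed first (an assumption, not the paper's rule).
    """
    lengths = {len(seq) for seq in outputs}
    if len(lengths) != 1:
        raise ValueError("all systems must label the same number of tokens")

    merged = []
    for labels in zip(*outputs):
        counts = Counter(labels)
        top = max(counts.values())
        # The first label, in system order, that reaches the top count wins.
        merged.append(next(label for label in labels if counts[label] == top))
    return merged


# Hypothetical labels for a four-token sentence (tag names are illustrative).
rule_based = ["B-DATE", "I-DATE", "O", "O"]
crf = ["B-DATE", "O", "O", "B-TIME"]
svm = ["B-DATE", "I-DATE", "O", "O"]

merged = vote_per_token([rule_based, crf, svm])
print(merged)
```

Listing the rule-based output first means a three-way disagreement falls back to the rule-based label; a weighted vote, such as one weighted by each system's validation F1, would be an equally reasonable design.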

    • Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 14, Issue 1
      January 2015
      83 pages
      ISSN:2375-4699
      EISSN:2375-4702
      DOI:10.1145/2730923
      Copyright © 2015 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 30 January 2015
      • Accepted: 1 May 2014
      • Revised: 1 March 2014
      • Received: 1 December 2013


      Qualifiers

      • research-article
      • Research
      • Refereed
