DOI: 10.1145/2361354.2361365 (DocEng conference proceedings)
research-article

A methodology for evaluating algorithms for table understanding in PDF documents

Online: 04 September 2012

ABSTRACT

This paper presents a methodology for the evaluation of table understanding algorithms for PDF documents. The evaluation takes into account three major tasks: table detection, table structure recognition and functional analysis. We provide a general and flexible output model for each task along with corresponding evaluation metrics and methods. We also present a methodology for collecting and ground-truthing PDF documents based on consensus-reaching principles and provide a publicly available ground-truthed dataset.
