ABSTRACT
This paper presents a methodology for the evaluation of table understanding algorithms for PDF documents. The evaluation takes into account three major tasks: table detection, table structure recognition and functional analysis. We provide a general and flexible output model for each task along with corresponding evaluation metrics and methods. We also present a methodology for collecting and ground-truthing PDF documents based on consensus-reaching principles and provide a publicly available ground-truthed dataset.
Index Terms
A methodology for evaluating algorithms for table understanding in PDF documents




