ABSTRACT
A number of methods for evaluating table structure recognition systems have been proposed in the literature and have been used successfully for the automatic and manual optimization of their respective algorithms. Unfortunately, the lack of standard, ground-truthed datasets, coupled with the ambiguity in how humans interpret tabular data, has made it difficult to compare the results obtained by systems developed by different research groups.
With reference to these approaches, we describe our experience in comparing our algorithm for table detection and structure recognition with another recently published system, using a freely available dataset of 75 PDF documents. Based on examples from this dataset, we define several classes of errors and propose how they can be treated consistently, so as to eliminate ambiguities and ensure that results are repeatable and comparable across systems from different research groups.
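The abstract does not commit to a particular metric. To illustrate the kind of measure whose edge cases must be agreed on before results become comparable, the following minimal Python sketch scores a detected table against its ground truth by comparing adjacency relations between neighbouring non-empty cells, reporting precision and recall. This is an assumption for illustration only, not necessarily the strategy proposed in the paper. The `Cell` type, the helper names, and the simplifications (one grid position per cell, no spanning cells, cell text used as cell identity) are all hypothetical.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    # Hypothetical cell model: one grid position per cell, no row/column spans.
    row: int
    col: int
    text: str


def adjacency_relations(cells):
    """Return (text_a, text_b, direction) triples linking each non-empty cell
    to its nearest non-empty neighbour to the right and below."""
    grid = {(c.row, c.col): c.text for c in cells if c.text.strip()}
    relations = set()
    for (r, c), text in grid.items():
        # Nearest non-empty cell to the right in the same row.
        right = [cc for (rr, cc) in grid if rr == r and cc > c]
        if right:
            relations.add((text, grid[(r, min(right))], "horizontal"))
        # Nearest non-empty cell below in the same column.
        below = [rr for (rr, cc) in grid if cc == c and rr > r]
        if below:
            relations.add((text, grid[(min(below), c)], "vertical"))
    return relations


def precision_recall(ground_truth, detected):
    """Compare the adjacency relations of a detected table with the ground truth."""
    gt = adjacency_relations(ground_truth)
    det = adjacency_relations(detected)
    correct = len(gt & det)
    precision = correct / len(det) if det else 0.0
    recall = correct / len(gt) if gt else 0.0
    return precision, recall


# Example: the detector places "120" one row too low, so the
# horizontal relation between "2009" and "120" is lost.
gt = [Cell(0, 0, "Year"), Cell(0, 1, "Sales"), Cell(1, 0, "2009"), Cell(1, 1, "120")]
det = [Cell(0, 0, "Year"), Cell(0, 1, "Sales"), Cell(1, 0, "2009"), Cell(2, 1, "120")]
print(precision_recall(gt, det))
```

Even in this toy case, a scoring decision is hidden in the code: the misplaced cell still yields a "correct" vertical relation to "Sales" because empty cells are skipped. Deciding how such cases, or merged and split cells, should count is exactly the kind of choice that has to be made consistently across systems for the resulting figures to be comparable.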
Towards a common evaluation strategy for table structure recognition algorithms