ABSTRACT
Table spotting and structural analysis are only a small fraction of the tasks involved in table analysis. Today, a large number of approaches to these tasks have been described in the literature or are available as part of commercial OCR systems that claim to detect tables in scanned documents and to treat them accordingly.
However, the problem of detecting tables is far from solved. Different approaches have different strengths and weaknesses: some fail in situations or layouts where others perform better. How can one know which approach or system is best for a specific job? Answering this question requires an objective comparison of different approaches that address the same task of spotting tables and recognizing their structure.
This paper describes our approach towards establishing a complete and publicly available, and hence open, environment for benchmarking table spotting and structural analysis. We provide free access to the ground-truthing tool and evaluation mechanism described in this paper, explain the ideas behind them, and provide ground truth for the 547 documents of the UNLV and UW-3 datasets that contain tables.
In addition, we applied the quality measures to the results generated by the T-Recs system, which we developed some years ago and have been advancing further over the past few months.
An open approach towards the benchmarking of table structure recognition systems



