DOI: 10.1145/1815330.1815345 · DAS Conference Proceedings

An open approach towards the benchmarking of table structure recognition systems

Published online: 9 June 2010

ABSTRACT

Table spotting and structure analysis are only two of the many tasks involved in table analysis. A large number of approaches to these tasks have been described in the literature or are available in commercial OCR systems that claim to detect tables in scanned documents and process them accordingly.

However, table detection is far from solved. Different approaches have different strengths and weaknesses: some fail on certain situations or layouts where others perform better. How can one know which approach or system is best suited to a specific job? Answering this question requires an objective comparison of the approaches that address the same task of spotting tables and recognizing their structure.

This paper describes our approach toward establishing a complete and publicly available, and hence open, environment for benchmarking table spotting and structure analysis. We provide free access to the ground-truthing tool and the evaluation mechanism described in this paper, explain the ideas behind them, and also provide ground truth for the 547 documents of the UNLV and UW-3 datasets that contain tables.

In addition, we applied the quality measures to the results generated by the T-Recs system, which we developed some years ago and began to improve further a few months ago.

