Concepts in Table detection in heterogeneous documents
Table (information)
A table is a means of arranging data in rows and columns.
Location Production (tons) % of goal North 40 87 102% South 40 93 110%
The use of tables is pervasive throughout all communication, research and data analysis. Tables appear in print media, handwritten notes, computer software, architectural ornamentation, traffic signs and many other places. The precise conventions and terminology for describing tables vary depending on the context.
Page layout
Page layout is the part of graphic design that deals with the arrangement and styling of elements on a page.
Optical character recognition
Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic conversion of scanned images of handwritten, typewritten or printed text into machine-encoded text. It is widely used as a form of data entry from original paper records such as documents, sales receipts, mail, and other printed material.
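On scanned pages, table detection often starts from OCR output. The following is a minimal sketch, not any particular engine's API. It assumes the OCR step reports each word with a pixel bounding box (the `Word` class is a hypothetical input format). Words are grouped into rows by vertical position, adjacent words are merged into cells, and cells are assigned to columns by clustering their left edges. The thresholds `tol`, `word_gap` and `col_gap` are illustrative assumptions that would need tuning for real documents.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR'd word with its bounding box in pixels (hypothetical input format)."""
    text: str
    left: int
    top: int
    width: int
    height: int

    @property
    def centre_y(self) -> float:
        return self.top + self.height / 2


def group_into_rows(words, tol=0.5):
    """Put a word in the current row if its vertical centre is within
    tol * height of that row's first word; otherwise start a new row."""
    rows = []
    for w in sorted(words, key=lambda w: w.centre_y):
        if rows and abs(w.centre_y - rows[-1][0].centre_y) <= tol * rows[-1][0].height:
            rows[-1].append(w)
        else:
            rows.append([w])
    # Read each row left to right.
    return [sorted(row, key=lambda w: w.left) for row in rows]


def merge_cells(row, word_gap):
    """Merge horizontally adjacent words into cells, returning (left, text) pairs."""
    cells = []
    right = None
    for w in row:
        if cells and w.left - right <= word_gap:
            left, text = cells[-1]
            cells[-1] = (left, text + " " + w.text)
        else:
            cells.append((w.left, w.text))
        right = w.left + w.width
    return cells


def to_grid(words, word_gap=15, col_gap=20):
    """Arrange OCR words into a list of rows, each a list of column cells."""
    rows = [merge_cells(r, word_gap) for r in group_into_rows(words)]
    lefts = sorted({left for row in rows for left, _ in row})
    if not lefts:
        return []
    # Cluster cell left edges: a jump larger than col_gap starts a new column.
    starts = [lefts[0]]
    for prev, x in zip(lefts, lefts[1:]):
        if x - prev > col_gap:
            starts.append(x)
    grid = []
    for row in rows:
        cells = [""] * len(starts)
        for left, text in row:
            # The rightmost column start at or before the cell's left edge.
            col = max(i for i, s in enumerate(starts) if s <= left)
            cells[col] = (cells[col] + " " + text).strip()
        grid.append(cells)
    return grid


if __name__ == "__main__":
    # Hypothetical word boxes for a small two-column table.
    sample = [
        Word("Item", 10, 10, 40, 12), Word("Unit", 120, 10, 35, 12),
        Word("price", 160, 11, 40, 12),
        Word("Apples", 10, 30, 50, 12), Word("0.40", 122, 31, 35, 12),
        Word("Pears", 12, 50, 45, 12), Word("1.10", 121, 50, 35, 12),
    ]
    for row in to_grid(sample):
        print(row)
```

On the sample input this prints `['Item', 'Unit price']`, `['Apples', '0.40']` and `['Pears', '1.10']`. The heuristic assumes a single, axis-aligned table with left-aligned columns. Right-aligned numeric columns, cells spanning several columns, or skewed scans would need a more robust layout analysis.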
Tesseract (software)
Tesseract is a free software optical character recognition engine for various operating systems. Originally developed as proprietary software at Hewlett-Packard between 1985 and 1995, it saw very little development in the following decade. It was then released as open source in 2005 by Hewlett-Packard and UNLV. Tesseract development has been sponsored by Google since 2006. It is released under the Apache License, Version 2.0.