We welcome you to the Second International Workshop on Historical Document Imaging and Processing (HIP'13), held in conjunction with ICDAR 2013. HIP'11 was a gratifying and overwhelming success as attendees from six continents assembled in Beijing to share their work. The proceedings of HIP'11 have become part of the ACM Digital Library. The proceedings of HIP'13 will also be included in the ACM Digital Library.
As the workshop organizers, we were very pleased with both the quality and quantity of submissions. The 31 papers submitted represented sixteen different countries, some with shared authorship: Brazil (2), Canada (1), China (1), France (4), Germany (2), Greece (2), India (1), Ireland (1), Israel (2), Italy (1), Japan (2), Spain (4), Sweden (1), Tunisia (1), UK (2), and US (6). This is certainly a strong indication of the growing worldwide interest in this topic. Each paper was peer-reviewed by at least two knowledgeable researchers in the field. We were pleased to be able to accept 18 papers from eleven different countries (for a 58% acceptance rate). We regret that we were not able to accept some good papers due to time constraints and the workshop focus.
The majority of early photographs were captured on acetate-based film. However, it has been determined that these negatives will deteriorate beyond repair even with proper conservation and no suitable restoration method is available without physically ...
Accurate registration of document recto and verso sides with bleed-through degradation is essential for accurate automatic non-blind bleed-through removal. This paper presents a registration method for documents with bleed-through degradation, and also ...
Digitization of historical documents is extremely useful as it allows easy access to the documents from remote locations and removes the need for potentially harmful physical handling. Traditional imaging methods are unsuitable for documents with ...
Historical documents are invaluable for studying the society and culture of past ages everywhere in the world. In Japan, wooden tablets called Mokkan, excavated from ancient palace sites and other locations dating to the Nara period, provide important clues to know ...
This paper presents an efficient parametrization method for generating synthetic noise on document images. By specifying the desired categories and amount of noise, the method is able to generate synthetic document images with most of the degradations ...
Natural languages can often be modelled by suitable grammars whose knowledge can improve the word spotting results. The implicit contextual information is even more useful when dealing with information that is intrinsically described as one collection ...
A method of automatically extracting facts from lists in OCRed documents and inserting them into an ontology would contribute to making a variety of historical knowledge machine searchable, queryable, and linkable. To work well, such a process must be ...
In this paper, we present the Multi Angular Descriptor (MAD), a new shape descriptor for shape based object recognition and image retrieval. In the binary case, the MAD descriptor captures the angular view to multi resolution rings from each contour ...
Some of the sliding window features commonly used in off-line handwritten text recognition are inherently noisy or sensitive to image noise. In this paper, we investigate the effects of several de-noising filters applied in the feature space and not in ...
Language models are used in automatic transcription systems to resolve ambiguities. This is done by limiting the vocabulary of words that can be recognized as well as estimating the n-gram probability of the words in the given text. In the context of ...
Historical documents pose challenging problems for training handwriting recognition systems. Besides the high variability of character shapes inherent to all handwriting, the image quality can also differ greatly, for instance due to faded ink, ink ...
Historical text presents numerous challenges for a range of contemporary techniques, e.g. information retrieval, OCR and POS tagging. In particular, the absence of consistent orthographic conventions in historical text presents difficulties for any ...
Our previous work has shown that the error correction of optical character recognition (OCR) on degraded historical machine-printed documents is improved with the use of multiple information sources and multiple OCR hypotheses including from multiple ...
Renaissance portraits were depictions of some important royals of those times. Analysis of faces in these portraits can provide valuable dynastical information in addition to enriching personal details of the depicted sitter. Such studies can offer ...
Texture feature analysis has undergone tremendous growth in recent years. It plays an important role in the analysis of many kinds of images. More recently, the use of texture analysis techniques for historical document image segmentation has become a ...
We present a method to segment historical document images into regions of different content. First, we segment text elements from non-text elements using a binarized version of the document. Then, we refine the segmentation of the non-text regions into ...
This paper proposes a character segmentation and retrieval method for a learning support system that analyzes digitized Japanese historical woodblock printed books. The proposed system detects text lines, segments characters, and retrieves similar ...
Representative and comprehensive datasets are a prerequisite for any research activity, from studying specific types of problems through training of algorithms to evaluating results of actual implementations. This paper describes an invaluable resource ...