Automated Processing of Digitized Historical Newspapers: Identification of Segments and Genres

@inproceedings{Allen2008AutomatedPO,
  title={Automated Processing of Digitized Historical Newspapers: Identification of Segments and Genres},
  author={Robert B. Allen and Ilya Waldstein and Weizhong Zhu},
  booktitle={ICADL},
  year={2008}
}
Many historical newspapers are being digitized. We aim to support access to them via text analysis of the OCR'd content. However, the OCR includes many errors, so extracting meaningful content from it is difficult. A pipeline of processing steps is proposed. Here, we describe the first two steps: segmentation and genre identification. The segmentation procedure based on headings was quite successful. Genre identification worked well for easily defined genre categories such as weather reports. We…


