Context-Dependent Confusions Rules for Building Error Model Using Weighted Finite State Transducers for OCR Post-Processing

@article{Azawi2014ContextDependentCR,
  title={Context-Dependent Confusions Rules for Building Error Model Using Weighted Finite State Transducers for OCR Post-Processing},
  author={Mayce Ibrahim Ali Al Azawi and Thomas M. Breuel},
  journal={2014 11th IAPR International Workshop on Document Analysis Systems},
  year={2014},
  pages={116--120}
}
In this paper, we propose a new technique to correct OCR errors by means of weighted finite state transducers (WFST) with context-dependent confusion rules. We translate the OCR confusions that appear in the recognition outputs into edit operations, e.g. insertions, deletions, and substitutions, using the Levenshtein edit distance algorithm. The edit operations are extracted in the form of rules with respect to the context of the incorrect string to build an error model using weighted finite state…
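The rule-extraction step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `correction_rules` and `count_rules`, the one-character context window, and the `^`/`$` boundary markers are all assumptions made for illustration. The sketch aligns an OCR string with its ground truth using Levenshtein distance and emits each edit as a rule `(left context, wrong text, correct text, right context)`, with the context taken from the OCR (incorrect) string.

```python
from collections import Counter


def correction_rules(ocr, truth):
    """Align `ocr` to `truth` and return context-dependent correction rules.

    Each rule is (left, wrong, correct, right); an empty `wrong` means a
    character is missing in the OCR output, an empty `correct` means the
    OCR output contains an extra character. Context window of one
    character is an assumption of this sketch.
    """
    n, m = len(ocr), len(truth)
    # dist[i][j] = Levenshtein distance between ocr[:i] and truth[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ocr[i - 1] == truth[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # drop OCR char
                             dist[i][j - 1] + 1,         # add missing char
                             dist[i - 1][j - 1] + cost)  # match or substitute

    def context(start, end):
        # Neighbouring characters of ocr[start:end]; ^ and $ mark boundaries.
        left = ocr[start - 1] if start > 0 else "^"
        right = ocr[end] if end < n else "$"
        return left, right

    rules = []
    i, j = n, m
    # Backtrace one optimal alignment from the bottom-right cell.
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (ocr[i - 1] != truth[j - 1]):
            if ocr[i - 1] != truth[j - 1]:
                left, right = context(i - 1, i)
                rules.append((left, ocr[i - 1], truth[j - 1], right))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            left, right = context(i - 1, i)
            rules.append((left, ocr[i - 1], "", right))
            i -= 1
        else:
            left, right = context(i, i)
            rules.append((left, "", truth[j - 1], right))
            j -= 1
    rules.reverse()
    return rules


def count_rules(pairs):
    """Count how often each rule occurs over (ocr, truth) training pairs."""
    return Counter(rule for ocr, truth in pairs for rule in correction_rules(ocr, truth))


print(correction_rules("tbe cat", "the cat"))
```

Rule counts of this kind could then be turned into transducer arc weights. How the paper derives its weights is not stated in the truncated abstract, so that step is left out here.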

Key Quantitative Results

  • On the UWIII test set, our model achieves an error rate of 0.68%, compared with 1.14% for the baseline and 1.0% for the existing state-of-the-art single-character rule-based approach.
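The relative improvement implied by these figures is not given in this excerpt, but it follows directly from the three reported error rates. A small sketch of the arithmetic (the helper name `relative_reduction` is an assumption for illustration):

```python
def relative_reduction(reference, ours):
    """Fraction by which `ours` lowers the error rate relative to `reference`."""
    return (reference - ours) / reference


baseline, single_char, context_dependent = 1.14, 1.0, 0.68  # error rates in %

# Roughly 40% relative reduction against the baseline and 32% against
# the single-character rule-based approach.
print(f"{relative_reduction(baseline, context_dependent):.1%}")
print(f"{relative_reduction(single_char, context_dependent):.1%}")
```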
