Table of Contents Recognition and Extraction for Heterogeneous Book Documents

@inproceedings{Wu2013TableOC,
  title={Table of Contents Recognition and Extraction for Heterogeneous Book Documents},
  author={Zhaohui Wu and Prasenjit Mitra and C. Lee Giles},
  booktitle={2013 12th International Conference on Document Analysis and Recognition},
  year={2013},
  pages={1205--1209}
}
Existing work on book table of contents (TOC) recognition has focused almost entirely on small, application-dependent, domain-specific datasets. However, the TOCs of books from different domains differ significantly in visual layout and style, which makes TOC recognition challenging for a large-scale collection of heterogeneous books. We observed that TOCs can be placed into three basic styles, namely "flat", "ordered", and "divided", giving insights into how to achieve effective TOC…
