Flexible Web document analysis for delivery to narrow-bandwidth devices

@inproceedings{Penn2001FlexibleWD,
  title={Flexible Web document analysis for delivery to narrow-bandwidth devices},
  author={Gerald Penn and Jianying Hu and Hengbin Luo and Ryan T. McDonald},
  booktitle={Proceedings of Sixth International Conference on Document Analysis and Recognition},
  year={2001},
  pages={1074--1078}
}
We propose a set of baseline heuristics for identifying genuinely tabular information and news links in HTML documents. A prototype implementation of these heuristics is described for delivering content from news providers' home pages to a narrow-bandwidth device such as a portable digital assistant or cellular phone display. Its evaluation on 75 Web sites is provided, along with a discussion of topics for future research. 

Citations

Publications citing this paper.

Web-scale table census and classification

CITES BACKGROUND & METHODS
HIGHLY INFLUENCED

A fine-grained taxonomy of tables on the web

CITES BACKGROUND
HIGHLY INFLUENCED

Detecting Tables in HTML Documents

  • Document Analysis Systems
  • 2002
CITES BACKGROUND & METHODS

Building the Dresden Web Table Corpus: A Classification Approach

  • 2015 IEEE/ACM 2nd International Symposium on Big Data Computing (BDC)
  • 2015
CITES METHODS & BACKGROUND
HIGHLY INFLUENCED

Fonduer: Knowledge Base Construction from Richly Formatted Data

  • SIGMOD Conference
  • 2018
CITES BACKGROUND

Text and non-text separation in offline document images: a survey

  • International Journal on Document Analysis and Recognition (IJDAR)
  • 2018
