Scaling to Very Very Large Corpora for Natural Language Disambiguation

@inproceedings{Banko2001ScalingTV,
  title={Scaling to Very Very Large Corpora for Natural Language Disambiguation},
  author={Michele Banko and Eric Brill},
  booktitle={ACL},
  year={2001}
}
The amount of readily available on-line text has reached hundreds of billions of words and continues to grow. Yet for most core natural language tasks, algorithms continue to be optimized, tested and compared after training on corpora consisting of only one million words or less. In this paper, we evaluate the performance of different learning methods on a prototypical natural language disambiguation task, confusion set disambiguation, when trained on orders of magnitude more labeled data than…
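The abstract names confusion set disambiguation without defining it. In this task a system sees a sentence with one slot and has to pick which member of a small set of easily confused words (for example {then, than}) belongs there. The sketch below illustrates the task under stated assumptions. It uses a bag-of-context-words naive Bayes classifier with add-one smoothing and a two-word window, and the class name `ConfusionSetNB` is hypothetical. It is not the authors' experimental setup, and the four-sentence corpus is invented toy data. The sketch also shows why labeled data is cheap for this task: every occurrence of a set member in correctly written text is a training example whose label is simply the word that appears.

```python
import math
from collections import Counter, defaultdict


class ConfusionSetNB:
    """Hypothetical toy naive Bayes disambiguator for one confusion set.

    Assumptions (illustrative only): features are the words within a fixed
    window around the slot, and probabilities use add-alpha smoothing.
    """

    def __init__(self, confusion_set, window=2, alpha=1.0):
        self.confusion_set = set(confusion_set)
        self.window = window
        self.alpha = alpha
        self.class_counts = Counter()              # label -> number of examples
        self.feature_counts = defaultdict(Counter)  # label -> context-word counts
        self.vocab = set()

    def _context(self, tokens, i):
        # Words up to `window` positions left and right of the slot at index i.
        left = tokens[max(0, i - self.window):i]
        right = tokens[i + 1:i + 1 + self.window]
        return left + right

    def train(self, sentences):
        # Labels come for free: the confusable word actually written is the label.
        for sentence in sentences:
            tokens = sentence.lower().split()
            for i, tok in enumerate(tokens):
                if tok in self.confusion_set:
                    self.class_counts[tok] += 1
                    for w in self._context(tokens, i):
                        self.feature_counts[tok][w] += 1
                        self.vocab.add(w)

    def predict(self, tokens, i):
        # Return the set member with the highest smoothed log posterior for slot i.
        context = self._context([t.lower() for t in tokens], i)
        n_examples = sum(self.class_counts.values())
        n_classes = len(self.confusion_set)
        vocab_size = len(self.vocab) + 1  # +1 reserves mass for unseen words
        best_label, best_score = None, -math.inf
        for label in sorted(self.confusion_set):
            score = math.log((self.class_counts[label] + self.alpha)
                             / (n_examples + self.alpha * n_classes))
            denom = sum(self.feature_counts[label].values()) + self.alpha * vocab_size
            for w in context:
                score += math.log((self.feature_counts[label][w] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training text, not data from the paper.
corpus = [
    "she is taller than her brother",
    "he is older than me",
    "we ate and then we left",
    "first wash it and then dry it",
]
clf = ConfusionSetNB({"then", "than"})
clf.train(corpus)
print(clf.predict("my car is faster then yours".split(), 4))  # slot is index 4
```

On this toy corpus the call above picks "than", mostly because "is" occurred twice in "than" contexts. With so little text the evidence is thin. How such learners behave as the training text grows by orders of magnitude is the question the paper studies.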

Citations

Publications citing this paper (378 citations in total, estimated 24% coverage; a selection is listed below).

A Proposed Hierarchy of Deep Learning Tasks

CITES BACKGROUND, METHODS & RESULTS
HIGHLY INFLUENCED

From distributional to semantic similarity

CITES BACKGROUND, METHODS & RESULTS
HIGHLY INFLUENCED

Common Crawled Web Corpora: Constructing corpora from large amounts of web data

CITES BACKGROUND, RESULTS & METHODS
HIGHLY INFLUENCED

Determining the Function of Political Tweets

  • 2017 IEEE 13th International Conference on e-Science (e-Science)
  • 2017
CITES BACKGROUND & METHODS
HIGHLY INFLUENCED

Mining large streams of user data for personalized recommendations

  • SIGKDD Explorations
  • 2012
CITES BACKGROUND
HIGHLY INFLUENCED

The Bulgarian National Corpus: Theory and Practice in Corpus Design

  • J. Language Modelling
  • 2012
CITES BACKGROUND
HIGHLY INFLUENCED

Latent semantic sentence clustering for multi-document summarization

CITES BACKGROUND & METHODS
HIGHLY INFLUENCED

CITATION STATISTICS

  • 22 Highly Influenced Citations

  • Averaged 29 Citations per year over the last 3 years

References

Publications referenced by this paper (20 references in total; a selection is listed below).

Tree-Bank Grammars

  • AAAI/IAAI, Vol. 2
  • 1996
HIGHLY INFLUENTIAL

A simple approach to building ensembles of naive Bayesian classifiers for word sense disambiguation

T. Pedersen
  • In Proceedings of the First
  • 2000

The role of unlabeled data in supervised learning

T. M. Mitchell
  • Proceedings of the Sixth International Colloquium on Cognitive Science
  • 1999