Language Specific and Topic Focused Web Crawling

Olena Medelyan, Stefan Schulz, Jan Paetzold, Michael Poprat, Kornél G. Markó
The Web has been successfully explored as a training and test corpus for a variety of NLP tasks ([8], [2], [6]). However, corpora derived from the Web are usually inconsistent and highly heterogeneous in nature, which is normally counterbalanced by extending their size to billions of words. We assume that web crawling that takes into account the domain and language of the webpage content would make it possible to acquire huge, high-quality corpora. This would bring additional benefits…


Publications citing this paper.

Using Big Data and sentiment analysis in product evaluation

2013 36th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO) • 2013

A hybrid approach for Sarcasm Detection of Social Media Data

N. Vijayalaksmi, Dr. A. Senthilrajan

Language based web crawling on big data

2014 22nd Signal Processing and Communications Applications Conference (SIU) • 2014

Performance Optimization of Focused Web Crawling Using Content Block Segmentation

2014 International Conference on Electronic Systems, Signal Processing and Computing Technologies • 2014
