Web search engines collect data from the Web by "crawling" it: performing a simulated browsing of the web by extracting links from pages, downloading all of them, and repeating the process ad infinitum. This process requires enormous amounts of hardware and network resources, ending up with a large fraction of the visible web on the crawler's storage…
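The crawl loop described in this abstract (extract links, download them, repeat) can be illustrated with a minimal sketch. It is not the paper's crawler: the names `LinkExtractor`, `crawl`, `fetch`, `max_pages`, and the toy `TOY_WEB` pages are all assumptions made for illustration, and an in-memory dictionary stands in for real HTTP downloads so the example stays self-contained.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl starting from seed; returns {url: html}.

    fetch is a hypothetical callable mapping a URL to its HTML text,
    or None when the page cannot be downloaded.
    """
    frontier = deque([seed])  # URLs discovered but not yet downloaded
    seen = {seed}             # every URL ever queued, to avoid revisits
    store = {}                # downloaded pages (the crawler's storage)
    while frontier and len(store) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        store[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return store


# Toy in-memory "web" (invented for illustration) instead of real HTTP requests.
TOY_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a>',
    "http://example.test/b": '<a href="/">home</a> <a href="/missing">gone</a>',
}

pages = crawl("http://example.test/", TOY_WEB.get)
```

The `max_pages` cap is what keeps the "ad infinitum" loop finite here; a real crawler would also need politeness delays, robots.txt handling, and URL normalization, none of which this sketch attempts.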
In many machine learning problem domains, large amounts of data are available, but the cost of correctly labeling the data prohibits its use. This paper presents a short overview of methods for using a small set of labeled data together with a large supplementary unlabeled dataset in order to learn a better hypothesis than could be learned from the labeled data alone.
In many machine learning problem domains, large amounts of data are available, but the cost of correctly labelling the data prohibits its use for model training. Especially relevant for us are the large quantities of raw information available on the internet, which present an interesting challenge: how to successfully exploit the information hidden within them without first…