Large volumes of content (bookmarks, reviews, videos, etc.) are currently being created on the "Social Web", i.e. on Web 2.0 community sites, and this content is being annotated and commented upon. The ability to view an individual's entire contribution to the Social Web would be an interesting and valuable service, particularly important as social ...
We investigate the application of classification techniques to the problem of information extraction (IE). In particular, we use support vector machines and several different feature sets to build a set of classifiers for IE. We show that this approach is competitive with current state-of-the-art IE algorithms based on specialized learning algorithms. We ...
Web pages can be distinguished by both topic and genre. Genre information can help modern search engines focus on the user's information need. In this paper, web pages are represented using character n-grams. The character n-gram representation is language independent and allows features to be extracted automatically from a web page. Character ...
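As a rough illustration of the character n-gram representation mentioned in this abstract, the sketch below counts overlapping character n-grams in a string. The function name `char_ngrams`, the default `n=3`, and the sample text are assumptions made for illustration; the abstract does not say which n-gram lengths or preprocessing the paper uses.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in text.

    Hypothetical helper for illustration only. No tokenization or
    language-specific processing is applied, which is why this kind of
    representation is language independent.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    # Slide a window of width n over the raw characters.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


# Sample input (an assumption, not data from the paper).
profile = char_ngrams("web page", n=3)
print(profile.most_common(3))
```

The resulting counts could serve as a feature vector for a genre classifier. Which n-gram lengths work best is not stated in the truncated abstract.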
The need for labeled documents is a key bottleneck in adaptive information extraction. One way to solve this problem is through active learning algorithms that require users to label only the most informative documents. We investigate several document selection strategies that are particularly relevant to information extraction. We show that some strategies ...
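To make the idea of selecting "the most informative documents" concrete, here is a minimal sketch of uncertainty sampling, one common active learning strategy. This is an assumption for illustration: the truncated abstract does not name the strategies the paper studies, and the function `select_most_uncertain`, the document ids, and the confidence scores are all hypothetical.

```python
def select_most_uncertain(doc_scores, k=1):
    """Return the k document ids whose score is closest to 0.5.

    doc_scores maps a document id to the current model's predicted
    probability for the positive class (hypothetical values here).
    A score near 0.5 means the model is least sure, so a human label
    for that document is assumed to be the most informative.
    """
    ranked = sorted(doc_scores, key=lambda d: abs(doc_scores[d] - 0.5))
    return ranked[:k]


# Hypothetical pool of unlabeled documents with model confidence scores.
pool = {"doc_a": 0.95, "doc_b": 0.52, "doc_c": 0.10, "doc_d": 0.40}
to_label = select_most_uncertain(pool, k=2)
print(to_label)
```

In an information extraction setting, a document typically involves many token-level decisions, so a per-document score would have to be aggregated from them in some way. That aggregation is left out of this sketch.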