Learn More
With the tremendous amount of information that becomes available on the Web on a daily basis, the ability to quickly develop information agents has become a crucial problem. A vital component of any Web-based information agent is a set of wrappers that can extract the relevant data from semistructured information sources. Our novel approach to wrapper(More)
In a multi-view problem, the features of the domain can be partitioned into disjoint subsets (views) that are sufficient to learn the target concept. Semi-supervised, multi-view algorithms, which reduce the amount of labeled data required for learning, rely on the assumptions that the views are compatible and uncorrelated (i.e., every example is identically(More)
Inductive learning algorithms typically use a set of labeled examples to learn class descriptions for a set of user-specified concepts of interest. In practice, labeling the training examples is a tedious, time consuming, error-prone process. Furthermore, in some applications, the labeling of each example also may be extremely expensive (e.g., it may(More)
A critical problem in developing information agents for the Web is accessing data that is formatted for human use. We have developed a set of tools for extracting data from web sites and transforming it into a structured data format, such as XML. The resulting data can then be used to build new applications without having to deal with unstructured data. The(More)
The Web is based on a browsing paradigm that makes it difficult to retrieve and integrate data from multiple sites. y-k&y, the only w&y t;g & this is to bl_?ild_ specialized applications, which are time-consuming to develop and difficult to maintain. We are addressing this problem by creating the technology and tools for rapidly constructing information(More)
Information Extraction systems rely on a set of extraction patterns that they use in order to retrieve from each document the relevant information. In this paper we survey the various types of extraction patterns that are generated by machine learning algorithms. We identify three main categories of patterns, which cover a variety of application domains,(More)
The Web is based on a browsing paradigm that makes it diÆcult to retrieve and integrate data from multiple sites. Today, the only way to do this is to build specialized applications, which are time-consuming to develop and diÆcult to maintain. We have addressed this problem by creating the technology and tools for rapidly constructing information agents(More)
Multi-view learners reduce the need for labeled data by exploiting disjoint sub-sets of features (views), each of which is sufficient for learning. Such algorithms assume that each view is a strong view (i.e., perfect learning is possible in each view). We extend the multi-view framework by introducing a novel algorithm, Aggressive Co-Testing, that exploits(More)