Learn More
We consider here the problem of building a never-ending language learner; that is, an intelligent computer agent that runs forever and that each day must (1) extract, or read, information from the web to populate a growing structured knowledge base, and (2) learn to perform this task better than on the previous day. In particular, we propose an approach and(More)
We consider the problem of semi-supervised learning to extract categories (e.g., academic fields, athletes) and relations (e.g., PlaysSport(athlete, sport)) from web pages, starting with a handful of labeled training examples of each category or relation, plus hundreds of millions of unlabeled web documents. Semi-supervised training using only a few labeled(More)
Whereas people learn many different types of knowledge from diverse experiences over many years, most current machine learning systems acquire just a single function or data model from just a single data set. We propose a neverending learning paradigm for machine learning, to better reflect the more ambitious and encompassing type of learning performed by(More)
A key question regarding the future of the semantic web is " how will we acquire structured information to populate the semantic web on a vast scale? " One approach is to enter this information manually. A second approach is to take advantage of pre-existing databases, and to develop common ontologies, publishing standards, and reward systems to make this(More)
This paper describes the modeling of a weed infestation risk inference system that implements a collaborative inference scheme based on rules extracted from two Bayesian network classifiers. The first Bayesian classifier infers a categorical variable value for the weed–crop competitiveness using as input categorical variables for the total density of weeds(More)
This paper proposes a feature weighting method based on X<sup>2 </sup> statistical test, to be used in conjunction with a k-NN classifier. Results of empirical experiments conducted using data from several knowledge domains are presented and discussed. Forty four out of forty five conducted experiments favoured the feature weighted approach and are(More)
The definition of the fuzzy rule base is one of the most important and difficult tasks when designing fuzzy systems. This paper discusses the results of two different hybrid methods investigated earlier, for the automatic generation of fuzzy rules from numerical data. One of the methods proposes the creation of fuzzy rule bases using genetic algorithms in(More)
Traditional approaches to Relation Extraction from text require manually defining the relations to be extracted. We propose here an approach to automatically discovering relevant relations, given a large text corpus plus an initial ontology defining hundreds of noun categories (e.g., Athlete, Musician, Instrument). Our approach discovers frequently stated(More)
Missing values are an important problem in data mining. In order to tackle this problem in classification tasks, we propose two imputation methods based on Bayesian networks. These methods are evaluated in the context of both prediction and classification tasks. We compare the obtained results with those achieved by classical imputation methods(More)