INFERRING AND REVISING THEORIES WITH CONFIDENCE: ANALYZING BILINGUALISM IN THE 1901 CANADIAN CENSUS

@article{Drummond2006INFERRINGAR,
  title={INFERRING AND REVISING THEORIES WITH CONFIDENCE: ANALYZING BILINGUALISM IN THE 1901 CANADIAN CENSUS},
  author={Chris Drummond and Stan Matwin and Chad Gaffield},
  journal={Applied Artificial Intelligence},
  year={2006},
  volume={20},
  pages={1--33}
}
This paper shows how machine learning can help in analyzing and understanding historical change. Using data from the Canadian census of 1901, we discover the influences on bilingualism in Canada at the beginning of the last century. The discovered theories partly agree with, and partly complement, the existing views of historians on this question. Our approach, based around a decision tree, not only infers theories directly from data, but also evaluates existing theories and revises them to…
IMPACT OF HIGH-LEVEL KNOWLEDGE ON ECONOMIC WELFARE THROUGH INTERACTIVE DATA MINING
TLDR
A novel algorithm for finding the most important relations through interactive data mining, specialized for the analysis of macroeconomic data that often contains incomplete and noisy attributes and, initially, complex relations.
What (not) to expect when classifying rare events
TLDR
It is proved that in the balanced case, where there is an equal proportion of events and non-events, any classifier that satisfies one of these constraints will always satisfy all.
Inner Ensembles: Using Ensemble Methods in Learning Step
TLDR
The results show that the overall performance of Inner Ensembles is significantly better than the original methods, and Inner Ensembles provide similar performance improvements as regular ensembles.
Statistical Inference, Learning and Models in Big Data
TLDR
An overview of the topics covered is given, describing challenges and strategies that seem common to many different areas of application and including some examples of applications to make these challenges and strategies more concrete.
'A Flag that Knows No Colour Line': Aboriginal Veteranship in Canada, 1914-1939
Historians have rightly considered the period from 1914 to 1939 as the time when Canadian Indigenous soldiers and veterans of the First World War faced unique challenges because of their legal status…
The Class Imbalance Problem
TLDR
It is shown that there exists a wide range of real-world applications involving extremely skewed (imbalanced) data sets and the class imbalance problem stems from the fact that the class of interest occupies only a negligible volume within the complete pattern space.
Real Time Robot Policy Adaptation Based on Intelligent Algorithms
TLDR
Results show that evolution can generate an optimal relation between the robot performance and exploration-exploitation of reinforcement learning, enabling the robot to adapt online its strategy as the environment conditions change.

References

SHOWING 1-10 OF 32 REFERENCES
Inferring and Revising Theories with Confidence: Analyzing the 1901 Canadian Census
TLDR
Using data from the Canadian census of 1901, the influences on bilingualism in Canada at the beginning of the last century are discovered and a semantic measure of similarity between trees is proposed to limit the changes made.
Pruning Decision Trees and Lists
TLDR
This thesis presents pruning algorithms for decision trees and lists that are based on significance tests, explains why pruning is often necessary to obtain small and accurate models, and shows that the performance of standard pruning algorithms can be improved by taking the statistical significance of observations into account.
Linearity, Nonlinearity, and the Competing Constructions of Social Hierarchy in Early Twentieth-Century Canada: The Question of Language in 1901
Since the 1960s, scholars have increasingly emphasized the ways in which routinely generated sources such as the census should be understood as creations of a “quantitative mentality” or “statistical…
Induction over the unexplained: Using overly-general domain theories to aid concept learning
This paper describes and evaluates an approach to combining empirical and explanation-based learning called Induction Over the Unexplained (IOU). IOU is intended for learning concepts that can be…
Knowledge Discovery Through Induction with Randomization Testing
design IRT embodies a view of induction as a four-phase process (shown in Figure 1). The process alters a current model by generating a group of new competitor models, fitting those competitor models…
What if there were no significance tests
Contents: Preface. Part I: Overview. L.L. Harlow, Significance Testing Introduction and Overview. Part II: The Debate: Against and For Significance Testing. J. Cohen, The Earth Is Round. F.L. Schmidt,…
Tree Induction Vs Logistic Regression: A Learning Curve Analysis
TLDR
A large-scale experimental comparison of logistic regression and tree induction is presented, assessing classification accuracy and the quality of rankings based on class-membership probabilities, and a learning-curve analysis is used to examine the relationship of these measures to the size of the training set.
The Effects of Training Set Size on Decision Tree Complexity
This paper presents experiments with 19 datasets and 5 decision tree pruning algorithms that show that increasing training set size often results in a linear increase in tree size, even when that…
Tree Induction for Probability-Based Ranking
TLDR
It is concluded that PETs, with these simple modifications, should be considered when rankings based on class-membership probability are required, and it is shown that using a simple, common smoothing method (the Laplace correction) uniformly improves probability-based rankings.
C4.5: Programs for Machine Learning
TLDR
A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.