Bias Reduction via End-to-End Shift Learning: Application to Citizen Science

Di Chen, Carla P. Gomes
Citizen science projects are successful at gathering rich datasets for various applications. [...] Key result: Compared with competing models in the context of covariate shift, we further demonstrate the advantage of SCN in both its effectiveness and its capability of handling massive high-dimensional data.

Accelerating Ecological Sciences from Above: Spatial Contrastive Learning for Remote Sensing
This work spatially augments contrastive learning by training neural networks to recognize when two patches of a landscape are nearby, and demonstrates that this approach improves upon previous methods and naive classification on a large-scale data set of remote sensing images derived from invasive species observations collected over 30 years.
Do Models of Mental Health Based on Social Media Data Generalize?
Substantial loss occurs when transferring between platforms, and several unreliable confounding factors may lead researchers to overestimate classification performance.
A Review of Domain Adaptation without Target Labels
  • Wouter M. Kouw, M. Loog
  • Computer Science, Mathematics
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2021
Domain adaptation has become a prominent problem setting in machine learning and related fields. This review asks the question: How can a classifier learn from a source domain and generalize to a [...]
Tackling Climate Change with Machine Learning
From smart grids to disaster management, this work describes how machine learning can be a powerful tool in reducing greenhouse gas emissions and helping society adapt to a changing climate.
Artificial Intelligence for Social Good: A Survey
This work quantitatively analyzes the distribution and trend of the AI4SG literature in terms of application domains and AI techniques used, and proposes three conceptual methods to systematically group the existing literature and analyze the eight AI4SG application domains in a unified framework.
Presence-Only Geographical Priors for Fine-Grained Image Classification
An efficient spatio-temporal prior is proposed that, when conditioned on a geographical location and time, estimates the probability that a given object category occurs at that location.
TeaTime4Schools: Using Data Mining Techniques to Model Litter Decomposition in Austrian Urban School Soils
Litter decomposition plays a pivotal role in the global carbon cycle, but is difficult to measure on a global scale, especially by citizen scientists. Here, citizen scientists, i.e., school students [...]
Using multiple data sources to explore disease transmission risk between commercial poultry, backyard poultry, and wild birds in New Zealand.
The study findings highlight how the spatial patterns of trading activity within the commercial poultry industry, alongside the movement of backyard poultry and wild birds, have the potential to contribute significantly to the spread of diseases between these populations.
Macro-plastic pollution in the tidal Thames: An analysis of composition and trends for the optimization of data collection
Plastic pollution is a major issue affecting the oceans. Despite rivers being the principal source of plastic debris, few of the studies on plastic pollution are focused on freshwater [...]


Avicaching: A Two Stage Game for Bias Reduction in Citizen Science
Avicaching is a novel two-stage game for reducing data bias in citizen science, in which the game organizer (a citizen-science program) incentivizes the agents (the citizen scientists) to visit under-sampled areas.
The eBird enterprise: An integrated approach to development and application of citizen science
eBird has become a major source of biodiversity data, increasing the knowledge of the dynamics of species distributions, and having a direct impact on the conservation of birds and their habitats.
Direct Density Ratio Estimation for Large-scale Covariate Shift Adaptation
This work proposes a novel method that directly estimates the importance from samples without going through the hard task of density estimation, and demonstrates that the proposed method is computationally more efficient than existing approaches with comparable accuracy.
Detecting and Correcting for Label Shift with Black Box Predictors
Black Box Shift Estimation (BBSE) is proposed to estimate the test distribution p(y), and it is proved that BBSE works even when predictors are biased, inaccurate, or uncalibrated, so long as their confusion matrices are invertible.
Citizen Science: A Developing Tool for Expanding Science Knowledge and Scientific Literacy
This article describes the model for building and operating citizen science projects that has evolved at the Cornell Lab of Ornithology over the past two decades, in the hope that it will inform the fields of biodiversity monitoring, biological research, and science education while providing a window into the culture of citizen science.
Direct Importance Estimation with Model Selection and Its Application to Covariate Shift Adaptation
This paper proposes a direct importance estimation method that does not involve density estimation and is equipped with a natural cross-validation procedure, so tuning parameters such as the kernel width can be objectively optimized.
Discriminative Learning Under Covariate Shift
We address classification problems for which the training instances are governed by an input distribution that is allowed to differ arbitrarily from the test distribution, problems also referred to [...]
Correcting Sample Selection Bias by Unlabeled Data
A nonparametric method is presented that directly produces resampling weights without distribution estimation; it works by matching distributions between training and testing sets in feature space.
Input-dependent estimation of generalization error under covariate shift
A common assumption in supervised learning is that the training and test input points follow the same probability distribution. However, this assumption is not fulfilled, e.g., in interpolation, [...]
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.