Publications
Behavior Regularized Offline Reinforcement Learning
TLDR
A general framework, behavior regularized actor critic (BRAC), is introduced to empirically evaluate recently proposed methods as well as a number of simple baselines across a variety of offline continuous control tasks.
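As a rough illustration of the behavior-regularization idea, here is a minimal sketch (not the paper's BRAC implementation); the network interfaces, the choice of KL as the divergence, and the penalty weight `alpha` are assumptions:

```python
# Minimal sketch of a behavior-regularized actor loss; the KL divergence,
# penalty weight, and network interfaces are illustrative assumptions.
from torch.distributions import kl_divergence

def actor_loss(policy, behavior_policy, q_net, states, alpha=0.1):
    """Maximize Q(s, a) while penalizing divergence from the (estimated) behavior policy."""
    dist = policy(states)                    # learned policy pi(.|s), e.g. a Normal
    behavior_dist = behavior_policy(states)  # behavior policy pi_b(.|s) fit to the dataset
    actions = dist.rsample()                 # reparameterized sample so gradients flow
    q_values = q_net(states, actions).squeeze(-1)
    kl = kl_divergence(dist, behavior_dist).sum(-1)   # KL(pi || pi_b), one divergence choice
    return (alpha * kl - q_values).mean()    # minimize => maximize Q minus the penalty
```

BRAC treats the divergence choice and whether the penalty enters the value target or the policy objective as design axes to compare; the sketch shows only a policy-penalty variant.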
Domain Adaptation with Asymmetrically-Relaxed Distribution Alignment
TLDR
Asymmetrically-relaxed distribution alignment, a new approach that overcomes some limitations of standard domain-adversarial algorithms, is proposed; precise assumptions are characterized under which the algorithm is theoretically principled, and empirical benefits are demonstrated on both synthetic and real datasets.
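As a rough gloss on what the asymmetric relaxation means (a paraphrase of the idea, with β a slack parameter, rather than the paper's formal definition of its relaxed distances): instead of requiring the source and target feature distributions to match exactly, the criterion only asks that the source distribution cover the target up to a factor 1+β:

```latex
% Rough paraphrase of the asymmetric relaxation (beta >= 0 is a slack parameter;
% beta = 0 recovers exact alignment of the feature distributions).
p_T^{\phi}(z) \;\le\; (1+\beta)\, p_S^{\phi}(z) \qquad \text{for all representations } z .
```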
The Laplacian in RL: Learning Representations with Efficient Approximations
TLDR
This paper presents a fully general and scalable method for approximating the eigenvectors of the Laplacian in a model-free RL context, and empirically shows that it generalizes beyond the tabular, finite-state setting.
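A minimal sketch of the kind of objective involved: an attractive term over observed transitions plus a penalty keeping the feature coordinates near-orthonormal. The penalty below is a simple surrogate chosen for illustration, not necessarily the paper's exact repulsive-term estimator; `f` is assumed to map batches of states to d-dimensional embeddings.

```python
# Illustrative sketch: learn d-dimensional features whose coordinates behave like
# smooth Laplacian eigenvectors. The orthonormality penalty is a simple surrogate
# chosen here for clarity, not necessarily the paper's exact repulsive estimator.
import torch

def laplacian_repr_loss(f, states, next_states, random_states, beta=1.0):
    z, z_next = f(states), f(next_states)
    attract = ((z - z_next) ** 2).sum(-1).mean()   # smoothness along observed transitions
    z_rand = f(random_states)                      # states drawn from the visitation distribution
    cov = z_rand.T @ z_rand / z_rand.shape[0]      # empirical second-moment matrix
    eye = torch.eye(cov.shape[0], device=cov.device)
    orthonormality = ((cov - eye) ** 2).sum()      # keep feature coordinates near-orthonormal
    return attract + beta * orthonormality
```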
A Unified View of Label Shift Estimation
TLDR
A unified view of the two methods and the first theoretical characterization of the likelihood-based estimator are presented, attributing BBSE's statistical inefficiency to a loss of information due to coarse calibration.
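For concreteness, a small sketch of the confusion-matrix (BBSE-style) estimator referred to above; the variable names and the plain linear solve are illustrative assumptions:

```python
# Sketch of confusion-matrix (BBSE-style) label shift estimation: solve C w = mu,
# where C is the classifier's confusion matrix on held-out source data and mu is
# its predicted label distribution on the unlabeled target sample.
import numpy as np

def estimate_label_shift_weights(preds_source, labels_source, preds_target, num_classes):
    C = np.zeros((num_classes, num_classes))
    for pred, label in zip(preds_source, labels_source):
        C[pred, label] += 1.0                 # C[i, j] = P(predict i, true label j)
    C /= len(labels_source)
    mu = np.bincount(preds_target, minlength=num_classes) / len(preds_target)
    w = np.linalg.solve(C, mu)                # w[y] ~= p_target(y) / p_source(y)
    return np.clip(w, 0.0, None)              # weights must be non-negative
```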
Interpretable Multimodality Embedding Of Cerebral Cortex Using Attention Graph Network For Identifying Bipolar Disorder
TLDR
An Edge-weighted Graph Attention Network (EGAT) with Dense Hierarchical Pooling is developed to better understand the underlying roots of the disorder from the view of structure-function integration; the results indicate that, alongside abnormalities in anatomical geometric properties, multiple interactive patterns among the Default Mode, Fronto-parietal, and Cingulo-opercular networks contribute to identifying bipolar disorder (BP).
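A generic sketch of how edge weights can enter an attention layer over a weighted connectivity graph (an illustrative variant, not the paper's exact EGAT layer; it assumes every node has at least one positive-weight edge, e.g., a self-loop):

```python
# Generic edge-weighted attention layer for a weighted connectivity graph;
# an illustrative variant, not the paper's exact EGAT layer. Assumes every node
# has at least one positive-weight edge (e.g., a self-loop).
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeWeightedAttention(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)   # node feature transform
        self.a = nn.Linear(2 * out_dim, 1, bias=False)    # attention scoring function

    def forward(self, h, edge_weight):
        # h: (N, in_dim) node features; edge_weight: (N, N) weighted adjacency
        z = self.W(h)
        n = z.size(0)
        pair = torch.cat([z.unsqueeze(1).expand(n, n, -1),
                          z.unsqueeze(0).expand(n, n, -1)], dim=-1)
        logits = F.leaky_relu(self.a(pair).squeeze(-1))          # pairwise attention logits
        logits = logits.masked_fill(edge_weight <= 0, float('-inf'))
        attn = torch.softmax(logits, dim=-1) * edge_weight       # edge weights modulate attention
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-8)
        return attn @ z
```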
Learning to Combat Compounding-Error in Model-Based Reinforcement Learning
TLDR
Experimental results show that the proposed method can successfully adapt the planning horizon to account for state-dependent model accuracy, significantly improving the efficiency of policy learning compared to model-based and model-free baselines.
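A hedged sketch of the general idea of adapting the planning horizon to model accuracy; the `model.predict` interface returning a per-step error estimate is an assumption for illustration, not the paper's API:

```python
# Sketch of adapting the rollout length to model accuracy: stop planning with the
# learned model once its accumulated (estimated) error exceeds a budget. The
# model.predict interface returning an error estimate is an assumption.
def adaptive_rollout(model, policy, state, max_horizon=20, error_budget=1.0):
    trajectory, accumulated_error = [], 0.0
    for _ in range(max_horizon):
        action = policy(state)
        next_state, reward, predicted_error = model.predict(state, action)
        accumulated_error += predicted_error
        if accumulated_error > error_budget:   # model no longer trusted here: truncate
            break
        trajectory.append((state, action, reward, next_state))
        state = next_state
    return trajectory
```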
On the Optimality of Batch Policy Optimization Algorithms
TLDR
This work introduces a class of confidence-adjusted index algorithms that unifies optimistic and pessimistic principles in a common framework, enabling a general analysis, and introduces a new weighted-minimax criterion that accounts for the inherent difficulty of optimal value prediction.
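A minimal sketch of a confidence-adjusted index in a batch bandit setting (the Hoeffding-style width and parameter names are illustrative): alpha > 0 recovers optimism, alpha < 0 pessimism, and alpha = 0 the greedy plug-in rule.

```python
# Sketch of a confidence-adjusted index rule for a batch (offline) bandit:
# pick the action maximizing mean + alpha * width. The Hoeffding-style width
# and parameter names are illustrative.
import numpy as np

def confidence_adjusted_choice(rewards_per_action, alpha=-1.0, delta=0.1):
    indices = []
    for rewards in rewards_per_action:                    # observed rewards for each action
        n = max(len(rewards), 1)
        mean = np.mean(rewards) if len(rewards) else 0.0
        width = np.sqrt(np.log(2.0 / delta) / (2.0 * n))  # confidence interval half-width
        indices.append(mean + alpha * width)              # alpha > 0: optimism, alpha < 0: pessimism
    return int(np.argmax(indices))
```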
Instabilities of Offline RL with Pre-Trained Neural Representation
TLDR
The methodology explores offline RL when using features from pre-trained neural networks, in the hope that these representations are powerful enough to permit sample-efficient offline RL, and finds that offline RL is stable only under extremely mild distribution shift.
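A sketch of the flavor of setup described: offline fitted Q-iteration on a frozen pre-trained feature map, so only a linear head is re-fit at each iteration (the data layout, regularization, and hyperparameters are illustrative assumptions):

```python
# Sketch of offline fitted Q-iteration with a frozen pre-trained feature map,
# so only a linear head is re-fit at each iteration. Data layout, regularization,
# and hyperparameters are illustrative assumptions.
import numpy as np

def linear_fqi(phi_sa, rewards, phi_next_all_actions, gamma=0.99, iters=50, reg=1e-3):
    # phi_sa: (N, d) features of logged (s, a); phi_next_all_actions: (N, A, d)
    n, d = phi_sa.shape
    w = np.zeros(d)
    gram = phi_sa.T @ phi_sa + reg * np.eye(d)
    for _ in range(iters):
        q_next = phi_next_all_actions @ w               # (N, A) next-state Q-values
        targets = rewards + gamma * q_next.max(axis=1)  # Bellman backup on logged data
        w = np.linalg.solve(gram, phi_sa.T @ targets)   # ridge least-squares regression
    return w
```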
Mixture Proportion Estimation and PU Learning: A Modern Approach
TLDR
Two simple techniques are proposed: Best Bin Estimation (BBE) for mixture proportion estimation (MPE), and Conditional Value Ignoring Risk (CVIR), a simple objective for PU learning; both come with formal guarantees that hold whenever a model can cleanly separate out a small subset of positive examples.
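A hedged sketch of the Best Bin Estimation idea (scores are assumed to lie in [0, 1], and the confidence slack below is a simplified stand-in for the paper's exact bound):

```python
# Sketch of the Best Bin Estimation idea: scan score thresholds and compare the
# fraction of unlabeled vs. held-out positive examples above each; the best bin
# minimizes an upper confidence bound on that ratio. The slack term here is a
# simplified stand-in for the paper's exact bound; scores assumed in [0, 1].
import numpy as np

def best_bin_estimate(scores_pos, scores_unl, delta=0.1, num_bins=100):
    estimate = 1.0
    for c in np.linspace(0.0, 1.0, num_bins, endpoint=False):
        q_p = (scores_pos >= c).mean()       # positives scoring above the threshold
        q_u = (scores_unl >= c).mean()       # unlabeled examples scoring above it
        if q_p <= 0:
            continue
        slack = np.sqrt(np.log(4.0 / delta) / (2 * len(scores_pos))) \
              + np.sqrt(np.log(4.0 / delta) / (2 * len(scores_unl)))
        estimate = min(estimate, (q_u + slack) / q_p)   # UCB on the mixture proportion
    return estimate
```

CVIR, roughly, discards the fraction of unlabeled examples most confidently scored as positive before computing the loss on the remaining unlabeled (treated-as-negative) data.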
Importance Reweighting Using Adversarial-Collaborative Training
TLDR
This work argues that likelihood-ratio-based reweighting may not be the best choice for the covariate shift problem in terms of low effective sample size, and proposes another learning objective that contains a “collaborator” in addition to the adversary.
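A small numeric illustration of the effective-sample-size concern raised above, using the standard Kish effective sample size on synthetic likelihood ratios (this is not the paper's proposed method):

```python
# Tiny numeric illustration of the effective-sample-size concern with likelihood
# ratio reweighting: a few very large weights dominate, so the (Kish) effective
# sample size collapses well below the nominal sample size. Numbers are synthetic.
import numpy as np

rng = np.random.default_rng(0)
log_ratio = rng.normal(0.0, 2.0, size=1000)   # hypothetical log density ratios
w = np.exp(log_ratio)
ess = w.sum() ** 2 / (w ** 2).sum()           # Kish effective sample size
print(f"nominal n = {len(w)}, effective sample size ~ {ess:.1f}")
```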
...
...