• Corpus ID: 244896605

PRISM: A Rich Class of Parameterized Submodular Information Measures for Guided Subset Selection

@inproceedings{Kothawade2021PRISMAR,
  title={PRISM: A Rich Class of Parameterized Submodular Information Measures for Guided Subset Selection},
  author={Suraj Kothawade and Vishal Kaushal and Ganesh Ramakrishnan and Jeff A. Bilmes and Rishabh K. Iyer},
  year={2021}
}
With ever-increasing dataset sizes, subset selection techniques are becoming increasingly important for a plethora of tasks. It is often necessary to guide the subset selection to achieve certain desiderata, which include focusing on or targeting certain data points while avoiding others. Examples of such problems include: i) targeted learning, where the goal is to find subsets with rare classes or rare attributes on which the model is underperforming, and ii) guided summarization, where data (e… 
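The targeted-learning setting described above can be illustrated with a minimal greedy sketch that maximizes a facility-location-style submodular mutual information surrogate, I(A; Q) = Σ_{q∈Q} max_{a∈A} sim(a, q), toward a query set Q of rare-class exemplars. This is one simple illustrative instance, not the paper's exact formulation; the function name and similarity matrix are assumptions for the sketch:

```python
import numpy as np

def greedy_smi_select(sims_to_query, budget):
    """Greedily pick `budget` points maximizing a facility-location-style
    submodular mutual information surrogate:
        I(A; Q) = sum_q max_{a in A} sim(a, q)
    sims_to_query: (n_points, n_query) similarity matrix."""
    n, _ = sims_to_query.shape
    selected = []
    covered = np.zeros(sims_to_query.shape[1])  # current best sim per query point
    for _ in range(budget):
        # marginal gain of adding each candidate to the selected set
        gains = np.maximum(sims_to_query, covered).sum(axis=1) - covered.sum()
        gains[selected] = -np.inf  # never re-pick a selected point
        best = int(np.argmax(gains))
        selected.append(best)
        covered = np.maximum(covered, sims_to_query[best])
    return selected
```

Because the surrogate only rewards similarity to the query set, the greedy loop naturally focuses on points resembling the targeted (e.g. rare-class) exemplars while ignoring the rest of the pool.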
Submodlib: A Submodular Optimization Library
TLDR
SUBMODLIB is an open-source, easy-to-use, efficient and scalable Python library for submodular optimization with a C++ optimization engine, finding application in summarization, data subset selection, hyperparameter tuning, efficient training and more.
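The workhorse behind such libraries is greedy maximization with lazy (priority-queue) gain evaluation, which exploits the fact that submodular marginal gains can only shrink as the selected set grows. A minimal self-contained sketch of the lazy greedy idea (not SUBMODLIB's actual API):

```python
import heapq

def lazy_greedy(ground_set, marginal_gain, budget):
    """Lazy (accelerated) greedy for monotone submodular maximization.

    marginal_gain(e, selected) must return f(selected + [e]) - f(selected).
    Submodularity guarantees gains only shrink as `selected` grows, so a
    re-evaluated heap top that still beats the next entry is the true argmax."""
    selected = []
    heap = [(-marginal_gain(e, []), e) for e in ground_set]
    heapq.heapify(heap)
    while heap and len(selected) < budget:
        _, e = heapq.heappop(heap)
        fresh = marginal_gain(e, selected)     # re-evaluate the stale gain
        if not heap or fresh >= -heap[0][0]:
            selected.append(e)                 # still the best: take it
        else:
            heapq.heappush(heap, (-fresh, e))  # otherwise re-queue with fresh gain
    return selected
```

In practice this evaluates far fewer marginal gains than naive greedy while returning the same subset, which is why it scales to large ground sets.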
PLATINUM: Semi-Supervised Model Agnostic Meta-Learning using Submodular Mutual Information
TLDR
PLATINUM is a novel semi-supervised, model-agnostic meta-learning framework that uses submodular mutual information (SMI) functions to boost the performance of few-shot classification (FSC), outperforming MAML and semi-supervised approaches like pseudo-labeling for semi-supervised FSC.
Online Active Learning with Dynamic Marginal Gain Thresholding
TLDR
This work proposes an online algorithm which, given any stream of data, any assessment of its value, and any formulation of its selection cost, extracts the most valuable subset of the stream up to a constant factor while using minimal memory.
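The thresholding idea can be sketched generically: keep a streamed item if and only if its marginal gain with respect to the items kept so far clears a threshold. The `gain` and `threshold` callables below are illustrative stand-ins for whatever value assessment and selection-cost formulation is plugged in:

```python
def stream_select(stream, gain, threshold, budget):
    """Single-pass online selection: keep an item iff its marginal gain
    w.r.t. the set kept so far clears the (possibly adaptive) threshold.

    gain(item, kept)  -> marginal value of adding `item` to `kept`
    threshold(kept)   -> current acceptance bar (may depend on `kept`)"""
    kept = []
    for item in stream:
        if len(kept) >= budget:
            break  # memory/budget cap reached
        if gain(item, kept) >= threshold(kept):
            kept.append(item)
    return kept
```

With a coverage-style gain (number of new topics an item contributes), the routine skips redundant items and retains only those that add fresh value, using memory proportional to the budget rather than the stream length.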
Dynamic Thresholding for Online Distributed Data Selection
TLDR
This work designs a general core selection routine for distributed algorithms which, given any stream of data, any assessment of its value, and any formulation of its selection cost, extracts the most valuable subset of the stream up to a constant factor while using minimal memory.
Submodularity In Machine Learning and Artificial Intelligence
TLDR
A gentle review of submodularity and supermodularity and their properties, and how submodularity is useful for clustering, data partitioning, parallel machine learning, active and semi-supervised learning, probabilistic modeling, and structured norms and loss functions.
BASIL: Balanced Active Semi-supervised Learning for Class Imbalanced Datasets
TLDR
BASIL (Balanced Active Semi-supervised Learning) is a novel algorithm that optimizes submodular mutual information (SMI) functions in a per-class fashion to gradually select a balanced dataset in an active learning loop, outperforming state-of-the-art diversity- and uncertainty-based active learning methods.
Active Data Discovery: Mining Unknown Data using Submodular Information Measures
Active learning is a common yet powerful framework for iteratively and adaptively sampling subsets of an unlabeled set with a human in the loop, with the goal of achieving labeling efficiency.

References

SHOWING 1-10 OF 52 REFERENCES
Learning Mixtures of Submodular Functions for Image Collection Summarization
TLDR
This paper provides the first systematic approach for quantifying the problem of image collection summarization, along with a new data set of image collections and human summaries, and introduces a novel summary evaluation method called V-ROUGE.
GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning
TLDR
This work formulates GLISTER as a mixed discrete-continuous bi-level optimization problem that selects a subset of the training data maximizing the log-likelihood on a held-out validation set, and proposes GLISTER-ONLINE, an iterative online algorithm that performs data selection along with the parameter updates and can be applied to any loss-based learning algorithm.
Coresets for Data-efficient Training of Machine Learning Models
TLDR
CRAIG, a method to select a weighted subset of training data that closely estimates the full gradient, is developed by maximizing a submodular function, and it is proved that applying incremental gradient (IG) methods to this subset is guaranteed to converge to the (near-)optimal solution with the same convergence rate as IG on the full data for convex optimization.
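A rough sketch of the CRAIG idea, under the simplifying assumption of Euclidean similarity between per-example gradients (the paper's actual construction bounds gradient differences rather than computing them directly): greedily pick gradient "medoids" via facility location, then weight each by the number of examples whose gradient it best represents.

```python
import numpy as np

def craig_style_coreset(grads, budget):
    """CRAIG-style sketch: greedy facility location over per-example
    gradients, with each chosen medoid weighted by the number of
    examples it represents (so the weighted sum tracks the full gradient)."""
    grads = np.asarray(grads, dtype=float)
    n = len(grads)
    dist = np.linalg.norm(grads[:, None, :] - grads[None, :, :], axis=2)
    sims = dist.max() - dist          # similarity: high when gradients are close
    selected, best = [], np.zeros(n)  # best[i] = max similarity of i to the subset
    for _ in range(budget):
        gains = np.maximum(sims, best[:, None]).sum(axis=0) - best.sum()
        gains[selected] = -np.inf
        j = int(np.argmax(gains))
        selected.append(j)
        best = np.maximum(best, sims[:, j])
    assign = np.argmax(sims[:, selected], axis=1)    # nearest medoid per example
    weights = np.bincount(assign, minlength=budget)  # coreset weights
    return selected, weights
```

On data whose gradients form tight clusters, the routine picks roughly one representative per cluster and assigns it the cluster's size as its weight, which is the intuition behind the gradient-approximation guarantee.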
Learning Mixtures of Submodular Shells with Application to Document Summarization
TLDR
A method to learn a mixture of submodular "shells" in a large-margin setting using a projected subgradient method is applied to multi-document summarization and produces the best results reported so far on the widely used NIST DUC-05 through DUC-07 document summarization corpora.
A Class of Submodular Functions for Document Summarization
TLDR
A class of submodular functions for document summarization that combines two terms, one encouraging the summary to be representative of the corpus and the other positively rewarding diversity; because the functions are monotone submodular, an efficient, scalable greedy optimization scheme carries a constant-factor guarantee of optimality.
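The two-term objective can be sketched as saturated coverage plus a square-root diversity reward over clusters. This is a simplified form for illustration; the constants, the singleton-reward choice, and the function names are assumptions of the sketch, not the paper's exact definitions:

```python
import numpy as np

def summary_objective(S, sims, clusters, alpha=0.2, lam=1.0):
    """Coverage + diversity, in the spirit of monotone submodular
    summarization: saturated coverage plus a concave (sqrt) cluster reward."""
    S = list(S)
    cover_S = sims[:, S].sum(axis=1) if S else np.zeros(len(sims))
    cover_V = sims.sum(axis=1)
    # saturated coverage: extra within-neighborhood similarity stops counting
    coverage = np.minimum(cover_S, alpha * cover_V).sum()
    # singleton rewards: average similarity of each sentence to the corpus
    rewards = sims.mean(axis=0)
    # sqrt over per-cluster mass rewards spreading picks across clusters
    diversity = sum(np.sqrt(rewards[[j for j in S if clusters[j] == k]].sum())
                    for k in set(clusters))
    return coverage + lam * diversity

def greedy_summary(sims, clusters, budget, **kw):
    """Plain greedy maximization of the objective above."""
    S = []
    for _ in range(budget):
        cands = [j for j in range(len(sims)) if j not in S]
        j = max(cands, key=lambda j: summary_objective(S + [j], sims, clusters, **kw))
        S.append(j)
    return S
```

On a corpus with two topical clusters, both terms push the greedy loop to take one sentence per cluster rather than two near-duplicates, which is exactly the representativeness-plus-diversity trade-off the summary describes.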
Learning From Less Data: A Unified Data Subset Selection and Active Learning Framework for Computer Vision
TLDR
This work empirically demonstrates the effectiveness of two diversity models, the Facility-Location and Dispersion models, for training-data subset selection and reducing labeling effort, allowing complex machine learning models like convolutional neural networks to be trained with much less data and labeling cost while incurring minimal performance loss.
A Framework Towards Domain Specific Video Summarization
TLDR
The studies show that a more effective way of incorporating domain-specific relevance into a summary is by obtaining ratings of shots rather than binary inclusion/exclusion information, and a novel evaluation measure is proposed that is more naturally suited to assessing the quality of a video summary for the task at hand than F1-like measures.
Video summarization by learning submodular mixtures of objectives
TLDR
A new method is introduced that uses a supervised approach to learn the importance of global characteristics of a summary and jointly optimizes for multiple objectives, creating summaries that possess multiple properties of a good summary.
Demystifying Multi-Faceted Video Summarization: Tradeoff Between Diversity, Representation, Coverage and Importance
TLDR
A framework for multi-faceted summarization covering extractive, query-based and entity summarization (summarization at the level of entities like objects, scenes, humans and faces in the video), investigating several summarization models which capture notions of diversity, coverage, representation and importance.
Multi-document summarization via submodularity
TLDR
This paper proposes a new principled and versatile framework for different multi-document summarization tasks using submodular functions based on term coverage and textual-unit similarity, which can be efficiently optimized with an improved greedy algorithm.
...