Hangul Fonts Dataset: a Hierarchical and Compositional Dataset for Interrogating Learned Representations

Jesse A. Livezey, Ahyeon Hwang, Kristofer E. Bouchard
Interpretable representations of data are useful for testing a hypothesis or for distinguishing between multiple potential hypotheses about the data. We first present a summary of the structure of the dataset. Using a set of unsupervised and supervised methods, we find that deep network representations contain structure related to the geometrical hierarchy of the characters. Our results lay the foundation for a better understanding of what deep networks learn from complex, structured datasets.


GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
A benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models for understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models, which favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge-transfer across tasks.
beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework
Learning an interpretable factorised representation of the independent data generative factors of the world without supervision is an important precursor for the development of artificial
Definitions, methods, and applications in interpretable machine learning
This work defines interpretability in the context of machine learning and introduces the predictive, descriptive, relevant (PDR) framework for discussing interpretations, and introduces 3 overarching desiderata for evaluation: predictive accuracy, descriptive accuracy, and relevancy.
Isolating Sources of Disentanglement in VAEs
A decomposition of the variational lower bound is shown that can be used to explain the success of the β-VAE in learning disentangled representations, and a new information-theoretic disentanglement metric is proposed, which is classifier-free and generalizable to arbitrarily-distributed and non-scalar latent variables.
Learning Ordered Representations with Nested Dropout
Nested dropout, a procedure for stochastically removing coherent nested sets of hidden units in a neural network, is introduced and it is rigorously shown that the application of nested dropout enforces identifiability of the units, which leads to an exact equivalence with PCA.
Style and Content Disentanglement in Generative Adversarial Networks
The Style and Content Disentangled GAN (SC-GAN) is introduced, a new unsupervised algorithm for training GANs that learns disentangled style and content representations of the data and is evaluated on a set of baseline datasets.
InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets
Experiments show that InfoGAN learns interpretable representations that are competitive with representations learned by existing fully supervised methods.
Interpretable machine learning: definitions, methods, and applications
This paper first defines interpretability in the context of machine learning and places it within a generic data science life cycle, then introduces the Predictive, Descriptive, Relevant (PDR) framework, consisting of three desiderata for evaluating and constructing interpretations.
Representation Learning: A Review and New Perspectives
Recent work in the area of unsupervised feature learning and deep learning is reviewed, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks.
Emergence of Invariance and Disentanglement in Deep Representations
It is shown that in a deep neural network invariance to nuisance factors is equivalent to information minimality of the learned representation, and that stacking layers and injecting noise during training naturally bias the network towards learning invariant representations.