Corpus ID: 208284918

Evaluating Semantic Interaction on Word Embeddings via Simulation

@article{Bian2020EvaluatingSI,
  title={Evaluating Semantic Interaction on Word Embeddings via Simulation},
  author={Yali Bian and Michelle Dowling and Chris North},
  journal={ArXiv},
  year={2020},
  volume={abs/2007.15824}
}
Semantic interaction (SI) attempts to learn the user's cognitive intents as they directly manipulate data projections during sensemaking activity. For text analysis, prior implementations of SI have used common data features, such as bag-of-words representations, for machine learning from user interactions. Instead, we hypothesize that features derived from deep learning word embeddings will enable SI to better capture the user's subtle intents. However, evaluating these effects is difficult… 
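
The core mechanism can be sketched as an inverse weight-learning problem: given a 2-D layout the analyst has manipulated, find per-feature weights so that weighted feature-space distances match the layout distances. The snippet below is a minimal Python sketch of that idea, not the authors' implementation; the toy corpus, the TF-IDF bag-of-words features, and the non-negative least-squares solver are all assumptions made for illustration.

    # Minimal sketch: semantic interaction as inverse weight learning.
    # Not the paper's implementation; corpus, features, and solver are assumed.
    import numpy as np
    from itertools import combinations
    from sklearn.feature_extraction.text import TfidfVectorizer
    from scipy.spatial.distance import pdist
    from scipy.optimize import nnls

    docs = ["cats purr and sleep",                  # hypothetical toy corpus
            "dogs bark and run",
            "stocks rise on strong earnings"]
    vec = TfidfVectorizer()
    X = vec.fit_transform(docs).toarray()           # bag-of-words features

    # 2-D positions after the analyst drags documents closer together or apart.
    user_layout = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0]])
    target_sq = pdist(user_layout) ** 2             # desired squared distances

    # Squared per-feature differences for every document pair (same pair order as pdist).
    pair_diffs = np.array([(X[i] - X[j]) ** 2
                           for i, j in combinations(range(len(docs)), 2)])

    # Learn non-negative feature weights w so that pair_diffs @ w ~= target_sq;
    # high-weight terms are the ones that explain the user's layout.
    w, _ = nnls(pair_diffs, target_sq)
    vocab = vec.get_feature_names_out()
    print([vocab[i] for i in w.argsort()[::-1][:3]])  # inferred intent terms

Replacing the bag-of-words matrix X with document vectors built from word embeddings (the paper's hypothesis) leaves the weight-learning step unchanged; only the feature space differs.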

Citations

DeepSI: Interactive Deep Learning for Semantic Interaction

Results of two complementary studies show that DeepSI_finetune more accurately captures users' complex mental models with fewer interactions than a state-of-the-art but more basic use of deep learning as only a feature extractor, pre-processed outside of the interactive loop.

Challenges in Evaluating Interactive Visual Machine Learning Systems

The challenges and research gaps identified in an IEEE VIS workshop on the evaluation of IVML systems are described.

References

Showing 1-10 of 31 references

Deep Contextualized Word Representations

A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.

From Word Embeddings To Document Distances

It is demonstrated on eight real-world document classification data sets, in comparison with seven state-of-the-art baselines, that the Word Mover's Distance metric leads to unprecedentedly low k-nearest-neighbor document classification error rates.
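
As a rough illustration, the snippet below computes the nearest-neighbor relaxation of the Word Mover's Distance (the RWMD lower bound discussed in the paper) over made-up 2-D word vectors; the embeddings, vocabulary, and uniform word weights are toy assumptions, and the full metric would require solving an optimal-transport problem.

    # Toy sketch of the relaxed Word Mover's Distance lower bound (RWMD):
    # each word's mass flows entirely to its nearest word in the other document.
    # The 2-D embeddings are invented for illustration only.
    import numpy as np

    emb = {"king":   np.array([0.90, 0.80]),
           "queen":  np.array([0.85, 0.90]),
           "banana": np.array([0.10, 0.20]),
           "fruit":  np.array([0.15, 0.25])}

    def rwmd(doc_a, doc_b):
        """One-directional RWMD with uniform word weights: the average distance
        from each word in doc_a to its closest word in doc_b."""
        return np.mean([min(np.linalg.norm(emb[a] - emb[b]) for b in doc_b)
                        for a in doc_a])

    print(rwmd(["king", "queen"], ["queen", "king"]))    # 0.0: same words, different order
    print(rwmd(["king", "queen"], ["banana", "fruit"]))  # large: unrelated documents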

Multi-model semantic interaction for text analytics

StarSPIRE, a visual text analytics prototype, is introduced; it transforms user interactions on documents into both small-scale display layout updates and large-scale relevance-based document selection, and it contributes an updated visualization pipeline model for generalized multi-model semantic interaction.

Recent Trends in Deep Learning Based Natural Language Processing [Review Article]

This paper reviews significant deep learning-related models and methods that have been employed for numerous NLP tasks and provides a walk-through of their evolution.

Distributed Representations of Sentences and Documents

Paragraph Vector is an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of text, such as sentences, paragraphs, and documents, and its construction gives the algorithm the potential to overcome the weaknesses of bag-of-words models.
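
A quick way to experiment with the algorithm is the gensim library's Doc2Vec implementation; the snippet below is a minimal usage sketch under that assumption (gensim 4.x API, a three-sentence toy corpus, arbitrary hyperparameters), not code from the paper.

    # Minimal Paragraph Vector sketch using gensim's Doc2Vec (assumed dependency,
    # gensim 4.x API); corpus and hyperparameters are illustrative only.
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    corpus = ["the cat sat on the mat",
              "dogs chase cats in the yard",
              "markets fell after the earnings report"]
    tagged = [TaggedDocument(words=doc.split(), tags=[i]) for i, doc in enumerate(corpus)]

    # Learn fixed-length vectors for variable-length documents.
    model = Doc2Vec(tagged, vector_size=32, min_count=1, epochs=50)

    # Infer a vector for an unseen document and find the closest training document.
    new_vec = model.infer_vector("a cat sleeps on a mat".split())
    print(model.dv.most_similar([new_vec], topn=1))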

Semantic Interaction for Sensemaking: Inferring Analytical Reasoning for Model Steering

It is found that semantic interaction captures the analytical reasoning of the user through keyword weighting, and aids the user in co-creating a spatialization based on the user's reasoning and intuition.

Distributed Representations of Words and Phrases and their Compositionality

This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
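
The phrase-finding step scores a bigram by how much more often its words co-occur than chance, score(wi, wj) = (count(wi wj) - delta) / (count(wi) * count(wj)), and promotes high-scoring bigrams to single tokens. The snippet below sketches that scoring on a toy sentence; the discount delta and the threshold are arbitrary values chosen for the illustration.

    # Toy sketch of the paper's phrase-scoring heuristic:
    #   score(wi, wj) = (count(wi wj) - delta) / (count(wi) * count(wj))
    from collections import Counter

    tokens = "new york is big , new york is busy , the city is big".split()
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))

    delta, threshold = 1, 0.2            # arbitrary discount and cut-off
    phrases = [(a, b) for (a, b), c in bigrams.items()
               if (c - delta) / (unigrams[a] * unigrams[b]) > threshold]
    print(phrases)                       # only ('new', 'york') clears the threshold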

A unified architecture for natural language processing: deep neural networks with multitask learning

We describe a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic roles, semantically similar words, and the likelihood that the sentence makes sense (grammatically and semantically) using a language model.
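
The architecture it describes, a single shared encoder feeding several task-specific output layers, can be sketched as follows; this is a generic PyTorch illustration with assumed vocabulary and label sizes, not the paper's exact network (which also uses additional tasks and a language-model objective).

    # Generic sketch of multitask learning: a shared convolutional sentence
    # encoder with one linear head per task. Sizes and tasks are illustrative.
    import torch
    import torch.nn as nn

    class MultiTaskTagger(nn.Module):
        def __init__(self, vocab=10000, dim=64, n_pos=45, n_chunk=20, n_ner=9):
            super().__init__()
            self.embed = nn.Embedding(vocab, dim)
            self.conv = nn.Conv1d(dim, dim, kernel_size=3, padding=1)  # shared encoder
            self.pos_head = nn.Linear(dim, n_pos)      # part-of-speech tags
            self.chunk_head = nn.Linear(dim, n_chunk)  # chunk labels
            self.ner_head = nn.Linear(dim, n_ner)      # named-entity tags

        def forward(self, token_ids):                        # (batch, seq_len)
            h = self.embed(token_ids).transpose(1, 2)        # (batch, dim, seq_len)
            h = torch.relu(self.conv(h)).transpose(1, 2)     # (batch, seq_len, dim)
            return self.pos_head(h), self.chunk_head(h), self.ner_head(h)

    model = MultiTaskTagger()
    pos, chunk, ner = model(torch.randint(0, 10000, (2, 7)))  # dummy batch
    print(pos.shape, chunk.shape, ner.shape)                   # per-token predictions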

Interactive Visual Analytics for Sensemaking with Big Text

Efficient Estimation of Word Representations in Vector Space

Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.
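
One of the two architectures, skip-gram, trains on (center, context) word pairs drawn from a sliding window over the text; the snippet below sketches only that pair extraction on a toy sentence (window size chosen arbitrarily), not the training itself.

    # Sketch of skip-gram (center, context) pair extraction with a symmetric
    # window; real training also subsamples frequent words and fits the pairs
    # with negative sampling or hierarchical softmax.
    tokens = "the quick brown fox jumps over the lazy dog".split()
    window = 2                                          # arbitrary window size

    pairs = [(tokens[i], tokens[j])
             for i in range(len(tokens))
             for j in range(max(0, i - window), min(len(tokens), i + window + 1))
             if j != i]
    print(pairs[:6])   # e.g. ('the', 'quick'), ('the', 'brown'), ...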