Publications
Unsupervised Discovery of Multimodal Links in Multi-Image, Multi-Sentence Documents
This work finds that a structured training objective based on identifying whether collections of images and sentences co-occur in documents suffices to predict links between specific sentences and specific images within the same document at test time.
Quantifying the Visual Concreteness of Words and Topics in Multimodal Datasets
This work gives an algorithm for automatically computing the visual concreteness of words and topics within multimodal datasets, and predicts the capacity of machine learning algorithms to learn textual/visual relationships.
Something's Brewing! Early Prediction of Controversy-causing Posts from Discussion Features
Using data from several different communities on reddit.com, this work predicts the ultimate controversiality of posts, leveraging features drawn from both the textual content and the tree structure of the early comments that initiate the discussion.
Image Representations and New Domains in Neural Image Captioning
By varying the quality of image representations produced by a convolutional neural network, this work finds that a state-of-the-art neural captioning algorithm is able to produce quality captions even when provided with surprisingly poor image representations.
A Case Study on Combining ASR and Visual Features for Generating Instructional Video Captions
This work finds that unstated background information is better explained by visual features, whereas fine-grained distinctions are disambiguated more easily via ASR tokens.
Cats and Captions vs. Creators and the Clock: Comparing Multimodal Content to Context in Predicting Relative Popularity
The experiments show that, when considered in isolation, simple unigram text features and deep neural network visual features yield the highest accuracy individually, and that the combination of the two modalities generally leads to the best accuracies overall.
Science, AskScience, and BadScience: On the Coexistence of Highly Related Communities
For several types of highly-related community pairs, users that engage in a newer community after its creation tend to be more active in their original community than users that do not explore, even when controlling for previous level of engagement.
What do Vegans do in their Spare Time? Latent Interest Detection in Multi-Community Networks
This work uses a dataset of 76M submissions to the social network Reddit, which is organized into distinct sub-communities called subreddits, to measure the similarity between entire subreddits in terms of both user similarity and topical similarity.
A Comparative Analysis of Popular Phylogenetic Reconstruction Algorithms
Understanding the evolutionary relationships between organisms by comparing their genomic sequences is a focus of modern-day computational biology research. Estimating evolutionary history in this…
Does My Multimodal Model Learn Cross-modal Interactions? It’s Harder to Tell than You Might Think!
This work introduces a new diagnostic tool, empirical multimodally-additive function projection (EMAP), for isolating whether cross-modal interactions improve performance for a given model on a given task, and recommends that researchers in multimodal machine learning report not only the performance of unimodal baselines, but also the EMAP of their best-performing model.