Unsupervised Discovery of Biographical Structure from Text

@article{Bamman2014UnsupervisedDO,
  title={Unsupervised Discovery of Biographical Structure from Text},
  author={David Bamman and Noah A. Smith},
  journal={Transactions of the Association for Computational Linguistics},
  year={2014},
  volume={2},
  pages={363-376}
}
  • Published 7 October 2014
We present a method for discovering abstract event classes in biographies, based on a probabilistic latent-variable model. Taking as input timestamped text, we exploit latent correlations among events to learn a set of event classes (such as Born, Graduates High School, and Becomes Citizen), along with the typical times in a person’s life when those events occur. In a quantitative evaluation at the task of predicting a person’s age for a given event, we find that our generative model… 
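As a toy illustration of the kind of latent-variable model the abstract describes (this is NOT the paper's actual model; the data, class count, and EM procedure are invented for the sketch), one can fit a small mixture in which each event mention pairs a multinomial over event words with a Gaussian over the age at which the event occurs:

```python
import math
import random
from collections import defaultdict

# Toy sketch only -- not the paper's model. Each event mention (word, age)
# gets a latent class; a class pairs a multinomial over event words with a
# Gaussian over ages. Fit by a few iterations of EM on invented data.
random.seed(0)
data = ([("born", random.gauss(0, 1)) for _ in range(30)]
        + [("graduates", random.gauss(18, 1)) for _ in range(30)]
        + [("retires", random.gauss(65, 2)) for _ in range(30)])

K = 3
vocab = sorted({w for w, _ in data})
pi = [1.0 / K] * K                                   # class priors
theta = [{w: 1.0 / len(vocab) for w in vocab} for _ in range(K)]  # word dists
mu, sigma = [5.0, 30.0, 55.0], [10.0] * K            # age means / std devs

def gauss_pdf(x, m, s):
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

for _ in range(50):
    # E-step: responsibility of each class for each (word, age) observation.
    resp = []
    for w, age in data:
        scores = [pi[k] * theta[k][w] * gauss_pdf(age, mu[k], sigma[k])
                  for k in range(K)]
        total = sum(scores) or 1e-300
        resp.append([s / total for s in scores])
    # M-step: re-estimate priors, word distributions, and age Gaussians.
    for k in range(K):
        nk = sum(r[k] for r in resp) + 1e-9
        pi[k] = nk / len(data)
        counts = defaultdict(float)
        for (w, _), r in zip(data, resp):
            counts[w] += r[k]
        theta[k] = {w: (counts[w] + 0.1) / (nk + 0.1 * len(vocab))
                    for w in vocab}
        mu[k] = sum(r[k] * age for (_, age), r in zip(data, resp)) / nk
        var = sum(r[k] * (age - mu[k]) ** 2
                  for (_, age), r in zip(data, resp)) / nk
        sigma[k] = max(math.sqrt(var), 0.5)

# Each learned class should pair one event word with its typical age.
learned = sorted((round(mu[k]), max(theta[k], key=theta[k].get))
                 for k in range(K))
print(learned)
```

On this synthetic data the classes separate into (born, age ≈ 0), (graduates, age ≈ 18), and (retires, age ≈ 65), which mirrors the abstract's idea of event classes tied to typical times in a person's life.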

A Spatial Model for Extracting and Visualizing Latent Discourse Structure in Text

TLDR
The method outperforms or is competitive with state-of-the-art generative approaches on tasks such as predicting the outcome of a story and sentence ordering, and it is shown that inference in the model can lead to extraction of long-range latent discourse structure from a collection of documents.

Recognizing Biographical Sections in Wikipedia

TLDR
This work investigates the task of recognizing biographical sections in Wikipedia articles about persons, models it as a sequence classification problem, and proposes a supervised setting in which the training data are acquired automatically.

Statistical Script Learning with Recurrent Neural Nets

TLDR
It is demonstrated that incorporating multiple arguments into events, yielding a more complex event representation than is used in previous work, helps to improve a co-occurrence-based script system’s predictive power.

Placing (Historical) Facts on a Timeline: A Classification cum Coref Resolution Approach

TLDR
This work introduces a two-stage system for event timeline generation from multiple (historical) text documents, leveraging generative adversarial learning for important sentence classification and assimilating knowledge-based tags to improve the performance of event coreference resolution.

Discovering Typical Histories of Entities by Multi-Timeline Summarization

TLDR
This paper proposes a novel task of automatically creating summaries of typical histories of entities within their categories, introduces 4 methods for the task, and evaluates them on Wikipedia categories containing several types of cities and persons.

Comparative Summarization of Temporal Document Collections

TLDR
A novel research task, Comparative Timeline Summarization (CTS), is introduced as an effective strategy to discover important similarities and differences in collections of timeline documents, providing contrastive knowledge.

Automatic Section Recognition in Obituaries

TLDR
A statistical model is proposed which recognizes sections corresponding to Personal Information, Biographical Sketch, Characteristics, Family, Gratitude, Tribute, Funeral Information, and Other aspects of the person; a convolutional neural network outperforms bag-of-words and embedding-based BiLSTMs and BiLSTM-CRFs with a micro F1 = 0.81.

Comparative Timeline Summarization via Dynamic Affinity-Preserving Random Walk

TLDR
A novel summarization framework is proposed which relies on a dynamic affinity-preserving, mutually reinforced random walk for the CTS task; ROUGE evaluations demonstrate the method's superior performance over competitive baselines in summarizing contrastive and diverse themes.

Generating Character Descriptions for Automatic Summarization of Fiction

TLDR
This work collects a dataset of one million fiction stories with accompanying author-written summaries from Wattpad, an online story sharing platform, and proposes two approaches to generate character descriptions, one based on ranking attributes found in the story text, the other based on classifying into a list of pre-defined attributes.
...

References

Showing 1–10 of 70 references

Learning Frames from Text with an Unsupervised Latent Variable Model

TLDR
A Dirichlet-multinomial model is presented in which frames are latent categories that explain the linking of verb-subject-object triples under document-level sparsity, and what the model learns is analyzed.

Finding scientific topics

  • T. Griffiths, M. Steyvers
  • Computer Science
    Proceedings of the National Academy of Sciences of the United States of America
  • 2004
TLDR
A generative model for documents, introduced by Blei, Ng, and Jordan, is described, and a Markov chain Monte Carlo algorithm is presented for inference in this model; the approach is used to analyze abstracts from PNAS, using Bayesian model selection to establish the number of topics.
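The collapsed Gibbs sampler this reference describes can be sketched minimally as follows (the corpus, topic count, and hyperparameters here are invented for illustration, not taken from the paper):

```python
import random
from collections import defaultdict

# Minimal collapsed Gibbs sampler for LDA on a tiny invented corpus.
random.seed(1)
docs = [["apple", "banana", "apple", "fruit"],
        ["goal", "match", "goal", "team"],
        ["banana", "fruit", "apple"],
        ["team", "match", "goal"]]
K, alpha, beta = 2, 0.1, 0.01            # topics and Dirichlet hyperparameters
vocab = sorted({w for d in docs for w in d})
V = len(vocab)

# Random initial topic assignments plus doc-topic and topic-word counts.
z = [[random.randrange(K) for _ in d] for d in docs]
ndk = [[0] * K for _ in docs]
nkw = [defaultdict(int) for _ in range(K)]
nk = [0] * K
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        k = z[d][i]
        ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1

for _ in range(200):                      # Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]                   # remove token from its topic
            ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
            # Full conditional: p(z=k|rest) is proportional to
            # (ndk + alpha) * (nkw + beta) / (nk + V*beta).
            weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta)
                       / (nk[j] + V * beta) for j in range(K)]
            k = random.choices(range(K), weights)[0]
            z[d][i] = k                   # reassign and restore counts
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1

# Word most strongly associated with each topic after sampling.
top = [max(vocab, key=lambda w: nkw[k][w]) for k in range(K)]
print(top)
```

Bayesian model selection over the number of topics, as in the reference, would repeat this for several values of K and compare approximations of p(data | K); the sketch above fixes K for brevity.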

Acquiring temporal constraints between relations

TLDR
The proposed algorithm, GraphOrder, is a novel and scalable graph-based label propagation algorithm that takes into account the transitivity of temporal order, as well as statistics on the narrative order of verb mentions, and achieves as high as a 38.4% absolute improvement in F1 over a random baseline.

An Unsupervised Approach to Biography Production Using Wikipedia

TLDR
An unsupervised approach to multi-document, sentence-extraction-based summarization for the task of producing biographies significantly outperforms all systems that participated in DUC2004 according to the ROUGE-L metric, and is preferred by human subjects.

Open domain event extraction from twitter

TLDR
TwiCal is described: the first open-domain event-extraction and categorization system for Twitter; a novel approach for discovering important event categories and classifying extracted events based on latent variable models is also presented.

Probabilistic Frame Induction

TLDR
This paper proposes the first probabilistic approach to frame induction, which incorporates frames, events, and participants as latent topics and learns those frame and event transitions that best explain the text.

Extracting Social Networks from Literary Fiction

TLDR
The method involves character name chunking, quoted speech attribution, and conversation detection given the set of quotes; it provides evidence that the majority of novels in this time period do not fit two characterizations provided by literary scholars.

A Bayesian Mixed Effects Model of Literary Character

TLDR
A model that employs multiple effects to account for the influence of extra-linguistic information (such as author) is introduced and it is found that this method leads to improved agreement with the preregistered judgments of a literary scholar, complementing the results of alternative models.

Topics over time: a non-Markov continuous-time model of topical trends

TLDR
An LDA-style topic model is presented that captures not only the low-dimensional structure of data, but also how the structure changes over time, showing improved topics, better timestamp prediction, and interpretable trends.

Organizing the OCA: learning faceted subjects from a library of digital books

TLDR
DCM-LDA, a topic model based on Dirichlet Compound Multinomial distributions, is presented, which is simultaneously better able to represent observed properties of text and more scalable to extremely large text collections.
...