Mining events with declassified diplomatic documents

Yuanjun Gao, J. Goetz, R. Mazumder, and M. Connelly. arXiv: Methodology.
Since 1973 the State Department has used electronic records systems to preserve classified communications. Recently, approximately 1.9 million of these records from 1973-77 have been made available by the U.S. National Archives. While some of these communication streams have periods in which the rate of transmission accelerates, others show no notable patterns in communication intensity. Given the sheer volume of these communications -- far greater than what had been…
The ‘assertive edition’
The paper suggests the use of embedded RDF representations in TEI markup, following the practice in several recent projects, and it concludes with a proposal for a definition of the ‘assertive edition’.
The ‘assertive edition’: On the consequences of digital methods in scholarly editing for historians
The paper describes historians' special interest in editions and the resulting editorial practice, in contrast to the methods of purely philological textual criticism. The interest in…
Big tobacco focuses on the facts to hide the truth: an algorithmic exploration of courtroom tropes and taboos
Quantitative analysis can reveal heretofore hidden patterns in courtroom rhetoric, including the weaponisation of pronouns and the systematic avoidance of certain terms, such as ‘profits’ or ‘customer’.


Bursty and Hierarchical Structure in Streams
The goal of the present work is to develop a formal approach for modeling such “bursts,” in such a way that they can be robustly and efficiently identified, and can provide an organizational framework for analyzing the underlying content.
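To make the burst idea concrete, here is a minimal Python sketch of a two-state burst detector in the spirit of the automaton model described in this entry. It is a simplification under stated assumptions, not the paper's exact formulation: per-step counts are modeled as Poisson (an assumption), the burst state's rate is `s` times the baseline, moving up into the burst state costs `gamma * ln(n)`, and the cheapest state sequence is found with the Viterbi algorithm. The names `detect_bursts`, `poisson_nll`, `s`, and `gamma` are hypothetical.

```python
import math
from typing import List, Sequence


def poisson_nll(x: int, lam: float) -> float:
    """Negative log-likelihood of count x under a Poisson(lam) model."""
    return lam - x * math.log(lam) + math.lgamma(x + 1)


def detect_bursts(counts: Sequence[int], s: float = 2.0, gamma: float = 1.0) -> List[int]:
    """Label each time step 0 (baseline) or 1 (burst) with a two-state Viterbi pass.

    Hypothetical helper: Poisson emissions and the cost parameters are assumptions.
    """
    n = len(counts)
    if n == 0:
        return []
    base = max(sum(counts) / n, 1e-9)   # baseline rate for state 0
    rates = (base, s * base)            # state 1 emits at s times the baseline
    up_cost = gamma * math.log(n)       # price of moving up into the burst state
    # cost[q]: cheapest total cost of any state path ending in state q so far
    cost = [poisson_nll(counts[0], rates[0]),
            up_cost + poisson_nll(counts[0], rates[1])]
    back: List[List[int]] = []
    for x in counts[1:]:
        new_cost, ptr = [], []
        for q in (0, 1):
            # Entering state 1 from state 0 pays up_cost; staying or dropping down is free.
            options = [cost[p] + (up_cost if (p, q) == (0, 1) else 0.0) for p in (0, 1)]
            p_best = 0 if options[0] <= options[1] else 1
            new_cost.append(options[p_best] + poisson_nll(x, rates[q]))
            ptr.append(p_best)
        cost = new_cost
        back.append(ptr)
    # Trace the cheapest path backwards from the best final state.
    state = 0 if cost[0] <= cost[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


print(detect_bursts([2, 3, 2, 2, 10, 12, 11, 2, 3, 2]))  # [0, 0, 0, 0, 1, 1, 1, 0, 0, 0]
```

Raising `gamma` makes entering the burst state more expensive, so only stronger or longer surges are labeled as bursts.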
Analyzing feature trajectories for event detection
We consider the problem of analyzing word trajectories in both time and frequency domains, with the specific goal of identifying important and less-reported, periodic and aperiodic words. A set of…
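A frequency-domain step of the kind this entry mentions can be sketched with a discrete Fourier transform of a word's frequency trajectory. This is a minimal illustration under assumptions, not the paper's exact procedure: the mean is removed, the DC term is ignored, the dominant period is taken as `n / k` for the strongest frequency index `k`, and a word counts as aperiodic when that period exceeds half the window. `dominant_period` and `is_periodic` are hypothetical names.

```python
import numpy as np


def dominant_period(trajectory):
    """Return (dominant period, spectral power) of a word's frequency trajectory.

    Assumptions: the mean is removed first and the DC term (k = 0) is ignored;
    frequency index k corresponds to a period of n / k time steps. Needs n >= 2.
    """
    y = np.asarray(trajectory, dtype=float)
    n = y.size
    power = np.abs(np.fft.rfft(y - y.mean())) ** 2
    k = int(np.argmax(power[1:])) + 1  # strongest non-DC frequency
    return n / k, float(power[k])


def is_periodic(trajectory):
    """Hypothetical rule: call a word aperiodic if its dominant period exceeds half the window."""
    period, _ = dominant_period(trajectory)
    return period <= len(trajectory) / 2


days = np.arange(28)
weekly = 1 + np.sin(2 * np.pi * days / 7)    # repeats every 7 days
one_off = np.exp(-((days - 14) ** 2) / 8.0)  # a single bump mid-window
print(dominant_period(weekly)[0], is_periodic(weekly), is_periodic(one_off))  # 7.0 True False
```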
Parameter Free Bursty Events Detection in Text Streams
This paper proposes a novel parameter-free probabilistic approach, called feature-pivot clustering, which fully utilizes time information to determine a set of bursty features that may occur in different time windows.
A Survey of Techniques for Event Detection in Twitter
A survey of techniques for detecting events in Twitter streams, that is, real-world occurrences that unfold over space and time. It highlights the need for public benchmarks to evaluate the performance of different detection approaches and features.
What Should We Do about Source Selection in Event Data? Challenges, Progress, and Possible Solutions
This work summarizes recent studies of news selection and outlines a strategy for reducing the risks of possible selection bias, including techniques for generating multisource event inventories, estimating larger populations, and controlling for nonrandomness.
Optimal detection of changepoints with a linear computational cost
This work considers the problem of detecting multiple changepoints in large data sets. It introduces a method for minimizing a penalized cost function, and hence for finding the optimal number and location of changepoints, whose computational cost is linear in the number of observations.
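The pruned search described in this entry is known as PELT. Below is a minimal Python sketch for the special case of changes in the mean, assuming a squared-error segment cost and a fixed per-changepoint penalty `beta`; `pelt_mean` and `beta` are hypothetical names. The pruning step drops any candidate start `s` whose cost `F(s) + C(s, t)` already exceeds the current optimum `F(t)`, which is safe here with the constant K = 0 because splitting a segment never increases its squared-error cost.

```python
import numpy as np


def pelt_mean(y, beta):
    """Changepoints in the mean of y via PELT with a squared-error cost.

    Returns 0-based indices where a new segment starts. `beta` is the
    penalty paid per additional segment (a hypothetical, user-chosen value).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    s1 = np.concatenate(([0.0], np.cumsum(y)))      # prefix sums of y
    s2 = np.concatenate(([0.0], np.cumsum(y * y)))  # prefix sums of y^2

    def cost(a, b):
        # Sum of squared deviations of y[a:b] from its own mean, in O(1).
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    F = np.empty(n + 1)
    F[0] = -beta                       # so that a single segment pays no penalty
    last = np.zeros(n + 1, dtype=int)  # best start of the final segment ending at t
    candidates = [0]
    for t in range(1, n + 1):
        vals = [F[s] + cost(s, t) + beta for s in candidates]
        i = int(np.argmin(vals))
        F[t], last[t] = vals[i], candidates[i]
        # Prune: s can never be optimal later if F(s) + C(s, t) already exceeds F(t).
        candidates = [s for s, v in zip(candidates, vals) if v - beta <= F[t]] + [t]
    changepoints, t = [], n
    while t > 0:
        t = int(last[t])
        if t > 0:
            changepoints.append(t)
    return changepoints[::-1]


print(pelt_mean([0.0] * 20 + [5.0] * 20, beta=10.0))  # [20]
```

Without the pruning line this is the plain quadratic optimal-partitioning search; the pruning keeps the candidate set small when changepoints are spread through the series.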
Allende's Chile and the inter-American Cold War
Stalinist terror throughout Eastern Europe. The specifics of political and economic development and the dynamics within each Communist party accounted for the variations in the methods, speed,…
An algorithm for optimal partitioning of data on an interval
This letter describes a simple but powerful algorithm that searches the exponentially large space of partitions of N data points in time O(N^2), which is guaranteed to find the exact global optimum.
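The O(N^2) bound follows from a dynamic-programming recursion over the position of the last change. The letter frames the search as maximizing an additive block fitness; it is written here in the equivalent cost-minimizing form (an assumption made for consistency with the changepoint entries above), with a hypothetical block cost C and a per-block penalty β:

```latex
F(0) = -\beta, \qquad
F(t) = \min_{0 \le s < t} \Bigl[ F(s) + \mathcal{C}(y_{s+1:t}) + \beta \Bigr], \qquad t = 1, \dots, N
```

Computing F(t) scans the t candidate start positions s, so the whole pass evaluates 1 + 2 + ... + N = N(N+1)/2 block costs. That is O(N^2), provided each block cost can be evaluated in constant time (for example, from cumulative sums). Storing the minimizing s for each t lets the optimal partition be recovered by tracing back from F(N).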
Introductory Lectures on Convex Optimization - A Basic Course
It was in the middle of the 1980s, when the seminal paper by Karmarkar opened a new epoch in nonlinear optimization, that it became more and more common for new methods to be provided with a complexity analysis, which was considered a better justification of their efficiency than computational experiments.
Algorithms for the optimal identification of segment neighborhoods.
Two algorithms for the efficient identification of segment neighborhoods are presented and one application to the haemagglutinin protein of influenza virus reveals a possible mechanism for conformational change through the finding of a break in a strong heptad repeat structure.
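The segment-neighborhood idea can be sketched as a dynamic program that, for every k up to a chosen maximum, finds the best split of a series into exactly k segments. This Python sketch assumes a squared-error (change-in-mean) segment cost, an illustrative choice rather than the scoring used in the protein application above; `segment_neighborhood` and `max_segments` are hypothetical names. It runs in O(K N^2) time for K = `max_segments`.

```python
import numpy as np


def segment_neighborhood(y, max_segments):
    """Best split of y into exactly k segments, for every k = 1..max_segments.

    Returns {k: (total cost, changepoints)}; changepoints are 0-based segment starts.
    The squared-error cost is an illustrative assumption.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    s1 = np.concatenate(([0.0], np.cumsum(y)))
    s2 = np.concatenate(([0.0], np.cumsum(y * y)))

    def cost(a, b):
        # Sum of squared deviations of y[a:b] from its own mean, in O(1).
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    D = np.full((max_segments + 1, n + 1), np.inf)  # D[k, t]: best cost of y[:t] in k segments
    arg = np.zeros((max_segments + 1, n + 1), dtype=int)
    D[0, 0] = 0.0
    for k in range(1, max_segments + 1):
        for t in range(k, n + 1):
            for s in range(k - 1, t):  # the k-th segment is y[s:t]
                c = D[k - 1, s] + cost(s, t)
                if c < D[k, t]:
                    D[k, t], arg[k, t] = c, s
    result = {}
    for k in range(1, min(max_segments, n) + 1):
        changepoints, t = [], n
        for j in range(k, 1, -1):  # walk back through segments k, k-1, ..., 2
            t = int(arg[j, t])
            changepoints.append(t)
        result[k] = (float(D[k, n]), changepoints[::-1])
    return result


print(segment_neighborhood([0.0] * 10 + [5.0] * 10 + [0.0] * 10, 3)[3])  # (0.0, [10, 20])
```

Unlike a single penalized search, this returns the optimum for every segment count at once, leaving the choice of k to a separate criterion.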