Corpus ID: 15401573

Using Artificial Intelligence to Identify State Secrets

Renato Rocha Souza, Flávio Codeço Coelho, Rohan Shah, M. Connelly
Whether officials can be trusted to protect national security information has become a matter of great public controversy, reigniting a long-standing debate about the scope and nature of official secrecy. The declassification of millions of electronic records has made it possible to analyze these issues with greater rigor and precision. Using machine-learning methods, we examined nearly a million State Department cables from the 1970s to identify features of records that are more likely to be…
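The abstract, truncated here, does not say which models were used. As a hedged illustration of one simple way to surface features that separate one class of records from another, the sketch below ranks tokens by a smoothed log-odds ratio of their document frequencies in two classes. The helper `log_odds_features`, the add-one smoothing, and the toy documents are assumptions made for this example; this is not the authors' method or data.

```python
import math
from collections import Counter


def log_odds_features(docs, labels, positive):
    """Rank tokens by smoothed log-odds of appearing in `positive` documents vs. all others.

    Hypothetical helper: scores use document frequency with add-one smoothing.
    """
    pos_df, neg_df = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        # Count each token at most once per document (document frequency).
        (pos_df if label == positive else neg_df).update(set(tokens))
    n_pos = sum(1 for label in labels if label == positive)
    n_neg = len(labels) - n_pos

    def score(token):
        # Laplace-smoothed share of documents in each class that contain the token.
        p = (pos_df[token] + 1) / (n_pos + 2)
        q = (neg_df[token] + 1) / (n_neg + 2)
        return math.log(p / (1 - p)) - math.log(q / (1 - q))

    vocab = set(pos_df) | set(neg_df)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(vocab, key=lambda t: (-score(t), t))


# Invented toy documents, not real cables.
docs = [["talks", "embassy"], ["talks", "visit"], ["visa", "embassy"], ["visa", "weather"]]
labels = ["classified", "classified", "unclassified", "unclassified"]
print(log_odds_features(docs, labels, positive="classified"))
# ['talks', 'visit', 'embassy', 'weather', 'visa']
```

Tokens near the top appear disproportionately in the positive class; a real study would also need to control for rare tokens, which add-one smoothing only partly does.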
A framework for technology-assisted sensitivity review: using sensitivity classification to prioritise documents for review
This is the first thesis to investigate automatically classifying FOIA-sensitive information to assist digital sensitivity review. It proposes a novel framework for technology-assisted sensitivity review that can prioritise the most appropriate documents to be reviewed at specific stages of the review process. Published in ACM SIGIR Forum.
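The summary does not spell out a prioritisation rule. A minimal sketch, assuming each document already carries a classifier's estimated probability of being sensitive, shows two plausible orderings for a review queue: most-likely-sensitive first, and least-confident first. The `Doc` type, the `prioritise` function, and both strategy names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    p_sensitive: float  # classifier's estimated probability that the document is sensitive


def prioritise(docs, strategy="likely_sensitive"):
    """Order documents for review under one of two hypothetical strategies."""
    if strategy == "likely_sensitive":
        # Highest estimated probability first; ties broken by id.
        key = lambda d: (-d.p_sensitive, d.doc_id)
    elif strategy == "uncertain":
        # Closest to 0.5 first: the documents the classifier is least sure about.
        key = lambda d: (abs(d.p_sensitive - 0.5), d.doc_id)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return sorted(docs, key=key)


queue = [Doc("c1", 0.10), Doc("c2", 0.92), Doc("c3", 0.55), Doc("c4", 0.70)]
print([d.doc_id for d in prioritise(queue)])               # ['c2', 'c4', 'c3', 'c1']
print([d.doc_id for d in prioritise(queue, "uncertain")])  # ['c3', 'c4', 'c1', 'c2']
```

Which ordering fits depends on the review stage: the first front-loads probable exemptions, while the second directs reviewers to the documents the classifier is least sure about.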
How the Accuracy and Confidence of Sensitivity Classification Affects Digital Sensitivity Review
The findings demonstrate that sensitivity classification is a viable technology for assisting human reviewers with the sensitivity review of digital documents.
Secure trustless text processing of sensitive documents
This paper proposes a protocol that allows text analytics tasks to be securely outsourced without compromising the confidentiality of the documents; the protocol's time complexity is linear in the size of the corpus.
Developing an information classification method
The empirical demonstration shows that senior and novice information security managers perceive the proposed method as a useful tool for classifying information assets in an organisation, and proves that it is possible to devise a method to support information classification.


Estimating the Severity of the WikiLeaks U.S. Diplomatic Cables Disclosure
This work provides a useful characterization of the sample of documents available to international relations scholars interested in testing theories of “private information,” while helping inform the public debate surrounding Manning's trial and 35-year prison sentence.
Secrets and Leaks: The Dilemma of State Secrecy
Contents (excerpt): Acknowledgments; Who Watches the Watchers?; Chapter 1, The Problem: How to Regulate State Secrecy?; Chapter 2, Should We Rely on Judges? Transparency and the Problem of Judicial Deference
A Boom with Review: How Retrospective Oversight Increases the Foreign Policy Ability of Democracies
In the ongoing debate concerning whether democracies can carry out effective national security policy, the role of transparency costs has received little attention. I argue for a more nuanced…
A comparison of event models for Naive Bayes text classification
It is found that the multi-variate Bernoulli model performs well with small vocabulary sizes, but that the multinomial model usually performs even better at larger vocabulary sizes, providing on average a 27% reduction in error over the multi-variate Bernoulli model at any vocabulary size.
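The two event models differ in what a document contributes to the likelihood: the multinomial model scores every token occurrence, while the multi-variate Bernoulli model scores the presence or absence of every vocabulary word. The sketch below implements both from scratch on tokenised documents. The function names, the Laplace (add-one) smoothing, and the toy data are assumptions for illustration, not the paper's exact experimental setup.

```python
import math
from collections import Counter


def train(docs, labels, vocab):
    """Fit multinomial and multi-variate Bernoulli naive Bayes on tokenised documents."""
    classes = sorted(set(labels))
    prior, multinomial, bernoulli = {}, {}, {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        prior[c] = math.log(len(class_docs) / len(docs))
        # Multinomial event model: P(word | class) from pooled token counts.
        counts = Counter(w for d in class_docs for w in d if w in vocab)
        total = sum(counts.values())
        multinomial[c] = {w: math.log((counts[w] + 1) / (total + len(vocab))) for w in vocab}
        # Bernoulli event model: P(word appears in a document | class).
        doc_freq = Counter(w for d in class_docs for w in set(d) if w in vocab)
        bernoulli[c] = {w: (doc_freq[w] + 1) / (len(class_docs) + 2) for w in vocab}
    return classes, prior, multinomial, bernoulli


def predict_multinomial(doc, model):
    classes, prior, multinomial, _ = model
    # Every token occurrence counts; vocabulary words absent from the doc contribute nothing.
    return max(classes, key=lambda c: prior[c] + sum(multinomial[c][w] for w in doc if w in multinomial[c]))


def predict_bernoulli(doc, model):
    classes, prior, _, bernoulli = model
    present = set(doc)

    def score(c):
        # Every vocabulary word contributes: log p if present, log(1 - p) if absent.
        return prior[c] + sum(math.log(p if w in present else 1 - p) for w, p in bernoulli[c].items())

    return max(classes, key=score)


docs = [["secret", "cable", "secret"], ["secret", "embassy"], ["weather", "report"], ["report", "cable"]]
labels = ["classified", "classified", "unclassified", "unclassified"]
vocab = {"secret", "cable", "embassy", "weather", "report"}
model = train(docs, labels, vocab)
print(predict_multinomial(["secret", "secret"], model))  # classified
print(predict_bernoulli(["weather"], model))             # unclassified
```

Because the Bernoulli model multiplies in a factor for every absent vocabulary word, one intuition is that its behaviour is more sensitive to vocabulary size than the multinomial model's.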
Transforming classifier scores into accurate multiclass probability estimates
This work shows how to obtain accurate probability estimates for multiclass problems by combining calibrated binary probability estimates, and proposes a new method for obtaining calibrated two-class probability estimates that can be applied to any classifier that produces a ranking of examples.
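The summary does not name the calibration method. Isotonic regression fitted by pool-adjacent-violators is one ranking-based approach that matches the description, since it uses only the order of the scores; normalising one-vs-rest outputs is the simplest way to combine binary estimates. The sketch below shows both under those assumptions; the function names and toy data are hypothetical.

```python
import bisect
from itertools import groupby


def fit_isotonic(scores, labels):
    """Pool-adjacent-violators: a non-decreasing map from score to P(label = 1).

    Returns (thresholds, probs): each block's lowest score and its empirical positive rate.
    """
    blocks = []  # each block is [sum_of_labels, count, lowest_score]
    for score, group in groupby(sorted(zip(scores, labels)), key=lambda p: p[0]):
        ys = [y for _, y in group]  # tied scores start in the same block
        blocks.append([float(sum(ys)), len(ys), score])
        # Merge backwards while the block means would decrease (a violation).
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count, _ = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    return [b[2] for b in blocks], [b[0] / b[1] for b in blocks]


def calibrate(score, thresholds, probs):
    # Step function: use the last block whose lowest score is <= the query score.
    i = bisect.bisect_right(thresholds, score) - 1
    return probs[max(i, 0)]


def combine_one_vs_rest(binary_probs):
    """Normalise calibrated one-vs-rest probabilities so they sum to 1."""
    total = sum(binary_probs.values())
    if total == 0:
        return {c: 1 / len(binary_probs) for c in binary_probs}
    return {c: p / total for c, p in binary_probs.items()}


thresholds, probs = fit_isotonic([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8], [0, 0, 1, 0, 1, 0, 1, 1])
print(probs)                                # [0.0, 0.0, 0.5, 0.5, 1.0, 1.0]
print(calibrate(0.45, thresholds, probs))   # 0.5
print(combine_one_vs_rest({"a": 0.6, "b": 0.3, "c": 0.3}))
```

Simple normalisation ignores how the binary problems interact; more careful coupling schemes exist, and this sketch does not claim to reproduce the paper's combination rule.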
Data Mining Approach to Detect Heart Diseases
Globally, heart diseases are the number one cause of death. About 80% of these deaths occurred in low- and middle-income countries. If current trends are allowed to continue, by 2030 an estimated 23.6…
The relationship between Precision-Recall and ROC curves
It is shown that a deep connection exists between ROC space and PR space, such that a curve dominates in ROC space if and only if it dominates in PR space.
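Both curves are traced from the same confusion counts as a cutoff moves down a ranking, and once the numbers of positives and negatives are fixed, each ROC point maps to exactly one PR point. The sketch below computes both from a toy ranking and checks that mapping; it illustrates the correspondence the result relies on rather than proving dominance. Function names and data are hypothetical, and scores are assumed to be distinct.

```python
def sweep(scores, labels):
    """(TP, FP) counts at each cutoff, moving down a ranking (assumes distinct scores)."""
    tp = fp = 0
    counts = []
    for _, y in sorted(zip(scores, labels), key=lambda p: -p[0]):
        if y == 1:
            tp += 1
        else:
            fp += 1
        counts.append((tp, fp))
    return counts


def roc_point(tp, fp, n_pos, n_neg):
    return fp / n_neg, tp / n_pos  # (FPR, TPR)


def pr_point(tp, fp, n_pos):
    return tp / n_pos, tp / (tp + fp)  # (recall, precision)


def roc_to_pr(fpr, tpr, n_pos, n_neg):
    """Map a ROC point to its PR point, given fixed class counts."""
    predicted_pos = tpr * n_pos + fpr * n_neg  # equals TP + FP
    precision = tpr * n_pos / predicted_pos if predicted_pos else None
    return tpr, precision  # recall equals TPR


scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 1, 0, 1, 0]
n_pos = sum(labels)
n_neg = len(labels) - n_pos
for tp, fp in sweep(scores, labels):
    fpr, tpr = roc_point(tp, fp, n_pos, n_neg)
    print((fpr, tpr), pr_point(tp, fp, n_pos), roc_to_pr(fpr, tpr, n_pos, n_neg))
```

Because precision depends on the class counts as well as on TPR and FPR, PR curves can look very different across datasets with different class balance even when the ROC curves are identical.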
Random Forests
  • L. Breiman
  • Machine Learning
  • 2001
Internal estimates monitor error, strength, and correlation, and these are used to show the response to increasing the number of features used in splitting. The same ideas are also applicable to regression.
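Among the internal estimates, the out-of-bag (OOB) error is the simplest to show: each tree is trained on a bootstrap sample, and the rows it never saw are used to estimate error. The sketch below is a deliberately simplified forest that assumes depth-1 stumps instead of the fully grown trees Breiman uses, so sampling features per tree is the same as sampling them per split. All names and the toy data are hypothetical, and the strength and correlation estimates are not shown.

```python
import random
from collections import Counter


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, features):
    """Depth-1 tree: the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in features:
        for v in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= v]
            right = [y for r, y in zip(rows, labels) if r[f] > v]
            left_label = majority(left)  # never empty: v is an observed value
            right_label = majority(right) if right else left_label
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, v, left_label, right_label)
    _, f, v, left_label, right_label = best
    return lambda x: left_label if x[f] <= v else right_label


def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    """Bagged stumps, each on a random feature subset; returns (trees, OOB error)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    trees, oob_votes = [], [Counter() for _ in range(n)]
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # bootstrap: n rows with replacement
        features = rng.sample(range(d), max_features)  # random features for this tree's one split
        tree = fit_stump([X[i] for i in sample], [y[i] for i in sample], features)
        trees.append(tree)
        for i in set(range(n)) - set(sample):  # rows this tree never saw
            oob_votes[i][tree(X[i])] += 1
    scored = [i for i in range(n) if oob_votes[i]]
    errors = sum(oob_votes[i].most_common(1)[0][0] != y[i] for i in scored)
    oob_error = errors / len(scored) if scored else float("nan")
    return trees, oob_error


def predict(trees, x):
    return majority([tree(x) for tree in trees])


X = [[i, 10 - i] for i in range(10)]
y = [0] * 5 + [1] * 5
trees, oob_error = fit_forest(X, y)
print(oob_error)
print(predict(trees, [0, 10]), predict(trees, [9, 1]))  # 0 1
```

The OOB estimate needs no held-out set, which is why it is convenient for tracking how error responds as settings such as `max_features` change.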
Distributional Structure
This paper discusses how each language can be described in terms of a distributional structure, i.e. in terms of the occurrence of parts relative to other parts, and argues that this description is complete without the intrusion of other features such as history or meaning.
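The paper is theoretical and prescribes no algorithm. As a modern, hedged illustration of the distributional idea, the sketch below describes each word by counts of its neighbouring words and compares words by the cosine of those count vectors, so words that occur in the same environments come out similar. The window size, the use of cosine similarity, and the toy sentences are assumptions.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=1):
    """Count, for each word, the words occurring within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


sentences = [s.split() for s in [
    "the cable was classified", "the memo was classified",
    "the cable was released", "the memo was released", "a dog barked",
]]
vecs = context_vectors(sentences)
print(round(cosine(vecs["cable"], vecs["memo"]), 3))  # 1.0
print(cosine(vecs["cable"], vecs["dog"]))             # 0.0
```

Here "cable" and "memo" share every context, while "dog" shares none, so the comparison relies only on relative occurrence and not on meaning.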
Natural Language Processing with Python
This book offers a highly accessible introduction to natural language processing, the field that supports a variety of language technologies, from predictive text and email filtering to automatic…