Runners, repeaters, strangers and aliens: Operationalising efficient output disclosure control

@article{Alves2020RunnersRS,
  title={Runners, repeaters, strangers and aliens: Operationalising efficient output disclosure control},
  author={Kyle Alves and Felix Ritchie},
  journal={Statistical Journal of the IAOS},
  year={2020},
  volume={36},
  pages={1281--1293}
}
Statistical agencies and other government bodies are increasingly using secure remote research facilities to provide access to sensitive data for research as an efficient way to increase productivity. Such facilities depend on human intervention to ensure that the research outputs do not breach statistical disclosure control (SDC) rules. Output SDC can be principles-based, rules-based, or ad hoc. The principles-based approach is often seen as the gold standard when viewed in statistical terms, as it…

10 Is the Safest Number That There's Ever Been

When checking frequency and magnitude tables for disclosure risk, the cell threshold (the minimum number of observations in each cell) is the crucial statistic. In rules-based environments,
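
As a rough illustration of the cell-threshold check described in this snippet, the following Python sketch flags cells in a frequency table whose counts fall below a minimum. The function name `cells_below_threshold`, the default threshold of 10, the example table, and the choice to skip empty cells are all assumptions made for illustration; they are not taken from the cited paper.

```python
from typing import Dict, Hashable, List


def cells_below_threshold(
    counts: Dict[Hashable, int], threshold: int = 10
) -> List[Hashable]:
    """Return the labels of non-empty cells with fewer than `threshold` observations.

    Assumption: empty cells (count 0) are not flagged here. Whether zeros
    are themselves disclosive depends on the rules of the facility.
    """
    return [label for label, n in counts.items() if 0 < n < threshold]


# Hypothetical frequency table: (region, status) -> number of observations.
table = {
    ("north", "employed"): 42,
    ("north", "unemployed"): 7,
    ("south", "employed"): 10,
    ("south", "unemployed"): 0,
}

flagged = cells_below_threshold(table)
print(flagged)
```

A cell that exactly meets the threshold passes the check in this sketch; a stricter rule would change the comparison to `n <= threshold`.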

References

Principles- versus rules-based output statistical disclosure control in remote access environments

In recent years, the level of detail in confidential data made available to social scientists has increased dramatically. Much of this has been due to the growth in secure data access facilities,

Disclosure detection in research environments in practice

There is an increasing demand for access to raw confidential data, and NSIs have responded by setting up controlled research facilities. However, the most common approaches to statistical

Operationalising ‘safe statistics’: The case of linear regression

The recent growth in research access to confidential government microdata has prompted the development of more general 'output-based statistical disclosure control' (OSDC) methods which go beyond

Analyzing the disclosure risk of regression coefficients

It is demonstrated that linear regression coefficients show no substantive disclosure risks in realistic environments, and so should be considered as ‘safe statistics’ in the terminology of this field.

Lessons learned in training ‘safe users’ of confidential data

This paper summarises recent learning about training users of confidential data: what they can learn, what they don’t learn, and how to extract the full benefit from training for both parties.

Confidentiality and linked data

In summary, linked data does present a strong theoretical challenge to the protection of data, as statistical protection is outgunned by technology and software; but in practice a shift in focus to the evidence-based user-centred view shows that there are many directions for practical data protection to go.

Evidence-based, context-sensitive, user-centred, risk-managed SDC planning: Designing data access solutions for scientific use

The case for an evidence-based holistic approach to data access management is summarized, which considers the universality of the ‘intruder’ model, and the focus on quantifiable measures of risk, when uncertainty is the true problem.

Disclosure Risks in Releasing Output Based on Regression Residuals

The U.S. Census Bureau’s Center for Economic Studies (CES) and its network of Census Research Data Centers (RDCs) provide restricted access to non-publicly available Census Bureau data files for

Foundations and implications of a proposed unified services theory

Diverse businesses, such as garbage collection, retail banking, and management consulting are often tied together under the heading of “services”, based on little more than a perception that they are

Model Diagnostics for Remote Access Regression Servers

The proposed synthetic diagnostics can reveal model inadequacies without substantial increase in the risk of disclosures, and can be used to develop remote server diagnostics for generalized linear models.
...