Matching Known Patients to Health Records in Washington State Data

@article{Sweeney2013MatchingKP,
  title={Matching Known Patients to Health Records in Washington State Data},
  author={Latanya Sweeney},
  journal={ArXiv},
  year={2013},
  volume={abs/1307.1370}
}
The state of Washington sells patient-level health data for $50. This publicly available dataset contains virtually all hospitalizations occurring in the state in a given year, including patient demographics, diagnoses, procedures, attending physician, hospital, a summary of charges, and how the bill was paid. It does not contain patient names or addresses (only ZIP codes). Newspaper stories printed in the state for the same year that contain the word “hospitalized” often include a patient’s name and… 
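The attack described above is a form of record linkage: attributes gleaned from a news story are compared with the released records, and a story is re-identified when exactly one record agrees with it. The toy sketch below illustrates only that matching step, under stated assumptions. The `HospitalRecord` fields, the sample rows, and the exact-equality matching rule are all hypothetical; they are not taken from the Washington data or from the paper's own procedure.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass(frozen=True)
class HospitalRecord:
    # Hypothetical fields loosely modeled on the attributes the release contains
    record_id: int
    age: int
    sex: str
    zip5: str
    hospital: str
    diagnosis: str

def candidate_matches(records: List[HospitalRecord], clues: Dict[str, object]) -> List[HospitalRecord]:
    """Return every record whose fields equal all attributes extracted from a story."""
    return [r for r in records if all(getattr(r, field) == value for field, value in clues.items())]

def reidentify(records: List[HospitalRecord], clues: Dict[str, object]) -> Optional[HospitalRecord]:
    """Return the record only if the clues single it out; otherwise None."""
    matches = candidate_matches(records, clues)
    return matches[0] if len(matches) == 1 else None

# Made-up sample rows, for illustration only
records = [
    HospitalRecord(1, 61, "M", "98101", "Hospital A", "hip fracture"),
    HospitalRecord(2, 61, "F", "98101", "Hospital A", "pneumonia"),
    HospitalRecord(3, 45, "M", "98101", "Hospital B", "burns"),
]

# Hypothetical clues a news story might yield: age, sex, and town mapped to a ZIP code
story = {"age": 61, "sex": "M", "zip5": "98101"}
match = reidentify(records, story)
print(match.record_id if match else "no unique match")  # prints 1
```

Requiring an exact match on every clue is the simplest possible rule; a realistic linkage would also need to tolerate approximate ages, date ranges, and inconsistently reported place names.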

Citations

Survey of Publicly Available State Health Databases
TLDR
It is found that states varied widely in whether their data was HIPAA equivalent; while 13 were equivalent (or stricter) with demographic fields, only 3 of the 33 states that released data did so in a form that was HIPAA equivalent across all fields.
Data & Civil Rights: Health Primer
TLDR
Not only does misuse of data have consequences for individuals seeking fair access to healthcare, but inappropriate practices also erode productive efforts to use data to empower people, personalize medicine, and develop innovations that can advance healthcare.
Evaluating the re-identification risk of a clinical study report anonymized under EMA Policy 0070 and Health Canada Regulations
TLDR
It is suggested that the anonymization guidance from these agencies can provide adequate privacy protection for patients, and the modes of attack can inform further refinements of the methodologies they recommend in their guidance for manufacturers.
What are the optimum quasi-identifiers to re-identify medical records?
  • Yong Ju Lee, Kyung Ho Lee
  • Computer Science
    2018 20th International Conference on Advanced Communication Technology (ICACT)
  • 2018
TLDR
A comparative analysis of the probability of re-identification according to the type of each inferable quasi-identifier (one that can be inferred from background knowledge) and the range within which its value can be inferred.
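A common way to compare quasi-identifier sets is to group records that share the same values on the chosen quasi-identifiers (equivalence classes), then measure either the fraction of records that are alone in their class or the average of 1/k over records, where k is the class size. The sketch below computes both on made-up rows. It is a generic illustration of these standard measures under those assumptions, not the specific probability model used by Lee and Lee; the column names and data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence

Row = Dict[str, object]

def class_key(row: Row, quasi_ids: Sequence[str]) -> tuple:
    """The combination of quasi-identifier values that defines a row's equivalence class."""
    return tuple(row[q] for q in quasi_ids)

def uniqueness(rows: List[Row], quasi_ids: Sequence[str]) -> float:
    """Fraction of rows that no other row shares on the chosen quasi-identifiers."""
    if not rows:
        return 0.0
    sizes = Counter(class_key(r, quasi_ids) for r in rows)
    return sum(1 for r in rows if sizes[class_key(r, quasi_ids)] == 1) / len(rows)

def average_risk(rows: List[Row], quasi_ids: Sequence[str]) -> float:
    """Mean over rows of 1/k, where k is the size of the row's equivalence class."""
    if not rows:
        return 0.0
    sizes = Counter(class_key(r, quasi_ids) for r in rows)
    return sum(1 / sizes[class_key(r, quasi_ids)] for r in rows) / len(rows)

# Hypothetical rows; not drawn from any real dataset
rows = [
    {"zip": "98101", "sex": "F", "birth_year": 1950},
    {"zip": "98101", "sex": "F", "birth_year": 1950},
    {"zip": "98101", "sex": "M", "birth_year": 1962},
    {"zip": "98052", "sex": "M", "birth_year": 1962},
]
qi = ["zip", "sex", "birth_year"]
print(uniqueness(rows, ["zip"]), uniqueness(rows, qi), average_risk(rows, qi))  # prints 0.25 0.5 0.75
```

Because adding a quasi-identifier can only split equivalence classes, neither measure decreases as the quasi-identifier set grows.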
Privacy-preserving association rule mining for horizontally partitioned healthcare data: a case study on the heart diseases
TLDR
An approach to privacy-preserving distributed association rule mining (PPDARM) is proposed in which all local EHR systems collaboratively mine association rules while preserving privacy; the approach is also analysed on a heart disease dataset.
Private collection: high correlation of sample collection and patient admission date in clinical microbiological testing complicates sharing of phylodynamic metadata
TLDR
It is suggested that publicly depositing microbiological collection dates at greater resolution than the year may not meet routine Safe Harbor-based requirements for patient de-identification.
Privacy-Preserving Data Analysis for the Federal Statistical Agencies
Government statistical agencies collect enormously valuable data on the nation's population and business activities. Wide access to these data enables evidence-based policy making, supports new
Evaluating Identity Disclosure Risk in Fully Synthetic Health Data: Model Development and Validation (Preprint)
TLDR
A full risk model is presented that evaluates both identity disclosure and an adversary's ability to learn something new when a synthetic record matches a real person; the model can be applied in the future to evaluate the privacy of fully synthetic data.
Evaluating Identity Disclosure Risk in Fully Synthetic Health Data: Model Development and Validation
TLDR
The results for this synthesis method on 2 datasets demonstrate that synthesis can reduce meaningful identity disclosure risks considerably, and the risk model can be applied in the future to evaluate the privacy of fully synthetic data.
...

References

Showing 1-10 of 27 references
Survey of Publicly Available State Health Databases
TLDR
It is found that states varied widely in whether their data was HIPAA equivalent; while 13 were equivalent (or stricter) with demographic fields, only 3 of the 33 states that released data did so in a form that was HIPAA equivalent across all fields.
Medicaid markets and pediatric patient safety in hospitals.
TLDR
The authors' analysis adds insights to previous work and suggests a new factor, the Medicaid-payer market, as relevant to pediatric patient safety.
Identifying Participants in the Personal Genome Project by Name
TLDR
Technological remedies are proposed that help people learn about their demographics and make better decisions. The work revisits an old vulnerability that could easily be thwarted with minimal loss of research value.
Weaving Technology and Policy Together to Maintain Confidentiality
  • L. Sweeney
  • Computer Science
    The Journal of Law, Medicine & Ethics: A Journal of the American Society of Law, Medicine & Ethics
  • 1997
TLDR
Three general-purpose computer programs for maintaining patient confidentiality when disclosing electronic medical records are examined: the Scrub System, which locates and suppresses or replaces personally identifying information in letters between doctors and in notes written by clinicians; the Datafly System, which generalizes values based on a profile of the data recipient at the time of disclosure; and the μ-Argus System, a somewhat similar system that is becoming a European standard for disclosing public-use data.
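The TLDR says only that Datafly generalizes values. The sketch below shows one simple way such generalization can work: a greedy generalize-then-suppress loop in the spirit of Datafly, not the system's actual code. The ZIP and age hierarchies, the group-size threshold `k`, the suppression budget `max_suppressed`, and the rule of coarsening the attribute with the most distinct values are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

def zip_at(z: str, level: int) -> str:
    """Mask the last `level` digits of a ZIP code with '*'."""
    keep = max(len(z) - level, 0)
    return z[:keep] + "*" * (len(z) - keep)

def age_at(age: int, level: int) -> str:
    """Level 0: exact age; level 1: 10-year band; level 2: fully masked."""
    if level == 0:
        return str(age)
    if level == 1:
        low = age // 10 * 10
        return f"{low}-{low + 9}"
    return "*"

# Hypothetical generalization hierarchies: attribute -> (function, highest level)
HIERARCHIES = {"zip": (zip_at, 5), "age": (age_at, 2)}
ATTRS = list(HIERARCHIES)

def generalized_view(rows: List[dict], levels: Dict[str, int]) -> List[Tuple[str, ...]]:
    """Apply each attribute's current generalization level to every row."""
    return [tuple(HIERARCHIES[a][0](row[a], levels[a]) for a in ATTRS) for row in rows]

def datafly_sketch(rows: List[dict], k: int, max_suppressed: int):
    """Greedy generalize-then-suppress loop (a simplified, hypothetical variant).

    Repeatedly coarsens the attribute with the most distinct generalized values
    until at most `max_suppressed` rows fall in classes smaller than k, then
    drops (suppresses) those rows and returns the rest plus the chosen levels.
    """
    levels = {a: 0 for a in ATTRS}
    while True:
        view = generalized_view(rows, levels)
        sizes = Counter(view)
        too_small = sum(1 for v in view if sizes[v] < k)
        if too_small <= max_suppressed:
            return [v for v in view if sizes[v] >= k], levels
        candidates = [a for a in ATTRS if levels[a] < HIERARCHIES[a][1]]
        if not candidates:
            return [], levels  # nothing left to coarsen, so suppress every row
        distinct = {a: len({v[ATTRS.index(a)] for v in view}) for a in candidates}
        levels[max(candidates, key=distinct.__getitem__)] += 1

# Made-up rows for illustration
rows = [
    {"zip": "98101", "age": 61},
    {"zip": "98102", "age": 64},
    {"zip": "98103", "age": 67},
    {"zip": "98104", "age": 45},
    {"zip": "98105", "age": 48},
]
released, levels = datafly_sketch(rows, k=2, max_suppressed=0)
print(levels)  # prints {'zip': 1, 'age': 1}
```

Ties between attributes go to the first one listed in `HIERARCHIES`; that ordering, like the hierarchies themselves, is an arbitrary choice for this example.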
Simple Demographics Often Identify People Uniquely
In this document, I report on experiments I conducted using 1990 U.S. Census summary data to determine how many individuals within geographically situated populations had combinations of demographic
Effect of critical access hospital conversion on patient safety.
TLDR
Conversion of rural Iowa hospitals to critical access hospital (CAH) status was associated with better risk-adjusted rates of iatrogenic pneumothorax, selected infections due to medical care, and accidental puncture or laceration, and with a better composite score across four patient safety indicators (PSIs), but had no significant impact on the observed rates of death in low-mortality diagnosis-related groups (DRGs).
The 'Re-Identification' of Governor William Weld's Medical Information: A Critical Re-Examination of Health Data Identification Risks and Privacy Protections, Then and Now
TLDR
This paper critically examines the historic Weld re-identification and the dramatic (thousands-fold) reductions in re-identification risk for de-identified health data protected under the HIPAA Privacy Rule's de-identification provisions since 2003.
A re-examination of distance as a proxy for severity of illness and the implications for differences in utilization by race/ethnicity.
TLDR
The study analyzes the hospitalization patterns of elderly residents to examine whether the relation between distant travel and severity of illness is uniform across racial/ethnic subgroups, and indicates that minorities are likely to have higher severity thresholds in seeking distant hospital care.
Medicare and Medicaid programs; electronic health record incentive program. Final rule.
This final rule implements the provisions of the American Recovery and Reinvestment Act of 2009 (ARRA) (Pub. L. 111-5) that provide incentive payments to eligible professionals (EPs), eligible
Impact of the HealthChoice Program on Cesarean Section and Vaginal Birth after C-Section Deliveries: A Retrospective Analysis
  • Arpit Misra
  • Medicine, Political Science
    Maternal and Child Health Journal
  • 2007
TLDR
It is shown that there has been an overall increase in the use of primary and repeat cesarean sections in Maryland hospitals; however, HealthChoice limited this increase for Medicaid enrollees relative to privately insured women, and vaginal births after C-section have declined.
...