PrivOnto: A semantic framework for the analysis of privacy policies

@article{Oltramari2018PrivOntoAS,
  title={PrivOnto: A semantic framework for the analysis of privacy policies},
  author={Alessandro Oltramari and Dhivya Piraviperumal and Florian Schaub and Shomir Wilson and Sushain Cherivirala and Thomas B. Norton and N. Cameron Russell and Peter Story and Joel R. Reidenberg and Norman M. Sadeh},
  journal={Semantic Web},
  year={2018},
  volume={9},
  pages={185--203}
}
Privacy policies are intended to inform users about the collection and use of their data by websites, mobile apps and other services or appliances they interact with. This also includes informing users about any choices they might have regarding such data practices. However, few users read these often long privacy policies, and those who do have difficulty understanding them because they are written in convoluted and ambiguous language. A promising approach to help overcome this situation…
Challenges in Automated Question Answering for Privacy Policies
TLDR
This work explores the idea of an automated privacy question-answering assistant, and looks at the kinds of questions users are likely to pose to such a system, as well as their ability to articulate questions in this domain.
Helping Users Understand Privacy Notices with Automated Query Answering Functionality: An Exploratory Study
TLDR
Query answering functionality could be configured to return short text fragments extracted from privacy notices and rely on the user to interpret the finer nuances of those fragments, an approach that could prove more robust than fully automated annotation techniques, which currently struggle with such nuances.
Analyzing Privacy Policies at Scale
TLDR
The results from these efforts show the effectiveness of using automated and semi-automated methods for extracting from privacy policies the data practice details that are salient to Internet users’ interests.
A Semantic-based Approach to Reduce the Reading Time of Privacy Policies
TLDR
This thesis presents the development of a domain ontology for privacy policies and indicates that the time required to read a policy is significantly reduced, as the ontology directs the user to the right content for a query.
Ambiguity and Generality in Natural Language Privacy Policies
TLDR
This work proposes an automated approach to infer semantic relations among information types and construct an ontology to guide requirements authors in selecting the most appropriate information type terms, yielding predictions that identify hypernymy relations in information type pairs with an F1 score of 0.904.
From Prescription to Description: Mapping the GDPR to a Privacy Policy Corpus Annotation Scheme
TLDR
A mapping from GDPR provisions to the OPP-115 annotation scheme is introduced, which serves as the basis for a growing number of projects to automatically classify privacy policy text, suggesting the feasibility of bridging existing computational and legal research on privacy policies, benefiting both areas.
Question Answering for Privacy Policies: Combining Computational and Legal Perspectives
TLDR
The PrivacyQA corpus, consisting of 1750 questions about the privacy policies of mobile applications and over 3500 expert annotations of relevant answers, offers a challenging dataset for question answering with genuine real-world utility.
An Ontology Design Pattern for Describing Personal Data in Privacy Policies
TLDR
An ontology design pattern is presented to assist the existing ecosystem of machine-based approaches for interpretation and visualisation of privacy policies by providing a common structured representation to ease modelling and sharing of related information.
Enhancing Readability of Privacy Policies Through Ontologies
TLDR
This thesis presents the development of a domain ontology using natural language processing (NLP) algorithms as a way to reduce costs and speed up development, and finds that using the ontology to locate key parts of privacy policies substantially reduces average reading times.

References

The Creation and Analysis of a Website Privacy Policy Corpus
TLDR
A corpus of 115 privacy policies with manual annotations for 23K fine-grained data practices is introduced and the process of using skilled annotators and a purpose-built annotation tool to produce the data is described.
Towards Usable Privacy Policies: Semi-automatically Extracting Data Practices From Websites’ Privacy Policies
TLDR
This work builds on recent advances in natural language processing, privacy preference modeling, crowdsourcing, and privacy interface design in order to develop a practical framework based on a website's existing natural language privacy policy that empowers users to more meaningfully control their privacy, without requiring additional cooperation from website operators.
Automatic Extraction of Opt-Out Choices from Privacy Policies
TLDR
This paper describes machine learning approaches for extracting instances containing opt-out hyperlinks and evaluates the proposed methods using the OPP-115 Corpus, a dataset of annotated privacy policies.
Analyzing Vocabulary Intersections of Expert Annotations and Topic Models for Data Practices in Privacy Policies
TLDR
This paper investigates the intersections between the vocabulary sets identified as most significant for each category by a logistic regression model and the vocabulary sets identified by topic modeling, and shows a path forward for applying unsupervised methods to the determination of data practice categories in privacy policy text.
Mining Privacy Goals from Privacy Policies Using Hybridized Task Recomposition
TLDR
A semiautomated framework that combines crowdworker annotations, natural language typed dependency parses, and a reusable lexicon to improve goal-extraction coverage, precision, and recall is introduced.
Privacy in the Semantic Web: What Policy Languages Have to Offer
TLDR
An independent, scenario-based comparison of six prominent policy languages, namely Protune, Rei, Ponder, Trust-X, KeyNote and P3P-APPEL, with respect to the needs that users have in protecting their personal, sensitive data is presented.
Crowdsourcing Annotations for Websites' Privacy Policies: Can It Really Work?
TLDR
The results suggest that, if carefully deployed, crowdsourcing can indeed result in the generation of non-trivial annotations and can also help identify elements of ambiguity in policies.
Disagreeable Privacy Policies: Mismatches between Meaning and Users’ Understanding
TLDR
This paper investigates the differences in interpretation among expert, knowledgeable, and typical users and explores whether those groups can understand the practices described in privacy policies at a level sufficient to support rational decision-making, and seeks to fill an important gap in the understanding of privacy policies through primary research on user interpretation.
Eddy, a formal language for specifying and analyzing data flow specifications for conflicting privacy requirements
TLDR
A strict subset of commonly found privacy requirements is identified, and a methodology is developed to map these requirements from natural language text to a formal language in description logic, called Eddy, so that developers can detect conflicting privacy requirements within a policy and trace data flows within these policies.