Corpus ID: 84835445

Natural Language Processing for Mobile App Privacy Compliance

@inproceedings{Story2019NaturalLP,
  title={Natural Language Processing for Mobile App Privacy Compliance},
  author={Peter Story and Sebastian Zimmeck and Abhilasha Ravichander and Daniel Smullen and Ziqi Wang and Joel R. Reidenberg and N. Cameron Russell and Norman M. Sadeh},
  year={2019}
}
Many Internet services collect a flurry of data from their users. Privacy policies are intended to describe the services’ privacy practices. However, due to their length and complexity, reading privacy policies is a challenge for end users, government regulators, and companies. Natural language processing holds the promise of helping address this challenge. Specifically, we focus on comparing the practices described in privacy policies to the practices performed by smartphone apps covered by… 

MAPS: Scaling Privacy Compliance Analysis to a Million Apps
TLDR
The Mobile App Privacy System (MAPS) is introduced for conducting an extensive privacy census of Android apps, and a pipeline for retrieving and analyzing large app populations based on code analysis and machine learning techniques is designed.
Privacy at Scale: Introducing the PrivaSeer Corpus of Web Privacy Policies
TLDR
The PrivaSeer Corpus of 1,005,380 English language website privacy policies collected from the web is presented and an unsupervised topic modelling approach is employed to investigate the contents of policy documents in the corpus and discuss the distribution of topics in privacy policies at web scale.
Compliance Checking with NLI: Privacy Policies vs. Regulations
TLDR
This work uses Natural Language Inference (NLI) techniques to compare privacy regulations against sections of privacy policies from a selection of large companies and finds that test accuracy was higher for the model trained on SNLI, but on NLI tasks over real-world privacy policies, the model trained on MNLI performed much better.
Longitudinal Compliance Analysis of Android Applications with Privacy Policies
TLDR
The discrepancies between the purported and actual data practices show that privacy policies are often incoherent with the apps’ behaviors, thus defying the ‘notice and choice’ principle when users install apps.
From Prescription to Description: Mapping the GDPR to a Privacy Policy Corpus Annotation Scheme
TLDR
A mapping from GDPR provisions to the OPP-115 annotation scheme is introduced, which serves as the basis for a growing number of projects to automatically classify privacy policy text, suggesting the feasibility of bridging existing computational and legal research on privacy policies, benefiting both areas.
Breaking Down Walls of Text: How Can NLP Benefit Consumer Privacy?
TLDR
The goal is to provide a roadmap for the development and use of language technologies to empower users to reclaim control over their privacy, limit privacy harms, and rally research efforts from the community towards addressing an issue with large social impact.
Natural Language Privacy Policy in IoT
TLDR
The concepts of explicit and implicit purpose, which enable using syntactic and semantic analyses to extract purposes in different sentences, are presented, and a domain adaptation method is applied to the semantic role labeling (SRL) model to improve the efficiency of purpose extraction.
PPAdroid: An Approach to Android Privacy Protocol Analysis
TLDR
This paper proposes a method to detect sentences related to personal information operations in privacy protocol documents using Stanford CoreNLP and shows that the proposed method is better than state-of-the-art keyword-based methods.
An Empirical Study on User Reviews Targeting Mobile Apps' Security & Privacy
TLDR
It was evident from the results that the number of permissions that the apps request plays a dominant role in this matter and sending out the location can affect the users' thoughts about the app.
PurExt: Automated Extraction of the Purpose-Aware Rule from the Natural Language Privacy Policy in IoT
TLDR
A novel approach to identify the rule from natural language privacy policies with a high recall rate and the implicit purpose extraction of the adapted model significantly improves the F1-score by 11%.

References

Towards Automatic Classification of Privacy Policy Text
TLDR
This paper presents advances in extracting privacy policy paragraphs and individual sentences that relate to expert-identified categories of policy contents, using methods in supervised learning, and shows that relevant segments and sentences can be classified with average micro-F1 scores, improving over prior work.
Identifying the Provision of Choices in Privacy Policy Text
TLDR
This work presents a two-stage architecture of classification models to identify opt-out choices in privacy policy text, labelling common varieties of choices with a mean F1 score of 0.735, and enables the creation of systems to help Internet users to learn about their choices.
MAPS: Scaling Privacy Compliance Analysis to a Million Apps
TLDR
The Mobile App Privacy System (MAPS) is introduced for conducting an extensive privacy census of Android apps, and a pipeline for retrieving and analyzing large app populations based on code analysis and machine learning techniques is designed.
The Creation and Analysis of a Website Privacy Policy Corpus
TLDR
A corpus of 115 privacy policies with manual annotations for 23K fine-grained data practices is introduced and the process of using skilled annotators and a purpose-built annotation tool to produce the data is described.
Can We Trust the Privacy Policies of Android Apps?
TLDR
This paper conducts the first systematic study on privacy policies by proposing a novel approach, named PPChecker, to automatically identify three kinds of problems in privacy policies, and evaluating it with real apps and privacy policies.
Toward a Framework for Detecting Privacy Policy Violations in Android Application Code
TLDR
This work proposes a semi-automated framework that consists of a policy terminology-API method map that links policy phrases to API methods that produce sensitive information, and information flow analysis to detect misalignments.
Polisis: Automated Analysis and Presentation of Privacy Policies Using Deep Learning
TLDR
Polisis, an automated framework for privacy policy analysis, enables scalable, dynamic, and multi-dimensional queries on privacy policies, and its modularity and utility are demonstrated with two robust applications that support structured and free-form querying.
Privee: An Architecture for Automatically Analyzing Web Privacy Policies
TLDR
Privee, a software architecture for analyzing essential policy terms based on crowdsourcing and automatic classification techniques, is proposed to facilitate the notice-and-choice principle by accurately notifying Web users of privacy practices and increasing privacy transparency on the Web.
The Cost of Reading Privacy Policies
TLDR
It is argued that the time to read privacy policies is, in and of itself, a form of payment and website visitors must pay with their time to research policies in order to retain their privacy.
I Read but Don't Agree: Privacy Policy Benchmarking using Machine Learning and the EU GDPR
TLDR
A machine learning based approach to summarize the rather long privacy policy into short and condensed notes following a risk-based approach and using the European Union (EU) General Data Protection Regulation (GDPR) aspects as assessment criteria is proposed.