Corpus ID: 46931441

Rigging Research Results by Manipulating Top Websites Rankings

@article{Pochat2018RiggingRR,
  title={Rigging Research Results by Manipulating Top Websites Rankings},
  author={Victor Le Pochat and Tom van Goethem and Wouter Joosen},
  journal={ArXiv},
  year={2018},
  volume={abs/1806.01156}
}
Researchers often use rankings of popular websites when measuring security practices, evaluating defenses or analyzing ecosystems. However, little is known about the data collection and processing methodologies of these rankings. In this paper, we uncover how both inherent properties and vulnerabilities to adversarial manipulation of these rankings may affect the conclusions of security studies. To that end, we compare four main rankings used in recent studies in terms of their agreement with… 
Clustering and the Weekend Effect: Recommendations for the Use of Top Domain Lists in Security Research
It is found that the weekend effect in Alexa and Umbrella causes these rankings to change their geographical diversity between the workweek and the weekend, and up to 91% of ranked domains appear in alphabetically sorted clusters containing up to 87k domains of presumably equivalent popularity.
A Long Way to the Top: Significance, Structure, and Stability of Internet Top Lists
It is found that top lists generally overestimate results compared to the general population by a significant margin, often even an order of magnitude, and some top lists have surprising change characteristics, causing high day-to-day fluctuation and leading to result instability.
Tracking and Tricking a Profiler: Automated Measuring and Influencing of Bluekai's Interest Profiling
A system to analyze online profiling as a black box by simulating web browsing sessions based on links posted to Reddit shows that only a fraction of websites influence the interests assigned to a session's profile, that the profiles themselves are very noisy, and that identical browsing behavior results in different profiles.
We Value Your Privacy ... Now Take Some Cookies: Measuring the GDPR's Impact on Web Privacy
It is concluded that the GDPR is making the web more transparent, but there is still a lack of both functional and usable mechanisms for users to consent to or deny processing of their personal data on the Internet.
Privacy Policies Across the Ages: Content and Readability of Privacy Policies 1996-2021
A large-scale longitudinal corpus of privacy policies from 1996 to 2021 is collected and analyzed to speculate why privacy policies are rarely read and propose changes that would make privacy policies serve their readers instead of their writers.
Exploring Malware Behavior of Webpages Using Machine Learning Technique: An Empirical Study
To improve feature selection accuracy, a machine learning technique called bagging is employed using the Weka program, and a random tree is applied because it can handle similar types of data as bagging while being faster and more accurate than other classifiers.
Measuring Cookies and Web Privacy in a Post-GDPR World
In response, the European Union has adopted the General Data Protection Regulation (GDPR), a legislative framework for data protection empowering individuals to control their data. Since its adoption
Measurement-based Experiments on the Mobile Web: A Systematic Mapping Study
This study benefits researchers and practitioners by presenting common techniques, empirical practices, and tools to properly conduct measurement-based experiments on the mobile Web.
Innocent Until Proven Guilty (IUPG): Building Deep Learning Models with Embedded Robustness to Out-Of-Distribution Content
This work proposes a novel learning framework called Innocent Until Proven Guilty which prototypes training data clusters or classes within the input space while uniquely leveraging noise and inherently random classes to discover noise-resistant, uniquely identifiable features of the modeled classes.
Defining the linkage specialist role in the HIV care cascade
The most frequently cited duties, knowledge, skills and abilities required of linkage specialists in employment advertisements and described in peer-reviewed literature are identified.

References

A Long Way to the Top: Significance, Structure, and Stability of Internet Top Lists
It is found that top lists generally overestimate results compared to the general population by a significant margin, often even an order of magnitude, and some top lists have surprising change characteristics, causing high day-to-day fluctuation and leading to result instability.
Security Challenges in an Increasingly Tangled Web
The current state of web dependencies is investigated and two security challenges associated with the increasing reliance on external services are explored: the expanded attack surface associated with serving unknown, implicitly trusted third-party content and how the increased set of external dependencies impacts HTTPS adoption.
Large-Scale Security Analysis of the Web: Challenges and Findings
This paper reports on the state of security for more than 22,000 websites that originate in 28 EU countries and explores the adoption of countermeasures that can be used to defend against common attacks and serve as indicators of "security consciousness".
Exposing the Hidden Web: An Analysis of Third-Party HTTP Requests on 1 Million Websites
It is revealed that a handful of U.S. companies receive the vast bulk of user data, and roughly 1 in 5 websites are potentially vulnerable to known National Security Agency spying techniques at the time of analysis.
Measuring HTTPS Adoption on the Web
This work gathers metrics to benchmark the status and progress of HTTPS adoption on the Web in 2017, and surveys server support for HTTPS among top and long-tail websites to gain insight into the current state of the HTTPS ecosystem.
Aiding the Detection of Fake Accounts in Large Scale Social Online Services
A new tool in the hands of OSN operators, which relies on social graph properties to rank users according to their perceived likelihood of being fake (SybilRank), which is computationally efficient and can scale to graphs with hundreds of millions of nodes, as demonstrated by the Hadoop prototype.
Knowing your enemy: understanding and detecting malicious web advertising
A large-scale study analyzing ad-related Web traces crawled over a three-month period reveals the rampancy of malvertising: hundreds of top-ranking Web sites fell victim and leading ad networks such as DoubleClick were infiltrated.
Peeking Through the Cloud: DNS-Based Estimation and Its Applications
A new estimation technique that uses DNS cache probing to infer the density of clients accessing a given service, which is less invasive as it does not reveal user-specific traits, and is more robust against manipulation.
Apples, oranges and hosting providers: Heterogeneity and security in the hosting market
Hosting services are associated with various security threats, yet the market has barely been studied empirically. Most security research has relied on routing data and equates providers with
An Automated Approach to Auditing Disclosure of Third-Party Data Collection in Website Privacy Policies
This study presents the first large-scale audit of disclosure of third-party data collection in website privacy policies, indicating that current implementations of "notice and choice" fail to provide notice or respect choice.