Clustering and the Weekend Effect: Recommendations for the Use of Top Domain Lists in Security Research

@inproceedings{Rweyemamu2019ClusteringAT,
  title={Clustering and the Weekend Effect: Recommendations for the Use of Top Domain Lists in Security Research},
  author={Walter Rweyemamu and Tobias Lauinger and Christo Wilson and William K. Robertson and Engin Kirda},
  booktitle={PAM},
  year={2019}
}
Top domain rankings (e.g., Alexa) are commonly used in security research, such as to survey security features or vulnerabilities of “relevant” websites. Due to their central role in selecting a sample of sites to study, an inappropriate choice or use of such domain rankings can introduce unwanted biases into research results. We quantify various characteristics of three top domain lists that have not been reported before. For example, the weekend effect in Alexa and Umbrella causes these…
Getting Under Alexa's Umbrella: Infiltration Attacks Against Internet Top Domain Lists
It is demonstrated that it is feasible to infiltrate two domain rankings with very little effort, and it is suggested that researchers should refrain from using these domain rankings to model benign behaviour.
Evaluating the Long-term Effects of Parameters on the Characteristics of the Tranco Top Sites Ranking
Although researchers often use top websites rankings for web measurements, recent studies have shown that due to the inherent properties and susceptibility to manipulation of these rankings, they…
Prefix Top Lists: Gaining Insights with Prefixes from Domain-based Top Lists on DNS Deployment
It is shown that popular domains adhere to name server recommendations for IPv4, but IPv6 compliance is still lacking, and the concept of prefix top lists is presented, which ameliorate some of the shortcomings, while providing insights into the importance of addresses of domain-based top lists.
Out of Sight, Out of Mind: Detecting Orphaned Web Pages at Internet-Scale
Security misconfigurations and neglected updates commonly lead to systems being vulnerable. Especially in the context of websites, we often find pages that were forgotten, that is, they were left…
The web is still small after more than a decade
An empirical study to revisit web co-location using datasets collected from active DNS measurements shows that the web is still small and centralized to a handful of hosting providers, and analyses of popular block lists indicate that IP-based blocking does not cause severe collateral damage as previously thought.
Challenges and pitfalls in malware research
A systematic literature review of 491 papers on malware research published in major security conferences between 2000 and 2018 identifies the most common pitfalls present in past literature and proposes a method for assessing current (and future) malware research.
Who's got your mail?: characterizing mail service provider usage
A reliable methodology is developed to better map domains to mail service providers and the extent to which nationality plays a role in such mail provisioning decisions is explored.
Information Security: 22nd International Conference, ISC 2019, New York City, NY, USA, September 16–18, 2019, Proceedings
A novel attack called the intermittent block withholding (IBWH) attack is proposed and it is proved that this attack is optimal in the authors' model and includes the dynamics of the Bitcoin network’s computing power, and even with the changing attacker's reward rates, the IBWH's reward rate remains optimal.
A Longitudinal Analysis of the ads.txt Standard
This work presents a 15-month longitudinal, observational study of the ads.txt standard to understand if it is helping ad buyers to combat domain spoofing and whether the transparency offered by the standard can provide useful data to researchers and privacy advocates.
ShamFinder: An Automated Framework for Detecting IDN Homographs
This work developed a framework named "ShamFinder," which is an automated scheme to detect IDN homographs, and develops an automatic construction of a homoglyph database, which can be used for direct countermeasures against the attack and to inform users about the context of an IDN homograph.

References

A Long Way to the Top: Significance, Structure, and Stability of Internet Top Lists
It is found that top lists generally overestimate results compared to the general population by a significant margin, often even an order of magnitude, and some top lists have surprising change characteristics, causing high day-to-day fluctuation and leading to result instability.
Rigging Research Results by Manipulating Top Websites Rankings
This work uncovers how both inherent properties of these rankings and their vulnerabilities to adversarial manipulation may affect the conclusions of security studies.
Structure and Stability of Internet Top Lists
The aptness of frequently used top lists for empirical Internet scans is investigated, including the stability, correlation, and potential biases of such lists.
Taster's choice: a comparative analysis of spam feeds
This paper compares the contents of ten distinct contemporaneous feeds of spam-advertised domain names to document significant variations based on how such feeds are collected and show how these variations can produce differences in findings as a result.
Thou Shalt Not Depend on Me: Analysing the Use of Outdated JavaScript Libraries on the Web
The first comprehensive study of client-side JavaScript library usage and the resulting security implications across the Web demonstrates that not only website administrators, but also the dynamic architecture and developers of third-party services are to blame for the Web's poor state of library management.
EXPOSURE: Finding Malicious Domains Using Passive DNS Analysis
This paper introduces EXPOSURE, a system that employs large-scale, passive DNS analysis techniques to detect domains that are involved in malicious activity, and uses 15 features that it extracts from the DNS traffic that allow it to characterize different properties of DNS names and the ways that they are queried.
Exposure: A Passive DNS Analysis Service to Detect and Report Malicious Domains
The Exposure system, a system designed to detect malicious domains in real time, by applying 15 unique features grouped in four categories, is presented and the results and lessons learned from 17 months of its operation are described.
Knowing your enemy: understanding and detecting malicious web advertising
A large-scale study through analyzing ad-related Web traces crawled over a three-month period reveals the rampancy of malvertising: hundreds of top ranking Web sites fell victim and leading ad networks such as DoubleClick were infiltrated.
Connected Colors: Unveiling the Structure of Criminal Networks
This paper develops a method to construct a graph of relationships between malicious hosts and identify the underlying criminal networks, using historic assignments in the DNS, and applies this method to study the general threat landscape, as well as four cases of sophisticated criminal networks.
Measuring HTTPS Adoption on the Web
This work gathers metrics to benchmark the status and progress of HTTPS adoption on the Web in 2017, and surveys server support for HTTPS among top and long-tail websites to gain insight into the current state of the HTTPS ecosystem.