Bias on the web

@article{BaezaYates2018BiasOT,
  title={Bias on the web},
  author={Ricardo Baeza-Yates},
  journal={Communications of the ACM},
  year={2018},
  volume={61},
  pages={54--61}
}
  • R. Baeza-Yates
  • Published 23 May 2018
  • Computer Science
  • Communications of the ACM
Bias in Web data and use taints the algorithms behind Web-based applications, delivering equally biased results. 

The history of digital spam
Tracing the tangled web of unsolicited and undesired email and possible strategies for its demise.
Bias on the web and beyond: an accessibility point of view
TLDR
The main goal is to make people aware of the different biases that affect all of us on the Web, and to stress that inclusive content should be designed to help people with learning disabilities or vision problems, among others.
Analyzing Social Media Research: A Data Quality and Research Reproducibility Perspective
Social media platforms have become very popular these days among individuals and organizations. On the one hand, organizations use social media as a potential tool to create awareness of their prod...
Big Data Science Over the Past Web
TLDR
This chapter presents several examples of big data tools, machine learning frameworks, and deep learning algorithms that significantly increase the scalability and performance of several computational tasks, especially over text, image, and audio, and gives an overview of their application to support longitudinal studies over web archive collections.
Bias in Search and Recommender Systems
TLDR
This work believes that recommender systems could improve their long-term revenue if significantly more exploration is performed, probably diminishing at the same time the tension between user experience and monetization.
Personalization, Bias and Privacy
TLDR
This presentation discusses the interaction of these three elements: personalization, bias and privacy.
Investigating Searchers’ Mental Models to Inform Search Explanations
Modern web search engines use many signals to select and rank results in response to queries. However, searchers’ mental models of search are relatively unsophisticated, hindering their ability to ...
Biases in the Facebook News Feed: A Case Study on the Italian Elections
TLDR
A reproducible methodology, encompassing measurements and an analytical model, is presented to capture the visibility of publishers over a News Feed, and the bias is found to be non-negligible even for users that are deliberately set as neutral with respect to their political views.
Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries
TLDR
A framework for identifying a broad range of menaces in the research and practices around social data is presented, including biases and inaccuracies at the source of the data, but also introduced during processing.
Data Consortia
TLDR
Potential frameworks for groups of consenting, informed users to pool their data for their own benefit and that of society are explored, discussing directions, challenges, and evolution for such efforts.

References

SHOWING 1-10 OF 34 REFERENCES
Relationship between web links and trade
We report on observations on Web characterization studies that suggest that the amount of Web links among sites under different country-code top-level domains is related to the amount of trade
Characteristics of the Web of Spain
TLDR
The results of an in-depth study over a large collection of Web pages found that some of the characteristics of this collection resemble the ones of the Web at large, while others are specific to the Web of Spain, or have not been studied in the past.
Incremental Sampling of Query Logs
TLDR
A simple technique is introduced to generate incremental query log samples that closely mimic the original query distribution, so that editorial judgments for new queries can be consistently added to previous judgments.
Crowdsourced research: Many hands make tight work
Crowdsourcing research can balance discussions, validate findings and better inform policy, say Raphael Silberzahn and Eric L. Uhlmann.
Beliefs and biases in web search
TLDR
Targeting yes-no questions in the critical domain of health search, it is shown that Web searchers exhibit their own biases and are also subject to bias from the search engine, and that search engines strongly favor a particular, usually positive, perspective, irrespective of the truth.
Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries
TLDR
A framework for identifying a broad range of menaces in the research and practices around social data is presented, including biases and inaccuracies at the source of the data, but also introduced during processing.
Social media news communities: gatekeeping, coverage, and statement bias
TLDR
The results, obtained by analyzing 80 international news sources during a two-week period, show that biases are subtle but observable, and follow geographical boundaries more closely than political ones.
A dynamic bayesian network click model for web search ranking
TLDR
A Dynamic Bayesian Network is proposed that aims to provide an unbiased estimate of relevance from click logs, and the proposed click model is shown to outperform other existing click models in predicting both click-through rate and relevance.
Genealogical trees on the web: a search engine user perspective
TLDR
It is shown that a significant fraction of the Web is a byproduct of the latter case, and the concept of Web genealogical tree is introduced, in which every page in a Web snapshot is classified into a component.
A user browsing model to predict search engine click data from past observations.
TLDR
It is confirmed that a user almost always sees the document directly after a clicked document, and it is explained why documents situated just after a very relevant document are clicked more often.