Diversity in Big Data: A Review

@article{Drosou2017DiversityIB,
  title={Diversity in Big Data: A Review},
  author={Marina Drosou and H. V. Jagadish and Evaggelia Pitoura and Julia Stoyanovich},
  journal={Big Data},
  year={2017},
  volume={5},
  number={2},
  pages={73--84}
}
Big data technology offers unprecedented opportunities to society as a whole and also to its individual members. At the same time, this technology poses significant risks to those it overlooks. In this article, we give an overview of recent technical work on diversity, particularly in selection tasks, discuss connections between diversity and fairness, and identify promising directions for future work that will position diversity as an important component of a data-responsible society. We argue… 
Data science ethical considerations: a systematic literature review and proposed project framework
TLDR
This paper maps and describes the main ethical themes that were identified via systematic literature review and identifies a possible structure to integrate these themes within a data science project, thus helping to provide some structure in the on-going debate with respect to the possible ethical situations that can arise when using data science analytics.
Business Data Ethics: Emerging Trends in the Governance of Advanced Analytics and AI
Advanced analytics and artificial intelligence are powerful technologies that, along with their benefits, create new threats to privacy, equality, fairness and transparency. Existing law does not yet
TransFAT: Translating Fairness, Accountability and Transparency into Data Science Practice
TLDR
An ongoing regulatory effort in New York City, whose goal is to develop a methodology for enabling responsible use of algorithms and data in city agencies, is discussed and highlighted as part of the Data, Responsibly project.
Social-minded Measures of Data Quality
  • E. Pitoura
  • Computer Science
    ACM J. Data Inf. Qual.
  • 2020
TLDR
The case is made for social-minded measures, that is, measures that evaluate the effect of a system in society, with a focus on diversity, lack of bias, and fairness.
Data Processing: Reflections on Ethics
TLDR
Reflections on the unavoidable ethical facets entailed by all the steps of the information life-cycle, including source selection, knowledge extraction, data integration and data analysis are provided.
Ethics-aware Data Governance (Vision Paper)
TLDR
A comprehensive checklist of ethical desiderata for data protection and processing needs to be developed, along with methods and techniques to ensure and verify that these ethically motivated requirements and related legal norms are fulfilled throughout the data selection and exploration processes.
Responsible data management
TLDR
It is argued that the data management community is uniquely positioned to lead the responsible design, development, use, and oversight of Automated Decision Systems.
Teaching Responsible Data Science: Charting New Pedagogical Territory
TLDR
A recent experience in developing and teaching a technical course focused on responsible data science, which tackles the issues of ethics in AI, legal compliance, data quality, algorithmic fairness and diversity, transparency of data and algorithms, privacy, and data protection, is recounted.
Diversity in Sociotechnical Machine Learning Systems
TLDR
A taxonomy of different diversity concepts from philosophy of science is introduced and the distinct epistemic and political rationales underlying these concepts are explicated, providing an overview of mechanisms by which diversity can benefit group performance.
Ethical Dimensions for Data Quality
TLDR
The need to extend the well-established data quality framework in [5] to incorporate ethics explicitly is advocated, and the most common ethical requirements as dimensions of quality are introduced.

References

Big Data and Its Exclusions
TLDR
This essay argues that a new "data antisubordination" doctrine may be needed because big data poses a unique threat to equality, not just privacy.
Big Data's Disparate Impact
Advocates of algorithmic techniques like data mining argue that these techniques eliminate human biases from the decision-making process. But an algorithm is only as good as the data it works with.
Measuring Online Social Bubbles
TLDR
There is a strong correlation between collective and individual diversity, supporting the notion that when we use social media we find ourselves inside “social bubbles”; these results could lead to a deeper understanding of how technology biases our exposure to new information.
Why diversity programs fail
After Wall Street firms repeatedly had to shell out millions to settle discrimination lawsuits, businesses started to get serious about their efforts to increase diversity. But unfortunately, they
A multidisciplinary survey on discrimination analysis
TLDR
This survey aims to provide guidance and a glue for researchers and anti-discrimination data analysts on concepts, problems, application areas, datasets, methods, and approaches from a multidisciplinary perspective.
Hear the Whole Story: Towards the Diversity of Opinion in Crowdsourcing Markets
TLDR
This paper addresses algorithmic optimizations towards diversity of opinion in crowdsourcing marketplaces and proposes the Similarity-driven Model (S-Model) and Task-driven Model (T-Model), along with efficient and effective algorithms for enlisting a budgeted number of workers with optimal diversity.
Fairness through awareness
TLDR
A framework for fair classification is presented, comprising a (hypothetical) task-specific metric for determining the degree to which individuals are similar with respect to the classification task at hand and an algorithm for maximizing utility subject to the fairness constraint that similar individuals are treated similarly.
Exposure to ideologically diverse news and opinion on Facebook
TLDR
Examination of the news that millions of Facebook users' peers shared, what information these users were presented with, and what they ultimately consumed found that friends shared substantially less cross-cutting news from sources aligned with an opposing ideology.
Sidelines: An Algorithm for Increasing Diversity in News and Opinion Aggregators
TLDR
The Sidelines algorithm – which temporarily suppresses a voter’s preferences after a preferred item has been selected – is presented as one approach to increase the diversity of result sets and can help build news and opinion aggregators that present users with a broader range of topics and opinions.
How to be Fair and Diverse?
TLDR
This work considers a basic algorithmic task that is central in machine learning, subsampling from a large data set, and presents an algorithmic framework that allows for both fair and diverse samples.