Mitigating Biases in CORD-19 for Analyzing COVID-19 Literature

@article{Kanakia2020MitigatingBI,
  title={Mitigating Biases in CORD-19 for Analyzing COVID-19 Literature},
  author={Anshul Kanakia and Kuansan Wang and Yuxiao Dong and Boya Xie and Kyle Lo and Zhihong Shen and Lucy Lu Wang and Chiyuan Huang and Darrin Eide and Sebastian Kohlmeier and Chieh-Han Wu},
  journal={Frontiers in Research Metrics and Analytics},
  year={2020},
  volume={5}
}
At the behest of the Office of Science and Technology Policy in the White House, six institutions, including ours, have created an open research dataset called the COVID-19 Open Research Dataset (CORD-19) to facilitate the development of question-answering systems that can assist researchers in finding relevant research on COVID-19. As of May 27, 2020, CORD-19 includes more than 100,000 open access publications from major publishers and PubMed as well as preprint articles deposited into medRxiv, bioRxiv…
Meta-research on COVID-19: An overview of the early trends
TLDR
It is speculated that some aspects of doing research during COVID-19 are more likely to persist than others, including the shift to virtual formats for academic events such as conferences; the use of openly accessible pre-prints; and the ‘datafication’ of scholarly literature and consequent broader adoption of machine learning in science communication.
A scientometric overview of CORD-19
TLDR
Based on a comparison to the Web of Science database, it is found that CORD-19 provides an almost complete coverage of research on COVID-19 and coronaviruses.
Visibility, collaboration and impact of the Cuban scientific output on COVID-19 in Scopus
TLDR
The greater the leadership in Cuban research, the lower its impact, and the lower the indexes of international collaboration.
Fast Learning of MNL Model from General Partial Rankings with Application to Network Formation Modeling
  • Jiaqi Ma, Xingjian Zhang, Q. Mei
  • Computer Science
    ArXiv
  • 2021
TLDR
A scalable method for approximating the MNL likelihood of general partial rankings in polynomial time is developed, and the proposed methods achieve more accurate parameter estimation and a better fit to the data than conventional methods.
The boundary-spanning mechanisms of Nobel Prize winning papers
TLDR
It is found that a group of Nobel Prize winning papers share remarkable boundary-spanning traits, marked by exceptional abilities to connect disparate and topically-diverse clusters of research papers.

References

SHOWING 1-10 OF 35 REFERENCES
A scientometric overview of CORD-19
TLDR
Based on a comparison to the Web of Science database, it is found that CORD-19 provides an almost complete coverage of research on COVID-19 and coronaviruses.
Bias in data‐driven artificial intelligence systems—An introductory survey
TLDR
A broad multidisciplinary overview of the area of bias in AI systems is provided, focusing on technical challenges and solutions and suggesting new research directions towards approaches well‐grounded in a legal frame.
Pandemic Publishing: Medical journals drastically speed up their publication process for Covid-19
TLDR
It is concluded that medical journals have indeed drastically accelerated the publication process for Covid-19 related articles since the outbreak of the pandemic, and turnaround times have decreased on average by 49%.
Microsoft Academic Graph: When experts are not enough
TLDR
The design, schema, and technical and business motivations behind MAG are described, and how MAG can be used in analytics, search, and recommendation scenarios is elaborated.
A Century of Science: Globalization of Scientific Collaborations, Citations, and Innovations
TLDR
It is found that science has benefited from the shift from individual work to collaborative effort, with over 90% of the world-leading innovations generated by collaborations in this century, nearly four times higher than they were in the 1900s.
A Review of Microsoft Academic Services for Science of Science Studies
TLDR
The focus is on three key AI technologies that underlie its prowess in capturing scholarly communications with adequate quality and broad coverage, including a reinforcement learning approach to assessing scholarly importance for entities participating in scholarly communications, called the saliency, that serves as both an analytic and a predictive metric in MAS.
Scale-free networks are rare
TLDR
A severe test of their empirical prevalence using state-of-the-art statistical tools applied to nearly 1000 social, biological, technological, transportation, and information networks finds robust evidence that strongly scale-free structure is empirically rare, while for most networks, log-normal distributions fit the data as well or better than power laws.
How popular is your paper? An empirical study of the citation distribution
Abstract: Numerical data for the distribution of citations are examined for: (i) papers published in 1981 in journals which are catalogued by the Institute for Scientific Information (783,339 papers)
Community structure in social and biological networks
  • M. Girvan, M. Newman
  • Physics, Computer Science
    Proceedings of the National Academy of Sciences of the United States of America
  • 2002
TLDR
This article proposes a method for detecting communities, built around the idea of using centrality indices to find community boundaries, and tests it on computer-generated and real-world graphs whose community structure is already known and finds that the method detects this known structure with high sensitivity and reliability.
Graph evolution: Densification and shrinking diameters
TLDR
A new graph generator is provided, based on a forest fire spreading process that has a simple, intuitive justification, requires very few parameters, and produces graphs exhibiting the full range of properties observed both in prior work and in the present study.