S2AND: A Benchmark and Evaluation System for Author Name Disambiguation

@inproceedings{Subramanian2021S2ANDAB,
  title={S2AND: A Benchmark and Evaluation System for Author Name Disambiguation},
  author={Shivashankar Subramanian and Daniel King and Doug Downey and Sergey Feldman},
  booktitle={2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL)},
  year={2021},
  pages={170--179}
}
Author Name Disambiguation (AND) is the task of resolving which author mentions in a bibliographic database refer to the same real-world person, and is a critical ingredient of digital library applications such as search and citation analysis. While many AND algorithms have been proposed, comparing them is difficult because they often employ distinct features and are evaluated on different datasets. In response to this challenge, we present S2AND, a unified benchmark dataset for AND on… 
(Almost) All of Entity Resolution
TLDR
Modern probabilistic and Bayesian methods in statistics, computer science, machine learning, database management, economics, political science, and other disciplines that are used throughout industry and academia in applications such as human rights, official statistics, medicine, citation networks, among others are reviewed.
Time Waits for No One! Analysis and Challenges of Temporal Misalignment
TLDR
This work establishes a suite of tasks across multiple domains to study temporal misalignment in modern NLP systems and concludes that, while temporal adaptation through continued pretraining can help, these gains are small compared to task-specific finetuning on data from the target time period.
Infrastructure for Rapid Open Knowledge Network Development
TLDR
A National Science Foundation Convergence Accelerator project is described that builds a set of Knowledge Network Programming Infrastructure systems to address the frustratingly slow process of building, using, and scaling large knowledge networks.
Bridger: Toward Bursting Scientific Filter Bubbles and Boosting Innovation via Novel Author Discovery
TLDR
Bridger, a system for facilitating discovery of scholars and their work, is described, and an approach that locates commonalities and contrasts between scientists (retrieving partially similar authors) is developed, at a higher rate than a state-of-the-art neural model.
Bursting Scientific Filter Bubbles: Boosting Innovation via Novel Author Discovery
TLDR
Bridger, a system for facilitating discovery of scholars and their work, is described; it locates commonalities and contrasts between scientists to balance relevance and novelty, and demonstrates an approach for displaying information about authors that boosts the ability to understand the work of new, unfamiliar scholars.

References

On Graph-Based Name Disambiguation
TLDR
This article presents an effective framework named GHOST (abbreviation for GrapHical framewOrk for name diSambiguaTion) to solve the problem in digital libraries of distinguishing publications written by authors with identical names, and devises a novel similarity metric.
A Web Service for Author Name Disambiguation in Scholarly Databases
TLDR
This paper presents a novel, web-based RESTful API for searching disambiguated authors, using the PubMed database as a sample application, and develops a novel algorithm that has a fast record-to-cluster match for record-based queries.
On the combination of domain-specific heuristics for author name disambiguation: the nearest cluster method
TLDR
This article proposes a set of carefully designed heuristics and similarity functions, and applies supervision only to optimize such parameters for each particular dataset, and shows that this method can beat state-of-the-art supervised methods in terms of effectiveness in many situations while being orders of magnitude faster.
Data sets for author name disambiguation: an empirical analysis and a new resource
TLDR
A set of general requirements for future AND data sets is derived, which includes both trivial requirements, like absence of errors and preservation of author order, and more substantial ones, like full disambiguation and adequate representation of publications with a small number of authors and highly variable author names.
Evaluating author name disambiguation for digital libraries: a case of DBLP
TLDR
DBLP’s author name disambiguation performs well even on large ambiguous name blocks but deficiently on distinguishing authors with the same names, possibly due to its hybrid disambiguation approach combining algorithmic disambiguation and manual error correction.
Efficient Name Disambiguation for Large-Scale Databases
TLDR
It is proved that by recasting transitivity as density reachability in DBSCAN, transitivity is guaranteed for core points.
Hybrid Deep Pairwise Classification for Author Name Disambiguation
TLDR
A hybrid method is introduced that takes advantage of both approaches by extracting both structure-aware features and global features, along with a novel way to train a global model utilizing a large number of negative samples.
Dynamic author name disambiguation for growing digital libraries
TLDR
This paper proposes a “BatchAD+IncAD” framework for dynamic author disambiguation, and proposes a novel IncAD model which aggregates metadata from a cluster of records to estimate the author’s profile such as her coauthor distributions and keyword distributions, in order to predict how likely it is that a new record is produced by the author.
Disambiguating authors in academic publications using random forests
TLDR
This paper describes an algorithm for pair-wise disambiguation of author names based on a machine learning classification algorithm, random forests, and defines a set of similarity profile features to assist in author disambiguation.
Name Disambiguation in AMiner: Clustering, Maintenance, and Human in the Loop.
TLDR
A novel representation learning method that incorporates both global and local information is proposed, and an end-to-end cluster size estimation method that is significantly better than the traditional BIC-based method is presented.
...