SPECTER: Document-level Representation Learning using Citation-informed Transformers
- Arman Cohan, Sergey Feldman, Iz Beltagy, Doug Downey, Daniel S. Weld
- Computer Science · ACL
- 15 April 2020
This work proposes SPECTER, a new method to generate document-level embeddings of scientific papers by pretraining a Transformer language model on a powerful signal of document-level relatedness, the citation graph, and shows that SPECTER outperforms a variety of competitive baselines on the benchmark.
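The citation-informed training signal described above is commonly realized as a triplet margin loss: a paper's embedding should be closer to a paper it cites than to an unrelated paper. A minimal numpy sketch of that loss (the function name, toy embeddings, and Euclidean distance are illustrative assumptions, not SPECTER's exact implementation):

```python
import numpy as np

def triplet_margin_loss(query, positive, negative, margin=1.0):
    """Citation-informed triplet loss: papers linked by a citation
    (positive) should embed closer to the query paper than unrelated
    papers (negative), by at least `margin`."""
    d_pos = np.linalg.norm(query - positive)
    d_neg = np.linalg.norm(query - negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy 3-d "paper embeddings" (illustrative values, not real SPECTER output).
q = np.array([1.0, 0.0, 0.0])
p = np.array([0.9, 0.1, 0.0])   # cited paper: close to q
n = np.array([0.0, 1.0, 0.0])   # unrelated paper: far from q
loss = triplet_margin_loss(q, p, n)
```

Here the positive is already more than `margin` closer than the negative, so the loss is zero and no gradient would flow for this triplet.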
Construction of the Literature Graph in Semantic Scholar
This paper reduces literature graph construction to familiar NLP tasks, points out research challenges arising from differences from the standard formulations of these tasks, and reports empirical results for each task.
Content-Based Citation Recommendation
- Chandra Bhagavatula, Sergey Feldman, Russell Power, Waleed Ammar
- Computer Science · NAACL
- 22 February 2018
It is shown empirically that, although adding metadata improves performance on standard metrics, it favors self-citations, which are less useful in a citation recommendation setting; an online portal for citation recommendation based on this method has been released.
Part-of-speech histograms for genre classification of text
- Sergey Feldman, Marius A. Marin, Mari Ostendorf, M. Gupta
- Computer Science · IEEE International Conference on Acoustics…
- 19 April 2009
This work proposes statistics of POS histograms as classification features, coupled with a quadratic discriminant classifier, to classify the genre of a text, which is useful for a variety of language-processing problems.
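The feature-extraction step described above can be sketched in a few lines: count each part-of-speech tag and normalize to a histogram. This is only the histogram step, with an assumed tagset and toy tagged tokens; the paper additionally computes statistics over such histograms and feeds them to a quadratic discriminant classifier:

```python
from collections import Counter

def pos_histogram(tagged_tokens, tagset):
    """Normalized part-of-speech histogram: the fraction of tokens
    carrying each POS tag, usable as a genre-classification feature."""
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return [counts[tag] / total for tag in tagset]

tagset = ["NOUN", "VERB", "ADJ"]
sent = [("stock", "NOUN"), ("prices", "NOUN"),
        ("rose", "VERB"), ("sharply", "ADJ")]
hist = pos_histogram(sent, tagset)  # [0.5, 0.25, 0.25]
```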
Completely Lazy Learning
- E. K. Garcia, Sergey Feldman, M. Gupta, S. Srivastava
- Computer Science · IEEE Transactions on Knowledge and Data…
- 1 September 2010
This work proposes a simple alternative to cross-validation of the neighborhood size that requires no preprocessing: instead of committing to one neighborhood size, the discriminants for multiple neighborhood sizes are averaged, so that similar classification performance can be attained without any training.
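The averaging idea can be sketched with a lazy k-NN classifier that sums class-vote fractions over several neighborhood sizes rather than picking one k by cross-validation. The function name, toy data, and use of vote fractions as the discriminants are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def lazy_multi_k_predict(X_train, y_train, x, ks=(1, 3)):
    """Lazy k-NN that, instead of committing to one neighborhood size,
    averages the per-class vote fractions (discriminants) over several k."""
    dists = np.linalg.norm(X_train - x, axis=1)
    order = np.argsort(dists)                 # neighbors, nearest first
    classes = np.unique(y_train)
    scores = np.zeros(len(classes))
    for k in ks:
        neighbors = y_train[order[:k]]
        # fraction of the k nearest neighbors belonging to each class
        scores += np.array([(neighbors == c).mean() for c in classes])
    return classes[np.argmax(scores)]

# Two well-separated 1-d clusters as toy training data.
X_train = np.array([[0.0], [0.1], [1.0], [1.1]])
y_train = np.array([0, 0, 1, 1])
pred = lazy_multi_k_predict(X_train, y_train, np.array([0.05]))
```

Because the classification is fully lazy, nothing is fit ahead of time; all work happens at query time.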
Revisiting Stein's paradox: multi-task averaging
The proposed multi-task averaging (MTA) algorithm results in a convex combination of the individual tasks' sample averages, and the optimal amount of regularization for the two-task case is derived for the minimum-risk estimator and a minimax estimator. Simulations and real-data experiments demonstrate that MTA outperforms both maximum-likelihood and James-Stein estimators, and that the approach to estimating the amount of regularization rivals cross-validation in performance but is more computationally efficient.
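The convex-combination idea can be sketched by shrinking each task's sample mean toward the mean of all task means, with a regularization parameter controlling the shrinkage. This is a simplified illustration under assumed weights, not the paper's exact minimum-risk MTA estimator:

```python
import numpy as np

def mta_estimate(task_means, task_vars, ns, gamma):
    """Illustrative multi-task averaging: each task's sample mean is shrunk
    toward the pooled mean of all task means. gamma controls regularization;
    gamma = 0 recovers the ordinary single-task sample means."""
    task_means = np.asarray(task_means, dtype=float)
    pooled = task_means.mean()
    se2 = np.asarray(task_vars, dtype=float) / np.asarray(ns, dtype=float)
    # weight on the pooled mean grows with each task's sampling variance
    w = gamma * se2 / (gamma * se2 + 1.0)
    return (1.0 - w) * task_means + w * pooled  # convex combination per task

# Two tasks with sample means 1.0 and 3.0; both estimates move toward 2.0.
est = mta_estimate([1.0, 3.0], task_vars=[1.0, 1.0], ns=[10, 10], gamma=1.0)
```

Each output stays between the task's own sample mean and the pooled mean, which is the convex-combination structure the abstract describes.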
ABNIRML: Analyzing the Behavior of Neural IR Models
- Sean MacAvaney, Sergey Feldman, Nazli Goharian, Doug Downey, Arman Cohan
- Computer Science · TACL
- 2 November 2020
A new comprehensive framework for Analyzing the Behavior of Neural IR ModeLs (ABNIRML) is presented, which includes new types of diagnostic probes that test several characteristics, such as writing style, factuality, and sensitivity to paraphrasing and word order, that are not addressed by previous techniques.
Quantifying Sex Bias in Clinical Studies at Scale With Automated Data Extraction
- Sergey Feldman, Waleed Ammar, Kyle Lo, E. Trepman, Madeleine van Zuylen, Oren Etzioni
- Psychology · JAMA Network Open
- 1 July 2019
It is suggested that sex bias against female participants in clinical studies persists, but results differ depending on whether studies or participants are used as the measurement unit.
Simplified Data Wrangling with ir_datasets
- Sean MacAvaney, Andrew Yates, Sergey Feldman, Doug Downey, Arman Cohan, Nazli Goharian
- Computer Science · SIGIR
- 3 March 2021
A new robust and lightweight tool is presented for acquiring, managing, and performing typical operations over datasets used in IR, with a primary focus on textual datasets used for ad-hoc search.