CEDR: Contextualized Embeddings for Document Ranking
This work investigates how two pretrained contextualized language models (ELMo and BERT) can be used for ad-hoc document ranking. It proposes a joint approach that incorporates BERT's classification vector into existing neural ranking models and shows that this approach outperforms state-of-the-art ad-hoc ranking baselines.
Hate speech detection: Challenges and solutions
This work identifies and examines challenges faced by automatic approaches to detecting hate speech in online text. It proposes a multi-view SVM approach that achieves near state-of-the-art performance while being simpler and producing more easily interpretable decisions than neural methods.
SMHD: a Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions
This paper investigates the creation of high-precision patterns to identify self-reported diagnoses of nine different mental health conditions, yielding high-quality labeled data without the need for manual labeling.
PARADE: Passage Representation Aggregation for Document Reranking
We present PARADE, an end-to-end Transformer-based model that considers document-level context for document reranking. PARADE leverages passage-level relevance representations to predict a document relevance score.
Efficient Document Re-Ranking for Transformers by Precomputing Term Representations
The proposed approach, PreTTR (Precomputing Transformer Term Representations), considerably reduces the query-time latency of deep transformer networks, making them more practical to use in real-time ranking scenarios.
Expansion via Prediction of Importance with Contextualization
This work presents a representation-based ranking approach that explicitly models the importance of each term using a contextualized language model and performs passage expansion by propagating that importance to similar terms. This narrows the gap between inexpensive and cost-prohibitive passage ranking approaches.
SLEDGE: A Simple Yet Effective Baseline for Coronavirus Scientific Knowledge Search
This work presents SLEDGE, a search system that uses SciBERT to re-rank articles effectively. The model is trained on a general-domain answer ranking dataset, and its relevance signals are transferred to the SARS-CoV-2 domain for evaluation.
Content-Based Weak Supervision for Ad-Hoc Re-Ranking
This work examines weak supervision sources for training that yield pseudo query-document pairs which already exhibit relevance. It proposes filtering techniques that eliminate training samples too far out of domain, using both a heuristic-based approach and a novel supervised filter that repurposes a neural ranker.
GUIR at SemEval-2017 Task 12: A Framework for Cross-Domain Clinical Temporal Information Extraction
This work presents a system that uses supervised learning to extract temporal expression and event spans, along with their attributes and narrative container relations, for cross-domain temporal information extraction from clinical text.
OpenNIR: A Complete Neural Ad-Hoc Ranking Pipeline
This work presents OpenNIR, a complete ad-hoc neural ranking pipeline that addresses several shortcomings. It also includes additional features built on the pipeline's components, such as performance benchmarking and tuning of unsupervised ranker parameters for fair comparisons against traditional baselines.