• Publications
CEDR: Contextualized Embeddings for Document Ranking
TLDR
This work investigates how two pretrained contextualized language models (ELMo and BERT) can be used for ad-hoc document ranking. It proposes a joint approach that incorporates BERT's classification vector into existing neural models and shows that it outperforms state-of-the-art ad-hoc ranking baselines.
Hate speech detection: Challenges and solutions
TLDR
This work identifies and examines challenges faced by online automatic approaches for hate speech detection in text, and proposes a multi-view SVM approach that achieves near state-of-the-art performance, while being simpler and producing more easily interpretable decisions than neural methods.
SMHD: a Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions
TLDR
This paper investigates the creation of high-precision patterns to identify self-reported diagnoses of nine different mental health conditions, and obtains high-quality labeled data without the need for manual labelling.
PARADE: Passage Representation Aggregation for Document Reranking
We present PARADE, an end-to-end Transformer-based model that considers document-level context for document reranking. PARADE leverages passage-level relevance representations to predict a document
Efficient Document Re-Ranking for Transformers by Precomputing Term Representations
TLDR
The proposed approach, called PreTTR (Precomputing Transformer Term Representations), considerably reduces the query-time latency of deep transformer networks, making these networks more practical to use in a real-time ranking scenario.
Expansion via Prediction of Importance with Contextualization
TLDR
A representation-based ranking approach that explicitly models the importance of each term using a contextualized language model, and performs passage expansion by propagating the importance to similar terms, which narrows the gap between inexpensive and cost-prohibitive passage ranking approaches.
SLEDGE: A Simple Yet Effective Baseline for Coronavirus Scientific Knowledge Search
TLDR
This work presents SLEDGE, a search system that uses SciBERT to effectively re-rank articles. The model is trained on a general-domain answer ranking dataset, and the relevance signals are transferred to SARS-CoV-2 for evaluation.
Content-Based Weak Supervision for Ad-Hoc Re-Ranking
TLDR
This work examines weak supervision sources for training that yield pseudo query-document pairs that already exhibit relevance. It proposes filtering techniques to eliminate training samples that are too far out of domain, using a heuristic-based approach and a novel supervised filter that repurposes a neural ranker.
GUIR at SemEval-2017 Task 12: A Framework for Cross-Domain Clinical Temporal Information Extraction
TLDR
This work presents a system that uses supervised learning for the extraction of temporal expression and event spans with corresponding attributes and narrative container relations for cross-domain temporal extraction from clinical text.
OpenNIR: A Complete Neural Ad-Hoc Ranking Pipeline
TLDR
This work presents OpenNIR, a complete ad-hoc neural ranking pipeline that addresses shortcomings and includes several bells and whistles that make use of the pipeline's components, such as performance benchmarking and tuning of unsupervised ranker parameters for fair comparisons against traditional baselines.