Trial2Vec: Zero-Shot Clinical Trial Document Similarity Search using Self-Supervision

Zifeng Wang and Jimeng Sun
Clinical trials are essential for drug development but are extremely expensive and time-consuming to conduct. It is beneficial to study similar historical trials when designing a clinical trial. However, lengthy trial documents and lack of labeled data make trial similarity search difficult. We propose a zero-shot clinical trial retrieval method, called Trial2Vec, which learns through self-supervision without the need for annotating similar clinical…
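Once each trial is encoded as a vector, similarity search reduces to ranking the indexed trials by how close their vectors are to a query trial's vector. The sketch below assumes those embeddings already exist and uses cosine similarity, a common choice for embedding retrieval; the toy vectors and the names `top_k_similar` and `trial_A` are hypothetical and are not Trial2Vec's encoder or API.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_similar(query: List[float], index: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Score every indexed trial against the query embedding and return the k most similar."""
    scored = [(trial_id, cosine(query, emb)) for trial_id, emb in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]


# Hypothetical toy embeddings; a real system would produce these with a trained trial encoder.
index = {
    "trial_A": [0.9, 0.1, 0.0],
    "trial_B": [0.1, 0.9, 0.2],
    "trial_C": [0.8, 0.2, 0.1],
}
results = top_k_similar([1.0, 0.0, 0.0], index, k=2)
print([trial_id for trial_id, _ in results])  # ['trial_A', 'trial_C']
```

A linear scan is enough to show the idea; a large trial registry would normally sit behind an approximate nearest-neighbor index instead.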


SurvTRACE: transformers for survival analysis with competing events

This work proposes SurvTRACE, a transformer-based model that makes no assumption about the underlying survival distribution and can handle competing events. It shows great potential for enhancing clinical trial design and new treatment development.

Artificial Intelligence for In Silico Clinical Trials: A Review

This article reviews papers under three main topics (clinical simulation, individualized predictive modeling, and computer-aided trial design) and, for each task, presents the machine learning problem formulation and the available data sources.

Neural Query Synthesis and Domain-Specific Ranking Templates for Multi-Stage Clinical Trial Matching

This work introduces NQS, a neural query synthesis method that leverages a zero-shot document expansion model to generate multiple sentence-long queries from lengthy patient descriptions, together with a two-stage neural reranking pipeline trained on clinical trial matching data using tailored ranking templates.

Towards an Aspect-Based Ranking Model for Clinical Trial Search

This work proposes an automated method that retrieves relevant trials based on the overlap of UMLS concepts between the user query and the clinical trials. It also measures the correlation between the different aspect-based ranking lists and observes a strongly negative Spearman's rank correlation coefficient between popularity and recency.
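The two steps described here, ranking trials by shared UMLS concepts and comparing two ranked lists with Spearman's coefficient, can be sketched as follows. This is a minimal illustration under stated assumptions: concept extraction is skipped, so plain strings stand in for UMLS concept identifiers, and the helper names `overlap_rank` and `spearman_rho` are hypothetical. The correlation uses the tie-free formula ρ = 1 − 6Σd² / (n(n² − 1)), where d is the difference between an item's positions in the two lists.

```python
from typing import Dict, List, Set


def overlap_rank(query_concepts: Set[str], trials: Dict[str, Set[str]]) -> List[str]:
    """Order trial ids by the number of concepts shared with the query (ties broken by id)."""
    return sorted(trials, key=lambda t: (-len(query_concepts & trials[t]), t))


def spearman_rho(rank_a: List[str], rank_b: List[str]) -> float:
    """Spearman's rank correlation of two tie-free rankings of the same distinct items."""
    n = len(rank_a)
    if n < 2 or len(rank_b) != n or len(set(rank_a)) != n or set(rank_a) != set(rank_b):
        raise ValueError("expected two rankings of the same n >= 2 distinct items")
    position_b = {item: i for i, item in enumerate(rank_b)}
    d_squared = sum((i - position_b[item]) ** 2 for i, item in enumerate(rank_a))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Strings stand in for UMLS concept identifiers (concept extraction is out of scope here).
trials = {
    "t1": {"diabetes", "metformin", "adult"},
    "t2": {"diabetes"},
    "t3": {"hypertension"},
}
relevance = overlap_rank({"diabetes", "metformin"}, trials)
print(relevance)  # ['t1', 't2', 't3']

# A second, hypothetical aspect-based list (e.g. ordered by recency) that is the exact reverse.
recency = ["t3", "t2", "t1"]
print(spearman_rho(relevance, recency))  # -1.0
```

A list compared with its exact reverse gives ρ = −1, the limiting case of a negative correlation between two aspect-based rankings.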

Clinical trial search: Using biomedical language understanding models for re-ranking

Neural Ranking Models with Weak Supervision

This paper proposes training a neural ranking model with weak supervision, where labels are obtained automatically without human annotators or any external resources. The results suggest that supervised neural ranking models can greatly benefit from pre-training on large amounts of weakly labeled data that can be easily obtained from unsupervised IR models.
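Weak supervision of this kind can be sketched by letting an unsupervised ranker label the data. The example below assumes Okapi BM25 (with the common defaults k1 = 1.2 and b = 0.75 and a smoothed IDF) as that ranker and turns its scores into pairwise preferences that a neural ranker could then be trained on. The function names are hypothetical, and the neural model itself is omitted.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def bm25_scores(query: List[str], docs: Dict[str, List[str]],
                k1: float = 1.2, b: float = 0.75) -> Dict[str, float]:
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    if not docs:
        return {}
    n_docs = len(docs)
    avgdl = sum(len(toks) for toks in docs.values()) / n_docs
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for toks in docs.values() for term in set(toks))
    scores = {}
    for doc_id, toks in docs.items():
        tf = Counter(toks)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / length_norm
        scores[doc_id] = score
    return scores


def weak_pairwise_labels(scores: Dict[str, float]) -> List[Tuple[str, str]]:
    """Turn unsupervised scores into (preferred, other) training pairs; tied pairs are skipped."""
    pairs = []
    for a, c in combinations(sorted(scores), 2):
        if scores[a] > scores[c]:
            pairs.append((a, c))
        elif scores[c] > scores[a]:
            pairs.append((c, a))
    return pairs


docs = {
    "d1": "insulin therapy for type 2 diabetes".split(),
    "d2": "diabetes diet study".split(),
    "d3": "knee surgery outcomes".split(),
}
scores = bm25_scores("insulin diabetes".split(), docs)
print(weak_pairwise_labels(scores))  # [('d1', 'd2'), ('d1', 'd3'), ('d2', 'd3')]
```

No human judgment enters the pipeline: every label comes from the BM25 scores, which is what makes such data cheap to produce at scale.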

Metadata-Induced Contrastive Learning for Zero-Shot Multi-Label Text Classification

Experimental results show that MICoL significantly outperforms strong zero-shot text classification and contrastive learning baselines and is on par with the state-of-the-art supervised metadata-aware LMTC method trained on 10K–200K labeled documents. MICoL also tends to predict more infrequent labels than supervised methods, thus alleviating the degraded performance on long-tailed labels.
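Contrastive objectives of this family are usually written as an InfoNCE loss with in-batch negatives. The sketch below is a generic version, not MICoL's exact formulation: it assumes each anchor document is paired with a positive document linked to it through metadata (for example, shared references), while the other positives in the batch act as negatives. The embeddings and the function name `info_nce` are hypothetical.

```python
import math
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(anchors: List[List[float]], positives: List[List[float]],
             temperature: float = 0.1) -> float:
    """Mean InfoNCE loss where anchors[i] should match positives[i] against all other positives in the batch."""
    if not anchors or len(anchors) != len(positives):
        raise ValueError("need equally sized, non-empty batches")
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [dot(anchor, p) / temperature for p in positives]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax(logits)[i]
    return total / len(anchors)


# Hypothetical unit-length embeddings of two documents and their metadata-linked partners.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each positive points the same way as its anchor
swapped = [[0.0, 1.0], [1.0, 0.0]]   # positives mismatched with their anchors
print(info_nce(anchors, aligned) < info_nce(anchors, swapped))  # True
```

Minimizing this loss pulls metadata-linked documents together and pushes the rest of the batch apart, which is how such methods learn from unlabeled documents.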

Intra-Document Cascading: Learning to Select Passages for Neural Document Ranking

The proposed Intra-Document Cascaded Ranking Model (IDCM) leads to over 400% lower query latency while providing essentially the same effectiveness as state-of-the-art BERT-based document ranking models.
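The cascading idea can be illustrated with two scorers: a cheap one ranks every passage in a document, and an expensive one runs only on the top k. Both scorers below are hypothetical stand-ins (term overlap and a token-coverage ratio rather than the learned models in IDCM), and taking the maximum selected-passage score as the document score is a simplifying assumption.

```python
from typing import Callable, List, Tuple

Scorer = Callable[[str, str], float]


def cascade_score(query: str, passages: List[str], cheap: Scorer,
                  expensive: Scorer, k: int = 2) -> Tuple[float, int]:
    """Run the expensive scorer only on the k passages the cheap scorer ranks highest.

    Returns (document score, number of expensive calls); the document score is the
    maximum expensive score among the selected passages.
    """
    if not passages:
        return 0.0, 0
    order = sorted(range(len(passages)), key=lambda i: cheap(query, passages[i]), reverse=True)
    selected = order[:k]
    return max(expensive(query, passages[i]) for i in selected), len(selected)


def term_overlap(query: str, passage: str) -> float:
    """Cheap first-stage scorer: number of distinct query terms present in the passage."""
    return float(len(set(query.lower().split()) & set(passage.lower().split())))


def coverage_model(query: str, passage: str) -> float:
    """Hypothetical stand-in for an expensive neural scorer: share of passage tokens that are query terms."""
    terms = set(query.lower().split())
    tokens = passage.lower().split()
    return sum(tok in terms for tok in tokens) / len(tokens) if tokens else 0.0


passages = [
    "funding came from a national agency",
    "asthma lung function was the primary outcome",
    "adults with asthma were enrolled",
    "lung outcome data were collected weekly",
]
score, calls = cascade_score("asthma lung outcome", passages, term_overlap, coverage_model, k=2)
print(round(score, 3), calls)  # 0.429 2
```

Here the expensive scorer runs twice instead of four times; the savings grow with document length because k stays fixed while the number of passages does not.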

SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation

The STS Benchmark is introduced as a new shared training and evaluation set carefully selected from the corpus of English STS shared task data (2012-2017), providing insight into the limitations of existing models.
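STS systems are typically evaluated by correlating the similarity they predict for each sentence pair with the human gold score (on a 0 to 5 scale in the STS Benchmark); the STS shared tasks have used Pearson's r as their primary metric. A minimal, dependency-free version of that computation follows. The prediction values are made up for illustration.

```python
import math
from typing import List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n < 2 or len(ys) != n:
        raise ValueError("need two lists of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if std_x == 0.0 or std_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (std_x * std_y)


gold = [0.0, 1.5, 3.0, 4.5]           # human similarity judgments on the 0-5 scale
predicted = [0.12, 0.35, 0.61, 0.88]  # made-up model similarities (e.g. cosine scores)
print(round(pearson(gold, predicted), 3))
```

Because Pearson's r is invariant to shifting and scaling either list, a model's raw cosine scores need no rescaling to the 0 to 5 range before evaluation.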

Pre-training Tasks for Embedding-based Large-scale Retrieval

This work shows that the key ingredient in learning a strong embedding-based Transformer model is the set of pre-training tasks: with adequately designed paragraph-level pre-training tasks, Transformer models can remarkably improve over the widely used BM25 as well as over embedding models without Transformers.
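One paragraph-level task studied in this line of work is the Inverse Cloze Task (ICT): a sentence sampled from a passage serves as a pseudo-query, and the rest of the passage serves as its matching pseudo-document. The sketch below generates such a pair in its simplest form, always removing the sampled sentence from the context; the function name is hypothetical, and tokenization and batching are omitted.

```python
import random
from typing import List, Tuple


def inverse_cloze_pair(sentences: List[str], rng: random.Random) -> Tuple[str, str]:
    """Sample one sentence as the pseudo-query; the other sentences, in order, form the pseudo-document."""
    if len(sentences) < 2:
        raise ValueError("a passage needs at least two sentences")
    i = rng.randrange(len(sentences))
    pseudo_query = sentences[i]
    pseudo_document = " ".join(sentences[:i] + sentences[i + 1:])
    return pseudo_query, pseudo_document


passage = [
    "The trial enrolled adults with moderate asthma.",
    "Participants received the study drug for twelve weeks.",
    "The primary outcome was change in lung function.",
]
query, document = inverse_cloze_pair(passage, random.Random(0))
print("QUERY:", query)
print("DOC:  ", document)
```

Each generated pair becomes a positive (query, document) example for a two-tower encoder, with other documents in the batch typically serving as negatives, so no relevance labels are needed.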

An Unsupervised Sentence Embedding Method by Mutual Information Maximization

Experimental results show that the proposed lightweight extension on top of BERT significantly outperforms other unsupervised sentence embedding baselines on common semantic textual similarity (STS) tasks and downstream supervised tasks, and achieves performance competitive with supervised methods on various tasks.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
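One of the pre-training objectives compared in this study is span corruption: contiguous spans of the input are replaced by sentinel tokens, and the target reproduces each sentinel followed by the tokens it hid. In the sketch below the spans are passed in explicitly so the output is deterministic (T5 samples them randomly), and the `<extra_id_N>` sentinel strings follow the naming of the publicly released T5 vocabulary; the function name `span_corrupt` is hypothetical.

```python
from typing import List, Tuple


def span_corrupt(tokens: List[str], spans: List[Tuple[int, int]]) -> Tuple[List[str], List[str]]:
    """Mask each half-open [start, end) span with a sentinel; the target lists each sentinel and the tokens it hid."""
    inputs: List[str] = []
    targets: List[str] = []
    prev = 0
    ordered = sorted(spans)
    for k, (start, end) in enumerate(ordered):
        if start < prev or end <= start or end > len(tokens):
            raise ValueError("spans must be non-empty, in range, and non-overlapping")
        sentinel = f"<extra_id_{k}>"
        inputs.extend(tokens[prev:start])   # keep the unmasked text before this span
        inputs.append(sentinel)             # replace the whole span with one sentinel
        targets.append(sentinel)
        targets.extend(tokens[start:end])   # the target spells out what was masked
        prev = end
    inputs.extend(tokens[prev:])
    targets.append(f"<extra_id_{len(ordered)}>")  # final sentinel marks the end of the target
    return inputs, targets


tokens = "Thank you for inviting me to your party last week .".split()
inputs, targets = span_corrupt(tokens, [(2, 4), (8, 9)])
print(" ".join(inputs))   # Thank you <extra_id_0> me to your party <extra_id_1> week .
print(" ".join(targets))  # <extra_id_0> for inviting <extra_id_1> last <extra_id_2>
```

Both sides are plain token sequences, which is what lets this objective fit the same text-to-text format used for every downstream task.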