A Hybrid Embedding Approach to Noisy Answer Passage Retrieval

@inproceedings{Cohen2018AHE,
  title={A Hybrid Embedding Approach to Noisy Answer Passage Retrieval},
  author={Daniel Cohen and W. Bruce Croft},
  booktitle={ECIR},
  year={2018}
}
Answer passage retrieval is an increasingly important information retrieval task as queries become more precise and mobile and audio interfaces more prevalent. In this task, the goal is to retrieve a contiguous series of sentences (a passage) that concisely addresses the information need expressed in the query. Recent work with deep learning has shown the efficacy of distributed text representations for retrieving sentences or tokens for question answering. However, determining the relevancy of… 
Passage Similarity and Diversification in Non-factoid Question Answering
TLDR
This paper introduces a new dataset NFPassageQA_Sim, with human annotated similarity labels for pairs of answer passages corresponding to each question, and demonstrates the effectiveness of using weak supervision signals derived from GloVe, fine-tuned and trained using a BERT model for the task of answer passage clustering.
On Achieving High Quality User Reviews Retrieval in the Context of Conversational Faceted Search
TLDR
Combining handmade and statistical dictionaries achieves better precision in relevant AP (review) retrieval, and methods that devalue question words describing the general domain context improve precision in restricted domains such as hotel booking.
ANTIQUE: A Non-factoid Question Answering Benchmark
TLDR
This paper develops and releases a collection of 2,626 open-domain non-factoid questions from a diverse set of categories, and includes a brief analysis of the data as well as baseline results on both classical and neural IR models.
A Semantic Expansion-Based Joint Model for Answer Ranking in Chinese Question Answering Systems
TLDR
A new joint model for answer ranking is proposed that leverages context semantic features and balances both question-answer similarities and answer ranking scores; it outperforms the state-of-the-art baseline methods.
Exploring Diversification In Non-factoid Question Answering
TLDR
It is shown that topic diversification can help to generate more effective rankings but is not consistent across different queries and test collections.
On the Theory of Weak Supervision for Information Retrieval
TLDR
It is proved that given some sufficient constraints on the loss function, weak supervision is equivalent to supervised learning under uniform noise, and an upper bound for the empirical risk of weak supervision in case of non-uniform noise is found.
Using NLP Techniques to Enhance Content Discoverability and Reusability for Adaptive Systems
TLDR
This thesis investigates how NLP techniques can be utilised in order to enhance the supply of content to adaptive systems and proposes a novel hierarchical text segmentation approach, named C-HTS, that builds a structure from text documents based on the semantic representation of text.
A Study of BERT for Non-Factoid Question-Answering under Passage Length Constraints
TLDR
This work explores the fine-tuning of BERT in different learning-to-rank setups, comprising both point-wise and pair-wise methods, resulting in substantial improvements over the state-of-the-art.
PatentMatch: A Dataset for Matching Patent Claims & Prior Art
TLDR
A training dataset for supervised machine learning called PatentMatch, which contains pairs of claims from patent applications and semantically corresponding text passages of different degrees from cited patent documents, is created to address the computer-assisted search for prior art.
Information Retrieval Technology: 15th Asia Information Retrieval Societies Conference, AIRS 2019, Hong Kong, China, November 7–9, 2019, Proceedings
This book constitutes the refereed proceedings of the 15th Information Retrieval Technology Conference, AIRS 2019, held in Hong Kong, China, in November 2019. The 14 full papers presented together
...
...

References

Beyond Factoid QA: Effective Methods for Non-factoid Answer Sentence Retrieval
TLDR
This work introduces the answer sentence retrieval task for non-factoid Web queries and investigates how it can be solved effectively under a learning-to-rank framework, designing two types of features, semantic and context features, beyond traditional text matching features.
Learning to Rank Short Text Pairs with Convolutional Deep Neural Networks
TLDR
This paper presents a convolutional neural network architecture for reranking pairs of short texts, where the optimal representation of text pairs and a similarity function to relate them in a supervised way from the available training data are learned.
A Long Short-Term Memory Model for Answer Sentence Selection in Question Answering
TLDR
The proposed method uses a stacked bidirectional Long Short-Term Memory network to sequentially read words from question and answer sentences and then output their relevance scores, outperforming previous work that requires syntactic features and external knowledge resources.
Skipping Word: A Character-Sequential Representation based Framework for Question Answering
TLDR
This paper proposes to straightforwardly model sentences by means of character sequences, and then utilize convolutional neural networks to integrate character embedding learning together with point-wise answer selection training, showing a much simpler procedure and more stable performance across different benchmarks.
Evaluating answer passages using summarization measures
TLDR
The advantages of document summarization measures for evaluating answer passage retrieval are described and it is shown that these measures have high correlation with existing measures and human judgments.
Query Expansion with Locally-Trained Word Embeddings
TLDR
It is demonstrated that word embeddings such as word2vec and GloVe, when trained globally, underperform corpus- and query-specific embeddings for retrieval tasks, suggesting that other tasks benefiting from global embeddings may also benefit from local embeddings.
Learning deep structured semantic models for web search using clickthrough data
TLDR
A series of new latent semantic models with a deep structure that project queries and documents into a common low-dimensional space where the relevance of a document given a query is readily computed as the distance between them are developed.
Bidirectional Attention Flow for Machine Comprehension
TLDR
The BIDAF network is introduced, a multi-stage hierarchical process that represents the context at different levels of granularity and uses a bi-directional attention flow mechanism to obtain a query-aware context representation without early summarization.
A Deep Relevance Matching Model for Ad-hoc Retrieval
TLDR
A novel deep relevance matching model (DRMM) for ad-hoc retrieval that employs a joint deep architecture at the query term level for relevance matching and can significantly outperform some well-known retrieval models as well as state-of-the-art deep matching models.
LSTM-based Deep Learning Models for non-factoid answer selection
TLDR
A general deep learning framework is applied for the answer selection task, which does not depend on manually defined features or linguistic tools, and is extended in two directions to define a more composite representation for questions and answers.
...
...