Cross-Lingual Document Retrieval with Smooth Learning
@article{Liu2020CrossLingualDR, title={Cross-Lingual Document Retrieval with Smooth Learning}, author={Jiapeng Liu and Xiao Zhang and Dan Goldwasser and Xiao Wang}, journal={ArXiv}, year={2020}, volume={abs/2011.00701} }
Cross-lingual document search is an information retrieval task in which the language of the queries differs from the language of the documents. In this paper, we study the instability of neural document search models and propose a novel end-to-end robust framework that achieves improved performance in cross-lingual search across different document languages. This framework includes smooth cosine similarity, a novel measure of the relevance between queries and documents, and a novel loss function…
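As a rough illustration of the idea named in the abstract, the sketch below scores query embeddings against document embeddings with a cosine similarity whose denominator carries a small additive smoothing term. The abstract does not define the paper's smooth cosine similarity or its loss function, so the function name `smooth_cosine_similarity`, the epsilon term, and the toy embeddings are illustrative assumptions only, not the authors' formulation.

```python
# Minimal sketch (assumed form): cosine similarity with an epsilon-smoothed
# denominator, used to rank documents for a query in a shared embedding space.
import numpy as np

def smooth_cosine_similarity(query_vec, doc_vec, eps=1e-6):
    """Cosine similarity with an additive smoothing term (hypothetical form)."""
    num = np.dot(query_vec, doc_vec)
    denom = np.linalg.norm(query_vec) * np.linalg.norm(doc_vec) + eps
    return num / denom

# Toy usage: random vectors stand in for cross-lingual query/document embeddings.
rng = np.random.default_rng(0)
query = rng.normal(size=300)
docs = rng.normal(size=(3, 300))
scores = np.array([smooth_cosine_similarity(query, d) for d in docs])
ranking = np.argsort(scores)[::-1]   # documents ordered by descending relevance
print(scores, ranking)
```

The smoothing term here simply guards against near-zero norms; whatever smoothing the paper actually applies to the similarity or the loss would replace this placeholder.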
5 Citations
Aspect term extraction and optimized deep fuzzy clustering-based inverted indexing for document retrieval
- Computer Science, Intell. Decis. Technol.
- 2022
This paper develops a novel approach, Exponential Aquila Optimizer (EAO)-based Deep Fuzzy Clustering, for document retrieval; it effectively finds relevant documents and models the relationship between documents and queries in terms of the significance of documents for query optimization.
The Geometry of Multilingual Language Models: An Equality Lens
- Linguistics, Computer Science, ArXiv
- 2023
This study analyzes the geometry of three multilingual language models in Euclidean space, finds that every language is represented by a unique geometry, and introduces a Cross-Lingual Similarity Index to measure the distance between languages in the semantic space.
Topological Data Analysis of Database Representations for Information Retrieval
- Computer Science, ArXiv
- 2021
This work computes persistent homology on a variety of datasets, shows that some commonly used embeddings fail to preserve connectivity, and introduces the dilation-invariant bottleneck distance to capture this effect.
Topological Information Retrieval with Dilation-Invariant Bottleneck Comparative Measures
- Computer Science
- 2021
This work shows that embeddings which successfully retain the database topology coincide in persistent homology, introduces two dilation-invariant comparative measures to capture this effect, and provides an algorithm for their computation with greatly reduced time complexity compared to existing methods.
Why is a document relevant? Understanding the relevance scores in cross-lingual document retrieval
- Computer Science, Knowledge-Based Systems
- 2022
24 References
Cross-Lingual Learning-to-Rank with Shared Representations
- Computer Science, NAACL
- 2018
A large-scale dataset derived from Wikipedia is introduced to support CLIR research in 25 languages and a simple yet effective neural learning-to-rank model is presented that shares representations across languages and reduces the data requirement.
Translation techniques in cross-language information retrieval
- Computer Science, CSUR
- 2012
This survey reviews the wide range of techniques and models supporting free-text translation that the CLIR community has developed over the last 15 years, with a special emphasis on recent developments.
Cross language information retrieval
- Computer Science, AMTA
- 1998
This work focuses on the development of a model for automatic Cross-Language Information Retrieval using Latent Semantic Indexing and its application to Machine Translation Technology.
Learning deep structured semantic models for web search using clickthrough data
- Computer Science, CIKM
- 2013
A series of new latent semantic models with a deep structure is developed; these models project queries and documents into a common low-dimensional space where the relevance of a document given a query is readily computed as the distance between them.
Indexing by Latent Semantic Analysis
- Computer Science, J. Am. Soc. Inf. Sci.
- 1990
A new method for automatic indexing and retrieval is described that takes advantage of implicit higher-order structure in the association of terms with documents ("semantic structure") to improve the detection of relevant documents on the basis of terms found in queries.
A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval
- Computer Science, CIKM
- 2014
A new latent semantic model that incorporates a convolutional-pooling structure over word sequences to learn low-dimensional, semantic vector representations for search queries and Web documents is proposed.
GloVe: Global Vectors for Word Representation
- Computer Science, EMNLP
- 2014
A new global log-bilinear regression model is proposed that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
Query-level stability and generalization in learning to rank
- Computer Science, ICML '08
- 2008
The proposed theory of the generalization ability of learning-to-rank algorithms for information retrieval (IR) is applied to the existing algorithms Ranking SVM and IRSVM, and a number of new concepts are defined, including query-level loss, query-level risk, and query-level stability.
Distributed Representations of Words and Phrases and their Compositionality
- Computer Science, NIPS
- 2013
This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
Generalization error bounds for learning to rank: Does the length of document lists matter?
- Computer Science, ICML
- 2015
It is shown that there is no degradation in generalization ability for several loss functions, including the cross-entropy loss used in the well-known ListNet method; novel generalization error bounds under l1 regularization, with faster convergence rates when the loss function is smooth, are also provided.