Corpus ID: 17420900

Apache Lucene 4

@inproceedings{Bialecki2012ApacheL4,
  title={Apache Lucene 4},
  author={A. Bialecki and R. Muir and Grant Ingersoll},
  booktitle={OSIR@SIGIR},
  year={2012}
}
Apache Lucene is a modern, open-source search library designed to provide both relevant results and high performance. Furthermore, Lucene has undergone significant change over the years, growing from a one-person project into one of the leading search solutions available. Lucene is used in a vast range of applications, from mobile devices and desktops to Internet-scale solutions. The evolution of Lucene has been quite dramatic at times, none more so than in the current release of Lucene…
On Using Non-Volatile Memory in Apache Lucene
TLDR
This preliminary article presents the first reported work on the impact of using NVDIMM on the performance of committing, searching, and near-real-time searching in Apache Lucene. It suggests that a bigger impact requires redesigning Lucene to access NVM as byte-addressable memory using loads and stores, instead of accessing NVM via the file system.
Model-Driven Query Generation for Elasticsearch
TLDR
A domain-specific modeling language (DSML) called Dimension Query Language (DQL) is introduced to support the model-driven development of Elasticsearch queries, and using the language is shown to significantly decrease the development time required to create Elasticsearch queries.
Blind Queries Applied to JSON Document Stores
TLDR
The aim of this paper is to provide analysts with a tool for blind querying of JSON document collections within a NoSQL document database, by evolving the Hammer framework into the HammerJDB framework, which works on MongoDB databases.
Toward Reproducible Baselines: The Open-Source IR Reproducibility Challenge
The Open-Source IR Reproducibility Challenge brought together developers of open-source search engines to provide reproducible baselines of their systems in a common environment on Amazon EC2. The…
A Comparison of Recent Information Retrieval Term-Weighting Models Using Ancient Datasets
  • Ahmet Alkılınç, A. Arslan
  • Computer Science
  • 2018 International Conference on Artificial Intelligence and Data Processing (IDAP)
  • 2018
TLDR
This work analyzes and evaluates the retrieval effectiveness of recently developed term-weighting models (post-2000) on earlier datasets (dating back as far as the 1980s), and finds that the DFIC model is generally more effective than the other models.
MIaS: Math-Aware Retrieval in Digital Mathematical Libraries
TLDR
This work develops and open-sources the MIaS MIR system, which is based on the full-text search engine Apache Lucene and is both efficient and effective, as evidenced by the NTCIR-11 Math-2 task.
In-RDBMS inverted indexes revisited
TLDR
This work finds that a specialized IR engine integrated into the RDBMS can provide more than an order of magnitude speedup over both the row- and column-oriented relational query engines for conjunctive and phrase queries, and shows that relational inverted indexes can provide performance comparable to a specialized in-RDBMS IR engine with no change to the underlying storage format.
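The conjunctive and phrase queries mentioned in this summary are the classic inverted-index workloads. The sketch below shows, under stated assumptions, how they are commonly evaluated outside any RDBMS: AND queries as merge-intersections of sorted posting lists, and phrase queries as a positional check on the intersected documents. The index layout and all function names are hypothetical illustrations, not the paper's schema or engine.

```python
from typing import Dict, List

# Hypothetical positional index: term -> {doc_id: sorted positions}.
PositionalIndex = Dict[str, Dict[int, List[int]]]


def intersect(a: List[int], b: List[int]) -> List[int]:
    """Merge-intersect two sorted doc-ID lists in O(len(a) + len(b))."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def conjunctive(index: PositionalIndex, terms: List[str]) -> List[int]:
    """Documents containing every query term (AND semantics)."""
    # Start from the shortest posting list so intermediate results stay small.
    lists = sorted((sorted(index.get(t, {})) for t in terms), key=len)
    if not lists:
        return []
    result = lists[0]
    for postings in lists[1:]:
        result = intersect(result, postings)
        if not result:
            break
    return result


def phrase(index: PositionalIndex, terms: List[str]) -> List[int]:
    """Documents in which the terms occur at consecutive positions."""
    hits: List[int] = []
    for doc in conjunctive(index, terms):
        # Candidate start positions of the phrase, narrowed term by term.
        starts = set(index[terms[0]][doc])
        for offset, term in enumerate(terms[1:], start=1):
            starts &= {p - offset for p in index[term][doc]}
        if starts:
            hits.append(doc)
    return hits
```

For example, with `index = {"apache": {1: [0], 2: [3], 3: [5]}, "lucene": {1: [1], 2: [0], 3: [6, 9]}}`, `conjunctive(index, ["apache", "lucene"])` yields all three documents, while `phrase` keeps only documents 1 and 3, where "lucene" directly follows "apache".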
Recomendação de conhecimento da multidão para auxílio ao desenvolvimento de software [Recommending crowd knowledge to support software development]
TLDR
A recommendation system, implemented as a plugin for the Eclipse IDE, whose main objectives are to recommend how-to Q&A pairs that assist developers in learning APIs and to recommend crowd bugs that help developers during debugging tasks.
Touchless and always-on cloud analytics as a service
TLDR
Near field monitoring (NFM), a cloud-native framework for monitoring cloud systems and providing operational analytics services, is presented, allowing instantaneous monitoring as soon as a guest system becomes hosted on the cloud, without any setup prerequisites or enforced cooperation.
Evaluating Web-as-corpus Topical Document Retrieval with an Index of the OpenDirectory
TLDR
A first fully automatic evaluation is described, providing baseline performance for this task, and practical information is given on the availability of the index and resource files.

References

Showing 1-10 of 51 references
Lucene in Action
TLDR
Lucene in Action describes what Lucene is, how it works, and, most importantly, how it can be used in a variety of real-world use cases, such as Nutch, an open-source project designed to index the Internet much like Google.
Earlybird: Real-Time Search at Twitter
TLDR
This paper presents Earlybird, the core retrieval engine that powers Twitter's real-time search service, and describes its index structures, which differ from those built to support traditional web search.
Lucene and Juru at TREC 2007: 1-Million Queries Track
TLDR
Lucene scoring can be modified to improve its measured search quality on TREC. The modifications are described, namely normalizing term frequencies, a different choice of document length normalization, phrase expansion, and proximity scoring.
Variations autour de tf idf et du moteur Lucene [Variations on tf idf and the Lucene engine]
This paper evaluates and compares the retrieval effectiveness of various models derived from the classical tf-idf paradigm when searching a test collection written in French…
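To make "models derived from the classical tf-idf paradigm" concrete, the sketch below scores one document with a choice of term-frequency transforms and optional document length normalization. The square-root tf, the smoothed idf `1 + ln(N / (df + 1))`, and the `1 / sqrt(length)` norm are loosely modeled on Lucene's classic TF-IDF similarity but do not reproduce its full formula. The variant names and the `tfidf_score` function are hypothetical and are not necessarily the variants the paper evaluates.

```python
import math
from typing import Callable, Dict, List

# Illustrative term-frequency transforms (hypothetical names).
TF_VARIANTS: Dict[str, Callable[[int], float]] = {
    "raw": lambda tf: float(tf),
    "log": lambda tf: 1.0 + math.log(tf) if tf > 0 else 0.0,
    "sqrt": lambda tf: math.sqrt(tf),  # the transform Lucene's classic similarity uses
}


def idf(num_docs: int, doc_freq: int) -> float:
    """Smoothed inverse document frequency: rarer terms get larger weights."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def tfidf_score(query: List[str], doc: List[str], doc_freqs: Dict[str, int],
                num_docs: int, tf_variant: str = "sqrt",
                length_norm: bool = True) -> float:
    """Sum of tf * idf over query terms, optionally damped by document length."""
    tf_fn = TF_VARIANTS[tf_variant]
    # Longer documents are penalized by 1/sqrt(#tokens) when normalization is on.
    norm = 1.0 / math.sqrt(len(doc)) if length_norm and doc else 1.0
    total = 0.0
    for term in set(query):
        tf = doc.count(term)
        if tf == 0:
            continue
        total += tf_fn(tf) * idf(num_docs, doc_freqs.get(term, 0)) * norm
    return total
```

Swapping `tf_variant` or disabling `length_norm` changes only one factor of the product, so variants of this kind can be compared on the same collection without touching the rest of the ranking code.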
Fast ranking in limited space
  • A. Moffat, J. Zobel
  • Computer Science
  • Proceedings of 1994 IEEE 10th International Conference on Data Engineering
  • 1994
TLDR
The methods described in the paper have been used to build a retrieval system that can process ranked queries of 40 terms in about 5% of the space required by previous implementations, in as little as 25% of the time, and without measurable degradation in retrieval effectiveness.
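The summary does not spell out how the space saving is achieved. A standard way to bound the memory of term-at-a-time ranking is to cap the number of score accumulators. The sketch below implements that idea under this assumption: terms are processed rarest first, and once the cap is reached, later terms only update documents that already have an accumulator. It is an illustration of the general technique, not the paper's implementation; the index layout, weights, and function name are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical index: term -> list of (doc_id, document-side term weight).
Postings = Dict[str, List[Tuple[int, float]]]


def ranked_query(index: Postings, idf: Dict[str, float], query: List[str],
                 max_accumulators: int, k: int) -> List[Tuple[int, float]]:
    """Term-at-a-time ranking with at most `max_accumulators` partial scores."""
    acc: Dict[int, float] = {}
    # Rare (high-idf) terms contribute most, so they get to create accumulators first.
    for term in sorted(set(query), key=lambda t: idf.get(t, 0.0), reverse=True):
        w_q = idf.get(term, 0.0)
        for doc, w_d in index.get(term, []):
            if doc in acc:
                acc[doc] += w_q * w_d
            elif len(acc) < max_accumulators:
                acc[doc] = w_q * w_d
            # Otherwise the cap is reached: this posting is skipped for new docs.
    # Return the k highest-scoring documents.
    return heapq.nlargest(k, acc.items(), key=lambda item: item[1])
```

Memory is bounded by `max_accumulators` rather than by the number of documents that contain any query term. The cost is that documents matching only common terms can be missed, which is usually acceptable because such documents tend to score low anyway.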
Has adhoc retrieval improved since 1994?
TLDR
There appears to have been no overall improvement in effectiveness for either median or top-end TREC submissions, even after allowing for several possible confounds, which raises the question of whether the effectiveness of ad hoc information retrieval has improved over the past decade and a half.
Compression of Inverted Index for Comprehensive Performance Evaluation in Lucene
TLDR
This paper presents a comprehensive performance evaluation of three state-of-the-art index compression schemes in the open-source information retrieval system Lucene, focusing on the compression and storage of document ID, frequency, and position information in Lucene's word-level inverted index.
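As background for the compression schemes mentioned in this summary, the sketch below shows one classic baseline: store sorted document IDs as gaps (d-gaps), then encode each gap with variable-byte coding. The byte layout (7 payload bits per byte, low-order group first, high bit set when more bytes follow) is similar to Lucene's VInt. Whether this scheme is among the three evaluated in the paper is not stated here, and the function names are hypothetical.

```python
from typing import List


def vbyte_encode(n: int) -> bytes:
    """Encode a non-negative int in 7-bit groups, low-order group first.

    The high bit of each byte is set when more bytes follow.
    """
    if n < 0:
        raise ValueError("vbyte encodes non-negative integers only")
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def encode_doc_ids(doc_ids: List[int]) -> bytes:
    """Convert sorted doc IDs to gaps, then vbyte-encode each gap."""
    out = bytearray()
    prev = 0
    for doc in doc_ids:
        if doc < prev:
            raise ValueError("doc IDs must be sorted")
        out += vbyte_encode(doc - prev)  # small gaps -> few bytes
        prev = doc
    return bytes(out)


def decode_doc_ids(data: bytes) -> List[int]:
    """Invert encode_doc_ids: rebuild gaps from 7-bit groups, then prefix-sum them."""
    doc_ids: List[int] = []
    value = shift = prev = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += value
            doc_ids.append(prev)
            value = shift = 0
    return doc_ids
```

Gap encoding pays off because dense posting lists turn large absolute IDs into small numbers. For instance, the IDs `[3, 7, 300, 100000]` become the gaps `3, 4, 293, 99700`, which take 1 + 1 + 2 + 3 = 7 bytes.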
Okapi at TREC-3
During the course of TREC-1, the low-level search functions were split off into a separate Basic Search System (BSS) [2], but retrieval and ranking of documents was still done using the "classical"…
An object-oriented architecture for text retrieval
TLDR
A software implementation architecture for text retrieval systems is presented that facilitates functional modularization, mix-and-match combination of module implementations, and a definition of inter-module protocols.
Okapi at TREC
TLDR
Much of the work involved investigating plausible methods of applying Okapi-style weighting to phrases. Query expansion was also used, drawing terms from the top documents retrieved by a pilot search on the topic terms.
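"Okapi-style weighting" usually refers to the BM25 family of term weights that grew out of the Okapi TREC experiments. The sketch below scores one tokenized document for a bag-of-words query. It uses the `ln(1 + (N - n + 0.5) / (n + 0.5))` idf form, as in Lucene's BM25 similarity, which stays positive for very common terms. The defaults `k1 = 1.2` and `b = 0.75` are common choices assumed here, not values taken from these papers, and the function name is hypothetical.

```python
import math
from typing import Dict, List


def bm25_score(query: List[str], doc: List[str], doc_freqs: Dict[str, int],
               num_docs: int, avg_doc_len: float,
               k1: float = 1.2, b: float = 0.75) -> float:
    """BM25 score of one tokenized document for a bag-of-words query."""
    dl = len(doc)
    score = 0.0
    for term in set(query):
        f = doc.count(term)
        if f == 0:
            continue
        n = doc_freqs.get(term, 0)
        # The "+1 inside the log" variant keeps idf positive even when n > N/2.
        idf = math.log(1 + (num_docs - n + 0.5) / (n + 0.5))
        # tf saturates toward (k1 + 1); b controls document length normalization.
        tf_part = f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avg_doc_len))
        score += idf * tf_part
    return score
```

Two design points distinguish this from plain tf-idf. First, the term-frequency factor saturates at `k1 + 1` instead of growing without bound. Second, length normalization is relative to the average document length, with `b = 0` disabling it entirely.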