
- Daniel Lemire, Anna Maclachlan
- SDM
- 2005

Rating-based collaborative filtering is the process of predicting how a user would rate a given item from other user ratings. We propose three related slope one schemes with predictors of the form f(x) = x + b, which precompute the average difference between the ratings of one item and another for users who rated both. Slope one algorithms are easy to…
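The predictor f(x) = x + b described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes ratings are stored as nested dictionaries and combines the per-item predictions with a weight equal to the number of co-rating users. The function names `slope_one_deviations` and `predict` are hypothetical.

```python
from collections import defaultdict


def slope_one_deviations(ratings):
    """Precompute the average rating difference b for every ordered item pair.

    ratings: dict mapping user -> dict mapping item -> rating (assumed layout).
    Returns (dev, count), both keyed by (item_i, item_j).
    """
    diff_sum = defaultdict(float)
    count = defaultdict(int)
    for user_ratings in ratings.values():
        for i, ri in user_ratings.items():
            for j, rj in user_ratings.items():
                if i != j:
                    diff_sum[(i, j)] += ri - rj
                    count[(i, j)] += 1
    dev = {pair: diff_sum[pair] / count[pair] for pair in diff_sum}
    return dev, dict(count)


def predict(user_ratings, target, dev, count):
    """Predict the rating of `target` as a weighted average of x + b.

    Each item j the user has rated contributes (rating_j + dev[target, j]),
    weighted by how many users rated both items. Returns None when no
    co-rated item is available.
    """
    numerator = 0.0
    denominator = 0
    for j, rj in user_ratings.items():
        if j != target and (target, j) in dev:
            c = count[(target, j)]
            numerator += (rj + dev[(target, j)]) * c
            denominator += c
    return numerator / denominator if denominator else None


# Illustrative data (made up for this sketch).
ratings = {
    "john": {"A": 5, "B": 3, "C": 2},
    "mark": {"A": 3, "B": 4},
    "lucy": {"B": 2, "C": 5},
}
dev, count = slope_one_deviations(ratings)
lucy_a = predict(ratings["lucy"], "A", dev, count)
```

The pairwise deviations are computed once and reused for every prediction, which is the precomputation step the abstract refers to.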

- Daniel Lemire, Leonid Boytsov
- Softw., Pract. Exper.
- 2015

In many important applications—such as search engines and relational database systems—data is stored in the form of arrays of integers. Encoding and, most importantly, decoding of these arrays consumes considerable CPU time. Therefore, substantial effort has been made to reduce costs associated with compression and decompression. In particular, researchers…

- Daniel Lemire
- Pattern Recognition
- 2009

The Dynamic Time Warping (DTW) is a popular similarity measure between time series. The DTW fails to satisfy the triangle inequality and its computation requires quadratic time. Hence, to find closest neighbors quickly, we use bounding techniques. We can avoid most DTW computations with an inexpensive lower bound (LB_Keogh). We compare LB_Keogh with a…
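To make the bounding idea concrete, here is a small sketch of a quadratic-time DTW together with an envelope-based lower bound in the style of LB_Keogh. It is an illustration under stated assumptions rather than the paper's implementation: both series have equal length, the cost is the absolute difference, and the warping path is restricted to a band of half-width `w` (the same `w` defines the envelope). The function names are hypothetical.

```python
def dtw(x, y, w):
    """DTW distance with absolute-difference cost inside a band |i - j| <= w."""
    n = len(x)
    assert len(y) == n, "this sketch assumes equal-length series"
    inf = float("inf")
    prev = [inf] * (n + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        cur = [inf] * (n + 1)
        for j in range(max(1, i - w), min(n, i + w) + 1):
            cost = abs(x[i - 1] - y[j - 1])
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[n]


def lb_keogh(x, y, w):
    """Lower bound on dtw(x, y, w): distance from x to the envelope of y.

    For each position i, the envelope is the min and max of y over
    [i - w, i + w]. Only the part of x[i] outside the envelope is charged.
    """
    n = len(x)
    total = 0.0
    for i in range(n):
        window = y[max(0, i - w): min(n, i + w + 1)]
        upper, lower = max(window), min(window)
        if x[i] > upper:
            total += x[i] - upper
        elif x[i] < lower:
            total += lower - x[i]
    return total


def needs_full_dtw(x, y, w, best_so_far):
    """Skip the quadratic computation when the bound already exceeds the best."""
    return lb_keogh(x, y, w) < best_so_far
```

In a nearest-neighbor search, `needs_full_dtw` would be called with the best distance found so far, so the expensive `dtw` call runs only for candidates the bound cannot rule out.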

- Owen Kaser, Daniel Lemire
- ArXiv
- 2007

Tag clouds provide an aggregate of tag-usage statistics. They are typically sent as in-line HTML to browsers. However, display mechanisms suited for ordinary text are not ideal for tags, because font sizes may vary widely on a line. As well, the typical layout does not account for relationships that may be known between tags. This paper presents models and…

- Daniel Lemire, Owen Kaser, Kamel Aouiche
- Data Knowl. Eng.
- 2010

Bitmap indexes must be compressed to reduce input/output costs and minimize CPU usage. To accelerate logical operations (AND, OR, XOR) over bitmaps, we use techniques based on run-length encoding (RLE), such as Word-Aligned Hybrid (WAH) compression. These techniques are sensitive to the order of the rows: a simple lexicographical sort can divide the index…
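The sensitivity to row order can be seen with a toy example. The sketch below does not implement WAH; it only counts the runs of identical bits in each bitmap column, used here as a rough proxy for run-length-encoded size, before and after a lexicographic sort of the rows. The data and the function names are made up for illustration.

```python
def count_runs(column):
    """Number of maximal runs of identical values in a sequence."""
    return sum(1 for i, v in enumerate(column) if i == 0 or v != column[i - 1])


def total_runs(rows):
    """Sum of run counts over all bitmap columns of a row-oriented table."""
    return sum(count_runs(column) for column in zip(*rows))


# Each row is a tuple of bits, one bit per bitmap (illustrative data).
rows = [(1, 0), (0, 1), (1, 0), (0, 1)]
runs_unsorted = total_runs(rows)
runs_sorted = total_runs(sorted(rows))  # lexicographic sort of the rows
```

Fewer runs generally means fewer words for a run-length-based scheme to store, which is why the row order matters.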

- M. Anderson, Marcel Ball, +4 authors Sean McGrath
- 2003

In this paper we give an overview of the RACOFI (Rule-Applying Collaborative Filtering) multidimensional rating system and its related technologies. This will be exemplified with RACOFI Music, an implemented collaboration agent that assists on-line users in the rating and recommendation of audio (Learning) Objects. It lets users rate contemporary Canadian…

- Daniel Lemire
- SDM
- 2007

Time series are difficult to monitor, summarize and predict. Segmentation organizes time series into few intervals having uniform characteristics (flatness, linearity, modality, monotonicity and so on). For scalability, we require fast linear time algorithms. The popular piecewise linear model can determine where the data goes up or down and at what rate.…

- Xiao-Dan Zhu, Peter D. Turney, Daniel Lemire, André Vellino
- JASIST
- 2015

The importance of a research article is routinely measured by counting how many times it has been cited. However, treating all citations with equal weight ignores the wide variety of functions that citations perform. We want to automatically identify the subset of references in a bibliography that have a central academic influence on the citing paper. For…

- Daniel Lemire
- Nord. J. Comput.
- 2006

The running maximum-minimum (max-min) filter computes the maxima and minima over running windows of size w. This filter has numerous applications in signal processing and time series analysis. We present an easy-to-implement online algorithm requiring no more than 3 comparisons per element, in the worst case. Comparatively, no algorithm is known to compute…
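As an illustration of what a running max-min filter computes, the sketch below uses two monotonic double-ended queues of indices, a common online technique for this problem. It is not claimed to be the paper's algorithm and makes no claim about the 3-comparisons-per-element bound; the function name `running_max_min` is hypothetical.

```python
from collections import deque


def running_max_min(a, w):
    """Return (maxima, minima) over every full window of size w in a.

    maxq holds indices whose values are strictly decreasing, minq holds
    indices whose values are strictly increasing, so the front of each
    queue is always the extreme of the current window.
    """
    maxq, minq = deque(), deque()
    maxima, minima = [], []
    for i, v in enumerate(a):
        # Drop candidates that the new value makes irrelevant.
        while maxq and a[maxq[-1]] <= v:
            maxq.pop()
        maxq.append(i)
        while minq and a[minq[-1]] >= v:
            minq.pop()
        minq.append(i)
        # Drop the front index once it slides out of the window.
        if maxq[0] <= i - w:
            maxq.popleft()
        if minq[0] <= i - w:
            minq.popleft()
        if i >= w - 1:
            maxima.append(a[maxq[0]])
            minima.append(a[minq[0]])
    return maxima, minima


maxima, minima = running_max_min([1, 3, 2, 5, 4, 0], 3)
```

Each index enters and leaves each queue at most once, so the total work is linear in the length of the input regardless of w.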

- Daniel Lemire, Leonid Boytsov, Nathan Kurz
- Softw., Pract. Exper.
- 2016

Sorted lists of integers are commonly used in inverted indexes and database systems. They are often compressed in memory. We can use the SIMD instructions available in common processors to boost the speed of integer compression schemes. Our S4-BP128-D4 scheme uses as little as 0.7 CPU cycles per decoded 32-bit integer while still providing state-of-the-art…
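The general idea behind compressing sorted integers can be sketched without SIMD: store the gaps between successive values, then pack each gap into just enough bits. The code below is a scalar, simplified illustration of that idea, not the S4-BP128-D4 scheme; it assumes a sorted list of non-negative integers and packs the whole list as a single block. All function names are hypothetical.

```python
def delta_encode(sorted_values):
    """Replace each value by its difference from the previous one."""
    deltas, prev = [], 0
    for v in sorted_values:
        deltas.append(v - prev)
        prev = v
    return deltas


def delta_decode(deltas):
    """Invert delta_encode with a running (prefix) sum."""
    values, total = [], 0
    for d in deltas:
        total += d
        values.append(total)
    return values


def pack(values, bits):
    """Pack non-negative integers below 2**bits into one Python integer."""
    word = 0
    for k, v in enumerate(values):
        word |= v << (k * bits)
    return word


def unpack(word, bits, count):
    """Extract `count` fields of `bits` bits each from `word`."""
    mask = (1 << bits) - 1
    return [(word >> (k * bits)) & mask for k in range(count)]


def compress(sorted_values):
    """Return (bits, count, packed): gaps packed at a common bit width."""
    deltas = delta_encode(sorted_values)
    bits = max(1, max(deltas, default=0).bit_length())
    return bits, len(deltas), pack(deltas, bits)


def decompress(bits, count, packed):
    """Recover the original sorted list from the output of compress."""
    return delta_decode(unpack(packed, bits, count))


values = [3, 4, 7, 10, 11, 20]
bits, count, packed = compress(values)
restored = decompress(bits, count, packed)
```

Because the gaps of a sorted list tend to be much smaller than the values themselves, the common bit width is usually well under 32, which is where the space saving comes from.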