WTF: the who to follow service at Twitter
- Pankaj Gupta, Ashish Goel, Jimmy J. Lin, Aneesh Sharma, Dong Wang, R. Zadeh
- Computer Science · The Web Conference
- 13 May 2013
An overview of the architecture of WTF is provided, and a few graph recommendation algorithms implemented in Cassovary are described and evaluated, including a novel approach based on a combination of random walks and SALSA.
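SALSA, named in the summary above, scores "hubs" and "authorities" on a bipartite graph by alternating random-walk steps between the two sides. Below is a minimal power-iteration sketch on a small edge list. It is a generic SALSA illustration, not the WTF or Cassovary implementation, and the function name `salsa` is hypothetical.

```python
from collections import defaultdict


def salsa(edges, iterations=50):
    """Hypothetical SALSA sketch on a bipartite graph of (hub, authority) edges.

    Returns (hub_scores, authority_scores); each dict sums to 1.
    """
    out_links = defaultdict(list)  # hub -> authorities it points to
    in_links = defaultdict(list)   # authority -> hubs pointing to it
    for h, a in edges:
        out_links[h].append(a)
        in_links[a].append(h)

    # Start with uniform mass over the hubs.
    hubs = {h: 1.0 / len(out_links) for h in out_links}
    auths = {}
    for _ in range(iterations):
        # Hub -> authority step: each hub splits its mass evenly over its out-links.
        auths = defaultdict(float)
        for h, score in hubs.items():
            for a in out_links[h]:
                auths[a] += score / len(out_links[h])
        # Authority -> hub step: each authority splits its mass evenly over its in-links.
        hubs = defaultdict(float)
        for a, score in auths.items():
            for h in in_links[a]:
                hubs[h] += score / len(in_links[a])
    return dict(hubs), dict(auths)


hub_scores, auth_scores = salsa([("u1", "a"), ("u1", "b"), ("u2", "a")])
```

On a connected bipartite graph like this one, the authority scores converge to each node's share of all in-links (here `a` gets 2/3 and `b` gets 1/3), which is why SALSA is often described as a degree-normalized variant of HITS.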
End-to-End Open-Domain Question Answering with BERTserini
- Wei Yang, Yuqing Xie, Jimmy J. Lin
- Computer Science · North American Chapter of the Association for…
- 1 February 2019
An end-to-end question answering system that integrates BERT with the open-source Anserini information retrieval toolkit is demonstrated, showing that fine-tuning pretrained BERT on SQuAD is sufficient to achieve high accuracy in identifying answer spans.
Anserini: Enabling the Use of Lucene for Information Retrieval Research
- Peilin Yang, H. Fang, Jimmy J. Lin
- Computer Science · Annual International ACM SIGIR Conference on…
- 7 August 2017
Anserini provides wrappers and extensions on top of core Lucene libraries that allow researchers to use more intuitive APIs to accomplish common research tasks, and aims to provide the best of both worlds to better align information retrieval practice and research.
Deep Residual Learning for Small-Footprint Keyword Spotting
- Raphael Tang, Jimmy J. Lin
- Computer Science, Economics · IEEE International Conference on Acoustics…
- 28 October 2017
This work explores the application of deep residual learning and dilated convolutions to the keyword spotting task, using the recently released Google Speech Commands Dataset as a benchmark, and establishes an open-source state-of-the-art reference to support the development of future speech-based interfaces.
Overview of the TREC 2011 Microblog Track
Document Expansion by Query Prediction
A simple method is proposed that predicts which queries will be issued for a given document and then expands the document with those predictions, using a vanilla sequence-to-sequence model trained on datasets consisting of pairs of queries and relevant documents.
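The expansion step described above amounts to appending predicted queries to the document text before it is indexed, so that a bag-of-words ranker can match query terms the original text lacked. A minimal sketch follows; `expand_document` is a hypothetical name, and `fake_predictor` is a hypothetical stand-in for the trained sequence-to-sequence model.

```python
from typing import Callable, List


def expand_document(doc: str,
                    predict_queries: Callable[[str, int], List[str]],
                    num_queries: int = 3) -> str:
    """Append up to num_queries predicted queries to the document text."""
    predictions = predict_queries(doc, num_queries)
    return " ".join([doc] + predictions)


def fake_predictor(doc: str, k: int) -> List[str]:
    # Hypothetical stand-in for a trained seq2seq query predictor.
    return ["pain medicine", "headache cure"][:k]


expanded = expand_document("aspirin relieves headaches", fake_predictor, num_queries=2)
```

The expanded string is what gets indexed; the retrieval model itself is left unchanged.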
Data-Intensive Text Processing with MapReduce
This half-day tutorial introduces participants to data-intensive text processing with the MapReduce programming model, using the open-source Hadoop implementation. The focus will be on…
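The canonical MapReduce example is word count. The sketch below simulates the three phases (map, shuffle/group-by-key, reduce) in a single Python process; it only illustrates the programming model and does not use or mimic Hadoop's actual Java API. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(doc_id: int, text: str) -> Iterator[Tuple[str, int]]:
    # Map: emit (term, 1) for every token in the document.
    for term in text.lower().split():
        yield term, 1


def reducer(term: str, counts: List[int]) -> Tuple[str, int]:
    # Reduce: sum all partial counts that share the same key.
    return term, sum(counts)


def run_mapreduce(docs: Iterable[Tuple[int, str]]) -> Dict[str, int]:
    groups: Dict[str, List[int]] = defaultdict(list)
    for doc_id, text in docs:
        for key, value in mapper(doc_id, text):
            groups[key].append(value)  # shuffle: group values by key
    # Keys are processed in sorted order, as a framework's sort phase would do.
    return dict(reducer(k, v) for k, v in sorted(groups.items()))


counts = run_mapreduce([(1, "a rose is a rose"), (2, "is it")])
```

In a real cluster the map calls run in parallel over input splits and the framework performs the shuffle across machines; the mapper and reducer logic stays the same.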
Multi-Perspective Sentence Similarity Modeling with Convolutional Neural Networks
- Hua He, Kevin Gimpel, Jimmy J. Lin
- Computer Science · Conference on Empirical Methods in Natural…
- 1 September 2015
This work proposes a model for comparing sentences that uses a multiplicity of perspectives: it first models each sentence using a convolutional neural network that extracts features at multiple levels of granularity and applies multiple types of pooling.
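To make "multiple types of pooling" concrete, the sketch below pools a convolutional feature map (positions by filters) in several ways at once. The choice of max, min, and mean pooling is an assumption for illustration, and `multi_pool` is a hypothetical name; this is not the paper's code.

```python
from typing import Dict, List


def multi_pool(features: List[List[float]]) -> Dict[str, List[float]]:
    """Pool a feature map where features[t][f] is filter f's activation at position t.

    Each pooling type collapses the position axis, yielding one value per filter.
    """
    columns = list(zip(*features))  # one column of activations per filter
    return {
        "max": [max(c) for c in columns],
        "min": [min(c) for c in columns],
        "mean": [sum(c) / len(c) for c in columns],
    }


pooled = multi_pool([[1.0, 4.0], [3.0, 2.0]])
```

Keeping several pooled views of the same features gives the downstream comparison layer more than one summary of each sentence to compare.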
Earlybird: Real-Time Search at Twitter
- Michael Busch, Krishna Gade, B. Larson, Patrick Lok, Samuel B. Luckenbill, Jimmy J. Lin
- Computer Science · IEEE International Conference on Data Engineering
- 1 April 2012
This paper presents Earlybird, the core retrieval engine that powers Twitter's real-time search service, and describes its index structures, which differ from those built to support traditional web search.
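One way a real-time index can differ from a batch-built web index is that documents arrive continuously and the freshest results matter most. The sketch below shows an append-only inverted index: document ids increase with arrival time, so each postings list stays sorted simply by appending, and reading it backwards yields newest-first results. This is a simplified illustration under those assumptions, not Earlybird's actual data structures, and the class name `AppendOnlyIndex` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


class AppendOnlyIndex:
    """Hypothetical in-memory inverted index for a stream of short documents."""

    def __init__(self) -> None:
        self.postings: Dict[str, List[int]] = defaultdict(list)
        self.next_id = 0

    def add(self, text: str) -> int:
        doc_id = self.next_id
        self.next_id += 1
        # Ids only grow, so appending keeps every postings list sorted.
        for term in set(text.lower().split()):
            self.postings[term].append(doc_id)
        return doc_id

    def search(self, term: str, k: int = 10) -> List[int]:
        # Traverse the postings list from the end to return the newest matches first.
        plist = self.postings.get(term.lower(), [])
        return plist[::-1][:k]


index = AppendOnlyIndex()
for tweet in ["hello world", "world cup", "hello again"]:
    index.add(tweet)
```

Because new documents never force existing postings to be rewritten, indexing and querying can proceed concurrently without rebuilding the index.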
Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
- Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy J. Lin
- Computer Science · arXiv
- 28 March 2019
This paper proposes to distill knowledge from BERT, a state-of-the-art language representation model, into a single-layer BiLSTM, as well as its siamese counterpart for sentence-pair tasks, and achieves results comparable to ELMo.
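A common way to distill a teacher into a student is to train the student to match the teacher's logits while still learning from the hard labels. The sketch below mixes cross-entropy on the gold label with the squared L2 distance between student and teacher logits. The mixing weight `alpha` and the function names are assumptions for illustration; treat this as a generic logit-matching objective rather than the paper's exact training setup.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float],
                      teacher_logits: List[float],
                      label: int,
                      alpha: float = 0.5) -> float:
    """Hypothetical objective: alpha * cross-entropy + (1 - alpha) * logit distance.

    alpha=1 ignores the teacher; alpha=0 trains purely on the teacher's logits.
    """
    ce = -math.log(softmax(student_logits)[label])
    logit_sq_dist = sum((s - t) ** 2 for s, t in zip(student_logits, teacher_logits))
    return alpha * ce + (1 - alpha) * logit_sq_dist


loss = distillation_loss([0.0, 0.0], [0.0, 0.0], label=0)
```

Matching raw logits rather than only the argmax label passes along the teacher's relative confidence across classes, which is the extra signal distillation is meant to exploit.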