The web today is increasingly characterized by social and real-time signals, which we believe represent two frontiers in information retrieval. In this paper, we present Earlybird, the core retrieval engine that powers Twitter's real-time search service. Although Earlybird builds and maintains inverted indexes like nearly all modern retrieval engines, its…
Various constrained frequent pattern mining problem formulations and associated algorithms have been developed that let the user specify itemset-based constraints that better capture the underlying application requirements and characteristics. In this paper we introduce a new class of block constraints that determine the significance of…
In this talk, we will discuss the data pipeline at Twitter that collects, aggregates, and processes large volumes of data in real time, and how it fits into the broader data infrastructure ecosystem. We will also discuss the challenges we have faced and the lessons we have learned while building this infrastructure at Twitter.