Keyword searching and browsing in databases using BANKS
Describes BANKS, a system that enables keyword-based search on relational databases together with data and schema browsing, and presents an efficient heuristic algorithm for finding and ranking query results.
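To illustrate the idea of keyword search over a database graph, here is a minimal sketch of a simplified backward search. It assumes the database is a weighted directed graph of tuples (`graph[u]` lists `(v, weight)` edges, for example from a paper tuple to the author tuples it references) and that each keyword has already been matched to a set of nodes. Shortest paths are run backward from every keyword's node set, and a root that reaches all keywords is ranked by the sum of its distances. This is a textbook simplification, not the BANKS algorithm itself, which also uses node prestige and interleaves its per-keyword iterators. The function names are hypothetical.

```python
import heapq


def dijkstra_from_set(rev_graph, sources):
    """Multi-source Dijkstra: distance from the nearest node in `sources`."""
    dist = {s: 0 for s in sources}
    heap = [(0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in rev_graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def best_answer_roots(graph, keyword_nodes, k=1):
    """Rank candidate answer-tree roots by total distance to all keywords.

    graph: dict u -> list of (v, weight) directed edges (hypothetical layout).
    keyword_nodes: one set of matching nodes per query keyword.
    """
    # Reverse the edges so we can walk backward from keyword nodes to roots.
    rev = {}
    for u, edges in graph.items():
        for v, w in edges:
            rev.setdefault(v, []).append((u, w))
    dists = [dijkstra_from_set(rev, nodes) for nodes in keyword_nodes]
    # A root must reach every keyword's node set.
    common = set(dists[0]).intersection(*dists[1:])
    scored = sorted((sum(d[r] for d in dists), r) for r in common)
    return scored[:k]
```

For example, with `graph = {"p1": [("a1", 1), ("a2", 1)], "p2": [("a1", 1)]}` and keywords matching `{"a1"}` and `{"a2"}`, only `p1` connects both, so the best root is `(2, "p1")`.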
Bidirectional Expansion For Keyword Search on Graph Databases
This paper proposes a new search algorithm, Bidirectional Search, which improves on Backward Expanding search by allowing forward search from potential roots towards leaves, and devises a novel search frontier prioritization technique based on spreading activation.
Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery
Presents a new hypertext resource discovery system, the Focused Crawler, which is robust against large perturbations in the starting set of URLs and can explore outward to discover valuable resources dozens of links away from the start set, while carefully pruning the millions of pages that may lie within the same radius.
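The core loop of a topic-focused crawler can be sketched as a best-first search: fetch the most promising URL, score the page for topical relevance, and expand its out-links only when the page is relevant. The sketch below assumes a hypothetical `fetch_links(url)` callback and a hypothetical `relevance(url)` scorer returning a value in [0, 1]; the Focused Crawler itself uses a trained topic classifier and a distiller, which are not reproduced here. New links simply inherit their parent page's score as their priority.

```python
import heapq


def focused_crawl(start_urls, fetch_links, relevance, budget=100, threshold=0.5):
    """Best-first crawl that prunes links found on irrelevant pages.

    fetch_links and relevance are hypothetical callbacks supplied by the caller.
    Returns the URLs in the order they were visited.
    """
    # heapq is a min-heap, so store negated priorities; seeds get top priority.
    frontier = [(-1.0, url) for url in start_urls]
    heapq.heapify(frontier)
    seen = set(start_urls)
    visited = []
    while frontier and len(visited) < budget:
        _, url = heapq.heappop(frontier)
        visited.append(url)
        page_score = relevance(url)
        if page_score < threshold:
            continue  # prune: do not expand links from off-topic pages
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                # Children inherit the parent's relevance as crawl priority.
                heapq.heappush(frontier, (-page_score, link))
    return visited
```

On a toy web where an off-topic page `c` is the only route to `e`, the crawl visits `c` but never discovers `e`, which is the pruning behavior the summary describes.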
Enhanced hypertext categorization using hyperlinks
This work develops a text classifier that misclassifies only 13% of the documents in the well-known Reuters benchmark, comparable to the best results reported at the time; the technique also adapts gracefully to the fraction of neighboring documents with known topics.
Collective annotation of Wikipedia entities in web text
This work formulates the trade-off between local spot-to-entity compatibility and measures of global coherence between entities, and investigates practical solutions based on local hill-climbing, rounding of integer linear programs, and pre-clustering entities followed by local optimization within clusters.
Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text
An evaluation of ARC suggests that the resources found by ARC frequently fare almost as well as, and sometimes better than, lists of resources that are manually compiled or classified into a topic.
Annotating and searching web tables using entities, types and relationships
This paper proposes new machine learning techniques to annotate table cells with the entities they likely mention, table columns with the types from which the cells' entities are drawn, and pairs of table columns with the relations they seek to express, together with a new graphical model that makes all of these labeling decisions for each table jointly.
Generalizing Across Domains via Cross-Gradient Training
Empirical evaluation on three different applications establishes that (1) domain-guided perturbation provides consistently better generalization to unseen domains than generic instance perturbation methods, and (2) data augmentation is a more stable and accurate method than domain adversarial training.
Dynamic personalized pagerank in entity-relation graphs
Presents HubRank, a new system for fast, dynamic, space-efficient proximity search in entity-relation (ER) graphs, and reports experiments on CiteSeer's ER graph with millions of real CiteSeer queries.
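Proximity search in an ER graph is commonly based on personalized PageRank: a random walk that, with probability `alpha`, teleports back to the query's nodes, so scores measure closeness to the query. The sketch below computes it by plain power iteration on a small in-memory graph. It is the baseline computation that systems like HubRank aim to speed up, not HubRank's hub-vector precomputation. The graph layout, node names, and parameter defaults are illustrative assumptions.

```python
def personalized_pagerank(graph, teleport, alpha=0.15, iters=200, tol=1e-12):
    """Personalized PageRank by power iteration.

    graph: dict node -> list of out-neighbours; every node must appear as a key.
    teleport: dict node -> weight, summing to 1 (the query's preference vector).
    alpha: probability of jumping back to the teleport distribution each step.
    """
    nodes = list(graph)
    rank = {n: teleport.get(n, 0.0) for n in nodes}
    for _ in range(iters):
        nxt = {n: alpha * teleport.get(n, 0.0) for n in nodes}
        for u in nodes:
            out = graph[u]
            if out:
                # Spread the walker's remaining mass evenly over out-edges.
                share = (1 - alpha) * rank[u] / len(out)
                for v in out:
                    nxt[v] += share
            else:
                # Dangling node: send its mass back to the teleport nodes.
                for v, w in teleport.items():
                    nxt[v] += (1 - alpha) * rank[u] * w
        delta = sum(abs(nxt[n] - rank[n]) for n in nodes)
        rank = nxt
        if delta < tol:
            break
    return rank
```

On a four-node toy graph (`paper1` linked to `author1` and `venue1`, `author1` linked to both papers), teleporting to `paper1` gives it the highest score (about 0.421), and the scores sum to 1.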
BANKS: Browsing and Keyword Searching in Relational Databases
Browsing ANd Keyword Searching (BANKS) enables almost effortless Web publishing of relational and eXtensible Markup Language (XML) data that would otherwise remain (at least …