Yannis Papakonstantinou

Keyword search is a proven, user-friendly way to query HTML documents in the World Wide Web. We propose keyword search in XML documents, modeled as labeled trees, and describe corresponding efficient algorithms. The proposed keyword search returns the set of smallest trees containing all keywords, where a tree is designated as "smallest" if it contains no...
The ObjectRank system applies authority-based ranking to keyword search in databases modeled as labeled graphs. Conceptually, authority originates at the nodes (objects) containing the keywords and flows to objects according to their semantic connections. Each node is ranked according to its authority with respect to the particular keywords. One can adjust...
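The authority flow described in this abstract can be illustrated with a small sketch. It is not the ObjectRank implementation: it assumes a plain damped power iteration in the style of personalized PageRank, and the function name `keyword_authority`, the `damping` value, and the per-edge transfer weights are all hypothetical choices made for illustration.

```python
def keyword_authority(edges, base_set, damping=0.85, iters=50):
    """Illustrative authority-flow ranking (assumed, not the paper's code).

    edges    -- iterable of (src, dst, weight); weight is an assumed
                authority transfer rate, with each node's outgoing
                weights summing to at most 1
    base_set -- nodes that contain the query keywords; authority
                originates here
    """
    base_set = set(base_set)
    nodes = set(base_set)
    for src, dst, _ in edges:
        nodes.update((src, dst))

    # Outgoing adjacency list per node.
    out = {n: [] for n in nodes}
    for src, dst, w in edges:
        out[src].append((dst, w))

    # Random jumps land only on keyword nodes.
    jump = {n: (1.0 / len(base_set) if n in base_set else 0.0) for n in nodes}

    rank = dict(jump)
    for _ in range(iters):
        nxt = {n: (1.0 - damping) * jump[n] for n in nodes}
        for src, targets in out.items():
            for dst, w in targets:
                # Authority flows along each edge, scaled by its weight.
                nxt[dst] += damping * w * rank[src]
        rank = nxt
    return rank


# Hypothetical toy graph: p1 contains the keyword and cites p2;
# p3 also cites p2 but contains no keyword.
scores = keyword_authority(
    edges=[("p1", "p2", 1.0), ("p3", "p2", 1.0)],
    base_set={"p1"},
)
```

In this toy graph p2 receives authority only through its connection to the keyword node p1, while p3 receives none, which matches the intuition that a node is ranked with respect to the particular keywords rather than globally.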
Applications in which plain text coexists with structured data are pervasive. Commercial relational database management systems (RDBMSs) generally provide querying capabilities for text attributes that incorporate state-of-the-art information retrieval (IR) relevance ranking strategies, but this search functionality requires that queries specify the exact...
The goal of the Tsimmis Project is to develop tools that facilitate the rapid integration of heterogeneous information sources that may include both structured and unstructured data. This paper gives an overview of the project, describing components that extract properties from unstructured objects, that translate information into a common object model, that...
We address the problem of providing integrated access to diverse and dynamic information sources. We explain how this problem differs from the traditional database integration problem and we focus on one aspect of the information integration problem, namely information exchange. We define an object-based information exchange model and a corresponding query...
TSIMMIS, the Stanford-IBM Manager of Multiple Information Sources, is a system for integrating information. It offers a data model and a common query language that are designed to support the combining of information from many different sources. It also offers tools for generating automatically the components that are needed to build systems for integrating...
We study the inference of Data Type Definitions (DTDs) for views of XML data, using an abstraction that focuses on document content structure. The views are defined by a query language that produces a list of documents selected from one or more input sources. The selection conditions involve vertical and horizontal navigation, thus querying explicitly the...
Query processing and optimization in mediator systems that access distributed non-proprietary sources pose many novel problems. Cost-based query optimization is hard because the mediator does not have access to source statistics information and furthermore it may not be easy to model the source's performance. At the same time, querying remote sources may be...