XSEarch, a semantic search engine for XML, is presented. XSEarch has a simple query language, suitable for a naive user. It returns semantically related document fragments that satisfy the user's query. Query answers are ranked using extended information-retrieval techniques and are generated in an order similar to the ranking. Advanced indexing techniques ...
Given two geographic databases, a fusion algorithm should produce all pairs of corresponding objects (i.e., objects that represent the same real-world entity). Four fusion algorithms, which only use locations of objects, are described and their performance is measured in terms of recall and precision. These algorithms are designed to work even when ...
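As a rough illustration of how a location-only fusion result could be scored, the sketch below pairs objects that are mutual nearest neighbors and then computes precision and recall against a known set of true pairs. The mutual-nearest-neighbor rule, the planar Euclidean distance, and the names `mutual_nearest_pairs` and `precision_recall` are assumptions made for this example; the abstract does not say which four algorithms are used.

```python
from math import hypot


def mutual_nearest_pairs(a, b):
    """Pair objects of two datasets that are each other's nearest neighbor.

    a, b: dict mapping object id -> (x, y) location.
    Assumption: planar coordinates and Euclidean distance.
    """
    def nearest(point, candidates):
        # Id of the candidate whose location is closest to `point`.
        return min(
            candidates,
            key=lambda k: hypot(candidates[k][0] - point[0],
                                candidates[k][1] - point[1]),
        )

    pairs = set()
    if not a or not b:
        return pairs
    for id_a, loc_a in a.items():
        id_b = nearest(loc_a, b)
        # Keep the pair only if the choice is reciprocated.
        if nearest(b[id_b], a) == id_a:
            pairs.add((id_a, id_b))
    return pairs


def precision_recall(found, truth):
    """Precision and recall of a set of found pairs against the true pairs."""
    correct = len(found & truth)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall
```

Precision here is the fraction of reported pairs that are correct, and recall is the fraction of true pairs that were reported.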
A framework for describing semantic relationships among nodes in XML documents is presented. In contrast to earlier work, the XML documents may have ID references (i.e., they correspond to graphs and not just trees). A specific interconnection semantics in this framework can be defined explicitly or derived automatically. The main advantage of ...
Semistructured data occur in situations where information lacks a homogeneous structure and is incomplete. Yet, up to now the incompleteness of information has not been reflected by special features of query languages. Our goal is to investigate the principles of queries that allow for incomplete answers. We do not present, however, a concrete query ...
In a geographical route search, given search terms, the goal is to find an effective route that (1) starts at a given location, (2) ends at a given location, and (3) travels via geographical entities that are relevant to the given terms. A route is effective if it does not exceed a given distance limit whereas the ...
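The conditions that are visible in this abstract can be checked with a few lines of code. The sketch below covers only the stated parts of the definition (start, end, relevant stops, distance limit); the abstract is cut off before the rest of the effectiveness criterion, so this is not the full definition. Representing locations as planar `(x, y)` tuples, using Euclidean distance, requiring at least one intermediate stop, and the function names are all assumptions for illustration.

```python
from math import hypot


def route_length(points):
    """Total length of a route given as a list of (x, y) points."""
    return sum(
        hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


def meets_stated_conditions(route, start, end, relevant, limit):
    """Check the route conditions stated in the abstract.

    route:    list of (x, y) points in visiting order
    relevant: set of (x, y) locations of entities relevant to the search terms
    limit:    maximum allowed route length
    Assumption: every intermediate stop must be a relevant entity,
    and there must be at least one such stop.
    """
    if len(route) < 2 or route[0] != start or route[-1] != end:
        return False
    stops = route[1:-1]
    if not stops or any(p not in relevant for p in stops):
        return False
    return route_length(route) <= limit
```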
An uncertain geo-spatial dataset is a collection of geo-spatial objects that do not accurately represent real-world entities. Each object has a confidence value indicating how likely it is for the object to be correct. Uncertain data can be the result of operations such as imprecise integration, incorrect updates, or inexact querying. A k-route, over an ...
Social media users share billions of items per year, only a small fraction of which is geotagged. We present a data-driven approach for identifying non-geotagged content items that can be associated with a hyper-local geographic area by modeling the location distributions of n-grams that appear in the text. We explore the trade-off between accuracy and ...
Microblogs allow users to publish geo-tagged posts, that is, short textual messages assigned to a geographic location. Users send posts from places they visit and discuss an idiosyncratic mixture of personal and general topics. Thus, it is reasonable to assume that the locations and the textual content of posts will be unique and will identify the posting user, to ...