Learn More
Information retrieval may suggest a document, and information extraction may tell us what it says, but which information sources do we trust and which assertions do we believe when different authors make conflicting claims? Trust algorithms known as fact-finders attempt to answer these questions, but consider only which source makes which claim, ignoring a…
When information sources are unreliable, information networks have been used in the data mining literature to uncover facts from large numbers of complex relations between noisy variables. The approach relies on topology analysis of graphs, where nodes represent pieces of (unreliable) information and links represent abstract relations. Such topology analysis…
A frequent problem when dealing with data gathered from multiple sources on the web (ranging from booksellers to Wikipedia pages to stock analyst predictions) is that these sources disagree, and we must decide which of their (often mutually exclusive) claims we should accept. Current state-of-the-art information credibility algorithms known as…
Much of the information on the Web is found in articles from online news outlets, magazines, encyclopedias, review collections, and other sources. However, extracting this content from the original HTML document is complicated by the large amount of less informative and typically unrelated material such as navigation menus, forms, user comments, and ads.
This demonstration presents Apollo, a new sensor information processing tool for uncovering likely facts in noisy participatory sensing data. Participatory sensing, where users proactively document and share their observations, has received significant attention in recent years as a paradigm for crowd-sourcing observation tasks. However, it poses…
Once information retrieval has located a document, and information extraction has provided its contents, how do we know whether we should actually believe it? Fact-finders are a state-of-the-art class of algorithms that operate in a manner analogous to Kleinberg's Hubs and Authorities, iteratively computing the trustworthiness of an information source as a…
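For intuition, the following is a minimal Python sketch of a Sums-style fact-finder iteration, the simplest instance of the Hubs-and-Authorities analogy described above: a claim's belief is the sum of the trustworthiness of the sources asserting it, and a source's trustworthiness is the sum of the beliefs of its claims. The sources, claims, and assertion graph below are hypothetical illustrations, not data or code from the paper.

# Minimal sketch of a Sums-style fact-finder (the Hubs-and-Authorities analogue
# mentioned above). All sources and claims here are hypothetical examples.

def sums_fact_finder(assertions, iterations=20):
    """assertions: dict mapping each source to the set of claims it makes."""
    sources = list(assertions)
    claims = {c for claim_set in assertions.values() for c in claim_set}

    trust = {s: 1.0 for s in sources}    # source trustworthiness
    belief = {c: 0.0 for c in claims}    # claim belief

    for _ in range(iterations):
        # Belief in a claim is the sum of the trustworthiness of the sources asserting it.
        for c in claims:
            belief[c] = sum(trust[s] for s in sources if c in assertions[s])
        # Trustworthiness of a source is the sum of the beliefs of the claims it makes.
        for s in sources:
            trust[s] = sum(belief[c] for c in assertions[s])
        # Normalize each round so the scores do not grow without bound.
        max_b = max(belief.values()) or 1.0
        max_t = max(trust.values()) or 1.0
        belief = {c: b / max_b for c, b in belief.items()}
        trust = {s: t / max_t for s, t in trust.items()}
    return trust, belief

if __name__ == "__main__":
    # Two hypothetical sources agree on one claim; a third source dissents.
    assertions = {
        "site_a": {"everest_tallest"},
        "site_b": {"everest_tallest"},
        "site_c": {"k2_tallest"},
    }
    trust, belief = sums_fact_finder(assertions)
    print(trust, belief)

On this toy input, the two agreeing sources and the claim they share converge to the highest trustworthiness and belief scores, while the lone dissenting source and its claim are scored lower.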
We introduce a new probabilistic model for transliteration that performs significantly better than previous approaches, is language-agnostic (requiring no knowledge of the source or target languages), and is capable of both generation (creating the most likely transliteration of a source word) and discovery (selecting the most likely transliteration from a…
Wikipedia, the popular online encyclopedia, has in just six years grown from an adjunct to the now-defunct Nupedia to over 31 million pages and 429 million revisions in 256 languages, and has spawned sister projects such as Wiktionary and Wikisource. Available under the GNU Free Documentation License, it is an extraordinarily large corpus with broad scope and…