Much of the information on the Web is found in articles from online news outlets, magazines, encyclopedias, review collections, and other sources. However, extracting this content from the original HTML document is complicated by the large amount of less informative and typically unrelated material such as navigation menus, forms, user comments, and ads. …
A frequent problem when dealing with data gathered from multiple sources on the web (ranging from booksellers to Wikipedia pages to stock analyst predictions) is that these sources disagree, and we must decide which of their (often mutually exclusive) claims we should accept. Current state-of-the-art information credibility algorithms known as …
Information retrieval may suggest a document, and information extraction may tell us what it says, but which information sources do we trust and which assertions do we believe when different authors make conflicting claims? Trust algorithms known as fact-finders attempt to answer these questions, but consider only which source makes which claim, ignoring a …
When information sources are unreliable, information networks have been used in the data mining literature to uncover facts from large numbers of complex relations between noisy variables. The approach relies on topology analysis of graphs, where nodes represent pieces of (unreliable) information and links represent abstract relations. Such topology analysis …
Once information retrieval has located a document, and information extraction has provided its contents, how do we know whether we should actually believe it? Fact-finders are a state-of-the-art class of algorithms that operate in a manner analogous to Kleinberg's Hubs and Authorities, iteratively computing the trustworthiness of an information source as a …
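As a rough illustration of the Hubs-and-Authorities analogy, the Python sketch below alternates two updates over a bipartite source-claim graph: a source's trustworthiness is the sum of the belief in the claims it makes, and a claim's belief is the sum of the trustworthiness of the sources asserting it, with both score sets rescaled by their maximum each round. This is an assumed, minimal HITS-style fact-finder for illustration only, not the specific method of the work summarized here; the function name `hits_fact_finder`, the dict-of-sets input format, the toy claims, and the fixed iteration count are all hypothetical choices.

```python
from collections import defaultdict


def hits_fact_finder(assertions, iterations=20):
    """Hypothetical HITS-style fact-finder sketch (illustrative only).

    assertions: dict mapping each source to the set of claims it asserts.
    Returns (trust, belief): dicts scoring sources and claims, each divided
    by its maximum on every iteration so the top score is 1.0.
    """
    claims = {c for cs in assertions.values() for c in cs}
    # Invert the bipartite graph: claim -> set of sources asserting it.
    sources_of = defaultdict(set)
    for source, cs in assertions.items():
        for c in cs:
            sources_of[c].add(source)

    belief = {c: 1.0 for c in claims}  # start by believing every claim equally
    trust = {s: 0.0 for s in assertions}
    for _ in range(iterations):
        # Trust of a source = total belief in the claims it makes.
        trust = {s: sum(belief[c] for c in cs) for s, cs in assertions.items()}
        top = max(trust.values(), default=0.0) or 1.0
        trust = {s: t / top for s, t in trust.items()}
        # Belief in a claim = total trust of the sources asserting it.
        belief = {c: sum(trust[s] for s in sources_of[c]) for c in claims}
        top = max(belief.values(), default=0.0) or 1.0
        belief = {c: b / top for c, b in belief.items()}
    return trust, belief


# Toy example (hypothetical data): two sources agree on one capital,
# a third source makes a conflicting claim.
assertions = {
    "A": {"capital=Canberra", "river=Murray"},
    "B": {"capital=Canberra"},
    "C": {"capital=Sydney"},
}
trust, belief = hits_fact_finder(assertions)
for claim, score in sorted(belief.items(), key=lambda kv: -kv[1]):
    print(f"{claim}: {score:.3f}")
```

In the toy example, sources A and B agree on `capital=Canberra`, so it ends with higher belief than the competing `capital=Sydney`, which only C asserts. Because trust here is an unnormalized sum, a source that makes many claims gains trust simply by volume; averaging-based update rules are one way a more refined fact-finder might counter this.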
This demonstration presents Apollo, a new sensor information processing tool for uncovering likely facts in noisy participatory sensing data. Participatory sensing, where users proactively document and share their observations, has received significant attention in recent years as a paradigm for crowd-sourcing observation tasks. However, it …
We introduce a new probabilistic model for transliteration that performs significantly better than previous approaches, is language-agnostic, requiring no knowledge of the source or target languages, and is capable of both generation (creating the most likely transliteration of a source word) and discovery (selecting the most likely transliteration from a …
We begin by giving a comprehensive literature review that ties together many fields which have heretofore remained separate. We comment on the approaches from each field and show which algorithms are similar and which are different. Then, starting from a concrete task, we extend traditional trustworthiness algorithms to deal with the more complex situation …