Heinz-Peter Lang

This paper presents the Media Watch on Climate Change, a public Web portal that captures and aggregates large archives of digital content from multiple stakeholder groups. Each week it assesses the domain-specific relevance of millions of documents and user comments from news media, blogs, Web 2.0 platforms such as Facebook, Twitter and YouTube, the Web …
Knowledge capture approaches in the age of massive Web data require robust and scalable mechanisms to acquire, consolidate and pre-process large amounts of heterogeneous data, both unstructured and structured. This paper addresses this requirement by introducing the Extensible Web Retrieval Toolkit (eWRT), a modular Python API for retrieving social data …
Web pages not only contain main content, but also other elements such as navigation panels, advertisements and links to related documents. Furthermore, overview pages (summarization pages and entry points) duplicate and aggregate parts of articles and thereby create redundancies. The noise elements in Web pages as well as overview pages affect the …
The webLyzard media monitoring and Web intelligence platform (www.webLyzard.com) presented in this paper is a generic tool for assessing the strategic positioning of an organization and the effectiveness of its communication strategies. The platform captures and aggregates large archives of digital content from multiple stakeholder groups. Each week …