Learn More
Classic news summarization plays an important role with the exponential document growth on the Web. Many approaches are proposed to generate summaries but seldom simultaneously consider evolutionary characteristics of news plus to traditional summary elements. Therefore, we present a novel framework for the web mining problem named Evolutionary Timeline(More)
Previous anti-spamming algorithms based on link structure suffer from either the weakness of the page value metric or the vagueness of the seed selection. In this paper, we propose two page value metrics, AVRank and HVRank. These two "values" of all the web pages can be well assessed by using the bidirectional links' information. Moreover, with the help of(More)
To bridge the increasing processor-disk performance gap, buffer caches are used in both storage clients (e.g. database systems) and storage servers to reduce the number of slow disk accesses. These buffer caches need to be managed effectively to deliver the performance commensurate to the aggregate buffer cache size. To address this problem, two paradigms(More)
We investigate an important and challenging problem in summary generation, i.e., Evolutionary Trans-Temporal Summarization (ETTS), which generates news timelines from massive data on the Internet. ETTS greatly facilitates fast news browsing and knowledge comprehension, and hence is a necessity. Given the collection of time-stamped web documents related to(More)
We investigate the novel problem of event recognition from news webpages. " Events " are basic text units containing news elements. We observe that a news article is always constituted by more than one event, namely Latent Ingredients (LIs) which form the whole document. Event recognition aims to mine these Latent Ingredients out. Researchers have tackled(More)
Search engines are playing a more and more important role in discovering information on the web nowadays. Spam web pages, however, are employing various tricks to bamboozle search engines, therefore achieving undeserved ranks. In this paper, we propose a new page importance metric, which takes both the content quality and the link quality into(More)
In the Web 2.0 eras, the individual Internet users can also act as information providers, releasing information or making comments conveniently. However, some participants may spread irresponsible remarks or express irrelevant comments for commercial interests. This kind of so-called comment spam severely hurts the information quality. This paper tries to(More)
Biterm Topic Model (BTM) is designed to model the generative process of the word co-occurrence patterns in short texts such as tweets. However, two aspects of BTM may restrict its performance: 1) user individualities are ignored to obtain the corpus level words co-occurrence patterns; and 2) the strong assumptions that two co-occurring words will be(More)
Usually scientists breed research ideas inspired by previous publications, but they are unlikely to follow all publications in the unbounded literature collection. The volume of literature keeps on expanding extremely fast, whilst not all papers contribute equal impact to the academic society. Being aware of potentially influential literature would put one(More)
Inhalational general anesthesia is widely used in clinical practice, but there have been plenty of evidence shows that application of inhalational anesthetics may increase the risk of Alzheimer's disease (AD). Halothane is a common inhalational anesthetics. We cultured rat primary neurons as study model to investigate cytotoxicity of halothane in neurons.(More)