Jingtian Jiang

In this paper, we present FoCUS (Forum Crawler Under Supervision), a supervised web-scale forum crawler. The goal of FoCUS is to only trawl relevant forum content from the web with minimal overhead. Forum threads contain information content that is the target of forum crawlers. Although forums have different layouts or styles and are powered by different…
This paper describes our work for CLEF 2008. Our group participated in the Visual Concept Detection Task of ImageCLEF 2008. We submitted one run (run id: HJ_FA) for evaluation. In this run, we applied a method called “Feature Annotation” to detect the predefined visual concepts, and we wanted to know how this information helps in solving the…
In this paper, we address the problem of author extraction (AE) from user generated content (UGC) pages. Most existing solutions for web information extraction, including AE, adopt supervised approaches, which require expensive manual annotation. We propose a novel unsupervised approach for automatically collecting and labeling training data based on two…
Web forums have become an important data resource for research because they accumulate a large amount of user generated content (UGC) every day. Efficient web forum crawling is therefore a crucial problem. Previous work focuses on crawling all forum threads with minimal overhead, treating all threads equally and adopting a breadth-first strategy. Some strategies such as PageRank…
Information retrieval is the task of retrieving information with high relevance, precision, and recall. Basic methods for information retrieval include Boolean retrieval, fuzzy retrieval, and the vector space model. Searching depends on matching keywords between the user query and the document. Ontologies can be used in information retrieval. In software engineering and…