Lingpeng Yang

This paper proposes a novel document re-ranking approach for information retrieval that uses a label propagation-based semi-supervised learning algorithm to exploit the intrinsic structure underlying large document collections. Since labeled relevant or irrelevant documents are generally unavailable in IR, our approach tries to extract some pseudo…
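The label propagation idea can be illustrated with a minimal sketch. The abstract is truncated before it says which documents receive pseudo labels, so this sketch simply assumes that the top `n_pos` documents of the initial ranking are pseudo-relevant and the bottom `n_neg` are pseudo-irrelevant. The Gaussian affinity graph, the `sigma` bandwidth, and the function name `label_propagation_rerank` are hypothetical choices, not the paper's method. Relevance scores spread over the graph until they stabilize, with the pseudo-labeled documents clamped at every step.

```python
import numpy as np

def label_propagation_rerank(X, n_pos=2, n_neg=2, sigma=1.0, iters=100):
    """Re-rank documents (rows of X, in initial-retrieval order) by label propagation.

    Assumption (hypothetical): the top n_pos documents are pseudo-relevant and
    the bottom n_neg documents are pseudo-irrelevant.
    """
    n = X.shape[0]
    # Gaussian affinity from squared Euclidean distances between document vectors.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Row-normalized transition matrix: each row sums to 1.
    P = W / W.sum(axis=1, keepdims=True)
    # Label matrix: column 0 = relevant, column 1 = irrelevant; unknowns start at 0.5/0.5.
    Y = np.full((n, 2), 0.5)
    labeled = np.zeros(n, dtype=bool)
    Y[:n_pos] = [1.0, 0.0]
    Y[n - n_neg:] = [0.0, 1.0]
    labeled[:n_pos] = True
    labeled[n - n_neg:] = True
    Y0 = Y.copy()
    for _ in range(iters):
        Y = P @ Y                  # propagate label mass to neighbours
        Y[labeled] = Y0[labeled]   # clamp the pseudo-labeled documents
    score = Y[:, 0]                # rows stay normalized, so column 0 is P(relevant)
    order = np.argsort(-score, kind="stable")
    return order, score
```

With this design, an unlabeled document that sits close to the pseudo-relevant cluster rises in the ranking even if the initial retrieval placed it below a document that resembles the pseudo-irrelevant ones.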
In most practical information retrieval situations, users are more concerned with the precision of the top-ranked documents. In this paper, we propose a method to improve the precision of the top N ranked documents by reordering the documents retrieved in the initial retrieval. To reorder documents, we first automatically extract Global Key Terms from…
In this paper, we propose a method to improve the precision of the top retrieved documents in Chinese information retrieval, where the query is a short description, by reordering the documents retrieved in the initial retrieval. To reorder the documents, we first identify the terms in the query and their importance by making use of information derived from the top N…
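How query-term importance might be derived from the top N documents and then used for reordering can be sketched as follows. The abstract is cut off before the actual weighting scheme, so this sketch assumes a simple, hypothetical heuristic: a query term's weight is the fraction of the top N initially retrieved documents that contain it, and each retrieved document is re-scored by the summed weights of the query terms it contains. The function names and `top_n` parameter are illustrative only.

```python
from collections import Counter

def query_term_weights(query_terms, top_docs):
    """Hypothetical heuristic: weight = fraction of the top-N documents containing the term."""
    qset = set(query_terms)
    n = len(top_docs)
    df = Counter(t for doc in top_docs for t in set(doc) if t in qset)
    return {t: df[t] / n for t in qset}

def reorder(docs, query_terms, top_n=3):
    """Reorder retrieved documents (lists of terms, in initial-rank order)."""
    weights = query_term_weights(query_terms, docs[:top_n])
    # Score each document by the summed weights of the query terms it contains.
    scores = [sum(w for t, w in weights.items() if t in set(doc)) for doc in docs]
    # sorted() is stable, so ties keep their initial-retrieval order.
    order = sorted(range(len(docs)), key=lambda i: -scores[i])
    return order, weights
```

For example, with `docs = [["a", "b"], ["a", "c"], ["b", "d"], ["a", "b", "c"]]` and query terms `["a", "b", "c"]`, the last document covers every weighted term and moves to the top of the list.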
In this paper, we describe our approach to single-language information retrieval (SLIR) on Chinese for the NTCIR-4 tasks. First, we automatically extract terms (short terms and long terms) from the document set and use them to build indexes; second, for a query, we use the short terms in the query and documents to perform the initial retrieval; third, we build an…
This paper briefly describes our system for the third SIGHAN bakeoff on Chinese word segmentation and named entity recognition. The system uses a word chunking strategy based on a context-dependent Mutual Information Independence Model. Evaluation shows that our system performs well on all the word segmentation closed tracks and achieves very good scalability…
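The role of mutual information in word chunking can be shown with a deliberately simplified sketch. It does not reproduce the paper's context-dependent Mutual Information Independence Model. Instead it substitutes plain pointwise mutual information (PMI) between adjacent characters, estimated from a tiny hypothetical corpus, and it joins two neighbouring characters into the same chunk when their PMI exceeds an assumed `threshold`. The helper names `bigram_pmi` and `chunk` are illustrative.

```python
import math
from collections import Counter

def bigram_pmi(corpus):
    """Return a function computing PMI(a, b) = log2(p(ab) / (p(a) p(b))) for adjacent characters."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    def pmi(a, b):
        if bigrams[(a, b)] == 0:
            return float("-inf")  # never seen adjacent: never join
        p_ab = bigrams[(a, b)] / n_bi
        return math.log2(p_ab / ((unigrams[a] / n_uni) * (unigrams[b] / n_uni)))

    return pmi

def chunk(sent, pmi, threshold=0.0):
    """Greedy left-to-right chunking: join characters whose PMI exceeds the threshold."""
    if not sent:
        return []
    words = [sent[0]]
    for a, b in zip(sent, sent[1:]):
        if pmi(a, b) > threshold:
            words[-1] += b   # strongly associated: extend the current chunk
        else:
            words.append(b)  # weak association: start a new chunk
    return words
```

With the corpus `["北京大学", "北京欢迎", "大学北京"]` and `threshold=2.0`, `chunk("北京大学", pmi, 2.0)` returns `["北京", "大学"]`, because 北京 and 大学 co-occur far more often than chance while 京大 does not.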
In this article, we describe our approach to Chinese information retrieval, where the query is a short natural language description. First, we use short terms automatically extracted from the document sets to build indexes, and we use the short terms in both the query and the documents to perform the initial retrieval. Next, we use long terms extracted from the document collection…