Lingpeng Yang

Learn More
This paper proposes a novel document re-ranking approach in information retrieval, which is done by a label propagation-based semi-supervised learning algorithm to utilize the intrinsic structure underlying in the large document data. Since no labeled relevant or irrelevant documents are generally available in IR, our approach tries to extract some pseudo(More)
In this paper, we propose a method to improve the precision of top retrieved documents in Chinese information retrieval where the query is a short description by reordering retrieved documents in the initial retrieval. To reorder the documents, we firstly find out terms in query and their importance scales by making use of the information derived from top N(More)
This paper briefly describes our system in the third SIGHAN bakeoff on Chinese word segmentation and named entity recognition. This is done via a word chunking strategy using a context-dependent Mutual Information Independence Model. Evaluation shows that our system performs well on all the word segmentation closed tracks and achieves very good scalability(More)
In this paper, we describe our approach for single language information retrieval (SLIR) on Chinese language of NTCIR4 tasks. Firstly, we automatically extract terms (short-terms and long terms) from document set and use them to build indexes; secondly, for a query, we use short terms in the query and documents to do initial retrieval; thirdly, we build an(More)
For Information Retrieval, users are more concerned about the precision of top ranking documents in most practical situations. In this paper, we propose a method to improve the precision of top N ranking documents by reordering the retrieved documents from the initial retrieval. To reorder documents, we first automatically extract Global Key Terms from(More)
In this paper we propose to use a semi-supervised learning algorithm to deal with word sense disambiguation problem. We evaluated a semi-supervised learning algorithm, local and global consistency algorithm , on widely used benchmark corpus for word sense disambiguation. This algorithm yields encouraging experimental results. It achieves better performance(More)