Building bridges for web query classification

@inproceedings{Shen2006BuildingBF,
  title={Building bridges for web query classification},
  author={Dou Shen and Jian-Tao Sun and Qiang Yang and Zheng Chen},
  booktitle={Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval},
  year={2006}
}
  • Dou Shen, Jian-Tao Sun, Qiang Yang, Zheng Chen
  • Published 2006
  • Computer Science
  • Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Web query classification (QC) aims to classify Web users' queries, which are often short and ambiguous, into a set of target categories. QC has many applications including page ranking in Web search, targeted advertisement in response to queries, and personalization. In this paper, we present a novel approach for QC that outperforms the winning solution of the ACM KDDCUP 2005 competition, whose objective is to classify 800,000 real user queries. In our approach, we first build a bridging…
Query enrichment for web-query classification
TLDR
It is shown that, despite the difficulty of an abundance of ambiguous queries and lack of training data, the query-enrichment technique can solve the problem satisfactorily through a two-phase classification framework.
Analysis of varying approaches to topical web query classification
TLDR
This work finds that training classifiers directly from manually classified queries outperforms the best general topical classifier by 48% in relative F1 score, and attributes this to a mismatch in task when applying a general classifier to queries.
PQC: personalized query classification
TLDR
This paper proposes the Personalized Query Classification task, develops an algorithm based on user preference learning as a solution, and proposes a collaborative ranking model that leverages similar users' information to tackle the sparseness problem in clickthrough logs.
Enrichment and Reductionism: Two Approaches for Web Query Classification
TLDR
Two approaches for the web query classification task are found to be complementary to each other: the reductionist approach exhibits high precision but low recall, whereas the enrichment method exhibits high recall but low precision.
Leveraging Search Engine Results for Query Classification
Web query classification is significant to search engines for the purpose of efficient retrieval of appropriate results in response to user queries. User queries are short in nature, contain noise and…
Varying approaches to topical web query classification
TLDR
It is found that training classifiers explicitly from manually classified queries outperforms the bridged classifier by 48% in F1 score and a pre-retrieval classifier using only the query terms performs merely 11% worse than the bridging classifier which requires snippets from retrieved documents.
Classifying search queries using the Web as a source of knowledge
TLDR
Empirical evaluation confirms that the proposed methodology yields a considerably higher classification accuracy than previously reported, which will lead to better matching of online ads to rare queries and overall to a better user experience.
A feature-free search query classification approach using semantic distance
TLDR
This paper analyzes queries and categories themselves, uses the number of Web pages containing both a query and a category as a semantic distance to determine their similarity, and proposes a feature-free classification approach based on this distance.
Learning-based web query understanding
TLDR
This thesis focuses on personal name detection in Web queries and proposes three solutions for different scenarios in query topic classification, which reflect the generalization/specification relationship among Web queries.
Search Query Categorization at Scale
TLDR
This paper presents a novel, fast and scalable approach to categorization of search queries based on a limited intermediate corpus, using Wikipedia as the knowledge base, and achieves results comparable to the state-of-the-art approaches while maintaining high performance and scalability.

References

Automatic web query classification using labeled and unlabeled training data
TLDR
This work examines three approaches to topical categorization of general web queries: matching against a list of manually labeled queries, supervised learning of classifiers, and mining of selectional preference rules from large unlabeled query logs, and shows that their combined method accurately classifies 46% of queries.
Classifying search engine queries using the web as background knowledge
TLDR
This paper describes the architecture of a classification system that uses a web directory to identify the subject context in which the query terms are frequently used; the system received the Runner-Up Award for Query Categorization Performance of the KDD Cup 2005.
Q2C@UST: our winning solution to query classification in KDDCUP 2005
TLDR
This paper describes the ensemble-search based approach, Q2C@UST, for the query classification task for the KDDCUP 2005, and proposes two ensemble classifiers based on two different strategies to tackle the key difficulties.
Query type classification for web document retrieval
TLDR
This paper proposes a user query classification scheme that uses the difference of distribution, mutual information, the usage rate as anchor texts, and POS information for the classification; the best performance was obtained when the OKAPI scoring algorithm was used.
Categorizing web queries according to geographical locality
TLDR
This paper defines how to categorize queries according to their (often implicit) geographical locality, and introduces several alternatives for automatically and efficiently categorizing queries in this scheme, using a variety of state-of-the-art machine learning tools.
KDD CUP-2005 report: facing a great challenge
TLDR
The KDD-Cup 2005 competition was to classify 800,000 internet user search queries into 67 predefined categories, but the lack of a straightforward training set, the subjective user intents of queries, the poor information in short queries, and the high noise level make the task very challenging.
Query clustering using content words and user feedback
TLDR
This paper describes the attempt to cluster similar queries according to their contents as well as the document click information in the user logs.
Method combination for document filtering
TLDR
It is found that simple averaging strategies do indeed improve performance, but that direct averaging of probability estimates is not the correct approach, and the probability estimates must be renormalized using logistic regression on the known relevance judgments.
The Ferrety algorithm for the KDD Cup 2005 problem
In this paper, we present a general solution for the KDD Cup 2005 problem. It uses the Internet as source of knowledge and extends it to categorize very short (less than 5 words) documents with…
A Comparative Study on Feature Selection in Text Categorization
TLDR
This paper finds strong correlations between the DF, IG, and CHI values of a term and suggests that DF thresholding, the simplest method with the lowest cost in computation, can be reliably used instead of IG or CHI when the computation of these measures is too expensive.