Learning query intent from regularized click graphs

@inproceedings{Li2008LearningQI,
  title={Learning query intent from regularized click graphs},
  author={Xiao Li and Ye-Yi Wang and A. Acero},
  booktitle={SIGIR '08},
  year={2008}
}
This work presents the use of click graphs in improving query intent classifiers, which are critical if vertical search and general-purpose search services are to be offered in a unified user interface. [...] Specifically, we infer class memberships of unlabeled queries from those of labeled ones according to their proximities in a click graph. Moreover, we regularize the learning with click graphs by content-based classification to avoid propagating erroneous labels. We demonstrate the effectiveness…
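The abstract's idea of inferring labels from proximity in a click graph can be illustrated with a minimal sketch. This is not the paper's algorithm or objective: the function name `propagate_intent`, the mixing weight `alpha`, the neutral default of 0.5, and the toy click counts are all assumptions made for illustration. Scores flow from seed queries to URLs and back, weighted by click counts, and each unlabeled query is pulled toward a hypothetical content-based prior, loosely mirroring the regularization described above.

```python
from collections import defaultdict


def propagate_intent(clicks, seeds, prior=None, alpha=0.8, iters=50):
    """Propagate a binary intent score over a bipartite query-URL click graph.

    clicks: {(query, url): click_count}
    seeds:  {query: score in [0, 1]} for labeled queries (kept fixed)
    prior:  {query: score} from a content-based classifier (hypothetical input)
    alpha:  weight on the graph estimate versus the prior (assumed value)
    """
    prior = prior or {}
    q_edges = defaultdict(dict)  # query -> {url: count}
    u_edges = defaultdict(dict)  # url -> {query: count}
    for (q, u), c in clicks.items():
        q_edges[q][u] = c
        u_edges[u][q] = c

    # Start labeled queries at their label, others at the prior (or neutral 0.5).
    score = {q: seeds.get(q, prior.get(q, 0.5)) for q in q_edges}

    for _ in range(iters):
        # URL score: click-weighted average of the queries that led to it.
        url_score = {
            u: sum(c * score[q] for q, c in qs.items()) / sum(qs.values())
            for u, qs in u_edges.items()
        }
        for q, us in q_edges.items():
            if q in seeds:
                continue  # labeled queries stay clamped to their labels
            walked = sum(c * url_score[u] for u, c in us.items()) / sum(us.values())
            # Mix the graph estimate with the content-based prior.
            score[q] = alpha * walked + (1 - alpha) * prior.get(q, 0.5)
    return score


# Toy data: 1.0 means "job intent", 0.0 means "not job intent".
clicks = {
    ("nurse jobs", "jobs.example"): 5,
    ("hiring nurses", "jobs.example"): 3,
    ("hiring nurses", "careers.example"): 1,
    ("cheap flights", "travel.example"): 4,
    ("flight deals", "travel.example"): 2,
}
seeds = {"nurse jobs": 1.0, "cheap flights": 0.0}
scores = propagate_intent(clicks, seeds)
```

In this toy graph, "hiring nurses" shares a clicked URL with the positive seed and ends above 0.5, while "flight deals" shares one with the negative seed and ends below 0.5. Lowering `alpha` makes the prior dominate, which is one simple way to limit the spread of erroneous labels.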
Learning with click graph for query intent classification
TLDR
This work proposes two semi-supervised learning methods that exploit user click-through data and finds that with a large amount of training data obtained, a classifier based on simple query term features can outperform those using state-of-the-art, augmented features.
Heterogeneous graph-based intent learning with queries, web pages and Wikipedia concepts
TLDR
A novel unsupervised method called heterogeneous graph-based soft-clustering is developed to derive an intent indicator for each object based on the constructed heterogeneous graph, which represents multiple types of relationships between the objects.
Improving context-aware query classification via adaptive self-training
TLDR
This work first incorporates search contexts into its framework using a Conditional Random Field (CRF) model, then adapts self-training to this model to exploit the information in unlabeled queries and improve query classification accuracy.
Entropy-biased models for query representation on the click graph
TLDR
This paper investigates and develops a novel entropy-biased framework for modeling click graphs and introduces a new concept, namely the inverse query frequency (IQF), to weigh the importance of a click on a certain URL.
Click-graph modeling for facet attribute estimation of web search queries
TLDR
This work used clickthrough data of a Japanese commercial search engine to evaluate the similarity between a query and a facet category from the patterns of clicks on URLs, and introduced edges and vertices corresponding to the decomposed URL paths into the click graph to capture the click pattern differences at an appropriate granularity level.
Learning Document Labels from Enriched Click Graphs
TLDR
A semi-supervised learning approach that predicts a document's class label by mining the click graph; to overcome the sparseness of the click graph, it is enriched with hyperlinks between Web documents.
Context-aware query classification
TLDR
This paper incorporates context information into the problem of query classification by using conditional random field (CRF) models and shows that it can improve the F1 score by 52% as compared to other state-of-the-art baselines.
Learning Semantic Categories from Search Clickthrough Logs Using Laplacian Label Propagation
TLDR
This work proposes to use web clickthrough logs to learn semantic categories and explores a weakly-supervised label propagation method using graph Laplacian to alleviate the problem of semantic drift.
Learning Entity Types from Query Logs via Graph-Based Modeling
TLDR
This paper first models query logs using a bipartite graph with entities and their auxiliary information, such as contextual words and clicked URLs, and proposes a graph-based framework called ELP (Ensemble framework based on Label Propagation) to simultaneously learn the types of both entities and auxiliary signals.
Identifying Web Queries with Question Intent
TLDR
This work presents a supervised classification scheme, random forest over word-clusters for variable length texts, which can model the query structure and substantially improves classification performance in the CQA-intent selection task compared to content-oriented classification, especially as query length grows.

References

Improving automatic query classification via semi-supervised learning
TLDR
An application of computational linguistics is used to develop an approach for mining the vast amount of unlabeled data in Web query logs to improve automatic topical Web query classification, and it is shown that this approach in combination with manual matching and supervised learning allows us to classify a substantially larger proportion of queries than any single technique.
Building bridges for web query classification
TLDR
A novel approach for QC is presented that outperforms the winning solution of the ACM KDDCUP 2005 competition and introduces category selection as a new method for narrowing down the scope of the intermediate taxonomy based on which the authors classify the queries.
Robust classification of rare queries using web knowledge
We propose a methodology for building a practical robust query classification system that can identify thousands of query classes with reasonable accuracy, while dealing in real-time with the query…
Varying approaches to topical web query classification
TLDR
It is found that training classifiers explicitly from manually classified queries outperforms the bridged classifier by 48% in F1 score, and a pre-retrieval classifier using only the query terms performs merely 11% worse than the bridging classifier, which requires snippets from retrieved documents.
Random walks on the click graph
TLDR
A Markov random walk model is applied to a large click log, producing a probabilistic ranking of documents for a given query, demonstrating its ability to retrieve relevant documents that have not yet been clicked for that query and rank those effectively.
Functional Faceted Web Query Analysis
TLDR
This work proposes a faceted classification scheme for web queries that consists of four facets: ambiguity, authority sensitivity, temporal sensitivity, and spatial sensitivity. It hypothesizes that the classification of queries into such facets yields insight on user intent and information needs.
Combining labeled and unlabeled data with co-training
TLDR
A PAC-style analysis is provided for a problem setting motivated by the task of learning to classify web pages, in which the description of each example can be partitioned into two distinct views, to allow inexpensive unlabeled data to augment a much smaller set of labeled examples.
Automatic identification of user goals in Web search
TLDR
This paper presents the results from a human subject study that strongly indicate the feasibility of automatic query-goal identification, and proposes two types of features for the goal-identification task: user-click behavior and anchor-link distribution.
Learning from labeled and unlabeled data with label propagation
TLDR
A simple iterative algorithm to propagate labels through the dataset along high-density areas defined by unlabeled data is proposed, its solution is analyzed, and its connection to several other algorithms is analyzed.
IRC: an iterative reinforcement categorization algorithm for interrelated Web objects
TLDR
IRC attempts to classify interrelated Web objects through iterative reinforcement between the individual classification results of different object types, exploiting the full interrelationships between the heterogeneous objects on the Web.