• Corpus ID: 239050001

Tagged Documents Co-Clustering

@article{Candel2021TaggedDC,
  title={Tagged Documents Co-Clustering},
  author={Ga{\"e}lle Candel and David Naccache},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.11079}
}
Tags are short sequences of words that describe textual and non-textual resources such as music, images, or books. Machine information retrieval systems can use tags to access a document quickly, and recommender systems can use them to suggest similar items to a user. However, the number of tags per document is limited, and tag frequencies are often distributed according to Zipf’s law. In this paper, we propose a methodology to cluster tags into conceptual groups. Data are… 
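To make the Zipf’s law remark concrete, here is a minimal sketch of a rank-frequency distribution in which the tag at rank r has probability proportional to 1/r^s. The exponent s = 1, the 1,000-tag vocabulary, and the helper names `zipf_probabilities` and `head_share` are illustrative assumptions, not values or code from the paper.

```python
# Illustrative only: s = 1 and a 1,000-tag vocabulary are assumptions,
# not values reported in the paper.

def zipf_probabilities(n_tags: int, s: float = 1.0) -> list[float]:
    """Probability of the tag at rank r, proportional to 1 / r**s (r = 1..n_tags)."""
    weights = [1.0 / r ** s for r in range(1, n_tags + 1)]
    total = sum(weights)  # normalising constant (a generalised harmonic number)
    return [w / total for w in weights]


def head_share(probs: list[float], k: int) -> float:
    """Fraction of all tag occurrences taken by the k most frequent tags."""
    return sum(probs[:k])


probs = zipf_probabilities(1000)
print(f"top 10 tags:  {head_share(probs, 10):.1%}")
print(f"top 100 tags: {head_share(probs, 100):.1%}")
```

Under these assumptions the 10 most frequent tags take about 39% of all occurrences and the top 100 about 69%, so the remaining 900 tags each occur rarely. This long tail is one reason document-tag matrices tend to be very sparse.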


References

Showing 1-10 of 25 references
Improving Tag-Clouds as Visual Information Retrieval Interfaces
TLDR
This paper presents a novel approach to tag selection for tag clouds and proposes the use of clustering algorithms for the visual layout, with the aim of improving the browsing experience and reducing the semantic density of the tag set.
"Power tags" in information retrieval
TLDR
A sketch of an algorithm for mining and processing pow... to cut off all tags in the long tail of a document‐specific tag distribution and form a new, additional search option in information retrieval systems.
Keyword clustering for automatic categorization
TLDR
In this paper, keyword clustering is studied for automatic categorization; a validity index for determining the number of clusters is proposed, and experimental results indicate that the index is effective.
Tag Clusters as Information Retrieval Interfaces
TLDR
It is found that tag clusters are perceived as more useful, much more trustworthy, and more enjoyable to use than tag clouds.
Co-clustering documents and words using bipartite spectral graph partitioning
  • I. Dhillon
  • KDD '01, 2001
  • Computer Science, Mathematics
TLDR
A new spectral co-clustering algorithm is presented that uses the second left and right singular vectors of an appropriately scaled word-document matrix to yield good bipartitionings, and it can be shown that these singular vectors solve a real relaxation of the NP-complete graph bipartitioning problem.
Survey on social tagging techniques
TLDR
Different techniques employed to study various aspects of tagging are summarized, including properties of tag streams, tagging models, tag semantics, generating recommendations using tags, visualizations of tags, applications of tags and problems associated with tagging usage.
Information-theoretic co-clustering
TLDR
This work presents an innovative co-clustering algorithm that monotonically increases the preserved mutual information by intertwining both the row and column clusterings at all stages and demonstrates that the algorithm works well in practice, especially in the presence of sparsity and high-dimensionality.
Automated Tag Clustering: Improving search and exploration in the tag space
TLDR
It is shown that clustering techniques can improve the user experience of current tagging services and thus the success of collaborative tagging services.
Comparison of Hierarchical Agglomerative Algorithms for Clustering Medical Documents
TLDR
The experimental results showed that the agglomerative algorithm using I1 as its criterion function for choosing which clusters to merge produced better cluster quality than the other criterion functions, in terms of entropy and purity as external measures.
ArnetMiner: extraction and mining of academic social networks
TLDR
The architecture and main features of the ArnetMiner system, which aims at extracting and mining academic social networks, are described and a unified modeling approach to simultaneously model topical aspects of papers, authors, and publication venues is proposed.
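The Dhillon (KDD '01) reference describes co-clustering through the second left and right singular vectors of a scaled word-document matrix. Below is a minimal NumPy sketch of that bipartitioning step applied to a hypothetical tag-by-document count matrix A, scaled as A_n = D1^(-1/2) A D2^(-1/2) with D1 and D2 holding the row and column sums. Assumptions: the function name `spectral_bipartition` and the toy matrix are illustrative; A has no empty rows or columns; and the final split uses the sign of the embedding z, a simplification of Dhillon's step of running k-means on z. This is not presented as the method of Candel and Naccache.

```python
import numpy as np


def spectral_bipartition(A):
    """Split the rows (tags) and columns (documents) of a nonnegative matrix into two co-clusters."""
    A = np.asarray(A, dtype=float)
    d1 = A.sum(axis=1)  # row sums (tag degrees); assumed nonzero
    d2 = A.sum(axis=0)  # column sums (document degrees); assumed nonzero
    # Scaled matrix A_n = D1^{-1/2} A D2^{-1/2}.
    An = A / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]
    U, s, Vt = np.linalg.svd(An)
    u2, v2 = U[:, 1], Vt[1, :]  # second left and right singular vectors
    # Embedding z = [D1^{-1/2} u2 ; D2^{-1/2} v2].
    z_rows = u2 / np.sqrt(d1)
    z_cols = v2 / np.sqrt(d2)
    # Simplification: split on the sign of z instead of running k-means on it.
    return (z_rows > 0).astype(int), (z_cols > 0).astype(int)


# Hypothetical counts: 4 tags (rows) x 4 documents (columns), two weakly linked blocks.
A = np.array([[5, 4, 1, 0],
              [4, 5, 0, 1],
              [1, 0, 5, 4],
              [0, 1, 4, 5]])
rows, cols = spectral_bipartition(A)
print("tag clusters:     ", rows)
print("document clusters:", cols)
```

Because a singular vector's sign is arbitrary, the labels 0 and 1 may be swapped between runs, but tags 0-1 always land with documents 0-1 and tags 2-3 with documents 2-3. Co-clustering rows and columns jointly in this way is what lets tags and the documents that carry them be grouped together.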