Probabilistic topic models with biased propagation on heterogeneous information networks
- Hongbo Deng, Jiawei Han, Bo Zhao, Yintao Yu, C. Lin
- Computer Science
- Knowledge Discovery and Data Mining
- 21 August 2011
This paper proposes a novel topic model with biased propagation (TMBP) algorithm that directly incorporates a heterogeneous information network into topic modeling in a unified way. It extensively evaluates the proposed approach and compares it to state-of-the-art techniques on several datasets.
Mining Graph Patterns Efficiently via Randomized Summaries
- C. Chen, C. Lin, Matt Fredrikson, Mihai Christodorescu, Xifeng Yan, Jiawei Han
- Computer Science
- Proceedings of the VLDB Endowment
- 1 August 2009
This work proposes a new framework, called Summarize-Mine, which generates randomized summaries and repeats the process for multiple rounds. It can find interesting malware fingerprints that were not revealed previously, and it provides strict probabilistic guarantees on pattern loss likelihood.
Text Cube: Computing IR Measures for Multidimensional Text Database Analysis
- C. Lin, Bolin Ding, Jiawei Han, Feida Zhu, Bo Zhao
- Computer Science
- Eighth IEEE International Conference on Data…
- 15 December 2008
This paper proposes a text-cube model on multidimensional text databases, conducts systematic studies on efficient text-cube implementation, OLAP execution, and query processing, and shows the high promise of the methods.
PET: a statistical model for popular events tracking in social communities
This paper formally defines the problem of popular event tracking in online communities (PET) and proposes a novel statistical method that models the popularity of events over time, taking into consideration the burstiness of user interest, information diffusion on the network structure, and the evolution of textual topics.
Efficient Keyword-Based Search for Top-K Cells in Text Cube
- Bolin Ding, Bo Zhao, N. Oza
- Computer Science, Economics
- IEEE Transactions on Knowledge and Data…
- 1 December 2011
This paper defines a keyword-based query language and an IR-style relevance model for scoring and ranking cells in the text cube, and proposes four approaches to solve the problem of keyword search in a data cube with text-rich dimension(s) (so-called text cube): inverted-index one-scan, document sorted-scan, bottom-up dynamic programming, and search-space ordering.
SocialSpamGuard: A Data Mining-Based Spam Detection System for Social Media Networks
This work proposes SocialSpamGuard, a scalable, online social media spam detection system based on data mining for social network security. It employs the GAD clustering algorithm for large-scale clustering and integrates it with the designed active learning algorithm to deal with the scalability and real-time detection challenges.
The Joint Inference of Topic Diffusion and Evolution in Social Communities
- C. Lin, Q. Mei, Jiawei Han, Yunliang Jiang, Marina Danilevsky
- Computer Science
- IEEE 11th International Conference on Data Mining
- 11 December 2011
A novel and principled probabilistic model is proposed that casts this task as a joint inference problem, considering textual documents, social influences, and topic evolution in a unified way; it performs significantly better than existing methods.
Visual cube and on-line analytical processing of images
- Xin Jin, Jiawei Han, Liangliang Cao, Jiebo Luo, Bolin Ding, C. Lin
- Computer Science
- CIKM
- 26 October 2010
Visual Cube and multi-dimensional OLAP of image collections, such as web images indexed in search engines, product images and photos shared on social networks, are proposed and efficient algorithms are developed to construct Visual Cube.
TopCells: Keyword-based search of top-k aggregated documents in text cube
- Bolin Ding, Bo Zhao, C. Lin, Jiawei Han, ChengXiang Zhai
- Computer Science, Economics
- IEEE 26th International Conference on Data…
- 1 March 2010
This paper aims to support keyword search in a data cube with text-rich dimension(s) (so-called text cube) by proposing a relevance scoring model and efficient ranking algorithms.
Hierarchical Web-Page Clustering via In-Page and Cross-Page Link Structures
This study extracts a similarity matrix among pages via in-page and cross-page link structures and, based on this matrix, develops a density-based clustering algorithm that hierarchically groups densely linked webpages into semantic clusters.