PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous Information Networks

@article{Sun2011PathSimMP,
  title={PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous Information Networks},
  author={Yizhou Sun and Jiawei Han and Xifeng Yan and Philip S. Yu and Tianyi Wu},
  journal={Proc. VLDB Endow.},
  year={2011},
  volume={4},
  pages={992--1003}
}
Similarity search is a primitive operation in database and Web search engines. With the advent of large-scale heterogeneous information networks that consist of multi-typed, interconnected objects, such as the bibliographic networks and social media networks, it is important to study similarity search in such networks. Intuitively, two objects are similar if they are linked by many paths in the network. However, most existing similarity measures are defined for homogeneous networks. Different… 
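The idea of counting paths between objects can be illustrated with a small sketch. It assumes the PathSim definition for a symmetric meta path (for example author-venue-author): s(x, y) = 2 * M[x][y] / (M[x][x] + M[y][y]), where M[x][y] counts the path instances between x and y. The count matrix, the function names, and the tie-breaking rule below are made up for illustration and are not taken from the paper.

```python
def commuting_matrix(adj):
    """Return M = A * A^T for a count matrix A given as a list of rows.

    With A[i][v] = number of papers author i has in venue v, M[i][j] counts
    the author-venue-author path instances between authors i and j.
    """
    n = len(adj)
    return [[sum(a * b for a, b in zip(adj[i], adj[j])) for j in range(n)]
            for i in range(n)]


def pathsim(m, x, y):
    """PathSim score of objects x and y under a symmetric meta path."""
    denom = m[x][x] + m[y][y]
    return 2 * m[x][y] / denom if denom else 0.0


def top_k(m, x, k):
    """Return the k objects most similar to x as (index, score) pairs."""
    scores = [(y, pathsim(m, x, y)) for y in range(len(m)) if y != x]
    # Sort by descending score; break ties by index (an arbitrary choice).
    scores.sort(key=lambda t: (-t[1], t[0]))
    return scores[:k]


# Hypothetical toy data: 4 authors (rows) by 3 venues (columns).
adj = [
    [2, 1, 0],
    [2, 1, 0],
    [0, 0, 3],
    [4, 0, 1],
]
m = commuting_matrix(adj)
print(top_k(m, 0, 2))
```

This brute-force version scores every candidate; it is only meant to show the measure itself, not an efficient top-k search.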

W-PathSim: Novel Approach of Weighted Similarity Measure in Content-Based Heterogeneous Information Networks by Applying LDA Topic Modeling

The W-PathSim model is presented, which applies Latent Dirichlet Allocation (LDA) topic modeling to generate the weighting attribute for an object's links when scoring the similarity between two objects.

Distant Meta-Path Similarities for Text-Based Heterogeneous Information Networks

A distant meta-path similarity is proposed that captures HIN semantics between two distant (isolated) entities to provide more meaningful entity proximity, and state-of-the-art similarity performance on two text-based HINs is shown.

AsymSim: meta path-based similarity with asymmetric relations

This paper presents an efficient meta path-based peer similarity measure, AsymSim, which both captures the semantics of peer similarity and remains sensitive to asymmetric relations in the network, allowing us to extract deeper peer semantics.

Constrained-meta-path-based ranking in heterogeneous information network

This paper proposes a constrained meta path, which captures refined semantics by placing constraints on objects. It also studies the ranking problem in heterogeneous networks and proposes the HRank method to evaluate the importance of multiple types of objects and meta paths.

PathSimExt: Revisiting PathSim in Heterogeneous Information Networks

The definition of PathSim is revisited by introducing external support to enrich the result of PathSim, the first work to address the problem, which captures the similarity of two objects based on their connectivity along a semantic path.

HowSim: A General and Effective Similarity Measure on Heterogeneous Information Networks

SimRank, a well-known similarity measure for homogeneous graphs, is extended to HINs by introducing the concept of a decay graph. The newly proposed relevance measure, called HowSim, is meta-path free and captures structural and semantic similarity simultaneously.

HighSim: Highly Effective Similarity Measurement in Large Heterogeneous Information Networks

A novel HighSim algorithm is developed, which integrates the PathSim algorithm and the basic methodology in LINE algorithm, to leverage the similarity ranking by considering both the research topics and the venues of published papers of different authors.

Unsupervised meta-path selection for text similarity measure based on heterogeneous information networks

This paper proposes to use an approximate personalized PageRank algorithm to find useful subgraphs to allocate the meta-paths, and develops a new similarity measure called KnowSim, which is an ensemble of PathSim over the selected meta-paths and yields high-quality document clustering and classification performance.

DW-PathSim: a distributed computing model for topic-driven weighted meta-path-based similarity measure in a large-scale content-based heterogeneous information network

These studies mainly focus on improving the topic-driven weighted similarity measurement between same-typed objects in a HIN. The measurement builds on the meta-path-based mechanism called W-PathSim; combining the proposed W-PathSim model with distributed computing over 'graph-frames' on Spark gives DW-PathSim.
...

References

RankClus: integrating clustering with ranking for heterogeneous information network analysis

This paper addresses the problem of generating clusters for a specified type of objects, as well as ranking information for all types of objects based on these clusters in a multi-typed information network, and proposes a novel clustering framework called RankClus that directly generates clusters integrated with ranking.

ObjectRank: Authority-Based Keyword Search in Databases

Object-level ranking: bringing order to Web objects

The experimental results show that PopRank can achieve significantly better ranking results than naively applying PageRank on the object graph, and efficient approaches to automatically decide these factors are proposed.

SCAN: a structural clustering algorithm for networks

A novel algorithm called SCAN (Structural Clustering Algorithm for Networks) is proposed, which detects clusters, hubs and outliers in networks and clusters vertices based on a structural similarity measure.

Scaling personalized web search

The approach enables incremental computation, so that the construction of personalized views from partial vectors is practical at query time, and experimental results demonstrate the effectiveness and scalability of the techniques.

Fast algorithms for topk personalized pagerank queries

This work proposes a framework to answer top-k graph conductance queries, and extends the system to handle hard predicates, leading to a 4X speedup and overall, the system executes queries 200-1600X faster than whole-graph PageRank.

Top-k Set Similarity Joins

An algorithm, topk-join, is proposed to answer top-k similarity joins efficiently; it is based on the prefix filtering principle and employs tight upper bounding of similarity values of unseen pairs.

Voronoi-Based K Nearest Neighbor Search for Spatial Network Databases

iTopicModel: Information Network-Integrated Topic Modeling

A novel topic modeling framework is proposed, which builds a unified generative topic model that is able to consider both text and structure information for documents, and a graphical model is proposed to describe the generative model.

Fast Random Walk with Restart and Its Applications

The heart of the approach is to exploit two important properties shared by many real graphs: linear correlations and block-wise, community-like structure. The linearity is exploited by low-rank matrix approximation, and the community structure by graph partitioning, followed by the Sherman-Morrison lemma for matrix inversion.