LINE: Large-scale Information Network Embedding

@article{Tang2015LINELI,
  title={LINE: Large-scale Information Network Embedding},
  author={Jian Tang and Meng Qu and Mingzhe Wang and Ming Zhang and Jun Yan and Qiaozhu Mei},
  journal={Proceedings of the 24th International Conference on World Wide Web},
  year={2015}
}
  • Published 11 March 2015
  • Computer Science
This paper studies the problem of embedding very large information networks into low-dimensional vector spaces, which is useful in many tasks such as visualization, node classification, and link prediction. […] Key Method: The method optimizes a carefully designed objective function that preserves both the local and global network structures.
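
The abstract snippet above does not reproduce the objective itself; from memory of the paper, the first-order and second-order proximity objectives take roughly the following form, where $u_i$ is the embedding of vertex $v_i$ and $u_i'$ its "context" representation (notation assumed here, not quoted from the abstract):

$$
O_1 = -\sum_{(i,j)\in E} w_{ij}\,\log p_1(v_i, v_j), \qquad
p_1(v_i, v_j) = \frac{1}{1+\exp(-u_i^{\top} u_j)}
$$

$$
O_2 = -\sum_{(i,j)\in E} w_{ij}\,\log p_2(v_j \mid v_i), \qquad
p_2(v_j \mid v_i) = \frac{\exp(u_j'^{\top} u_i)}{\sum_{k=1}^{|V|}\exp(u_k'^{\top} u_i)}
$$

Both objectives are optimized with edge sampling plus negative sampling, so the full softmax in $p_2$ is never evaluated exactly; the two sets of vectors are trained separately and can be concatenated.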

Citations

An Empirical Study of Locally Updated Large-scale Information Network Embedding (LINE)

TLDR
The novel network embedding method LINE is studied; it optimizes a carefully designed objective function that preserves both the local and global network structures, and the embeddings are demonstrated on several multi-label network classification tasks for social networks such as BlogCatalog and YouTube.

Evaluating Node Embeddings of Complex Networks

TLDR
A general framework, introduced recently in the literature and available in a GitHub repository, provides one of the first tools for unsupervised comparison of graph embeddings: it assigns a 'divergence score' to embeddings with the goal of distinguishing good ones from bad ones.

Edge2vec

TLDR
Considering an important property of social networks, namely that they are sparse and hence the average node degree is bounded, an edge-based graph embedding (edge2vec) method is proposed that maps the edges of a social network directly to low-dimensional vectors while preserving as much of the structural information of the embedded edges as possible.

A General Embedding Framework for Heterogeneous Information Learning in Large-Scale Networks

TLDR
A general embedding framework named Heterogeneous Information Learning in Large-scale networks (HILL) is proposed, which allows node-proximity assessment to be carried out in a distributed manner by decomposing the complex modeling and optimization into many simple, independent sub-problems.

Degree-biased random walk for large-scale network embedding

Edge2vec: Edge-based Social Network Embedding

TLDR
This article proposes an edge-based graph embedding (edge2vec) method that maps the edges in social networks directly to low-dimensional vectors; experimental results on different datasets show that edge2vec benefits from this direct mapping in preserving the structural information of edges.
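
The TLDRs above do not describe how edge2vec actually constructs its edge vectors, so the following is not that method; it is only a minimal sketch of a common baseline, deriving an edge representation from the two endpoint node embeddings via the Hadamard product (one of the binary operators popularized by node2vec). All names and values are illustrative.

```python
import numpy as np

def edge_embedding(node_emb: dict, u, v) -> np.ndarray:
    """Hadamard (element-wise) product of the two endpoint embeddings.

    A common baseline for representing an edge (u, v) given node
    embeddings; other choices are the average, L1, or L2 difference.
    """
    return node_emb[u] * node_emb[v]

# Toy usage with random 4-dimensional node embeddings.
rng = np.random.default_rng(0)
node_emb = {n: rng.normal(size=4) for n in ["a", "b", "c"]}
print(edge_embedding(node_emb, "a", "b"))
```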

Preserving Local and Global Information for Network Embedding

TLDR
An approach to capturing global information is introduced, together with a network embedding framework, LOG, that coherently models LOcal and Global information; experiments demonstrate the proposed framework's ability to preserve global information.

Modeling Large-Scale Dynamic Social Networks via Node Embeddings

TLDR
This paper attempts to model the hierarchical and dynamic features of social networks by designing a damping-based sampling algorithm together with a corresponding local search-based incremental learning algorithm, which can easily be extended to large-scale scenarios.

RaRE: Social Rank Regulated Large-scale Network Embedding

TLDR
A carefully designed link-generation model is proposed that explicitly models the interdependency between these two types of embeddings; experiments demonstrate the superiority of the novel network embedding model over state-of-the-art methods.

Properties of Vector Embeddings in Social Networks

TLDR
This paper investigates which network properties are preserved by recent random walk-based embedding procedures such as node2vec, DeepWalk, and LINE, and proposes a method that applies learning to rank in order to relate embeddings to network centralities; the method is shown to approximate the closeness centrality measure in social networks.
...

References

Showing 1-10 of 24 references

Distributed large-scale natural graph factorization

TLDR
This work proposes a novel factorization technique that relies on partitioning a graph so as to minimize the number of neighboring vertices, rather than edges, across partitions; the decomposition is computed by a streaming algorithm.
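
As a reminder (paraphrased from memory of that line of work, not from this snippet), the factorization objective is roughly a regularized least-squares fit of edge weights by inner products of node factors:

$$
f(Z) = \frac{1}{2}\sum_{(i,j)\in E}\bigl(w_{ij} - \langle Z_i, Z_j\rangle\bigr)^2 + \frac{\lambda}{2}\sum_{i}\|Z_i\|^2
$$

The partitioning scheme described in the TLDR matters because, roughly, each vertex factor $Z_i$ only needs to be exchanged with the machines holding its neighbors.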

Information network or social network?: the structure of the twitter follow graph

TLDR
A characterization of the topological features of the Twitter follow graph is provided, analyzing properties such as degree distributions, connected components, shortest path lengths, clustering coefficients, and degree assortativity; the analysis leads to the hypothesis that, from an individual user's perspective, Twitter starts off more like an information network but evolves to behave more like a social network.
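
The properties listed above are standard graph statistics; a minimal sketch of how one might compute them on a small graph with networkx (assuming networkx is available; the toy undirected graph and this snippet are illustrative, not the paper's measurement pipeline):

```python
import networkx as nx

# Toy undirected graph standing in for a (tiny) follow graph.
G = nx.karate_club_graph()

print("degree distribution:", nx.degree_histogram(G))
print("connected components:", nx.number_connected_components(G))
print("average shortest path:", nx.average_shortest_path_length(G))
print("average clustering:", nx.average_clustering(G))
print("degree assortativity:", nx.degree_assortativity_coefficient(G))
```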

Graph Embedding and Extensions: A General Framework for Dimensionality Reduction

TLDR
A new supervised dimensionality reduction algorithm called marginal Fisher analysis is proposed, in which the intrinsic graph characterizes intraclass compactness and connects each data point with its neighboring points of the same class, while the penalty graph connects the marginal points and characterizes the interclass separability.
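
From memory of that framework (not quoted from this snippet), the unified graph-embedding objective is roughly a Laplacian quadratic form minimized under a constraint defined by the penalty graph:

$$
y^{*} = \arg\min_{y^{\top} B y = c}\; \sum_{i\neq j}\|y_i - y_j\|^2\, W_{ij}
      = \arg\min_{y^{\top} B y = c}\; y^{\top} L\, y,
$$

where $L = D - W$ is the Laplacian of the intrinsic graph and $B$ is either a scale-normalization matrix or the Laplacian of the penalty graph.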

ArnetMiner: extraction and mining of academic social networks

TLDR
The architecture and main features of the ArnetMiner system, which aims at extracting and mining academic social networks, are described and a unified modeling approach to simultaneously model topical aspects of papers, authors, and publication venues is proposed.

The link prediction problem for social networks

TLDR
Experiments on large co-authorship networks suggest that information about future interactions can be extracted from network topology alone, and that fairly subtle measures for detecting node proximity can outperform more direct measures.
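
The "fairly subtle measures for detecting node proximity" referred to above include the classical neighborhood scores evaluated in that paper; three standard ones, with $\Gamma(x)$ denoting the neighbor set of $x$, are common neighbors (CN), the Jaccard coefficient, and Adamic-Adar (AA):

$$
\mathrm{CN}(x,y) = |\Gamma(x)\cap\Gamma(y)|, \qquad
\mathrm{Jaccard}(x,y) = \frac{|\Gamma(x)\cap\Gamma(y)|}{|\Gamma(x)\cup\Gamma(y)|}, \qquad
\mathrm{AA}(x,y) = \sum_{z\in\Gamma(x)\cap\Gamma(y)}\frac{1}{\log|\Gamma(z)|}
$$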

DeepWalk: online learning of social representations

TLDR
DeepWalk is an online learning algorithm that builds useful incremental results and is trivially parallelizable, properties that make it suitable for a broad class of real-world applications such as network classification and anomaly detection.
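
DeepWalk's recipe is simple enough to sketch: sample truncated random walks and feed them to a skip-gram model as if they were sentences. The following is only a rough sketch assuming networkx and gensim 4.x are available; the graph and parameters are illustrative, not those of the paper.

```python
import random
import networkx as nx
from gensim.models import Word2Vec

def random_walks(G, num_walks=10, walk_length=40):
    """Truncated unbiased random walks, one 'sentence' per walk."""
    walks = []
    nodes = list(G.nodes())
    for _ in range(num_walks):
        random.shuffle(nodes)
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                nbrs = list(G.neighbors(walk[-1]))
                if not nbrs:
                    break
                walk.append(random.choice(nbrs))
            walks.append([str(n) for n in walk])
    return walks

G = nx.karate_club_graph()
model = Word2Vec(random_walks(G), vector_size=64, window=5,
                 min_count=0, sg=1, workers=1)   # sg=1 -> skip-gram
print(model.wv[str(0)][:5])                       # embedding of node 0
```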

Node Classification in Social Networks

When dealing with large graphs, such as those that arise in the context of online social networks, a subset of nodes may be labeled. These labels can indicate demographic values, interests, beliefs, or other characteristics of the nodes.
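
As a minimal illustration of this setting (not a method from the chapter), the simplest baseline is iterative neighbor majority voting: repeatedly assign each unlabeled node the most common label among its already-labeled neighbors. A rough sketch:

```python
from collections import Counter

def majority_vote_propagation(adj, labels, max_iters=10):
    """adj: {node: set(neighbors)}; labels: {node: label} for the labeled subset.

    Returns a copy of `labels` extended to (most) unlabeled nodes by
    repeatedly taking the majority label of labeled neighbors.
    """
    labels = dict(labels)
    for _ in range(max_iters):
        changed = False
        for node, nbrs in adj.items():
            if node in labels:
                continue
            votes = Counter(labels[n] for n in nbrs if n in labels)
            if votes:
                labels[node] = votes.most_common(1)[0][0]
                changed = True
        if not changed:
            break
    return labels

adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(majority_vote_propagation(adj, {1: "a", 2: "a"}))
```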

Neural Word Embedding as Implicit Matrix Factorization

TLDR
It is shown that using a sparse Shifted Positive PMI word-context matrix to represent words improves results on two word-similarity tasks and one of two analogy tasks, and it is conjectured that this stems from the weighted nature of SGNS's factorization.
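
The key object in that paper is the shifted positive PMI matrix; a minimal NumPy sketch of how one might construct it from a word-context co-occurrence count matrix (illustrative, not the authors' code):

```python
import numpy as np

def sppmi(counts: np.ndarray, k: int = 5) -> np.ndarray:
    """Shifted positive PMI: max(PMI(w, c) - log k, 0).

    counts[w, c] is the number of times context c co-occurs with word w;
    k plays the role of the number of negative samples in SGNS.
    """
    total = counts.sum()
    pw = counts.sum(axis=1, keepdims=True) / total   # P(w)
    pc = counts.sum(axis=0, keepdims=True) / total   # P(c)
    with np.errstate(divide="ignore"):
        pmi = np.log((counts / total) / (pw * pc))
    return np.maximum(pmi - np.log(k), 0.0)

counts = np.array([[10, 0, 2], [3, 5, 0], [0, 1, 8]], dtype=float)
print(sppmi(counts, k=2))
```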

Distributed Representations of Sentences and Documents

TLDR
Paragraph Vector is an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of text, such as sentences, paragraphs, and documents; its construction gives the algorithm the potential to overcome the weaknesses of bag-of-words models.
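
Paragraph Vector is implemented in gensim as Doc2Vec; a minimal usage sketch (assuming gensim 4.x; the toy corpus and parameters are illustrative):

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [
    TaggedDocument(words="large scale network embedding".split(), tags=["d0"]),
    TaggedDocument(words="word embeddings as matrix factorization".split(), tags=["d1"]),
]
model = Doc2Vec(docs, vector_size=32, window=3, min_count=1, epochs=40)
print(model.dv["d0"][:5])                                 # learned vector for document d0
print(model.infer_vector("embedding a new document".split())[:5])
```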

Reducing the sampling complexity of topic models

TLDR
An algorithm is proposed that scales linearly with the number of actually instantiated topics $k_d$ in a document; for large document collections and structured hierarchical models, where $k_d \ll k$, this yields an order-of-magnitude speedup.
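
The O(1) discrete sampling underneath that speedup is typically done with alias tables (the same trick LINE itself uses for edge sampling). A minimal sketch of Vose's alias method (illustrative, not the paper's full Metropolis-Hastings sampler):

```python
import random

def build_alias(probs):
    """O(n) preprocessing of a discrete distribution for O(1) sampling."""
    n = len(probs)
    scaled = [p * n for p in probs]
    prob, alias = [0.0] * n, [0] * n
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l
        scaled[l] += scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:       # leftovers have probability ~1
        prob[i] = 1.0
    return prob, alias

def alias_draw(prob, alias):
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]

prob, alias = build_alias([0.5, 0.3, 0.2])
samples = [alias_draw(prob, alias) for _ in range(10000)]
print([samples.count(i) / len(samples) for i in range(3)])
```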