LINE: Large-scale Information Network Embedding

  • Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, Qiaozhu Mei
  • Published 11 March 2015
  • Computer Science
  • Proceedings of the 24th International Conference on World Wide Web
This paper studies the problem of embedding very large information networks into low-dimensional vector spaces, which is useful in many tasks such as visualization, node classification, and link prediction. The method optimizes a carefully designed objective function that preserves both the local and global network structures.
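To make the idea of an objective that preserves local structure concrete, here is a minimal sketch of a first-order proximity loss of the kind used in the LINE paper: each observed edge (i, j) with weight w contributes -w · log σ(u_i · u_j), where u_i and u_j are the node embeddings and σ is the logistic function. The function names, the toy graph, and the embedding size are assumptions made for illustration and do not come from this page. The sketch covers only the local term and leaves out the second-order (global) term and the optimization procedure.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def first_order_loss(emb, edges):
    """Weighted negative log-likelihood of the observed edges.

    emb   -- list of embedding vectors, one per node (hypothetical layout)
    edges -- list of (i, j, w) tuples: endpoints and edge weight
    """
    loss = 0.0
    for i, j, w in edges:
        # Inner product of the two endpoint embeddings.
        score = sum(a * b for a, b in zip(emb[i], emb[j]))
        loss -= w * math.log(sigmoid(score))
    return loss


# Toy path graph 0-1-2-3 with made-up weights.
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0)]

# All-zero embeddings give sigma(0) = 0.5 for every edge.
zero_emb = [[0.0] * 4 for _ in range(4)]
# Identical non-zero embeddings make every connected pair score highly.
aligned_emb = [[1.0] * 4 for _ in range(4)]

print(first_order_loss(zero_emb, edges))
print(first_order_loss(aligned_emb, edges))
```

The second call prints a smaller value than the first, because embeddings that place connected nodes close together are rewarded. Minimizing this term alone would simply push all inner products upward, which is why the LINE paper trains with negative sampling; that part is omitted here.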


An Empirical Study of Locally Updated Large-scale Information Network Embedding (LINE)

The LINE network embedding method, which optimizes a carefully designed objective function that preserves both the local and global network structures, is studied, and its embeddings are demonstrated on several multi-label network classification tasks for social networks such as BlogCatalog and YouTube.

Evaluating Node Embeddings of Complex Networks

A general framework, recently introduced in the literature and readily available in a GitHub repository, provides one of the very first tools for unsupervised comparison of graph embeddings by assigning a 'divergence score' to each embedding, with the goal of distinguishing good embeddings from bad ones.


Considering an important property of social networks, i.e., the network is sparse, and hence the average degree of nodes is bounded, an edge-based graph embedding (edge2vec) method is proposed to map the edges in social networks directly to low-dimensional vectors to preserve structure information of embedded edges as much as possible.

A General Embedding Framework for Heterogeneous Information Learning in Large-Scale Networks

A general embedding framework named Heterogeneous Information Learning in Large-scale networks (HILL) is proposed, which enables node proximity assessment to be carried out simultaneously in a distributed manner by decomposing the complex modeling and optimization into many simple and independent sub-problems.

Degree-biased random walk for large-scale network embedding

Edge2vec: Edge-based Social Network Embedding

This article proposes an edge-based graph embedding (edge2vec) method to map the edges in social networks directly to low-dimensional vectors, and experimental results on different datasets show that edge2vec benefits from the direct mapping in preserving the structure information of edges.

Preserving Local and Global Information for Network Embedding

An approach to capturing global information is introduced, together with a network embedding framework, LOG, which coherently models LOcal and Global information and is shown to preserve global information.

Modeling Large-Scale Dynamic Social Networks via Node Embeddings

This paper attempts to model the hierarchical and dynamic features of social networks by designing a damping-based sampling algorithm corresponding to a local search-based incremental learning algorithm, which can easily be extended to large-scale scenarios.

RaRE: Social Rank Regulated Large-scale Network Embedding

A carefully designed link generation model is proposed, which explicitly models the interdependency between these two types of embeddings, and demonstrates the superiority of the novel network embedding model over the state-of-the-art methods.

Properties of Vector Embeddings in Social Networks

This paper studies and investigates network properties preserved by recent random walk-based embedding procedures like node2vec, DeepWalk or LINE and proposes a method that applies learning to rank in order to relate embeddings to network centralities, which is shown to approximate the Closeness Centrality measure in social networks.



Distributed large-scale natural graph factorization

This work proposes a novel factorization technique that relies on partitioning a graph so as to minimize the number of neighboring vertices rather than edges across partitions, and decomposition is based on a streaming algorithm.

Information network or social network?: the structure of the twitter follow graph

A characterization of the topological features of the Twitter follow graph is provided, analyzing properties such as degree distributions, connected components, shortest path lengths, clustering coefficients, and degree assortativity to hypothesize that from an individual user's perspective, Twitter starts off more like an information network, but evolves to behave more like a social network.

Graph Embedding and Extensions: A General Framework for Dimensionality Reduction

A new supervised dimensionality reduction algorithm called marginal Fisher analysis is proposed, in which the intrinsic graph characterizes the intraclass compactness and connects each data point with its neighboring points of the same class, while the penalty graph connects the marginal points and characterizes the interclass separability.

ArnetMiner: extraction and mining of academic social networks

The architecture and main features of the ArnetMiner system, which aims at extracting and mining academic social networks, are described and a unified modeling approach to simultaneously model topical aspects of papers, authors, and publication venues is proposed.

The link prediction problem for social networks

Experiments on large co-authorship networks suggest that information about future interactions can be extracted from network topology alone, and that fairly subtle measures for detecting node proximity can outperform more direct measures.

DeepWalk: online learning of social representations

DeepWalk is an online learning algorithm that builds useful incremental results and is trivially parallelizable, which makes it suitable for a broad class of real-world applications such as network classification and anomaly detection.

Node Classification in Social Networks

When dealing with large graphs, such as those that arise in the context of online social networks, a subset of nodes may be labeled. These labels can indicate demographic values, interest, beliefs or

Neural Word Embedding as Implicit Matrix Factorization

It is shown that using a sparse Shifted Positive PMI word-context matrix to represent words improves results on two word similarity tasks and one of two analogy tasks, and it is conjectured that this stems from the weighted nature of SGNS's factorization.

Distributed Representations of Sentences and Documents

Paragraph Vector is an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of texts, such as sentences, paragraphs, and documents, and its construction gives the algorithm the potential to overcome the weaknesses of bag-of-words models.

Reducing the sampling complexity of topic models

An algorithm is presented that scales linearly with the number of actually instantiated topics k_d in the document; for large document collections and in structured hierarchical models, where k_d ≪ k, this yields an order of magnitude speedup.