AMiner: Toward Understanding Big Scholar Data

@inproceedings{Tang2016AMinerTU,
  title={AMiner: Toward Understanding Big Scholar Data},
  author={Jie Tang},
  booktitle={Proceedings of the Ninth ACM International Conference on Web Search and Data Mining},
  year={2016}
}
  • Jie Tang
  • Published 8 February 2016
  • Computer Science
  • Proceedings of the Ninth ACM International Conference on Web Search and Data Mining
In this talk, I will present a novel academic search and mining system, AMiner, the second generation of the ArnetMiner system. [...] So far, the system has collected a large body of scholarly data, with more than 130,000,000 researcher profiles and 100,000,000 papers from multiple publication databases. We also developed an approach named COSNET to connect AMiner with several professional social networks such as LinkedIn and VideoLectures, which significantly enriches the metadata of the scholarly data.
Deep-profiling: a deep neural network model for scholarly Web user profiling
TLDR
A profile attribute extraction model, PAE-NN, based on a Bi-LSTM-CRF neural network that can automatically extract the characteristics and contextual representations of each extracted entity through a Recurrent Neural Network with end-to-end training.
Scholarly data mining: A systematic review of its applications
TLDR
This work analyzes studies investigating the scholarly data generated via academic technologies such as scholarly networks and digital libraries for building scalable approaches for retrieving, recommending, and analyzing the scholarly content, classifying them into different applications based on literature features and highlighting the machine learning techniques used for this purpose.
A network approach to expertise retrieval based on path similarity and credit allocation
TLDR
This work proposes a network-based approach to the construction of authors' expertise profiles and shows that this method can be applied to a number of widely used data sets and outperforms other methods traditionally used for expertise identification.
AMiner Citation-Data Preprocessing for Recommender Systems on Scientific Publications
TLDR
The proposed approach consists of two phases: creation of a collection of articles based on user preferences and preprocessing this collection, which demonstrates the value of the approach with at least 79.8% information-preserving data reduction.
Mining information interaction behavior: Academic papers and enterprise emails
TLDR
This thesis uncovers how academic searchers interact with information objects, focuses on how users read their enterprise emails, and characterizes user reading time, which improves understanding of user behavior on email platforms.
Unsupervised Key-phrase Extraction and Clustering for Classification Scheme in Scientific Publications
TLDR
This paper investigates possible ways of automating common sub-tasks of the SM/SR process, i.e., extracting keywords and key-phrases from scientific documents using unsupervised methods, which are then used as a basis to construct the so-called classification scheme using semantic clustering techniques.
Attribute-Aware Graph Recurrent Networks for Scholarly Friend Recommendation Based on Internet of Scholars in Scholarly Big Data
TLDR
This article proposes a scholarly friend recommendation system that takes advantage of network embedding and scholar attributes, and develops a novel graph recurrent neural framework to embed attributed scholar interactions within the model for recommendations.
Citation Intent Classification Using Word Embedding
TLDR
This study critically investigated the available datasets for citation intent and proposed an automated citation intent technique to label the citation context with citation intent, which will enhance the study of citation context analysis.
Characterizing and predicting downloads in academic search
Analysis of direct citation, co-citation and bibliographic coupling in scientific topic identification
TLDR
Findings point towards the possible added value in combining bibliographic coupling analysis with other structures; at the same time, combining direct citation and co-citation is called into question.

References

SHOWING 1-10 OF 13 REFERENCES
ArnetMiner: extraction and mining of academic social networks
TLDR
The architecture and main features of the ArnetMiner system, which aims at extracting and mining academic social networks, are described and a unified modeling approach to simultaneously model topical aspects of papers, authors, and publication venues is proposed.
A Combination Approach to Web User Profiling
TLDR
This article formalizes the profiling problem as several subtasks: profile extraction, profile integration, and user interest discovery, and proposes a combination approach to deal with the profiling tasks.
Topic level expertise search over heterogeneous networks
TLDR
This paper proposes a topic level random walk method for ranking the different objects in the academic network, and develops a topical graph search function, based on the topic modeling and citation tracing analysis.
Co-Evolution of Multi-Typed Objects in Dynamic Star Networks
TLDR
A hierarchical Dirichlet process mixture model-based evolution model is proposed, which detects the co-evolution of multi-typed objects in the form of multi-typed cluster evolution in dynamic star networks, and an efficient inference algorithm is provided to learn the proposed model.
Mining advisor-advisee relationships from research publication networks
TLDR
A time-constrained probabilistic factor graph model (TPFG) is proposed, which takes a research publication network as input and models the advisor-advisee relationship mining problem using a joint likelihood objective function; an efficient learning algorithm is designed to optimize the objective function.
VEGAS: Visual influEnce GrAph Summarization on Citation Networks
TLDR
It can be proved that the matrix decomposition based algorithm can approximate the objective of the proposed IGS problem, and it is demonstrated that the method significantly outperforms the previous ones in optimizing both the quantitative IGS objective and the quality of the visual summarizations.
Panther: Fast Top-k Similarity Search on Large Networks
TLDR
This paper proposes a sampling method that provably and accurately estimates the similarity between vertices, based on a novel idea of random path, and shows that the proposed algorithm achieves clearly better performance than several alternative methods.
COSNET: Connecting Heterogeneous Social Networks with Local and Global Consistency
TLDR
An efficient subgradient algorithm is developed to train the model by converting the original energy-based objective function into its dual form, and it is demonstrated that applying the integration results produced by the method can improve the accuracy of expert finding, an important task in social networks.
Social influence analysis in large-scale networks
TLDR
Topical Affinity Propagation (TAP) is designed with efficient distributed learning algorithms that are implemented and tested under the Map-Reduce framework, and it can take the results of any topic modeling and the existing network structure to perform topic-level influence propagation.
Cross-domain collaboration recommendation
TLDR
The Cross-domain Topic Learning (CTL) model is proposed, which consolidates the existing cross-domain collaborations through topic layers instead of at author layers, which alleviates the sparseness issue and outperforms baselines significantly on multiple recommendation metrics.