• Corpus ID: 42416536

Should We Be Confident in Peer Effects Estimated from Partial Crawls of Social Networks?

Jiasen Yang, Bruno Ribeiro, Jennifer Neville

Research in social network analysis and statistical relational learning has produced a number of methods for learning relational models from large-scale network data. Unfortunately, these methods have been developed under the unrealistic assumption of full data access. In practice, however, the data are often collected by crawling the network, due to proprietary access, limited resources, and privacy concerns. While prior studies have examined the impact of network crawling on the structural… 

Stochastic Gradient Descent for Relational Logistic Regression via Partial Network Crawls

This work extends the methodology to learning relational logistic regression models via stochastic gradient descent from partial network crawls, and shows that the proposed method yields accurate parameter estimates and confidence intervals.

Simulating systematic bias in attributed social networks and its effect on rankings of minority nodes

The implications of systematic bias in edge data depend on an interplay between network topology and the type of systematic error. This emphasises the need for an error model framework such as the one developed here, which provides a first step towards studying the effects of systematic edge uncertainty on various network analysis tasks.



Inference in OSNs via Lightweight Partial Crawls

Estimation techniques based on short crawls that have proven statistical guarantees are proposed and an adaptive crawler is provided that makes the method parameter-free, significantly improving the statistical guarantees.

A Walk in Facebook: Uniform Sampling of Users in Online Social Networks

This paper develops a practical framework for obtaining a uniform sample of users in an online social network by crawling its social graph. It considers and compares several candidate crawling techniques and introduces online formal convergence diagnostics to assess sample quality during the data collection process.

Network Sampling Designs for Relational Classification

Different sampling methods are presented, and the results indicate that the choice of sampling method can impact classification performance and consequently the accuracy of evaluation.

Sampling from large graphs

The best performing methods are the ones based on random-walks and "forest fire"; they match very accurately both static as well as evolutionary graph patterns, with sample sizes down to about 15% of the original graph.

Classification in Networked Data: a Toolkit and a Univariate Case Study

The results demonstrate that very simple network-classification models perform quite well---well enough that they should be used regularly as baseline classifiers for studies of learning with networked data.

Simple estimators for relational Bayesian classifiers

This work examines bias and variance tradeoffs over a range of data sets and shows that INDEPVAL's ability to model more multiset information results in lower bias estimates and contributes to its superior performance.

Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues

This book describes the development of Markov models for discrete-time Monte Carlo simulation; some of the models used in this study had problems with regard to consistency and ergodicity.

Bootstrap Methods: Another Look at the Jackknife

Identifying User Survival Types via Clustering of Censored Social Network Data

This paper proposes a decision-tree-based algorithm that uses a global normalization of $p$-values to identify clusters with significantly different survival distributions, and shows that this model outperforms competing methods.

The NBER patent citation data file: Lessons, insights and methodological tools

  • 2001