Configuring random graph models with fixed degree sequences

Bailey K. Fosdick; Daniel B. Larremore; Joel Nishimura; Johan Ugander

doi:10.1137/16M1087175

Mathematical and Natural Sciences, School of (SMNS)

Research output: Contribution to journal › Review article › peer-review

123 Scopus citations

Abstract

Random graph null models have found widespread application in diverse research communities analyzing network datasets, including social, information, and economic networks, as well as food webs, protein-protein interactions, and neuronal networks. The most popular random graph null models, called configuration models, are defined as uniform distributions over a space of graphs with a fixed degree sequence. Commonly, properties of an empirical network are compared to properties of an ensemble of graphs from a configuration model in order to quantify whether empirical network properties are meaningful or whether they are instead a common consequence of the particular degree sequence. In this work we study the subtle but important decisions underlying the specification of a configuration model, and we investigate the role these choices play in graph sampling procedures and a suite of applications. We place particular emphasis on the importance of specifying the appropriate graph labeling (stub-labeled or vertex-labeled) under which to consider a null model, a choice that closely connects the study of random graphs to the study of random contingency tables. We show that the choice of graph labeling is inconsequential for studies of simple graphs, but can have a significant impact on analyses of multigraphs or graphs with self-loops. The importance of these choices is demonstrated through a series of three in-depth vignettes, analyzing three different network datasets under many different configuration models and observing substantial differences in study conclusions under different models. We argue that in each case, only one of the possible configuration models is appropriate. While our work focuses on undirected static networks, it aims to guide the study of directed networks, dynamic networks, and all other network contexts that are suitably studied through the lens of random graph null models.
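As a rough illustration of what drawing from a configuration model can look like, the sketch below uses stub matching, a commonly used procedure that pairs half-edges ("stubs") uniformly at random. It is a minimal sketch, not the authors' code: the function names `stub_matching` and `degree_sequence` and the example degree sequence are hypothetical, and the procedure is assumed here to target the stub-labeled space in which self-loops and multi-edges are allowed. Sampling simple graphs or vertex-labeled multigraphs needs a different procedure.

```python
import random


def stub_matching(degrees, seed=None):
    """Pair stubs uniformly at random and return a list of edges (u, v).

    Assumption: self-loops and multi-edges are allowed in the result.
    """
    if sum(degrees) % 2 != 0:
        raise ValueError("degree sum must be even")
    rng = random.Random(seed)
    # One stub per unit of degree: vertex v appears degrees[v] times.
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    # Consecutive stubs in the shuffled list are joined into an edge.
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]


def degree_sequence(edges, n):
    """Recompute degrees from an edge list; a self-loop adds 2 to its vertex."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


degrees = [3, 2, 2, 2, 1]  # hypothetical degree sequence, not from the paper
edges = stub_matching(degrees, seed=0)
print(edges)
print(degree_sequence(edges, len(degrees)) == degrees)
```

Every sample preserves the degree sequence exactly, so a statistic computed over many such samples can serve as the null ensemble against which an empirical network is compared.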

Original language	English (US)
Pages (from-to)	315-355
Number of pages	41
Journal	SIAM Review
Volume	60
Issue number	2
DOIs	https://doi.org/10.1137/16M1087175
State	Published - 2018

Keywords

Complex networks
Configuration model
Graph enumeration
Graph sampling
Markov chain Monte Carlo
Permutation tests

ASJC Scopus subject areas

Theoretical Computer Science
Computational Mathematics
Applied Mathematics

Access to Document

10.1137/16M1087175

Cite this

@article{ff70c3074d2b41158ba56191ed44a1d0,

title = "Configuring random graph models with fixed degree sequences",

abstract = "Random graph null models have found widespread application in diverse research communities analyzing network datasets, including social, information, and economic networks, as well as food webs, protein-protein interactions, and neuronal networks. The most popular random graph null models, called configuration models, are defined as uniform distributions over a space of graphs with a fixed degree sequence. Commonly, properties of an empirical network are compared to properties of an ensemble of graphs from a configuration model in order to quantify whether empirical network properties are meaningful or whether they are instead a common consequence of the particular degree sequence. In this work we study the subtle but important decisions underlying the specification of a configuration model, and we investigate the role these choices play in graph sampling procedures and a suite of applications. We place particular emphasis on the importance of specifying the appropriate graph labeling (stub-labeled or vertex-labeled) under which to consider a null model, a choice that closely connects the study of random graphs to the study of random contingency tables. We show that the choice of graph labeling is inconsequential for studies of simple graphs, but can have a significant impact on analyses of multigraphs or graphs with self-loops. The importance of these choices is demonstrated through a series of three in-depth vignettes, analyzing three different network datasets under many different configuration models and observing substantial differences in study conclusions under different models. We argue that in each case, only one of the possible configuration models is appropriate. While our work focuses on undirected static networks, it aims to guide the study of directed networks, dynamic networks, and all other network contexts that are suitably studied through the lens of random graph null models.",

keywords = "Complex networks, Configuration model, Graph enumeration, Graph sampling, Markov chain Monte Carlo, Permutation tests",

author = "Fosdick, {Bailey K.} and Larremore, {Daniel B.} and Joel Nishimura and Johan Ugander",

note = "Funding Information: Received by the editors August 1, 2016; accepted for publication (in revised form) April 20, 2017; published electronically May 8, 2018. All authors contributed equally to this work. http://www.siam.org/journals/sirev/60-2/M108717.html Funding: This work was funded by the American Mathematical Society, the Santa Fe Institute Omidyar Fellowship, a David Morgenthaler II Faculty Fellowship, and the National Science Foundation under grant SES-1461495. Department of Statistics, Colorado State University, Ft. Collins, CO 80523 (bailey.fosdick@colostate.edu). Santa Fe Institute, Santa Fe, NM 87501 (larremore@santafe.edu). School of Mathematical and Natural Sciences, Arizona State University, Glendale, AZ 85306 (joel.nishimura@asu.edu). Management Science \& Engineering, Stanford University, Stanford, CA 94305 (jugander@stanford.edu). Publisher Copyright: {\textcopyright} 2018 Society for Industrial and Applied Mathematics.",

year = "2018",

doi = "10.1137/16M1087175",

language = "English (US)",

volume = "60",

pages = "315--355",

journal = "SIAM Review",

issn = "0036-1445",

publisher = "Society for Industrial and Applied Mathematics Publications",

number = "2",

}

TY - JOUR

T1 - Configuring random graph models with fixed degree sequences

AU - Fosdick, Bailey K.

AU - Larremore, Daniel B.

AU - Nishimura, Joel

AU - Ugander, Johan

N1 - Funding Information: Received by the editors August 1, 2016; accepted for publication (in revised form) April 20, 2017; published electronically May 8, 2018. All authors contributed equally to this work. http://www.siam.org/journals/sirev/60-2/M108717.html Funding: This work was funded by the American Mathematical Society, the Santa Fe Institute Omidyar Fellowship, a David Morgenthaler II Faculty Fellowship, and the National Science Foundation under grant SES-1461495. Department of Statistics, Colorado State University, Ft. Collins, CO 80523 (bailey.fosdick@colostate.edu). Santa Fe Institute, Santa Fe, NM 87501 (larremore@santafe.edu). School of Mathematical and Natural Sciences, Arizona State University, Glendale, AZ 85306 (joel.nishimura@asu.edu). Management Science & Engineering, Stanford University, Stanford, CA 94305 (jugander@stanford.edu). Publisher Copyright: © 2018 Society for Industrial and Applied Mathematics.

PY - 2018

Y1 - 2018

AB - Random graph null models have found widespread application in diverse research communities analyzing network datasets, including social, information, and economic networks, as well as food webs, protein-protein interactions, and neuronal networks. The most popular random graph null models, called configuration models, are defined as uniform distributions over a space of graphs with a fixed degree sequence. Commonly, properties of an empirical network are compared to properties of an ensemble of graphs from a configuration model in order to quantify whether empirical network properties are meaningful or whether they are instead a common consequence of the particular degree sequence. In this work we study the subtle but important decisions underlying the specification of a configuration model, and we investigate the role these choices play in graph sampling procedures and a suite of applications. We place particular emphasis on the importance of specifying the appropriate graph labeling (stub-labeled or vertex-labeled) under which to consider a null model, a choice that closely connects the study of random graphs to the study of random contingency tables. We show that the choice of graph labeling is inconsequential for studies of simple graphs, but can have a significant impact on analyses of multigraphs or graphs with self-loops. The importance of these choices is demonstrated through a series of three in-depth vignettes, analyzing three different network datasets under many different configuration models and observing substantial differences in study conclusions under different models. We argue that in each case, only one of the possible configuration models is appropriate. While our work focuses on undirected static networks, it aims to guide the study of directed networks, dynamic networks, and all other network contexts that are suitably studied through the lens of random graph null models.

KW - Complex networks

KW - Configuration model

KW - Graph enumeration

KW - Graph sampling

KW - Markov chain Monte Carlo

KW - Permutation tests

UR - http://www.scopus.com/inward/record.url?scp=85046685956&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85046685956&partnerID=8YFLogxK

U2 - 10.1137/16M1087175

DO - 10.1137/16M1087175

M3 - Review article

AN - SCOPUS:85046685956

SN - 0036-1445

VL - 60

SP - 315

EP - 355

JO - SIAM Review

JF - SIAM Review

IS - 2

ER -
