Abstract
Original language  Undefined 

Awarding Institution 

Supervisors/Advisors 

Thesis sponsors  
Award date  7 Oct 2016 
Place of Publication  Enschede 
Publisher  
Print ISBNs  9789036541794 
DOIs  
Publication status  Published  7 Oct 2016 
Keywords
 Complex networks
 Degreedegree correlations
 Directed paths
 IR101225
 Erased configuration model
 METIS317818
 Random graphs
 EWI27220
Cite this
}
Asymptotic analysis of network structures: degreedegree correlations and directed paths. / van der Hoorn, W.L.F.
Enschede : Centre for Telematics and Information Technology (CTIT), 2016. 249 p.Research output: Thesis › PhD Thesis  Research UT, graduation UT
TY  THES
T1  Asymptotic analysis of network structures: degreedegree correlations and directed paths
AU  van der Hoorn, W.L.F.
PY  2016/10/7
Y1  2016/10/7
N2  Our world is filled with complex systems, ranging from technological systems such as the Internet and the World Wide Web, to the human brain and social interactions between individuals or even organizations. Many of such systems can be modeled as a network, consisting of nodes and relations between them, which are called edges. These edges can be undirected, representing symmetric relations such as a friendship in a social network, or have a direction, in which case the relations are asymmetric. For instance, in the World Wide Web, nodes are webpages and the edges are the hyperlinks between them, which are directed since they point from one page to another. Understanding the structure of these networks is of vital importance for understanding the behavior of the systems they represent and the processes that run on those systems. In this thesis we analyze structural properties of networks by combining the mathematical theory of probability and graphs with large scale numerical experiments. Due to the fact that most realworld networks we want to analyze are large, we are interested in the asymptotic behavior of the structure of networks, as the number of nodes goes to infinity. The structural properties we consider in this thesis are degreedegree correlations and directed paths. Given a network, each node has a number of edges which we call the degree of the node. When the network is directed we distinguish between outdegree, the number of edges pointing out of the node, and indegree, the number of edges pointing towards the node. Degreedegree correlations describe the correlation between the (out/in) degrees of connected nodes in a network. A network is said to have assortative mixing (of degrees) if this correlation is positive, for instance, if nodes of large degree connect to nodes of large degree. When the correlation is negative, for example when nodes of large degree are connected to nodes of small degree, the network is said to have disassortative mixing. We say that a network has neutral mixing if the correlation is zero. In this case nodes have no preference, in terms of degrees, to which nodes they connect. A measure for degreedegree correlations assigns to a network a number in between −1 and 1, that characterizes the correlation between the degrees of connected nodes. In this work, we analyze the asymptotic behavior of different measures for degreedegree correlations in directed networks. We conclude that the most commonly used measure, related to Pearson’s correlation coefficient, has significant short comings. We propose several alternative measures, related to rankcorrelations, and show that these are statistically consistent estimators of degreedegree correlations in directed networks, under very general assumptions. We also consider two models for generating networks with a specific distribution of the degrees and with specific degreedegree correlations. The first is the wellknown directed configuration model. This model assigns to each node an out and indegree, which we see as a number of halfedges pointing out of, and into the node. Then we create the network by randomly pairing halfedges pointing outwards to those pointing inwards, to form edges between nodes. The general version of this model will generate networks where nodes can be connected to themselves (selfloops), or where there can be multiple edges from one node to another. Networks that do not have these features are called simple, and this is the case for most realworld networks. Therefore, there are two versions of the configuration model that generate simple networks. The first is the repeated version, where we simply repeat the connection steps, until the resulting network is simple. The second one is the erased version in which we remove all selfloops and replace all multiple edges between two nodes by just one edge. We analyze degreedegree correlations in all three versions of the configuration model and prove that they all have neutral mixing, asymptotically. In addition we analyze the number of edges that are removed in the erased configuration model. Here we establish a new scaling result in terms of the number of nodes in the network. We use the insights from this analysis to analyze the finite size effects on the measures for degreedegree correlations in the directed configuration model. When the degree distributions follows a powerlaw, we find an interesting phase transition in the fluctuations of the measures, in terms of the exponent of the powerlaw. We also introduce a new model, that generates graphs that have maximal disassortative mixing. Our model first assigns, similar to the configuration model, degrees to each node. Then we employ a greedy algorithm for connecting the halfedges, which minimizes the value of measures for degreedegree correlations and makes sure that the resulting network is simple. We show that the model gives a lower bound on the value of measures for degreedegree correlations in general networks and prove several asymptotic properties of this model. Using large scale experiments we also show that the minimal achievable value of a measure for degreedegree correlations depends on the full distribution of the degrees and this value can be much larger than the absolute minimum −1. In the last part of this thesis we investigate the hopcount in the directed configuration model, where the degree distribution has finite mean and infinite variance. Given a directed network, the hopcount of the network is defined as the shortest directed path between two nodes, selected at random. We show that the hopcount scales logarithmically with the number of nodes and determine the fluctuations around this value. Interestingly, we find that for the directed model it is the correlation between the out and indegrees of individual nodes that plays an important role in the scaling of the hopcount and not the moments of the degree distribution. This is different from the undirected model, where, for infinite variance degrees, the hopcount was found to scale double logarithmically in the number of nodes. To complement our results, we use recently developed, stateoftheart, algorithms to compute the hopcount distribution on large networks generated by the configuration model and find that already for graphs of reasonable size, one million nodes, our asymptotic results are extremely close to the true fluctuations.
AB  Our world is filled with complex systems, ranging from technological systems such as the Internet and the World Wide Web, to the human brain and social interactions between individuals or even organizations. Many of such systems can be modeled as a network, consisting of nodes and relations between them, which are called edges. These edges can be undirected, representing symmetric relations such as a friendship in a social network, or have a direction, in which case the relations are asymmetric. For instance, in the World Wide Web, nodes are webpages and the edges are the hyperlinks between them, which are directed since they point from one page to another. Understanding the structure of these networks is of vital importance for understanding the behavior of the systems they represent and the processes that run on those systems. In this thesis we analyze structural properties of networks by combining the mathematical theory of probability and graphs with large scale numerical experiments. Due to the fact that most realworld networks we want to analyze are large, we are interested in the asymptotic behavior of the structure of networks, as the number of nodes goes to infinity. The structural properties we consider in this thesis are degreedegree correlations and directed paths. Given a network, each node has a number of edges which we call the degree of the node. When the network is directed we distinguish between outdegree, the number of edges pointing out of the node, and indegree, the number of edges pointing towards the node. Degreedegree correlations describe the correlation between the (out/in) degrees of connected nodes in a network. A network is said to have assortative mixing (of degrees) if this correlation is positive, for instance, if nodes of large degree connect to nodes of large degree. When the correlation is negative, for example when nodes of large degree are connected to nodes of small degree, the network is said to have disassortative mixing. We say that a network has neutral mixing if the correlation is zero. In this case nodes have no preference, in terms of degrees, to which nodes they connect. A measure for degreedegree correlations assigns to a network a number in between −1 and 1, that characterizes the correlation between the degrees of connected nodes. In this work, we analyze the asymptotic behavior of different measures for degreedegree correlations in directed networks. We conclude that the most commonly used measure, related to Pearson’s correlation coefficient, has significant short comings. We propose several alternative measures, related to rankcorrelations, and show that these are statistically consistent estimators of degreedegree correlations in directed networks, under very general assumptions. We also consider two models for generating networks with a specific distribution of the degrees and with specific degreedegree correlations. The first is the wellknown directed configuration model. This model assigns to each node an out and indegree, which we see as a number of halfedges pointing out of, and into the node. Then we create the network by randomly pairing halfedges pointing outwards to those pointing inwards, to form edges between nodes. The general version of this model will generate networks where nodes can be connected to themselves (selfloops), or where there can be multiple edges from one node to another. Networks that do not have these features are called simple, and this is the case for most realworld networks. Therefore, there are two versions of the configuration model that generate simple networks. The first is the repeated version, where we simply repeat the connection steps, until the resulting network is simple. The second one is the erased version in which we remove all selfloops and replace all multiple edges between two nodes by just one edge. We analyze degreedegree correlations in all three versions of the configuration model and prove that they all have neutral mixing, asymptotically. In addition we analyze the number of edges that are removed in the erased configuration model. Here we establish a new scaling result in terms of the number of nodes in the network. We use the insights from this analysis to analyze the finite size effects on the measures for degreedegree correlations in the directed configuration model. When the degree distributions follows a powerlaw, we find an interesting phase transition in the fluctuations of the measures, in terms of the exponent of the powerlaw. We also introduce a new model, that generates graphs that have maximal disassortative mixing. Our model first assigns, similar to the configuration model, degrees to each node. Then we employ a greedy algorithm for connecting the halfedges, which minimizes the value of measures for degreedegree correlations and makes sure that the resulting network is simple. We show that the model gives a lower bound on the value of measures for degreedegree correlations in general networks and prove several asymptotic properties of this model. Using large scale experiments we also show that the minimal achievable value of a measure for degreedegree correlations depends on the full distribution of the degrees and this value can be much larger than the absolute minimum −1. In the last part of this thesis we investigate the hopcount in the directed configuration model, where the degree distribution has finite mean and infinite variance. Given a directed network, the hopcount of the network is defined as the shortest directed path between two nodes, selected at random. We show that the hopcount scales logarithmically with the number of nodes and determine the fluctuations around this value. Interestingly, we find that for the directed model it is the correlation between the out and indegrees of individual nodes that plays an important role in the scaling of the hopcount and not the moments of the degree distribution. This is different from the undirected model, where, for infinite variance degrees, the hopcount was found to scale double logarithmically in the number of nodes. To complement our results, we use recently developed, stateoftheart, algorithms to compute the hopcount distribution on large networks generated by the configuration model and find that already for graphs of reasonable size, one million nodes, our asymptotic results are extremely close to the true fluctuations.
KW  Complex networks
KW  Degreedegree correlations
KW  Directed paths
KW  IR101225
KW  Erased configuration model
KW  METIS317818
KW  Random graphs
KW  EWI27220
U2  10.3990/1.9789036541794
DO  10.3990/1.9789036541794
M3  PhD Thesis  Research UT, graduation UT
SN  9789036541794
PB  Centre for Telematics and Information Technology (CTIT)
CY  Enschede
ER 