# Asymptotic analysis of network structures: degree-degree correlations and directed paths

W.L.F. van der Hoorn

Research output: ThesisPhD Thesis - Research UT, graduation UT

### Abstract

Our world is filled with complex systems, ranging from technological systems such as the Internet and the World Wide Web, to the human brain and social interactions between individuals or even organizations. Many of such systems can be modeled as a network, consisting of nodes and relations between them, which are called edges. These edges can be undirected, representing symmetric relations such as a friendship in a social network, or have a direction, in which case the relations are asymmetric. For instance, in the World Wide Web, nodes are web-pages and the edges are the hyperlinks between them, which are directed since they point from one page to another. Understanding the structure of these networks is of vital importance for understanding the behavior of the systems they represent and the processes that run on those systems. In this thesis we analyze structural properties of networks by combining the mathematical theory of probability and graphs with large scale numerical experiments. Due to the fact that most real-world networks we want to analyze are large, we are interested in the asymptotic behavior of the structure of networks, as the number of nodes goes to infinity. The structural properties we consider in this thesis are degree-degree correlations and directed paths. Given a network, each node has a number of edges which we call the degree of the node. When the network is directed we distinguish between out-degree, the number of edges pointing out of the node, and in-degree, the number of edges pointing towards the node. Degree-degree correlations describe the correlation between the (out/in) degrees of connected nodes in a network. A network is said to have assortative mixing (of degrees) if this correlation is positive, for instance, if nodes of large degree connect to nodes of large degree. When the correlation is negative, for example when nodes of large degree are connected to nodes of small degree, the network is said to have disassortative mixing. We say that a network has neutral mixing if the correlation is zero. In this case nodes have no preference, in terms of degrees, to which nodes they connect. A measure for degree-degree correlations assigns to a network a number in between −1 and 1, that characterizes the correlation between the degrees of connected nodes. In this work, we analyze the asymptotic behavior of different measures for degree-degree correlations in directed networks. We conclude that the most commonly used measure, related to Pearson’s correlation coefficient, has significant short comings. We propose several alternative measures, related to rank-correlations, and show that these are statistically consistent estimators of degree-degree correlations in directed networks, under very general assumptions. We also consider two models for generating networks with a specific distribution of the degrees and with specific degree-degree correlations. The first is the well-known directed configuration model. This model assigns to each node an out- and in-degree, which we see as a number of half-edges pointing out of, and into the node. Then we create the network by randomly pairing half-edges pointing outwards to those pointing inwards, to form edges between nodes. The general version of this model will generate networks where nodes can be connected to themselves (self-loops), or where there can be multiple edges from one node to another. Networks that do not have these features are called simple, and this is the case for most real-world networks. Therefore, there are two versions of the configuration model that generate simple networks. The first is the repeated version, where we simply repeat the connection steps, until the resulting network is simple. The second one is the erased version in which we remove all self-loops and replace all multiple edges between two nodes by just one edge. We analyze degree-degree correlations in all three versions of the configuration model and prove that they all have neutral mixing, asymptotically. In addition we analyze the number of edges that are removed in the erased configuration model. Here we establish a new scaling result in terms of the number of nodes in the network. We use the insights from this analysis to analyze the finite size effects on the measures for degree-degree correlations in the directed configuration model. When the degree distributions follows a power-law, we find an interesting phase transition in the fluctuations of the measures, in terms of the exponent of the power-law. We also introduce a new model, that generates graphs that have maximal disassortative mixing. Our model first assigns, similar to the configuration model, degrees to each node. Then we employ a greedy algorithm for connecting the halfedges, which minimizes the value of measures for degree-degree correlations and makes sure that the resulting network is simple. We show that the model gives a lower bound on the value of measures for degree-degree correlations in general networks and prove several asymptotic properties of this model. Using large scale experiments we also show that the minimal achievable value of a measure for degree-degree correlations depends on the full distribution of the degrees and this value can be much larger than the absolute minimum −1. In the last part of this thesis we investigate the hopcount in the directed configuration model, where the degree distribution has finite mean and infinite variance. Given a directed network, the hopcount of the network is defined as the shortest directed path between two nodes, selected at random. We show that the hopcount scales logarithmically with the number of nodes and determine the fluctuations around this value. Interestingly, we find that for the directed model it is the correlation between the out- and in-degrees of individual nodes that plays an important role in the scaling of the hopcount and not the moments of the degree distribution. This is different from the undirected model, where, for infinite variance degrees, the hopcount was found to scale double logarithmically in the number of nodes. To complement our results, we use recently developed, state-of-the-art, algorithms to compute the hopcount distribution on large networks generated by the configuration model and find that already for graphs of reasonable size, one million nodes, our asymptotic results are extremely close to the true fluctuations.
Original language Undefined University of Twente Boucherie, Richardus J., SupervisorLitvak, Nelly, Advisor EU-FET Open grant NADINE (288956) 7 Oct 2016 Enschede Centre for Telematics and Information Technology (CTIT) 978-90-365-4179-4 https://doi.org/10.3990/1.9789036541794 Published - 7 Oct 2016

### Keywords

• Complex networks
• Degree-degree correlations
• Directed paths
• IR-101225
• Erased configuration model
• METIS-317818
• Random graphs
• EWI-27220

### Cite this

van der Hoorn, W. L. F. (2016). Asymptotic analysis of network structures: degree-degree correlations and directed paths. Enschede: Centre for Telematics and Information Technology (CTIT). https://doi.org/10.3990/1.9789036541794
van der Hoorn, W.L.F.. / Asymptotic analysis of network structures: degree-degree correlations and directed paths. Enschede : Centre for Telematics and Information Technology (CTIT), 2016. 249 p.
@phdthesis{5f8b72c0865d4e32a00dddf3ee93ac45,
title = "Asymptotic analysis of network structures: degree-degree correlations and directed paths",
abstract = "Our world is filled with complex systems, ranging from technological systems such as the Internet and the World Wide Web, to the human brain and social interactions between individuals or even organizations. Many of such systems can be modeled as a network, consisting of nodes and relations between them, which are called edges. These edges can be undirected, representing symmetric relations such as a friendship in a social network, or have a direction, in which case the relations are asymmetric. For instance, in the World Wide Web, nodes are web-pages and the edges are the hyperlinks between them, which are directed since they point from one page to another. Understanding the structure of these networks is of vital importance for understanding the behavior of the systems they represent and the processes that run on those systems. In this thesis we analyze structural properties of networks by combining the mathematical theory of probability and graphs with large scale numerical experiments. Due to the fact that most real-world networks we want to analyze are large, we are interested in the asymptotic behavior of the structure of networks, as the number of nodes goes to infinity. The structural properties we consider in this thesis are degree-degree correlations and directed paths. Given a network, each node has a number of edges which we call the degree of the node. When the network is directed we distinguish between out-degree, the number of edges pointing out of the node, and in-degree, the number of edges pointing towards the node. Degree-degree correlations describe the correlation between the (out/in) degrees of connected nodes in a network. A network is said to have assortative mixing (of degrees) if this correlation is positive, for instance, if nodes of large degree connect to nodes of large degree. When the correlation is negative, for example when nodes of large degree are connected to nodes of small degree, the network is said to have disassortative mixing. We say that a network has neutral mixing if the correlation is zero. In this case nodes have no preference, in terms of degrees, to which nodes they connect. A measure for degree-degree correlations assigns to a network a number in between −1 and 1, that characterizes the correlation between the degrees of connected nodes. In this work, we analyze the asymptotic behavior of different measures for degree-degree correlations in directed networks. We conclude that the most commonly used measure, related to Pearson’s correlation coefficient, has significant short comings. We propose several alternative measures, related to rank-correlations, and show that these are statistically consistent estimators of degree-degree correlations in directed networks, under very general assumptions. We also consider two models for generating networks with a specific distribution of the degrees and with specific degree-degree correlations. The first is the well-known directed configuration model. This model assigns to each node an out- and in-degree, which we see as a number of half-edges pointing out of, and into the node. Then we create the network by randomly pairing half-edges pointing outwards to those pointing inwards, to form edges between nodes. The general version of this model will generate networks where nodes can be connected to themselves (self-loops), or where there can be multiple edges from one node to another. Networks that do not have these features are called simple, and this is the case for most real-world networks. Therefore, there are two versions of the configuration model that generate simple networks. The first is the repeated version, where we simply repeat the connection steps, until the resulting network is simple. The second one is the erased version in which we remove all self-loops and replace all multiple edges between two nodes by just one edge. We analyze degree-degree correlations in all three versions of the configuration model and prove that they all have neutral mixing, asymptotically. In addition we analyze the number of edges that are removed in the erased configuration model. Here we establish a new scaling result in terms of the number of nodes in the network. We use the insights from this analysis to analyze the finite size effects on the measures for degree-degree correlations in the directed configuration model. When the degree distributions follows a power-law, we find an interesting phase transition in the fluctuations of the measures, in terms of the exponent of the power-law. We also introduce a new model, that generates graphs that have maximal disassortative mixing. Our model first assigns, similar to the configuration model, degrees to each node. Then we employ a greedy algorithm for connecting the halfedges, which minimizes the value of measures for degree-degree correlations and makes sure that the resulting network is simple. We show that the model gives a lower bound on the value of measures for degree-degree correlations in general networks and prove several asymptotic properties of this model. Using large scale experiments we also show that the minimal achievable value of a measure for degree-degree correlations depends on the full distribution of the degrees and this value can be much larger than the absolute minimum −1. In the last part of this thesis we investigate the hopcount in the directed configuration model, where the degree distribution has finite mean and infinite variance. Given a directed network, the hopcount of the network is defined as the shortest directed path between two nodes, selected at random. We show that the hopcount scales logarithmically with the number of nodes and determine the fluctuations around this value. Interestingly, we find that for the directed model it is the correlation between the out- and in-degrees of individual nodes that plays an important role in the scaling of the hopcount and not the moments of the degree distribution. This is different from the undirected model, where, for infinite variance degrees, the hopcount was found to scale double logarithmically in the number of nodes. To complement our results, we use recently developed, state-of-the-art, algorithms to compute the hopcount distribution on large networks generated by the configuration model and find that already for graphs of reasonable size, one million nodes, our asymptotic results are extremely close to the true fluctuations.",
keywords = "Complex networks, Degree-degree correlations, Directed paths, IR-101225, Erased configuration model, METIS-317818, Random graphs, EWI-27220",
author = "{van der Hoorn}, W.L.F.",
year = "2016",
month = "10",
day = "7",
doi = "10.3990/1.9789036541794",
language = "Undefined",
isbn = "978-90-365-4179-4",
publisher = "Centre for Telematics and Information Technology (CTIT)",
school = "University of Twente",

}

Asymptotic analysis of network structures: degree-degree correlations and directed paths. / van der Hoorn, W.L.F.

Enschede : Centre for Telematics and Information Technology (CTIT), 2016. 249 p.

Research output: ThesisPhD Thesis - Research UT, graduation UT

TY - THES

T1 - Asymptotic analysis of network structures: degree-degree correlations and directed paths

AU - van der Hoorn, W.L.F.

PY - 2016/10/7

Y1 - 2016/10/7

N2 - Our world is filled with complex systems, ranging from technological systems such as the Internet and the World Wide Web, to the human brain and social interactions between individuals or even organizations. Many of such systems can be modeled as a network, consisting of nodes and relations between them, which are called edges. These edges can be undirected, representing symmetric relations such as a friendship in a social network, or have a direction, in which case the relations are asymmetric. For instance, in the World Wide Web, nodes are web-pages and the edges are the hyperlinks between them, which are directed since they point from one page to another. Understanding the structure of these networks is of vital importance for understanding the behavior of the systems they represent and the processes that run on those systems. In this thesis we analyze structural properties of networks by combining the mathematical theory of probability and graphs with large scale numerical experiments. Due to the fact that most real-world networks we want to analyze are large, we are interested in the asymptotic behavior of the structure of networks, as the number of nodes goes to infinity. The structural properties we consider in this thesis are degree-degree correlations and directed paths. Given a network, each node has a number of edges which we call the degree of the node. When the network is directed we distinguish between out-degree, the number of edges pointing out of the node, and in-degree, the number of edges pointing towards the node. Degree-degree correlations describe the correlation between the (out/in) degrees of connected nodes in a network. A network is said to have assortative mixing (of degrees) if this correlation is positive, for instance, if nodes of large degree connect to nodes of large degree. When the correlation is negative, for example when nodes of large degree are connected to nodes of small degree, the network is said to have disassortative mixing. We say that a network has neutral mixing if the correlation is zero. In this case nodes have no preference, in terms of degrees, to which nodes they connect. A measure for degree-degree correlations assigns to a network a number in between −1 and 1, that characterizes the correlation between the degrees of connected nodes. In this work, we analyze the asymptotic behavior of different measures for degree-degree correlations in directed networks. We conclude that the most commonly used measure, related to Pearson’s correlation coefficient, has significant short comings. We propose several alternative measures, related to rank-correlations, and show that these are statistically consistent estimators of degree-degree correlations in directed networks, under very general assumptions. We also consider two models for generating networks with a specific distribution of the degrees and with specific degree-degree correlations. The first is the well-known directed configuration model. This model assigns to each node an out- and in-degree, which we see as a number of half-edges pointing out of, and into the node. Then we create the network by randomly pairing half-edges pointing outwards to those pointing inwards, to form edges between nodes. The general version of this model will generate networks where nodes can be connected to themselves (self-loops), or where there can be multiple edges from one node to another. Networks that do not have these features are called simple, and this is the case for most real-world networks. Therefore, there are two versions of the configuration model that generate simple networks. The first is the repeated version, where we simply repeat the connection steps, until the resulting network is simple. The second one is the erased version in which we remove all self-loops and replace all multiple edges between two nodes by just one edge. We analyze degree-degree correlations in all three versions of the configuration model and prove that they all have neutral mixing, asymptotically. In addition we analyze the number of edges that are removed in the erased configuration model. Here we establish a new scaling result in terms of the number of nodes in the network. We use the insights from this analysis to analyze the finite size effects on the measures for degree-degree correlations in the directed configuration model. When the degree distributions follows a power-law, we find an interesting phase transition in the fluctuations of the measures, in terms of the exponent of the power-law. We also introduce a new model, that generates graphs that have maximal disassortative mixing. Our model first assigns, similar to the configuration model, degrees to each node. Then we employ a greedy algorithm for connecting the halfedges, which minimizes the value of measures for degree-degree correlations and makes sure that the resulting network is simple. We show that the model gives a lower bound on the value of measures for degree-degree correlations in general networks and prove several asymptotic properties of this model. Using large scale experiments we also show that the minimal achievable value of a measure for degree-degree correlations depends on the full distribution of the degrees and this value can be much larger than the absolute minimum −1. In the last part of this thesis we investigate the hopcount in the directed configuration model, where the degree distribution has finite mean and infinite variance. Given a directed network, the hopcount of the network is defined as the shortest directed path between two nodes, selected at random. We show that the hopcount scales logarithmically with the number of nodes and determine the fluctuations around this value. Interestingly, we find that for the directed model it is the correlation between the out- and in-degrees of individual nodes that plays an important role in the scaling of the hopcount and not the moments of the degree distribution. This is different from the undirected model, where, for infinite variance degrees, the hopcount was found to scale double logarithmically in the number of nodes. To complement our results, we use recently developed, state-of-the-art, algorithms to compute the hopcount distribution on large networks generated by the configuration model and find that already for graphs of reasonable size, one million nodes, our asymptotic results are extremely close to the true fluctuations.

AB - Our world is filled with complex systems, ranging from technological systems such as the Internet and the World Wide Web, to the human brain and social interactions between individuals or even organizations. Many of such systems can be modeled as a network, consisting of nodes and relations between them, which are called edges. These edges can be undirected, representing symmetric relations such as a friendship in a social network, or have a direction, in which case the relations are asymmetric. For instance, in the World Wide Web, nodes are web-pages and the edges are the hyperlinks between them, which are directed since they point from one page to another. Understanding the structure of these networks is of vital importance for understanding the behavior of the systems they represent and the processes that run on those systems. In this thesis we analyze structural properties of networks by combining the mathematical theory of probability and graphs with large scale numerical experiments. Due to the fact that most real-world networks we want to analyze are large, we are interested in the asymptotic behavior of the structure of networks, as the number of nodes goes to infinity. The structural properties we consider in this thesis are degree-degree correlations and directed paths. Given a network, each node has a number of edges which we call the degree of the node. When the network is directed we distinguish between out-degree, the number of edges pointing out of the node, and in-degree, the number of edges pointing towards the node. Degree-degree correlations describe the correlation between the (out/in) degrees of connected nodes in a network. A network is said to have assortative mixing (of degrees) if this correlation is positive, for instance, if nodes of large degree connect to nodes of large degree. When the correlation is negative, for example when nodes of large degree are connected to nodes of small degree, the network is said to have disassortative mixing. We say that a network has neutral mixing if the correlation is zero. In this case nodes have no preference, in terms of degrees, to which nodes they connect. A measure for degree-degree correlations assigns to a network a number in between −1 and 1, that characterizes the correlation between the degrees of connected nodes. In this work, we analyze the asymptotic behavior of different measures for degree-degree correlations in directed networks. We conclude that the most commonly used measure, related to Pearson’s correlation coefficient, has significant short comings. We propose several alternative measures, related to rank-correlations, and show that these are statistically consistent estimators of degree-degree correlations in directed networks, under very general assumptions. We also consider two models for generating networks with a specific distribution of the degrees and with specific degree-degree correlations. The first is the well-known directed configuration model. This model assigns to each node an out- and in-degree, which we see as a number of half-edges pointing out of, and into the node. Then we create the network by randomly pairing half-edges pointing outwards to those pointing inwards, to form edges between nodes. The general version of this model will generate networks where nodes can be connected to themselves (self-loops), or where there can be multiple edges from one node to another. Networks that do not have these features are called simple, and this is the case for most real-world networks. Therefore, there are two versions of the configuration model that generate simple networks. The first is the repeated version, where we simply repeat the connection steps, until the resulting network is simple. The second one is the erased version in which we remove all self-loops and replace all multiple edges between two nodes by just one edge. We analyze degree-degree correlations in all three versions of the configuration model and prove that they all have neutral mixing, asymptotically. In addition we analyze the number of edges that are removed in the erased configuration model. Here we establish a new scaling result in terms of the number of nodes in the network. We use the insights from this analysis to analyze the finite size effects on the measures for degree-degree correlations in the directed configuration model. When the degree distributions follows a power-law, we find an interesting phase transition in the fluctuations of the measures, in terms of the exponent of the power-law. We also introduce a new model, that generates graphs that have maximal disassortative mixing. Our model first assigns, similar to the configuration model, degrees to each node. Then we employ a greedy algorithm for connecting the halfedges, which minimizes the value of measures for degree-degree correlations and makes sure that the resulting network is simple. We show that the model gives a lower bound on the value of measures for degree-degree correlations in general networks and prove several asymptotic properties of this model. Using large scale experiments we also show that the minimal achievable value of a measure for degree-degree correlations depends on the full distribution of the degrees and this value can be much larger than the absolute minimum −1. In the last part of this thesis we investigate the hopcount in the directed configuration model, where the degree distribution has finite mean and infinite variance. Given a directed network, the hopcount of the network is defined as the shortest directed path between two nodes, selected at random. We show that the hopcount scales logarithmically with the number of nodes and determine the fluctuations around this value. Interestingly, we find that for the directed model it is the correlation between the out- and in-degrees of individual nodes that plays an important role in the scaling of the hopcount and not the moments of the degree distribution. This is different from the undirected model, where, for infinite variance degrees, the hopcount was found to scale double logarithmically in the number of nodes. To complement our results, we use recently developed, state-of-the-art, algorithms to compute the hopcount distribution on large networks generated by the configuration model and find that already for graphs of reasonable size, one million nodes, our asymptotic results are extremely close to the true fluctuations.

KW - Complex networks

KW - Degree-degree correlations

KW - Directed paths

KW - IR-101225

KW - Erased configuration model

KW - METIS-317818

KW - Random graphs

KW - EWI-27220

U2 - 10.3990/1.9789036541794

DO - 10.3990/1.9789036541794

M3 - PhD Thesis - Research UT, graduation UT

SN - 978-90-365-4179-4

PB - Centre for Telematics and Information Technology (CTIT)

CY - Enschede

ER -

van der Hoorn WLF. Asymptotic analysis of network structures: degree-degree correlations and directed paths. Enschede: Centre for Telematics and Information Technology (CTIT), 2016. 249 p. https://doi.org/10.3990/1.9789036541794