Load Balanced Mapping of Distributed Objects to Minimize Network Communication

Alexander D. Stoyenko, Jan Bosch, Mehmet Aksit, Thomas J. Marlowe

    Research output: Contribution to journalArticleAcademicpeer-review

    9 Citations (Scopus)
    19 Downloads (Pure)

    Abstract

    This paper introduces a new load balancing and communica- tion minimizing heuristic used in the Inverse Remote Procedure Call (IRPC) system. While the paper briefly describes the IRPC system, the focus is on the new IRPC assignment heuristic. The IRPC compiler maps a distributed program to a graph that represents program objects and their dependencies (due to invocations and parameter passing) as nodes and edges, respectively. In the graph, the system preserves conditional and iterative flows, records network transmission and execution costs, and marks nodes that have to reside at specific network sites. The graph is then partitioned by the heuristic to derive a (sub)optimal node assignment to network sites minimizing load balancing and network data transport. The resulting pro- gram partition is then reflected in the physical object distri- bution, and remote and local object communication is trans- parently implemented. The compiler and run-time system use efficient implementation techniques such as type prediction, inlining, splitting and subprogram passing. The last of these allows remote code to be copied to local data, as an alternative to copying data to the remote site, whenever this will reduce network data transport. The IRPC graph partitioning heuristic operates in time O(E(log d + l + log M)), where M is the number of network sites, E is the number of communication edges, and d is the maximum degree of a node; l is a parameter of the algorithm, and can vary between 1 and N, where N is the number of communicating objects. This complexity is more nearly independent of M, and considerably better in terms of E and N, than that of previously known related algorithms, such as A*, which employs backtracking and is potentially exponential, or the max-flow/min-cut class of network flow algorithms or heuristics which tend to be at least of Q(MAPE), and it can be made (by choosing l appropriately) as efficient as even such fast heuristics as heaviest-edge-first, minimal com- munication, and Kernighan - Lin. In an extensive quantitative evaluation, the heuristic has been demonstrated to perform very well, giving on the average 75% traffic cost reductions for over 95% of the programs when compared to random parti- tioning, and outperforming in cost reduction and actual execu- tion time the three aforementioned fast heuristics, even with a large l. Thus, to the best of our knowledge, this is the first report of a well-performing assignment heuristic that is both essentially linear in the number of communication edges, and better than existing, established heuristics of no better complexity.
    Original languageEnglish
    Pages (from-to)117-136
    Number of pages20
    JournalJournal of parallel and distributed computing
    Volume34
    Issue number2
    DOIs
    Publication statusPublished - 1 May 1996

    Fingerprint

    Network Communication
    Telecommunication networks
    Heuristics
    Minimise
    Communication
    Cost reduction
    Resource allocation
    Copying
    Assignment
    Electric power transmission networks
    Vertex of a graph
    Load Balancing
    Compiler
    Costs
    Graph in graph theory
    Object
    Min-cut
    Graph Partitioning
    Runtime Systems
    Quantitative Evaluation

    Cite this

    Stoyenko, Alexander D. ; Bosch, Jan ; Aksit, Mehmet ; Marlowe, Thomas J. / Load Balanced Mapping of Distributed Objects to Minimize Network Communication. In: Journal of parallel and distributed computing. 1996 ; Vol. 34, No. 2. pp. 117-136.
    @article{0b160063eba94937a3ae4d943ec4e913,
    title = "Load Balanced Mapping of Distributed Objects to Minimize Network Communication",
    abstract = "This paper introduces a new load balancing and communica- tion minimizing heuristic used in the Inverse Remote Procedure Call (IRPC) system. While the paper briefly describes the IRPC system, the focus is on the new IRPC assignment heuristic. The IRPC compiler maps a distributed program to a graph that represents program objects and their dependencies (due to invocations and parameter passing) as nodes and edges, respectively. In the graph, the system preserves conditional and iterative flows, records network transmission and execution costs, and marks nodes that have to reside at specific network sites. The graph is then partitioned by the heuristic to derive a (sub)optimal node assignment to network sites minimizing load balancing and network data transport. The resulting pro- gram partition is then reflected in the physical object distri- bution, and remote and local object communication is trans- parently implemented. The compiler and run-time system use efficient implementation techniques such as type prediction, inlining, splitting and subprogram passing. The last of these allows remote code to be copied to local data, as an alternative to copying data to the remote site, whenever this will reduce network data transport. The IRPC graph partitioning heuristic operates in time O(E(log d + l + log M)), where M is the number of network sites, E is the number of communication edges, and d is the maximum degree of a node; l is a parameter of the algorithm, and can vary between 1 and N, where N is the number of communicating objects. This complexity is more nearly independent of M, and considerably better in terms of E and N, than that of previously known related algorithms, such as A*, which employs backtracking and is potentially exponential, or the max-flow/min-cut class of network flow algorithms or heuristics which tend to be at least of Q(MAPE), and it can be made (by choosing l appropriately) as efficient as even such fast heuristics as heaviest-edge-first, minimal com- munication, and Kernighan - Lin. In an extensive quantitative evaluation, the heuristic has been demonstrated to perform very well, giving on the average 75{\%} traffic cost reductions for over 95{\%} of the programs when compared to random parti- tioning, and outperforming in cost reduction and actual execu- tion time the three aforementioned fast heuristics, even with a large l. Thus, to the best of our knowledge, this is the first report of a well-performing assignment heuristic that is both essentially linear in the number of communication edges, and better than existing, established heuristics of no better complexity.",
    author = "Stoyenko, {Alexander D.} and Jan Bosch and Mehmet Aksit and Marlowe, {Thomas J.}",
    year = "1996",
    month = "5",
    day = "1",
    doi = "10.1006/jpdc.1996.0050",
    language = "English",
    volume = "34",
    pages = "117--136",
    journal = "Journal of parallel and distributed computing",
    issn = "0743-7315",
    publisher = "Academic Press Inc.",
    number = "2",

    }

    Load Balanced Mapping of Distributed Objects to Minimize Network Communication. / Stoyenko, Alexander D.; Bosch, Jan; Aksit, Mehmet; Marlowe, Thomas J.

    In: Journal of parallel and distributed computing, Vol. 34, No. 2, 01.05.1996, p. 117-136.

    Research output: Contribution to journalArticleAcademicpeer-review

    TY - JOUR

    T1 - Load Balanced Mapping of Distributed Objects to Minimize Network Communication

    AU - Stoyenko, Alexander D.

    AU - Bosch, Jan

    AU - Aksit, Mehmet

    AU - Marlowe, Thomas J.

    PY - 1996/5/1

    Y1 - 1996/5/1

    N2 - This paper introduces a new load balancing and communica- tion minimizing heuristic used in the Inverse Remote Procedure Call (IRPC) system. While the paper briefly describes the IRPC system, the focus is on the new IRPC assignment heuristic. The IRPC compiler maps a distributed program to a graph that represents program objects and their dependencies (due to invocations and parameter passing) as nodes and edges, respectively. In the graph, the system preserves conditional and iterative flows, records network transmission and execution costs, and marks nodes that have to reside at specific network sites. The graph is then partitioned by the heuristic to derive a (sub)optimal node assignment to network sites minimizing load balancing and network data transport. The resulting pro- gram partition is then reflected in the physical object distri- bution, and remote and local object communication is trans- parently implemented. The compiler and run-time system use efficient implementation techniques such as type prediction, inlining, splitting and subprogram passing. The last of these allows remote code to be copied to local data, as an alternative to copying data to the remote site, whenever this will reduce network data transport. The IRPC graph partitioning heuristic operates in time O(E(log d + l + log M)), where M is the number of network sites, E is the number of communication edges, and d is the maximum degree of a node; l is a parameter of the algorithm, and can vary between 1 and N, where N is the number of communicating objects. This complexity is more nearly independent of M, and considerably better in terms of E and N, than that of previously known related algorithms, such as A*, which employs backtracking and is potentially exponential, or the max-flow/min-cut class of network flow algorithms or heuristics which tend to be at least of Q(MAPE), and it can be made (by choosing l appropriately) as efficient as even such fast heuristics as heaviest-edge-first, minimal com- munication, and Kernighan - Lin. In an extensive quantitative evaluation, the heuristic has been demonstrated to perform very well, giving on the average 75% traffic cost reductions for over 95% of the programs when compared to random parti- tioning, and outperforming in cost reduction and actual execu- tion time the three aforementioned fast heuristics, even with a large l. Thus, to the best of our knowledge, this is the first report of a well-performing assignment heuristic that is both essentially linear in the number of communication edges, and better than existing, established heuristics of no better complexity.

    AB - This paper introduces a new load balancing and communica- tion minimizing heuristic used in the Inverse Remote Procedure Call (IRPC) system. While the paper briefly describes the IRPC system, the focus is on the new IRPC assignment heuristic. The IRPC compiler maps a distributed program to a graph that represents program objects and their dependencies (due to invocations and parameter passing) as nodes and edges, respectively. In the graph, the system preserves conditional and iterative flows, records network transmission and execution costs, and marks nodes that have to reside at specific network sites. The graph is then partitioned by the heuristic to derive a (sub)optimal node assignment to network sites minimizing load balancing and network data transport. The resulting pro- gram partition is then reflected in the physical object distri- bution, and remote and local object communication is trans- parently implemented. The compiler and run-time system use efficient implementation techniques such as type prediction, inlining, splitting and subprogram passing. The last of these allows remote code to be copied to local data, as an alternative to copying data to the remote site, whenever this will reduce network data transport. The IRPC graph partitioning heuristic operates in time O(E(log d + l + log M)), where M is the number of network sites, E is the number of communication edges, and d is the maximum degree of a node; l is a parameter of the algorithm, and can vary between 1 and N, where N is the number of communicating objects. This complexity is more nearly independent of M, and considerably better in terms of E and N, than that of previously known related algorithms, such as A*, which employs backtracking and is potentially exponential, or the max-flow/min-cut class of network flow algorithms or heuristics which tend to be at least of Q(MAPE), and it can be made (by choosing l appropriately) as efficient as even such fast heuristics as heaviest-edge-first, minimal com- munication, and Kernighan - Lin. In an extensive quantitative evaluation, the heuristic has been demonstrated to perform very well, giving on the average 75% traffic cost reductions for over 95% of the programs when compared to random parti- tioning, and outperforming in cost reduction and actual execu- tion time the three aforementioned fast heuristics, even with a large l. Thus, to the best of our knowledge, this is the first report of a well-performing assignment heuristic that is both essentially linear in the number of communication edges, and better than existing, established heuristics of no better complexity.

    U2 - 10.1006/jpdc.1996.0050

    DO - 10.1006/jpdc.1996.0050

    M3 - Article

    VL - 34

    SP - 117

    EP - 136

    JO - Journal of parallel and distributed computing

    JF - Journal of parallel and distributed computing

    SN - 0743-7315

    IS - 2

    ER -