Fully Dynamic Partitioning: Handling Data Skew in Parallel Data Cube Computation

H.J. Lu, J.X. Yu, L. Feng, Z.X. Li

Research output: Contribution to journal, Article, Academic, peer-review

10 Citations (Scopus)

Abstract

Parallel data processing is a promising approach for efficiently computing data cubes in relational databases, because most aggregate functions used in OLAP (On-Line Analytical Processing) are distributive functions. This paper studies the problem of handling data skew in parallel data cube computation. We present a fully dynamic partitioning approach that can effectively distribute the workload among processing nodes without a priori knowledge of the data distribution. As a supplement, our algorithm also incorporates a simple and effective dynamic load balancing mechanism that further improves overall performance. Our experimental results indicate that the proposed techniques are effective even under high data skew. Scale-up and speedup tests also show satisfactory results.
Original language: Undefined
Article number: 10.1023/A:1021567425133
Pages (from-to): 181-202
Number of pages: 22
Journal: Distributed and parallel databases
Volume: 13
Issue number: 2
DOI: 10.1023/A:1021567425133
Publication status: Published - Mar 2003
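The abstract's premise is that most OLAP aggregates are distributive, so each processing node can aggregate its own share of the data and the partial results can simply be merged. The minimal Python sketch below illustrates that idea by computing SUM over every group-by (cuboid) of a small cube. It is not the fully dynamic partitioning algorithm of the paper, whose details the abstract does not give: the function names (`cuboids`, `partial_cube`, `parallel_cube`), the `chunk_size` parameter, and the use of a thread pool's shared task queue as a stand-in for dynamic load balancing are all illustrative assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from itertools import combinations


def cuboids(num_dims):
    """Return all 2^d group-by sets (cuboids) of a d-dimensional cube as index tuples."""
    return [c for k in range(num_dims + 1) for c in combinations(range(num_dims), k)]


def partial_cube(chunk, num_dims):
    """Compute SUM(measure) for every cuboid over one chunk of rows.

    Each row is (dim_0, ..., dim_{d-1}, measure); result keys are
    (cuboid, group-by values), e.g. ((0,), ("a",)).
    """
    result = Counter()
    for cuboid in cuboids(num_dims):
        for row in chunk:
            result[(cuboid, tuple(row[i] for i in cuboid))] += row[-1]
    return result


def parallel_cube(rows, num_dims, num_workers=4, chunk_size=2):
    """Aggregate all cuboids by handing small chunks of rows to a worker pool.

    Hypothetical helper for illustration only, not the paper's algorithm.
    """
    # All chunks go into the executor's shared task queue and an idle worker
    # takes the next one, so one slow chunk does not hold up the others
    # (a simple pull-based form of dynamic load balancing).
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for part in pool.map(partial(partial_cube, num_dims=num_dims), chunks):
            # SUM is distributive: the global sum is the sum of partial sums.
            total.update(part)
    return total


rows = [("a", "x", 1), ("a", "y", 2), ("b", "x", 3), ("b", "x", 4), ("a", "x", 5)]
cube = parallel_cube(rows, num_dims=2)
print(cube[((), ())])              # 15: grand total
print(cube[((0,), ("a",))])        # 8: SUM where dim_0 = "a"
print(cube[((0, 1), ("b", "x"))])  # 7: SUM where (dim_0, dim_1) = ("b", "x")
```

Because CPython threads do not execute Python bytecode in parallel, the thread pool here only demonstrates the control flow; swapping in `ProcessPoolExecutor` (with the driver code under an `if __name__ == "__main__":` guard) would give real parallelism, which is why `functools.partial` is used instead of a lambda, keeping the task picklable. Splitting rows into equal-sized chunks also does nothing about skew in the group-by values themselves, which is the problem the paper addresses.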

Keywords

  • EWI-6316
  • IR-63245
  • DB-DW: DATA WAREHOUSING

Cite this

Lu, H.J.; Yu, J.X.; Feng, L.; Li, Z.X. / Fully Dynamic Partitioning: Handling Data Skew in Parallel Data Cube Computation. In: Distributed and parallel databases. 2003; Vol. 13, No. 2. pp. 181-202.
@article{7bb169a1862d457e941531b3bdedf17f,
title = "Fully Dynamic Partitioning: Handling Data Skew in Parallel Data Cube Computation",
abstract = "Parallel data processing is a promising approach for efficiently computing data cubes in relational databases, because most aggregate functions used in OLAP (On-Line Analytical Processing) are distributive functions. This paper studies the problem of handling data skew in parallel data cube computation. We present a fully dynamic partitioning approach that can effectively distribute the workload among processing nodes without a priori knowledge of the data distribution. As a supplement, our algorithm also incorporates a simple and effective dynamic load balancing mechanism that further improves overall performance. Our experimental results indicate that the proposed techniques are effective even under high data skew. Scale-up and speedup tests also show satisfactory results.",
keywords = "EWI-6316, IR-63245, DB-DW: DATA WAREHOUSING",
author = "H.J. Lu and J.X. Yu and L. Feng and Z.X. Li",
note = "Imported from EWI/DB PMS [db-utwente:arti:0000003305]",
year = "2003",
month = "3",
doi = "10.1023/A:1021567425133",
language = "Undefined",
volume = "13",
pages = "181--202",
journal = "Distributed and parallel databases",
issn = "0926-8782",
publisher = "Springer",
number = "2",

}

Fully Dynamic Partitioning: Handling Data Skew in Parallel Data Cube Computation. / Lu, H.J.; Yu, J.X.; Feng, L.; Li, Z.X.

In: Distributed and parallel databases, Vol. 13, No. 2, 10.1023/A:1021567425133, 03.2003, p. 181-202.

TY  - JOUR
T1  - Fully Dynamic Partitioning: Handling Data Skew in Parallel Data Cube Computation
AU  - Lu, H.J.
AU  - Yu, J.X.
AU  - Feng, L.
AU  - Li, Z.X.
N1  - Imported from EWI/DB PMS [db-utwente:arti:0000003305]
PY  - 2003/3
Y1  - 2003/3
N2  - Parallel data processing is a promising approach for efficiently computing data cubes in relational databases, because most aggregate functions used in OLAP (On-Line Analytical Processing) are distributive functions. This paper studies the problem of handling data skew in parallel data cube computation. We present a fully dynamic partitioning approach that can effectively distribute the workload among processing nodes without a priori knowledge of the data distribution. As a supplement, our algorithm also incorporates a simple and effective dynamic load balancing mechanism that further improves overall performance. Our experimental results indicate that the proposed techniques are effective even under high data skew. Scale-up and speedup tests also show satisfactory results.
AB  - Parallel data processing is a promising approach for efficiently computing data cubes in relational databases, because most aggregate functions used in OLAP (On-Line Analytical Processing) are distributive functions. This paper studies the problem of handling data skew in parallel data cube computation. We present a fully dynamic partitioning approach that can effectively distribute the workload among processing nodes without a priori knowledge of the data distribution. As a supplement, our algorithm also incorporates a simple and effective dynamic load balancing mechanism that further improves overall performance. Our experimental results indicate that the proposed techniques are effective even under high data skew. Scale-up and speedup tests also show satisfactory results.
KW  - EWI-6316
KW  - IR-63245
KW  - DB-DW: DATA WAREHOUSING
U2  - 10.1023/A:1021567425133
DO  - 10.1023/A:1021567425133
M3  - Article
VL  - 13
SP  - 181
EP  - 202
JO  - Distributed and parallel databases
JF  - Distributed and parallel databases
SN  - 0926-8782
IS  - 2
M1  - 10.1023/A:1021567425133
ER  - 