ROX: Run-Time Optimization of XQueries

Riham Abdel Kader, Peter Boncz, Stefan Manegold, Maurice van Keulen

Research output: Chapter in Book/Report/Conference proceedingConference contributionAcademicpeer-review

19 Citations (Scopus)
53 Downloads (Pure)

Abstract

Optimization of complex XQuery queries that combine many XPath steps as well as join conditions is currently hindered by the absence of good result size estimation and cost models for XQuery. Additionally, the state-of-the-art of even relational query optimization still struggles to cope with cost model estimation errors that increase with plan size, as well as with the effect of correlated join, selection and aggregation predicates. In this research, we propose to radically depart from the traditional path of separating the query compilation and query execution phases, by having the optimizer execute and materialize partial results on the fly, observing intermediate result characteristics as well as applying sampling techniques to evaluate the real observed query cost. The query optimization problem studied here takes as input a Join Graph where the edges are either equi-predicates or XPath axis steps, and the execution environment provides value- and structural-join algorithms, in addition to structural and value-based indices. While run-time optimization with sampling removes many of the vulnerabilities of classical optimizers, it brings its own challenges with respect to keeping resource usage under control, both with respect to the materialization of intermediates, as well as the cost of plan exploration using sampling. The ROX approach deals with these issues by limiting the run-time search space to so-called "zero-investment" algorithms for which sampling can be guaranteed to be strictly linear in sample size. While the Join Graph used in ROX is a purely relational concept, it crucially fits our XQuery domain as all structural join algorithms and XML value indices we use have the zero-investment property. We perform extensive experimental evaluation on large XML datasets that shows that our run-time query optimizer finds good query plans in a robust fashion and has limited run-time overhead.
Original languageEnglish
Title of host publicationSIGMOD '09
Subtitle of host publicationProceedings of the 35th ACM SIGMOD International Conference on Management of Data
Place of PublicationNew York
PublisherAssociation for Computing Machinery (ACM)
Pages615-626
Number of pages12
ISBN (Print)978-1-60558-551-2
DOIs
Publication statusPublished - 29 Jun 2009

Fingerprint

Sampling
XML
Costs
Error analysis
Agglomeration

Keywords

  • EWI-15178
  • METIS-265195
  • DB-XMLDB: XML DATABASES

Cite this

Abdel Kader, R., Boncz, P., Manegold, S., & van Keulen, M. (2009). ROX: Run-Time Optimization of XQueries. In SIGMOD '09: Proceedings of the 35th ACM SIGMOD International Conference on Management of Data (pp. 615-626). New York: Association for Computing Machinery (ACM). https://doi.org/10.1145/1559845.1559910
Abdel Kader, Riham ; Boncz, Peter ; Manegold, Stefan ; van Keulen, Maurice. / ROX: Run-Time Optimization of XQueries. SIGMOD '09: Proceedings of the 35th ACM SIGMOD International Conference on Management of Data. New York : Association for Computing Machinery (ACM), 2009. pp. 615-626
@inproceedings{46d3aef1cd934ac8ae3fcc5b248b24bd,
title = "ROX: Run-Time Optimization of XQueries",
abstract = "Optimization of complex XQuery queries that combine many XPath steps as well as join conditions is currently hindered by the absence of good result size estimation and cost models for XQuery. Additionally, the state-of-the-art of even relational query optimization still struggles to cope with cost model estimation errors that increase with plan size, as well as with the effect of correlated join, selection and aggregation predicates. In this research, we propose to radically depart from the traditional path of separating the query compilation and query execution phases, by having the optimizer execute and materialize partial results on the fly, observing intermediate result characteristics as well as applying sampling techniques to evaluate the real observed query cost. The query optimization problem studied here takes as input a Join Graph where the edges are either equi-predicates or XPath axis steps, and the execution environment provides value- and structural-join algorithms, in addition to structural and value-based indices. While run-time optimization with sampling removes many of the vulnerabilities of classical optimizers, it brings its own challenges with respect to keeping resource usage under control, both with respect to the materialization of intermediates, as well as the cost of plan exploration using sampling. The ROX approach deals with these issues by limiting the run-time search space to so-called {"}zero-investment{"} algorithms for which sampling can be guaranteed to be strictly linear in sample size. While the Join Graph used in ROX is a purely relational concept, it crucially fits our XQuery domain as all structural join algorithms and XML value indices we use have the zero-investment property. We perform extensive experimental evaluation on large XML datasets that shows that our run-time query optimizer finds good query plans in a robust fashion and has limited run-time overhead.",
keywords = "EWI-15178, METIS-265195, DB-XMLDB: XML DATABASES",
author = "{Abdel Kader}, Riham and Peter Boncz and Stefan Manegold and {van Keulen}, Maurice",
year = "2009",
month = "6",
day = "29",
doi = "10.1145/1559845.1559910",
language = "English",
isbn = "978-1-60558-551-2",
pages = "615--626",
booktitle = "SIGMOD '09",
publisher = "Association for Computing Machinery (ACM)",
address = "United States",

}

Abdel Kader, R, Boncz, P, Manegold, S & van Keulen, M 2009, ROX: Run-Time Optimization of XQueries. in SIGMOD '09: Proceedings of the 35th ACM SIGMOD International Conference on Management of Data. Association for Computing Machinery (ACM), New York, pp. 615-626. https://doi.org/10.1145/1559845.1559910

ROX: Run-Time Optimization of XQueries. / Abdel Kader, Riham; Boncz, Peter; Manegold, Stefan; van Keulen, Maurice.

SIGMOD '09: Proceedings of the 35th ACM SIGMOD International Conference on Management of Data. New York : Association for Computing Machinery (ACM), 2009. p. 615-626.

Research output: Chapter in Book/Report/Conference proceedingConference contributionAcademicpeer-review

TY - GEN

T1 - ROX: Run-Time Optimization of XQueries

AU - Abdel Kader, Riham

AU - Boncz, Peter

AU - Manegold, Stefan

AU - van Keulen, Maurice

PY - 2009/6/29

Y1 - 2009/6/29

N2 - Optimization of complex XQuery queries that combine many XPath steps as well as join conditions is currently hindered by the absence of good result size estimation and cost models for XQuery. Additionally, the state-of-the-art of even relational query optimization still struggles to cope with cost model estimation errors that increase with plan size, as well as with the effect of correlated join, selection and aggregation predicates. In this research, we propose to radically depart from the traditional path of separating the query compilation and query execution phases, by having the optimizer execute and materialize partial results on the fly, observing intermediate result characteristics as well as applying sampling techniques to evaluate the real observed query cost. The query optimization problem studied here takes as input a Join Graph where the edges are either equi-predicates or XPath axis steps, and the execution environment provides value- and structural-join algorithms, in addition to structural and value-based indices. While run-time optimization with sampling removes many of the vulnerabilities of classical optimizers, it brings its own challenges with respect to keeping resource usage under control, both with respect to the materialization of intermediates, as well as the cost of plan exploration using sampling. The ROX approach deals with these issues by limiting the run-time search space to so-called "zero-investment" algorithms for which sampling can be guaranteed to be strictly linear in sample size. While the Join Graph used in ROX is a purely relational concept, it crucially fits our XQuery domain as all structural join algorithms and XML value indices we use have the zero-investment property. We perform extensive experimental evaluation on large XML datasets that shows that our run-time query optimizer finds good query plans in a robust fashion and has limited run-time overhead.

AB - Optimization of complex XQuery queries that combine many XPath steps as well as join conditions is currently hindered by the absence of good result size estimation and cost models for XQuery. Additionally, the state-of-the-art of even relational query optimization still struggles to cope with cost model estimation errors that increase with plan size, as well as with the effect of correlated join, selection and aggregation predicates. In this research, we propose to radically depart from the traditional path of separating the query compilation and query execution phases, by having the optimizer execute and materialize partial results on the fly, observing intermediate result characteristics as well as applying sampling techniques to evaluate the real observed query cost. The query optimization problem studied here takes as input a Join Graph where the edges are either equi-predicates or XPath axis steps, and the execution environment provides value- and structural-join algorithms, in addition to structural and value-based indices. While run-time optimization with sampling removes many of the vulnerabilities of classical optimizers, it brings its own challenges with respect to keeping resource usage under control, both with respect to the materialization of intermediates, as well as the cost of plan exploration using sampling. The ROX approach deals with these issues by limiting the run-time search space to so-called "zero-investment" algorithms for which sampling can be guaranteed to be strictly linear in sample size. While the Join Graph used in ROX is a purely relational concept, it crucially fits our XQuery domain as all structural join algorithms and XML value indices we use have the zero-investment property. We perform extensive experimental evaluation on large XML datasets that shows that our run-time query optimizer finds good query plans in a robust fashion and has limited run-time overhead.

KW - EWI-15178

KW - METIS-265195

KW - DB-XMLDB: XML DATABASES

U2 - 10.1145/1559845.1559910

DO - 10.1145/1559845.1559910

M3 - Conference contribution

SN - 978-1-60558-551-2

SP - 615

EP - 626

BT - SIGMOD '09

PB - Association for Computing Machinery (ACM)

CY - New York

ER -

Abdel Kader R, Boncz P, Manegold S, van Keulen M. ROX: Run-Time Optimization of XQueries. In SIGMOD '09: Proceedings of the 35th ACM SIGMOD International Conference on Management of Data. New York: Association for Computing Machinery (ACM). 2009. p. 615-626 https://doi.org/10.1145/1559845.1559910