ROX: Run-Time Optimization of XQueries

Riham Abdel Kader

Research output: ThesisPhD Thesis - Research UT, graduation UTAcademic

40 Downloads (Pure)

Abstract

Query optimization is the most important and complex phase of answering a user query. While sufficient for some applications, the widely used type of relational optimizers are not always robust, picking execution plans that are far from optimal. This is due to several reasons. First, they depend on statistics and a cost model which are often inaccurate, and sometimes even absent. Second, they fail to detect correlations which can unexpectedly make certain plans considerably cheaper than others. Finally, they cannot efficiently handle the large search space of big queries. The challenges faced by traditional relational optimizers and their impact on the quality of the chosen plans are aggravated in the context of XML and XQueries. This is due to the fact that in XML, it is harder to collect and maintain representative statistics since they have to capture more information about the document. Moreover, the search space of plans for an XQuery query is on average larger than that of relational queries, due to the higher number of joins resulting from the existence of many XPath steps in a typical XQuery. To overcome the above challenges, we propose ROX, a Run-time Optimizer for XQueries. ROX is autonomous, i.e. it does not depend on any statistics and cost models, robust in always finding a good execution plan while detecting and benefiting from correlations, and efficient in exploring the search space of plans. We show, through experiments, that ROX is indeed robust and efficient, and performs better than relational compile-time optimizers. ROX adopts a fundamentally different internal design which moves the optimization to run-time, and interleaves it with query execution. The search space is efficiently explored by alternating optimization and execution phases, defining the plan incrementally. Every execution step executes a set of operators and materializes the results, allowing the next optimization phase to benefit from the knowledge extracted from the newly materialized intermediates. Sampling techniques are used to accurately estimate the cardinality and cost of operators. To detect correlations, we introduce the chain sampling technique, the first generic and robust method to deal with any type of correlated data. We also extend the ROX idea to pipelined architectures to allow most of the existing database systems to benefit from our research.
Original languageEnglish
Awarding Institution
  • University of Twente
Supervisors/Advisors
  • Apers, Peter Maria Gerardus, Supervisor
  • van Keulen, Maurice , Co-Supervisor
Thesis sponsors
Award date25 Nov 2010
Place of PublicationEnschede
Publisher
Print ISBNs978-90-365-3111-5
DOIs
Publication statusPublished - 25 Nov 2010

Fingerprint

Statistics
XML
Sampling
Costs
Experiments

Keywords

  • DB-XMLDB: XML DATABASES
  • METIS-271172
  • IR-75036
  • EWI-19005
  • NWO 612.066.410

Cite this

Abdel Kader, R. (2010). ROX: Run-Time Optimization of XQueries. Enschede: Centre for Telematics and Information Technology (CTIT). https://doi.org/10.3990/1.9789036531115
Abdel Kader, Riham. / ROX: Run-Time Optimization of XQueries. Enschede : Centre for Telematics and Information Technology (CTIT), 2010. 240 p.
@phdthesis{47e9d71c9fd54059aa46314fa48703e1,
title = "ROX: Run-Time Optimization of XQueries",
abstract = "Query optimization is the most important and complex phase of answering a user query. While sufficient for some applications, the widely used type of relational optimizers are not always robust, picking execution plans that are far from optimal. This is due to several reasons. First, they depend on statistics and a cost model which are often inaccurate, and sometimes even absent. Second, they fail to detect correlations which can unexpectedly make certain plans considerably cheaper than others. Finally, they cannot efficiently handle the large search space of big queries. The challenges faced by traditional relational optimizers and their impact on the quality of the chosen plans are aggravated in the context of XML and XQueries. This is due to the fact that in XML, it is harder to collect and maintain representative statistics since they have to capture more information about the document. Moreover, the search space of plans for an XQuery query is on average larger than that of relational queries, due to the higher number of joins resulting from the existence of many XPath steps in a typical XQuery. To overcome the above challenges, we propose ROX, a Run-time Optimizer for XQueries. ROX is autonomous, i.e. it does not depend on any statistics and cost models, robust in always finding a good execution plan while detecting and benefiting from correlations, and efficient in exploring the search space of plans. We show, through experiments, that ROX is indeed robust and efficient, and performs better than relational compile-time optimizers. ROX adopts a fundamentally different internal design which moves the optimization to run-time, and interleaves it with query execution. The search space is efficiently explored by alternating optimization and execution phases, defining the plan incrementally. Every execution step executes a set of operators and materializes the results, allowing the next optimization phase to benefit from the knowledge extracted from the newly materialized intermediates. Sampling techniques are used to accurately estimate the cardinality and cost of operators. To detect correlations, we introduce the chain sampling technique, the first generic and robust method to deal with any type of correlated data. We also extend the ROX idea to pipelined architectures to allow most of the existing database systems to benefit from our research.",
keywords = "DB-XMLDB: XML DATABASES, METIS-271172, IR-75036, EWI-19005, NWO 612.066.410",
author = "{Abdel Kader}, Riham",
note = "NWO 612.066.410",
year = "2010",
month = "11",
day = "25",
doi = "10.3990/1.9789036531115",
language = "English",
isbn = "978-90-365-3111-5",
series = "SIKS Dissertation Series",
publisher = "Centre for Telematics and Information Technology (CTIT)",
number = "2010-54",
address = "Netherlands",
school = "University of Twente",

}

Abdel Kader, R 2010, 'ROX: Run-Time Optimization of XQueries', University of Twente, Enschede. https://doi.org/10.3990/1.9789036531115

ROX: Run-Time Optimization of XQueries. / Abdel Kader, Riham.

Enschede : Centre for Telematics and Information Technology (CTIT), 2010. 240 p.

Research output: ThesisPhD Thesis - Research UT, graduation UTAcademic

TY - THES

T1 - ROX: Run-Time Optimization of XQueries

AU - Abdel Kader, Riham

N1 - NWO 612.066.410

PY - 2010/11/25

Y1 - 2010/11/25

N2 - Query optimization is the most important and complex phase of answering a user query. While sufficient for some applications, the widely used type of relational optimizers are not always robust, picking execution plans that are far from optimal. This is due to several reasons. First, they depend on statistics and a cost model which are often inaccurate, and sometimes even absent. Second, they fail to detect correlations which can unexpectedly make certain plans considerably cheaper than others. Finally, they cannot efficiently handle the large search space of big queries. The challenges faced by traditional relational optimizers and their impact on the quality of the chosen plans are aggravated in the context of XML and XQueries. This is due to the fact that in XML, it is harder to collect and maintain representative statistics since they have to capture more information about the document. Moreover, the search space of plans for an XQuery query is on average larger than that of relational queries, due to the higher number of joins resulting from the existence of many XPath steps in a typical XQuery. To overcome the above challenges, we propose ROX, a Run-time Optimizer for XQueries. ROX is autonomous, i.e. it does not depend on any statistics and cost models, robust in always finding a good execution plan while detecting and benefiting from correlations, and efficient in exploring the search space of plans. We show, through experiments, that ROX is indeed robust and efficient, and performs better than relational compile-time optimizers. ROX adopts a fundamentally different internal design which moves the optimization to run-time, and interleaves it with query execution. The search space is efficiently explored by alternating optimization and execution phases, defining the plan incrementally. Every execution step executes a set of operators and materializes the results, allowing the next optimization phase to benefit from the knowledge extracted from the newly materialized intermediates. Sampling techniques are used to accurately estimate the cardinality and cost of operators. To detect correlations, we introduce the chain sampling technique, the first generic and robust method to deal with any type of correlated data. We also extend the ROX idea to pipelined architectures to allow most of the existing database systems to benefit from our research.

AB - Query optimization is the most important and complex phase of answering a user query. While sufficient for some applications, the widely used type of relational optimizers are not always robust, picking execution plans that are far from optimal. This is due to several reasons. First, they depend on statistics and a cost model which are often inaccurate, and sometimes even absent. Second, they fail to detect correlations which can unexpectedly make certain plans considerably cheaper than others. Finally, they cannot efficiently handle the large search space of big queries. The challenges faced by traditional relational optimizers and their impact on the quality of the chosen plans are aggravated in the context of XML and XQueries. This is due to the fact that in XML, it is harder to collect and maintain representative statistics since they have to capture more information about the document. Moreover, the search space of plans for an XQuery query is on average larger than that of relational queries, due to the higher number of joins resulting from the existence of many XPath steps in a typical XQuery. To overcome the above challenges, we propose ROX, a Run-time Optimizer for XQueries. ROX is autonomous, i.e. it does not depend on any statistics and cost models, robust in always finding a good execution plan while detecting and benefiting from correlations, and efficient in exploring the search space of plans. We show, through experiments, that ROX is indeed robust and efficient, and performs better than relational compile-time optimizers. ROX adopts a fundamentally different internal design which moves the optimization to run-time, and interleaves it with query execution. The search space is efficiently explored by alternating optimization and execution phases, defining the plan incrementally. Every execution step executes a set of operators and materializes the results, allowing the next optimization phase to benefit from the knowledge extracted from the newly materialized intermediates. Sampling techniques are used to accurately estimate the cardinality and cost of operators. To detect correlations, we introduce the chain sampling technique, the first generic and robust method to deal with any type of correlated data. We also extend the ROX idea to pipelined architectures to allow most of the existing database systems to benefit from our research.

KW - DB-XMLDB: XML DATABASES

KW - METIS-271172

KW - IR-75036

KW - EWI-19005

KW - NWO 612.066.410

U2 - 10.3990/1.9789036531115

DO - 10.3990/1.9789036531115

M3 - PhD Thesis - Research UT, graduation UT

SN - 978-90-365-3111-5

T3 - SIKS Dissertation Series

PB - Centre for Telematics and Information Technology (CTIT)

CY - Enschede

ER -

Abdel Kader R. ROX: Run-Time Optimization of XQueries. Enschede: Centre for Telematics and Information Technology (CTIT), 2010. 240 p. (SIKS Dissertation Series; 2010-54 ). (CTIT Dissertation Series; 10-184). https://doi.org/10.3990/1.9789036531115