Abstract
Machine learning differs widely from classical statistical methodology, both in its philosophy and in its approach to data analysis. In statistics, much of the theory is derived under standard assumptions that are systematically violated in modern practice. Yet, although they ignore these traditional guidelines, machine learning experiments show outstanding results. These results are often surprising and challenge well-established dogmas, most notably the bias-variance tradeoff. The unequal availability of data and the growing use of predictive models to automate various tasks lead to training and applying the same models on different data sets. Such practices break the common assumption of distributional equality between training and testing data, invalidating most of the established theoretical guarantees. Statistical methods that perform well in practice but lack a proper theory are unpredictable and untrustworthy, which prevents their deployment in many academic and industrial applications.
This thesis contributes to the development of a statistical theory that covers modern practices. The central theme is the nonparametric regression model. We analyse extensions of its standard setup that correspond to two contemporary techniques: transfer learning and generative modelling. Transfer learning leverages shared information across datasets to train models with improved performance. In Chapters 2 and 3, we derive risk bounds for the prediction error of a model under covariate shift, a specific instance of transfer learning that models a distributional mismatch between the covariates in (part of) the training and testing data. We then consider the problem of sampling from an unknown conditional distribution, which we relate to quantile regression, a strict generalisation of nonparametric regression.
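As a brief illustration of the covariate-shift setting sketched above, the model can be written as follows; the notation (f, ε, P, Q) is assumed here for exposition and is not taken from the thesis itself.

```latex
% A minimal sketch of nonparametric regression under covariate shift.
% All symbols (f, \varepsilon_i, P, Q) are illustrative assumptions,
% not the thesis's own notation.
\[
  Y_i = f(X_i) + \varepsilon_i, \qquad
  X^{\mathrm{train}}_i \sim P, \quad
  X^{\mathrm{test}}_j \sim Q, \quad P \neq Q,
\]
% while the conditional distribution of Y given X is identical for
% training and testing data: only the covariate law shifts, so risk
% bounds for prediction must account for the mismatch between P and Q.
```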
Original language | English
---|---
Qualification | Doctor of Philosophy
Awarding Institution |
Supervisors/Advisors |
Award date | 12 Mar 2025
Place of Publication | Enschede
Publisher |
Print ISBNs | 978-90-365-6463-2
Electronic ISBNs | 978-90-365-6464-9
DOIs |
Publication status | Published - 12 Mar 2025