Statistical machine learning beyond standard supervised learning

  • Hongwei Wen

Research output: Thesis › PhD Thesis - Research UT, graduation UT


Abstract

Machine learning has revolutionized scientific research, industry, and society, yet many real-world problems fall outside the scope of traditional supervised learning, which assumes abundant labeled data and training and test data drawn from the same distribution. In practice, challenges such as distribution shift, incomplete supervision, and noisy or missing labels demand new theoretical frameworks and algorithms.

This thesis addresses four statistical learning problems beyond standard supervised learning, aiming to develop generalizable, robust methods with solid theoretical guarantees.

First, it investigates label shift in transfer learning, where the label distribution differs between the training and deployment domains. A novel class probability matching (CPM) framework is introduced to estimate the target label distribution by aligning class probabilities. Combining CPM with calibrated neural networks and with kernel logistic regression yields two algorithms, both supported by theory and experiments.
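The matching idea can be illustrated with a minimal sketch. It assumes a classifier whose outputs `posteriors` are calibrated estimates of the source posterior p_s(y | x), evaluated on unlabeled target samples, and a known source label distribution `source_prior`. It uses the classical EM-style fixed point of Saerens et al. (2002): the target label distribution q is updated until it equals the average prior-adjusted posterior over the target sample. This is a standard baseline built on the same matching principle, not necessarily the CPM estimator developed in the thesis, and the name `estimate_target_prior` is hypothetical.

```python
import numpy as np


def estimate_target_prior(posteriors, source_prior, n_iter=500, tol=1e-10):
    """Estimate the target label distribution q under label shift.

    posteriors: (n, K) calibrated source posteriors p_s(y | x) evaluated on
    unlabeled target samples. source_prior: (K,) source label distribution.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    source_prior = np.asarray(source_prior, dtype=float)
    q = source_prior.copy()
    for _ in range(n_iter):
        # Reweight each class by q(y) / p_s(y) and renormalize per sample:
        # this is the implied target posterior under label shift.
        adjusted = posteriors * (q / source_prior)
        adjusted /= adjusted.sum(axis=1, keepdims=True)
        # Matching step: the class probability must equal the average
        # adjusted posterior over the target sample.
        q_new = adjusted.mean(axis=0)
        converged = np.max(np.abs(q_new - q)) < tol
        q = q_new
        if converged:
            break
    return q


# A perfectly confident classifier: the estimate is the predicted-class frequency.
q_hat = estimate_target_prior([[1, 0], [1, 0], [1, 0], [0, 1]], [0.5, 0.5])
print(q_hat)  # → [0.75 0.25]
```

Calibration matters here: the update treats `posteriors` as true probabilities, so a miscalibrated classifier biases the fixed point.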

Second, it studies partial label learning, where labels are ambiguous: each training example is annotated with a set of candidate labels, only one of which is correct. A new family of leveraged weighted (LW) losses is proposed, introducing a leverage parameter to balance losses on partially and fully observed labels. Risk consistency is established, and the approach shows strong empirical results.
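A loss of this general form can be sketched as follows, under stated assumptions: the leverage parameter `beta` is assumed to scale the penalty on labels outside the candidate set, per-label weights default to 1, and the binary margin loss is taken to be the logistic loss ψ(m) = log(1 + e^(-m)). The exact weighting scheme and margin loss used in the thesis may differ, and `lw_loss` is a hypothetical name.

```python
import numpy as np


def logistic_margin_loss(m):
    """psi(m) = log(1 + exp(-m)), computed stably."""
    return np.logaddexp(0.0, -m)


def lw_loss(scores, candidate_mask, beta=1.0, weights=None):
    """Mean leveraged-weighted-style loss over a batch (hypothetical sketch).

    scores: (n, K) real-valued label scores g(x).
    candidate_mask: (n, K) booleans, True for labels in the candidate set S.
    beta: leverage parameter, assumed here to scale the non-candidate term.
    weights: optional (n, K) per-label weights (default: all ones).
    """
    scores = np.asarray(scores, dtype=float)
    mask = np.asarray(candidate_mask, dtype=bool)
    w = np.ones_like(scores) if weights is None else np.asarray(weights, dtype=float)
    # Candidate labels are pushed towards positive scores ...
    on_set = np.where(mask, w * logistic_margin_loss(scores), 0.0).sum(axis=-1)
    # ... and non-candidate labels towards negative scores, scaled by beta.
    off_set = np.where(~mask, w * logistic_margin_loss(-scores), 0.0).sum(axis=-1)
    return float(np.mean(on_set + beta * off_set))


scores = [[2.0, -1.0, 0.5]]
mask = [[True, False, True]]
print(lw_loss(scores, mask, beta=0.0), lw_loss(scores, mask, beta=1.0))
```

With `beta=0` the loss ignores labels outside the candidate set entirely; increasing `beta` pushes their scores down harder, which is the trade-off the leverage parameter controls in this sketch.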

Third, it tackles robust kernel regression under heavy-tailed noise via a generalized Cauchy noise assumption. The work proves an equivalence between excess Cauchy risk and L_2-risk for suitable parameters, and achieves almost minimax-optimal rates for kernel Cauchy ridge regression, demonstrating robustness to diverse noise types.
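A kernel Cauchy ridge estimator can be sketched as follows, assuming the Cauchy loss ℓ_σ(r) = (σ²/2) log(1 + r²/σ²) on residuals r, a Gaussian kernel, and the objective (1/n) Σ ℓ_σ(y_i − f(x_i)) + (λ/2)‖f‖²_H. The minimization uses iteratively reweighted least squares (IRLS), a generic solver chosen here for illustration and not necessarily the one used in the thesis: each step solves (WK + nλI)α = Wy with weights w_i = 1/(1 + r_i²/σ²). Because log(1 + u) is concave in u = r², each step minimizes a quadratic upper bound and does not increase the objective. Function names and defaults are hypothetical.

```python
import numpy as np


def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def kernel_cauchy_ridge(X, y, sigma=1.0, lam=1e-2, bandwidth=1.0, n_iter=100, tol=1e-10):
    """Fit f = K @ alpha by IRLS on the Cauchy-loss ridge objective.

    Objective: (1/n) sum_i (sigma^2/2) log(1 + r_i^2/sigma^2) + (lam/2) alpha^T K alpha,
    with residuals r = y - K @ alpha. Returns (alpha, K).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    K = gaussian_kernel(X, X, bandwidth)
    alpha = np.zeros(n)
    for _ in range(n_iter):
        r = y - K @ alpha
        # Cauchy weights: a residual much larger than sigma gets weight close to 0.
        w = 1.0 / (1.0 + (r / sigma) ** 2)
        # Stationarity condition of the weighted problem: (W K + n lam I) alpha = W y.
        new_alpha = np.linalg.solve(w[:, None] * K + n * lam * np.eye(n), w * y)
        converged = np.max(np.abs(new_alpha - alpha)) < tol
        alpha = new_alpha
        if converged:
            break
    return alpha, K


# Clean zero signal with one gross outlier.
X = np.linspace(0.0, 5.0, 20)[:, None]
y = np.zeros(20)
y[10] = 1000.0
alpha, K = kernel_cauchy_ridge(X, y, sigma=1.0)
fitted = K @ alpha
print(np.max(np.abs(np.delete(fitted, 10))))
```

As σ grows, all weights tend to 1 and the fit reduces to ordinary kernel ridge regression; a small σ lets the single outlier above be almost ignored instead of dragging the fit at neighbouring points.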

Finally, it addresses high-dimensional density estimation, an unsupervised learning problem, by proposing random forest density estimation (RFDE). RFDE is locally adaptive, computationally efficient, and outlier-robust via the median-of-means technique, with provably lower error than single-tree methods.
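A simplified sketch of the two ingredients follows, assuming data rescaled to [0, 1]^d. Each tree is a purely random partition in which every node halves its cell along a randomly chosen coordinate; the tree estimate in a leaf is count / (n · volume), and the forest averages the trees. For robustness, the sample is split into blocks, a forest estimate is computed on each block, and the pointwise median across blocks is returned (median-of-means). The splitting rule, depth, and names such as `forest_density_mom` are illustrative assumptions, not the thesis's RFDE construction.

```python
import numpy as np


def _leaf_ids(X, split_dims, depth):
    """Route points in [0, 1]^d down one random tree; return heap-indexed leaf ids."""
    n, d = X.shape
    lo, hi = np.zeros((n, d)), np.ones((n, d))
    node = np.zeros(n, dtype=int)
    rows = np.arange(n)
    for _ in range(depth):
        dims = split_dims[node]  # coordinate split at each point's current node
        mid = 0.5 * (lo[rows, dims] + hi[rows, dims])
        right = X[rows, dims] >= mid
        # Shrink each point's cell to the half it falls into.
        lo[rows[right], dims[right]] = mid[right]
        hi[rows[~right], dims[~right]] = mid[~right]
        node = 2 * node + 1 + right
    return node


def forest_density_mom(X_train, X_query, n_trees=20, depth=4, n_blocks=5, seed=0):
    """Median-of-means over blocks of a random-partition forest density estimate."""
    rng = np.random.default_rng(seed)
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    d = X_train.shape[1]
    # Each tree: one random split coordinate per internal node (heap order).
    trees = [rng.integers(0, d, size=2**depth - 1) for _ in range(n_trees)]
    n_nodes = 2 ** (depth + 1) - 1
    cell_volume = 2.0 ** (-depth)  # every midpoint split halves the cell volume
    block_estimates = []
    for idx in np.array_split(rng.permutation(len(X_train)), n_blocks):
        block = X_train[idx]
        forest = np.zeros(len(X_query))
        for split_dims in trees:
            counts = np.bincount(_leaf_ids(block, split_dims, depth), minlength=n_nodes)
            # Histogram estimate in the query's leaf: count / (n * volume).
            forest += counts[_leaf_ids(X_query, split_dims, depth)] / (len(block) * cell_volume)
        block_estimates.append(forest / n_trees)
    # Median across blocks limits the influence of blocks hit by outliers.
    return np.median(np.stack(block_estimates), axis=0)


rng = np.random.default_rng(1)
X = rng.uniform(size=(4000, 2))  # true density is 1 on the unit square
Q = rng.uniform(size=(100, 2))
density = forest_density_mom(X, Q)
print(density.mean())
```

Averaging over trees smooths the blocky single-tree histogram, while the median only helps as long as fewer than half of the blocks are contaminated; note that the pointwise median need not integrate exactly to one.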

Collectively, these contributions extend the foundations of statistical learning, offering theoretically sound and practically effective tools for complex, real-world machine learning scenarios.

Original language: English
Qualification: Doctor of Philosophy
Awarding Institution
  • University of Twente
Supervisors/Advisors
  • Schmidt-Hieber, Johannes, Supervisor
  • Koolen, Wouter, Supervisor
  • Betken, Annika, Co-Supervisor
Award date: 9 Sept 2025
Place of Publication: Enschede
Publisher
Print ISBNs: 978-90-365-6819-7
Electronic ISBNs: 978-90-365-6820-3
DOIs
Publication status: Published - 9 Sept 2025
