A novel approach to the unsupervised extraction of reliable training samples from thematic products

C. Paris, Lorenzo Bruzzone

Research output: Contribution to journal, Article, Academic, peer-review

10 Citations (Scopus)


Supervised classification algorithms require a sufficiently large set of representative training samples to generate accurate land-cover maps. Collecting reference data is difficult, expensive, and infeasible at large scale. To solve this problem, this article introduces a novel approach that extracts reliable labeled data from existing thematic products. Although these products are a potentially useful information source, their use is not straightforward. First, they are not completely reliable, since they may contain classification errors. Second, they are typically aggregated at polygon level, where polygons do not necessarily correspond to homogeneous areas. Finally, there is usually a semantic gap between map legends and remote sensing (RS) data. In this context, we propose an approach that aims to: 1) perform a domain understanding to detect the discrepancies between the thematic map domain and the RS data domain; 2) use RS data contemporary to the map to decompose the thematic product from the semantic and spatial viewpoints; and 3) extract a database of informative and reliable training samples. The database of weakly labeled units is used to train an ensemble of classifiers on recent data, whose results are then combined by a majority voting rule. Two sets of experimental results obtained on MS images, with training samples extracted from a crop type map and from the 2018 Corine Land Cover (CLC) map, respectively, confirm the effectiveness of the proposed approach.
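The final fusion step mentioned in the abstract, combining the outputs of an ensemble by majority voting, can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `majority_vote`, the example class labels, and the tie-breaking rule (the label seen first in classifier order wins) are all assumptions made here for illustration, since the abstract does not specify them.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def majority_vote(predictions: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Fuse per-classifier labels into one label per sample.

    predictions[k][i] is the label that classifier k assigns to sample i.
    Ties go to the label seen first in classifier order (an assumption of
    this sketch; the abstract does not state a tie-breaking rule).
    """
    if not predictions:
        raise ValueError("need at least one classifier")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must label the same samples")

    fused = []
    # zip(*predictions) yields, for each sample, the votes of all classifiers
    for votes in zip(*predictions):
        # most_common(1) returns the first-seen label among equal counts
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused


# Hypothetical outputs of three classifiers on four pixels
clf_a = ["wheat", "maize", "wheat", "forest"]
clf_b = ["wheat", "wheat", "maize", "forest"]
clf_c = ["maize", "maize", "wheat", "urban"]

fused_labels = majority_vote([clf_a, clf_b, clf_c])
print(fused_labels)
```

Each pixel receives the label that most ensemble members agree on, so an error made by a single classifier trained on weakly labeled samples is outvoted when the other members are correct.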

Original language: English
Article number: 9121728
Pages (from-to): 1930-1948
Number of pages: 19
Journal: IEEE Transactions on Geoscience and Remote Sensing
Issue number: 3
Publication status: Published - Mar 2020
Externally published: Yes


  • ITC-CV
  • n/a OA procedure

