TY - CONF
T1 - Clustering geo-data cubes
AU - Zurita-Milla, R.
AU - Izquierdo Verdiguier, Emma
AU - Girgin, Serkan
AU - Nattino, F.
AU - Ku, Ou
AU - Grootes, M.W.
AU - Goncalves, R.
PY - 2020/9/4
Y1 - 2020/9/4
N2 - Earth observation sensors deliver ever-expanding collections of geospatial data at multiple resolutions (spatial, temporal and thematic or spectral). Efficient tools to extract knowledge from these collections are currently missing. Here we present the first release of Clustering geo-Data Cubes (CDC), a Python package to cluster geospatial data cubes by explicitly considering their dimensionality. CDC has three main hallmarks: 1/ it is based on innovative co- and tri-clustering methods that identify groups of pixels with similar spatio-temporal and/or thematic information by simultaneously considering all the dimensions of the data. This overcomes a major limitation of traditional clustering approaches, which analyze each dimension separately; 2/ it provides refined clusters by re-grouping the results obtained from co- and/or tri-clustering. These refined clusters better capture the patterns present in the data and represent a more automatic approach to analyzing geospatial data cubes because the number of clusters is automatically chosen via an optimization procedure; and 3/ it allows users to run tasks efficiently by using either NumPy’s threading capabilities or Dask’s parallel computing power. Hence, CDC is a scalable package that can analyze both small and big geospatial data cubes. These hallmarks are showcased through several case studies.
AB - Earth observation sensors deliver ever-expanding collections of geospatial data at multiple resolutions (spatial, temporal and thematic or spectral). Efficient tools to extract knowledge from these collections are currently missing. Here we present the first release of Clustering geo-Data Cubes (CDC), a Python package to cluster geospatial data cubes by explicitly considering their dimensionality. CDC has three main hallmarks: 1/ it is based on innovative co- and tri-clustering methods that identify groups of pixels with similar spatio-temporal and/or thematic information by simultaneously considering all the dimensions of the data. This overcomes a major limitation of traditional clustering approaches, which analyze each dimension separately; 2/ it provides refined clusters by re-grouping the results obtained from co- and/or tri-clustering. These refined clusters better capture the patterns present in the data and represent a more automatic approach to analyzing geospatial data cubes because the number of clusters is automatically chosen via an optimization procedure; and 3/ it allows users to run tasks efficiently by using either NumPy’s threading capabilities or Dask’s parallel computing power. Hence, CDC is a scalable package that can analyze both small and big geospatial data cubes. These hallmarks are showcased through several case studies.
UR - https://github.com/phenology/cgc
M3 - Other
SP - S1-S16
T2 - Space and Artificial Intelligence 2020
Y2 - 4 September 2020 through 4 September 2020
ER -