A Spark-based platform to extract phenological information from satellite images

V. Bakayov, R. Goncalves, R. Zurita-Milla, E. Izquierdo-Verdiguier

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review



Phenology is the study of periodic plant and animal lifecycle events and how these are influenced by seasonal and inter-annual variations in weather and climate, as well as in other environmental factors. Time series of remote sensing (RS) images can be used to characterize land surface phenology at continental to global scales. For this, the RS images are typically transformed into various vegetation indices (VI) such as the normalized difference vegetation index (NDVI) or the enhanced vegetation index (EVI). These indices can then be used to extract various phenological metrics.

In our previous work we used cloud computing to generate temperature-based phenological indices [1], [2], and to relate one phenological metric, namely the Start-of-Season (SOS), with those indices [3], [4]. Here we present an extension of our work where we use a Spark-based platform to efficiently extract phenological metrics from time series of NDVI and EVI. This platform allows obtaining and analyzing high spatial resolution metrics (in this case 1 km) from 10-day composites.

The platform uses the same architecture as in [3], i.e., it is organized into three layers: a storage layer, a processing layer, and JupyterHub services for user-interaction. It is designed to store the data in well-known file formats like GeoTiffs and Hierarchical Data Format (HDF). For the data analysis the user expresses the operations in Jupyter notebooks as Python, R, or Scala code (Fig. 1). Hence, with a browser and remote connection, the user can express a research question and/or collect insights from large data sets. All computations are pushed down to the computational platform, and results fetched back for data visualization.

To extract the phenological metrics, we rely on TimeSat [5]. TimeSat is a software package that can be used to fit a function (e.g. double logistic) to time series of VIs. After that, it uses various approaches to extract vegetation seasonality metrics such as SOS.
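As an illustration of the steps just described (computing a VI, describing the season with a double logistic curve, and reading off SOS), here is a minimal Python sketch. It is not TimeSat's code. The NDVI and EVI formulas are the standard definitions, but the curve parameters, the 50% amplitude threshold, and all function names are assumptions chosen for the example, and the curve is evaluated from made-up parameters rather than fitted to data.

```python
import math


def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue):
    """Enhanced vegetation index with the commonly used coefficients."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def double_logistic(t, base, amp, t_up, s_up, t_down, s_down):
    """Double logistic curve: a rising sigmoid minus a falling sigmoid."""
    rise = 1.0 / (1.0 + math.exp((t_up - t) / s_up))
    fall = 1.0 / (1.0 + math.exp((t_down - t) / s_down))
    return base + amp * (rise - fall)


def start_of_season(days, values, fraction=0.5):
    """Day at which the rising limb first reaches a fraction of the amplitude.

    Uses linear interpolation between composites; returns None if the
    threshold is never crossed before the seasonal peak.
    """
    lo = min(values)
    peak_idx = values.index(max(values))
    level = lo + fraction * (values[peak_idx] - lo)
    for i in range(1, peak_idx + 1):
        if values[i - 1] < level <= values[i]:
            w = (level - values[i - 1]) / (values[i] - values[i - 1])
            return days[i - 1] + w * (days[i] - days[i - 1])
    return None


# Hypothetical pixel: 10-day composites at day 5, 15, ..., 365.
days = list(range(5, 366, 10))
curve = [double_logistic(d, 0.2, 0.6, 120, 10, 280, 12) for d in days]
sos = start_of_season(days, curve)
```

With these invented parameters the green-up inflection sits at day 120, so the estimated SOS lands close to that day. Changing `fraction` or the curve shape shifts the result, which is the kind of sensitivity the abstract reports between fitting functions and extraction approaches.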
TimeSat's numerical and graphical routines are coded in Matlab and Fortran. These routines are highly vectorized and efficient for use with large data sets. However, distributed processing is required to determine SOS at continental scales. Through an efficient partition of the data, and Spark's scheduling policies, these single-core routines are scheduled for parallel execution over multiple machines.

Fig. 1. Computational platform

The study evaluates which VIs and fitting functions are most suitable for certain vegetation types by comparing the SOS metrics to volunteered phenological observations curated by the USA National Phenology Network [6]. Our preliminary results show that the SOS can differ by up to 20-30 days depending on the fitting function, the VI and the approach used to extract the SOS metric. In the South, SOS is around mid-February or March, whereas in mountainous regions and the North, the SOS can be as late as June-July. We will further evaluate how our results compare to the volunteered ground observations. This work is thus a first stepping stone towards being able to systematically analyze and map the impact of climate change on the seasonality of plants. Our tests show that the platform is scalable and can be extended to work with even higher resolution VIs, such as those that can be derived from Sentinel-2 images (10 m resolution). Because of this, our work opens the door to studies at continental to global scales, and to the use of high and very high spatial resolution data.
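The idea of running a single-core routine over partitions of the data can be sketched in PySpark as follows. This is only a sketch under stated assumptions: the abstract does not say how the TimeSat routines are invoked, so `sos_for_pixel` is a hypothetical pure-Python stand-in (a simple half-amplitude threshold), and the record layout, application name, and partition count are invented for illustration. The Spark calls used (`SparkSession.builder`, `parallelize`, `map`, `collect`) are standard PySpark API.

```python
def sos_for_pixel(record):
    """Hypothetical single-core routine: one pixel's time series in, SOS out.

    record is (pixel_id, days, ndvi_values). SOS is taken as the first
    composite whose NDVI reaches the midpoint between the series minimum
    and maximum; a flat series yields None.
    """
    pixel_id, days, values = record
    lo, hi = min(values), max(values)
    if hi == lo:
        return pixel_id, None
    level = lo + 0.5 * (hi - lo)
    for day, value in zip(days, values):
        if value >= level:
            return pixel_id, day
    return pixel_id, None


def run_on_spark(pixels, num_partitions=8):
    """Distribute the per-pixel routine over Spark partitions.

    pyspark is imported lazily so the routine above can be tried locally
    without a Spark installation.
    """
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sos-sketch").getOrCreate()
    rdd = spark.sparkContext.parallelize(pixels, num_partitions)
    # Each partition is processed by one task; Spark schedules the tasks
    # across the available executors.
    return rdd.map(sos_for_pixel).collect()


# Local check of the per-pixel routine on a made-up series.
example = ((0, 0), [5, 15, 25, 35], [0.2, 0.3, 0.6, 0.8])
local_result = sos_for_pixel(example)
```

Because each pixel is processed independently, the per-pixel function needs no knowledge of Spark. The number of partitions then becomes the main tuning knob for balancing scheduling overhead against parallelism.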
Original language: English
Title of host publication: 2018 IEEE 14th International Conference on e-Science (e-Science)
Number of pages: 2
ISBN (Electronic): 978-1-5386-9156-4
Publication status: Published - 2018
Event: 14th IEEE International Conference on eScience, eScience 2018 - Mövenpick Hotel Amsterdam City Centre, Amsterdam, Netherlands
Duration: 29 Oct 2018 to 1 Nov 2018
Conference number: 14


Conference: 14th IEEE International Conference on eScience, eScience 2018