A Spark-based platform to extract phenological information from satellite images

V. Bakayov, R. Goncalves, R. Zurita-Milla, E. Izquierdo-Verdiguier

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review

Abstract

Phenology is the study of periodic plant and animal lifecycle events and how these are influenced by seasonal and inter-annual variations in weather and climate, as well as in other environmental factors. Time series of remote sensing (RS) images can be used to characterize land surface phenology at continental to global scales. For this, the RS images are typically transformed into various vegetation indices (VI) such as the normalized difference vegetation index (NDVI) or the enhanced vegetation index (EVI). These indices can then be used to extract various phenological metrics.

In our previous work we used cloud computing to generate temperature-based phenological indices [1], [2], and to relate one phenological metric, namely the Start-of-Season (SOS), with those indices [3], [4]. Here we present an extension of our work where we use a Spark-based platform to efficiently extract phenological metrics from time series of NDVI and EVI. This platform allows obtaining and analyzing high spatial resolution metrics (in this case 1 km) from 10-day composites.

The platform uses the same architecture as in [3], i.e., it is organized into three layers: a storage layer, a processing layer, and JupyterHub services for user interaction. It is designed to store the data in well-known file formats like GeoTIFF and Hierarchical Data Format (HDF). For the data analysis, the user expresses the operations in Jupyter notebooks as Python, R, or Scala code (Fig. 1). Hence, with a browser and a remote connection, the user can express a research question and/or collect insights from large data sets. All computations are pushed down to the computational platform, and results are fetched back for data visualization.

Fig. 1. Computational platform

To extract the phenological metrics, we rely on TimeSat [5]. TimeSat is a software package that can be used to fit a function (e.g. double logistic) to time series of VIs. After that, it uses various approaches to extract vegetation seasonality metrics such as SOS. The program's numerical and graphical routines are coded in Matlab and Fortran. These routines are highly vectorized and efficient for use with large data sets. However, distributed processing is required to determine SOS at continental scales. Through an efficient partition of the data, and Spark's scheduling policies, these single-core routines are scheduled for parallel execution over multiple machines.

The study evaluates which VIs and fitting functions are most suitable for certain vegetation types by comparing the SOS metrics to volunteered phenological observations curated by the USA National Phenology Network [6]. Our preliminary results show that there can be differences of up to 20-30 days in the SOS depending on the fitting function, the VI and the approach used to extract the SOS metric. In the South, SOS is around mid-February or March, whereas in mountainous regions and the North, the SOS can be as late as June-July. We will further evaluate how our results compare to the ground volunteered observations. This work is then a first stepping stone towards being able to systematically analyze and map the impact of climate change on the seasonality of plants. Our tests show that the platform is scalable and can be extended to work with even higher resolution VIs, such as those that can be derived from Sentinel-2 images (10 m resolution). Because of this, our work opens the door to studies at continental to global scales, and to the use of high and very high spatial resolution data.
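
The fit-then-extract idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' code and does not call TimeSat or Spark. It assumes a double-logistic curve whose parameters are already fitted, a threshold-based SOS (the first composite date on the rising limb at which the curve reaches a chosen fraction of the seasonal amplitude), and a thread pool from the Python standard library as a stand-in for Spark's scheduling of per-partition work. All function names, parameter values, and the 0.5 threshold are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Composite dates as day-of-year, one every 10 days (assumed sampling).
DAYS = list(range(1, 366, 10))


def double_logistic(t, base, amp, t_rise, k_rise, t_fall, k_fall):
    """Double-logistic VI curve: logistic green-up minus logistic senescence."""
    rise = 1.0 / (1.0 + math.exp((t_rise - t) / k_rise))
    fall = 1.0 / (1.0 + math.exp((t_fall - t) / k_fall))
    return base + amp * (rise - fall)


def start_of_season(curve, days, fraction=0.5):
    """Return the first day between the pre-season minimum and the peak at
    which the curve reaches minimum + fraction * (peak - minimum)."""
    peak_idx = max(range(len(curve)), key=curve.__getitem__)
    min_idx = min(range(peak_idx + 1), key=curve.__getitem__)
    level = curve[min_idx] + fraction * (curve[peak_idx] - curve[min_idx])
    for i in range(min_idx, peak_idx + 1):
        if curve[i] >= level:
            return days[i]
    return None


def pixel_sos(params):
    """Single-core routine: evaluate one pixel's curve and extract its SOS."""
    curve = [double_logistic(d, *params) for d in DAYS]
    return start_of_season(curve, DAYS)


def partition(items, n_partitions):
    """Split items into at most n_partitions contiguous chunks."""
    size = max(1, math.ceil(len(items) / n_partitions))
    return [items[i:i + size] for i in range(0, len(items), size)]


def process_partition(part):
    """Run the single-core routine over every pixel in one partition."""
    return [pixel_sos(params) for params in part]


def sos_for_all(pixels, n_partitions=4):
    """Process partitions concurrently; output order matches input order."""
    parts = partition(pixels, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        results = pool.map(process_partition, parts)
    return [sos for part in results for sos in part]


if __name__ == "__main__":
    # Hypothetical fitted parameters: (base, amp, t_rise, k_rise, t_fall, k_fall).
    early_pixel = (0.2, 0.6, 60, 8, 280, 12)
    late_pixel = (0.2, 0.6, 160, 8, 290, 12)
    print(sos_for_all([early_pixel, late_pixel], n_partitions=2))
```

On a Spark cluster, a per-partition function of this shape could plausibly be passed to an operation such as `RDD.mapPartitions`; how the authors actually connect TimeSat's Matlab and Fortran routines to Spark is not stated in the abstract.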
Original language: English
Title of host publication: 2018 IEEE 14th International Conference on e-Science (e-Science)
Pages: 354-355
Number of pages: 2
ISBN (Electronic): 978-1-5386-9156-4
DOIs: https://doi.org/10.1109/eScience.2018.00095
Publication status: Published - 2018
Event: 14th IEEE International Conference on eScience, eScience 2018 - Mövenpick Hotel Amsterdam City Centre, Amsterdam, Netherlands
Duration: 29 Oct 2018 to 1 Nov 2018
Conference number: 14
https://www.esciencecenter.nl/ieee-escience-conference-2018

Conference

Conference: 14th IEEE International Conference on eScience, eScience 2018
Country: Netherlands
City: Amsterdam
Period: 29/10/18 to 1/11/18

Fingerprint

vegetation index
time series
NDVI
remote sensing
phenology
vegetation type
annual variation
seasonality
visualization
satellite image
phenol
land surface
logistics
environmental factor
weather
software
climate change
animal
vegetation
climate

Cite this

Bakayov, V., Goncalves, R., Zurita-Milla, R., & Izquierdo-Verdiguier, E. (2018). A Spark-based platform to extract phenological information from satellite images. In 2018 IEEE 14th International Conference on e-Science (e-Science) (pp. 354-355) https://doi.org/10.1109/eScience.2018.00095
Bakayov, V. ; Goncalves, R. ; Zurita-Milla, R. ; Izquierdo-Verdiguier, E. / A Spark-based platform to extract phenological information from satellite images. 2018 IEEE 14th International Conference on e-Science (e-Science). 2018. pp. 354-355
@inproceedings{b063ab05f85846d2a70e784feafdc6c4,
title = "A Spark-based platform to extract phenological information from satellite images",
author = "V. Bakayov and R. Goncalves and R. Zurita-Milla and E. Izquierdo-Verdiguier",
year = "2018",
doi = "10.1109/eScience.2018.00095",
language = "English",
pages = "354--355",
booktitle = "2018 IEEE 14th International Conference on e-Science (e-Science)",

}

Bakayov, V, Goncalves, R, Zurita-Milla, R & Izquierdo-Verdiguier, E 2018, A Spark-based platform to extract phenological information from satellite images. in 2018 IEEE 14th International Conference on e-Science (e-Science). pp. 354-355, 14th IEEE International Conference on eScience, eScience 2018, Amsterdam, Netherlands, 29/10/18. https://doi.org/10.1109/eScience.2018.00095

A Spark-based platform to extract phenological information from satellite images. / Bakayov, V.; Goncalves, R.; Zurita-Milla, R.; Izquierdo-Verdiguier, E.

2018 IEEE 14th International Conference on e-Science (e-Science). 2018. p. 354-355.


TY - GEN

T1 - A Spark-based platform to extract phenological information from satellite images

AU - Bakayov, V.

AU - Goncalves, R.

AU - Zurita-Milla, R.

AU - Izquierdo-Verdiguier, E.

PY - 2018

Y1 - 2018

UR - https://ezproxy2.utwente.nl/login?url=https://webapps.itc.utwente.nl/library/2018/conf/zuritamilla_spa.pdf

U2 - 10.1109/eScience.2018.00095

DO - 10.1109/eScience.2018.00095

M3 - Conference contribution

SP - 354

EP - 355

BT - 2018 IEEE 14th International Conference on e-Science (e-Science)

ER -

Bakayov V, Goncalves R, Zurita-Milla R, Izquierdo-Verdiguier E. A Spark-based platform to extract phenological information from satellite images. In 2018 IEEE 14th International Conference on e-Science (e-Science). 2018. p. 354-355 https://doi.org/10.1109/eScience.2018.00095