Abstract
The processing of earth observation big data (EOBD) in distributed environments has grown significantly, driven by advances in satellite technology and the increasing number of earth observation missions. This massive influx of data offers unprecedented opportunities for environmental monitoring, climate change studies, and natural resource management, while also posing significant computational challenges. Cloud computing has emerged as a key enabler for handling EOBD, offering scalable computational resources, flexible storage, and on-demand processing through platforms such as Google Earth Engine (GEE), AWS SageMaker, OpenEO, and Pangeo Cloud.
While these cloud-based EOBD processing platforms offer varying levels of monitoring to help users understand their workflow execution, they focus primarily on traditional performance metrics. GEE provides basic performance insights centered on task execution status, AWS SageMaker offers comprehensive resource utilization metrics through Amazon CloudWatch, and Pangeo Cloud uses the Dask profiler for real-time monitoring of cluster performance. However, none of these platforms includes energy consumption as a standard monitoring metric. This gap becomes increasingly critical as the scientific community grows more concerned about the environmental impact of large-scale data processing.
Without energy-related metrics, users may find it difficult to understand the environmental impact of their EOBD processing workflows. This knowledge is particularly important in the earth observation domain, where weighing computational requirements against environmental impact aligns directly with the field's core mission of environmental protection. Furthermore, although recent green computing initiatives have emphasized the importance of sustainable IT infrastructure, the lack of standardized energy consumption metrics in EOBD processing platforms limits researchers' ability to make informed decisions about computational resource usage.
To address this gap, we propose a monitoring toolkit for understanding energy consumption patterns in distributed EOBD processing. Our integrated approach combines measurements at multiple levels: (1) hardware-level power data, collected through RAPL for CPU and DRAM, IPMI for system-level metrics, and external power sensors for overall consumption; (2) software-level resource utilization metrics from the operating system, including CPU usage, memory allocation, I/O operations, and network traffic; and (3) application-level profiling through integration with Dask's distributed processing framework. Our methodology uses power ratio modeling to correlate these measurements and estimate process-level energy consumption, enabling fine-grained energy profiling of EOBD workflows.
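To make the idea of power ratio modeling more concrete, the following is a minimal Python sketch of one common variant, not the toolkit's actual model. It assumes that node energy comes from the Linux powercap interface to RAPL (the `intel-rapl:0` sysfs path is an assumption about the host, and on many systems reading it requires root) and that energy is split in proportion to each process's CPU time. The helper names `read_rapl_uj`, `rapl_delta_joules`, and `attribute_energy` are hypothetical. A fuller model would also weight memory, I/O, and the IPMI or external sensor readings.

```python
"""Minimal sketch of ratio-based energy attribution (hypothetical helper names)."""
from pathlib import Path

# Assumed Linux powercap sysfs location of the package-0 RAPL domain.
RAPL_DOMAIN = Path("/sys/class/powercap/intel-rapl:0")


def read_rapl_uj(domain: Path = RAPL_DOMAIN) -> tuple[int, int]:
    """Return (energy_uj, max_energy_range_uj) for one RAPL domain."""
    energy = int((domain / "energy_uj").read_text())
    max_range = int((domain / "max_energy_range_uj").read_text())
    return energy, max_range


def rapl_delta_joules(before_uj: int, after_uj: int, max_range_uj: int) -> float:
    """Energy consumed between two counter reads, in joules.

    The RAPL counter wraps around at max_energy_range_uj, so a negative
    difference means (at most) one wraparound happened between the reads.
    """
    delta = after_uj - before_uj
    if delta < 0:
        delta += max_range_uj
    return delta / 1e6  # microjoules -> joules


def attribute_energy(node_joules: float,
                     cpu_seconds: dict[str, float]) -> dict[str, float]:
    """Split measured node energy across processes by their CPU-time share.

    This is the simplest ratio model: E_p = E_node * t_p / sum(t).
    """
    total = sum(cpu_seconds.values())
    if total == 0:
        # No CPU activity recorded: attribute nothing to any process.
        return {pid: 0.0 for pid in cpu_seconds}
    return {pid: node_joules * t / total for pid, t in cpu_seconds.items()}
```

For example, `attribute_energy(100.0, {"dask-worker-1": 30.0, "dask-worker-2": 10.0})` returns `{"dask-worker-1": 75.0, "dask-worker-2": 25.0}`. In practice, the two counter reads would bracket a sampling interval, and the per-process CPU times for the same interval would come from the operating system metrics described above.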
The toolkit generates monitoring reports that include energy consumption patterns, resource utilization correlations, and efficiency metrics. By making the energy consumption of computational workflows visible, it lets users evaluate the true environmental cost of their workflows and adjust their processing strategies accordingly. This work thus contributes to more sustainable EOBD processing practices, supporting the broader goal of environmental protection through more energy-efficient earth observation data processing.
| Original language | English |
|---|---|
| DOIs | |
| Publication status | Published - 26 Jun 2025 |
| Event | Living Planet Symposium 2025: From Observation to Climate Action and Sustainability for Earth - Austria Center Vienna (ACV), Vienna, Austria Duration: 23 Jul 2025 → 27 Jul 2025 https://lps25.esa.int/ |
Conference
| Conference | Living Planet Symposium 2025 |
|---|---|
| Country/Territory | Austria |
| City | Vienna |
| Period | 23/07/25 → 27/07/25 |
| Internet address | https://lps25.esa.int/ |