Remote Sensing & Raster Analysis in Python GIS

Time Series Analysis of Earth Observation Data

Earth observation satellites systematically revisit the same geographic coordinates, generating multi-temporal datasets that document landscape dynamics across days, seasons, and decades. Time series analysis converts these sequential snapshots into quantitative insights for monitoring deforestation, tracking urban sprawl, assessing agricultural productivity, and evaluating climate resilience. Within the broader discipline of Remote Sensing & Raster Analysis, temporal workflows demand rigorous attention to spatial alignment, atmospheric correction, and computational efficiency. This guide outlines a production-ready Python workflow for ingesting, cleaning, and extracting trends from satellite time series.

Understanding the Temporal Data Cube

Single-date imagery operates in two spatial dimensions, but time series data introduces a third axis: acquisition date. Each pixel transforms from a static value into a chronological vector of observations. Managing this structure efficiently requires shifting from traditional 2D raster libraries to n-dimensional array frameworks that preserve coordinate labels across all axes. Before constructing temporal stacks, practitioners should be comfortable with foundational scene handling, as covered in Reading and Processing Satellite Imagery, particularly regarding coordinate reference systems, geotransforms, and metadata parsing.

Python’s xarray library serves as the standard for labeled, multi-dimensional arrays. When combined with cloud-native discovery protocols like the SpatioTemporal Asset Catalog (STAC), analysts can query public data archives and load imagery lazily. Lazy loading means array operations are queued rather than executed immediately, allowing massive temporal stacks to be manipulated without exhausting local memory.

flowchart LR
    A["Build 4D cube<br/>(time, band, y, x)"] --> B["Cloud mask<br/>(SCL / QA)"]
    B --> C["Compute NDVI<br/>per timestamp"]
    C --> D["Temporal aggregation<br/>(.resample)"]
    D --> E["Trend extraction<br/>(.polyfit slope)"]
import xarray as xr
import stackstac
import pystac_client

# Initialize connection to a public STAC catalog
catalog = pystac_client.Client.open("https://earth-search.aws.element84.com/v0")

# Search for Sentinel-2 L2A scenes over a specific region and time window
search = catalog.search(
    collections=["sentinel-s2-l2a-cogs"],
    bbox=[-122.5, 45.0, -122.0, 45.5],
    datetime="2023-01-01/2023-06-01",
    query={"eo:cloud_cover": {"lt": 20}}
)
items = search.item_collection()

# Build a lazy 4D data cube: (time, band, y, x)
stack = stackstac.stack(items, assets=["B04", "B08", "SCL"])
print(stack.dims)  # Output: ('time', 'band', 'y', 'x')

Cloud Masking and Temporal Continuity

Raw optical satellite data contains clouds, cloud shadows, and atmospheric aerosols that introduce artificial spectral drops. These artifacts break temporal continuity and skew trend calculations if left unaddressed. Every modern satellite mission includes a quality assessment (QA) band that classifies each pixel by surface condition. Sentinel-2 uses the Scene Classification Layer (SCL), while Landsat relies on QA_PIXEL. Applying these masks per timestamp ensures that only valid surface reflectance values propagate through the analysis.

# Isolate the quality band
scl = stack.sel(band="SCL")

# Define clear-surface classes (Sentinel-2 L2A: 4=veg, 5=bare soil, 6=water, 7=urban)
clear_mask = (scl >= 4) & (scl <= 7)

# Apply mask to optical bands; invalid pixels become NaN
optical = stack.sel(band=["B04", "B08"])
masked_stack = optical.where(clear_mask)

After masking, temporal gaps often remain where clouds obscured the surface. Analysts typically address these gaps using temporal interpolation (e.g., linear or Savitzky-Golay filtering) or by aggregating overlapping observations from adjacent dates. The choice depends on the application: phenological studies benefit from interpolation, while disturbance detection prioritizes raw, unsmoothed observations.

Spectral Index Computation and Raster Algebra

Once the temporal stack is cleaned, the next step involves deriving biophysical metrics. Spectral indices combine reflectance from specific wavelength bands to highlight features like vegetation vigor, moisture content, or urban heat. Because xarray supports vectorized operations across all dimensions, calculating an index over hundreds of dates requires no explicit loops. This aligns directly with the principles of Raster Algebra and Calculations, extended into the temporal domain.

# Extract Near-Infrared (B08) and Red (B04) bands
nir = masked_stack.sel(band="B08")
red = masked_stack.sel(band="B04")

# Compute Normalized Difference Vegetation Index (NDVI) across all timestamps
ndvi = ((nir - red) / (nir + red)).rename("ndvi")

# Verify dimensions and coordinate alignment
print(ndvi.shape)  # Output: (number_of_dates, y_size, x_size)

The resulting array retains the original time and spatial coordinates, enabling downstream statistical operations without manual index tracking.

Temporal Aggregation and Trend Extraction

High-frequency satellite revisits (e.g., every 2–5 days) produce noisy time series due to residual atmospheric effects and bidirectional reflectance distribution function (BRDF) variations. Aggregating observations into regular intervals smooths this noise while preserving seasonal signals. xarray provides .resample() to bin timestamps into monthly, seasonal, or annual composites using statistical reducers like median, mean, or maximum.

# Generate monthly median composites to reduce noise
monthly_ndvi = ndvi.resample(time="1ME").median()

# Fit a linear trend per pixel across the aggregated timeline
trend_result = monthly_ndvi.polyfit(dim="time", deg=1)

# Extract the slope (rate of change) and intercept
slope = trend_result["ndvi_polyfit_coefficients"].sel(degree=1)

The slope array reveals spatial patterns of greening or browning over the study period. For large-scale analyses, this workflow scales seamlessly using distributed computing frameworks, as detailed in Analyzing vegetation trends with Xarray and Dask.

Scaling for Production Environments

Temporal analysis rarely stays confined to a single machine. As study areas expand to regional or continental scales, memory constraints and I/O bottlenecks become the primary limiting factors. Cloud-native geospatial formats like Cloud-Optimized GeoTIFF (COG) and Zarr enable chunked, parallelized reads that integrate natively with xarray and Dask. By partitioning data along spatial and temporal axes, workflows can execute out-of-core computations across distributed clusters or serverless environments.

When designing production pipelines, prioritize:

  1. Lazy evaluation chains: Defer .compute() until the final aggregation step.
  2. Chunk alignment: Ensure spatial and temporal chunk sizes match the underlying storage layout to avoid cross-chunk reads.
  3. Metadata preservation: Maintain CF-compliant attributes so downstream tools recognize coordinate systems and temporal units.

Conclusion

Time series analysis of Earth observation data transforms raw satellite archives into dynamic environmental narratives. By leveraging n-dimensional arrays, rigorous quality masking, and vectorized algebra, Python GIS practitioners can extract reproducible trends from terabytes of imagery. As cloud-native data lakes and standardized catalogs mature, temporal workflows will become increasingly accessible, enabling faster response to ecological changes and more informed spatial decision-making.

Guides in this topic