A cloud-native spatial data lake is a centralized repository that stores geospatial files directly in cloud object storage, such as Amazon S3 or Google Cloud Storage. Instead of downloading entire datasets to a local machine, analysts stream only the specific geographic areas and spectral bands they need. This architecture fundamentally changes how teams approach Remote Sensing & Raster Analysis, turning multi-terabyte satellite archives into on-demand, queryable resources.
Before writing code, it helps to understand how the underlying components interact. Cloud object storage holds files as flat, addressable objects rather than nested directory trees. Geospatial imagery is stored as rasters: grids of cells where each cell holds a numeric value representing a physical measurement like surface reflectance or elevation. To make these files cloud-friendly, they are formatted as Cloud Optimized GeoTIFFs (COGs). Following the Cloud Optimized GeoTIFF specification, a COG organizes a standard GeoTIFF into internal tiles, usually compressed, adds reduced-resolution overviews, and places the index of tile offsets near the start of the file. A Python library can read that index and then request only the byte ranges for the tiles it needs over standard HTTP, skipping the rest of the file and keeping memory usage proportional to the area requested rather than to the file size.
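To make the byte-range idea concrete, here is a minimal sketch of how a reader could work out which internal tiles cover a requested pixel window. The 512-pixel tile size and the function name `tiles_for_window` are assumptions for illustration; a real COG records its tile size and per-tile byte offsets in the TIFF header, and libraries such as rasterio do this bookkeeping for you.

```python
def tiles_for_window(col_off, row_off, width, height, tile_size=512):
    """Return (tile_row, tile_col) indices of the tiles that overlap a pixel window."""
    if width <= 0 or height <= 0:
        return []
    first_col = col_off // tile_size
    last_col = (col_off + width - 1) // tile_size
    first_row = row_off // tile_size
    last_row = (row_off + height - 1) // tile_size
    return [
        (r, c)
        for r in range(first_row, last_row + 1)
        for c in range(first_col, last_col + 1)
    ]


# A 10980 x 10980 pixel scene cut into 512-pixel tiles has 22 x 22 = 484 tiles.
needed = tiles_for_window(col_off=1000, row_off=2000, width=600, height=300)
print(f"{len(needed)} of 484 tiles needed: {needed}")
```

Only the tiles in `needed` have to be fetched; every other tile in the file stays in object storage.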
To locate these files efficiently, data lakes use the SpatioTemporal Asset Catalog (STAC) specification. STAC provides a standardized JSON index that describes what data exists, where it is stored, and its spatial and temporal coverage. For a complete breakdown of how to unify disparate catalogs, see Federating multiple GIS data sources with STAC.
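As a rough picture of what the index holds, the sketch below models a single STAC Item as a Python dictionary and checks whether its bounding box overlaps an area of interest. The item ID, timestamp, cloud-cover value, and asset URL are made-up placeholders, and `bbox_intersects` is a hypothetical helper; a real STAC API performs this filtering on the server.

```python
# A pared-down STAC Item. Field names follow the STAC Item spec;
# the values (id, href, datetime, cloud cover) are placeholders only.
example_item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-scene-001",
    "bbox": [-122.6, 37.5, -122.1, 38.0],  # [west, south, east, north]
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [-122.6, 37.5], [-122.1, 37.5], [-122.1, 38.0],
            [-122.6, 38.0], [-122.6, 37.5],
        ]],
    },
    "properties": {"datetime": "2023-06-15T19:00:00Z", "eo:cloud_cover": 4.2},
    "assets": {
        "visual": {
            "href": "https://example.com/scenes/example-scene-001/visual.tif",
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
        }
    },
    "links": [],
}


def bbox_intersects(a, b):
    """True when two [west, south, east, north] boxes overlap (ignores antimeridian wrap)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


area_of_interest = [-122.5, 37.7, -122.3, 37.8]
if bbox_intersects(example_item["bbox"], area_of_interest):
    print("Matched:", example_item["assets"]["visual"]["href"])
```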
flowchart LR
A["Python client<br/>(pystac-client)"] -->|"search (bbox, time)"| B["STAC catalog"]
B -->|"asset href"| A
A -->|"HTTP range request"| C["COG in<br/>object storage"]
C -->|"only needed tiles"| D["rasterio +<br/>numpy"]
D --> E["Analysis<br/>(NDVI, stats)"]
Install the core Python packages required for catalog search, remote raster access, and array math:
pip install rasterio pystac-client planetary-computer numpy
pystac-client handles STAC API queries, planetary-computer signs asset URLs returned by the Microsoft Planetary Computer catalog, rasterio reads and writes geospatial grids, and numpy performs fast array mathematics. If you read from private buckets, ensure your cloud provider credentials are configured in your environment so rasterio can authenticate.
Use pystac-client to search for imagery without knowing exact file paths. The following example queries a public STAC endpoint for Sentinel-2 data over San Francisco during June 2023, filtering for scenes with less than 10% cloud cover.
import planetary_computer  # extra dependency: pip install planetary-computer
import pystac_client

# Connect to a public STAC API. The Planetary Computer requires asset URLs
# to be signed with a short-lived token; sign_inplace does this automatically.
catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

# Search parameters
search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=[-122.5, 37.7, -122.3, 37.8],  # west, south, east, north
    datetime="2023-06-01/2023-06-30",
    query={"eo:cloud_cover": {"lt": 10}},
)
items = list(search.items())
if not items:
    raise ValueError("No items found matching the search criteria.")

# Extract the first matching item and its asset URL
target_item = items[0]
# Asset keys vary by collection; "visual" is the true-color asset for this collection
asset_url = target_item.assets["visual"].href
print(f"Found asset at: {asset_url}")
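The href returned by STAC points directly to the cloud-hosted COG. Some providers, including the Planetary Computer, require that URL to be signed with a short-lived access token before it can be read; once it is, the file is ready for streaming.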
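The search can return several scenes, and `items[0]` is not guaranteed to be the clearest one. A minimal sketch of picking the least cloudy result on the client side follows. `least_cloudy` is a hypothetical helper, and the stand-in objects only mimic the `.id` and `.properties` attributes of the items that `search.items()` yields.

```python
from types import SimpleNamespace


def least_cloudy(items):
    """Return the item with the lowest eo:cloud_cover; items lacking the field sort last."""
    return min(
        items,
        key=lambda item: item.properties.get("eo:cloud_cover", float("inf")),
    )


# Stand-ins for search results so the sketch runs without network access
demo_items = [
    SimpleNamespace(id="scene-a", properties={"eo:cloud_cover": 7.5}),
    SimpleNamespace(id="scene-b", properties={"eo:cloud_cover": 1.2}),
    SimpleNamespace(id="scene-c", properties={}),
]
print(least_cloudy(demo_items).id)
```

With real results, `target_item = least_cloudy(items)` could take the place of `items[0]`.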
With the asset URL, rasterio can open the remote file and read only the pixels required for your analysis. This avoids loading gigabytes of data into RAM.
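import numpy as np
import rasterio
from rasterio.warp import transform_bounds
from rasterio.windows import from_bounds

# Area of interest in longitude/latitude (the same bbox used in the STAC search)
bbox = [-122.5, 37.7, -122.3, 37.8]

# Open the remote COG directly via HTTP
with rasterio.open(asset_url) as src:
    # Reproject the bbox into the raster's CRS, then convert it to a pixel window
    left, bottom, right, top = transform_bounds("EPSG:4326", src.crs, *bbox)
    window = from_bounds(left, bottom, right, top, transform=src.transform)
    # Read only band 1 inside the window; only the overlapping tiles are fetched
    band_data = src.read(1, window=window)

# Calculate basic statistics using numpy
valid_pixels = band_data[band_data != 0]  # Filter out padding/zero values
mean_val = np.mean(valid_pixels)
# The "visual" asset is an 8-bit true-color image, so this is a pixel value, not reflectance
print(f"Mean band 1 value: {mean_val:.2f}")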
This streaming approach scales seamlessly when combined with Raster Algebra and Calculations to derive indices like NDVI or perform multi-band math. If you are working with raw satellite downloads instead of pre-optimized files, the Reading and Processing Satellite Imagery guide covers the necessary preprocessing steps.
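For instance, NDVI is computed per pixel as (NIR - Red) / (NIR + Red). The sketch below uses small hand-written arrays in place of real bands; with Sentinel-2 you would typically read the red and near-infrared assets (commonly keyed `B04` and `B08`, though asset keys vary by catalog) with the same windowed approach. The function name `ndvi` and the sample values are illustrative assumptions.

```python
import numpy as np


def ndvi(red, nir):
    """Compute NDVI = (NIR - Red) / (NIR + Red), returning NaN where the sum is zero."""
    red = np.asarray(red, dtype="float64")
    nir = np.asarray(nir, dtype="float64")
    total = nir + red
    result = np.full(total.shape, np.nan)
    # Divide only where the denominator is non-zero to avoid warnings and infinities
    np.divide(nir - red, total, out=result, where=total != 0)
    return result


# Illustrative values, not real measurements
red = np.array([[1000, 2000], [0, 1500]])
nir = np.array([[3000, 2000], [0, 500]])
print(ndvi(red, nir))
```

Casting to floating point before the arithmetic matters because satellite bands are usually stored as unsigned integers, where `nir - red` would wrap around for negative differences.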
Once your analysis pipeline is stable, you can automate data ingestion and trigger processing jobs. Organizations often start by consolidating historical records in object storage (see Migrating legacy shapefile archives to cloud storage). After consolidation, you can deploy serverless functions that run your Python scripts automatically whenever new imagery lands in your storage bucket, as described in Building serverless functions for spatial triggers.
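As a hedged illustration of the trigger step, the sketch below shows a handler in the style of an AWS Lambda function reacting to S3 object-created notifications. The choice of AWS, the `.tif` filter, the bucket and key names, and the `process_scene` placeholder are all assumptions; other providers deliver differently shaped events.

```python
from urllib.parse import unquote_plus


def process_scene(url):
    """Placeholder for the analysis step (for example, a windowed read with rasterio)."""
    print(f"Would process {url}")


def handler(event, context=None):
    """Collect s3:// URLs for newly uploaded GeoTIFFs from an S3 event notification."""
    urls = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.lower().endswith((".tif", ".tiff")):
            url = f"s3://{bucket}/{key}"
            process_scene(url)
            urls.append(url)
    return urls


# Minimal event carrying only the fields the handler reads; names are made up
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-imagery"},
                "object": {"key": "incoming/scene+001.tif"},
            }
        }
    ]
}
handler(sample_event)
```

Filtering on the file extension keeps the function from firing its processing step on sidecar files or logs written to the same bucket.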
Cloud-native data lakes remove the friction of local storage limits and manual file transfers. By combining STAC for discovery, COGs for efficient streaming, and Python for processing, analysts can work directly with planetary-scale datasets using minimal infrastructure.