Building Serverless Functions for Spatial Triggers
Automating geospatial workflows requires moving past manual downloads and local processing. Serverless functions provide an event-driven architecture that executes code only when specific conditions are met. By attaching spatial triggers to these functions, you can filter, validate, or preprocess raster data the moment it arrives in a cloud-native spatial data lake. This avoids spending compute on irrelevant files and ensures downstream pipelines only process relevant imagery.
The core challenge in this pattern is ensuring the function reacts to geographic context rather than blindly processing every file that appears in storage. A precise spatial trigger evaluates whether a newly uploaded raster overlaps with a predefined region of interest (ROI) before initiating heavy computations. This approach is foundational to modern Remote Sensing & Raster Analysis pipelines, where data volume frequently outpaces available processing capacity.
sequenceDiagram
participant S as Object storage
participant F as Serverless function
participant R as Raster (/vsis3/)
participant Q as Downstream queue
S->>F: ObjectCreated event (*.tif)
F->>R: open, read bounds & CRS
R-->>F: bounding box
F->>F: intersects(ROI)?
F-->>S: skip (no overlap)
F->>Q: accept & enqueue (overlap)
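The intersects(ROI)? step in the sequence above reduces, for axis-aligned boxes in the same CRS, to four comparisons. The sketch below is a dependency-free illustration of that test, not the article's implementation (which uses shapely); the BBox type and bbox_intersects name are hypothetical, and the sketch assumes neither box crosses the antimeridian.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Axis-aligned bounding box in x/y (lon/lat) order. Hypothetical helper type."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def bbox_intersects(a: BBox, b: BBox) -> bool:
    """Return True when the boxes overlap or touch along an edge or corner."""
    return (
        a.min_x <= b.max_x and b.min_x <= a.max_x
        and a.min_y <= b.max_y and b.min_y <= a.max_y
    )


# Same ROI coordinates as the article's implementation; the scenes are made up.
roi = BBox(-118.5, 33.8, -117.9, 34.3)
overlapping_scene = BBox(-118.0, 34.0, -117.0, 35.0)
distant_scene = BBox(2.2, 48.8, 2.5, 48.9)

print(bbox_intersects(overlapping_scene, roi))  # True
print(bbox_intersects(distant_scene, roi))      # False
```

Touching boxes count as intersecting here, which matches the behavior of shapely's intersects for shared boundaries.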
Complete Python Implementation
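The following script targets AWS Lambda: it parses an S3 event and uses GDAL's S3 virtual file system (/vsis3/) to read cloud-hosted GeoTIFFs without downloading them locally, extracts spatial metadata, and validates intersection against a WGS84 bounding box. The same pattern carries over to Google Cloud Functions or Azure Functions once the event parsing and the virtual file system prefix (/vsigs/ or /vsiaz/) are swapped for that platform's equivalents.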
import os
import json
from urllib.parse import unquote_plus

import rasterio
from shapely.geometry import box
from shapely.ops import transform
from pyproj import Transformer

# Optimize GDAL for serverless environments (prevents directory listing timeouts)
os.environ["GDAL_DISABLE_READDIR_ON_OPEN"] = "EMPTY_DIR"

# Predefined ROI in WGS84 (EPSG:4326), given as (min_lon, min_lat, max_lon, max_lat)
ROI_WGS84 = box(-118.5, 33.8, -117.9, 34.3)


def check_spatial_intersection(raster_path: str) -> bool:
    """
    Opens a cloud-hosted raster, extracts its bounding box, and checks
    for intersection with a predefined region of interest.
    """
    try:
        with rasterio.open(raster_path) as src:
            if not src.crs:
                raise ValueError("Raster lacks CRS metadata")

            # Extract native bounds
            bounds = src.bounds
            raster_geom = box(bounds.left, bounds.bottom, bounds.right, bounds.top)

            # Transform to WGS84 if the raster uses a different projection.
            # Only the four corners are reprojected, which is an approximation
            # for projections that curve the raster's edges.
            if str(src.crs) != "EPSG:4326":
                transformer = Transformer.from_crs(src.crs.to_wkt(), "EPSG:4326", always_xy=True)
                raster_geom = transform(transformer.transform, raster_geom)

            return raster_geom.intersects(ROI_WGS84)
    except Exception as e:
        # Any read or reprojection failure is logged and treated as "no intersection"
        print(f"Spatial check failed: {e}")
        return False


def lambda_handler(event: dict, context: object) -> dict:
    """
    Serverless entry point. Parses cloud storage events and routes based on spatial overlap.
    """
    try:
        # Extract S3 bucket/key from the S3 event payload (only the first record is handled)
        record = event["Records"][0]
        s3_bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 events (a space becomes "+"), so decode them
        s3_key = unquote_plus(record["s3"]["object"]["key"])

        # Use GDAL virtual file system for direct cloud reading
        raster_path = f"/vsis3/{s3_bucket}/{s3_key}"
    except (KeyError, IndexError) as e:
        return {"statusCode": 400, "body": json.dumps({"error": f"Invalid event structure: {e}"})}

    if not check_spatial_intersection(raster_path):
        return {"statusCode": 200, "body": json.dumps({"status": "skipped", "reason": "No spatial intersection"})}

    # Proceed with downstream processing (e.g., queue to SQS, trigger Step Function, run raster algebra)
    return {"statusCode": 200, "body": json.dumps({"status": "accepted", "message": "Intersection confirmed. Processing initiated."})}
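The handler above reads only the first entry of the event's Records list. Because Records is a list, a more defensive variant can build one /vsis3/ path per record and skip malformed entries. The following is a minimal sketch of that idea using only the standard library; the function name raster_paths_from_event and the sample bucket and key are hypothetical.

```python
from typing import List
from urllib.parse import unquote_plus


def raster_paths_from_event(event: dict) -> List[str]:
    """Build one /vsis3/ path per S3 record, skipping malformed records."""
    paths = []
    for record in event.get("Records", []):
        try:
            bucket = record["s3"]["bucket"]["name"]
            # Keys are URL-encoded in S3 event payloads (a space arrives as "+").
            key = unquote_plus(record["s3"]["object"]["key"])
        except (KeyError, TypeError):
            continue  # not an S3 object record; ignore it
        paths.append(f"/vsis3/{bucket}/{key}")
    return paths


# Made-up payload: one valid record and one record without an "s3" block.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-bucket"},
                "object": {"key": "scenes/tile+01.tif"}}},
        {"eventSource": "aws:s3"},
    ]
}

print(raster_paths_from_event(sample_event))
# ['/vsis3/example-bucket/scenes/tile 01.tif']
```

Each returned path can then be passed to check_spatial_intersection in turn.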
Rapid Debugging Checklist
Serverless spatial functions fail predictably. Use this checklist to resolve issues in under five minutes:
- Timeout on Cloud Storage Reads
  - Symptom: Function times out after 3-5 seconds.
  - Fix: Ensure GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR is set. GDAL attempts to list sibling files by default, which stalls on cloud storage. Also verify that your function has outbound internet access, or a configured NAT gateway or S3 VPC endpoint if it is deployed inside a VPC.
- CRS Mismatch or Transformation Errors
  - Symptom: pyproj.exceptions.ProjError or false negatives on intersection.
  - Fix: Confirm pyproj>=2.2 is in your deployment package (the always_xy argument was added in that release). Always use always_xy=True in Transformer.from_crs to enforce longitude/latitude order. Validate the raster's native CRS with rasterio.open(path).crs before transformation.
- Missing Spatial Metadata
  - Symptom: ValueError: Raster lacks CRS metadata, or src.bounds reports pixel-space extents (starting at 0, 0) instead of map coordinates.
  - Fix: The file is likely a standard TIFF, not a GeoTIFF. Verify with gdalinfo filename.tif locally. If metadata is missing, inject it during ingestion or reject the file early.
- IAM/Permission Denials
  - Symptom: CPLE_OpenFailedError: /vsis3/... or HTTP 403.
  - Fix: Attach an IAM policy granting s3:GetObject to the function's execution role. If using private buckets, ensure the region the function is configured for matches the bucket's region to avoid cross-region latency or auth routing issues.
- Dependency Packaging (GDAL/rasterio)
  - Symptom: ImportError: libgdal.so.32: cannot open shared object file.
  - Fix: Serverless runtimes require compiled binaries built for the runtime's platform. Build the package with Docker using the public.ecr.aws/sam/build-python3.9 image, or supply GDAL through AWS Lambda Layers. Consult the official AWS Lambda Python packaging documentation for binary compatibility guidelines.
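Two of the items above (the GDAL directory-listing setting and the region match) can be checked from environment variables before any raster is opened. The sketch below shows one way to do that with the standard library only. The helper name preflight_warnings and the demo values are hypothetical, and it assumes the region is exposed through AWS_REGION or AWS_DEFAULT_REGION.

```python
from typing import List, Mapping, Optional


def preflight_warnings(env: Mapping[str, str], bucket_region: str) -> List[str]:
    """Return warnings for settings that the debugging checklist depends on."""
    warnings = []

    if env.get("GDAL_DISABLE_READDIR_ON_OPEN") != "EMPTY_DIR":
        warnings.append(
            "GDAL_DISABLE_READDIR_ON_OPEN is not EMPTY_DIR; "
            "opening a raster may trigger a listing of sibling files"
        )

    # Assumption: the function's region is available in one of these variables.
    region: Optional[str] = env.get("AWS_REGION") or env.get("AWS_DEFAULT_REGION")
    if region != bucket_region:
        warnings.append(
            f"configured region {region!r} differs from bucket region {bucket_region!r}"
        )

    return warnings


# Demo with made-up settings; in a function you would pass os.environ instead.
demo_env = {"GDAL_DISABLE_READDIR_ON_OPEN": "EMPTY_DIR", "AWS_REGION": "us-east-1"}
for message in preflight_warnings(demo_env, bucket_region="us-west-2"):
    print(message)
```

Logging these warnings once at cold start keeps the check cheap while making misconfiguration visible in the function's logs.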
Wiring to Cloud Storage Events
Deploy the function and attach it to your storage bucket's notification configuration. For AWS S3, configure an s3:ObjectCreated:* event notification with suffix filters for .tif and .tiff (S3 filter rules match literal key prefixes or suffixes, not wildcards). S3 invokes the function asynchronously on each upload; the function evaluates the spatial trigger and exits cleanly when the raster does not intersect the ROI. For heavier workloads, have the function hand accepted rasters to an asynchronous queue or workflow orchestrator rather than running the processing inside the serverless execution window.
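As a rough illustration of the notification setup, the sketch below builds a configuration dictionary in the shape that boto3's put_bucket_notification_configuration accepts for its NotificationConfiguration argument. S3 filter rules match literal key suffixes, so the sketch emits one Lambda configuration per extension. The function name build_notification_config, the Id values, and the ARN are placeholders, not values from this article.

```python
import json
from typing import Dict, List


def build_notification_config(function_arn: str, suffixes: List[str]) -> Dict:
    """Build one Lambda notification entry per key suffix."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "Id": f"raster-upload-{suffix.lstrip('.')}",  # hypothetical naming scheme
                "LambdaFunctionArn": function_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "suffix", "Value": suffix}]}
                },
            }
            for suffix in suffixes
        ]
    }


# Placeholder ARN for illustration only.
config = build_notification_config(
    "arn:aws:lambda:us-west-2:123456789012:function:spatial-trigger",
    [".tif", ".tiff"],
)
print(json.dumps(config, indent=2))
```

With boto3 available, the dictionary would be passed as boto3.client("s3").put_bucket_notification_configuration(Bucket=..., NotificationConfiguration=config). The function also needs a resource-based permission that allows S3 to invoke it before the notification takes effect.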
For advanced virtual file system configurations and direct cloud raster streaming, refer to the rasterio VSI documentation.