Building Serverless Functions for Spatial Triggers

Automating geospatial workflows requires moving past manual downloads and local processing. Serverless functions provide an event-driven architecture that executes code only when specific conditions are met. By attaching spatial triggers to these functions, you can filter, validate, or preprocess raster data the moment it arrives in a cloud-native spatial data lake. This avoids spending compute on irrelevant scenes and ensures downstream pipelines process only relevant imagery.

The core challenge in this pattern is ensuring the function reacts to geographic context rather than blindly processing every file that appears in storage. A precise spatial trigger evaluates whether a newly uploaded raster overlaps a predefined region of interest (ROI) before initiating heavy computations. This approach is foundational to remote sensing and raster analysis pipelines, where data volume frequently outpaces available processing capacity.
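The overlap test at the heart of the trigger can be illustrated without any geospatial libraries. The following is a minimal sketch, assuming both footprints are already expressed in the same geographic CRS and do not cross the antimeridian; the `BBox` type, the function name, and the scene coordinates are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Axis-aligned bounding box as (west, south, east, north) in degrees."""
    west: float
    south: float
    east: float
    north: float


def bbox_intersects(a: BBox, b: BBox) -> bool:
    """Return True when the two boxes share at least one point.

    Two boxes are disjoint only if one lies entirely to the left, right,
    above, or below the other, so intersection is the negation of that.
    """
    return not (
        a.east < b.west
        or a.west > b.east
        or a.north < b.south
        or a.south > b.north
    )


# Hypothetical ROI and raster footprints, for illustration only.
roi = BBox(-118.5, 33.8, -117.9, 34.3)
overlapping_scene = BBox(-118.0, 34.0, -117.0, 35.0)
distant_scene = BBox(-100.0, 40.0, -99.0, 41.0)

print(bbox_intersects(overlapping_scene, roi))  # True
print(bbox_intersects(distant_scene, roi))      # False
```

Like a polygon `intersects` test, this treats boxes that merely touch along an edge as overlapping; tighten the comparisons if touching scenes should be skipped.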

sequenceDiagram
    participant S as Object storage
    participant F as Serverless function
    participant R as Raster (/vsis3/)
    participant Q as Downstream queue
    S->>F: ObjectCreated event (*.tif)
    F->>R: open, read bounds & CRS
    R-->>F: bounding box
    F->>F: intersects(ROI)?
    F-->>S: skip (no overlap)
    F->>Q: accept & enqueue (overlap)

Complete Python Implementation

The following script targets AWS Lambda: it parses S3 event notifications, uses GDAL’s virtual file system (/vsis3/) to read cloud-hosted GeoTIFFs without downloading them locally, extracts spatial metadata, and checks for intersection with a WGS84 bounding box. Porting it to Google Cloud Functions or Azure Functions requires swapping the event parsing and the VSI prefix (/vsigs/ or /vsiaz/).

import os
import json
from urllib.parse import unquote_plus

import rasterio
from shapely.geometry import box
from shapely.ops import transform
from pyproj import Transformer

# Optimize GDAL for serverless environments (skips sibling-file listing on open)
os.environ["GDAL_DISABLE_READDIR_ON_OPEN"] = "EMPTY_DIR"

# Predefined ROI in WGS84 (EPSG:4326): (min_lon, min_lat, max_lon, max_lat)
ROI_WGS84 = box(-118.5, 33.8, -117.9, 34.3)

def check_spatial_intersection(raster_path: str) -> bool:
    """
    Opens a cloud-hosted raster, extracts its bounding box, and checks
    for intersection with a predefined region of interest.
    """
    try:
        with rasterio.open(raster_path) as src:
            if not src.crs:
                raise ValueError("Raster lacks CRS metadata")

            # Extract native bounds
            bounds = src.bounds
            raster_geom = box(bounds.left, bounds.bottom, bounds.right, bounds.top)

            # Transform to WGS84 if the raster uses a different projection.
            # Compare EPSG codes, since CRS strings vary in format (WKT, PROJ, EPSG).
            if src.crs.to_epsg() != 4326:
                transformer = Transformer.from_crs(src.crs.to_wkt(), "EPSG:4326", always_xy=True)
                raster_geom = transform(transformer.transform, raster_geom)

            return raster_geom.intersects(ROI_WGS84)

    except Exception as e:
        # Any failure (missing CRS, unreadable file, denied access) is logged
        # and treated as "no intersection", so check the logs for skipped files.
        print(f"Spatial check failed: {e}")
        return False

def lambda_handler(event: dict, context: object) -> dict:
    """
    Serverless entry point. Parses cloud storage events and routes based on spatial overlap.
    """
    try:
        # Extract S3 bucket/key from the first record of the S3 event payload
        record = event["Records"][0]
        s3_bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 events (e.g. spaces as '+')
        s3_key = unquote_plus(record["s3"]["object"]["key"])

        # Use GDAL virtual file system for direct cloud reading
        raster_path = f"/vsis3/{s3_bucket}/{s3_key}"
    except (KeyError, IndexError) as e:
        return {"statusCode": 400, "body": json.dumps({"error": f"Invalid event structure: {e}"})}

    if not check_spatial_intersection(raster_path):
        return {"statusCode": 200, "body": json.dumps({"status": "skipped", "reason": "No spatial intersection"})}

    # Proceed with downstream processing (e.g., queue to SQS, trigger Step Function, run raster algebra)
    return {"statusCode": 200, "body": json.dumps({"status": "accepted", "message": "Intersection confirmed. Processing initiated."})}
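The event-parsing step can be exercised locally before anything is deployed. The sketch below is illustrative only: the helper name `s3_event_to_vsi_path`, the bucket name, and the object key are hypothetical, and the event is trimmed to the fields read from an S3 notification. It also decodes the object key, since S3 URL-encodes keys in event notifications.

```python
from urllib.parse import unquote_plus


def s3_event_to_vsi_path(event: dict) -> str:
    """Build a /vsis3/ path from the first record of an S3 notification event."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    # S3 URL-encodes object keys in notifications (spaces arrive as '+').
    key = unquote_plus(record["s3"]["object"]["key"])
    return f"/vsis3/{bucket}/{key}"


# Hypothetical event, trimmed to the fields the parser reads.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-imagery-bucket"},
                "object": {"key": "scenes/2024/los+angeles+mosaic.tif"},
            }
        }
    ]
}

print(s3_event_to_vsi_path(sample_event))
# /vsis3/example-imagery-bucket/scenes/2024/los angeles mosaic.tif
```

A malformed event raises `KeyError` or `IndexError` here, which a handler can translate into a 400-style response.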

Rapid Debugging Checklist

Serverless spatial functions tend to fail in a handful of recurring ways. Use this checklist to rule out the most common causes quickly:

  1. Timeout on Cloud Storage Reads
  • Symptom: Function times out after 3-5 seconds.
  • Fix: Ensure GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR is set. By default GDAL tries to list sibling files when it opens a dataset, which is slow on object storage. Lambda’s default timeout is 3 seconds, so raise it if remote reads need more headroom. If the function runs inside a VPC, also verify that it can reach S3 through a NAT gateway or an S3 gateway endpoint.
  2. CRS Mismatch or Transformation Errors
  • Symptom: pyproj.exceptions.ProjError or false negatives on intersection.
  • Fix: Confirm pyproj>=2.2 is in your deployment package (the always_xy argument was added in 2.2.0). Always pass always_xy=True to Transformer.from_crs to enforce longitude/latitude order. Validate the raster’s native CRS with rasterio.open(path).crs before transformation.
  3. Missing Spatial Metadata
  • Symptom: ValueError: Raster lacks CRS metadata, or src.bounds reports pixel extents such as (0, height, width, 0) instead of map coordinates.
  • Fix: The file is likely a standard TIFF, not a GeoTIFF. Verify with gdalinfo filename.tif locally. If metadata is missing, inject it during ingestion or reject the file early.
  4. IAM/Permission Denials
  • Symptom: CPLE_OpenFailedError: /vsis3/... or HTTP 403.
  • Fix: Attach an IAM policy granting s3:GetObject to the function’s execution role. For private buckets, deploy the function in the bucket’s region (or set AWS_REGION to it) to avoid cross-region latency and request-signing errors.
  5. Dependency Packaging (GDAL/rasterio)
  • Symptom: ImportError: libgdal.so.32: cannot open shared object file.
  • Fix: Serverless runtimes need binaries compiled for their platform. Use a container-based build, for example Docker with the public.ecr.aws/sam/build-python3.9 image, or a Lambda layer that provides GDAL. Consult the official AWS Lambda Python packaging documentation for binary compatibility guidelines.
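Several of these checks can be automated at cold start. The following is a rough sketch rather than a complete diagnostic: the function name `preflight_report` and the keys of the returned dictionary are hypothetical. It only confirms that packages can be located, so a missing shared library such as libgdal would still surface only when the package is actually imported.

```python
import importlib.util
import os


def preflight_report(required_modules=("rasterio", "shapely", "pyproj")) -> dict:
    """Collect cheap configuration checks that map onto the checklist above."""
    return {
        # Timeout on reads: GDAL should skip sibling-file listing on open.
        "readdir_disabled": os.environ.get("GDAL_DISABLE_READDIR_ON_OPEN") == "EMPTY_DIR",
        # Permission denials: region from the environment (Lambda sets AWS_REGION).
        "aws_region": os.environ.get("AWS_REGION") or os.environ.get("AWS_DEFAULT_REGION"),
        # Dependency packaging: modules absent from the deployment package.
        "missing_modules": [
            name for name in required_modules
            if importlib.util.find_spec(name) is None
        ],
    }


# Print the report once at import time so it lands in the function logs.
print(preflight_report())
```

Logging this dictionary once per cold start makes it easier to tell a configuration problem from a problem with an individual raster.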

Wiring to Cloud Storage Events

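Deploy the function and attach it to your storage bucket’s notification configuration. For AWS S3, configure an s3:ObjectCreated:* event with suffix filters for .tif and .tiff (S3 filters match literal prefixes or suffixes, not wildcards). S3 invokes the function asynchronously on each matching upload; the function evaluates the spatial trigger and exits cleanly when the raster does not overlap the ROI. Because the return value of an asynchronous invocation is not passed back to S3, accepted rasters should be handed off from inside the function, for example by publishing to a queue or starting a workflow orchestrator, rather than running heavy processing within the serverless execution window.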
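As a rough sketch of that wiring, the notification configuration can be built as a plain dictionary and applied with boto3's `put_bucket_notification_configuration`. The function names, the `Id` values, and the ARN below are placeholders, and the sketch assumes the Lambda function already has a resource-based permission allowing S3 to invoke it. Each suffix gets its own configuration entry.

```python
import json


def build_notification_config(function_arn: str, suffixes=(".tif", ".tiff")) -> dict:
    """Build an S3 notification configuration with one entry per key suffix."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "Id": f"spatial-trigger-{suffix.lstrip('.')}",
                "LambdaFunctionArn": function_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "suffix", "Value": suffix}]}
                },
            }
            for suffix in suffixes
        ]
    }


def apply_notification_config(bucket: str, config: dict) -> None:
    """Apply the configuration; this replaces the bucket's existing notifications."""
    import boto3  # imported lazily so the builder above runs without boto3

    boto3.client("s3").put_bucket_notification_configuration(
        Bucket=bucket, NotificationConfiguration=config
    )


# Placeholder ARN, for illustration only.
config = build_notification_config(
    "arn:aws:lambda:us-west-2:123456789012:function:spatial-trigger"
)
print(json.dumps(config, indent=2))
```

Because applying the configuration overwrites whatever notifications the bucket already has, merge any existing entries into the dictionary before calling `apply_notification_config`.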

For advanced virtual file system configurations and direct cloud raster streaming, refer to the rasterio VSI documentation.
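For orientation, GDAL exposes a separate VSI prefix for each major object store: /vsis3/ for Amazon S3, /vsigs/ for Google Cloud Storage, and /vsiaz/ for Azure Blob Storage. The helper below is a hypothetical convenience, not part of rasterio or GDAL, that maps a storage URI onto the matching prefix. It assumes URI schemes of `s3`, `gs`, and `az` and keys that contain no `?` or `#` characters.

```python
from urllib.parse import urlsplit

# GDAL virtual file system prefixes for common object stores.
VSI_PREFIXES = {"s3": "/vsis3/", "gs": "/vsigs/", "az": "/vsiaz/"}


def to_vsi_path(uri: str) -> str:
    """Convert e.g. 's3://bucket/key.tif' into '/vsis3/bucket/key.tif'."""
    parts = urlsplit(uri)
    try:
        prefix = VSI_PREFIXES[parts.scheme]
    except KeyError:
        raise ValueError(f"Unsupported storage scheme: {parts.scheme!r}") from None
    # netloc holds the bucket or container; path holds '/key'.
    return f"{prefix}{parts.netloc}{parts.path}"


# Hypothetical URIs, for illustration only.
print(to_vsi_path("s3://example-bucket/scenes/tile.tif"))
# /vsis3/example-bucket/scenes/tile.tif
print(to_vsi_path("gs://example-bucket/scenes/tile.tif"))
# /vsigs/example-bucket/scenes/tile.tif
```

Each prefix has its own authentication settings, so credentials still have to be configured per provider.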