How to Load Sentinel-2 Imagery with Rasterio

Loading satellite imagery efficiently is the foundational step in any geospatial analysis pipeline. The European Space Agency’s Sentinel-2 mission delivers high-resolution multispectral data that powers environmental monitoring, precision agriculture, and land cover mapping. While the imagery is freely accessible, its directory structure and file formats can quickly overwhelm beginners. Rasterio, a Python library built on the GDAL engine, provides a clean, Pythonic interface for reading, inspecting, and processing raster data. This guide walks you through loading Sentinel-2 imagery with Rasterio, emphasizing spatial metadata handling, memory management, and production-ready code patterns. For broader context on satellite data pipelines, explore the foundational concepts covered in Remote Sensing & Raster Analysis.

Understanding Sentinel-2 Data Structure

Before writing code, you must understand how Sentinel-2 products are organized. Traditional Level-1C (top-of-atmosphere reflectance) and Level-2A (surface reflectance) products are distributed as .SAFE directories. Inside each folder, spectral bands are stored as individual JPEG2000 (.jp2) files nested within a GRANULE directory. Each file corresponds to a specific wavelength and spatial resolution: 10 meters (visible and near-infrared), 20 meters (red-edge and shortwave infrared), or 60 meters (atmospheric bands).

Modern workflows increasingly bypass local .SAFE archives in favor of Cloud-Optimized GeoTIFFs (COGs) or Virtual Rasters (.vrt). COGs allow HTTP range requests, meaning you can read specific image tiles without downloading the entire file. Rasterio handles both legacy .jp2 archives and modern cloud-native formats seamlessly. For a comprehensive overview of data acquisition, format conversion, and preprocessing pipelines, refer to our guide on Reading and Processing Satellite Imagery.

flowchart TD
    A[".SAFE product"] --> B["GRANULE<br/>directory"]
    B --> C["IMG_DATA"]
    C --> D["R10m (.jp2)<br/>visible, NIR"]
    C --> E["R20m (.jp2)<br/>red-edge, SWIR"]
    C --> F["R60m (.jp2)<br/>atmospheric"]

Prerequisites

You will need Python 3.9 or newer and the rasterio package installed. Rasterio relies on compiled GDAL binaries, so using a package manager like conda is highly recommended to avoid dependency conflicts and ensure spatial library compatibility.

# Using conda (recommended for geospatial libraries)
conda install -c conda-forge rasterio numpy

# Or using pip (ensure your system has GDAL and PROJ installed)
pip install rasterio numpy

Ensure your Sentinel-2 files are accessible locally or via a supported cloud storage path. You will also need numpy for array manipulation.

Step 1: Reading a Single Spectral Band

We begin by opening a single band, extracting its spatial metadata, and reading pixel values into a NumPy array. Spatial concepts like the Coordinate Reference System (CRS), affine geotransform, and nodata values must be captured before any mathematical operations.

import rasterio
import numpy as np

# Path to a single Sentinel-2 band (e.g., Band 4 - Red, 20m resolution)
band_path = "S2A_MSIL2A_20230615T100031_N0509_R122_T33TVE_20230615T143045.SAFE/GRANULE/L2A_T33TVE_20230615T100031/IMG_DATA/R20m/T33TVE_20230615T100031_B04_20m.jp2"

with rasterio.open(band_path) as src:
    # Extract spatial metadata
    print(f"Driver: {src.driver}")
    print(f"Dimensions: {src.width} x {src.height}")
    print(f"CRS: {src.crs}")
    print(f"Affine Transform: {src.transform}")
    print(f"Nodata Value: {src.nodata}")
    print(f"Data Type: {src.dtypes[0]}")

    # Read band into a NumPy array
    band_array = src.read(1)
    print(f"Array shape: {band_array.shape}")
    print(f"Min/Max reflectance: {band_array.min()} / {band_array.max()}")

Key metadata explained:

  • crs: Defines how pixel coordinates map to real-world geographic locations (typically EPSG:326xx for UTM zones).
  • transform: An affine transformation matrix that converts row/column indices to geographic coordinates.
  • nodata: The sentinel value representing missing or invalid pixels (often 0 or None in Sentinel-2 products). Always mask these before calculations.

Step 2: Stacking Multiple Bands

Sentinel-2 analysis rarely uses a single band. You will typically stack bands to create false-color composites or feed machine learning models. Rasterio allows you to read multiple bands simultaneously by passing a list of indices to read().

# Paths to 10m bands: Blue (2), Green (3), Red (4), NIR (8)
band_paths = {
    "B02": "path/to/B02_10m.jp2",
    "B03": "path/to/B03_10m.jp2",
    "B04": "path/to/B04_10m.jp2",
    "B08": "path/to/B08_10m.jp2"
}

stacked_data = {}
meta = None

for name, path in band_paths.items():
    with rasterio.open(path) as src:
        if meta is None:
            meta = {
                "crs": src.crs,
                "transform": src.transform,
                "width": src.width,
                "height": src.height
            }
        stacked_data[name] = src.read(1)

# Stack into a single 3D array (bands, height, width)
cube = np.stack(list(stacked_data.values()), axis=0)
print(f"Multiband cube shape: {cube.shape}")

Important note on resolution: Sentinel-2 bands have different native resolutions. Never stack 10m and 20m bands directly without resampling. Use rasterio.warp.reproject or rasterio.merge.merge to align grids before stacking.

Step 3: Memory-Efficient Windowed Reading

Loading a full Sentinel-2 scene (often 10,000×10,000 pixels per band) into memory will trigger MemoryError on standard machines. Rasterio supports windowed reading, which processes imagery in manageable chunks.

from rasterio.windows import Window

chunk_size = 1024  # Process 1024x1024 pixel tiles

with rasterio.open(band_paths["B04"]) as src:
    for row in range(0, src.height, chunk_size):
        for col in range(0, src.width, chunk_size):
            width = min(chunk_size, src.width - col)
            height = min(chunk_size, src.height - row)

            window = Window(col, row, width, height)
            chunk = src.read(1, window=window)

            # Process chunk here (e.g., calculate NDVI, apply filters)
            # Avoid storing all chunks; write results to disk or stream them

Windowed reading pairs naturally with rasterio.open(..., mode="w") to write processed tiles to a new output file incrementally.

Step 4: Direct Cloud Access Without Local Downloads

You do not need to download .SAFE archives to work with Sentinel-2. Rasterio natively supports HTTP range requests via GDAL’s /vsicurl/ driver, enabling direct reads from cloud storage providers like AWS S3 or Microsoft Planetary Computer.

# Example: Direct HTTP read from a public Sentinel-2 COG
cog_url = "https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/33/T/VE/2023/6/S2A_33TVE_20230615_0_L2A/B04.tif"

with rasterio.open(cog_url) as src:
    # Only reads the header and requested tile, not the full 500MB file
    thumbnail = src.read(1, out_shape=(1, 512, 512))
    print(f"Cloud-loaded thumbnail shape: {thumbnail.shape}")

This approach drastically reduces storage costs and accelerates prototyping. For official documentation on Rasterio’s cloud capabilities and GDAL virtual file systems, consult the Rasterio documentation and the ESA Sentinel-2 User Handbook.

Production Best Practices

  • Always use context managers: with rasterio.open(...) as src: guarantees file handles are closed, preventing memory leaks in long-running scripts.
  • Validate CRS alignment: Before performing raster algebra, verify that all bands share identical crs and transform values. Misaligned grids produce silent spatial errors.
  • Handle nodata explicitly: Sentinel-2 JPEG2000 files sometimes lack embedded nodata values. Apply masks using src.read_masks(1) or threshold invalid reflectance values (e.g., <0 or >10000 for scaled L2A data).
  • Scale reflectance values: Level-2A products store surface reflectance as integers scaled by 10,000. Divide by 10000.0 to obtain physical reflectance values between 0.0 and 1.0.

Mastering these loading patterns ensures your downstream workflows—whether time series analysis, image classification, or raster algebra—operate reliably at scale.