How to Load Sentinel-2 Imagery with Rasterio
Loading satellite imagery efficiently is the foundational step in any geospatial analysis pipeline. The European Space Agency’s Sentinel-2 mission delivers high-resolution multispectral data that powers environmental monitoring, precision agriculture, and land cover mapping. While the imagery is freely accessible, its directory structure and file formats can quickly overwhelm beginners. Rasterio, a Python library built on the GDAL engine, provides a clean, Pythonic interface for reading, inspecting, and processing raster data. This guide walks you through loading Sentinel-2 imagery with Rasterio, emphasizing spatial metadata handling, memory management, and production-ready code patterns. For broader context on satellite data pipelines, explore the foundational concepts covered in Remote Sensing & Raster Analysis.
Understanding Sentinel-2 Data Structure
Before writing code, you must understand how Sentinel-2 products are organized. Traditional Level-1C (top-of-atmosphere reflectance) and Level-2A (surface reflectance) products are distributed as .SAFE directories. Inside each folder, spectral bands are stored as individual JPEG2000 (.jp2) files nested within a GRANULE directory. Each file corresponds to a specific wavelength and spatial resolution: 10 meters (visible and near-infrared), 20 meters (red-edge and shortwave infrared), or 60 meters (atmospheric bands).
Modern workflows increasingly bypass local .SAFE archives in favor of Cloud-Optimized GeoTIFFs (COGs) or Virtual Rasters (.vrt). COGs allow HTTP range requests, meaning you can read specific image tiles without downloading the entire file. Rasterio handles both legacy .jp2 archives and modern cloud-native formats seamlessly. For a comprehensive overview of data acquisition, format conversion, and preprocessing pipelines, refer to our guide on Reading and Processing Satellite Imagery.
flowchart TD
A[".SAFE product"] --> B["GRANULE<br/>directory"]
B --> C["IMG_DATA"]
C --> D["R10m (.jp2)<br/>visible, NIR"]
C --> E["R20m (.jp2)<br/>red-edge, SWIR"]
C --> F["R60m (.jp2)<br/>atmospheric"]
Prerequisites
You will need Python 3.9 or newer and the rasterio package installed. Rasterio relies on compiled GDAL binaries, so using a package manager like conda is highly recommended to avoid dependency conflicts and ensure spatial library compatibility.
# Using conda (recommended for geospatial libraries)
conda install -c conda-forge rasterio numpy
# Or using pip (ensure your system has GDAL and PROJ installed)
pip install rasterio numpy
Ensure your Sentinel-2 files are accessible locally or via a supported cloud storage path. You will also need numpy for array manipulation.
Step 1: Reading a Single Spectral Band
We begin by opening a single band, extracting its spatial metadata, and reading pixel values into a NumPy array. Spatial concepts like the Coordinate Reference System (CRS), affine geotransform, and nodata values must be captured before any mathematical operations.
import rasterio
import numpy as np
# Path to a single Sentinel-2 band (e.g., Band 4 - Red, 20m resolution)
band_path = "S2A_MSIL2A_20230615T100031_N0509_R122_T33TVE_20230615T143045.SAFE/GRANULE/L2A_T33TVE_20230615T100031/IMG_DATA/R20m/T33TVE_20230615T100031_B04_20m.jp2"
with rasterio.open(band_path) as src:
# Extract spatial metadata
print(f"Driver: {src.driver}")
print(f"Dimensions: {src.width} x {src.height}")
print(f"CRS: {src.crs}")
print(f"Affine Transform: {src.transform}")
print(f"Nodata Value: {src.nodata}")
print(f"Data Type: {src.dtypes[0]}")
# Read band into a NumPy array
band_array = src.read(1)
print(f"Array shape: {band_array.shape}")
print(f"Min/Max reflectance: {band_array.min()} / {band_array.max()}")
Key metadata explained:
crs: Defines how pixel coordinates map to real-world geographic locations (typically EPSG:326xx for UTM zones).transform: An affine transformation matrix that converts row/column indices to geographic coordinates.nodata: The sentinel value representing missing or invalid pixels (often0orNonein Sentinel-2 products). Always mask these before calculations.
Step 2: Stacking Multiple Bands
Sentinel-2 analysis rarely uses a single band. You will typically stack bands to create false-color composites or feed machine learning models. Rasterio allows you to read multiple bands simultaneously by passing a list of indices to read().
# Paths to 10m bands: Blue (2), Green (3), Red (4), NIR (8)
band_paths = {
"B02": "path/to/B02_10m.jp2",
"B03": "path/to/B03_10m.jp2",
"B04": "path/to/B04_10m.jp2",
"B08": "path/to/B08_10m.jp2"
}
stacked_data = {}
meta = None
for name, path in band_paths.items():
with rasterio.open(path) as src:
if meta is None:
meta = {
"crs": src.crs,
"transform": src.transform,
"width": src.width,
"height": src.height
}
stacked_data[name] = src.read(1)
# Stack into a single 3D array (bands, height, width)
cube = np.stack(list(stacked_data.values()), axis=0)
print(f"Multiband cube shape: {cube.shape}")
Important note on resolution: Sentinel-2 bands have different native resolutions. Never stack 10m and 20m bands directly without resampling. Use rasterio.warp.reproject or rasterio.merge.merge to align grids before stacking.
Step 3: Memory-Efficient Windowed Reading
Loading a full Sentinel-2 scene (often 10,000×10,000 pixels per band) into memory will trigger MemoryError on standard machines. Rasterio supports windowed reading, which processes imagery in manageable chunks.
from rasterio.windows import Window
chunk_size = 1024 # Process 1024x1024 pixel tiles
with rasterio.open(band_paths["B04"]) as src:
for row in range(0, src.height, chunk_size):
for col in range(0, src.width, chunk_size):
width = min(chunk_size, src.width - col)
height = min(chunk_size, src.height - row)
window = Window(col, row, width, height)
chunk = src.read(1, window=window)
# Process chunk here (e.g., calculate NDVI, apply filters)
# Avoid storing all chunks; write results to disk or stream them
Windowed reading pairs naturally with rasterio.open(..., mode="w") to write processed tiles to a new output file incrementally.
Step 4: Direct Cloud Access Without Local Downloads
You do not need to download .SAFE archives to work with Sentinel-2. Rasterio natively supports HTTP range requests via GDAL’s /vsicurl/ driver, enabling direct reads from cloud storage providers like AWS S3 or Microsoft Planetary Computer.
# Example: Direct HTTP read from a public Sentinel-2 COG
cog_url = "https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/33/T/VE/2023/6/S2A_33TVE_20230615_0_L2A/B04.tif"
with rasterio.open(cog_url) as src:
# Only reads the header and requested tile, not the full 500MB file
thumbnail = src.read(1, out_shape=(1, 512, 512))
print(f"Cloud-loaded thumbnail shape: {thumbnail.shape}")
This approach drastically reduces storage costs and accelerates prototyping. For official documentation on Rasterio’s cloud capabilities and GDAL virtual file systems, consult the Rasterio documentation and the ESA Sentinel-2 User Handbook.
Production Best Practices
- Always use context managers:
with rasterio.open(...) as src:guarantees file handles are closed, preventing memory leaks in long-running scripts. - Validate CRS alignment: Before performing raster algebra, verify that all bands share identical
crsandtransformvalues. Misaligned grids produce silent spatial errors. - Handle
nodataexplicitly: Sentinel-2 JPEG2000 files sometimes lack embeddednodatavalues. Apply masks usingsrc.read_masks(1)or threshold invalid reflectance values (e.g.,<0or>10000for scaled L2A data). - Scale reflectance values: Level-2A products store surface reflectance as integers scaled by 10,000. Divide by
10000.0to obtain physical reflectance values between0.0and1.0.
Mastering these loading patterns ensures your downstream workflows—whether time series analysis, image classification, or raster algebra—operate reliably at scale.