How to Merge Multiple Raster Tiles into a Single Mosaic in Python

Satellite imagery and aerial surveys are rarely delivered as single, seamless files. Instead, data providers distribute imagery in manageable grid tiles to optimize storage, bandwidth, and download reliability. When your area of interest spans multiple tile boundaries, you must stitch those fragments together into a continuous spatial dataset. This process, known as raster mosaicking, aligns geographic footprints, resolves overlapping pixels, and produces a unified extent. Mastering this workflow is a foundational requirement in Remote Sensing & Raster Analysis and serves as a critical preprocessing step before Reading and Processing Satellite Imagery.

Understanding the Mosaicking Process

Mosaicking is fundamentally different from standard image concatenation. The underlying algorithm must first compute a unified bounding box that encompasses the spatial footprint of every input tile. Before merging can occur, all datasets must share an identical Coordinate Reference System (CRS)—the mathematical framework that maps pixel coordinates to real-world locations—and a consistent ground sampling distance (pixel resolution). Mismatched projections or differing pixel sizes will trigger geometric distortion or cause the operation to fail entirely.

The algorithm relies heavily on nodata values, which are special numeric markers that indicate missing, masked, or invalid measurements. During the merge, the software uses these masks to ignore empty space and prioritize valid pixels from overlapping regions. You control how conflicts in overlapping zones are resolved by selecting an overlap strategy:

  • first: Retains the pixel value from the topmost dataset encountered.
  • last: Uses the value from the bottommost dataset.
  • min / max: Selects the lowest or highest pixel value across overlaps.
  • mean: Computes the arithmetic average of overlapping pixels.

Choosing the appropriate strategy depends on your sensor type and analytical objectives. For example, optical imagery often uses first to preserve cloud-free pixels, while elevation models typically require max or mean to smooth terrain transitions.

Step-by-Step Implementation

The Python geospatial ecosystem handles this workflow efficiently through rasterio, a high-performance library built on the GDAL engine. A production-ready mosaicking script follows a predictable, memory-conscious sequence:

flowchart LR
    A["Discover<br/>tile files"] --> B["Open dataset<br/>handles"]
    B --> C["merge()<br/>with overlap rule"]
    C --> D["Rebuild<br/>metadata"]
    D --> E["Export<br/>GeoTIFF mosaic"]
  1. Discover input files: Scan a directory or read a manifest to collect absolute paths for all relevant raster tiles.
  2. Initialize dataset handles: Open each file in read mode. rasterio employs lazy loading, meaning it only reads headers and spatial metadata until pixel arrays are explicitly requested.
  3. Compute and execute the merge: Pass the open dataset objects to the merge function alongside your chosen overlap rule. The library calculates the optimal output transform (the mathematical mapping from pixel rows/columns to geographic coordinates) and allocates memory for the resulting array.
  4. Rebuild metadata: Copy the profile (a dictionary containing driver, CRS, dtype, and compression settings) from a source file and update it with the new dimensions, spatial transform, and band count.
  5. Export the mosaic: Write the merged array to a new GeoTIFF, ensuring compression and block tiling are configured for optimal read performance.

Complete Python Script

The following script implements the full workflow. It assumes all input tiles are GeoTIFFs located in a single directory and share matching CRS and resolution.

import glob
import os
import rasterio
from rasterio.merge import merge

def mosaic_raster_tiles(input_dir, output_path, overlap_method="first"):
    """
    Merges multiple GeoTIFF tiles into a single raster mosaic.
    Assumes all inputs share the same CRS and pixel resolution.
    """
    # 1. Collect all .tif files in the target directory
    raster_files = glob.glob(os.path.join(input_dir, "*.tif"))
    if not raster_files:
        raise FileNotFoundError(f"No raster tiles found in {input_dir}")

    # 2. Open datasets using context managers for safe resource handling
    # We read the profile from the first file to establish baseline metadata
    with rasterio.open(raster_files[0]) as src_template:
        src_profile = src_template.profile.copy()

    # Open all datasets for the merge operation
    src_datasets = [rasterio.open(fp) for fp in raster_files]

    try:
        # 3. Perform the merge
        # Returns a tuple: (merged_numpy_array, output_spatial_transform)
        mosaic_array, out_transform = merge(
            src_datasets,
            method=overlap_method,
            nodata=src_profile.get("nodata", None)
        )

        # 4. Update metadata for the new mosaic
        src_profile.update({
            "height": mosaic_array.shape[1],
            "width": mosaic_array.shape[2],
            "transform": out_transform,
            "count": mosaic_array.shape[0],
            "compress": "lzw",      # Lossless compression reduces file size
            "tiled": True,          # Enables faster spatial window reads
            "blockxsize": 256,
            "blockysize": 256
        })

        # 5. Write the output to disk
        with rasterio.open(output_path, "w", **src_profile) as dest:
            dest.write(mosaic_array)

        print(f"Successfully created mosaic at: {output_path}")

    finally:
        # Ensure all file handles are closed to prevent memory leaks
        for ds in src_datasets:
            ds.close()

# Example usage
if __name__ == "__main__":
    TILE_DIR = "./data/satellite_tiles"
    OUTPUT_FILE = "./output/region_mosaic.tif"
    mosaic_raster_tiles(TILE_DIR, OUTPUT_FILE, overlap_method="first")

Production Considerations and Optimization

While rasterio.merge performs reliably for moderate-sized projects, loading hundreds of high-resolution tiles into system memory can exhaust available RAM. For enterprise-scale workflows, consider generating a Virtual Raster (VRT) instead of a physical file. A VRT is a lightweight XML metadata file that references the original tiles without duplicating pixel data, allowing downstream GIS software to read the mosaic on-the-fly. You can create one programmatically using GDAL’s Python bindings (osgeo.gdal.BuildVRT()) or via the official GDAL buildvrt utility.

When working with cloud-native formats like Cloud Optimized GeoTIFFs (COGs), you can bypass local downloads entirely. By leveraging HTTP range requests, Python libraries stream only the required pixel blocks directly from object storage. This approach dramatically reduces local disk usage and accelerates distributed processing pipelines. For official guidance on structuring spatial data for cloud environments, consult the Cloud Optimized GeoTIFF specification. Additionally, understanding how to configure spatial data lakes is essential for scalable remote sensing architectures; the Open Geospatial Consortium (OGC) provides comprehensive standards for cloud-native geospatial interoperability and API design.