Detecting Buildings from Aerial Imagery Using YOLOv8
Bridging high-speed computer vision with precise geospatial coordinate systems is one of the most common challenges in modern spatial data science. Standard object detection models output bounding boxes in pixel space, but geographic information systems (GIS) demand real-world coordinates, proper projection handling, and spatially aware post-processing. This guide provides a complete, production-ready Python pipeline to tile large orthomosaics, run YOLOv8 inference, transform pixel predictions into geographic polygons, and export validated building footprints.
The complete building-detection pipeline is shown below.
flowchart LR
A["Orthomosaic<br/>(GeoTIFF)"] --> B["Tile<br/>(overlap)"]
B --> C["YOLOv8 inference<br/>(pixel boxes)"]
C --> D["Pixel to geo<br/>(affine transform)"]
D --> E["Dedupe<br/>(IoU-based NMS)"]
E --> F["Export<br/>(GeoJSON / Shapefile)"]
Environment Setup and Prerequisites
Before executing the pipeline, ensure your environment includes the necessary geospatial and machine learning libraries. The workflow relies on ultralytics for model inference, rasterio for geospatial raster I/O, geopandas and shapely for vector geometry manipulation, and torch for GPU acceleration. torch is installed automatically as a dependency of ultralytics, which is why it does not appear in the command below.
pip install ultralytics rasterio geopandas shapely numpy opencv-python-headless
Your input imagery must be georeferenced (GeoTIFF format with embedded Coordinate Reference System and affine transform). If you are working with drone orthomosaics or satellite tiles, verify that the CRS is consistent across the entire dataset. Mixed projections will cause coordinate drift during the transformation phase.
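One way to check for mixed projections before tiling is to collect the CRS of every input file and group the files by it. The sketch below is a minimal illustration, not a complete validator: read_crs and group_by_crs are hypothetical helper names rather than rasterio functions, and the file names in the example are invented.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def read_crs(path: str) -> Optional[str]:
    """Return a raster's CRS as a string, or None if the file has no CRS."""
    import rasterio  # imported lazily so group_by_crs has no third-party dependency

    with rasterio.open(path) as src:
        return src.crs.to_string() if src.crs else None


def group_by_crs(crs_by_file: Dict[str, Optional[str]]) -> Dict[Optional[str], List[str]]:
    """Group file paths by CRS; more than one key means mixed projections."""
    groups: Dict[Optional[str], List[str]] = defaultdict(list)
    for path, crs in sorted(crs_by_file.items()):
        groups[crs].append(path)
    return dict(groups)


# Hard-coded values stand in for {path: read_crs(path)} on a real dataset.
example = {
    "north.tif": "EPSG:32633",
    "south.tif": "EPSG:32633",
    "legacy.tif": "EPSG:4326",
}
groups = group_by_crs(example)
if len(groups) > 1:
    print("Mixed projections found:", groups)
```

A None key in the result would indicate files with no embedded CRS, which need georeferencing before they can enter the pipeline.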
Step 1: Tiling Large Aerial Imagery
Aerial imagery frequently exceeds GPU memory limits. Tiling splits large rasters into manageable, overlapping patches. Overlap is critical to prevent buildings from being sliced at tile boundaries, which degrades detection accuracy and fragments footprints.
import os
from pathlib import Path

import rasterio
from rasterio.windows import Window
from rasterio.windows import transform as window_transform


def tile_aerial_image(image_path: str, output_dir: str, tile_size: int = 640, overlap: int = 128) -> None:
    """Splits a georeferenced raster into overlapping tiles while preserving spatial metadata."""
    if not 0 <= overlap < tile_size:
        raise ValueError("overlap must be non-negative and smaller than tile_size")
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    with rasterio.open(image_path) as src:
        height, width = src.height, src.width
        step = tile_size - overlap
        for y in range(0, height, step):
            for x in range(0, width, step):
                # Define window and clamp to image boundaries
                window = Window(x, y, min(tile_size, width - x), min(tile_size, height - y))
                tile = src.read(window=window)
                if tile.max() == 0:  # Skip empty/black tiles
                    continue
                tile_path = os.path.join(output_dir, f"tile_{y}_{x}.tif")
                with rasterio.open(
                    tile_path, 'w', driver='GTiff',
                    height=window.height, width=window.width,
                    count=src.count, dtype=tile.dtype, crs=src.crs,
                    # Shift the source transform so the tile's pixel (0, 0) maps to the window origin
                    transform=window_transform(window, src.transform)
                ) as dst:
                    dst.write(tile)
This function writes each tile with the source CRS and an affine transform shifted to the tile's origin, which is essential for later coordinate mapping. Proper tiling also serves as a foundation for Feature Engineering for Spatial Models, allowing you to inject elevation, NDVI, or shadow masks into each patch before feeding them into the neural network.
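The loop above clamps windows at the right and bottom edges, so the last tile in each row and column can be narrower than tile_size. A possible alternative, sketched here under the assumption that full-size tiles are preferred for inference, is to shift the final tile back so that it ends exactly at the image edge. tile_origins is a hypothetical helper, not a rasterio function.

```python
from typing import List


def tile_origins(length: int, tile_size: int = 640, overlap: int = 128) -> List[int]:
    """Return start offsets along one axis so that every tile is full-size.

    The final tile is shifted back to end at the image edge instead of
    being truncated, at the cost of a larger overlap with its neighbour.
    """
    if not 0 <= overlap < tile_size:
        raise ValueError("overlap must be non-negative and smaller than tile_size")
    if length <= tile_size:
        return [0]  # image is smaller than one tile: a single clamped window
    step = tile_size - overlap
    origins = list(range(0, length - tile_size + 1, step))
    last = length - tile_size
    if origins[-1] != last:
        origins.append(last)  # shifted-back tile covering the far edge
    return origins


print(tile_origins(1500))  # [0, 512, 860]
```

The same function can be called once for the width and once for the height, and the two lists combined to form the (x, y) window origins.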
Step 2: Running YOLOv8 Inference
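Once the imagery is tiled, we load YOLOv8 weights and run inference on each tile in turn. The default yolov8n.pt checkpoint is trained on COCO, which has no building class, so pass weights fine-tuned on building annotations via model_name to get meaningful detections. The model returns bounding boxes in (x1, y1, x2, y2) pixel coordinates relative to each tile.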
import glob
import os

import torch
from ultralytics import YOLO


def run_yolo_inference(tile_dir: str, model_name: str = "yolov8n.pt", conf_thresh: float = 0.45) -> list:
    """Loads YOLOv8 and runs inference on all tiles, returning raw predictions."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = YOLO(model_name).to(device)
    tile_paths = sorted(glob.glob(os.path.join(tile_dir, "*.tif")))
    predictions = []
    for tile_path in tile_paths:
        results = model.predict(
            source=tile_path,
            conf=conf_thresh,
            verbose=False,
            device=device
        )
        # Extract boxes, confidence scores, and tile metadata
        for result in results:
            if result.boxes is not None and len(result.boxes) > 0:
                boxes = result.boxes.xyxy.cpu().numpy()  # (N, 4) pixel coordinates in the tile
                confs = result.boxes.conf.cpu().numpy()  # (N,) confidence scores
                predictions.append({
                    "tile_path": tile_path,
                    "boxes": boxes,
                    "confs": confs
                })
    return predictions
The architecture behind YOLOv8 builds on earlier work in Deep Learning for Object Detection. Its detection head is anchor-free, so there are no anchor box sizes to tune for a given imagery resolution, which can help recall on small, densely packed urban structures.
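Multispectral or 16-bit GeoTIFF tiles may not load as expected when passed to the model as file paths. One workaround is to read the tile with rasterio and hand the model a NumPy array, which Ultralytics accepts in (H, W, 3) uint8 layout with BGR channel order. The sketch below is one way to do the conversion; to_bgr_uint8 is a hypothetical helper, and both the assumption that bands 1 to 3 are red, green, and blue and the 2nd to 98th percentile stretch are illustrative choices rather than requirements.

```python
import numpy as np


def to_bgr_uint8(tile: np.ndarray) -> np.ndarray:
    """Convert a (bands, H, W) array, as returned by rasterio's read(), to (H, W, 3) uint8 BGR."""
    if tile.ndim != 3 or tile.shape[0] < 3:
        raise ValueError("expected at least 3 bands in (bands, H, W) order")
    rgb = tile[:3].astype(np.float32)  # assumption: bands 1-3 are R, G, B
    if tile.dtype != np.uint8:
        # Contrast-stretch non-8-bit data into the 0-255 range
        lo, hi = np.percentile(rgb, (2, 98))
        rgb = (rgb - lo) / max(hi - lo, 1e-6) * 255.0
    rgb = np.clip(rgb, 0, 255).astype(np.uint8)
    hwc = np.transpose(rgb, (1, 2, 0))  # (3, H, W) -> (H, W, 3)
    return np.ascontiguousarray(hwc[:, :, ::-1])  # RGB -> BGR


# Synthetic 4-band, 8-bit tile used only to illustrate the shapes involved
demo = np.zeros((4, 8, 6), dtype=np.uint8)
demo[0], demo[1], demo[2] = 10, 20, 30
print(to_bgr_uint8(demo).shape)  # (8, 6, 3)
```

The returned array can be passed as source to model.predict in place of a file path. If the model was trained on imagery with a different stretch, the same stretch should be applied at inference time.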
Step 3: Transforming Pixel Predictions to Geographic Coordinates
This is the core GIS step. We must convert pixel coordinates back to real-world geographic coordinates using each tile’s stored affine transform. rasterio provides robust utilities for this conversion.
import os

import geopandas as gpd
import rasterio
from shapely.geometry import box


def pixel_to_geo_polygons(predictions: list) -> gpd.GeoDataFrame:
    """Converts YOLOv8 pixel bounding boxes to georeferenced polygons."""
    geo_rows = []
    crs = None
    for pred in predictions:
        tile_path = pred["tile_path"]
        boxes = pred["boxes"]
        confs = pred["confs"]
        with rasterio.open(tile_path) as src:
            transform = src.transform
            if crs is None:
                crs = src.crs  # All tiles inherit the CRS of the source raster
        for i, (x1, y1, x2, y2) in enumerate(boxes):
            # Box corners are continuous pixel coordinates measured from the tile's
            # top-left corner, so apply the affine directly as (col, row). A
            # pixel-center offset would shift every footprint by half a pixel.
            gx1, gy1 = transform * (float(x1), float(y1))
            gx2, gy2 = transform * (float(x2), float(y2))
            # Create Shapely polygon (axis-aligned, which assumes a north-up raster)
            polygon = box(min(gx1, gx2), min(gy1, gy2),
                          max(gx1, gx2), max(gy1, gy2))
            geo_rows.append({
                "geometry": polygon,
                "confidence": float(confs[i]),
                "source_tile": os.path.basename(tile_path)
            })
    # Coordinates are in the raster's CRS, which is not necessarily WGS84
    return gpd.GeoDataFrame(geo_rows, crs=crs)
For a deeper understanding of how affine matrices map raster indices to geographic space, consult the official Rasterio Transform Documentation. Always verify that your output CRS matches your project requirements; you can reproject the GeoDataFrame using gdf.to_crs("EPSG:XXXX") before export.
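To make the affine mapping concrete, the following dependency-free sketch applies the six coefficients by hand. It assumes the coefficient order used by rasterio's Affine (a, b, c, d, e, f), where x = a*col + b*row + c and y = d*col + e*row + f. The transform values describe an invented north-up raster with 0.5 m pixels, and pixel_to_geo and box_to_bounds are hypothetical helper names.

```python
from typing import Tuple

Affine6 = Tuple[float, float, float, float, float, float]  # (a, b, c, d, e, f)


def pixel_to_geo(t: Affine6, col: float, row: float) -> Tuple[float, float]:
    """Map continuous pixel coordinates (col, row) to map coordinates (x, y)."""
    a, b, c, d, e, f = t
    return (a * col + b * row + c, d * col + e * row + f)


def box_to_bounds(t: Affine6, x1: float, y1: float, x2: float, y2: float):
    """Convert a pixel-space box to (minx, miny, maxx, maxy) in map units."""
    gx1, gy1 = pixel_to_geo(t, x1, y1)
    gx2, gy2 = pixel_to_geo(t, x2, y2)
    return (min(gx1, gx2), min(gy1, gy2), max(gx1, gx2), max(gy1, gy2))


# Invented transform: 0.5 m pixels, top-left corner at (500000, 4100000), north-up
transform = (0.5, 0.0, 500000.0, 0.0, -0.5, 4100000.0)

bounds = box_to_bounds(transform, 100.0, 200.0, 140.0, 260.0)
print(bounds)  # (500050.0, 4099870.0, 500070.0, 4099900.0)

# The center of pixel (0, 0) sits at (0.5, 0.5) in continuous pixel coordinates
print(pixel_to_geo(transform, 0.5, 0.5))  # (500000.25, 4099999.75)
```

Because e is negative for a north-up raster, increasing row values map to decreasing y, which is why the minimum and maximum are taken after the transform rather than before. For a rotated raster (non-zero b or d), all four corners would need to be transformed to obtain a correct footprint.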
Step 4: Spatial Post-Processing and Deduplication
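Overlapping tiles inevitably produce duplicate detections. A single building may appear in two adjacent patches, resulting in fragmented or overlapping polygons. We resolve this with greedy non-maximum suppression (NMS): detections are visited from highest to lowest confidence, and any lower-confidence polygon whose Intersection over Union (IoU) with a kept polygon reaches the threshold is discarded.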
import geopandas as gpd


def deduplicate_footprints(gdf: gpd.GeoDataFrame, iou_thresh: float = 0.5) -> gpd.GeoDataFrame:
    """Greedy non-maximum suppression on overlapping detections (IoU-based)."""
    if len(gdf) == 0:
        return gdf
    # Process detections from highest to lowest confidence
    gdf = gdf.sort_values("confidence", ascending=False).reset_index(drop=True)
    keep = []
    suppressed = set()
    geoms = gdf.geometry.values
    for i in range(len(gdf)):
        if i in suppressed:
            continue
        keep.append(i)
        gi = geoms[i]
        # Suppress lower-confidence detections that overlap this one above the IoU threshold
        for j in range(i + 1, len(gdf)):
            if j in suppressed:
                continue
            gj = geoms[j]
            inter = gi.intersection(gj).area
            if inter == 0:
                continue
            union = gi.area + gj.area - inter
            if union > 0 and inter / union >= iou_thresh:
                suppressed.add(j)
    return gdf.iloc[keep].reset_index(drop=True)
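A small worked example shows how the IoU threshold behaves on the two kinds of duplicates that tiling tends to produce. The sketch uses plain axis-aligned boxes instead of Shapely polygons, and the coordinates are invented; box_iou is a hypothetical helper.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0  # boxes do not overlap
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


full = (0.0, 0.0, 20.0, 10.0)      # whole building, detected in one tile
shifted = (1.0, 0.0, 21.0, 10.0)   # same building from a neighbouring tile, offset by 1 m
clipped = (12.0, 0.0, 20.0, 10.0)  # same building cut off by a tile boundary

print(round(box_iou(full, shifted), 3))  # 0.905
print(round(box_iou(full, clipped), 3))  # 0.4
```

With the default threshold of 0.5, the shifted duplicate is suppressed but the clipped fragment is not, because a fragment that lies entirely inside the full detection still has a low IoU. If such fragments appear in the output, one option is to compare the intersection against the smaller polygon's area instead of the union, or to lower the threshold.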
Spatial clustering metrics can further validate detection density against known urban patterns, drawing on principles from Spatial Autocorrelation and Statistics. By analyzing Moran’s I or Getis-Ord Gi* on your output, you can identify systematic under-detection in specific neighborhoods or sensor artifacts.
Step 5: Exporting and Validating Production Footprints
Once cleaned, the data is ready for integration into enterprise GIS platforms or web mapping applications.
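from pathlib import Path

def export_building_footprints(gdf: gpd.GeoDataFrame, output_path: str) -> None:
    """Exports footprints to GeoJSON and a Shapefile with the same base name."""
    geojson_path = str(Path(output_path).with_suffix(".geojson"))
    shp_path = str(Path(output_path).with_suffix(".shp"))
    gdf.to_file(shp_path, driver="ESRI Shapefile")  # field names are truncated to 10 characters
    # GeoJSON (RFC 7946) expects WGS84 longitude/latitude
    gdf.to_crs("EPSG:4326").to_file(geojson_path, driver="GeoJSON")
    print(f"Exported {len(gdf)} building footprints to {geojson_path} and {shp_path}")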
This pipeline directly supports Model Deployment for GIS Applications by generating standardized vector formats that integrate seamlessly with PostGIS, ArcGIS Enterprise, or QGIS. Before pushing to production, rigorously benchmark your results against manually digitized ground truth. Evaluating Geospatial AI Performance requires metrics beyond standard computer vision benchmarks; calculate geospatial Intersection over Union (IoU), perimeter accuracy, and false-positive rates per square kilometer to ensure operational reliability.
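The benchmark described above can be sketched as a greedy one-to-one matching between predictions and ground truth. The example below is a simplified illustration: it uses axis-aligned boxes in metres rather than digitized polygons, assumes predictions are already sorted by descending confidence, and uses invented coordinates and an invented survey area. evaluate and box_iou are hypothetical helpers.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def evaluate(preds: List[Box], truths: List[Box], area_km2: float,
             iou_thresh: float = 0.5) -> Dict[str, float]:
    """Match each prediction to at most one ground-truth box and summarize errors."""
    matched = set()
    tp = 0
    for p in preds:  # assumed sorted by descending confidence
        best_j, best_iou = -1, 0.0
        for j, t in enumerate(truths):
            if j in matched:
                continue
            iou = box_iou(p, t)
            if iou > best_iou:
                best_j, best_iou = j, iou
        if best_j >= 0 and best_iou >= iou_thresh:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp
    fn = len(truths) - tp
    return {
        "tp": tp, "fp": fp, "fn": fn,
        "precision": tp / len(preds) if preds else 0.0,
        "recall": tp / len(truths) if truths else 0.0,
        "fp_per_km2": fp / area_km2,
    }


truths = [(0, 0, 10, 10), (20, 0, 30, 10), (40, 0, 50, 10)]
preds = [(0, 0, 10, 10), (21, 0, 31, 10), (100, 100, 110, 110)]
print(evaluate(preds, truths, area_km2=0.25))
```

In this toy case two of three predictions match a building, one building is missed, and the single false positive over 0.25 km² gives 4 false positives per square kilometre. Areas and distances are only meaningful when both layers are in a projected CRS with metric units.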
Scaling and Optimization Strategies
Processing regional-scale orthomosaics demands careful resource management. Implement Advanced Geospatial AI Optimization by leveraging half-precision inference (FP16, enabled in Ultralytics with half=True on a CUDA device), dynamic tile batching, and asynchronous I/O. For large deployments, consider Dask or Ray to parallelize raster reads and model inference across multiple GPUs. Cache intermediate tile outputs so that interrupted runs can resume, and monitor VRAM utilization to prevent out-of-memory crashes during extended runs.
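As a rough illustration of tile batching with half precision, the sketch below groups tile paths into fixed-size chunks and passes each chunk to the model as a list. batched and predict_in_batches are hypothetical helpers; the weights path, batch size, and confidence threshold are placeholders; and the code assumes that Ultralytics processes a list source as a single batch and returns results in input order, which is worth verifying on the installed version.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def predict_in_batches(tile_paths: Sequence[str], weights: str, device: str = "cpu",
                       batch_size: int = 16, conf: float = 0.45):
    """Yield (tile_path, boxes, confs) per tile, running inference chunk by chunk."""
    from ultralytics import YOLO  # imported lazily; only needed when inference runs

    model = YOLO(weights)
    use_half = device != "cpu"  # FP16 is only useful on a GPU
    for chunk in batched(tile_paths, batch_size):
        results = model.predict(source=chunk, conf=conf, half=use_half,
                                device=device, verbose=False)
        for path, result in zip(chunk, results):
            yield (path,
                   result.boxes.xyxy.cpu().numpy(),
                   result.boxes.conf.cpu().numpy())


print(list(batched(["t0.tif", "t1.tif", "t2.tif", "t3.tif", "t4.tif"], 2)))
```

Reducing batch_size is the simplest lever when VRAM runs out; the chunking itself does not change which detections are produced.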
By combining robust geospatial libraries with state-of-the-art detection models, this pipeline transforms raw aerial imagery into actionable, coordinate-accurate building inventories. The methodology scales from municipal planning to disaster response, providing a reproducible foundation for automated spatial feature extraction.