Clipping rasters can be trivial with a desktop GIS like QGIS or with command line tools like GDAL. However, I recently ran into a situation where I needed to clip large rasters in an automated, online Python process. It simply wouldn't do to interrupt the procedure and clip them myself. The Python bindings for GDAL/OGR are pretty neat but they are very low-level; how could I use Python to clip the rasters as part of a continuous Python session?

Jared Erickson posted an excellent tutorial on this topic, one of many in his "Python GDAL/OGR Cookbook". Even this simple example hints at how complicated something like clipping can be with the low-level GDAL/OGR API. However, there are still some limitations to the example he provided:

  • It only works for single clipping features.
  • It only works for multiple-band rasters.
  • The clipping features must be bounded within the raster.

Here I'll present a solution to the second and third points: Clipping a raster, stacked or single-band, with clipping features that extend beyond its bounds.

Improvement: Support for Single-Band Images

This is an easy fix. We simply need to use a try...catch sequence of statements to anticipate the error in our NumPy indexing when a single-band raster is presented.

# Multi-band image?
try:
    clip = rast[:, ulY:lrY, ulX:lrX]

# Nope: Must be single-band
except IndexError:
    clip = rast[ulY:lrY, ulX:lrX]

Improvement: Out-of-Bounds Clipping Features

The second part is not so easy. The clipping in Jared's example is based on NumPy arrays and therefore must be conformal. One thing that can happen when the clipping features extend "above" a raster's extent is that the estimated pixel coordinate for the upper-left corner become negative--and negative pixels don't make sense! We have to catch this case, remember the negative pixel coordinate, and set it to zero.

# If the clipping features extend out-of-bounds and ABOVE the raster...
if gt[3] < maxY:
    # In such a case... ulY ends up being negative--can't have that!
    iY = ulY
    ulY = 0

However, in doing this, the clipping features are effectively "pushed down" (southwards) because we prevented the upper-left corner coordinate from being negative. To compensate, we must pull the clipping features "back up" (northwards). Note that here we need to add an else clause for the case when the clipping features don't extend above the raster.

# If the clipping features extend out-of-bounds and ABOVE the raster...
if gt[3] < maxY:
    # The clip features were "pushed down" to match the bounds of the
    #   raster; this step "pulls" them back up
    premask = image_to_array(raster_poly)
    # We slice out the piece of our clip features that are "off the map"
    mask = np.ndarray((premask.shape[-2] - abs(iY), premask.shape[-1]), premask.dtype)
    mask[:] = premask[abs(iY):, :]
    mask.resize(premask.shape) # Then fill in from the bottom

    # Most importantly, push the clipped piece down
    gt2[3] = maxY - (maxY - gt[3])

else:
    mask = image_to_array(raster_poly)

Finally, we have to deal with the converse problem: clipping features that extend below the raster's bounds. This is trickier. What I've done is created a larger NumPy array onto which the clipping features are projected.

# Clip the image using the mask
try:
    clip = gdalnumeric.choose(mask, (clip, nodata))

# If the clipping features extend out-of-bounds and BELOW the raster...
except ValueError:
    # We have to cut the clipping features to the raster!
    rshp = list(mask.shape)
    if mask.shape[-2] != clip.shape[-2]:
        rshp[0] = clip.shape[-2]

    if mask.shape[-1] != clip.shape[-1]:
        rshp[1] = clip.shape[-1]

    mask.resize(*rshp, refcheck=False)

    clip = gdalnumeric.choose(mask, (clip, nodata))

The Complete Source Code

Here's a complete Python function for clipping a raster. It follows Jared's example but expands on the comments throughout and includes the two improvements I've described. It does not yet support clipping by multiple features at once. Also, I haven't evaluated (nor come across the problem of) its performance with clipping features that extend only to the side of the raster; not sure if this would be a problem.

The function's API consists of two required arguments: The raster to clip and the file path to the clipping features (e.g., a Shapefile). An optional GDAL GeoTransform can be provided. Also, the NoData value can be set.

from osgeo import gdal, gdalnumeric, ogr
from PIL import Image, ImageDraw
import os
import numpy as np

def clip_raster(rast, features_path, gt=None, nodata=-9999):
    '''
    Clips a raster (given as either a gdal.Dataset or as a numpy.array
    instance) to a polygon layer provided by a Shapefile (or other vector
    layer). If a numpy.array is given, a "GeoTransform" must be provided
    (via dataset.GetGeoTransform() in GDAL). Returns an array. Clip features
    must be a dissolved, single-part geometry (not multi-part). Modified from:

    http://pcjericks.github.io/py-gdalogr-cookbook/raster_layers.html
    #clip-a-geotiff-with-shapefile

    Arguments:
        rast            A gdal.Dataset or a NumPy array
        features_path   The path to the clipping features
        gt              An optional GDAL GeoTransform to use instead
        nodata          The NoData value; defaults to -9999.
    '''
    def array_to_image(a):
        '''
        Converts a gdalnumeric array to a Python Imaging Library (PIL) Image.
        '''
        i = Image.fromstring('L',(a.shape[1], a.shape[0]),
            (a.astype('b')).tostring())
        return i

    def image_to_array(i):
        '''
        Converts a Python Imaging Library (PIL) array to a gdalnumeric image.
        '''
        a = gdalnumeric.fromstring(i.tobytes(), 'b')
        a.shape = i.im.size[1], i.im.size[0]
        return a

    def world_to_pixel(geo_matrix, x, y):
        '''
        Uses a gdal geomatrix (gdal.GetGeoTransform()) to calculate
        the pixel location of a geospatial coordinate; from:
        http://pcjericks.github.io/py-gdalogr-cookbook/raster_layers.html#clip-a-geotiff-with-shapefile
        '''
        ulX = geo_matrix[0]
        ulY = geo_matrix[3]
        xDist = geo_matrix[1]
        yDist = geo_matrix[5]
        rtnX = geo_matrix[2]
        rtnY = geo_matrix[4]
        pixel = int((x - ulX) / xDist)
        line = int((ulY - y) / xDist)
        return (pixel, line)

    # Can accept either a gdal.Dataset or numpy.array instance
    if not isinstance(rast, np.ndarray):
        gt = rast.GetGeoTransform()
        rast = rast.ReadAsArray()

    # Create an OGR layer from a boundary shapefile
    features = ogr.Open(features_path)
    if features.GetDriver().GetName() == 'ESRI Shapefile':
        lyr = features.GetLayer(os.path.split(os.path.splitext(features_path)[0])[1])

    else:
        lyr = features.GetLayer()

    # Get the first feature
    poly = lyr.GetNextFeature()

    # Convert the layer extent to image pixel coordinates
    minX, maxX, minY, maxY = lyr.GetExtent()
    ulX, ulY = world_to_pixel(gt, minX, maxY)
    lrX, lrY = world_to_pixel(gt, maxX, minY)

    # Calculate the pixel size of the new image
    pxWidth = int(lrX - ulX)
    pxHeight = int(lrY - ulY)

    # If the clipping features extend out-of-bounds and ABOVE the raster...
    if gt[3] < maxY:
        # In such a case... ulY ends up being negative--can't have that!
        iY = ulY
        ulY = 0

    # Multi-band image?
    try:
        clip = rast[:, ulY:lrY, ulX:lrX]

    except IndexError:
        clip = rast[ulY:lrY, ulX:lrX]

    # Create a new geomatrix for the image
    gt2 = list(gt)
    gt2[0] = minX
    gt2[3] = maxY

    # Map points to pixels for drawing the boundary on a blank 8-bit,
    #   black and white, mask image.
    points = []
    pixels = []
    geom = poly.GetGeometryRef()
    pts = geom.GetGeometryRef(0)

    for p in range(pts.GetPointCount()):
        points.append((pts.GetX(p), pts.GetY(p)))

    for p in points:
        pixels.append(world_to_pixel(gt2, p[0], p[1]))

    raster_poly = Image.new('L', (pxWidth, pxHeight), 1)
    rasterize = ImageDraw.Draw(raster_poly)
    rasterize.polygon(pixels, 0) # Fill with zeroes

    # If the clipping features extend out-of-bounds and ABOVE the raster...
    if gt[3] < maxY:
        # The clip features were "pushed down" to match the bounds of the
        #   raster; this step "pulls" them back up
        premask = image_to_array(raster_poly)
        # We slice out the piece of our clip features that are "off the map"
        mask = np.ndarray((premask.shape[-2] - abs(iY), premask.shape[-1]), premask.dtype)
        mask[:] = premask[abs(iY):, :]
        mask.resize(premask.shape) # Then fill in from the bottom

        # Most importantly, push the clipped piece down
        gt2[3] = maxY - (maxY - gt[3])

    else:
        mask = image_to_array(raster_poly)

    # Clip the image using the mask
    try:
        clip = gdalnumeric.choose(mask, (clip, nodata))

    # If the clipping features extend out-of-bounds and BELOW the raster...
    except ValueError:
        # We have to cut the clipping features to the raster!
        rshp = list(mask.shape)
        if mask.shape[-2] != clip.shape[-2]:
            rshp[0] = clip.shape[-2]

        if mask.shape[-1] != clip.shape[-1]:
            rshp[1] = clip.shape[-1]

        mask.resize(*rshp, refcheck=False)

        clip = gdalnumeric.choose(mask, (clip, nodata))

    return (clip, ulX, ulY, gt2)