For ongoing work I'm doing with Landsat data, I recently needed to generate some quick validation points against high-resolution aerial photography. I wanted to generate a fixed number of random rectangles, 90-meters squared (3 by 3 Landsat pixels), within the extent of a high-resolution aerial photograph I was using as "ground data." More specifically, we want to generate a grid of 90-meter squared polygons distributed throughout that align with our Landsat image.

The steps involved are:

  1. Clip the Landsat image to the bounds of (the intersection with) my aerial photograph.
  2. Resample this clip to the desired resolution of my validation window (e.g., 90 meters).
  3. Use QGIS to generate a grid from the coarsened Landsat clip.
  4. Randomly sample from the generated grid.

Clipping the Landsat Image to the Airphoto Reference

So, we want, e.g., 100 validation points within the bounds of the aerial photograph. To create the validation points within a defined extent, we first need to measure the extent of the reference image. The gdalinfo utility is useful for this (as is the image properties dialog in QGIS). I recently created a GDAL-esque tool that can help automate this process called gdal_extent.py. After using it to extract the bounds of the aerial photograph, airphoto.tiff as a GeoJSON file, I can convert it to a Shapefile, which is the expected format for cutline features when clipping with GDAL. As the input file to ogr2ogr is GeoJSON, I have to tell it the spatial reference system (SRS) it should expect from the source file with the -s_srs switch.

python gdal_extent.py -geojson airphoto.tiff > extent.json
ogr2ogr -f "ESRI Shapefile" -s_srs "EPSG:32617" extent.shp extent.json

Alternatively, I could have just created a Shapefile that represents the bounds of my aerial photograph any other way. Also, my approach generates a rectangular bounding box whereas we could use any more complex polygon. So, assuming that you have a file extent.shp that somehow represents the bounds of your image, it's time to clip our Landsat image (L7Image.tiff) to this extent. I'll use gdalwarp; note that I specify the output resolution as 90 meters in both the horizontal and vertical directions.

gdalwarp -cutline extent.shp -cl extent -crop_to_cutline -tr 90 90 L7Image.tiff grid_90m.tiff

Generating a Sampling Grid in QGIS

This is where QGIS comes in. A close cousin of ArcGIS' Fishnet tool, QGIS' "Vector grid" tool (under "Vector, Research Tools") is what I'll use to convert the pixels of my Landsat image to polygons.

The Vector grid tool is a dear invention.

The output Shapefile should resemble a grid of pixels.

Selecting a Subset of Random Validation points

In the last step, I must downselect from this grid of pixels a random subset. Here, I'll present two techniques for introducing a stochastic selection process.

Random Selection within the Database

The simplest way would be to let a file database randomly select features from my polygon grid. In this example, I'll generate KML output because, for another application, I want to validate my points in Google Earth. The output Shapefile from the Vector grid tool is vector_grid.shp and we're downselecting 100 validation points:

ogr2ogr -f "KML" -dialect "sqlite" -sql "SELECT * FROM vector_grid ORDER BY RANDOM() LIMIT 100" validation_points.kml vector_grid.shp

Random Selection with Bash

If you don't trust the randomization capabilities of a database or have your own reason for generating random features, you might prefer this approach. In this example, the output Shapefile from the Vector grid tool is vector_grid.shp; we'll use a combination of Bash built-ins and GDAL tools here:

# Generate 100 randoms, reverse the string, cut the last character, reverse again
PIXELS=$(ogrinfo vector_grid.shp -al | grep "Feature Count: " | cut -c 16-)
RANDS="$(shuf -i 0-$PIXELS -n 100 | tr '\n' ',' | rev | cut -c 2- | rev)"
ogr2ogr -f "ESRI Shapefile" -where "ID IN ($RANDS)" validation_points.shp vector_grid.shp

As an added bonus, this approach allows us to create fields for validating certain quantities directly in our Shapefile. This is a great way to automate the generation of validation points for the worthwhile consideration of the undergraduate interns, graduate students, and other valuable research assistants working with you. For example, if I want to validate the fraction of vegetation cover (VegFrac) and impervious surface cover (ImpFrac) and record that information directly in my Shapefile (while editing it in QGIS, later), I simply add those fields as part of a SELECT statement that includes my previous WHERE condition:

ogr2ogr -f "ESRI Shapefile" -sql "SELECT ID, 0.0 AS VegFrac, 0.0 AS ImpFrac FROM vector_grid WHERE ID IN ($RANDS)" validation_points.shp vector_grid.shp

In Conclusion

Now I'm ready to analyze the areas specified in validation_points.shp by going through them, one at a time, in QGIS or another desktop GIS tool. Technically, I used more than Bash and QGIS but GDAL really is the most fundamental tool in Bash, right?