For ongoing work I’m doing with Landsat data, I recently needed to generate some quick validation points against high-resolution aerial photography. I wanted to generate a fixed number of random rectangles, 90-meters squared (3 by 3 Landsat pixels), within the extent of a high-resolution aerial photograph I was using as “ground data.” More specifically, we want to generate a grid of 90-meter squared polygons distributed throughout that align with our Landsat image.
The steps involved are:
- Clip the Landsat image to the bounds of (the intersection with) my aerial photograph.
- Resample this clip to the desired resolution of my validation window (e.g., 90 meters).
- Use QGIS to generate a grid from the coarsened Landsat clip.
- Randomly sample from the generated grid.
Clipping the Landsat Image to the Airphoto Reference
So, we want, e.g., 100 validation points within the bounds of the aerial photograph.
To create the validation points within a defined extent, we first need to measure the extent of the reference image.
gdalinfo utility is useful for this (as is the image properties dialog in QGIS).
I recently created a GDAL-esque tool that can help automate this process called
After using it to extract the bounds of the aerial photograph,
airphoto.tiff as a GeoJSON file, I can convert it to a Shapefile, which is the expected format for cutline features when clipping with GDAL.
As the input file to
ogr2ogr is GeoJSON, I have to tell it the spatial reference system (SRS) it should expect from the source file with the
python gdal_extent.py -geojson airphoto.tiff > extent.json ogr2ogr -f "ESRI Shapefile" -s_srs "EPSG:32617" extent.shp extent.json
Alternatively, I could have just created a Shapefile that represents the bounds of my aerial photograph any other way.
Also, my approach generates a rectangular bounding box whereas we could use any more complex polygon.
So, assuming that you have a file
extent.shp that somehow represents the bounds of your image, it’s time to clip our Landsat image (
L7Image.tiff) to this extent.
gdalwarp; note that I specify the output resolution as 90 meters in both the horizontal and vertical directions.
gdalwarp -cutline extent.shp -cl extent -crop_to_cutline -tr 90 90 L7Image.tiff grid_90m.tiff
Generating a Sampling Grid in QGIS
This is where QGIS comes in. A close cousin of ArcGIS’ Fishnet tool, QGIS’ “Vector grid” tool (under “Vector, Research Tools”) is what I’ll use to convert the pixels of my Landsat image to polygons.
The output Shapefile should resemble a grid of pixels.
Selecting a Subset of Random Validation points
In the last step, I must downselect from this grid of pixels a random subset. Here, I’ll present two techniques for introducing a stochastic selection process.
Random Selection within the Database
The simplest way would be to let a file database randomly select features from my polygon grid.
In this example, I’ll generate KML output because, for another application, I want to validate my points in Google Earth.
The output Shapefile from the Vector grid tool is
vector_grid.shp and we’re downselecting 100 validation points:
ogr2ogr -f "KML" -dialect "sqlite" -sql "SELECT * FROM vector_grid ORDER BY RANDOM() LIMIT 100" validation_points.kml vector_grid.shp
Random Selection with Bash
If you don’t trust the randomization capabilities of a database or have your own reason for generating random features, you might prefer this approach.
In this example, the output Shapefile from the Vector grid tool is
vector_grid.shp; we’ll use a combination of Bash built-ins and GDAL tools here:
# Generate 100 randoms, reverse the string, cut the last character, reverse again PIXELS=$(ogrinfo vector_grid.shp -al | grep "Feature Count: " | cut -c 16-) RANDS="$(shuf -i 0-$PIXELS -n 100 | tr '\n' ',' | rev | cut -c 2- | rev)" ogr2ogr -f "ESRI Shapefile" -where "ID IN ($RANDS)" validation_points.shp vector_grid.shp
As an added bonus, this approach allows us to create fields for validating certain quantities directly in our Shapefile.
This is a great way to automate the generation of validation points for the worthwhile consideration of the undergraduate interns, graduate students, and other valuable research assistants working with you.
For example, if I want to validate the fraction of vegetation cover (
VegFrac) and impervious surface cover (
ImpFrac) and record that information directly in my Shapefile (while editing it in QGIS, later), I simply add those fields as part of a
SELECT statement that includes my previous
ogr2ogr -f "ESRI Shapefile" -sql "SELECT ID, 0.0 AS VegFrac, 0.0 AS ImpFrac FROM vector_grid WHERE ID IN ($RANDS)" validation_points.shp vector_grid.shp
Now I’m ready to analyze the areas specified in
validation_points.shp by going through them, one at a time, in QGIS or another desktop GIS tool.
Technically, I used more than Bash and QGIS but GDAL really is the most fundamental tool in Bash, right?