Analysis Ready Datasets
One of the luxuries of my position in the policy division of the Ministry of Natural Resources and Forestry is that I don’t actually own any corporate data; I have the privilege of using the hard work of others to do my job. That said, I do have a lot of clients who rely on the compiled sets of spatial and tabular information I maintain.
As a policy analyst, I often get tagged with “what if” questions from all sorts of sources, from external clients to the ADM or Minister, wanting to know how much of something we have, where it is and what it looks like.
In the 90s and early 2000s, policy analysis often relied on ad-hoc collections of data without any cleaning or validation, and even small data problems could skew provincial-scale answers. On closer analysis, we realized that most of those distortions traced back to duplicate values, redundancies, errors and missing data. With the advent of the Forest Information Manual in 2000, the MNRF began to collect large volumes of spatial data in a very rigorous and standardized format. Even with these rules, there are completely acceptable differences between submissions that can cause errors and double counting.
In 2012 I took the initiative to start enforcing even more standards on our collective data. The bulk of the change was enforcing a standard projection (MNRF Lambert), removing non-standard fields, eliminating micro-slivers (sub-hectare artifacts derived from intersecting administrative data) and erasing overlapping polygons where the rules do not allow overlap (latest data trumps older). Some practitioners questioned this approach, but the actual change in any values is minuscule. I also add extra classification fields that are not part of the standard data submission, both in our operational planning inventories and in our depletion and renewal layers. The old database term for this was “fast, fat and flat”, meaning a larger file, but one that is easier to use.
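For illustration, here is a minimal sketch of what that standardization can look like in Python using GeoPandas. The field names, file layout, one-hectare sliver threshold and the EPSG code for MNRF Lambert are assumptions for this example, not the actual production script:

```python
import geopandas as gpd

MNRF_LAMBERT = "EPSG:3161"  # NAD83 / Ontario MNR Lambert (assumed code)
STANDARD_FIELDS = ["POLYID", "POLYTYPE", "YRSOURCE", "geometry"]
ONE_HECTARE_M2 = 10_000     # micro-sliver threshold

def standardize(path):
    """Reproject, strip non-standard fields, and drop micro-slivers."""
    gdf = gpd.read_file(path).to_crs(MNRF_LAMBERT)
    gdf = gdf[[c for c in STANDARD_FIELDS if c in gdf.columns]]
    # Areas are in square metres in this projected CRS.
    return gdf[gdf.geometry.area >= ONE_HECTARE_M2].copy()

def erase_overlaps(gdf):
    """Where overlap is not allowed, the latest data trumps older."""
    gdf = gdf.sort_values("YRSOURCE", ascending=False)
    footprint, rows = None, []
    for _, row in gdf.iterrows():
        # Clip each polygon by the footprint of everything newer.
        geom = row.geometry if footprint is None else row.geometry.difference(footprint)
        if not geom.is_empty:
            row = row.copy()
            row["geometry"] = geom
            rows.append(row)
        footprint = geom if footprint is None else footprint.union(geom)
    return gpd.GeoDataFrame(rows, geometry="geometry", crs=gdf.crs)
```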
Using these “analysis ready datasets” meant that I could now derive an answer in minutes rather than days or weeks, and the results are consistent and reproducible. As new data arrives in annual reports or forest management planning submissions, it is appended to or replaces the existing sets. Automation is the key (Python), since we have 39 forest management units and manual processing would take months. There are also derived datasets we generate, such as simplified-geometry versions, to enable faster processing or modelling. Dropping resolution to a five-metre vertex rule has brought some modelling runs from days down to minutes.
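As one way to picture that derived product: GeoPandas’ Douglas-Peucker simplification can apply a five-metre vertex rule in a couple of lines. The file names below are hypothetical, and this is a sketch rather than the production workflow:

```python
import geopandas as gpd

# Derive a simplified-geometry copy of an analysis-ready layer.
ard = gpd.read_file("forest_units_ard.gpkg")

# 5 m tolerance, in metres because the layer is already in the
# projected MNRF Lambert CRS; preserve_topology guards against
# producing invalid geometries.
ard["geometry"] = ard.geometry.simplify(5, preserve_topology=True)
ard.to_file("forest_units_ard_simplified.gpkg", driver="GPKG")
```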
So when I get asked about the occurrence of black ash on the landscape as part of the COSEWIC assessment in Ontario, or hemlock in relation to the woolly adelgid, it’s a few very simple lines of Python to derive a layer in under five minutes. Analytics? Right-click the layer and drop it into Tableau for on-the-fly data discovery and summarizing, and share the results with the requesting clients.
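A species query like the black ash request could look like the sketch below. The SPCOMP species-composition field and the “AB” black ash code follow common Ontario forest inventory conventions, but the exact field names and file paths here are illustrative, not my actual script:

```python
import geopandas as gpd

# Pull stands containing black ash from an analysis-ready inventory.
# SPCOMP holds composition strings such as "AB 40 PO 30 BW 30".
fri = gpd.read_file("forest_units_ard.gpkg")
black_ash = fri[fri["SPCOMP"].str.contains("AB", na=False)]
black_ash.to_file("black_ash_occurrence.gpkg", driver="GPKG")

# Quick summary: stand count and total area in hectares.
print(f"{len(black_ash):,} stands; "
      f"{black_ash.geometry.area.sum() / 10_000:,.0f} ha")
```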