Analysis Ready Datasets
One of the luxuries of my position in the policy division of the Ministry of Natural Resources and Forestry is that I don’t actually own any corporate data; I have the privilege of using the hard work of others to do my job. That said, I do have a lot of clients who rely on the compiled sets of spatial and tabular information I maintain.
As a policy analyst, I often get tagged with “what if” questions from all sorts of sources, from external clients to the ADM or Minister, wanting to know how much of something we have, where it is and what it looks like.
In the 90s and early 2000s, policy analysis often relied on ad-hoc collections of data without any cleaning or validation, and even small data problems could skew provincial-scale answers. On closer analysis, we realized that most of those distortions traced back to duplicate values, redundancies, errors and missing data. With the advent of the Forest Information Manual in 2000, the MNRF began to collect large volumes of spatial data in a very rigorous and standardized format. Even with these rules, there are completely acceptable differences between submissions that can cause errors and double counting.
In 2012 I took the initiative to start enforcing even more standards on our collective data. The bulk of the change was enforcing a standard projection (MNRF Lambert), removing non-standard fields, eliminating micro-slivers (sub-hectare artifacts derived from intersecting administrative data) and erasing overlapping polygons where the rules do not allow overlap (latest data trumps older). Some practitioners questioned this approach, but the actual change in any values is minuscule. I also add extra classification fields that are not part of the standard data submission, both in our operational planning inventories and in our depletion and renewal layers. The old database term for this was “fast, fat and flat”, meaning a larger file, but one that is easier to use.
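For illustration, here is a minimal sketch of what that standardization can look like in Python using GeoPandas. The field names, file layout, one-hectare sliver threshold and the EPSG code for MNRF Lambert are assumptions for this example, not the actual production script:

```python
import geopandas as gpd

MNRF_LAMBERT = "EPSG:3161"  # NAD83 / Ontario MNR Lambert (assumed code)
STANDARD_FIELDS = ["POLYID", "POLYTYPE", "YRSOURCE", "geometry"]
ONE_HECTARE_M2 = 10_000     # micro-sliver threshold

def standardize(path):
    """Reproject, strip non-standard fields, and drop micro-slivers."""
    gdf = gpd.read_file(path).to_crs(MNRF_LAMBERT)
    gdf = gdf[[c for c in STANDARD_FIELDS if c in gdf.columns]]
    # Areas are in square metres in this projected CRS.
    return gdf[gdf.geometry.area >= ONE_HECTARE_M2].copy()

def erase_overlaps(gdf):
    """Where overlap is not allowed, the latest data trumps older."""
    gdf = gdf.sort_values("YRSOURCE", ascending=False)
    footprint, rows = None, []
    for _, row in gdf.iterrows():
        # Clip each polygon by the footprint of everything newer.
        geom = row.geometry if footprint is None else row.geometry.difference(footprint)
        if not geom.is_empty:
            row = row.copy()
            row["geometry"] = geom
            rows.append(row)
        footprint = geom if footprint is None else footprint.union(geom)
    return gpd.GeoDataFrame(rows, geometry="geometry", crs=gdf.crs)
```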
Using these “analysis ready datasets” meant that I could now derive an answer in minutes rather than days or weeks, and the results are consistent and reproducible. As new data arrives in annual reports or forest management planning submissions, it is appended to or replaces the existing sets. Automation is the key (Python), since we have 39 forest management units and manual processing would take months. There are also derived datasets we generate, such as simplified-geometry versions, to enable faster processing or modelling. Dropping resolution to a five-metre vertex rule has brought some modelling runs from days down to minutes.
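As one way to picture that derived product: GeoPandas’ Douglas-Peucker simplification can apply a five-metre vertex rule in a couple of lines. The file names below are hypothetical, and this is a sketch rather than the production workflow:

```python
import geopandas as gpd

# Derive a simplified-geometry copy of an analysis-ready layer.
ard = gpd.read_file("forest_units_ard.gpkg")

# 5 m tolerance, in metres because the layer is already in the
# projected MNRF Lambert CRS; preserve_topology guards against
# producing invalid geometries.
ard["geometry"] = ard.geometry.simplify(5, preserve_topology=True)
ard.to_file("forest_units_ard_simplified.gpkg", driver="GPKG")
```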
So when I get asked about the occurrence of black ash on the landscape as part of the COSEWIC assessment in Ontario, or hemlock in relation to the woolly adelgid, it’s a few very simple lines of Python to derive a layer in under five minutes. Analytics? Right-click the layer and drop it into Tableau for on-the-fly data discovery and summarizing, and share the results with the requesting clients.
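A species query like the black ash request could look like the sketch below. The SPCOMP species-composition field and the “AB” black ash code follow common Ontario forest inventory conventions, but the exact field names and file paths here are illustrative, not my actual script:

```python
import geopandas as gpd

# Pull stands containing black ash from an analysis-ready inventory.
# SPCOMP holds composition strings such as "AB 40 PO 30 BW 30".
fri = gpd.read_file("forest_units_ard.gpkg")
black_ash = fri[fri["SPCOMP"].str.contains("AB", na=False)]
black_ash.to_file("black_ash_occurrence.gpkg", driver="GPKG")

# Quick summary: stand count and total area in hectares.
print(f"{len(black_ash):,} stands; "
      f"{black_ash.geometry.area.sum() / 10_000:,.0f} ha")
```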