Statistical analysis of spatial data in agro-environmental research
Lecture on spatio-temporal data analysis in R

This year I was invited to teach a couple of days on the use of R for spatio-temporal data analysis and machine learning for environmental data. The lectures were part of a summer school organized by the universities of Milan and Pavia, held in Villa del Grumello on beautiful Lake Como and funded by the Fondazione Alessandro Volta (https://sdae.lakecomoschool.org/).

Organizing the summer school was a massive effort led by my good friend Calogero Schillaci and his colleagues. I am extremely thankful to him personally and to all the professors, researchers and students involved in this first year of the school for making the whole experience such an enjoyable event. The school received around 80 applications, and after careful consideration the best 27 students were selected from both academia and industry. The students' backgrounds were very mixed, but they all shared an interest in spatial and temporal data. This shows very clearly that the trend in data analysis, particularly in the environmental sector (in the general sense, because the same principle applies to the water sector), is a gradual increase in the collection of data that have both a spatial and a temporal component (coming from sensors, satellites, drones and machinery).

For this reason the focus on statistical techniques to analyse spatio-temporal data was particularly welcomed by the wider scientific community (in which I include data scientists like me, who are always interested in training to stay up to date with the latest developments). The school was particularly successful because it involved only experts in the fields of statistics, spatial statistics and machine learning. We do hope we can replicate the whole thing next year, because we saw real interest from all the students. Unfortunately we had to limit the class to 27 students for logistical reasons, but we would have loved to include more people and help them stay up to date. We are also thinking about another event on advanced machine learning, where we can focus more on creating models for production. For the time being this is just an idea, and it would be nice to understand whether it would be of interest to the data science community.

I personally developed a lecture on data manipulation in R, where I covered tabular data manipulation, static and interactive plots, time-series analysis, and finally spatial data analysis and the creation of beautiful interactive maps with leaflet. Since I knew that not all students were advanced R users and not all of them had experience in data analysis, I developed a detailed markdown document in which I tried to explain the whole process, and the final HTML document was shared with the students via a Dropbox folder. This allowed the students to revisit the material after the lecture and fully digest it. I had really good feedback from the students, which I was really pleased about: I spent some of my free time developing the material, and it is always pleasing when this is recognized and appreciated. I also helped deliver the lecture on tree-based machine learning methods, led by Calogero.

I taught the use of dplyr, ggplot2, plotly, tsibble, fable, sf, tmap and leaflet.

Generally speaking the whole process was reasonably smooth, even though some students struggled to install some of the packages and were forced to update R and/or RStudio to make the code work. Luckily I was assisted in the room by Calogero, who helped me manage the technical issues and make the day a success. It would be good to test RStudio Cloud to see whether this platform can solve some of the technical issues. I do wonder, however, whether working on their own laptops and dealing with technical challenges is actually a better way for students to improve their ability to work effectively with R (just a thought!).

In hindsight I think for next time I will add material about raster data manipulation because a lot of the students were interested in that topic. Unfortunately this is something that I do not do much at WRc (we mostly deal with vector data) and therefore I did not think about it when I was developing the course. However, I was able to point students to my blog (http://r-video-tutorial.blogspot.com/) where I shared material on the use of the package raster. I need to use this article as a personal note to my future self to remind me that I need to add material to the course.

I had a lot of fun teaching in the summer school (even though I had forgotten how hard it is to teach for a full day!) and I really do hope we can repeat this experience next year and maybe hold an event on advanced machine learning. I am extremely thankful to Calogero Schillaci for organizing the event, to the two school directors Marco Acutis and Michael Märker for being so thoughtful as to create personalized certificates for all teachers (which I will definitely keep on my desk!!), and to all the teachers and students. I hope the professional relationships we started at the school continue to be successful in the future.

Thank you Marco Acutis and Michael Märker for the certificate!!

To finish the post I thought of sharing the view of beautiful Lake Como that teachers and students enjoyed at each coffee and lunch break throughout the week. If you are jealous, do not worry: it is totally normal and can be solved by coming next year to the second edition of the summer school!


I do enjoy your posts Fabio👍

Thank you for share

Fabio, your lecture was really interesting and the markdown is extremely useful! Thanks so much for putting this together and sharing with us!

More articles by Fabio Veronesi
