Exploratory Spatial Data Analysis: Spatial Autocorrelation and Choropleth Maps
Illustrations of spatial autocorrelation. From Radil, S. M. (2011). University of Illinois at Urbana-Champaign.

To keep this article readable, the full Python code is not reproduced here. The complete code can be found in the Jupyter Notebook file (.ipynb) on my GitHub:

https://github.com/ThucDao/ExploratorySpatialDataAnalysis

As a data analyst, you are probably familiar with Exploratory Data Analysis (EDA). It helps uncover patterns and relationships between variables. To measure those relationships, you calculate correlation coefficients and visualize them in a heat map.

What if you want to know how a variable relates to location, so that you can see its geographic patterns? Standard EDA falls short here because it treats location data like any other feature. You need a different kind of analysis: Exploratory Spatial Data Analysis (ESDA). Instead of correlation and a heat map, the measure is spatial autocorrelation, and the visualization is a choropleth map.

Rather than continuing with the theory, let's jump into an example: Airbnb average listing prices in Canadian cities in 2022.

Here is the process:

1. Import required libraries

2. Load listings data and neighbourhoods geodata of the chosen city

3. Convert the listings data to listings geodata

4. Join listings geodata and neighbourhoods geodata

5. Calculate the average price of every neighbourhood

6. Create an interactive choropleth map of the average listing price in the chosen city

(I will also show the difference between the three classifications of the choropleth map.)

7. Determine the global spatial autocorrelation with the Moran's I statistic to test for the presence of clusters.

8. Determine the local spatial autocorrelation with LISA statistics and make a choropleth map of the LISA clusters to show where the clusters are in the chosen city.

(I will verify the choropleth maps of some cities with their Moran's I values and p-values from the Moran's I statistics.)

 

STEP 1:

The following Python libraries or modules are needed:

  • pandas for data manipulation
  • matplotlib.pyplot for static visualization
  • folium for interactive geovisualization
  • geopandas for geodata manipulation
  • libpysal for spatial computation
  • esda for statistics and classes in exploratory spatial data analysis
  • splot for connecting spatial analysis done in PySAL (e.g., libpysal) to different popular visualization toolkits like matplotlib.

 

STEPS 2 TO 5:

These steps are straightforward and should not need further explanation. You can refer to the Python code on my GitHub to see how each step is done. At the end of step 5, you should have a data frame of unique neighbourhoods and their average listing prices like this:

[Image: data frame of neighbourhoods and their average listing prices]
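As a library-free illustration of what steps 3 to 5 compute, the sketch below assigns each listing to the neighbourhood polygon that contains it and then averages the prices per neighbourhood. The ray-casting test is a simple stand-in for the spatial join that geopandas performs, not the notebook's actual code. The neighbourhood names, coordinates, prices, and function names are all invented for illustration.

```python
from collections import defaultdict


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    `polygon` is a list of (x, y) vertices; the last vertex connects
    back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def average_price_by_neighbourhood(listings, neighbourhoods):
    """Join listings to neighbourhoods by location, then average the prices."""
    prices = defaultdict(list)
    for listing in listings:
        for name, polygon in neighbourhoods.items():
            if point_in_polygon(listing["longitude"], listing["latitude"], polygon):
                prices[name].append(listing["price"])
                break  # a listing is assigned to at most one neighbourhood
    return {name: sum(p) / len(p) for name, p in prices.items()}


# Hypothetical data: two square neighbourhoods and four listings.
neighbourhoods = {
    "West": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "East": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
listings = [
    {"longitude": 0.2, "latitude": 0.5, "price": 100.0},
    {"longitude": 0.8, "latitude": 0.3, "price": 140.0},
    {"longitude": 1.5, "latitude": 0.5, "price": 200.0},
    {"longitude": 5.0, "latitude": 5.0, "price": 999.0},  # outside both, dropped
]

average_prices = average_price_by_neighbourhood(listings, neighbourhoods)
print(average_prices)
```

Listings that fall outside every polygon are dropped, which is the behaviour one would expect from an inner spatial join.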


STEP 6:

I have made interactive choropleth maps of all six cities, which can be viewed here:

https://interactive-choropleth-map.thucdao.repl.co/

Note that a choropleth map can be classified in several ways. Three common classification schemes are:

1. Classification by equal intervals: divides the range of the data into classes of equal width (here, classes are price ranges). This is the one used in all the interactive choropleth maps above.

[Image: choropleth map classified by equal intervals]

In this classification, it seems that only one neighbourhood (Hampstead) has a high average price, and other neighbourhoods all have low average prices.

 

2. Classification by quantiles: places an equal number of observations in each class (here, it means an equal number of neighbourhoods per price range).

[Image: choropleth map classified by quantiles]

With this classification, high-price and low-price neighbourhoods appear in equal numbers. This is by construction, because every class holds the same number of neighbourhoods.

 

3. Classification by natural breaks: minimizes within-class variance and maximizes between-class differences

[Image: choropleth map classified by natural breaks]

This classification tends to produce classes (price ranges) that follow the natural groupings in the data, because it keeps the variance within each price range as small as possible.
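To make the difference between the first two schemes concrete, here is a minimal sketch in plain Python. The prices are invented, with one deliberate outlier, and the function names are hypothetical. Natural breaks is left out because it needs an optimisation step that does not fit in a few lines.

```python
def equal_interval_breaks(values, k):
    """Upper bounds of k classes of equal width spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k + 1)]


def quantile_breaks(values, k):
    """Upper bounds of k classes holding (roughly) equal numbers of values."""
    ordered = sorted(values)
    n = len(ordered)
    breaks = []
    for i in range(1, k + 1):
        # Linear interpolation between neighbouring order statistics
        pos = (n - 1) * i / k
        lower = int(pos)
        frac = pos - lower
        upper = min(lower + 1, n - 1)
        breaks.append(ordered[lower] + frac * (ordered[upper] - ordered[lower]))
    return breaks


def classify(value, breaks):
    """Index of the first class whose upper bound is >= value."""
    for index, upper in enumerate(breaks):
        if value <= upper:
            return index
    return len(breaks) - 1  # guard against floating-point rounding at the maximum


def class_counts(values, breaks):
    """Number of values that fall into each class."""
    counts = [0] * len(breaks)
    for value in values:
        counts[classify(value, breaks)] += 1
    return counts


# Hypothetical average prices; 600 plays the role of a single expensive area.
prices = [80, 90, 95, 100, 110, 120, 130, 150, 600]

equal_counts = class_counts(prices, equal_interval_breaks(prices, 3))
quantile_counts = class_counts(prices, quantile_breaks(prices, 3))
print("equal intervals:", equal_counts)   # [8, 0, 1]
print("quantiles:      ", quantile_counts)  # [3, 3, 3]
```

With equal intervals the outlier stretches the range, so almost every area lands in the lowest class. With quantiles the classes are balanced regardless of how far apart the prices are.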

 

While these choropleth maps show the average price per neighbourhood grouped into 10 ranges, they do not reveal any spatial pattern on their own. We don't know whether prices are dispersed, clustered, or randomly distributed. And if they are clustered, where are the clusters? We will answer these two questions in step 7 (global spatial autocorrelation) and step 8 (local spatial autocorrelation).

 

STEP 7:

Determine the global spatial autocorrelation with Moran's I statistics.

Moran's I is a measure of spatial autocorrelation. In simple terms, it quantifies how similar the values of neighbouring areas are to one another in a 2-D space.

Moran's I Test uses the following null and alternative hypotheses:

  • Null Hypothesis: The data is randomly distributed in space.
  • Alternative Hypothesis: The data is not randomly distributed in space, i.e., it is either clustered or dispersed in noticeable patterns.

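The value of Moran's I typically ranges from -1 to 1, where:

  • -1: The variable of interest is perfectly dispersed
  • 0: The variable of interest is randomly distributed (strictly, the expected value under randomness is -1/(n - 1) for n areas, which approaches 0 as n grows)
  • 1: The variable of interest is perfectly clustered together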

The corresponding p-value can be used to determine whether the spatial pattern differs from randomness. If the p-value is less than a chosen significance level (e.g., α = 0.05), we can reject the null hypothesis and conclude that the data shows a spatial pattern (clustered if Moran's I is positive, dispersed if it is negative) that is unlikely to have occurred by chance alone.
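For readers who want to see the mechanics, here is a minimal plain-Python sketch of the statistic and a permutation-based pseudo p-value. It uses the common form I = (n / S0) * Σ_i Σ_j w_ij z_i z_j / Σ_i z_i², where z_i is the deviation of area i from the mean, w_ij is the spatial weight between areas i and j, and S0 is the sum of all weights. The sketch assumes row-standardised weights (so n / S0 = 1), a toy grid in which cells sharing an edge are neighbours, and no area without neighbours. The function names and data are hypothetical, and the notebook's library-based computation may differ in details.

```python
import random


def rook_neighbours(rows, cols):
    """Neighbour lists for a rows x cols grid; cells sharing an edge are neighbours."""
    neighbours = []
    for r in range(rows):
        for c in range(cols):
            cell = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cell.append(rr * cols + cc)
            neighbours.append(cell)
    return neighbours


def morans_i(values, neighbours):
    """Global Moran's I with row-standardised weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    # Spatial lag: the average deviation among each area's neighbours
    lag = [sum(z[j] for j in nbrs) / len(nbrs) for nbrs in neighbours]
    return sum(zi * li for zi, li in zip(z, lag)) / sum(zi * zi for zi in z)


def pseudo_p_value(values, neighbours, permutations=999, seed=0):
    """Share of random reshuffles at least as extreme as the observed I."""
    rng = random.Random(seed)
    observed = morans_i(values, neighbours)
    shuffled = list(values)
    at_least = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if morans_i(shuffled, neighbours) >= observed:
            at_least += 1
    # Use whichever tail is smaller, so strong dispersion also counts as extreme
    extreme = min(at_least, permutations - at_least)
    return (extreme + 1) / (permutations + 1)


grid = rook_neighbours(4, 4)
checkerboard = [(r + c) % 2 for r in range(4) for c in range(4)]
two_halves = [0 if c < 2 else 1 for r in range(4) for c in range(4)]

i_checker = morans_i(checkerboard, grid)  # perfectly dispersed: -1.0
i_halves = morans_i(two_halves, grid)     # low half next to high half: positive
p_halves = pseudo_p_value(two_halves, grid)
print(i_checker, round(i_halves, 3), p_halves)
```

With 999 permutations, the smallest pseudo p-value this procedure can return is 1/1000 = 0.001.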

 

Let’s look at the results for the six cities:

Toronto – Moran's I value: 0.37939917945817603 | p-value: 0.001

Vancouver – Moran's I value: 0.25926115056716487 | p-value: 0.014

Victoria – Moran's I value: 0.23031037456272505 | p-value: 0.014

Montreal – Moran's I value: 0.08129999107479662 | p-value: 0.109

Quebec City – Moran's I value: -0.0998559053213681 | p-value: 0.227

Winnipeg – Moran's I value: -0.34351245464294594 | p-value: 0.002

 

With a significance level of 0.05, we can reject the null hypothesis for Toronto, Winnipeg, Vancouver, and Victoria, which all have p-values < 0.05. These cities show evidence of a non-random spatial pattern in prices. Among the four, only Winnipeg has a negative Moran's I value (greater than -0.5), which indicates slightly dispersed prices. The other three cities have positive Moran's I values below 0.5, which can be interpreted as slightly clustered prices.

We do not reject the null hypothesis for Montreal and Quebec City, as their p-values are > 0.05. For these cities, we have no evidence that prices depart from a random spatial pattern. Their Moran's I values, which are close to 0, are consistent with that.
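The decision rule applied above can be written down in a few lines. This is only a sketch: the Moran's I values are the reported ones rounded to three decimals, and the `interpret` helper and its labels are hypothetical.

```python
ALPHA = 0.05

# Moran's I (rounded to three decimals) and p-values as reported above.
results = {
    "Toronto": (0.379, 0.001),
    "Vancouver": (0.259, 0.014),
    "Victoria": (0.230, 0.014),
    "Montreal": (0.081, 0.109),
    "Quebec City": (-0.100, 0.227),
    "Winnipeg": (-0.344, 0.002),
}


def interpret(moran_i, p_value, alpha=ALPHA):
    """Translate a (Moran's I, p-value) pair into a plain-language reading."""
    if p_value >= alpha:
        # Failing to reject the null is not proof of randomness
        return "no significant spatial pattern"
    return "clustered" if moran_i > 0 else "dispersed"


interpretations = {city: interpret(*stats) for city, stats in results.items()}
for city, reading in interpretations.items():
    print(f"{city}: {reading}")
```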

 

STEP 8:

Determine the local spatial autocorrelation with LISA statistics and make a choropleth map of the LISA clusters.

While the global spatial autocorrelation can indicate the existence of clusters (that is, positive spatial autocorrelation in listing prices across neighbourhoods), it does not show where the clusters are. That is where the local spatial autocorrelation, computed with Local Indicators of Spatial Association (LISA) statistics, comes into play.

In general, local Moran's I values are interpreted as follows:

  • Negative: nearby areas are dissimilar or dispersed, e.g., High-Low or Low-High
  • Neutral: nearby areas have no particular relationship, i.e., there is no pattern
  • Positive: nearby areas are similar or clustered, e.g., High-High or Low-Low

LISA uses local Moran's I values to examine localized map regions and assign each area to one of five types:

  1. High-High (HH): the area having high values of the variable is surrounded by neighbors that also have high values
  2. Low-Low (LL): the area having low values of the variable is surrounded by neighbors that also have low values
  3. Low-High (LH): the area having low values of the variable is surrounded by neighbors that have high values
  4. High-Low (HL): the area having high values of the variable is surrounded by neighbors that have low values
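  5. Not Significant (NS): the local statistic of the area is not statistically significant at the chosen significance level

High-High and Low-Low represent positive spatial autocorrelation (clusters), while High-Low and Low-High represent negative spatial autocorrelation (spatial outliers).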
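The four directional types can be illustrated with a small plain-Python sketch. It computes a local Moran's I for each cell of a toy grid as I_i = z_i * lag_i / m2, where lag_i is the average deviation among the neighbours of i and m2 = Σ z_i² / n, then labels the cell from the signs of z_i and lag_i. Scaling conventions for the local statistic differ between implementations, and the significance test that produces the Not Significant category is omitted here, so this is not a full LISA analysis. The function names and data are hypothetical.

```python
def rook_neighbours(rows, cols):
    """Neighbour lists for a rows x cols grid; cells sharing an edge are neighbours."""
    neighbours = []
    for r in range(rows):
        for c in range(cols):
            cell = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cell.append(rr * cols + cc)
            neighbours.append(cell)
    return neighbours


def local_moran(values, neighbours):
    """Return (local I values, quadrant labels) using row-standardised weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(zi * zi for zi in z) / n
    # Spatial lag: the average deviation among each area's neighbours
    lag = [sum(z[j] for j in nbrs) / len(nbrs) for nbrs in neighbours]
    local_i = [zi * li / m2 for zi, li in zip(z, lag)]
    # First letter: the area itself; second letter: its neighbours.
    # Values exactly at the mean are treated as "L" in this sketch.
    labels = [
        ("H" if zi > 0 else "L") + ("H" if li > 0 else "L")
        for zi, li in zip(z, lag)
    ]
    return local_i, labels


# Hypothetical 3 x 3 city: one expensive centre surrounded by cheap areas.
prices = [1, 1, 1,
          1, 10, 1,
          1, 1, 1]
local_i, labels = local_moran(prices, rook_neighbours(3, 3))

print(labels[4], local_i[4])  # centre: HL, local I = -1.0
print(labels[1], local_i[1])  # edge next to the centre: LH, local I = -0.25
print(labels[0], local_i[0])  # corner: LL, local I = 0.125
```

With row-standardised weights and this scaling, the average of the local values equals the global Moran's I, which is why a single strong High-Low outlier can pull the global statistic below zero.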

 

Finally, we make LISA cluster maps from the LISA results. Although LISA cluster maps are also choropleth maps, they do not show the average price per neighbourhood. Instead, they show how the price in each neighbourhood relates to the prices of its neighbours.

 

Let’s view some LISA cluster maps and compare them with Moran's I values and p-values.

[Image: LISA cluster map of Toronto]

Toronto – Moran's I value: 0.37939917945817603 | p-value: 0.001

The map clearly shows some clusters, which is consistent with the Moran's I value and p-value. Because High-High and Low-Low neighbourhoods outnumber High-Low and Low-High ones, the overall trend is positive spatial autocorrelation.

 

[Image: LISA cluster map of Winnipeg]

Winnipeg – Moran's I value: -0.34351245464294594 | p-value: 0.002

The map shows only one High-Low neighbourhood, which is consistent with the negative Moran's I value and hence the overall negative spatial autocorrelation.

 

[Image: LISA cluster map of Vancouver]

Vancouver – Moran's I value: 0.25926115056716487 | p-value: 0.014

The map clearly shows some clusters, which is consistent with the Moran's I value and p-value. Because there are only High-High and Low-Low neighbourhoods, the overall trend is positive spatial autocorrelation.

 

[Image: LISA cluster map of Montreal]

Montreal – Moran's I value: 0.08129999107479662 | p-value: 0.109

From the global spatial autocorrelation, we already know that there is no significant evidence of a spatial pattern in Montreal's prices. The map is consistent with this: only 2 of the 34 neighbourhoods appear as Low-Low.

  

Reference

Radil, Steven M. (2011). Spatializing social networks: making space for theory in spatial analysis. University of Illinois at Urbana-Champaign. Retrieved from https://www.ideals.illinois.edu/handle/2142/26222


Data source: Inside Airbnb

http://insideairbnb.com/get-the-data/

  • Montreal

http://data.insideairbnb.com/canada/qc/montreal/2022-03-12/visualisations/listings.csv

http://data.insideairbnb.com/canada/qc/montreal/2022-03-12/visualisations/neighbourhoods.geojson

  • Quebec City

http://data.insideairbnb.com/canada/qc/quebec-city/2022-03-09/visualisations/listings.csv

http://data.insideairbnb.com/canada/qc/quebec-city/2022-03-09/visualisations/neighbourhoods.geojson

  • Toronto

http://data.insideairbnb.com/canada/on/toronto/2022-03-08/visualisations/listings.csv 

http://data.insideairbnb.com/canada/on/toronto/2022-03-08/visualisations/neighbourhoods.geojson

  • Vancouver

http://data.insideairbnb.com/canada/bc/vancouver/2022-03-10/visualisations/listings.csv

http://data.insideairbnb.com/canada/bc/vancouver/2022-03-10/visualisations/neighbourhoods.geojson

  • Victoria

http://data.insideairbnb.com/canada/bc/victoria/2022-03-29/visualisations/listings.csv

http://data.insideairbnb.com/canada/bc/victoria/2022-03-29/visualisations/neighbourhoods.geojson

  • Winnipeg

http://data.insideairbnb.com/canada/mb/winnipeg/2022-06-08/visualisations/listings.csv

http://data.insideairbnb.com/canada/mb/winnipeg/2022-06-08/visualisations/neighbourhoods.geojson


