Plotting GISTEMP climate data with Python, and why it matters
Intended audience: Early Career/Students; Educators; Science communicators
By Vasco Mantas (June 2025)
🎯 Learning objectives:
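This article provides a concise, end‑to‑end workflow for processing NASA GISS Surface Temperature Analysis (GISTEMP v4) data in Google Colab. Using a single compressed NetCDF file (gistemp1200_GHCNv4_ERSSTv5.nc.gz), we demonstrate how to decompress and read the file, compute an area-weighted global mean, and plot three figures: a monthly anomaly heat map, a global mean time series, and a map of the latest 12‑month mean anomaly.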
All steps can be completed in under two minutes on standard Colab hardware.
❓ Target audience: This introductory tutorial is aimed at students and early career professionals who want to learn about the dataset and quick ways to plot it. Educators and science communicators with similar goals should also find it useful.
Background on the dataset used
GISTEMP v4 (GISS Surface Temperature Analysis, version 4) is a climate dataset produced by NASA's Goddard Institute for Space Studies. It combines meteorological station data (NOAA GHCN v4) with sea surface temperature records (ERSST v5) to estimate changes in global surface temperature. The dataset provides a monthly record from 1880 to the present day, and its uncertainty is carefully quantified. Because GISTEMP uses interpolation to fill data gaps, it offers a comprehensive view of global and regional temperature change, which makes it a widely used resource for climate researchers.
Step 1. Download GISTEMP Data
Start by obtaining the dataset. Download the compressed NetCDF file gistemp1200_GHCNv4_ERSSTv5.nc.gz from the official GISTEMP website provided by NASA and place it in your Google Drive, where the later steps expect to find it.
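One way to fetch the file from inside the notebook is sketched below. The URL is an assumption based on the public GISTEMP data directory and may change, so check it against the GISTEMP website before relying on it. `fetch_if_missing` is a hypothetical helper written for this example, not part of any library.

```python
import os
import shutil
import urllib.request

# Assumed location of the file on the NASA GISS server; verify before use.
GISTEMP_URL = "https://data.giss.nasa.gov/pub/gistemp/gistemp1200_GHCNv4_ERSSTv5.nc.gz"

def fetch_if_missing(url, dest):
    """Download url to dest unless dest already exists; return True if downloaded."""
    if os.path.exists(dest):
        return False
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return True

# Example call (left commented so nothing is downloaded until you choose to):
# fetch_if_missing(GISTEMP_URL, "/content/gistemp1200_GHCNv4_ERSSTv5.nc.gz")
```

Skipping the download when the file already exists keeps repeated notebook runs fast and avoids unnecessary load on the server.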
Step 2. Computing Environment
The workflow assumes:
Step 3. Setup and data preparation
Copy each cell into a Colab notebook as you read.
a. Mount the Google Drive
from google.colab import drive
drive.mount('/content/drive')
b. Install and import libraries (first run only)
!apt-get -qq update
!apt-get -qq install -y proj-bin libproj-dev libgeos-dev > /dev/null
!pip -q install xarray netCDF4 cartopy cmocean pandas matplotlib tqdm
import os, gzip, shutil, datetime
import numpy as np, xarray as xr, matplotlib.pyplot as plt, cartopy.crs as ccrs
import pandas as pd, cmocean
plt.rcParams["figure.dpi"] = 120 # higher‑resolution plots
c. Define file paths
BASE = "/content/drive/My Drive/Colab Notebooks/2025/gisstemp2"
GZ_IN = f"{BASE}/gistemp1200_GHCNv4_ERSSTv5.nc.gz"
NC_OUT = GZ_IN[:-3] # remove the .gz extension
d. Decompress NetCDF
if not os.path.exists(NC_OUT):
    with gzip.open(GZ_IN, "rb") as fin, open(NC_OUT, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    print(f"Decompressed → {NC_OUT}")
else:
    print("NetCDF already available.")
e. Read the dataset
ds = xr.open_dataset(NC_OUT) # dimensions: time, lat, lon
da = ds["tempanomaly"] # surface‑temperature anomaly (°C)
f. Calculate the area-weighted global mean
# cos(latitude) is proportional to grid-cell area on a regular lat-lon grid
weights = np.cos(np.deg2rad(ds["lat"]))
global_ts = (da.weighted(weights)
               .mean(dim=("lat", "lon"))
               .to_series()
               .sort_index())
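To see why the cos(latitude) weights matter, here is a minimal NumPy sketch on a made-up field (not GISTEMP data). On a regular latitude-longitude grid the cells shrink toward the poles, so an unweighted mean gives polar values too much influence. The 2-degree grid and the anomaly values are hypothetical.

```python
import numpy as np

# Hypothetical 2-degree grid with cell centres at -89, -87, ..., 89
lat = np.arange(-89.0, 90.0, 2.0)
lon = np.arange(-179.0, 180.0, 2.0)

# Invented anomaly field: 3 °C poleward of 60 degrees, 0 °C elsewhere
polar = np.where(np.abs(lat)[:, None] > 60.0, 3.0, 0.0)
field = polar * np.ones((lat.size, lon.size))

weights = np.cos(np.deg2rad(lat))        # proportional to cell area
zonal_mean = field.mean(axis=1)          # average over longitude first
weighted_mean = np.average(zonal_mean, weights=weights)
unweighted_mean = field.mean()

print(f"unweighted: {unweighted_mean:.2f} °C, area-weighted: {weighted_mean:.2f} °C")
```

With these invented numbers the unweighted mean is 1.0 °C, while the area-weighted mean is about 0.4 °C, because the region poleward of 60 degrees covers only about 13% of the globe. Unlike xarray's `weighted`, this sketch does not handle missing values.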
Step 4. Visualisations
All figures use the 1951–1980 baseline for consistency with NASA GISS publications.
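An anomaly is a temperature minus its average over a reference period, here 1951–1980. The sketch below illustrates that arithmetic on synthetic annual temperatures. The numbers are invented, and this is not the GISTEMP procedure, only an illustration of what "relative to a baseline" means.

```python
import numpy as np

years = np.arange(1880, 2025)
# Synthetic absolute temperatures (°C): a slow linear warming, invented for illustration
temps = 13.7 + 0.008 * (years - 1880)

in_baseline = (years >= 1951) & (years <= 1980)   # the 30 baseline years
baseline_mean = temps[in_baseline].mean()
anomaly = temps - baseline_mean                   # zero on average over the baseline

print(f"baseline mean: {baseline_mean:.2f} °C")
print(f"anomaly in {years[-1]}: {anomaly[-1]:+.2f} °C")
```

By construction the anomalies average to zero over the baseline years, so positive values mean "warmer than the 1951–1980 average".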
Figure 1. Monthly Anomaly Heat Map (1880 – present)
n_months = (len(global_ts) // 12) * 12  # drop a trailing partial year
heat = global_ts.values[:n_months].reshape(-1, 12)  # rows = years, columns = months
start_year = global_ts.index[0].year
end_year = start_year + heat.shape[0] - 1  # last complete year shown
plt.imshow(heat, aspect="auto", origin="lower",
           cmap="cmo.thermal", vmin=-1, vmax=1,
           extent=[0.5, 12.5, start_year - 0.5, end_year + 0.5])  # label axes with months and years
plt.xlabel("Month"); plt.ylabel("Year")
plt.title(f"GISTEMP Monthly Anomalies {start_year} – {end_year}")
plt.colorbar(label="°C relative to 1951–1980"); plt.tight_layout()
plt.show()
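The reshape step is easier to follow on a toy series. This sketch uses 30 invented monthly values (two full years plus six extra months) to show how the trailing partial year is dropped and how each row becomes one year. It assumes, as in the GISTEMP file, that the series starts in January.

```python
import numpy as np

monthly = np.arange(30, dtype=float)        # invented values: 2 full years + 6 months
n_months = (len(monthly) // 12) * 12        # largest multiple of 12 that fits
grid = monthly[:n_months].reshape(-1, 12)   # one row per year, one column per month

print(grid.shape)    # (2, 12)
print(grid[1, 0])    # 12.0, i.e. January of the second year
```

The six leftover months are not plotted, which is why the heat map ends at the last complete year.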
Figure 2. Global Mean Temperature Anomaly Time Series
plt.figure(figsize=(10, 5))
global_ts.plot()
plt.title("Global Mean Surface‑Temperature Anomaly")
plt.xlabel("Year"); plt.ylabel("°C relative to 1951–1980")
plt.grid(True); plt.tight_layout(); plt.show()
Figure 3. Spatial Distribution of the Latest 12‑Month Mean Anomaly (May 2024 – April 2025)
latest_time = da["time"].max().values
start_period = pd.to_datetime(latest_time) - pd.DateOffset(months=11)
recent_12 = da.sel(time=slice(start_period, latest_time)).mean("time")
fig = plt.figure(figsize=(7.5, 4))
ax = plt.axes(projection=ccrs.Robinson())
ax.set_global(); ax.coastlines(linewidth=0.4)
mesh = ax.pcolormesh(recent_12["lon"], recent_12["lat"], recent_12,
                     transform=ccrs.PlateCarree(), cmap="cmo.balance",
                     vmin=-2, vmax=2)
plt.colorbar(mesh, orientation="horizontal", pad=0.05,
             label="12‑month mean anomaly (°C)")
ax.set_title("Most Recent 12‑Month Mean Surface‑Temperature Anomaly")
plt.show()
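The window arithmetic can be checked with NumPy month-resolution dates. Assuming the latest time step is April 2025 (your file may end later), stepping back 11 months gives the first of 12 monthly steps.

```python
import numpy as np

latest = np.datetime64("2025-04", "M")      # assumed last month in the file
start = latest - np.timedelta64(11, "M")    # 11 months earlier
one_month = np.timedelta64(1, "M")
window = np.arange(start, latest + one_month)  # monthly steps, inclusive of latest

print(start, latest, len(window))
```

Under that assumption the window runs from 2024-05 to 2025-04 and contains exactly 12 months, which is why the offset is 11 rather than 12.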
Key Takeaways
For questions or suggestions, please contact the author via LinkedIn.
📖 GISTEMP citation: