Xarray and working with NetCDF data

Xarray builds on NumPy to provide labeled multi-dimensional arrays. Real-world multi-dimensional data cannot be easily represented by raw numbers alone, which is all a NumPy array holds. For example, weather datasets contain several data variables (such as air temperature, specific humidity, and wind speed), coordinate variables (such as latitude and longitude), and dimensions.

This is what a typical weather dataset structure looks like:

No alt text provided for this image

Data Structures in xarray:

xarray provides two main data structures:

  • DataArray
  • Dataset

The DataArray class implements a multi-dimensional array with named dimensions, coordinates, and attributes attached to it.

A Dataset is a collection of DataArrays. It is a dictionary-like container that maps each variable name to the DataArray it holds.
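
As a minimal sketch of the two structures, the snippet below builds a DataArray and wraps it in a Dataset. The values are synthetic, and the dimension names lat and lon, the coordinate values, and the attribute text are assumptions chosen for illustration, not taken from a real weather file.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic values standing in for real temperatures (kelvin).
time = pd.date_range("2000-01-01", periods=3, freq="MS")
lat = [10.0, 20.0]
lon = [70.0, 80.0, 90.0]
values = 280.0 + np.arange(3 * 2 * 3, dtype=float).reshape(3, 2, 3)

# A DataArray: values plus dimension names, coordinates, and attributes.
tas = xr.DataArray(
    values,
    dims=("time", "lat", "lon"),
    coords={"time": time, "lat": lat, "lon": lon},
    attrs={"units": "K", "long_name": "near-surface air temperature"},
    name="tas",
)

# A Dataset: a dictionary-like container mapping variable names to DataArrays.
ds = xr.Dataset({"tas": tas}, attrs={"title": "toy dataset"})

print(ds)                   # shows dimensions, coordinates, data variables, attributes
print(list(ds.data_vars))   # the variable names held by the Dataset
```

Printing the Dataset gives the same kind of summary the article walks through: dimensions, coordinates, data variables, and attributes.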

Below is an example of a dataset that holds the variable tas (near-surface air temperature).

No alt text provided for this image

We can break this output down into the following parts:

  • Dimensions: time, bands, latitude, and longitude, along with their lengths
  • Coordinates, each listed with the dimensions it is mapped to in parentheses (a dimension may or may not have a coordinate). The coordinates themselves form a dictionary-like container that maps coordinate names to values. For example:

No alt text provided for this image

Here, time is the coordinate name, (time) is the dimension to which it is attached, and datetime64[ns] is the data type of the individual values of the time coordinate. The values in the array are the individual coordinate values and can be visualized as the ticks on an axis of a graph. (There are also non-dimension coordinates, but we will not go into those here.)

  • Data variables, which are themselves DataArrays, along with their dimensions in parentheses. These dimensions are a subset of the full list of dimensions shown at the top, at the dataset level.
  • Attributes

We can try to access a single array out of this dataset. Since the main variable of importance here is ‘tas’, we can try selecting that:

No alt text provided for this image
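
A minimal sketch of pulling one variable out of a Dataset, using a small synthetic dataset (the lat and lon names and all values are assumptions). Note that the time dimension here has no coordinate attached, which xarray allows.

```python
import numpy as np
import xarray as xr

# Toy dataset with a single variable named tas.
ds = xr.Dataset(
    {"tas": (("time", "lat", "lon"), np.zeros((2, 2, 3)))},
    coords={"lat": [10.0, 20.0], "lon": [70.0, 80.0, 90.0]},
)

tas_by_key = ds["tas"]   # dictionary-style access
tas_by_attr = ds.tas     # attribute-style access to the same variable

print(type(tas_by_key).__name__)
print(tas_by_key.dims)
```

Dictionary-style access is the safer habit, since attribute access fails for variable names that clash with Dataset methods.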


Working with NetCDF data

Downloading, extracting, and reading the data into an xarray Dataset

The code snippet below downloads a CMIP6 dataset from the Copernicus Climate Data Store. The selected region roughly covers India, though not exactly:

No alt text provided for this image

This downloads one file for each variable listed in the vars list. The downloaded files are zipped. The snippet below can be used to unzip them:

No alt text provided for this image
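
One way to do the extraction, sketched with the standard library only. The helper name extract_all and the file names are hypothetical; the demo builds a throwaway archive so that the sketch runs without a real download.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_all(zip_dir: Path, out_dir: Path) -> list[Path]:
    """Extract every .zip archive found in zip_dir into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for archive in sorted(zip_dir.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(out_dir)
    return sorted(out_dir.iterdir())


# Demo on a temporary archive; the content is placeholder bytes, not real NetCDF.
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    with zipfile.ZipFile(tmp_path / "demo.zip", "w") as zf:
        zf.writestr("demo_tas.nc", b"placeholder bytes")
    extracted = extract_all(tmp_path, tmp_path / "extracted")
    extracted_names = [p.name for p in extracted]

print(extracted_names)
```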

Reading the downloaded NetCDF files into xarray datasets:

No alt text provided for this image
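
A hedged sketch of the reading step: xr.open_dataset opens one NetCDF file, and a small helper can collect several files into a dictionary. The helper name open_netcdf_files is hypothetical, and opening real files assumes a NetCDF backend such as netCDF4 or h5netcdf is installed.

```python
import tempfile
from pathlib import Path

import xarray as xr


def open_netcdf_files(folder):
    """Open every .nc file in folder; keys are the file names without suffix."""
    return {p.stem: xr.open_dataset(p) for p in sorted(Path(folder).glob("*.nc"))}


# With no .nc files present, the helper simply returns an empty dict.
with tempfile.TemporaryDirectory() as tmp:
    datasets = open_netcdf_files(tmp)

print(datasets)
```

Pointed at the folder holding the extracted files, the helper would return one Dataset per file.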


Exploring one of the datasets:

No alt text provided for this image

Selecting one variable of the dataset individually, which is itself a DataArray:

No alt text provided for this image


Selecting data through different kinds of indexing:

We can use .isel() and .sel() to index data from the tas variable within the dataset ds_cmip6_tas.

No alt text provided for this image
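
A minimal sketch of position-based indexing with .isel() on a synthetic array (the lat and lon names and the values are assumptions, not the CMIP6 data):

```python
import numpy as np
import xarray as xr

tas = xr.DataArray(
    np.arange(2 * 2 * 3, dtype=float).reshape(2, 2, 3),
    dims=("time", "lat", "lon"),
    coords={"lat": [10.0, 20.0], "lon": [70.0, 80.0, 90.0]},
    name="tas",
)

# Integer position 0 along lon, regardless of what the label is.
first_lon = tas.isel(lon=0)

print(first_lon.dims)         # lon is no longer a dimension
print(first_lon.lon.item())   # lon is kept as a scalar coordinate
```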

In the above example, position-based indexing has been used. We can make the same selection by passing the label of the first longitude using the .sel() method.

No alt text provided for this image

Selecting by the first longitude label:

No alt text provided for this image
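
The equivalence between the two kinds of indexing can be checked on a synthetic array (names and values are assumptions): selecting position 0 with .isel() and selecting the first longitude label with .sel() return the same data.

```python
import numpy as np
import xarray as xr

tas = xr.DataArray(
    np.arange(2 * 2 * 3, dtype=float).reshape(2, 2, 3),
    dims=("time", "lat", "lon"),
    coords={"lat": [10.0, 20.0], "lon": [70.0, 80.0, 90.0]},
    name="tas",
)

by_position = tas.isel(lon=0)          # position-based
first_label = tas.lon.values[0]        # the label stored at position 0
by_label = tas.sel(lon=first_label)    # label-based

print(by_label.equals(by_position))
```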

Another useful feature is the method='nearest' argument of .sel(). If we do not know the exact label of a coordinate, we can pass an approximate value together with method='nearest', and xarray picks the closest existing label.

No alt text provided for this image
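
A small sketch of nearest-label lookup on synthetic data (the coordinate values are assumptions). The requested longitude 78.6 is not a label of the array, so the closest one is used instead.

```python
import numpy as np
import xarray as xr

tas = xr.DataArray(
    np.arange(2 * 3, dtype=float).reshape(2, 3),
    dims=("lat", "lon"),
    coords={"lat": [10.0, 20.0], "lon": [70.0, 80.0, 90.0]},
    name="tas",
)

# 78.6 is not an exact label; method="nearest" snaps to the closest one.
near = tas.sel(lon=78.6, method="nearest")

print(near.lon.item())
print(near.values)
```

Without the method argument, the same call raises a KeyError because 78.6 is not an exact label.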

We can also select a value if we know the coordinate labels for more than one dimension:

No alt text provided for this image
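
A sketch of selecting a single value by giving a label for every dimension, on synthetic data (dates, names, and values are assumptions):

```python
import numpy as np
import pandas as pd
import xarray as xr

tas = xr.DataArray(
    np.arange(2 * 2 * 3, dtype=float).reshape(2, 2, 3),
    dims=("time", "lat", "lon"),
    coords={
        "time": pd.date_range("2000-01-01", periods=2, freq="MS"),
        "lat": [10.0, 20.0],
        "lon": [70.0, 80.0, 90.0],
    },
    name="tas",
)

# One label per dimension narrows the array down to a single value.
point = tas.sel(time=pd.Timestamp("2000-02-01"), lat=20.0, lon=80.0)

print(point.dims)    # no dimensions left
print(point.item())  # the selected scalar
```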

Masking with .where():

No alt text provided for this image

All values outside the mask are replaced with NaN.

No alt text provided for this image
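
A minimal sketch of masking on synthetic data. The mask used here, a latitude and longitude box, is an assumption for illustration and may differ from the mask shown in the article.

```python
import numpy as np
import xarray as xr

tas = xr.DataArray(
    np.arange(2 * 2 * 3, dtype=float).reshape(2, 2, 3),
    dims=("time", "lat", "lon"),
    coords={"lat": [10.0, 20.0], "lon": [70.0, 80.0, 90.0]},
    name="tas",
)

# Boolean mask built from the coordinates; it broadcasts against tas.
inside_box = (tas.lat >= 15.0) & (tas.lon <= 80.0)

# Values where the mask is False become NaN; the shape is unchanged.
masked = tas.where(inside_box)

print(masked.values)
```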


Visualizing NetCDF data

Plotting temperature difference between two dates

No alt text provided for this image
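
The data behind such a plot can be sketched as the difference of two time selections. The dates and values below are synthetic assumptions, and the plotting call is left as a comment because it needs matplotlib.

```python
import numpy as np
import pandas as pd
import xarray as xr

tas = xr.DataArray(
    np.arange(2 * 2 * 3, dtype=float).reshape(2, 2, 3),
    dims=("time", "lat", "lon"),
    coords={
        "time": pd.date_range("2000-01-01", periods=2, freq="MS"),
        "lat": [10.0, 20.0],
        "lon": [70.0, 80.0, 90.0],
    },
    name="tas",
)

later = tas.sel(time=pd.Timestamp("2000-02-01"))
earlier = tas.sel(time=pd.Timestamp("2000-01-01"))

# Subtraction aligns on lat and lon, leaving a two-dimensional field.
diff = later - earlier

print(diff.dims)
# diff.plot()  # would draw the lat/lon map; requires matplotlib
```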

Plotting aggregations

No alt text provided for this image
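
Which aggregation the article plots is not stated, so the sketch below assumes two common ones on synthetic data: a mean over time, which leaves a lat/lon map, and a mean over space, which leaves a time series.

```python
import numpy as np
import xarray as xr

tas = xr.DataArray(
    np.arange(2 * 2 * 3, dtype=float).reshape(2, 2, 3),
    dims=("time", "lat", "lon"),
    coords={"lat": [10.0, 20.0], "lon": [70.0, 80.0, 90.0]},
    name="tas",
)

# Collapse the time dimension: one mean value per grid cell.
time_mean = tas.mean(dim="time")

# Collapse both spatial dimensions: one mean value per time step.
spatial_mean = tas.mean(dim=["lat", "lon"])

print(time_mean.dims, spatial_mean.dims)
# time_mean.plot() or spatial_mean.plot() would draw them; requires matplotlib
```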

Plotting aggregations over groups

The plot below shows the mean temperature of each month across all years. The plotted data is two-dimensional.

No alt text provided for this image

The plot below shows the median values for each month across all years. The medians are calculated over the latitude and longitude dimensions, so the plotted data is one-dimensional.

No alt text provided for this image
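
A hedged sketch of a grouped median, assuming the grouping is by calendar month via groupby("time.month"). The data is synthetic: every grid cell holds the month number, so the expected median for each month is easy to check by eye.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of monthly steps; each value equals its calendar month number.
time = pd.date_range("2000-01-01", periods=24, freq="MS")
month_numbers = time.month.values.astype(float)
values = month_numbers[:, None, None] * np.ones((24, 2, 3))

tas = xr.DataArray(
    values,
    dims=("time", "lat", "lon"),
    coords={"time": time, "lat": [10.0, 20.0], "lon": [70.0, 80.0, 90.0]},
    name="tas",
)

# dim=... reduces over every dimension within each group (time, lat, lon),
# leaving a one-dimensional result indexed by month.
monthly_median = tas.groupby("time.month").median(dim=...)

print(monthly_median.dims)
print(monthly_median.values)
# monthly_median.plot() would draw the 12-point line; requires matplotlib
```

Passing dim="time" instead keeps lat and lon, giving one map per month rather than a single line.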

More articles by Danish Ansari