from __future__ import annotations

2D datasets in imas2xarray

To work with large datasets, imas2xarray uses xarray to store the data in a more managable format. This notebooks explores how 2D data can be handled, focusing on the psi from the equilibrium ids.

Defining variables

First we must define the relations between the data. This is done via the Variable model.

This gives the name of the variable (user defined), the ids to grab it from, the path to the data. Note that the time dimension can be defined by *, which links this dimension to the variable named time.

2D variables like psi have two dimensions, in this case we use grid/dim1 and grid/dim2. Because these may vary between time slices, we assign them to placeholder dimensions, $dim1 and $dim2. These will be squashed to dim1 and dim2 when imas2xarray loads the variables.

from imas2xarray import Variable

variables = (
    Variable(
        name='dim1',
        ids='equilibrium',
        path='time_slice/*/profiles_2d/0/grid/dim1',
        dims=['time_eq', '$dim1'],
    ),
    Variable(
        name='dim2',
        ids='equilibrium',
        path='time_slice/*/profiles_2d/0/grid/dim2',
        dims=['time_eq', '$dim2'],
    ),
    Variable(
        name='psi',
        ids='equilibrium',
        path='time_slice/*/profiles_2d/0/psi',
        dims=['time_eq', '$dim1', '$dim2'],
    ),
)

As a side note, if we know that dim1 and dim2 do not vary per time slice, we can define the variables in a different way. In this case, we can skip dealing with placeholder dimensions. The difference is that for dim1 and dim2, we do not set the time index (i.e. the path no longer contains the index *) and the dimension name refers to itself.

dim_variables = (
    Variable(
        name='dim1',
        ids='equilibrium',
        path='time_slice/0/profiles_2d/0/grid/dim1',
        dims=['dim1'],
    ),
    Variable(
        name='dim2',
        ids='equilibrium',
        path='time_slice/0/profiles_2d/0/grid/dim2',
        dims=['dim2'],
    ),
    Variable(
        name='psi',
        ids='equilibrium',
        path='time_slice/*/profiles_2d/0/psi',
        dims=['time_eq', 'dim1', 'dim2'],
    ),
)

Loading the data

Whichever option we choose, the resulting datasets now have dim1 and dim2 as dimensions. The data within each group are interpolated to match this grid.

from imas2xarray import to_xarray

path = './1/data'
ids = 'equilibrium'

ds = to_xarray(path, ids=ids, variables=variables)
print(ds)

<xarray.Dataset>
Dimensions:  (time_eq: 1, dim1: 65, dim2: 65)
Coordinates:
  * time_eq  (time_eq) float64 50.04
  * dim1     (dim1) float64 1.65 1.687 1.725 1.762 ... 3.938 3.975 4.013 4.05
  * dim2     (dim2) float64 -1.9 -1.837 -1.773 -1.71 ... 1.96 2.023 2.087 2.15
Data variables:
    psi      (time_eq, dim1, dim2) float64 -6.212 -6.112 -6.012 ... -10.8 -11.03

Grid rebasing

IMAS data may not be on the same grid (i.e. x-values do not correspond between data sets) or use the same time steps. Therefore, the data must be standardized to the same set of reference coordinates so that the grid and time stamps correspond between different data sets. Because this is such a common operation, imas2xarray has helper functions to deal with these special cases. rebase_on_grid helps to rebase on the grid, and rebase_on_time to rebase on the time stamps. standardize_grid_and_time combines these two functions and can make a sequence of datasets consistent.

Now that all datasets have internally consistent dimensions, we can interpolate all datasets to the same reference grid. We could do this using two calls to rebase_on_grid()...

from imas2xarray import rebase_on_grid

paths = (
    './1/data',
    './2/data',
    './3/data',
)

datasets = [to_xarray(path, ids=ids, variables=variables) for path in paths]

for dim in ('dim1', 'dim2'):
    reference_grid = datasets[0][dim].data
    datasets = [rebase_on_grid(ds, coord_dim=dim, new_coords=reference_grid) for ds in datasets]

...but also using xarray.Dataset directly. Here we change the grid to one of our own choosing.

import numpy as np

_ = [
    ds.interp({'dim1': np.linspace(2.0, 4.0, 51), 'dim2': np.linspace(-1.5, 1.5, 51)})
    for ds in datasets
]

Time Standardizing

Sometimes we have datasets with various starting times, but we want to compare them anyway for this you can use the standardize_time() function, which is an in-place operation:

from imas2xarray import rezero_time

for ds in datasets:
    rezero_time(ds, start=0.1, key='time_eq')

Time rebasing

We do the same for the time coordinate using rebase_on_time().

If you know your data have the same time stamps, for example if they are from the same set of simulations, you can skip this step. Here we take the first dataset as the reference.

from imas2xarray import rebase_on_time

reference_time = datasets[0]['time_eq'].data

datasets = [
    rebase_on_time(ds, time_dim='time_eq', new_coords=reference_time) for ds in datasets
]

Data concatenation

Finally, we can concatenate along the run dimension. We set the run coordinates to the name of the data so they can be re-used later.

import xarray as xr

dataset = xr.concat(datasets, 'run')
dataset['run'] = [f'run_{i}' for i in range(len(paths))]

Now we have the data in a nicely structured xarray dataset.

print(dataset)

<xarray.Dataset>
Dimensions:  (run: 3, time_eq: 1, dim1: 65, dim2: 65)
Coordinates:
  * time_eq  (time_eq) float64 0.1
  * dim1     (dim1) float64 1.65 1.687 1.725 1.762 ... 3.938 3.975 4.013 4.05
  * dim2     (dim2) float64 -1.9 -1.837 -1.773 -1.71 ... 1.96 2.023 2.087 2.15
  * run      (run) <U5 'run_0' 'run_1' 'run_2'
Data variables:
    psi      (run, time_eq, dim1, dim2) float64 -6.212 -6.112 ... -10.8 -11.03

Plotting

Now that we have standardized and rebased the grid and time coordinates, plotting and other operations on the data becomes straightforward.

xarray has some built-in functionality to make plots using matplotlib.

dataset['psi'].isel(time_eq=[0]).plot(row='time_eq', col='run');

No description has been provided for this image

Data reduction

To reduce the data along some dimension, we can use dataset.reduce(). This method takes a function as the first argument, and will apply it for each slice in the given dimension. xarray has some shortcuts for common operators, so dataset.reduce(np.mean, dim='run') is equivalent to dataset.mean(dim='run').

mean = dataset.mean(dim='run').isel(time_eq=[0])

mean['psi'].plot(col='time_eq');

This can be used to calculate the uncertainty in different regions of the 2D map.

std = dataset.isel(time_eq=[0]).std(dim='run')

std['psi'].plot(col='time_eq');