from __future__ import annotations
2D datasets in imas2xarray
To work with large datasets, imas2xarray uses xarray to store the data in a more manageable format. This notebook explores how 2D data can be handled, focusing on psi from the equilibrium IDS.
Defining variables
First we must define the relations between the data. This is done via the Variable model.
Each Variable specifies the name of the variable (user-defined), the IDS to read it from, and the path to the data. Note that the time dimension can be defined by *, which links this dimension to the variable named time.
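The following sketch shows one way to picture the * wildcard. Each position along the time dimension corresponds to one time-slice index substituted into the path. The helper expand_time_wildcard is hypothetical and is not part of imas2xarray. It assumes exactly one wildcard per path.

```python
from __future__ import annotations


def expand_time_wildcard(path: str, n_slices: int) -> list[str]:
    """Hypothetical helper: substitute each time-slice index for the '*' wildcard."""
    if path.count('*') != 1:
        raise ValueError('expected exactly one * in the path')
    return [path.replace('*', str(i)) for i in range(n_slices)]


slice_paths = expand_time_wildcard('time_slice/*/profiles_2d/0/psi', n_slices=3)
for p in slice_paths:
    print(p)
```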
2D variables like psi have two dimensions; in this case we use grid/dim1 and grid/dim2. Because these grids may vary between time slices, we assign them to the placeholder dimensions $dim1 and $dim2. These are squashed to dim1 and dim2 when imas2xarray loads the variables.
from imas2xarray import Variable
variables = (
    Variable(
        name='dim1',
        ids='equilibrium',
        path='time_slice/*/profiles_2d/0/grid/dim1',
        dims=['time_eq', '$dim1'],
    ),
    Variable(
        name='dim2',
        ids='equilibrium',
        path='time_slice/*/profiles_2d/0/grid/dim2',
        dims=['time_eq', '$dim2'],
    ),
    Variable(
        name='psi',
        ids='equilibrium',
        path='time_slice/*/profiles_2d/0/psi',
        dims=['time_eq', '$dim1', '$dim2'],
    ),
)
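The next sketch uses plain NumPy to show what squashing placeholder dimensions amounts to. It assumes, as an illustration and not as a description of imas2xarray's internals, that every time slice is interpolated bilinearly onto the grid of the first slice. The helpers interp_2d and squash_time_slices are hypothetical, and the psi values are toy data.

```python
import numpy as np


def interp_2d(values, dim1, dim2, new_dim1, new_dim2):
    """Bilinear interpolation of values[i, j] = f(dim1[i], dim2[j]) onto a new rectilinear grid.

    Done separably: first along dim2 for every row, then along dim1 for every column.
    np.interp needs increasing coordinates and clamps values outside the original range.
    """
    # Interpolate each row (fixed dim1 index) along dim2 -> shape (len(dim1), len(new_dim2))
    tmp = np.array([np.interp(new_dim2, dim2, row) for row in values])
    # Interpolate each column along dim1 -> shape (len(new_dim1), len(new_dim2))
    return np.array([np.interp(new_dim1, dim1, col) for col in tmp.T]).T


def squash_time_slices(psi_slices, dim1_slices, dim2_slices):
    """Hypothetical: put every time slice on the grid of the first slice."""
    ref1, ref2 = dim1_slices[0], dim2_slices[0]
    stacked = np.stack([
        interp_2d(psi, d1, d2, ref1, ref2)
        for psi, d1, d2 in zip(psi_slices, dim1_slices, dim2_slices)
    ])
    return ref1, ref2, stacked  # stacked has shape (time, dim1, dim2)


# Toy data: psi = R + 2 Z sampled on a different grid in each of two time slices
dim1_slices = [np.linspace(1.0, 3.0, 5), np.linspace(0.5, 3.5, 7)]
dim2_slices = [np.linspace(-1.0, 1.0, 5), np.linspace(-1.5, 1.5, 7)]
psi_slices = [d1[:, None] + 2.0 * d2[None, :] for d1, d2 in zip(dim1_slices, dim2_slices)]

dim1, dim2, psi = squash_time_slices(psi_slices, dim1_slices, dim2_slices)
print(psi.shape)  # (2, 5, 5)
```

Because the toy psi is linear in both coordinates, bilinear interpolation reproduces it exactly on the reference grid. Real equilibria will show small interpolation errors.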
As a side note, if we know that dim1 and dim2 do not vary per time slice, we can define the variables differently and skip the placeholder dimensions altogether. For dim1 and dim2 we then do not set the time index: the path no longer contains the wildcard * but reads the grid from the first time slice (index 0), and each dimension is named after the variable itself.
dim_variables = (
    Variable(
        name='dim1',
        ids='equilibrium',
        path='time_slice/0/profiles_2d/0/grid/dim1',
        dims=['dim1'],
    ),
    Variable(
        name='dim2',
        ids='equilibrium',
        path='time_slice/0/profiles_2d/0/grid/dim2',
        dims=['dim2'],
    ),
    Variable(
        name='psi',
        ids='equilibrium',
        path='time_slice/*/profiles_2d/0/psi',
        dims=['time_eq', 'dim1', 'dim2'],
    ),
)
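The simpler fixed-grid definition is only safe if the grid really is the same in every time slice. One way to check this is to compare each slice's grid against the first. The sketch below assumes the per-slice grids are already available as a list of arrays; reading them from the IDS is not shown. The helper grid_is_constant is hypothetical.

```python
import numpy as np


def grid_is_constant(grid_slices, rtol=1e-12, atol=0.0):
    """Return True if every time slice uses the same grid as the first one (hypothetical helper)."""
    first = np.asarray(grid_slices[0])
    # Check the shape first so np.allclose never has to broadcast mismatched grids
    return all(
        np.shape(g) == first.shape and np.allclose(g, first, rtol=rtol, atol=atol)
        for g in grid_slices[1:]
    )


fixed = [np.linspace(1.0, 3.0, 5)] * 3
moving = [np.linspace(1.0, 3.0, 5), np.linspace(1.1, 3.1, 5)]
print(grid_is_constant(fixed))   # True
print(grid_is_constant(moving))  # False
```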
Loading the data
Whichever option we choose, the resulting datasets now have dim1 and dim2 as dimensions. The data within each group are interpolated to match this grid.
from imas2xarray import to_xarray
path = './1/data'
ids = 'equilibrium'
ds = to_xarray(path, ids=ids, variables=variables)
print(ds)
Grid rebasing
IMAS data may not be on the same grid (i.e. the x-values do not correspond between datasets) or use the same time steps. Therefore, the data must be standardized to a common set of reference coordinates so that the grid and time stamps correspond between datasets. Because this is such a common operation, imas2xarray has helper functions for it: rebase_on_grid rebases the data on a new grid, and rebase_on_time rebases it on new time stamps. standardize_grid_and_time combines the two and can make a sequence of datasets consistent.
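Conceptually, rebasing on a grid is 1D interpolation along one coordinate. The NumPy sketch below shows the idea for an array with a time axis and a grid axis. The helper rebase_along_axis is hypothetical, and linear interpolation is an assumption; check the rebase_on_grid documentation for the method it actually uses.

```python
import numpy as np


def rebase_along_axis(values, old_coords, new_coords, axis):
    """Linearly interpolate values along axis from old_coords onto new_coords.

    Hypothetical stand-in for grid rebasing; np.interp requires increasing
    old_coords and clamps to the edge values outside their range.
    """
    return np.apply_along_axis(lambda v: np.interp(new_coords, old_coords, v), axis, values)


x_ref = np.linspace(0.0, 1.0, 5)    # reference grid
x_other = np.linspace(0.0, 1.0, 9)  # grid of another dataset
values_other = np.tile(2.0 * x_other, (3, 1))  # shape (3 time steps, 9 grid points)

rebased = rebase_along_axis(values_other, x_other, x_ref, axis=1)
print(rebased.shape)  # (3, 5)
```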
Now that all datasets have internally consistent dimensions, we can interpolate all of them onto the same reference grid. We could do this using two calls to rebase_on_grid()...
from imas2xarray import rebase_on_grid
paths = (
    './1/data',
    './2/data',
    './3/data',
)
datasets = [to_xarray(path, ids=ids, variables=variables) for path in paths]
# Use the grid of the first dataset as the reference for both dimensions
for dim in ('dim1', 'dim2'):
    reference_grid = datasets[0][dim].data
    datasets = [rebase_on_grid(ds, coord_dim=dim, new_coords=reference_grid) for ds in datasets]
...but we can also use xarray.Dataset.interp() directly. Here we interpolate onto a grid of our own choosing.
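import numpy as np
# Interpolate onto a custom 51x51 grid; the result is discarded here,
# so the steps below continue with the rebased datasets
_ = [
    ds.interp({'dim1': np.linspace(2.0, 4.0, 51), 'dim2': np.linspace(-1.5, 1.5, 51)})
    for ds in datasets
]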
Time standardizing
Sometimes we have datasets with different starting times but still want to compare them. For this you can use the rezero_time() function, which operates in place:
from imas2xarray import rezero_time
for ds in datasets:
    rezero_time(ds, start=0.1, key='time_eq')
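The exact meaning of the start and key arguments of rezero_time is best checked in the imas2xarray documentation. As an illustration only, the sketch below assumes that re-zeroing means shifting each time axis so that all runs begin at the same value. The helper shift_time is hypothetical.

```python
import numpy as np


def shift_time(time, new_start):
    """Hypothetical helper: shift a time axis so that its first stamp equals new_start."""
    time = np.asarray(time, dtype=float)
    return time - time[0] + new_start


# Two toy runs that start at different times but have the same spacing
t_a = np.array([10.0, 10.5, 11.0])
t_b = np.array([3.2, 3.7, 4.2])
print(shift_time(t_a, 0.1))
print(shift_time(t_b, 0.1))
```

After the shift, both runs share the same time stamps (up to floating-point rounding), so they can be compared directly.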
Time rebasing
We do the same for the time coordinate using rebase_on_time().
If you know your data have the same time stamps, for example because they come from the same set of simulations, you can skip this step. Here we take the first dataset as the reference.
from imas2xarray import rebase_on_time
reference_time = datasets[0]['time_eq'].data
datasets = [
    rebase_on_time(ds, time_dim='time_eq', new_coords=reference_time) for ds in datasets
]
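How rebase_on_time treats reference time stamps that fall outside a dataset's own time range is not covered here. One defensive option is to restrict the reference time stamps to the window covered by every dataset before rebasing. The helper overlapping_times is hypothetical, and the time arrays are toy values.

```python
import numpy as np


def overlapping_times(reference, time_arrays):
    """Keep only the reference time stamps that lie inside every dataset's time range."""
    reference = np.asarray(reference, dtype=float)
    start = max(np.min(t) for t in time_arrays)  # latest start over all datasets
    end = min(np.max(t) for t in time_arrays)    # earliest end over all datasets
    return reference[(reference >= start) & (reference <= end)]


times = [
    np.array([0.1, 0.2, 0.3, 0.4]),
    np.array([0.15, 0.25, 0.35]),
    np.array([0.0, 0.2, 0.4]),
]
common = overlapping_times(times[0], times)
print(common)  # [0.2 0.3]
```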
Data concatenation
Finally, we can concatenate along a new run dimension. We label each entry of the run coordinate (run_0, run_1, ...) so that the runs can be referred to later.
import xarray as xr
dataset = xr.concat(datasets, 'run')
dataset['run'] = [f'run_{i}' for i in range(len(paths))]
Now we have the data in a nicely structured xarray dataset.
print(dataset)
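When xr.concat is given a new dimension name such as 'run', it adds that dimension as the first axis. If the coordinates differed between runs, xr.concat would first have to align them, which can introduce NaN padding or an error depending on its join setting; rebasing the grid and time beforehand avoids this. The plain-NumPy analogue of concatenating already-aligned arrays along a new dimension is np.stack, sketched below with toy arrays.

```python
import numpy as np

# Three toy runs that already share the same (time, grid) coordinates
runs = [np.full((2, 3), float(i)) for i in range(3)]

# Stacking adds a new leading 'run' axis, like concatenating along a new dimension
stacked = np.stack(runs, axis=0)
run_labels = [f'run_{i}' for i in range(len(runs))]

print(stacked.shape)  # (3, 2, 3)
print(run_labels)     # ['run_0', 'run_1', 'run_2']
```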
Plotting
Now that we have standardized and rebased the grid and time coordinates, plotting and other operations on the data become straightforward.
xarray has some built-in functionality to make plots using matplotlib.
dataset['psi'].isel(time_eq=[0]).plot(row='time_eq', col='run');
Data reduction
To reduce the data along some dimension, we can use dataset.reduce(). This method takes a function as its first argument and applies it along the given dimension, collapsing that dimension. xarray has shortcuts for common operations, so dataset.reduce(np.mean, dim='run') is equivalent to dataset.mean(dim='run'), except that mean() skips NaN values by default for floating-point data.
mean = dataset.mean(dim='run').isel(time_eq=[0])
mean['psi'].plot(col='time_eq');
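One subtle difference between dataset.reduce(np.mean, dim='run') and dataset.mean(dim='run') is NaN handling. np.mean propagates NaN, while xarray's mean() skips NaN by default for floating-point data, which corresponds to np.nanmean. The toy example below shows the difference for three runs at two grid points, with one missing value.

```python
import numpy as np

# Toy psi-like values: rows are runs, columns are grid points; one run has a gap
values = np.array([
    [1.0, 2.0],
    [3.0, np.nan],
    [5.0, 6.0],
])
plain_mean = np.mean(values, axis=0)   # NaN in any run makes that grid point NaN
nan_mean = np.nanmean(values, axis=0)  # NaN entries are ignored
print(plain_mean)
print(nan_mean)
```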
Similarly, the standard deviation across runs can be used to estimate the uncertainty in different regions of the 2D map.
std = dataset.isel(time_eq=[0]).std(dim='run')
std['psi'].plot(col='time_eq');
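With only a handful of runs, the normalization of the standard deviation matters. xarray's std() uses ddof=0 (the population standard deviation) by default, and passing ddof=1 gives the sample standard deviation. The NumPy sketch below compares the two on made-up 2x2 maps and locates the grid point with the largest spread between runs.

```python
import numpy as np

# Three toy runs of a 2x2 map (shape: run, dim1, dim2)
maps = np.array([
    [[1.0, 2.0], [3.0, 4.0]],
    [[1.0, 2.5], [3.0, 5.0]],
    [[1.0, 3.0], [3.0, 6.0]],
])
spread = maps.std(axis=0)                 # population std (ddof=0, like xarray's default)
spread_sample = maps.std(axis=0, ddof=1)  # sample std (ddof=1)

# Grid point where the runs disagree the most
i, j = np.unravel_index(np.argmax(spread), spread.shape)
print(i, j)
```

For n runs the two estimates differ by a factor of sqrt(n / (n - 1)), which is about 22% for three runs.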