Overview

Design Principles

The design of the data model draws on similarities between different data types and structures to arrive at entities that are as generic and versatile as is meaningful. At the same time we aim for clearly established links between different entities to keep the model as expressive as possible.

Most entities of the NIX-model have a name and a type field which are meant to provide information about the entity. While the name can be freely chosen, the type is meant to provide semantic information about the entity, and we aim to establish definitions for the different types. Via the type, the generic entities can become domain specific.

For the electrophysiology disciplines of neuroscience, an INCF working group has set out to define such data types. For more information see here.

Creating a file

So far we have implemented the nix model only for the HDF5 file format. In order to store data in a file we need to create one.

import nixio as nix

nix_file = nix.File.open('example.h5', nix.FileMode.Overwrite)

The File entity is the root of this document and it has only two children: the data and metadata nodes. You may want to use the hdfview tool to open the file and look at it. Of course, you can also access both parts using the File API.

All information directly related to a chunk of data is stored in the data node as children of a top-level entity called Block. A Block is a grouping element that can represent many things. For example, it can hold everything that was recorded in the same session. Therefore, the Block has a name and a type.

block = nix_file.create_block("Test block", "nix.session")

Names can be freely chosen. Duplication of names on the same hierarchy level is not allowed. In this example, creating a second Block with the very same name leads to an error. Names must not contain ‘/’ characters since they are path separators in the HDF5 file. To avoid collisions across files, every created entity has a unique id (UUID).

block.id
'017d7764-173b-4716-a6c2-45f6d37ddb52'
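
The naming rules above can be illustrated with a small plain-Python sketch. It does not use nixio and is not how the library enforces the rules; check_name is a hypothetical helper introduced only for this illustration, and the id is generated with Python's standard uuid module.

```python
import uuid


def check_name(name, existing_names):
    """Hypothetical helper: apply the two naming rules described in the text."""
    if "/" in name:
        # '/' is the path separator inside the HDF5 file
        raise ValueError("names must not contain '/'")
    if name in existing_names:
        raise ValueError("duplicate name on this hierarchy level: " + name)
    return name


# names already used on this hierarchy level
existing = {"Test block"}

new_name = check_name("Second block", existing)
# a random UUID rendered as a string, the same format as block.id above
new_id = str(uuid.uuid4())
```

Calling check_name("Test block", existing) or check_name("a/b", existing) raises a ValueError in this sketch.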

Storing data

The heart of our data model is an entity called DataArray. This is the entity that actually stores all data. It can take n-dimensional arrays and provides sufficient information to create a basic plot of the data. To achieve this, one essential part is to define what kind of data is stored. Hence, every dimension of the stored data must be defined using the available Dimension descriptors (below). The following code snippets show how to create a DataArray and how to store data in it.

# create a DataArray and store data in it
data = block.create_data_array("my data", "nix.sampled", data=some_numpy_array)

Using this call will create a DataArray, set name and type, set the dataType according to the dtype of the passed data, and store the data in the file. You can also create empty DataArrays to take up data-to-be-recorded. In this case you have to provide the space that will be needed in advance.

import numpy as np
# create an empty DataArray to store 2x1000 values
data = block.create_data_array("my data", "nix.sampled", dtype=nix.DataType.Double, shape=(2, 1000))
some_numpy_array = np.random.randn(2, 1000)
data.write_direct(some_numpy_array)

If you do not know the size of the data in advance, you can append data to an already existing DataArray. Beware: Though it is possible to extend the data, it is not possible to change the dimensionality (rank) of the data afterwards.

# create an empty DataArray to store 2x1000 values
data = block.create_data_array("my data", "nix.sampled", dtype=nix.DataType.Double, shape=(2, 1000))
some_numpy_array = np.random.randn(2, 1000)
data[:, :] = some_numpy_array
some_more_data = np.random.randn(2, 10)
# enlarge the second dimension; data_extent is set as a property
data.data_extent = (2, 1010)
data[:, 1000:] = some_more_data

Dimension descriptors

In the above examples we have created DataArray entities that are used to store the data. The goal of our model design is that the data-containing structures carry enough information to create a basic plot. Let’s assume a time series of data needs to be stored: the data is just a vector of measurements (e.g. voltages) and would be plotted as a line plot. We thus need to define the x- and the y-axis of the plot. The y- or value axis is defined by setting the label and the unit properties of the DataArray; the x-axis needs a dimension descriptor. In the nix model three different dimension descriptors are defined: SampledDimension, RangeDimension, and SetDimension. They are used for (i) data that has been sampled in space or time at regular intervals, (ii) data that has been sampled at irregular intervals, and (iii) data that belongs to categories.

sample_interval = 0.001 # s
sinewave = np.sin(np.arange(0, 1.0, sample_interval) * 2 * np.pi)
data = block.create_data_array("sinewave", "nix.regular_sampled", data=sinewave)
data.label = "voltage"
data.unit = "mV"
# define the time dimension of the data
dim = data.append_sampled_dimension(sample_interval)
dim.label = "time"
dim.unit = "s"

The SampledDimension can also be used to describe space dimensions, e.g. in the case of images.
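
To make the idea of a regularly sampled dimension concrete, here is a minimal plain-Python sketch of how a sampling interval maps sample indices to axis positions and back. It is an illustration only, not the library's code; the function names sampled_axis and index_of are hypothetical, and the offset parameter (defaulting to zero) is an assumption of this sketch.

```python
def sampled_axis(n_samples, sampling_interval, offset=0.0):
    """Return the axis position of each sample of a regularly sampled dimension."""
    return [offset + i * sampling_interval for i in range(n_samples)]


def index_of(position, sampling_interval, offset=0.0):
    """Return the index of the sample closest to the given axis position."""
    return int(round((position - offset) / sampling_interval))


sample_interval = 0.001  # s, as in the sinewave example
time = sampled_axis(1000, sample_interval)

# the sample closest to t = 0.25 s
idx = index_of(0.25, sample_interval)
```

Because the whole axis follows from the interval (and offset), only these numbers need to be stored rather than one axis value per sample.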

If the data was sampled at irregular intervals the sample points of the x-axis are defined using the ticks property of a RangeDimension.

sample_times = [1.0, 3.0, 4.2, 4.7, 9.6]
dim = data.append_range_dimension(sample_times)
dim.label = "time"
dim.unit = "s"
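
With irregular sampling, the index of a position can no longer be computed from an interval; it has to be looked up in the ticks. The sketch below shows one way to do this with Python's standard bisect module, assuming the ticks are sorted in ascending order. It is not taken from the NIX implementation, and ticks_in_range is a hypothetical name.

```python
from bisect import bisect_left, bisect_right

sample_times = [1.0, 3.0, 4.2, 4.7, 9.6]


def ticks_in_range(ticks, start, end):
    """Return the indices of all ticks with start <= tick <= end.

    Assumes that ticks is sorted in ascending order.
    """
    first = bisect_left(ticks, start)
    last = bisect_right(ticks, end)
    return list(range(first, last))


# samples recorded between 2.0 s and 5.0 s
indices = ticks_in_range(sample_times, 2.0, 5.0)
```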

Finally, some data belongs to categories which do not necessarily have a natural order. In these cases a SetDimension is used. This descriptor can store an optional label for each category.

observations = [0, 0, 5, 20, 45, 40, 28, 12, 2, 0, 1, 0]
categories = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
              'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
data = block.create_data_array("observations", "nix.histogram", data=observations)
dim = data.append_set_dimension()
dim.labels = categories
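
The labels of a set dimension simply pair each entry along that dimension with a category. As a plain-Python illustration (independent of nixio), the same pairing can be written with zip; the names by_category and peak_month are introduced only for this example.

```python
observations = [0, 0, 5, 20, 45, 40, 28, 12, 2, 0, 1, 0]
categories = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
              'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

# pair the i-th label with the i-th value
by_category = dict(zip(categories, observations))

# the category with the largest number of observations
peak_month = max(by_category, key=by_category.get)
```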

Annotate regions in the data

Annotating points or regions of interest is one of the key features of the nix data model. There are two entities for this purpose: (i) the Tag is used for a single point or region, while (ii) the MultiTag is used to mark several of these. Tags have one or many positions and extents which define the point or the region in the referenced DataArray. Further, they can have Features to store additional information about the positions (see tutorials below).

Tag

The Tag is a relatively simple structure that directly stores the position the tag points to and the optional extent of the region. Each of these is a vector whose length matches the dimensionality of the referenced data.

position = [10, 10]
extent = [5, 20]
tag = block.create_tag('interesting part', 'nix.roi', position)
tag.extent = extent
# finally, add the referenced data to this tag
tag.references.append(data)
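
How a position and an extent select a region can be sketched in plain Python. The example below assumes that every dimension is regularly sampled with a known interval and treats the end of the region as exclusive; both are assumptions of this sketch, and the library may handle region boundaries differently. tag_to_slices is a hypothetical helper, not part of the nixio API.

```python
def tag_to_slices(position, extent, sampling_intervals):
    """Convert a position/extent pair into one index slice per dimension.

    Assumes regularly sampled dimensions without offset; the end of the
    region is treated as exclusive in this sketch.
    """
    slices = []
    for pos, ext, interval in zip(position, extent, sampling_intervals):
        start = int(round(pos / interval))
        stop = int(round((pos + ext) / interval))
        slices.append(slice(start, stop))
    return slices


# a 100 x 100 "image" as nested lists, each value encodes row * 100 + column
frame = [[row * 100 + col for col in range(100)] for row in range(100)]

position = [10, 10]
extent = [5, 20]
rows, cols = tag_to_slices(position, extent, [1.0, 1.0])

# cut the tagged region out of the image
region = [line[cols] for line in frame[rows]]
```

In this sketch the region spans 5 rows and 20 columns, starting at row 10, column 10.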

MultiTag

MultiTags are made to tag multiple points (regions) at once. The main difference to the Tag is that position and extent are stored in DataArray entities. These entities must be 2-D. Both dimensions are SetDimensions. The first dimension represents the individual positions, the second dimension takes the coordinates in the referenced n-dimensional DataArray.

# fake data
frame = np.random.randn(100, 100)
data = block.create_data_array('random image', 'nix.image', data=frame)
dim_x = data.append_sampled_dimension(1.0)
dim_x.label = 'x'
dim_y = data.append_sampled_dimension(1.0)
dim_y.label = 'y'
# positions array must be 2D
p = np.zeros((3, 2)) # 1st dim, represents the positions, 2nd the coordinates
p[1, :] = [10, 10]
p[2, :] = [20, 10]
positions = block.create_data_array('special points', 'nix.positions', data=p)
positions.append_set_dimension()
dim = positions.append_set_dimension()
dim.labels = ['x', 'y']
# create a multi tag
tag = block.create_multi_tag('interesting points', 'nix.multiple_roi', positions)
tag.references.append(data)

Adding further information

The tags establish links between datasets. If one needs to attach further information to each of the regions defined by the tag, one can add Features to them. A Feature references a DataArray as its data and specifies with the link_type how the link has to be interpreted. The link_type can be tagged, indexed, or untagged: with tagged, the tag is applied to the feature data as well; with indexed, for each position given in the tag a slice of the feature data (the ith index along the first dimension) is the feature; with untagged, all feature data applies to all positions.

Let’s say we want to give each point a name. We can create a feature like this:

# one entry per position in the MultiTag (three positions in the example above)
spot_names = block.create_data_array('spot ids', 'nix.feature', dtype=nix.DataType.Int8, data=[1, 2, 3])
spot_names.append_set_dimension()
feature = tag.create_feature(spot_names, nix.LinkType.Indexed)

We could also say that each point in the tagged data (e.g. a matrix of measurements) has a corresponding point in an input matrix.

input_matrix = np.random.random(data.shape)
input_data = block.create_data_array('input matrix', 'nix.feature', data=input_matrix)
dim_x = input_data.append_sampled_dimension(1.0)
dim_x.label = 'x'
dim_y = input_data.append_sampled_dimension(1.0)
dim_y.label = 'y'
tag.create_feature(input_data, nix.LinkType.Tagged)

Finally, one may need to attach the same information to all positions defined in the tag. In this case the feature is untagged.

common_feature = block.create_data_array('common feature', 'nix.feature', data=some_common_data)
tag.create_feature(common_feature, nix.LinkType.Untagged)
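
The three link types differ only in which part of the feature data belongs to a given tagged position. The following plain-Python sketch spells this out for one-dimensional feature data. It is an interpretation of the description above, not the library's implementation; feature_for_position and the string constants are hypothetical.

```python
def feature_for_position(link_type, feature_data, index, region=None):
    """Return the part of feature_data that belongs to one tagged position.

    index  : number of the position within the tag
    region : slice selected by the tag (only needed for 'tagged')
    """
    if link_type == "indexed":
        # the i-th entry along the first dimension belongs to the i-th position
        return feature_data[index]
    if link_type == "tagged":
        # the tag's position and extent are applied to the feature data
        return feature_data[region]
    if link_type == "untagged":
        # all feature data applies to every position
        return feature_data
    raise ValueError("unknown link type: " + link_type)


spot_ids = [1, 2, 3]                 # one id per position
trace = list(range(10))              # same layout as the tagged data
common = ["recorded at room temperature"]

indexed = feature_for_position("indexed", spot_ids, index=1)
tagged = feature_for_position("tagged", trace, index=1, region=slice(2, 5))
untagged = feature_for_position("untagged", common, index=1)
```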

Defining the Source of the data

Source entities can be used to store where the data originates. Almost all entities of the NIX-model can have Sources. For example, the recorded data may originate from experiments done with one specific experimental subject. Sources have a name and a type and can have a definition.

subject = block.create_source('subject A', 'nix.experimental_subject')
subject.definition = 'The experimental subject used in this experiment'
data.sources.append(subject)

Sources may depend on other Sources. For example, in an electrophysiological experiment we record from different cells in the same brain region of the same animal. To represent this hierarchy, Sources can be nested, creating a tree-like structure.

subject = block.create_source('subject A', 'nix.experimental_subject')
brain_region = subject.create_source('hippocampus', 'nix.experimental_subject')
cell_a = brain_region.create_source('Cell 1', 'nix.experimental_subject')
cell_b = brain_region.create_source('Cell 2', 'nix.experimental_subject')
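
The resulting tree can be pictured with a small stand-alone sketch. SourceNode is a hypothetical stand-in class written for this illustration only (it is not the nixio Source class), and paths is a helper that lists every node together with its ancestors.

```python
from dataclasses import dataclass, field


@dataclass
class SourceNode:
    """Hypothetical stand-in for a Source: a name, a type, and child sources."""
    name: str
    type: str
    children: list = field(default_factory=list)

    def create_source(self, name, type_):
        child = SourceNode(name, type_)
        self.children.append(child)
        return child


def paths(node, prefix=""):
    """List the path of a node and of all its descendants, depth first."""
    path = prefix + "/" + node.name
    result = [path]
    for child in node.children:
        result.extend(paths(child, path))
    return result


subject = SourceNode('subject A', 'nix.experimental_subject')
brain_region = subject.create_source('hippocampus', 'nix.experimental_subject')
cell_a = brain_region.create_source('Cell 1', 'nix.experimental_subject')
cell_b = brain_region.create_source('Cell 2', 'nix.experimental_subject')

all_paths = paths(subject)
```

Each cell can be traced back through the brain region to the subject, which is the dependency the nesting is meant to express.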

Arbitrary metadata

The entities discussed so far carry just enough information to get a basic understanding of the stored data. Often much more information than that is required. Storing additional metadata is a central part of the NIX concept. We use a slightly modified version of the odML data model to store additional information. In brief: the model consists of Sections that contain Properties, which in turn contain one or more Values. Again, Sections can be nested to represent logical dependencies in the hierarchy of a tree. While all data entities discussed above are children of Block entities, the metadata lives parallel to the Blocks. The idea behind this is that several blocks may refer to the same metadata or, the other way round, the metadata applies to data entities in several blocks. The types used for the Sections in the following example are defined in the odML terminologies.

Most of the data entities can link to metadata sections.

sec = nix_file.create_section('recording session', 'odml.recording')
sec.create_property('experimenter', nix.Value('John Doe'))
sec.create_property('recording date', nix.Value('2014-01-01'))
subject = sec.create_section('subject', 'odml.subject')
subject.create_property('id', nix.Value('mouse xyz'))
cell = subject.create_section('cell', 'odml.cell')
v = nix.Value(-64.5)
v.uncertainty = 2.25
p = cell.create_property('resting potential', v)
p.unit = 'mV'
# set the recording block metadata
block.metadata = sec
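
The Section/Property/Value hierarchy built above can be mirrored with nested dictionaries to show how such a tree is searched. This is only a plain-Python picture of the structure, not the way NIX stores metadata; the dictionary keys and the find_sections helper are assumptions made for this sketch.

```python
metadata = {
    "name": "recording session", "type": "odml.recording",
    "properties": {"experimenter": ["John Doe"],
                   "recording date": ["2014-01-01"]},
    "sections": [{
        "name": "subject", "type": "odml.subject",
        "properties": {"id": ["mouse xyz"]},
        "sections": [{
            "name": "cell", "type": "odml.cell",
            # each property holds one or more values
            "properties": {"resting potential": [-64.5]},
            "sections": [],
        }],
    }],
}


def find_sections(section, type_):
    """Collect all sections of the given type, searching the tree recursively."""
    found = [section] if section["type"] == type_ else []
    for sub in section["sections"]:
        found.extend(find_sections(sub, type_))
    return found


cells = find_sections(metadata, "odml.cell")
resting_potential = cells[0]["properties"]["resting potential"][0]
```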

Units

In NIX we accept only SI units (plus dB and %) wherever units can be given. We also accept compound units like mV/cm. Units are handled transparently most of the time. That is, when you tag a region of data that has been specified with a time axis in seconds and use e.g. the tag.retrieve_data method to get this data slice, the API will handle unit scaling. The correct data will be returned even if the tag’s position is given in ms.

x_positions = [2, 4, 6, 8, 10, 12]
tag = block.create_tag('unit example', 'nix.sampled', x_positions)

# a single SI unit such as mV or cm is supported
tag.units = ["cm"]

# compound units can be given as well
tag.units = ["mV/cm"]
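
The unit scaling mentioned above amounts to converting the tag's position into the unit of the dimension before looking up the index. The sketch below shows the arithmetic for a few SI prefixes in plain Python. It is a simplified illustration, not the NIX unit handling; PREFIX_FACTORS and scaling are hypothetical names, and only the listed prefixes are covered.

```python
# a few SI prefixes and their factors relative to the base unit
PREFIX_FACTORS = {"": 1.0, "m": 1e-3, "u": 1e-6, "k": 1e3}


def scaling(from_unit, to_unit, base="s"):
    """Return the factor that converts a value in from_unit into to_unit."""
    def factor(unit):
        if not unit.endswith(base):
            raise ValueError("unit %r does not match base %r" % (unit, base))
        return PREFIX_FACTORS[unit[:-len(base)]]
    return factor(from_unit) / factor(to_unit)


sample_interval = 0.001   # s, the unit of the time dimension
position_ms = 250.0       # tag position given in ms

# convert the position to seconds, then find the sample index
position_s = position_ms * scaling("ms", "s")
index = int(round(position_s / sample_interval))
```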