1.2.2. Reading neuroimaging datasets: advanced tips
If you are already familiar with BidsDataset (see the previous section),
you may find the following tools useful for speeding up data loading, joining multiple datasets, or reading datasets
that are not BIDS-compliant.
1.2.2.1. Converting NIfTI images to tensors
Opening a NIfTI file is comparatively slow. When you iterate over a dataset many
times, which is exactly what happens during training, it pays off to convert your images to
PyTorch tensors once and then read the resulting .pt files. This is what
BidsDataset.to_tensors
does:
from clinicadl.data.datasets import BidsDataset
from clinicadl.io.bids import BidsFileType
dataset = BidsDataset(
    bids="bids_directory",
    file_type=BidsFileType(data_type="anat", suffix="T1w"),
)
tensor_dataset = dataset.to_tensors(conversion_name="T1")
The tensors are written to a BIDS derivative named “tensors”, together with a .json
file describing the conversion. to_tensors returns a
TensorDataset that you can use exactly like the
original BidsDataset, only faster to load.
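The speed-up comes from paying the decoding cost once instead of once per epoch. The sketch below illustrates only that idea and is not how to_tensors is implemented: the hypothetical slow_load function stands in for NIfTI decoding, and pickle stands in for torch.save/torch.load so that the example runs without PyTorch.

```python
import pickle
import tempfile
from pathlib import Path

# Counts calls to the (hypothetical) expensive loader.
CALLS = {"slow_load": 0}


def slow_load(path: str) -> list[float]:
    """Hypothetical stand-in for decoding a NIfTI file (the slow step)."""
    CALLS["slow_load"] += 1
    return [float(len(path)), 1.0, 2.0]  # dummy voxel values


def convert_once(paths: list[str], cache_dir: Path) -> list[Path]:
    """Decode each image once and store it in a fast-to-read cache file."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = []
    for i, path in enumerate(paths):
        target = cache_dir / f"image_{i}.pkl"
        with open(target, "wb") as f:
            pickle.dump(slow_load(path), f)
        cached.append(target)
    return cached


def load_cached(target: Path) -> list[float]:
    """Cheap read performed on every epoch after the conversion."""
    with open(target, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    files = convert_once(["sub-01_T1w.nii.gz", "sub-02_T1w.nii.gz"], Path(tmp))
    for epoch in range(10):  # many passes over the data, as during training
        images = [load_cached(t) for t in files]

print(CALLS["slow_load"])  # 2: one decode per image, not one per epoch
```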
You can also save the transformed images (save_transforms=True) so that the
image_transforms of your TransformsHandler are
applied once during the conversion instead of on every load. Conversion
can be parallelised with n_proc.
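A hedged sketch of both options, with hypothetical helpers (load_image, rescale, convert, convert_all) that are not part of ClinicaDL: when save_transforms is true, the transform is baked into the stored image, and images are converted by a pool of n_proc workers. Threads from concurrent.futures keep the sketch portable; this does not reflect how ClinicaDL actually schedules its workers.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def load_image(path: str) -> list[float]:
    """Hypothetical loader returning dummy intensities for one image."""
    return [float(len(path)), 2.0, 4.0]


def rescale(image: list[float]) -> list[float]:
    """Hypothetical image transform: divide every voxel by the maximum intensity."""
    peak = max(image)
    return [v / peak for v in image]


def convert(path: str, save_transforms: bool) -> list[float]:
    """Convert one image, optionally storing the transformed version."""
    image = load_image(path)
    return rescale(image) if save_transforms else image


def convert_all(paths: list[str], save_transforms: bool, n_proc: int) -> list[list[float]]:
    """Convert images concurrently; results keep the order of `paths`."""
    # Images are independent, so the work splits across n_proc workers.
    with ThreadPoolExecutor(max_workers=n_proc) as pool:
        return list(pool.map(partial(convert, save_transforms=save_transforms), paths))


stored = convert_all(["a.nii", "bb.nii"], save_transforms=True, n_proc=2)
print(stored[0])  # [1.0, 0.4, 0.8]
```

Because each image is converted independently, the work splits cleanly across workers; for CPU-bound decoding, ProcessPoolExecutor would be the closer analogue of separate processes.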
To reopen a previously converted dataset, point a
TensorDataset at the description .json file:
from clinicadl.data.datasets import TensorDataset
tensor_dataset = TensorDataset(
    description_json="bids_directory/derivatives/tensors/src-T1w_conv-T1_description.json",
    data="bids_directory/metadata.tsv",
)
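The description file name seems to follow the BIDS key-value convention: src-T1w appears to record the suffix of the source images and conv-T1 the conversion_name passed to to_tensors. This reading is inferred from the example above, not from a specification. Under that assumption, a small hypothetical helper (not part of ClinicaDL) can recover the entities, for example to check which conversion a file describes:

```python
from pathlib import Path


def parse_entities(filename: str) -> dict[str, str]:
    """Split a BIDS-style name such as 'src-T1w_conv-T1_description.json'
    into its key-value entities; the final suffix is stored under 'suffix'."""
    stem = Path(filename).name.split(".")[0]  # drop directories and extensions
    *pairs, suffix = stem.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs)
    entities["suffix"] = suffix
    return entities


print(parse_entities("bids_directory/derivatives/tensors/src-T1w_conv-T1_description.json"))
# {'src': 'T1w', 'conv': 'T1', 'suffix': 'description'}
```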
Warning
If you saved the transformed images during the conversion, do not apply the same
image_transforms again when re-reading the data: the image_transforms of
the TransformsHandler you pass to TensorDataset should usually be empty.
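The warning matters because most image transforms are not idempotent. In this sketch, a hypothetical downsample function stands in for an entry of image_transforms: applying it again to images that were already transformed during the conversion halves their resolution a second time.

```python
def downsample(image: list[float]) -> list[float]:
    """Hypothetical transform: halve the resolution by keeping every other voxel."""
    return image[::2]


raw = [float(i) for i in range(8)]
saved = downsample(raw)     # applied once during conversion (save_transforms=True)
correct = saved             # re-read with empty image_transforms
double = downsample(saved)  # same transform applied again on load: a silent bug
print(len(raw), len(correct), len(double))  # 8 4 2
```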
1.2.2.2. Joining multiple datasets
You may need to combine several datasets, for instance images coming from different cohorts, or different modalities for the same participants. ClinicaDL offers three ways to do so.
Concatenating
ConcatDataset concatenates datasets end to end. The
length of the result is the sum of the lengths of its parts. Use it for instance to gather images
coming from different cohorts (i.e. with different participants).
from clinicadl.data.datasets import BidsDataset, ConcatDataset
from clinicadl.io.bids import BidsFileType
bids_1 = BidsDataset("bids_1", file_type=BidsFileType(data_type="pet", suffix="pet"))
bids_2 = BidsDataset("bids_2", file_type=BidsFileType(data_type="pet", suffix="pet"))
full_dataset = ConcatDataset([bids_1, bids_2])
>>> len(bids_1), len(bids_2), len(full_dataset)
(4, 8, 12)
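Routing a global index to the right part only requires the cumulative lengths of the datasets. The hypothetical SimpleConcat class below sketches this with plain lists; it follows the same bisect-on-cumulative-sizes approach as torch.utils.data.ConcatDataset, but it is not ClinicaDL's implementation.

```python
import bisect
from collections.abc import Sequence
from itertools import accumulate


class SimpleConcat:
    """Hypothetical end-to-end concatenation of indexable datasets."""

    def __init__(self, datasets: Sequence[Sequence]):
        self.datasets = list(datasets)
        # e.g. lengths (4, 8) -> cumulative sizes [4, 12]
        self.cumulative = list(accumulate(len(d) for d in self.datasets))

    def __len__(self) -> int:
        return self.cumulative[-1] if self.cumulative else 0

    def __getitem__(self, idx: int):
        if idx < 0:
            idx += len(self)
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # Index of the first dataset whose cumulative size exceeds idx.
        which = bisect.bisect_right(self.cumulative, idx)
        offset = self.cumulative[which - 1] if which > 0 else 0
        return self.datasets[which][idx - offset]


cohort_1 = [f"cohort1_img{i}" for i in range(4)]
cohort_2 = [f"cohort2_img{i}" for i in range(8)]
full = SimpleConcat([cohort_1, cohort_2])
print(len(full), full[3], full[4])  # 12 cohort1_img3 cohort2_img0
```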
Pairing
PairedDataset associates datasets through a
fixed mapping keyed by the (participant, session) pairs. It is the tool for
multimodal data: each sample becomes a tuple holding the corresponding image from
each dataset. All datasets must therefore contain exactly the same
(participant, session) pairs and produce the same number of samples per image (for
instance, the same number of patches or slices). Consequently,
PairedDataset does not support participants with a missing modality.
from clinicadl.data.datasets import BidsDataset, PairedDataset
from clinicadl.io.bids import BidsFileType
bids_t1 = BidsDataset("bids_directory", file_type=BidsFileType(data_type="anat", suffix="T1w"))
bids_pet = BidsDataset("bids_directory", file_type=BidsFileType(data_type="pet", suffix="pet"))
multimodal_dataset = PairedDataset([bids_t1, bids_pet])
>>> sample = multimodal_dataset[0]
>>> len(sample) # one Sample per modality
2
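The pairing rule can be sketched with plain dictionaries keyed by (participant, session). The hypothetical PairedSketch class below only illustrates the constraint stated above, assuming one sample per image: construction fails unless every dataset has exactly the same keys.

```python
from typing import Any

Key = tuple[str, str]  # (participant, session)


class PairedSketch:
    """Hypothetical pairing of datasets through their (participant, session) keys."""

    def __init__(self, datasets: list[dict[Key, Any]]):
        keys = [set(d) for d in datasets]
        if any(k != keys[0] for k in keys[1:]):
            raise ValueError("all datasets must contain the same (participant, session) pairs")
        self.datasets = datasets
        self.keys = sorted(keys[0])  # fixed, reproducible order

    def __len__(self) -> int:
        return len(self.keys)

    def __getitem__(self, idx: int) -> tuple:
        key = self.keys[idx]
        return tuple(d[key] for d in self.datasets)  # one item per modality


t1 = {("sub-01", "ses-M000"): "t1_01", ("sub-02", "ses-M000"): "t1_02"}
pet = {("sub-02", "ses-M000"): "pet_02", ("sub-01", "ses-M000"): "pet_01"}
paired = PairedSketch([t1, pet])
print(paired[0])  # ('t1_01', 'pet_01')
```

A participant missing from one modality makes the key sets differ, so the constructor raises instead of silently dropping that participant.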
Stacking
UnpairedDataset also returns a tuple of samples,
but associates the datasets randomly rather than through a fixed mapping. The
datasets need not share their (participant, session) pairs. This is useful, for
example, to feed a generative model with images that should not be paired. The
random association can be re-drawn for each epoch with
set_epoch(), and the oversample
argument controls how datasets of different sizes are reconciled.
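The exact semantics of oversample are not described here, so the following sketch rests on an assumption: with oversample=True the length is that of the largest dataset and smaller datasets are reused, and with oversample=False it is that of the smallest. UnpairedSketch is hypothetical and only illustrates a random association that set_epoch re-draws reproducibly.

```python
import random
from collections.abc import Sequence


class UnpairedSketch:
    """Hypothetical random association of datasets of possibly different sizes."""

    def __init__(self, datasets: Sequence[Sequence], oversample: bool = True, seed: int = 0):
        self.datasets = list(datasets)
        self.oversample = oversample
        self.seed = seed
        self.set_epoch(0)

    def set_epoch(self, epoch: int) -> None:
        # Re-draw the association; deriving the seed from (seed, epoch) keeps epochs reproducible.
        rng = random.Random(self.seed * 100_003 + epoch)
        sizes = [len(d) for d in self.datasets]
        self.length = max(sizes) if self.oversample else min(sizes)
        self.orders = []
        for size in sizes:
            order = list(range(size))
            rng.shuffle(order)
            # Repeat the shuffled order when this dataset is smaller than self.length.
            reps = -(-self.length // size)  # ceiling division
            self.orders.append((order * reps)[: self.length])

    def __len__(self) -> int:
        return self.length

    def __getitem__(self, idx: int) -> tuple:
        return tuple(d[o[idx]] for d, o in zip(self.datasets, self.orders))


t1 = ["t1_a", "t1_b", "t1_c", "t1_d"]
flair = ["flair_x", "flair_y"]
data = UnpairedSketch([t1, flair], oversample=True)
print(len(data))  # 4: flair images are reused to match the larger dataset
first = [data[i] for i in range(len(data))]
data.set_epoch(1)  # new random association for the next epoch
```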
1.2.2.3. Non-BIDS dataset?
If, for some reason, none of the previous dataset classes can read your data,
you can still create your own dataset by inheriting from clinicadl.data.datasets.Dataset.
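The methods that clinicadl.data.datasets.Dataset expects you to implement are not listed here, so check its API reference before subclassing. As a hedged starting point, the sketch below reads a hypothetical non-BIDS layout (<root>/<participant>/<session>_<modality>.nii.gz) and exposes the __len__/__getitem__ protocol of map-style PyTorch datasets; FolderDataset is a hypothetical name and does not inherit from the ClinicaDL base class.

```python
import tempfile
from pathlib import Path


class FolderDataset:
    """Hypothetical reader for <root>/<participant>/<session>_<modality>.nii.gz."""

    def __init__(self, root: str, modality: str):
        self.samples = []
        for path in sorted(Path(root).glob(f"*/*_{modality}.nii.gz")):
            session = path.name.split("_")[0]
            self.samples.append(
                {"participant": path.parent.name, "session": session, "path": path}
            )

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, idx: int) -> dict:
        # A real dataset would load the image here (e.g. with nibabel) and apply transforms.
        return self.samples[idx]


with tempfile.TemporaryDirectory() as tmp:
    for participant in ("P01", "P02"):
        folder = Path(tmp) / participant
        folder.mkdir()
        (folder / "baseline_T1.nii.gz").touch()
        (folder / "baseline_PET.nii.gz").touch()
    dataset = FolderDataset(tmp, modality="T1")
    print(len(dataset), dataset[0]["participant"], dataset[0]["session"])  # 2 P01 baseline
```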
You now know how to read your data and assemble it into datasets. The next section
covers the transforms argument we have only mentioned so far: how to extract
patches and slices, and how to preprocess and augment your images.