clinicadl.data.datasets.PairedDataset

class clinicadl.data.datasets.PairedDataset(datasets: Iterable[Dataset])

Pairs multiple Dataset objects (e.g., different modalities of the same images).

Pairing datasets means uniquely associating images across the datasets. The keys of this association are the (participant, session) pairs present in the underlying datasets. Therefore, all datasets must contain exactly the same set of (participant, session) pairs.

Furthermore, for each (participant, session) pair, all datasets must have the same number of samples. For example, pairing a dataset of whole images with a dataset containing a single slice per image works, because both provide one sample per image. If the second dataset instead contained two slices per image, it would be twice as large as the first, the two datasets could not be paired, and an error would be raised.

Indexing a PairedDataset returns a tuple of Sample objects (one for each underlying dataset).

Parameters:

datasets (Iterable[Dataset]) – The Datasets to pair.

Raises:
  • ValueError – If the datasets contain duplicated (participant, session) pairs. Duplicates prevent PairedDataset from finding a bijective (one-to-one) mapping between the datasets.

  • ValueError – If there is a mismatch of (participant, session) pairs across the datasets. An error will also be raised if the number of samples per image is not the same across datasets.
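The pairing rules and the error conditions above can be illustrated with a minimal pure-Python sketch. It does not use clinicadl: `ToyDataset` and `check_pairable` are hypothetical names introduced only for illustration, and the real validation logic may differ.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ToyDataset:
    """Hypothetical stand-in for a dataset: one entry per image."""

    images: list[tuple[str, str]]  # (participant, session) of each image
    samples_per_image: int = 1     # 1 for whole images or a single slice


def check_pairable(datasets: list[ToyDataset]) -> set[tuple[str, str]]:
    """Return the shared (participant, session) keys, or raise ValueError."""
    # Rule 1: no dataset may list the same (participant, session) twice.
    for i, ds in enumerate(datasets):
        duplicated = [key for key, n in Counter(ds.images).items() if n > 1]
        if duplicated:
            raise ValueError(f"dataset {i} has duplicated pairs: {duplicated}")

    # Rule 2: same keys and same number of samples per image everywhere.
    reference = set(datasets[0].images)
    for i, ds in enumerate(datasets[1:], start=1):
        if set(ds.images) != reference:
            raise ValueError(f"dataset {i} has different (participant, session) pairs")
        if ds.samples_per_image != datasets[0].samples_per_image:
            raise ValueError(f"dataset {i} has a different number of samples per image")
    return reference


t1 = ToyDataset([("sub-001", "ses-M000"), ("sub-002", "ses-M000")])
pet = ToyDataset([("sub-002", "ses-M000"), ("sub-001", "ses-M000")])
two_slices = ToyDataset(t1.images, samples_per_image=2)

shared = check_pairable([t1, pet])  # same keys, one sample per image each

try:
    check_pairable([t1, two_slices])  # second dataset is twice as big
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

Note that the order of the images does not matter in this sketch; only the set of keys and the per-image sample count are compared.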

Examples

bids
├── sub-001
│   └── ses-M000
│   │   ├── pet
│   │   │   └── sub-001_ses-M000_trc-18FAV45_pet.nii.gz
│   │   └── anat
│   │       └── sub-001_ses-M000_T1w.nii.gz
    ...
...
from clinicadl.data.datasets import BidsDataset, PairedDataset
from clinicadl.io.bids import BidsFileType

bids_t1 = BidsDataset("bids", file_type=BidsFileType(data_type="anat", suffix="T1w"))
bids_pet = BidsDataset("bids", file_type=BidsFileType(data_type="pet", suffix="pet"))

multimodal_dataset = PairedDataset([bids_t1, bids_pet])
>>> len(bids_t1)
4
>>> len(bids_pet)
4
>>> len(multimodal_dataset)
4
>>> sample = multimodal_dataset[0]
>>> len(sample)
2
>>> sample[0].file_type
BidsFileType(suffix=re.compile('T1w'), data_type=re.compile('anat'), extension=re.compile('.nii.*'), with_entities=None, without_entities=None, description=None)
>>> sample[1].file_type
BidsFileType(suffix=re.compile('pet'), data_type=re.compile('pet'), extension=re.compile('.nii.*'), with_entities=None, without_entities=None, description=None)
property df

The result of merging the metadata DataFrames of the underlying datasets.

__len__() -> int

Computes the total number of samples in the dataset.

Returns:

int – Total number of samples in the dataset, i.e. the number of images times the number of samples per image.

__getitem__(idx: int) -> tuple[Sample, ...]

Retrieves the collection of samples at a given index.

Parameters:

idx (int) – Index of the samples in the dataset.

Returns:

tuple[Sample, …] – A structured output containing the processed data and metadata from each dataset of the PairedDataset, in a tuple of Sample.
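As a rough illustration of how `__len__` and `__getitem__` relate, the sketch below pairs two toy datasets whose samples are assumed to be already aligned. `ToyPaired` is a hypothetical class, not the clinicadl implementation, and plain strings stand in for Sample objects.

```python
class ToyPaired:
    """Hypothetical paired container over already-aligned sequences."""

    def __init__(self, datasets):
        self.datasets = list(datasets)
        lengths = {len(ds) for ds in self.datasets}
        if len(lengths) != 1:
            raise ValueError("all datasets must have the same number of samples")

    def __len__(self) -> int:
        # Every dataset has the same length, so any of them gives the total.
        return len(self.datasets[0])

    def __getitem__(self, idx: int) -> tuple:
        # One element per underlying dataset, in the order they were given.
        return tuple(ds[idx] for ds in self.datasets)


t1_samples = ["t1_sub-001", "t1_sub-002"]
pet_samples = ["pet_sub-001", "pet_sub-002"]

paired = ToyPaired([t1_samples, pet_samples])
first = paired[0]  # tuple with one entry per modality
```

The position of each element in the returned tuple follows the order in which the datasets were passed, which matches the `sample[0]` / `sample[1]` usage in the example above.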

get_sample_info(idx: int, column: str) -> Any

Retrieves information on a given sample.

Looks for column in the DataFrame of each underlying dataset. If the datasets disagree on the value for this sample, an error is raised.

Parameters:
  • idx (int) – The index of the sample in the dataset.

  • column (str) – The information to look for, i.e. a column of df.

Returns:

Any – The value for this sample.

Raises:

RuntimeError – If different values are found across the datasets.
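The lookup-and-compare behaviour described above might be sketched as follows. `lookup_consistent` is a hypothetical helper operating on plain dictionaries (one metadata row per underlying dataset); the handling of a missing column is an assumption of this sketch, not documented behaviour.

```python
from typing import Any


def lookup_consistent(rows: list[dict[str, Any]], column: str) -> Any:
    """Return the value of `column` shared by all rows that define it."""
    values = [row[column] for row in rows if column in row]
    if not values:
        # Assumption of this sketch: a column found nowhere is a KeyError.
        raise KeyError(column)
    if any(value != values[0] for value in values[1:]):
        raise RuntimeError(f"conflicting values for {column!r}: {values}")
    return values[0]


t1_row = {"participant_id": "sub-001", "session_id": "ses-M000", "age": 71}
pet_row = {"participant_id": "sub-001", "session_id": "ses-M000", "tracer": "18FAV45"}

age = lookup_consistent([t1_row, pet_row], "age")                 # present once
subject = lookup_consistent([t1_row, pet_row], "participant_id")  # same in both

try:
    lookup_consistent([{"age": 70}, {"age": 71}], "age")  # datasets disagree
    conflict_raised = False
except RuntimeError:
    conflict_raised = True
```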

eval() -> None

Sets all the underlying datasets in evaluation mode.

get_participant_session_couples() -> set[tuple[str, str]]

Retrieves all (participant, session) pairs in the dataset.

Returns:

set[tuple[str, str]] – The set of (participant, session) pairs.

subset(particpants_sessions: Path | str | DataFrame | Iterable[tuple[str, str]]) -> Self

Returns a subset of the dataset restricted to a given list of (participant, session) pairs.

Parameters:

data (Union[DataFrameType, Sequence[tuple[str, str]]]) –

Can be either:

  • a sequence of (participant, session);

  • a pandas.DataFrame (or a path to a TSV file containing the dataframe) with the list of (participant, session) pairs to extract. This list must be passed via two columns named "participant_id" and "session_id" (other columns won’t be considered).

Returns:

Self – A subset of the original dataset, restricted to the (participant, session) pairs mentioned in data.
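The two accepted input forms can be illustrated with a small pure-Python sketch. `pairs_from_table` and `select_pairs` are hypothetical helpers, and lists of dictionaries stand in for the dataset and for the DataFrame; this is not how clinicadl implements `subset`.

```python
from typing import Iterable

Pair = tuple[str, str]


def pairs_from_table(table: Iterable[dict]) -> list[Pair]:
    """Extract pairs from rows; only the two required columns are read."""
    return [(row["participant_id"], row["session_id"]) for row in table]


def select_pairs(rows: list[dict], wanted: Iterable[Pair]) -> list[dict]:
    """Keep only the rows whose (participant, session) pair is in `wanted`."""
    wanted = set(wanted)
    return [r for r in rows if (r["participant_id"], r["session_id"]) in wanted]


dataset_rows = [
    {"participant_id": "sub-001", "session_id": "ses-M000"},
    {"participant_id": "sub-001", "session_id": "ses-M006"},
    {"participant_id": "sub-002", "session_id": "ses-M000"},
]

# Form 1: a sequence of (participant, session) tuples.
by_tuples = select_pairs(dataset_rows, [("sub-001", "ses-M000")])

# Form 2: a table with the two required columns; the "age" column is ignored.
table = [{"participant_id": "sub-002", "session_id": "ses-M000", "age": 68}]
by_table = select_pairs(dataset_rows, pairs_from_table(table))
```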

train() -> None

Sets all the underlying datasets in training mode.
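Both `train()` and `eval()` are described as switching the mode of every underlying dataset. A minimal sketch of that propagation is shown below; `ToyDataset`, `ToyPaired`, and the `training` flag are hypothetical, and what the mode actually changes in clinicadl is not covered here.

```python
class ToyDataset:
    """Hypothetical dataset exposing a simple training/evaluation flag."""

    def __init__(self) -> None:
        self.training = True

    def train(self) -> None:
        self.training = True

    def eval(self) -> None:
        self.training = False


class ToyPaired:
    """Hypothetical paired container forwarding mode switches."""

    def __init__(self, datasets) -> None:
        self.datasets = list(datasets)

    def train(self) -> None:
        for ds in self.datasets:
            ds.train()

    def eval(self) -> None:
        for ds in self.datasets:
            ds.eval()


t1, pet = ToyDataset(), ToyDataset()
paired = ToyPaired([t1, pet])

paired.eval()
modes_after_eval = [t1.training, pet.training]

paired.train()
modes_after_train = [t1.training, pet.training]
```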