clinicadl.data.datasets.ConcatDataset

class clinicadl.data.datasets.ConcatDataset(datasets: Iterable[Dataset])

Assembles multiple datasets (e.g., images coming from different BIDS datasets) into a single one.

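ConcatDataset concatenates the input datasets in the order they are given: the samples of the first dataset come first, then those of the second, and so on. The length of the resulting dataset is therefore the sum of the lengths of the individual datasets.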

Parameters:

datasets (Iterable[Dataset]) – The Datasets to concatenate.

Examples

bids_1
├── sub-001
│   ├── ses-M000
│   │   └── pet
│   │       └── sub-001_ses-M000_pet.nii.gz
│   ...
...

bids_2
├── sub-A
│   ├── ses-M003
│   │   └── pet
│   │       └── sub-A_ses-M003_pet.nii.gz
│   ...
...
from clinicadl.data.datasets import BidsDataset, ConcatDataset
from clinicadl.io.bids import BidsFileType

bids_1 = BidsDataset("bids_1", file_type=BidsFileType(data_type="pet", suffix="pet"))
bids_2 = BidsDataset("bids_2", file_type=BidsFileType(data_type="pet", suffix="pet"))

full_dataset = ConcatDataset([bids_1, bids_2])
>>> len(bids_1)
4
>>> len(bids_2)
8
>>> len(full_dataset)
12
>>> full_dataset[0].participant_id, full_dataset[0].session_id
('sub-001', 'ses-M000')
>>> full_dataset[4].participant_id, full_dataset[4].session_id
('sub-A', 'ses-M003')
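In this example, indices 0 to 3 address bids_1 and indices 4 to 11 address bids_2. The sketch below shows the usual way such a global index is mapped to a (dataset, local index) pair: cumulative lengths plus a binary search, as done in torch.utils.data.ConcatDataset. It illustrates the technique only; it is not ClinicaDL's implementation, and the function name `locate` is hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def locate(idx: int, lengths: list[int]) -> tuple[int, int]:
    """Map a global index to (position of the dataset, index inside it).

    Hypothetical helper. `lengths` holds the length of each concatenated
    dataset, in concatenation order. Negative indices count from the end.
    """
    total = sum(lengths)
    if idx < 0:
        idx += total
    if not 0 <= idx < total:
        raise IndexError(f"index out of range for a dataset of length {total}")
    # Cumulative end positions, e.g. [4, 8] -> [4, 12]
    ends = list(accumulate(lengths))
    # First dataset whose end is strictly greater than idx (skips empty ones)
    dataset_pos = bisect_right(ends, idx)
    start = ends[dataset_pos - 1] if dataset_pos > 0 else 0
    return dataset_pos, idx - start


# Lengths from the example: len(bids_1) == 4, len(bids_2) == 8
print(locate(0, [4, 8]))   # (0, 0): first sample of bids_1
print(locate(4, [4, 8]))   # (1, 0): first sample of bids_2
print(locate(11, [4, 8]))  # (1, 7): last sample of bids_2
```

Using `bisect_right` rather than `bisect_left` matters: an index equal to a cumulative end (here 4) belongs to the next dataset, not to the one that just ended.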
property df

The concatenation of the metadata DataFrames of the underlying datasets.

subset(particpants_sessions: Path | str | DataFrame | Iterable[tuple[str, str]]) → Self

Gets a subset of the dataset from a list of (participant, session) pairs.

Parameters:

particpants_sessions (Path | str | DataFrame | Iterable[tuple[str, str]]) –

Can be either:

  • an iterable of (participant, session) pairs;

  • a pandas.DataFrame (or a path to a TSV file containing such a DataFrame) listing the (participant, session) pairs to extract. The pairs must be given in two columns named "participant_id" and "session_id"; other columns are ignored.

Returns:

Self – A subset of the original dataset, restricted to the (participant, session) pairs listed in particpants_sessions.
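The accepted input forms can all be reduced to a set of (participant, session) pairs before filtering. The sketch below does this with the standard library only; the helpers `to_pairs` and `filter_rows`, and the list-of-dicts stand-in for the metadata DataFrame, are hypothetical and not part of ClinicaDL.

```python
import csv
from collections.abc import Iterable
from pathlib import Path
from typing import Any


def to_pairs(data: Any) -> set[tuple[str, str]]:
    """Normalize the documented input forms into (participant, session) pairs.

    Hypothetical helper mirroring the forms listed above.
    """
    if isinstance(data, (str, Path)):
        # TSV file: only the two key columns are read, other columns are ignored
        with open(data, newline="") as f:
            reader = csv.DictReader(f, delimiter="\t")
            return {(row["participant_id"], row["session_id"]) for row in reader}
    # Checked before plain iteration, because iterating over a DataFrame
    # yields column names rather than rows
    if hasattr(data, "columns"):
        return set(zip(data["participant_id"], data["session_id"]))
    # Any iterable of (participant, session) tuples
    return {(str(p), str(s)) for p, s in data}


def filter_rows(rows: Iterable[dict], pairs: set[tuple[str, str]]) -> list[dict]:
    """Keep only the metadata rows whose (participant, session) was requested."""
    return [r for r in rows if (r["participant_id"], r["session_id"]) in pairs]


rows = [
    {"participant_id": "sub-001", "session_id": "ses-M000"},
    {"participant_id": "sub-A", "session_id": "ses-M003"},
]
kept = filter_rows(rows, to_pairs([("sub-A", "ses-M003")]))
print([r["participant_id"] for r in kept])  # ['sub-A']
```

With the actual API, the same selection is passed directly, e.g. `full_dataset.subset([("sub-A", "ses-M003")])`.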

get_sample_info(idx: int, column: str) → Any

Retrieves information about a given sample from the metadata DataFrame. Since the DataFrame describes images, the returned value is the information on the image from which the sample was extracted.

Parameters:
  • idx (int) – The index of the sample in the dataset.

  • column (str) – The information to look for, i.e. a column of df.

Returns:

Any – The value of the column for this sample.
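How a sample index is mapped to a row of df is not specified here. The sketch below assumes, hypothetically, that every image contributes the same number of consecutive samples (e.g. patches), in which case the image row is found by integer division. The helper `sample_info` and the list-of-dicts stand-in for df are illustrative, not ClinicaDL code.

```python
def sample_info(rows: list[dict], samples_per_image: int, idx: int, column: str):
    """Look up `column` for sample `idx`, assuming samples are grouped by image.

    Hypothetical layout: samples 0..k-1 come from row 0, k..2k-1 from row 1, etc.
    """
    if not 0 <= idx < len(rows) * samples_per_image:
        raise IndexError(idx)
    image_row = idx // samples_per_image
    return rows[image_row][column]


rows = [
    {"participant_id": "sub-001", "session_id": "ses-M000"},
    {"participant_id": "sub-002", "session_id": "ses-M000"},
]
# With 3 samples per image, samples 3, 4 and 5 all come from the second image
print(sample_info(rows, 3, 4, "participant_id"))  # sub-002
```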

__len__() → int

Computes the total number of samples in the dataset.

Returns:

int – Total number of samples in the dataset, i.e. the sum of the lengths of the underlying datasets (for each of them, the number of images times the number of samples per image).
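Combining the per-dataset rule (number of images times samples per image) with the class description (lengths add up across concatenated datasets) gives the arithmetic below. The helper name and the (images, samples per image) tuples are hypothetical; the first call assumes one sample per image, which matches the lengths shown in the example.

```python
def concat_length(datasets: list[tuple[int, int]]) -> int:
    """Total length of a concatenation.

    Hypothetical helper: each entry is (number of images, samples per image)
    for one underlying dataset.
    """
    return sum(n_images * per_image for n_images, per_image in datasets)


# Example above, assuming one sample per image: 4 + 8 = 12
print(concat_length([(4, 1), (8, 1)]))  # 12
# Hypothetical extraction of 10 samples per image: 40 + 80 = 120
print(concat_length([(4, 10), (8, 10)]))  # 120
```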

__getitem__(idx: int) → Sample

Retrieves the sample at a given index.

Parameters:

idx (int) – Index of the sample in the dataset.

Returns:

Sample – A Sample containing the processed data and metadata.

eval() → None

Sets all the underlying datasets in evaluation mode.

get_participant_session_couples() → set[tuple[str, str]]

Retrieves all (participant, session) pairs in the dataset.

Returns:

set[tuple[str, str]] – The set of (participant, session).
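Because the result is a set, a pair present in several underlying datasets appears only once, and the number of pairs can be smaller than the length of the dataset. A minimal sketch of the union, with a hypothetical helper and hypothetical pairs:

```python
def all_pairs(per_dataset: list[set[tuple[str, str]]]) -> set[tuple[str, str]]:
    """Union of the (participant, session) pairs of each underlying dataset."""
    return set().union(*per_dataset)


# Hypothetical pairs; ("sub-001", "ses-M000") is present in both datasets
bids_1_pairs = {("sub-001", "ses-M000"), ("sub-002", "ses-M000")}
bids_2_pairs = {("sub-A", "ses-M003"), ("sub-001", "ses-M000")}
print(len(all_pairs([bids_1_pairs, bids_2_pairs])))  # 3
```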

sanity_check(spatial_checks: Iterable[str | SpatialCheck] | None = ('affine', 'shape', 'global_spacing')) → None

Performs a sanity check on the current dataset.

It iterates over the whole dataset to check that images are loaded and transformed correctly and, optionally, performs spatial checks on the loaded images.

Parameters:

spatial_checks (Optional[Iterable[str | SpatialCheck]], default=("affine", "shape", "global_spacing")) –

Spatial checks to perform on the images:

  • "spacing": checks intra-sample voxel spacing consistency, i.e. that all the images and masks in the Sample output by the current dataset have the same voxel spacing.

  • "affine": checks intra-sample affine matrix consistency (so it includes "spacing").

  • "shape": checks intra-sample spatial shape consistency.

  • "global_spacing": checks inter-sample voxel spacing consistency, i.e. that all the Samples in the dataset have the same voxel spacing (so it includes "spacing").

  • "global_shape": checks inter-sample spatial shape consistency (so it includes "shape").

If None, no spatial check is performed.
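Some checks include others ("affine" and "global_spacing" include "spacing"; "global_shape" includes "shape"). The sketch below expands a requested list into every check that is effectively performed, following only the implications stated above. It accepts plain strings only (not SpatialCheck members), and the helper `expand_checks` is hypothetical.

```python
from collections.abc import Iterable
from typing import Optional

# Implications stated in the sanity_check documentation
IMPLIES = {
    "affine": {"spacing"},
    "global_spacing": {"spacing"},
    "global_shape": {"shape"},
}
KNOWN = {"spacing", "affine", "shape", "global_spacing", "global_shape"}


def expand_checks(spatial_checks: Optional[Iterable[str]]) -> set[str]:
    """Return every check actually performed, including the implied ones."""
    if spatial_checks is None:
        return set()  # None means no spatial check at all
    requested = set(spatial_checks)
    unknown = requested - KNOWN
    if unknown:
        raise ValueError(f"unknown spatial checks: {sorted(unknown)}")
    expanded = set(requested)
    for check in requested:
        expanded |= IMPLIES.get(check, set())
    return expanded


# The default value of spatial_checks
print(sorted(expand_checks(("affine", "shape", "global_spacing"))))
# ['affine', 'global_spacing', 'shape', 'spacing']
```

With the actual API, spatial checks are skipped entirely with `full_dataset.sanity_check(spatial_checks=None)`.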

train() → None

Sets all the underlying datasets in training mode.
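Both train() and eval() simply forward the call to every underlying dataset. A minimal sketch of that delegation, using hypothetical ToyDataset and ToyConcat classes and a `training` flag borrowed from the PyTorch nn.Module convention; what each mode actually changes inside a ClinicaDL dataset is not described in this section.

```python
class ToyDataset:
    """Hypothetical stand-in for an underlying dataset with a mode flag."""

    def __init__(self) -> None:
        self.training = True  # assumed default, as in torch.nn.Module

    def train(self) -> None:
        self.training = True

    def eval(self) -> None:
        self.training = False


class ToyConcat:
    """Hypothetical container that forwards mode changes to its datasets."""

    def __init__(self, datasets) -> None:
        self.datasets = list(datasets)

    def train(self) -> None:
        for dataset in self.datasets:
            dataset.train()

    def eval(self) -> None:
        for dataset in self.datasets:
            dataset.eval()


parts = [ToyDataset(), ToyDataset()]
concat = ToyConcat(parts)
concat.eval()
print([d.training for d in parts])  # [False, False]
```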