clinicadl.data.datasets.Dataset

class clinicadl.data.datasets.Dataset

Abstract class for ClinicaDL datasets, which inherits from torch.utils.data.Dataset, to work with 3D neuroimaging data.

To work properly with ClinicaDL, all datasets must inherit from this class.

See also

BidsDataset

A Dataset that reads data organized in the BIDS format.

property df: DataFrame

A DataFrame containing metadata on the images present in the dataset.

Each image must have a corresponding row in the DataFrame, which must contain at least the columns “participant_id” and “session_id”, holding the participant and session ids as strings.

Example

participant_id  session_id   age   sex   diagnosis
sub-001         ses-M000     55.0  M     CN
sub-001         ses-M003     55.0  M     AD
sub-002         ses-M000     62.0  F     MCI
sub-002         ses-M003     62.0  F     AD
sub-003         ses-M000     67.0  F     CN
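
The column requirement above can be checked before a DataFrame is used as metadata. The following is only a sketch under stated assumptions: `check_metadata` and `REQUIRED_COLUMNS` are hypothetical names introduced for illustration and are not part of ClinicaDL.

```python
import pandas as pd

# Required identifier columns, as stated in the description of `df`.
REQUIRED_COLUMNS = ("participant_id", "session_id")


def check_metadata(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: check that the id columns exist and hold strings."""
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing required column(s): {missing}")
    for col in REQUIRED_COLUMNS:
        # Every id must be a Python string, e.g. "sub-001" or "ses-M000".
        if not df[col].map(lambda value: isinstance(value, str)).all():
            raise TypeError(f"Column '{col}' must contain strings")
    return df


metadata = check_metadata(
    pd.DataFrame(
        {
            "participant_id": ["sub-001", "sub-001", "sub-002"],
            "session_id": ["ses-M000", "ses-M003", "ses-M000"],
            "age": [55.0, 55.0, 62.0],
        }
    )
)
```

Extra columns such as age, sex or diagnosis are left untouched by this check.
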
abstract eval() → None

Sets the dataset to evaluation mode.

For example, an implementation can disable data augmentation in the transformation pipeline.

abstract train() → None

Sets the dataset to training mode.

For example, an implementation can enable data augmentation in the transformation pipeline.
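
One possible way to implement the two modes is a flag that gates the augmentation step. The sketch below uses a stand-in class, not the ClinicaDL Dataset; `ToyDataset`, `jitter` and the `training` attribute are hypothetical names chosen for illustration.

```python
import random
from typing import Callable, List


class ToyDataset:
    """Stand-in class (NOT the ClinicaDL Dataset) illustrating train/eval modes."""

    def __init__(self, values: List[float], augmentation: Callable[[float], float]):
        self.values = values
        self.augmentation = augmentation
        self.training = True  # hypothetical flag name

    def train(self) -> None:
        # Training mode: augmentation is applied when a sample is read.
        self.training = True

    def eval(self) -> None:
        # Evaluation mode: samples are returned without augmentation.
        self.training = False

    def __len__(self) -> int:
        return len(self.values)

    def __getitem__(self, idx: int) -> float:
        value = self.values[idx]
        return self.augmentation(value) if self.training else value


def jitter(value: float) -> float:
    """Toy augmentation: add a small random offset."""
    return value + random.uniform(-0.1, 0.1)


dataset = ToyDataset([1.0, 2.0, 3.0], augmentation=jitter)
dataset.eval()
print(dataset[0])  # 1.0, since augmentation is skipped in evaluation mode
```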

get_participant_session_couples() → set[tuple[str, str]]

Retrieves all (participant, session) pairs in the dataset.

Returns:

set[tuple[str, str]] – The set of (participant, session) pairs.
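
As an illustration of the returned structure, such a set can be derived from the two id columns of a metadata DataFrame. This is a sketch of the idea only, assuming the columns "participant_id" and "session_id"; it is not necessarily how the method is implemented.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "participant_id": ["sub-001", "sub-001", "sub-002"],
        "session_id": ["ses-M000", "ses-M003", "ses-M000"],
    }
)

# Pair the two id columns row by row; a set drops duplicates and ordering.
couples = set(zip(df["participant_id"], df["session_id"]))
```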

subset(participants_sessions: Path | str | DataFrame | Iterable[tuple[str, str]]) → Self

Returns a subset of the dataset, restricted to the given (participant, session) pairs.

Parameters:

participants_sessions (Path | str | DataFrame | Iterable[tuple[str, str]]) –

Can be either:

  • an iterable of (participant, session) pairs;

  • a pandas.DataFrame (or a path to a TSV file containing the DataFrame) with the list of (participant, session) pairs to extract. The pairs must be given in two columns named "participant_id" and "session_id" (other columns are ignored).

Returns:

Self – A subset of the original dataset, restricted to the (participant, session) pairs given in participants_sessions.
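
The accepted input forms can all be reduced to a set of pairs before the metadata is filtered. The sketch below shows one way to do this with pandas; `to_pairs` and `restrict` are hypothetical helpers written for illustration, not ClinicaDL functions, and the actual method returns a dataset rather than a DataFrame.

```python
from pathlib import Path
from typing import Iterable, Set, Tuple, Union

import pandas as pd

Pair = Tuple[str, str]
Selection = Union[Path, str, pd.DataFrame, Iterable[Pair]]


def to_pairs(selection: Selection) -> Set[Pair]:
    """Hypothetical helper: reduce any accepted input form to a set of pairs."""
    if isinstance(selection, (str, Path)):
        # A path is read as a tab-separated file holding the two id columns.
        selection = pd.read_csv(selection, sep="\t", dtype=str)
    if isinstance(selection, pd.DataFrame):
        # Only the two id columns matter; any other column is ignored.
        return set(zip(selection["participant_id"], selection["session_id"]))
    return {(str(participant), str(session)) for participant, session in selection}


def restrict(df: pd.DataFrame, selection: Selection) -> pd.DataFrame:
    """Keep only the metadata rows whose (participant, session) pair is selected."""
    wanted = to_pairs(selection)
    keep = [pair in wanted for pair in zip(df["participant_id"], df["session_id"])]
    return df[keep].reset_index(drop=True)


metadata = pd.DataFrame(
    {
        "participant_id": ["sub-001", "sub-001", "sub-002"],
        "session_id": ["ses-M000", "ses-M003", "ses-M000"],
        "diagnosis": ["CN", "AD", "MCI"],
    }
)
subset_df = restrict(metadata, [("sub-001", "ses-M003"), ("sub-002", "ses-M000")])
```

In this sketch a plain string is always treated as a file path, never as an iterable of pairs.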

abstract get_sample_info(idx: int, column: str) → Any

Retrieves information on a given sample from the metadata DataFrame. The value returned is that of the image the sample was extracted from.

Parameters:

  • idx (int) – The index of the sample in the dataset.

  • column (str) – The information to look for, i.e. a column of df.

Returns:

Any – The value of the column for this sample.

abstract __len__() → int

Computes the total number of samples in the dataset.

Returns:

int – Total number of samples in the dataset, i.e. the number of images times the number of samples per image.
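
The arithmetic can be made concrete with a small sketch. It assumes that every image yields the same number of samples and that samples are ordered image by image; both are assumptions for illustration, and `total_samples` and `locate` are hypothetical names.

```python
from typing import Tuple


def total_samples(n_images: int, samples_per_image: int) -> int:
    """Total number of samples: number of images times samples per image."""
    return n_images * samples_per_image


def locate(idx: int, samples_per_image: int) -> Tuple[int, int]:
    """Map a flat sample index to (image index, sample index within the image)."""
    return divmod(idx, samples_per_image)


# 5 images with 4 samples each give 20 samples in total.
n_samples = total_samples(n_images=5, samples_per_image=4)

# Under the assumed ordering, sample 9 is the second sample (index 1)
# of the third image (index 2).
image_idx, sample_idx = locate(9, samples_per_image=4)
```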

abstract __getitem__(idx: int) → SampleT

Retrieves the sample at a given index.

Parameters:

idx (int) – Index of the sample in the dataset.

Returns:

Union[Sample, Sequence[Sample], dict[Any, Sample]] – A structured output containing the processed data and metadata, as a Sample, or a sequence or dictionary of samples.