clinicadl.data.datasets.TensorDataset

class clinicadl.data.datasets.TensorDataset(description_json: Path | str, data: Path | str | DataFrame | None = None, transforms: TransformsHandler = TransformsHandler(), columns: Sequence[str] | dict[str, Callable[[Series], Series] | None] | None = None, to_load: Sequence[str] | None = None)

A Dataset to read images saved as tensors in .pt files.

This dataset loads tensors saved with BidsDataset.to_tensors.

Parameters:
  • description_json (PathType) – The path to the .json file saved by BidsDataset.to_tensors that describes the conversion.

  • data (Optional[DataFrameType], default=None) –

    A pandas.DataFrame (or a path to a TSV file containing the DataFrame) with the list of (participant, session) pairs to consider, as well as any other relevant information (e.g. the age of the participants). Only (participant, session) pairs mentioned in this TSV file will be in the TensorDataset.

    If None, all (participant, session) pairs whose images have been converted during this conversion will be considered.

    Warning

    Be careful if you pass a DataFrame with a column named "n_samples". The dataset will interpret it as the number of samples for each (participant, session) pair.

  • transforms (TransformsHandler, default=TransformsHandler()) –

    Transformation pipeline to apply to the data after loading. The user also specifies here whether to work on images, patches, or slices. See clinicadl.transforms.TransformsHandler.

    Warning

    If transformed images were saved in the .pt files, make sure that you don’t apply these transforms again here (image_transforms should probably be empty in the TransformsHandler here).

  • columns (Optional[ColumnsType], default=None) –

    Columns to read from the DataFrame data and to include in the output Sample.

    Can be passed via:

    • a list of strings (e.g. ["age", "sex"]), corresponding to the names of the columns;

    • or a dictionary (e.g. {"age": <function>, "sex": None}), where the keys are the names of the columns, and the values are functions to apply to the columns. If the function is None, no function will be applied to the column.

    Note

    Any function passed here is applied to the whole column at once. It must take a pandas.Series as input and return a pandas.Series. This is useful, for example, to convert string labels to integer labels for classification.

  • to_load (Optional[Sequence[str]], default=None) – The data to load from the .pt files. The data saved in these files are described in the descriptive .json file. If None, everything inside the files will be loaded.
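The dictionary form of columns can be illustrated with plain pandas. The sketch below uses a small hypothetical DataFrame shaped like a "metadata.tsv" file and applies each function to its whole column by hand. It only mimics the behaviour described above and is not how TensorDataset is implemented.

```python
import pandas as pd

# Hypothetical metadata, mirroring the layout of a "metadata.tsv" file.
df = pd.DataFrame(
    {
        "participant_id": ["sub-001", "sub-002", "sub-003"],
        "session_id": ["ses-M000", "ses-M000", "ses-M000"],
        "age": [55.0, 62.0, 67.0],
        "diagnosis": ["control", "control", "patient"],
    }
)


def diagnosis_to_number(column: pd.Series) -> pd.Series:
    """Encode string labels as integers, operating on the whole column."""
    return column.map({"control": 0, "patient": 1})


# Dictionary form of `columns`: None means "keep the column unchanged".
columns = {"age": None, "diagnosis": diagnosis_to_number}

# Illustration only: apply each function to its column, Series in, Series out.
processed = pd.DataFrame(
    {
        name: (df[name] if func is None else func(df[name]))
        for name, func in columns.items()
    }
)
print(processed)
```

The same dictionary could then be passed as columns=columns when building the dataset.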

Examples

bids
├── dataset_description.json
├── metadata.tsv
...
└── derivatives
    └── tensors
        ├── dataset_description.json
        ├── conversions.tsv
        ├── src-T1w_conv-T1WithMasks_description.json
        ├── src-T1w_conv-T1WithMasks_participantsXsessions.tsv
        ├── sub-001
        │   ├── ses-M000
        │   │   └── anat
        │   │       ├── sub-001_ses-M000_src-T1w_conv-T1WithMasks_tensors.json
        │   │       └── sub-001_ses-M000_src-T1w_conv-T1WithMasks_tensors.pt    <- contains the image + 2 masks named 'head' and 'mni'
        │   ...
        ...

The "metadata.tsv" file looks like:

participant_id  session_id   age   sex   diagnosis
sub-001         ses-M000     55.0  M     control
sub-001         ses-M024     57.0  M     control
sub-002         ses-M000     62.0  F     control
sub-002         ses-M024     64.0  F     patient
sub-003         ses-M000     67.0  F     patient
...
from clinicadl.data.datasets import TensorDataset
from clinicadl.transforms import TransformsHandler, extraction
import pandas as pd

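# Convert the diagnosis labels of "metadata.tsv" to numeric values.
# The function receives the whole column and must return a pandas.Series.
def diagnosis_to_number(column: pd.Series) -> pd.Series:
    encoding = {"control": 0, "patient": 1}
    return column.apply(lambda x: encoding[x])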
>>> dataset = TensorDataset(
        description_json="bids/derivatives/tensors/src-T1w_conv-T1WithMasks_description.json",
        data="bids/metadata.tsv",
        columns=["age"],
    )
>>> dataset[0]
Sample(Keys: ('head', 'mni', 'age', 'file_type', 'image_path', 'sample_type', 'sample_position', 'image', 'participant_id', 'session_id'); images: 3)
>>> dataset[0].spatial_shape
(169, 208, 179)
>>> dataset = TensorDataset(
        description_json=bids / "derivatives" / "tensors" / "res-1d3x1d2x1d1_src-T1w_conv-T1Masks_description.json",
        transforms=TransformsHandler(
            extraction=extraction.Patch(patch_size=2),
        ),
        to_load=["head"],
    )
>>> dataset[0]
Sample(Keys: ('head', 'file_type', 'image_path', 'sample_type', 'sample_position', 'image', 'participant_id', 'session_id'); images: 3)
>>> dataset[0].spatial_shape
(64, 64, 64)
__getitem__(idx: int) -> Sample

Retrieves the sample at a given index.

Parameters:

idx (int) – Index of the sample in the dataset.

Returns:

Sample – A Sample containing the processed data and metadata.

__len__() -> int

Computes the total number of samples in the dataset.

Returns:

int – Total number of samples in the dataset, i.e. the number of images times the number of samples per image.
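As a worked example of this count, the sketch below assumes that every image yields the same number of samples. The function name and the numbers are hypothetical and only illustrate the arithmetic.

```python
def total_samples(n_images: int, samples_per_image: int) -> int:
    """Dataset length when every image yields the same number of samples."""
    return n_images * samples_per_image


# Hypothetical numbers: 5 images, each split into 8 patches.
n_total = total_samples(n_images=5, samples_per_image=8)

# When working on whole images, there is one sample per image.
n_whole = total_samples(n_images=5, samples_per_image=1)

print(n_total, n_whole)
```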

describe() -> dict[str, Any]

Returns a description of the dataset.

Returns:

dict[str, Any] – A dictionary describing the dataset.

property df

The DataFrame passed in data, with its columns processed with the functions passed in columns.

eval() -> None

Sets the dataset to evaluation mode.

For example, this disables data augmentation in the transformation pipeline.

get_participant_session_couples() -> set[tuple[str, str]]

Retrieves all (participant, session) pairs in the dataset.

Returns:

set[tuple[str, str]] – The set of (participant, session) pairs.

get_sample_info(idx: int, column: str) -> Any

Retrieves information on a given sample in the metadata DataFrame. The information corresponds to the information on the image the sample was extracted from.

Parameters:
  • idx (int) – The index of the sample in the dataset.

  • column (str) – The information to look for, i.e. a column of df.

Returns:

Any – The value of the column for this sample.

sanity_check(spatial_checks: Iterable[str | SpatialCheck] | None = ('affine', 'shape', 'global_spacing')) -> None

Performs a sanity check on the current dataset.

It will iterate over the whole dataset to check if images are loaded and transformed correctly, and potentially perform spatial checks on the loaded images.

Parameters:

spatial_checks (Optional[Iterable[str | SpatialCheck]], default=("affine", "shape", "global_spacing")) –

Spatial checks to perform on the images:

  • "spacing": checks intra-sample voxel spacing consistency, i.e. that all the images and masks in the Sample output by the current dataset have the same voxel spacing.

  • "affine": checks intra-sample affine matrix consistency (so it includes "spacing").

  • "shape": checks intra-sample spatial shape consistency.

  • "global_spacing": checks inter-sample voxel spacing consistency, i.e. that all the Samples in the dataset have the same voxel spacing (so it includes "spacing").

  • "global_shape": checks inter-sample spatial shape consistency (so it includes "shape").

If None, no spatial check is performed.
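To see why "affine" is stricter than "spacing", the sketch below derives the voxel spacing from a 4x4 affine matrix, taken as the norm of each column of its upper-left 3x3 block, and compares two hypothetical volumes. The helper names are illustrative and are not part of clinicadl. The actual checks may be implemented differently.

```python
import math

Affine = list[list[float]]


def spacing_from_affine(affine: Affine) -> tuple[float, ...]:
    """Voxel spacing: Euclidean norm of each column of the 3x3 block."""
    return tuple(
        math.sqrt(sum(affine[row][col] ** 2 for row in range(3)))
        for col in range(3)
    )


def same_spacing(a: Affine, b: Affine, tol: float = 1e-6) -> bool:
    """True if both affines describe the same voxel spacing."""
    return all(
        math.isclose(x, y, abs_tol=tol)
        for x, y in zip(spacing_from_affine(a), spacing_from_affine(b))
    )


def same_affine(a: Affine, b: Affine, tol: float = 1e-6) -> bool:
    """True if the two 4x4 matrices are equal element by element."""
    return all(
        math.isclose(a[r][c], b[r][c], abs_tol=tol)
        for r in range(4)
        for c in range(4)
    )


# Hypothetical image and mask: same 1 mm spacing, different origin.
image = [[1, 0, 0, -90], [0, 1, 0, -126], [0, 0, 1, -72], [0, 0, 0, 1]]
mask = [[1, 0, 0, -80], [0, 1, 0, -126], [0, 0, 1, -72], [0, 0, 0, 1]]

spacing_ok = same_spacing(image, mask)  # spacing is consistent
affine_ok = same_affine(image, mask)    # translation differs
print(spacing_ok, affine_ok)
```

In this example a "spacing" check would pass while an "affine" check would fail, because the mask is shifted relative to the image.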

sort() -> None

Sorts the dataset by (participant, session) pairs, in alphabetical order.

Examples

>>> dataset[0].participant_id
'sub-001'
>>> dataset[1].participant_id
'sub-000'
>>> dataset.sort()
>>> dataset[0].participant_id
'sub-000'
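Alphabetical ordering of (participant, session) pairs can be previewed with plain Python tuples. The sketch below assumes ordinary string comparison, first on the participant and then on the session. The identifiers are hypothetical, and this is not necessarily how sort() is implemented.

```python
# Hypothetical (participant, session) pairs in arbitrary order.
pairs = [
    ("sub-002", "ses-M024"),
    ("sub-001", "ses-M024"),
    ("sub-010", "ses-M000"),
    ("sub-001", "ses-M000"),
]

# Tuples compare element by element: participant first, then session.
sorted_pairs = sorted(pairs)

# With string comparison, identifiers that are not zero-padded
# do not sort numerically.
unpadded = sorted(["sub-2", "sub-10"])

print(sorted_pairs)
print(unpadded)
```

Zero-padded identifiers such as "sub-001" keep alphabetical and numerical order aligned.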
subset(participants_sessions: Path | str | DataFrame | Iterable[tuple[str, str]]) -> Self

Returns a subset of the dataset from a list of (participant, session) pairs.

Parameters:

participants_sessions (Union[DataFrameType, Sequence[tuple[str, str]]]) –

Can be either:

  • a sequence of (participant, session) pairs;

  • a pandas.DataFrame (or a path to a TSV file containing the DataFrame) with the list of (participant, session) pairs to extract. This list must be passed via two columns named "participant_id" and "session_id" (other columns won’t be considered).

Returns:

Self – A subset of the original dataset, restricted to the (participant, session) pairs mentioned in participants_sessions.
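A sequence of (participant, session) pairs can be built from tabular metadata with the standard library. The sketch below parses an in-memory TSV that mirrors the "metadata.tsv" example and keeps only the baseline sessions. The selection criterion and the variable names are illustrative, and the call to subset is left as a comment because it needs an existing TensorDataset.

```python
import csv
import io

# In-memory copy of the example metadata (tab-separated).
metadata_tsv = (
    "participant_id\tsession_id\tage\tsex\tdiagnosis\n"
    "sub-001\tses-M000\t55.0\tM\tcontrol\n"
    "sub-001\tses-M024\t57.0\tM\tcontrol\n"
    "sub-002\tses-M000\t62.0\tF\tcontrol\n"
    "sub-002\tses-M024\t64.0\tF\tpatient\n"
    "sub-003\tses-M000\t67.0\tF\tpatient\n"
)

rows = list(csv.DictReader(io.StringIO(metadata_tsv), delimiter="\t"))

# Keep only the baseline session of each participant.
baseline_pairs = [
    (row["participant_id"], row["session_id"])
    for row in rows
    if row["session_id"] == "ses-M000"
]
print(baseline_pairs)

# With an existing TensorDataset named `dataset`, one would then call:
# baseline_dataset = dataset.subset(baseline_pairs)
```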

train() -> None

Sets the dataset to training mode.

For example, this enables data augmentation in the transformation pipeline.