clinicadl.data.datasets.TensorDataset
- class clinicadl.data.datasets.TensorDataset(description_json: Path | str, data: Path | str | DataFrame | None = None, transforms: TransformsHandler = TransformsHandler(), columns: Sequence[str] | dict[str, Callable[[Series], Series] | None] | None = None, to_load: Sequence[str] | None = None)
A Dataset to read images saved as tensors in .pt files.
This dataset loads tensors saved with BidsDataset.to_tensors.
- Parameters:
description_json (PathType) – The path to the .json file saved by BidsDataset.to_tensors that describes the conversion.
data (Optional[DataFrameType], default=None) –
A pandas.DataFrame (or a path to a TSV file containing the DataFrame) with the list of (participant, session) pairs to consider, as well as any other relevant information (e.g. the age of the participants). Only (participant, session) pairs mentioned in this TSV file will be in the TensorDataset.
If None, all (participant, session) pairs whose images were converted during this conversion will be considered.
Warning
Be careful if you pass a DataFrame with a column named "n_samples". BidsDataset will understand it as the number of samples for each (participant, session) pair.
transforms (TransformsHandler, default=TransformsHandler()) –
Transformation pipeline to apply to the data after loading. The user also specifies here whether to work on images, patches, or slices. See clinicadl.transforms.TransformsHandler.
Warning
If transformed images were saved in the .pt files, make sure that you don’t apply these transforms again here (image_transforms should probably be empty in the TransformsHandler here).
columns (Optional[ColumnsType], default=None) –
Columns to get in the DataFrame data and to put in the output Sample.
Can be passed via:
  - a list of strings (e.g. ["age", "sex"]), corresponding to the names of the columns;
  - or a dictionary (e.g. {"age": <function>, "sex": None}), where the keys are the names of the columns, and the values are functions to apply to the columns. If the function is None, no function will be applied to the column.
Note
Any function passed here is applied to the whole column. It must take a pandas.Series as input and return a pandas.Series. This is useful, for example, to convert string labels to integer labels for classification.
to_load (Optional[Sequence[str]], default=None) – The data to load from the .pt files. The data saved in these files are described in the descriptive .json file. If None, everything inside the files will be loaded.
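As an illustration of the dictionary form of columns, the sketch below applies a whole-column function to a small metadata table. It uses pandas only and does not call clinicadl. The inlined TSV content, the helper name diagnosis_to_label, and the label set are assumptions made for this example.

```python
import io

import pandas as pd

# Hypothetical metadata, inlined here instead of being read from a TSV file on disk.
TSV_CONTENT = (
    "participant_id\tsession_id\tage\tdiagnosis\n"
    "sub-001\tses-M000\t55.0\tcontrol\n"
    "sub-002\tses-M024\t64.0\tpatient\n"
)


def diagnosis_to_label(column: pd.Series) -> pd.Series:
    """Map string diagnoses to integer labels, operating on the whole column."""
    encoding = {"control": 0, "patient": 1}  # assumed label set
    return column.map(encoding)


metadata = pd.read_csv(io.StringIO(TSV_CONTENT), sep="\t")

# Same layout as the `columns` argument: column name -> function, or None for no processing.
columns = {"age": None, "diagnosis": diagnosis_to_label}

processed = metadata.copy()
for name, function in columns.items():
    if function is not None:
        processed[name] = function(processed[name])
```

A dictionary built this way is what would be passed as columns, under the assumption that the column names exist in data.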
Examples
bids
├── dataset_description.json
├── metadata.tsv
...
└── derivatives
    └── tensors
        ├── dataset_description.json
        ├── conversions.tsv
        ├── src-T1w_conv-T1WithMasks_description.json
        ├── src-T1w_conv-T1WithMasks_participantsXsessions.tsv
        ├── sub-001
        │   ├── ses-M000
        │   │   └── anat
        │   │       ├── sub-001_ses-M000_src-T1w_conv-T1WithMasks_tensors.json
        │   │       └── sub-001_ses-M000_src-T1w_conv-T1WithMasks_tensors.pt  <- contains the image + 2 masks named 'head' and 'mni'
        │   ...
        ...

The "metadata.tsv" file looks like:

participant_id  session_id  age   sex  diagnosis
sub-001         ses-M000    55.0  M    control
sub-001         ses-M024    57.0  M    control
sub-002         ses-M000    62.0  F    control
sub-002         ses-M024    64.0  F    patient
sub-003         ses-M000    67.0  F    patient
...

from clinicadl.data.datasets import TensorDataset
from clinicadl.transforms import TransformsHandler, extraction
import pandas as pd

# to convert diagnosis to numeric values
# (the keys must match the values found in the "diagnosis" column)
def diagnosis_to_number(column: pd.Series) -> pd.Series:
    encoding = {"control": 0, "patient": 1}
    return column.apply(lambda x: encoding[x])
>>> dataset = TensorDataset(
        description_json="bids/derivatives/tensors/src-T1w_conv-T1WithMasks_description.json",
        data="bids/metadata.tsv",
        columns=["age"],
    )
>>> dataset[0]
Sample(Keys: ('head', 'mni', 'age', 'file_type', 'image_path', 'sample_type', 'sample_position', 'image', 'participant_id', 'session_id'); images: 3)
>>> dataset[0].spatial_shape
(169, 208, 179)
>>> from pathlib import Path
>>> bids = Path("bids")
>>> dataset = TensorDataset(
        description_json=bids / "derivatives" / "tensors" / "res-1d3x1d2x1d1_src-T1w_conv-T1Masks_description.json",
        transforms=TransformsHandler(
            extraction=extraction.Patch(patch_size=2),
        ),
        to_load=["head"],
    )
>>> dataset[0]
Sample(Keys: ('head', 'file_type', 'image_path', 'sample_type', 'sample_position', 'image', 'participant_id', 'session_id'); images: 3)
>>> dataset[0].spatial_shape
(64, 64, 64)
- __len__() -> int
Computes the total number of samples in the dataset.
- Returns:
int – Total number of samples in the dataset, i.e. the number of images times the number of samples per image.
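The arithmetic behind this count can be sketched in a few lines. The numbers are hypothetical and the snippet does not call clinicadl.

```python
# Hypothetical numbers, for illustration only.
n_images = 4            # number of (participant, session) images in the dataset
samples_per_image = 27  # e.g. patches or slices extracted from each image

# Dataset length as described above: images times samples per image.
total_samples = n_images * samples_per_image

# Equivalent view: one count per image, summed over all images.
per_image_counts = [samples_per_image] * n_images
```

With these values, total_samples and sum(per_image_counts) are both 108.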
- describe() -> dict[str, Any]
Returns a description of the dataset.
- Returns:
dict[str, Any] – A dictionary describing the dataset.
- property df
The DataFrame passed in data, with its columns processed with the functions passed in columns.
- eval() -> None
Sets the dataset to evaluation mode.
For example, this disables data augmentation in the transformation pipeline.
- get_participant_session_couples() -> set[tuple[str, str]]
Retrieves all (participant, session) pairs in the dataset.
- Returns:
set[tuple[str, str]] – The set of (participant, session) pairs.
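Because the result is a plain Python set of tuples, standard set operations apply to it. The sketch below uses hypothetical values standing in for the output of this method on two datasets, for instance to look for overlap between a training set and a validation set. It does not call clinicadl.

```python
# Hypothetical outputs of get_participant_session_couples() for two datasets.
train_pairs = {
    ("sub-001", "ses-M000"),
    ("sub-001", "ses-M024"),
    ("sub-002", "ses-M000"),
}
val_pairs = {
    ("sub-003", "ses-M000"),
    ("sub-001", "ses-M024"),
}

# (participant, session) pairs present in both sets.
shared_pairs = train_pairs & val_pairs

# Participants present in both sets, regardless of the session.
shared_participants = {p for p, _ in train_pairs} & {p for p, _ in val_pairs}
```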
- get_sample_info(idx: int, column: str) -> Any
Retrieves information on a given sample in the metadata DataFrame. The information corresponds to the information on the image the sample was extracted from.
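The idea of looking up image-level metadata from a sample index can be sketched as follows. This is not clinicadl's implementation. The helper sample_info, the constant number of samples per image, and the image-by-image ordering of samples are all assumptions made for the illustration.

```python
# Hypothetical metadata: one row per image, i.e. per (participant, session) pair.
metadata = [
    {"participant_id": "sub-001", "session_id": "ses-M000", "age": 55.0},
    {"participant_id": "sub-002", "session_id": "ses-M000", "age": 62.0},
]
SAMPLES_PER_IMAGE = 3  # assumed constant number of samples per image


def sample_info(idx: int, column: str):
    """Return the value of `column` for the image that sample `idx` comes from."""
    if not 0 <= idx < len(metadata) * SAMPLES_PER_IMAGE:
        raise IndexError(f"sample index {idx} is out of range")
    image_idx = idx // SAMPLES_PER_IMAGE  # assumes samples are ordered image by image
    return metadata[image_idx][column]
```

Under these assumptions, samples 0 to 2 map to the first image and samples 3 to 5 to the second, so all samples extracted from one image share the same metadata.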
- sanity_check(spatial_checks: Iterable[str | SpatialCheck] | None = ('affine', 'shape', 'global_spacing')) -> None
Performs a sanity check on the current dataset.
It iterates over the whole dataset to check that images are loaded and transformed correctly, and optionally performs spatial checks on the loaded images.
- Parameters:
spatial_checks (Optional[Iterable[str | SpatialCheck]], default=("affine", "shape", "global_spacing")) –
Spatial checks to perform on the images:
  - "spacing": checks intra-sample voxel spacing consistency, i.e. that all the images and masks in the Sample output by the current dataset have the same voxel spacing.
  - "affine": checks intra-sample affine matrix consistency (so it includes "spacing").
  - "shape": checks intra-sample spatial shape consistency.
  - "global_spacing": checks inter-sample voxel spacing consistency, i.e. that all the Samples in the dataset have the same voxel spacing (so it includes "spacing").
  - "global_shape": checks inter-sample spatial shape consistency (so it includes "shape").
If None, no spatial check is performed.
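The difference between an intra-sample check such as "shape" and an inter-sample check such as "global_shape" can be sketched with plain tuples. This only illustrates the two notions and is not clinicadl's implementation. The function names and the dictionary representation of a sample are hypothetical.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, int, int]
# A hypothetical sample: name of each image or mask -> its spatial shape.
SampleShapes = Dict[str, Shape]


def check_shape(sample: SampleShapes) -> bool:
    """Intra-sample check: all images and masks of one sample share one shape."""
    return len(set(sample.values())) <= 1


def check_global_shape(samples: List[SampleShapes]) -> bool:
    """Inter-sample check: every sample is internally consistent and all
    samples share the same shape, so it implies the intra-sample check."""
    if not all(check_shape(sample) for sample in samples):
        return False
    all_shapes = {shape for sample in samples for shape in sample.values()}
    return len(all_shapes) <= 1


samples = [
    {"image": (169, 208, 179), "head": (169, 208, 179)},
    {"image": (128, 128, 128), "head": (128, 128, 128)},
]
```

Here each sample passes the intra-sample check, but the two samples have different shapes, so the inter-sample check fails.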
- sort() -> None
Sorts the dataset by (participant, session) pairs (alphabetical order).
Examples
>>> dataset[0].participant_id
'sub-001'
>>> dataset[1].participant_id
'sub-000'
>>> dataset.sort()
>>> dataset[0].participant_id
'sub-000'
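Alphabetical ordering of (participant, session) pairs can be reproduced with Python's built-in tuple comparison, as in the sketch below. It assumes the ordering is a plain string comparison on the participant first and the session second, which is an assumption about the method and not a statement of how clinicadl implements it.

```python
# Hypothetical (participant, session) pairs in arbitrary order.
pairs = [
    ("sub-001", "ses-M024"),
    ("sub-000", "ses-M000"),
    ("sub-001", "ses-M000"),
]

# Tuples compare element by element: participant first, then session.
sorted_pairs = sorted(pairs)

# String comparison is not numeric: "sub-10" sorts before "sub-2".
unpadded = sorted(["sub-2", "sub-10"])
```

Zero-padded identifiers such as sub-001 avoid the surprise shown by the last line.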
- subset(participants_sessions: Path | str | DataFrame | Iterable[tuple[str, str]]) -> Self
Gets a subset of the dataset from a list of (participant, session) pairs.
- Parameters:
participants_sessions (Union[DataFrameType, Sequence[tuple[str, str]]]) –
Can be either:
  - a sequence of (participant, session) pairs;
  - a pandas.DataFrame (or a path to a TSV file containing the DataFrame) with the list of (participant, session) pairs to extract. This list must be passed via two columns named "participant_id" and "session_id" (other columns won’t be considered).
- Returns:
Self – A subset of the original dataset, restricted to the (participant, session) pairs mentioned in participants_sessions.
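The two accepted input forms can be prepared with pandas as sketched below. The metadata values are hypothetical and the snippet stops before calling clinicadl: it only builds the objects that would be passed to subset.

```python
import pandas as pd

# Hypothetical metadata; only "participant_id" and "session_id" matter for subsetting.
metadata = pd.DataFrame(
    {
        "participant_id": ["sub-001", "sub-001", "sub-002", "sub-003"],
        "session_id": ["ses-M000", "ses-M024", "ses-M000", "ses-M000"],
        "age": [55.0, 57.0, 62.0, 67.0],
    }
)

# Option 1: a DataFrame restricted to the rows of interest (extra columns are allowed).
baseline_df = metadata[metadata["session_id"] == "ses-M000"]

# Option 2: the same selection as a list of (participant, session) tuples.
baseline_pairs = list(
    baseline_df[["participant_id", "session_id"]].itertuples(index=False, name=None)
)
```

Either baseline_df or baseline_pairs could then be passed to subset on an existing TensorDataset, for example dataset.subset(baseline_pairs).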