clinicadl.data.datasets.PairedDataset
- class clinicadl.data.datasets.PairedDataset(datasets: Iterable[Dataset])
For pairing multiple Dataset objects (e.g., different modalities).
Pairing datasets means uniquely associating images across the datasets. The keys of this association are the (participant, session) pairs present in the underlying datasets, so all datasets must contain the same (participant, session) pairs.
Furthermore, for a given (participant, session) pair, all the datasets must have the same number of samples. For example, pairing a dataset of whole images with a dataset that contains a single slice of each image works. If the second dataset instead contains two slices of each image, an error is raised: the second dataset is then twice as large as the first one, and the two datasets cannot be paired.
A PairedDataset returns a tuple of Sample (one for each underlying dataset).
- Parameters:
datasets (Iterable[Dataset]) – The Datasets to pair.
- Raises:
ValueError – If the datasets contain duplicated (participant, session) pairs. This is an issue because it prevents PairedDataset from finding a bijective mapping between the datasets.
ValueError – If there is a mismatch of (participant, session) pairs across the datasets. An error is also raised if the number of samples per image is not the same across datasets.
Examples
bids
├── sub-001
│   └── ses-M000
│   │   ├── pet
│   │   │   └── sub-001_ses-M000_trc-18FAV45_pet.nii.gz
│   │   └── anat
│   │       └── sub-001_ses-M000_T1w.nii.gz
...
...

from clinicadl.data.datasets import BidsDataset, PairedDataset
from clinicadl.io.bids import BidsFileType

bids_t1 = BidsDataset("bids", file_type=BidsFileType(data_type="anat", suffix="T1w"))
bids_pet = BidsDataset("bids", file_type=BidsFileType(data_type="pet", suffix="pet"))
multimodal_dataset = PairedDataset([bids_t1, bids_pet])

>>> len(bids_t1)
4
>>> len(bids_pet)
4
>>> len(multimodal_dataset)
4
>>> sample = multimodal_dataset[0]
>>> len(sample)
2
>>> sample[0].file_type
BidsFileType(suffix=re.compile('T1w'), data_type=re.compile('anat'), extension=re.compile('.nii.*'), with_entities=None, without_entities=None, description=None)
>>> sample[1].file_type
BidsFileType(suffix=re.compile('pet'), data_type=re.compile('pet'), extension=re.compile('.nii.*'), with_entities=None, without_entities=None, description=None)
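The pairing rules stated in the class description can be sketched in plain Python, independently of ClinicaDL. This is only an illustration of the checks, not the library's implementation: `check_pairable` is a hypothetical helper, and each dataset is modelled (as an assumption) by a list of `((participant, session), number_of_samples)` entries, one entry per image.

```python
def check_pairable(datasets: list[list[tuple[tuple[str, str], int]]]) -> None:
    """Hypothetical helper (not part of ClinicaDL).

    Each dataset is a list of ((participant, session), n_samples) entries,
    one entry per image. Raises ValueError if the datasets cannot be paired.
    """
    per_dataset = []
    for i, images in enumerate(datasets):
        keys = [key for key, _ in images]
        # Rule 1: a (participant, session) pair may appear only once per dataset.
        if len(keys) != len(set(keys)):
            raise ValueError(f"dataset {i}: duplicated (participant, session) pairs")
        per_dataset.append(dict(images))

    reference = per_dataset[0]
    for i, current in enumerate(per_dataset[1:], start=1):
        # Rule 2: every dataset must contain the same (participant, session) pairs.
        if current.keys() != reference.keys():
            raise ValueError(f"dataset {i}: mismatch of (participant, session) pairs")
        # Rule 3: every pair must have the same number of samples in each dataset.
        if current != reference:
            raise ValueError(f"dataset {i}: different number of samples per image")


# Whole images (1 sample per image) paired with one slice per image: accepted.
t1 = [(("sub-001", "ses-M000"), 1), (("sub-002", "ses-M000"), 1)]
pet_one_slice = [(("sub-001", "ses-M000"), 1), (("sub-002", "ses-M000"), 1)]
check_pairable([t1, pet_one_slice])
```

Replacing `pet_one_slice` with a dataset that yields two slices per image would trigger the third check in this sketch, which mirrors the "two times bigger" case described in the class documentation.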
- property df
The result of merging the metadata DataFrames of the underlying datasets.
- __len__() → int
Computes the total number of samples in the dataset.
- Returns:
int – Total number of samples in the dataset, i.e. the number of images times the number of samples per image.
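As a toy illustration of that product, here is a plain-Python sketch. `total_samples` is a hypothetical helper, not part of ClinicaDL, and the figures are made up apart from the four images used in the example of the class documentation.

```python
def total_samples(n_images: int, samples_per_image: int) -> int:
    """Hypothetical helper: length of a dataset as described in the Returns field."""
    return n_images * samples_per_image


# Four whole images (one sample per image), as in the class example.
whole_images = total_samples(n_images=4, samples_per_image=1)

# The same four images, but with two slices extracted per image.
two_slices = total_samples(n_images=4, samples_per_image=2)
```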
- __getitem__(idx: int) → tuple[Sample, ...]
Retrieves the collection of samples at a given index.
- get_sample_info(idx: int, column: str) → Any
Retrieves information on a given sample.
It looks for column in the DataFrame of each underlying dataset. If different values are found across the datasets, it raises an error.
- Parameters:
- Returns:
Any – The value for this sample.
- Raises:
RuntimeError – If different values are found across the datasets.
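The lookup described for this method can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: `lookup_shared_value` is a hypothetical helper, each underlying dataset's metadata row is modelled as a plain dictionary, and raising `KeyError` when no dataset has the column is a choice made for this sketch only.

```python
from typing import Any


def lookup_shared_value(rows: list[dict[str, Any]], column: str) -> Any:
    """Hypothetical helper (not part of ClinicaDL).

    `rows` holds one metadata row per underlying dataset, all describing
    the same sample. Returns the value of `column` if the datasets agree.
    """
    # Collect the value from every dataset that actually has the column.
    values = [row[column] for row in rows if column in row]
    if not values:
        # Assumption of this sketch: a missing column is reported as KeyError.
        raise KeyError(column)
    first = values[0]
    # Conflicting values across datasets are ambiguous, so refuse to pick one.
    if any(value != first for value in values[1:]):
        raise RuntimeError(f"different values found for column {column!r}: {values}")
    return first


t1_row = {"participant_id": "sub-001", "session_id": "ses-M000", "age": 70}
pet_row = {"participant_id": "sub-001", "session_id": "ses-M000", "age": 70, "tracer": "18FAV45"}

age = lookup_shared_value([t1_row, pet_row], "age")
tracer = lookup_shared_value([t1_row, pet_row], "tracer")
```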
- get_participant_session_couples() → set[tuple[str, str]]
Retrieves all (participant, session) pairs in the dataset.
- Returns:
set[tuple[str, str]] – The set of (participant, session) pairs.
- subset(particpants_sessions: Path | str | DataFrame | Iterable[tuple[str, str]]) → Self
Gets a subset of the dataset from a list of (participant, session) pairs.
- Parameters:
data (Union[DataFrameType, Sequence[tuple[str, str]]]) –
Can be either:
a sequence of (participant, session) pairs;
a pandas.DataFrame (or a path to a TSV file containing the dataframe) with the list of (participant, session) pairs to extract. This list must be passed via two columns named "participant_id" and "session_id" (other columns won’t be considered).
- Returns:
Self – A subset of the original dataset, restricted to the (participant, session) pairs mentioned in data.
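To make the expected TSV layout concrete, the sketch below reads the two required columns with the standard library and restricts a set of pairs to those listed. It is an illustration only: `read_pairs` is a hypothetical helper, the TSV content is made up, and the set intersection stands in for whatever filtering the library performs internally.

```python
import csv
import io


def read_pairs(tsv_text: str) -> set[tuple[str, str]]:
    """Hypothetical helper (not part of ClinicaDL).

    Extracts (participant, session) pairs from TSV content, using only the
    "participant_id" and "session_id" columns and ignoring any other column.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {(row["participant_id"], row["session_id"]) for row in reader}


# Made-up TSV content; the "age" column is ignored.
tsv_text = (
    "participant_id\tsession_id\tage\n"
    "sub-001\tses-M000\t70\n"
    "sub-002\tses-M000\t65\n"
)
wanted = read_pairs(tsv_text)

# Pairs assumed to be present in the full dataset.
available = {
    ("sub-001", "ses-M000"),
    ("sub-002", "ses-M000"),
    ("sub-003", "ses-M000"),
}

# The subset keeps only the pairs that were requested.
kept = available & wanted
```

With the real class, the same selection would presumably be requested by passing either the list of pairs or the TSV path to `subset`, as the signature indicates.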