clinicadl.data.datasets.UnpairedDataset
- class clinicadl.data.datasets.UnpairedDataset(datasets: Iterable[Dataset], oversample: bool = False)
For stacking multiple Dataset objects (e.g. different modalities from different datasets). By “stacking”, we mean randomly associating images across datasets.

So, UnpairedDataset differs from PairedDataset in that PairedDataset associates images across datasets via a unique mapping. Therefore, as opposed to PairedDataset, there is no need for the datasets forming the UnpairedDataset to contain the same (participant, session) pairs.

The randomness of the mapping between datasets can be controlled via set_epoch(). This makes it possible to use different associations for each epoch.

The size of an UnpairedDataset is set to the size of its biggest underlying dataset if oversample=True, or to the size of its smallest underlying dataset if oversample=False. To handle datasets with different sizes, UnpairedDataset will randomly replicate some samples of the smaller datasets so that they reach the size of the biggest dataset if oversample=True, or will randomly drop some samples of the bigger datasets so that they reach the size of the smallest dataset if oversample=False. This randomness is also controlled via set_epoch().

An UnpairedDataset will return a tuple of Sample (one for each underlying dataset).
- Parameters:
datasets (Iterable[Dataset]) – The Datasets to stack.
oversample (bool, default=False) – Strategy to adopt when the datasets have different sizes:
  - oversample=True: randomly replicate samples in smaller datasets so that they reach the size of the biggest dataset.
  - oversample=False: randomly drop samples in bigger datasets so that all datasets reach the size of the smallest dataset.
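The size rule and the epoch-controlled randomness described above can be illustrated with a minimal, standard-library sketch. It is not ClinicaDL's implementation: build_mapping, its seed argument and the seeding scheme are hypothetical, and the result is a plain list of tuples rather than the DataFrame exposed by .mapping.

```python
import random


def build_mapping(sizes, oversample=False, epoch=0, seed=0):
    """Hypothetical helper: randomly associate indices across datasets.

    sizes holds the length of each underlying dataset. The result has one
    tuple per stacked sample, with one index per underlying dataset.
    """
    # The biggest dataset drives the size when oversampling, the smallest otherwise.
    target = max(sizes) if oversample else min(sizes)
    # Assumption: the random stream depends on the epoch, so each epoch
    # gets its own (reproducible) mapping.
    rng = random.Random(f"{seed}-{epoch}")
    columns = []
    for size in sizes:
        indices = list(range(size))
        rng.shuffle(indices)
        if size < target:
            # Oversampling: keep every sample once, then replicate random ones.
            indices += rng.choices(range(size), k=target - size)
            rng.shuffle(indices)
        else:
            # Undersampling (or equal size): keep a random subset.
            indices = indices[:target]
        columns.append(indices)
    return list(zip(*columns))


print(len(build_mapping([4, 2], oversample=True)))   # 4, size of the biggest dataset
print(len(build_mapping([4, 2], oversample=False)))  # 2, size of the smallest dataset
```

In this sketch, oversampling keeps every sample of a smaller dataset at least once before replicating, which is one possible design choice among others.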
Examples
bids_t1
├── sub-001
│   └── ses-M000
│       └── anat
│           └── sub-001_ses-M000_T1w.nii.gz
...
bids_pet
├── sub-A
│   └── ses-M000
│       └── pet
│           └── sub-A_ses-M000_trc-18FAV45_pet.nii.gz
...

from clinicadl.data.datasets import BidsDataset, UnpairedDataset
from clinicadl.io.bids import BidsFileType

bids_t1 = BidsDataset("bids_t1", file_type=BidsFileType(data_type="anat", suffix="T1w"))
bids_pet = BidsDataset("bids_pet", file_type=BidsFileType(data_type="pet", suffix="pet"))
stacked = UnpairedDataset([bids_t1, bids_pet], oversample=True)

>>> len(bids_t1)
4
>>> len(bids_pet)
2
>>> len(stacked)
4  # length of the biggest dataset
We can access the random mapping made between the datasets via .mapping:
>>> stacked.mapping
dataset_id  0  1
idx
0           2  0
1           3  0
2           1  0
3           0  1

idx is the index of the sample in the UnpairedDataset. In column 0, you have the associated sample in the first dataset (bids_t1), and in column 1, the associated sample in the second dataset (bids_pet).
>>> bids_t1[2].participant_id, bids_t1[2].session_id
('sub-002', 'ses-M000')
>>> bids_pet[0].participant_id, bids_pet[0].session_id
('sub-A', 'ses-M000')
>>> sample = stacked[0]
>>> len(sample)
2
>>> sample[0].participant_id, sample[0].session_id
('sub-002', 'ses-M000')
>>> sample[1].participant_id, sample[1].session_id
('sub-A', 'ses-M000')
Now we can change the random mapping with set_epoch():
>>> stacked.set_epoch(7)
>>> stacked.mapping
dataset_id  0  1
idx
0           2  1
1           1  1
2           0  0
3           3  0
>>> sample = stacked[0]
>>> sample[1].participant_id, sample[1].session_id
('sub-B', 'ses-M000')
Finally, if oversample=False:
>>> stacked = UnpairedDataset([bids_t1, bids_pet], oversample=False)
>>> len(stacked)
2  # = length of the smallest dataset
>>> stacked.mapping
dataset_id  0  1
idx
0           2  0
1           3  1
- property df
The result of merging the metadata DataFrames of the underlying datasets.
- property mapping: DataFrame
The random mapping between the samples of the underlying datasets.
- get_sample_info(idx: int, column: str) -> tuple[Any, ...]
Retrieves information on a given sample.
In an UnpairedDataset, a sample is a tuple of “sub-samples” from the underlying datasets. Therefore, get_sample_info will also return a tuple, containing the information on all the sub-samples forming the sample.
If the information cannot be found for a sub-sample (because the underlying datasets don’t necessarily all contain the same information), get_sample_info will return None for this sub-sample.
See Dataset.get_sample_info for more details.
- Parameters:
idx (int) – Index of the sample in the UnpairedDataset.
column (str) – Name of the column holding the information to retrieve.
- Returns:
tuple[Any, …] – The information (e.g. the age, the sex, etc.) found for each sub-sample.
- Raises:
KeyError – If column is not in any DataFrame of the datasets forming the UnpairedDataset.
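The return and error behaviour described above can be sketched with plain Python containers. This is only an illustration under stated assumptions: the function below is a hypothetical stand-in that takes the metadata tables explicitly (as lists of row dictionaries instead of DataFrames), and the ages are made-up values.

```python
from typing import Any


def get_sample_info(tables, rows, column) -> tuple[Any, ...]:
    """Hypothetical stand-in: look up `column` for each sub-sample.

    tables: one metadata table per underlying dataset (a list of row dicts
    here, a DataFrame in the real class).
    rows: the row index of each sub-sample in its own table.
    """
    # The column must exist in at least one table, otherwise KeyError.
    if not any(column in row for table in tables for row in table):
        raise KeyError(column)
    # A sub-sample whose table lacks the column yields None.
    return tuple(table[row].get(column) for table, row in zip(tables, rows))


# Toy metadata: only the first dataset records an age.
t1_meta = [
    {"participant_id": "sub-001", "age": 70},
    {"participant_id": "sub-002", "age": 64},
]
pet_meta = [{"participant_id": "sub-A"}, {"participant_id": "sub-B"}]

print(get_sample_info([t1_meta, pet_meta], (1, 0), "age"))             # (64, None)
print(get_sample_info([t1_meta, pet_meta], (1, 0), "participant_id"))  # ('sub-002', 'sub-A')
```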
- set_epoch(epoch: int) -> None
Sets the epoch.
This ensures that the random mapping between the datasets is different for each epoch.
- Parameters:
epoch (int) – Epoch number.
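A typical way to use such a method is to call it at the start of every epoch, before iterating over the dataset. The sketch below shows that pattern on a toy class; ToyUnpaired is a hypothetical stand-in (not the ClinicaDL class), and seeding the generator directly with the epoch number is an assumption made for the illustration.

```python
import random


class ToyUnpaired:
    """Hypothetical stand-in that only mimics set_epoch(), len() and indexing."""

    def __init__(self, first, second):
        self.first, self.second = first, second
        self.set_epoch(0)

    def set_epoch(self, epoch):
        # Assumption: the association is re-drawn from a generator seeded by the epoch.
        rng = random.Random(epoch)
        self.order = rng.sample(range(len(self.second)), k=len(self.second))

    def __len__(self):
        return len(self.first)

    def __getitem__(self, idx):
        return self.first[idx], self.second[self.order[idx]]


stacked = ToyUnpaired(["t1_a", "t1_b", "t1_c"], ["pet_x", "pet_y", "pet_z"])
for epoch in range(3):
    stacked.set_epoch(epoch)  # call before iterating over the epoch
    pairs = [stacked[i] for i in range(len(stacked))]
    print(epoch, pairs)
```

Because the association depends only on the epoch number in this sketch, setting the same epoch twice gives the same pairs, which keeps runs reproducible.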
- __len__() -> int
The length of an UnpairedDataset is the length of its biggest underlying dataset if oversample=True, or of its smallest underlying dataset if oversample=False.
- Returns:
int – The length of the dataset.
- __getitem__(idx: int) -> tuple[Sample, ...]
Retrieves the collection of samples at a given index.
The random mapping between datasets (in self.mapping) is used to determine which samples to retrieve for each underlying dataset.
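The lookup through the mapping can be sketched as follows. MappedTuples is a hypothetical stand-in working on plain lists, with the mapping stored as a list of index tuples instead of a DataFrame; it is an illustration, not the library's code.

```python
class MappedTuples:
    """Hypothetical stand-in: index several sequences through a fixed mapping."""

    def __init__(self, datasets, mapping):
        self.datasets = datasets
        self.mapping = mapping  # mapping[idx][k] = index to read in dataset k

    def __len__(self):
        return len(self.mapping)

    def __getitem__(self, idx):
        # One element per underlying dataset, picked via the mapping row.
        return tuple(ds[i] for ds, i in zip(self.datasets, self.mapping[idx]))


t1 = ["t1_0", "t1_1", "t1_2", "t1_3"]
pet = ["pet_0", "pet_1"]
# Same associations as the first mapping shown in the Examples section.
stacked = MappedTuples([t1, pet], [(2, 0), (3, 0), (1, 0), (0, 1)])

print(stacked[0])  # ('t1_2', 'pet_0')
print(stacked[3])  # ('t1_0', 'pet_1')
```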
- get_participant_session_couples() -> set[tuple[str, str]]
Retrieves all (participant, session) pairs in the dataset.
- Returns:
set[tuple[str, str]] – The set of (participant, session) pairs.
- subset(particpants_sessions: Path | str | DataFrame | Iterable[tuple[str, str]]) -> Self
Gets a subset of the dataset from a list of (participant, session) pairs.
- Parameters:
data (Union[DataFrameType, Sequence[tuple[str, str]]]) – Can be either:
  - a sequence of (participant, session) pairs;
  - a pandas.DataFrame (or a path to a TSV file containing the dataframe) with the list of (participant, session) pairs to extract. This list must be passed via two columns named "participant_id" and "session_id" (other columns won’t be considered).
- Returns:
Self – A subset of the original dataset, restricted to the (participant, session) pairs mentioned in data.
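The two accepted input forms can be illustrated on a single metadata table with the standard library only. Both helpers below (read_pairs and subset_rows) are hypothetical and are not part of ClinicaDL; the rows and the age value are toy data, and how the real method applies the filter to each underlying dataset is not shown here.

```python
import csv
import io


def read_pairs(tsv_text):
    """Read (participant_id, session_id) pairs from TSV text, ignoring other columns."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {(row["participant_id"], row["session_id"]) for row in reader}


def subset_rows(rows, pairs):
    """Keep only the rows whose (participant_id, session_id) is in pairs."""
    wanted = set(pairs)
    return [r for r in rows if (r["participant_id"], r["session_id"]) in wanted]


rows = [
    {"participant_id": "sub-001", "session_id": "ses-M000"},
    {"participant_id": "sub-002", "session_id": "ses-M000"},
    {"participant_id": "sub-002", "session_id": "ses-M006"},
]

# TSV form: the extra "age" column is not considered.
tsv_text = "participant_id\tsession_id\tage\nsub-002\tses-M000\t64\n"
print(subset_rows(rows, read_pairs(tsv_text)))
# [{'participant_id': 'sub-002', 'session_id': 'ses-M000'}]

# Sequence form: pairs given directly.
print(len(subset_rows(rows, [("sub-001", "ses-M000"), ("sub-002", "ses-M006")])))  # 2
```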