extract - Prepare input data for deep learning with PyTorch
This pipeline prepares images generated by Clinica to be used with the PyTorch deep learning library [Paszke et al., 2019]. Four types of tensors are proposed: 3D images, 3D patches, 3D regions or 2D slices.
Prerequisites
Before running this pipeline, you must run the Clinica pipeline corresponding to the MODALITY argument.
Running the pipeline
The pipeline can be run with the following command line:
clinicadl extract [image|patch|slice|roi] [OPTIONS] CAPS_DIRECTORY MODALITY
The command has four sub-commands:
- image: converts the whole 3D image,
- patch: extracts 3D patches walking through the entire image,
- roi: extracts a list of regions defined by masks at the root of CAPS_DIRECTORY,
- slice: extracts 2D slices from the image.
All sub-commands take the same arguments:
- CAPS_DIRECTORY (Path) is the folder in a CAPS hierarchy containing the images corresponding to the requested MODALITY.
- MODALITY (str) is the name of the preprocessing performed on the original images. It can be t1-linear or pet-linear. Choose custom if you want to get a tensor from a custom filename.
Each sub-command has its own set of options. There are three generic options:
- --subjects_sessions_tsv (Path) is the path to a TSV file listing participant and session IDs.
- --extract_json (str) is the name of the JSON file that will be created to store all the information about the extraction step. By default, the JSON file is named extract_{time_stamp}.json.
- --n_proc (int) is the number of workers used to parallelize tensor extraction. Default: 2.
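As an illustration of the file expected by --subjects_sessions_tsv, the sketch below reads participant and session IDs from tab-separated text. The column names participant_id and session_id are an assumption, chosen to mirror the <participant_id>/<session_id> placeholders of the CAPS hierarchy, and read_subjects_sessions is a hypothetical helper, not a ClinicaDL function.

```python
import csv
import io

# Assumed layout: a header row, then one participant/session pair per line.
# The column names are an assumption made for this example.
EXAMPLE_TSV = (
    "participant_id\tsession_id\n"
    "sub-01\tses-M00\n"
    "sub-01\tses-M12\n"
    "sub-02\tses-M00\n"
)


def read_subjects_sessions(handle):
    """Return the (participant_id, session_id) pairs listed in a TSV file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [(row["participant_id"], row["session_id"]) for row in reader]


# An in-memory file keeps the example self-contained.
pairs = read_subjects_sessions(io.StringIO(EXAMPLE_TSV))
print(pairs)
```

With a real file, the same helper could be called on open(path, newline="").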
Default values
When using patch or slice extraction, default values were set according to [Wen et al., 2020].
Tip
Type clinicadl extract [image|patch|slice|roi] --help to see the full list of parameters.
Outputs
Results are stored in the following folder of the CAPS hierarchy: subjects/<participant_id>/<session_id>/deeplearning_prepare_data/<tensor_format>_based/<modality_folder>.
<modality_folder> is equal to modality with every - replaced by _.
Files are saved with the .pt extension and contain tensors in PyTorch format.
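The output location described above can be sketched as a small path builder. This is only an illustration of the naming rule: tensor_output_dir is a hypothetical helper, and the participant and session identifiers are made up.

```python
from pathlib import PurePosixPath


def tensor_output_dir(caps_directory, participant_id, session_id, tensor_format, modality):
    """Build the folder in which the .pt files of one session are expected."""
    # <modality_folder> is the modality with every "-" replaced by "_".
    modality_folder = modality.replace("-", "_")
    return (
        PurePosixPath(caps_directory)
        / "subjects"
        / participant_id
        / session_id
        / "deeplearning_prepare_data"
        / f"{tensor_format}_based"
        / modality_folder
    )


# Hypothetical identifiers, for illustration only.
example_dir = tensor_output_dir("caps", "sub-01", "ses-M00", "patch", "t1-linear")
print(example_dir)
```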
A JSON file is also stored in the CAPS hierarchy under the tensor_extraction folder:
CAPS_DIRECTORY
└── tensor_extraction
    └── <extract_json>
This file stores the information about the extract command that will be necessary when reading the tensors.
Extraction method
This section describes the options needed and the outputs produced by each extraction method. In each case, the input is assumed to be named <input_pattern>_<suffix>.nii.gz.
The suffix represents the initial modality (for example T1w).
Warning
By default, the pipeline only extracts whole images, even if another extraction method is specified. All the options are nevertheless saved in the preprocessing JSON file, and the extraction itself is performed when data is loaded during training. If you want to save the extracted tensors in the CAPS, you have to add the --save_features flag.
image
The image format saves all the input values. It does not require any option.
The output filename is <input_pattern>_<suffix>.pt.
patch
The patch tensor format creates N patches which cover the whole image.
Each patch is a 3D tensor of size patch_size x patch_size x patch_size.
The centers of the patches are separated by stride_size voxels. If stride_size < patch_size the patches overlap, whereas if stride_size > patch_size some voxels will not be seen.
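The effect of the stride on coverage can be checked along a single axis with a minimal sketch. It assumes that only patches lying entirely inside the image are kept (the behaviour of a sliding window without padding). This is an assumption about edge handling, not a statement about the ClinicaDL implementation, and both functions are hypothetical helpers.

```python
def patch_starts(dim_size, patch_size, stride_size):
    """Start indices of the patches along one axis (patches fully inside the image)."""
    n_patches = (dim_size - patch_size) // stride_size + 1
    return [i * stride_size for i in range(max(n_patches, 0))]


def uncovered_voxels(dim_size, patch_size, stride_size):
    """Indices along one axis that do not belong to any patch."""
    covered = set()
    for start in patch_starts(dim_size, patch_size, stride_size):
        covered.update(range(start, start + patch_size))
    return sorted(set(range(dim_size)) - covered)


# Toy axis of 12 voxels with patches of 4 voxels.
for stride in (2, 4, 6):
    print(stride, patch_starts(12, 4, stride), uncovered_voxels(12, 4, stride))
```

On this toy axis, strides of 2 and 4 leave no voxel unseen, while a stride of 6 skips the voxels between consecutive patches.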
Options:
- --patch_size (int): patch size. Default value: 50.
- --stride_size (int): stride size. Default value: 50.
- --save_features (bool): flag to specify if you want to save the patches as tensors in the CAPS. By default, the pipeline only extracts the images, and the specified patches are then extracted on-the-fly.
The output files are <input_pattern>_patchsize-<L>_stride-<S>_patch-<i>_<suffix>.pt: tensor version of the <i>-th 3D isotropic patch of size <L> with a stride of <S>.
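When reading patch tensors back, the fields encoded in the filename can be recovered with a regular expression. The sketch below follows the naming scheme given above. It assumes that <suffix> contains no underscore, parse_patch_filename is a hypothetical helper, and the example filename is made up.

```python
import re

# Mirrors <input_pattern>_patchsize-<L>_stride-<S>_patch-<i>_<suffix>.pt
PATCH_PATTERN = re.compile(
    r"(?P<input_pattern>.+)"
    r"_patchsize-(?P<patch_size>\d+)"
    r"_stride-(?P<stride_size>\d+)"
    r"_patch-(?P<index>\d+)"
    r"_(?P<suffix>[^_]+)\.pt"
)


def parse_patch_filename(filename):
    """Split a patch tensor filename into its components."""
    match = PATCH_PATTERN.fullmatch(filename)
    if match is None:
        raise ValueError(f"not a patch tensor filename: {filename}")
    fields = match.groupdict()
    # Numeric fields are converted from text to integers.
    for key in ("patch_size", "stride_size", "index"):
        fields[key] = int(fields[key])
    return fields


# Hypothetical filename, for illustration only.
fields = parse_patch_filename(
    "sub-01_ses-M00_space-MNI152NLin2009cSym_res-1x1x1_patchsize-50_stride-25_patch-3_T1w.pt"
)
print(fields)
```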
roi
The roi format saves the regions defined by binary masks stored in the CAPS folder.
The binary masks must be provided by the user and are selected with the --roi_list option.
Options:
- --roi_list: list of N regions to be extracted. The masks corresponding to these regions should be written in <caps_directory>/masks/tpl-<tpl_name>. For example, to extract the ROIs corresponding to the right and left hippocampus using the publicly available MNI152NLin2009cSym template, two files containing the masks should be available in a folder named <caps_directory>/masks/tpl-MNI152NLin2009cSym/. For full (uncropped) images, the filenames of these masks are tpl-MNI152NLin2009cSym_res-1x1x1_roi-leftHippocampusBox_mask.nii.gz and tpl-MNI152NLin2009cSym_res-1x1x1_roi-rightHippocampusBox_mask.nii.gz. The command to invoke the extraction is then: clinicadl extract roi CAPS_DIRECTORY t1-linear --roi_list rightHippocampusBox --roi_list leftHippocampusBox
- --roi_uncrop_output: disables cropping, so the output tensors have the same size as the whole image instead of the ROI size.
- --roi_custom_template (mandatory for custom): only used when modality is set to custom. Sets the value of <tpl_name>.
- --roi_custom_mask_pattern (optional): only used when modality is set to custom. Selects a particular mask whose name follows the given pattern.
- --save_features (bool): flag to specify if you want to save the regions as tensors in the CAPS. By default, the pipeline only extracts the images, and the specified regions are then extracted on-the-fly.
ROI masks
ROI masks are compressed NIfTI files (.nii.gz), each containing a binary mask of the same size as the input data it corresponds to. All masks must follow the pattern tpl-<tpl_name>_*_roi-<roi_name>_mask.nii.gz.
If the defined region is not cubic, clinicadl extract will automatically extract the smallest bounding box around the region and fill the remaining values with 0 (unless --roi_uncrop_output is specified).
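The bounding-box behaviour described above can be illustrated on nested Python lists standing in for 3D arrays. This is a sketch of the idea, not the ClinicaDL implementation: bounding_box and crop_roi are hypothetical helpers, and "remaining values" is read here as the voxels of the box that lie outside the mask.

```python
def bounding_box(mask):
    """Smallest box (inclusive bounds per axis) containing every non-zero voxel."""
    coords = [
        (x, y, z)
        for x, plane in enumerate(mask)
        for y, row in enumerate(plane)
        for z, value in enumerate(row)
        if value
    ]
    if not coords:
        raise ValueError("the mask is empty")
    lower = tuple(min(c[axis] for c in coords) for axis in range(3))
    upper = tuple(max(c[axis] for c in coords) for axis in range(3))
    return lower, upper


def crop_roi(image, mask):
    """Crop the image to the mask bounding box, zeroing voxels outside the mask."""
    (x0, y0, z0), (x1, y1, z1) = bounding_box(mask)
    return [
        [
            [image[x][y][z] if mask[x][y][z] else 0 for z in range(z0, z1 + 1)]
            for y in range(y0, y1 + 1)
        ]
        for x in range(x0, x1 + 1)
    ]


# Toy 1x3x3 volume with an L-shaped (non-cubic) region.
image = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]
mask = [[[0, 0, 0], [0, 1, 1], [0, 0, 1]]]
roi = crop_roi(image, mask)
print(roi)
```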
Masks must correspond to the template used in the pipeline for registration. For t1-linear and pet-linear it is automatically set to MNI152NLin2009cSym. For a custom modality this value must be set using --roi_custom_template.
If several masks follow the wanted pattern, the one with the shortest name is chosen.
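The shortest-name rule can be sketched with a glob-style match followed by a length comparison. select_mask and its required_substring parameter are hypothetical. The sketch deliberately ignores how cropped and uncropped inputs are matched to masks, and it uses the pattern tpl-<tpl_name>_*_roi-<roi_name>_mask.nii.gz exactly as stated in the text.

```python
from fnmatch import fnmatchcase


def select_mask(filenames, tpl_name, roi_name, required_substring=None):
    """Pick the shortest filename following the mask naming pattern."""
    pattern = f"tpl-{tpl_name}_*_roi-{roi_name}_mask.nii.gz"
    candidates = [name for name in filenames if fnmatchcase(name, pattern)]
    if required_substring is not None:
        # Optional extra filter, e.g. to restrict the search to some descriptor.
        candidates = [name for name in candidates if required_substring in name]
    if not candidates:
        raise FileNotFoundError(f"no mask follows the pattern {pattern}")
    return min(candidates, key=len)


available = [
    "tpl-MNI152NLin2009cSym_desc-Crop_res-1x1x1_roi-leftHippocampusBox_mask.nii.gz",
    "tpl-MNI152NLin2009cSym_res-1x1x1_roi-leftHippocampusBox_mask.nii.gz",
    "tpl-MNI152NLin2009cSym_res-1x1x1_roi-rightHippocampusBox_mask.nii.gz",
]
chosen = select_mask(available, "MNI152NLin2009cSym", "leftHippocampusBox")
print(chosen)
```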
Example of a valid CAPS hierarchy:
CAPS_DIRECTORY
├── masks
│   ├── tpl-<tpl_name>
│   │   ├── tpl-<tpl_name>[_custom_pattern]_roi-<roi_1>_mask.nii.gz
│   │   ├── ...
│   │   └── tpl-<tpl_name>[_custom_pattern]_roi-<roi_N>_mask.nii.gz
│   └── tpl-MNI152NLin2009cSym
│       ├── tpl-MNI152NLin2009cSym_desc-Crop_res-1x1x1_roi-<roi_1>_mask.nii.gz
│       ├── tpl-MNI152NLin2009cSym_desc-Crop_res-1x1x1_roi-<roi_2>_mask.nii.gz
│       ├── tpl-MNI152NLin2009cSym_res-1x1x1_roi-<roi_1>_mask.nii.gz
│       └── tpl-MNI152NLin2009cSym_res-1x1x1_roi-<roi_2>_mask.nii.gz
└── subjects
    └── ...
The first two masks in tpl-MNI152NLin2009cSym/ contain desc-Crop, hence they can only be applied to cropped input images, and their size is 169x208x179. Conversely, the last two masks in the same folder do not contain desc-Crop, hence they can only be applied to uncropped input images, and their size is 193x229x193.
The output files are <source_file>_space-<tpl_name>[_desc-{CropRoi|CropImage|Crop}][_other_descriptors]_roi-<roi_name>_<suffix>.pt: tensor version of the selected 3D region of interest.
Here <source_file> corresponds to all the descriptors found in <input_pattern> before the space key.
Other descriptors are computed according to the mask descriptors.
The value following the desc key depends on the input and output images:
- desc-CropRoi: the input image contains desc-Crop and ROI cropping is enabled,
- desc-CropImage: the input image contains desc-Crop and ROI cropping is disabled,
- desc-Crop: the input image does not contain desc-Crop and ROI cropping is enabled,
- <no_descriptor>: the input image does not contain desc-Crop and ROI cropping is disabled.
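The four cases listed above amount to a two-by-two decision table, sketched below. roi_desc_key is a hypothetical helper that simply transcribes the list. It uses the desc-CropRoi spelling of the output filename pattern, and None stands for the <no_descriptor> case.

```python
def roi_desc_key(input_is_cropped, roi_cropping_enabled):
    """Return the desc entity of the output filename, or None when there is none.

    input_is_cropped: the input image filename contains desc-Crop.
    roi_cropping_enabled: --roi_uncrop_output was NOT given.
    """
    if input_is_cropped:
        return "desc-CropRoi" if roi_cropping_enabled else "desc-CropImage"
    return "desc-Crop" if roi_cropping_enabled else None


# Enumerate the four combinations of the table.
for cropped in (True, False):
    for roi_crop in (True, False):
        print(cropped, roi_crop, roi_desc_key(cropped, roi_crop))
```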
slice
Options:
- --slice_direction (int): slice direction. You can choose between 0 (sagittal plane), 1 (coronal plane) or 2 (axial plane). Default value: 0.
- --slice_mode (str): slice mode. You can choose between rgb (saves the slice in three identical channels) or single (saves the slice in a single channel). Default value: rgb.
- --discarded_slices (int): number of slices discarded from the beginning and from the end of the MRI volume, respectively.
- --save_features (bool): flag to specify if you want to save the slices as tensors in the CAPS. By default, the pipeline only extracts the images, and the specified slices are then extracted on-the-fly.
The output files are <input_pattern>_axis-{sag|cor|axi}_channel-{single|rgb}_slice-<i>_<suffix>.pt: tensor version of the <i>-th 2D slice in the sagittal (sag), coronal (cor) or axial (axi) plane, using three identical channels (rgb) or one channel (single).
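The slice options and the naming scheme above can be tied together in a short sketch. It assumes that a single --discarded_slices value is applied to both ends of the volume. Both functions are hypothetical helpers and the identifiers in the example are made up. How ClinicaDL numbers <i> (original position or renumbered) is not stated here, so the sketch makes no claim about it.

```python
# Axis labels used in the output filenames, indexed by --slice_direction.
AXIS_NAMES = {0: "sag", 1: "cor", 2: "axi"}


def kept_slice_indices(n_slices, discarded_slices):
    """Indices left after discarding the same number of slices at both ends."""
    return list(range(discarded_slices, n_slices - discarded_slices))


def slice_filename(input_pattern, suffix, slice_direction, slice_mode, index):
    """Format a slice tensor filename following the naming scheme."""
    axis = AXIS_NAMES[slice_direction]
    return f"{input_pattern}_axis-{axis}_channel-{slice_mode}_slice-{index}_{suffix}.pt"


# Toy volume of 10 slices, discarding 2 slices at each end.
kept = kept_slice_indices(10, 2)
example_name = slice_filename("sub-01_ses-M00", "T1w", 0, "rgb", kept[0])
print(kept)
print(example_name)
```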
modality arguments
t1-linear
- --use_uncropped_image: by default the features are extracted from the cropped image (see the documentation of the t1-linear pipeline). You can deactivate this behaviour with the --use_uncropped_image flag.
pet-linear
- --use_uncropped_image: by default the features are extracted from the cropped image (see the documentation of the pet-linear pipeline). You can deactivate this behaviour with the --use_uncropped_image flag.
- --acq_label: the label given to the PET acquisition, specifying the tracer used (acq-<acq_label>). It can be for instance 'fdg' for 18F-fluorodeoxyglucose or 'av45' for 18F-florbetapir.
- --suvr_reference_region: the reference region used to perform intensity normalization (i.e. dividing each voxel of the image by the average uptake in this region), resulting in a standardized uptake value ratio (SUVR) map. It can be cerebellumPons or cerebellumPons2 (used for amyloid tracers), or pons or pons2 (used for FDG). See PET introduction for more details about masks versions.
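The intensity normalization mentioned for --suvr_reference_region reduces to one division per voxel. The sketch below only illustrates that definition on a flat list of made-up uptake values. suvr_map is a hypothetical helper, and the normalization itself is part of the preprocessing, not of the extraction step.

```python
from statistics import fmean


def suvr_map(voxels, reference_mask):
    """Divide each voxel by the mean uptake inside the reference region."""
    reference_uptake = fmean(
        value for value, in_reference in zip(voxels, reference_mask) if in_reference
    )
    return [value / reference_uptake for value in voxels]


# Made-up uptake values; the two middle voxels form the reference region.
uptake = [2.0, 3.0, 5.0, 8.0]
in_reference = [False, True, True, False]
suvr = suvr_map(uptake, in_reference)
print(suvr)
```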
custom
- --custom_suffix: suffix of the filename that should be converted to the tensor format. The output will be saved into a folder named custom but the processed files will keep their original name. For example, you can convert the images from the segmentation of the gray matter registered on the Ixi549Space. These images are obtained by running the t1-volume pipeline. The suffix for these images is "graymatter_space-Ixi549Space_modulated-off_probability.nii.gz".