
train NETWORK_TASK - Define a network task from TOML or command line

This functionality enables the training of a network using different input formats (whole 3D images, 3D patches, regions of interest or 2D slices). It mainly relies on the PyTorch deep learning library [Paszke et al., 2019].

Different tasks can be learnt by a network: classification, reconstruction and regression (see below).

Prerequisites

You need to execute the clinicadl tsvtools get-labels and clinicadl tsvtools {split|kfold} commands prior to running this task, in order to have the correct TSV file organization. Moreover, a CAPS folder should be available, obtained by running the desired preprocessing pipeline (currently only t1-linear and pet-linear preprocessing are supported, but others will be added soon).
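The sketch below only illustrates the expected order of these commands; their exact arguments depend on your ClinicaDL version, so refer to the tsvtools documentation or to clinicadl tsvtools --help (all paths are hypothetical placeholders):

clinicadl tsvtools get-labels bids_directory labels_directory
clinicadl tsvtools split labels_directory
# or, for a k-fold cross-validation:
clinicadl tsvtools kfold labels_directory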

Running the task

The training task can be run with the following command line:

clinicadl train [OPTIONS] NETWORK_TASK CAPS_DIRECTORY PREPROCESSING_JSON \
                TSV_DIRECTORY OUTPUT_MAPS_DIRECTORY
where mandatory arguments are:

  • NETWORK_TASK (str) is the type of task learnt by the network. Available tasks are classification, regression and reconstruction.
  • CAPS_DIRECTORY (Path) is the input folder containing the neuroimaging data in a CAPS hierarchy. In case of multi-cohort training, must be a path to a TSV file.
  • PREPROCESSING_JSON (str) is the name of the preprocessing JSON file stored in the CAPS_DIRECTORY that corresponds to the clinicadl prepare-data output. It is used to load the correct tensor inputs with the desired preprocessing.
  • TSV_DIRECTORY (Path) is the input folder of a TSV file tree generated by clinicadl tsvtools {split|kfold}. In case of multi-cohort training, must be a path to a TSV file.
  • OUTPUT_MAPS_DIRECTORY (Path) is the folder where the results are stored.
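For instance, a classification experiment could be launched as follows (all paths and the JSON file name below are hypothetical placeholders):

clinicadl train classification caps_directory extract_t1.json \
                tsv_directory maps_directory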

The training can be configured through a TOML configuration file or by using the command line options. If you have a TOML configuration file (see the section below for more information) you can use the following option to load it:

  • --config_file (Path) is the path to a TOML configuration file. This file contains the values of the options that you want to specify (to avoid an overly long command line).

If an option is specified twice (in the configuration file and on the command line), the value given on the command line takes priority when running the job.
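For example, in the hypothetical call below, the options of train_config.toml are loaded first and --batch_size 16 then overrides any batch_size value set in that file:

clinicadl train classification caps_directory extract_t1.json \
                tsv_directory maps_directory \
                --config_file train_config.toml --batch_size 16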

Options shared for all values of NETWORK_TASK are organized in groups:

  • Architecture management
    • --architecture (str) is the name of the architecture used. Default depends on the task. It must correspond to a class that inherits from nn.Module imported in clinicadl/utils/network/__init__.py. To implement custom models please refer to this section.
    • --multi_network/--single_network (bool) is a flag to ask for a multi-network framework. Default trains only one network on all images.
    • --dropout (float) is the rate of dropout applied in dropout layers. Default: 0.

Architecture limitations

Depending on the task, the output size needed to learn the task may vary:

  • for classification, the network must output a vector whose length equals the number of classes,
  • for regression, the network has only one output node,
  • for reconstruction, the network outputs an image of the same size as the input.
If you want to use a custom architecture, be sure to respect the output size needed for the learnt task.
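As an illustration, here is a minimal, hypothetical PyTorch sketch (not shipped with ClinicaDL) of a classification architecture respecting this constraint; a real custom model must be importable from clinicadl/utils/network/__init__.py as stated above.

import torch.nn as nn

# Hypothetical toy network: the final layer outputs one value per class,
# as required for the classification task.
class TinyClassifier(nn.Module):
    def __init__(self, input_size, n_classes=2):
        super().__init__()
        self.flatten = nn.Flatten()  # flatten the batch of input images
        self.fc = nn.Linear(input_size, n_classes)  # vector of length n_classes

    def forward(self, x):
        return self.fc(self.flatten(x))

# For regression, the final layer would be nn.Linear(input_size, 1); for
# reconstruction, the network must output a tensor of the input's shape.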

  • Computational resources
    • --gpu/--no-gpu (bool) Use GPU acceleration. Default behavior is to try to use a GPU and to raise an error if none is found. Please specify --no-gpu to use the CPU instead.
    • --amp/--no-amp (bool) Enables PyTorch's Automatic Mixed Precision with float16. It saves some memory and may speed up training on modern GPUs. AMP is not allowed on CPU. Default: False.
    • --fully_sharded_data_parallel (bool) Enables PyTorch's Zero Redundancy Optimizer to save memory at the cost of communications. Currently only Stage 1 is implemented; Stage 3 will be added in the future. Requires multiple GPUs. Default: False.
    • --n_proc (int) is the number of workers used by the DataLoader. Default: 2.
    • --batch_size (int) is the size of the batch used in the DataLoader. Default: 8.
    • --evaluation_steps (int) gives the number of iterations between two evaluations performed within an epoch. By default, an evaluation is only performed at the end of each epoch.
  • Data management
    • --diagnoses (List[str]) is the list of diagnosis labels that will be used for training. Default will look for AD and CN TSV files.
    • --baseline/--longitudinal (bool) is a flag to load only _baseline.tsv files instead of .tsv files comprising all the sessions. Default: --longitudinal.
    • --normalize/--unnormalize (bool) is a flag to disable min-max normalization that is performed by default. Default: --normalize.
    • --data_augmentation (List[str]) is the list of data augmentation transforms applied to the training data. Must be chosen in [None, Noise, Erasing, CropPad, Smoothing, Motion, Ghosting, Spike, BiasField, RandomBlur, RandomSwap]. Default: no data augmentation.
    • --sampler (str) is the sampler used on the training set. It must be chosen in [random, weighted]. weighted will give a stronger weight to underrepresented classes. Default: random.
    • --multi_cohort (bool) is a flag indicating that multi-cohort training is performed. In this case, CAPS_DIRECTORY and TSV_DIRECTORY must be paths to TSV files.
  • Cross-validation arguments
    • --n_splits (int) is the number of splits k to load in the case of a k-fold cross-validation. Default will load a single split.
    • --split (list of int) is a subset of splits that will be used for training. By default, all available splits are used.
  • Reproducibility (for more information refer to the implementation details)
    • --seed (int) is the value used to set the seed of all random operations. Default samples a seed and uses it for the experiment.
    • --nondeterministic/--deterministic (bool) forces the training process to be deterministic. If any non-deterministic behaviour is encountered, a RuntimeError will be raised. Default: --nondeterministic.
    • --compensation (str) allows choosing how CUDA will compensate to obtain deterministic behaviour: either the computation time will be longer (time) or the computations will require more memory (memory). Must be chosen between time and memory. Default: memory.
  • Optimization parameters

    • --optimizer (str) is the name of the optimizer used to train the network. Must correspond to a Pytorch class. Default: Adam.
    • --epochs (int) is the maximum number of epochs. Default: 20.
    • --learning_rate (float) is the learning rate used to perform weight update. Default: 1e-4.
    • --adaptive_learning_rate (bool) Enables the learning rate to be reduced by a factor of 10 when the validation loss has not improved for 10 epochs. Default: False.
    • --weight_decay (float) is the weight decay used by the Adam optimizer. Default: 1e-4.
    • --patience (int) is the number of epochs for early stopping patience. Default: 0.
    • --tolerance (float) is the value used for early stopping tolerance. Default: 0.
    • --accumulation_steps (int) gives the number of iterations during which gradients are accumulated before performing the weight update. This makes it possible to virtually increase the batch size. Default: 1.
    • --profiler/--no-profiler (bool) Enables the PyTorch profiler for the first 30 steps after a short warmup. It writes an execution trace in the output directory along with statistics about CPU and GPU usage. Default: False.
  • Transfer learning parameters

    • --transfer_path (Path) is the path to the model used for transfer learning.
    • --transfer_selection_metric (str) is the transfer learning selection metric.
    • --nb_unfrozen_layer (int) is the number of layers that will be retrained during training. For example, if it is 2, the last two layers of the model will not be frozen. See Implementation details for more information about transfer learning.
  • Track an experiment
    • --track_exp (str) is the name of the experiment tracker you want to use. Must be chosen between wandb (Weights & Biases) and mlflow. As MLflow and W&B are not ClinicaDL dependencies, you must install the chosen one yourself (by running pip install wandb or pip install mlflow). For more information, check out the documentation of W&B or MLflow.
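To make these groups concrete, here is a hypothetical invocation combining several shared options: a 5-fold cross-validation restricted to the first split, a weighted sampler, CPU-only training and a fixed seed (assuming --split is repeated once per requested split):

clinicadl train classification caps_directory extract_t1.json \
                tsv_directory maps_directory \
                --n_splits 5 --split 0 --sampler weighted --no-gpu --seed 42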

A few options depend on the task performed:

  • classification The objective of classification is to attribute a class to each input image. The criterion loss is the cross entropy between the ground truth and the network output. The evaluation metrics are the accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and balanced accuracy (BA).
    • --label (str) is the name of the column containing the label for the classification task. It must be a categorical variable, but may be of any type. Default: diagnosis.
    • --selection_metrics (str) are metrics used to select networks according to the best validation performance. Default: loss.
    • --selection_threshold (float) is a selection threshold used for soft-voting. It is only taken into account if several images are extracted from the same original 3D image (i.e. num_networks > 1). Default: 0.
    • --loss (str) is the name of the loss used to optimize the classification task. Must correspond to a Pytorch class. Default: CrossEntropyLoss.

Note

Users can also set the label_code parameter themselves, but only from the configuration file. This parameter allows choosing which name, as written in the label column, is associated with which node (designated by the corresponding integer). This way, several names may be associated with the same node.
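For instance, the hypothetical snippet below maps two different names found in the label column to the same node, so that CN and MCI subjects form a single class:

[Classification]
label_code = {"AD" = 0, "CN" = 1, "MCI" = 1}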

  • regression The objective of regression is to predict the value of a continuous variable from an image. The criterion loss is the mean squared error between the ground truth and the network output. The evaluation metrics are the mean squared error (MSE) and mean absolute error (MAE).

    • --label (str) is the name of the column containing the label for the regression task. It must be a continuous variable (float or int). Default: age.
    • --selection_metrics (str) are metrics used to select networks according to the best validation performance. Default: loss.
    • --loss (str) is the name of the loss used to optimize the regression task. Must correspond to a Pytorch class. Default: MSELoss.
  • reconstruction The objective of reconstruction is to learn to reconstruct the images given as input. The criterion loss is the mean squared error between the input and the network output. The evaluation metrics are the mean squared error (MSE) and mean absolute error (MAE).

    • --selection_metrics (str) are metrics used to select networks according to the best validation performance. Default: loss.
    • --loss (str) is the name of the loss used to optimize the reconstruction task. Must correspond to a Pytorch class. Default: MSELoss.
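For example, a hypothetical regression run predicting age and selecting networks on MAE (assuming MAE, listed among the evaluation metrics above, is accepted as a selection metric) could look like:

clinicadl train regression caps_directory extract_t1.json \
                tsv_directory maps_directory \
                --label age --selection_metrics MAE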

Configuration file

Since the train pipeline has many options, the command line can become long and difficult to use. To avoid this, the --config_file option allows the user to provide a configuration file containing all the options they need. ClinicaDL first loads the default values, then overwrites them with the values specified in the configuration file, and finally applies any options given on the command line before running the job.

TOML is a human-readable format, so a configuration file can easily be written with any text editor. The user just needs to write the value of each option after the option name in the file.

Here is an example of a TOML configuration file with all the default values:

# CONFIG FILE FOR TRAIN PIPELINE WITH DEFAULT ARGUMENTS

[Model]
architecture = "default" # e.g. Conv5_FC3 for classification and regression tasks
multi_network = false

[Architecture]
# CNN
dropout = 0.0 # between 0 and 1
# VAE
latent_space_size = 128
feature_size = 1024
n_conv = 4
io_layer_channels = 8
recons_weight = 1
KL_weight = 1

[Classification]
selection_metrics = ["loss"]
label = "diagnosis"
label_code = {}
selection_threshold = 0.0 # Will only be used if num_networks != 1
loss = "CrossEntropyLoss"

[Regression]
selection_metrics = ["loss"]
label = "age"
loss = "MSELoss"

[Reconstruction]
selection_metrics = ["loss"]
loss = "MSELoss"

[Computational]
gpu = true
n_proc = 2
batch_size = 8
evaluation_steps = 0

[Reproducibility]
seed = 0
deterministic = false
compensation = "memory" # Only used if deterministic = true

[Transfer_learning]
transfer_path = ""
transfer_selection_metric = "loss"
nb_unfrozen_layer = 0

[Mode]
# requires a manually generated preprocessing JSON
use_extracted_features = false

[Data]
multi_cohort = false
diagnoses = ["AD", "CN"]
baseline = false
normalize = true
data_augmentation = false
sampler = "random"

[Cross_validation]
n_splits = 0
split = []

[Optimization]
optimizer = "Adam"
epochs = 20
learning_rate = 1e-4
weight_decay = 1e-4
patience = 0
tolerance = 0.0
accumulation_steps = 1

This file is available at clinicadl/resources/config/train_config.toml in the ClinicaDL folder (or on GitHub).

Warning

Ensure that the structure of the file respects the one given in the example; otherwise ClinicaDL will not be able to read the options. For instance, if you want to specify a value for the batch_size option, the key must be in the [Computational] section of the configuration file, as shown above.
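As a reminder, a minimal valid configuration file overriding only the batch size would thus contain:

[Computational]
batch_size = 16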

Outputs

The clinicadl train command outputs a MAPS structure containing only two data groups: train and validation. To limit the size of the MAPS produced, tensor inputs and outputs of each group are only saved for one image of the dataset (for more information on input and output tensor serialization, refer to the dedicated section).