Implementation details¶
Details of implementation corresponding to modules used in the provided architectures, autoencoder construction, transfer learning or training details are provided in this section.
Adaptive padding in pooling layers¶
Pooling layers reduce the size of their input feature maps. There are no learnable parameters in this layer, the kernel outputting the maximum value of the part of the feature map its kernels is covering.
Here is a 2D example of the standard layer of pytorch nn.MaxPool2d:
The last column may not be used depending on the size of the kernel/input and stride value.
To avoid this, pooling layers with adaptive padding PadMaxPool3d were implemented in clinicadl to exploit information from the whole feature map.
Adapt the padding... or the input!
To avoid this problem, deep learners often choose to resize their input to have sizes equal to 2n with maxpooling layers of size and stride of 2.
Autoencoders construction from CNN architectures¶
In clinicadl, an autoencoder is derived from a CNN architecture:
- the encoder corresponds to the convolutional part of the CNN,
- the decoder is composed of the transposed version of the operations used in the encoder.

The list of the transposed version of the modules can be found below:
Conv3d→ConvTranspose3dPadMaxPool3d→CropMaxUnpool3d(specific module ofclinicadlused to reconstruct the feature map produced by pooling layers with adaptive padding)MaxPool3d→MaxUnpool3dLinear→Linearwith an inversion inin_featuresandout_features,Flatten→ReshapeLeakyReLU→LeakyReLUwith the inverse value of alpha,- other → copy of itself
Transfer learning¶
It is possible to transfer trainable parameters between models. In the following list the weights are transferred from source task to target task:
autoencodertocnn: The trainable parameters of the convolutional part of thecnn(convolutions and batch normalization layers) take the values of the trainable parameters of the encoder part of the source autoencoder.cnntocnn: All the trainable parameters are transferred between the two models.autoencodertomulticnn: The convolutional part of each CNN of themulticnnrun is initialized with the weights of the encoder of the source autoencoder.cnntomulticnn: Each CNN of themulticnnrun is initialized with the weights of the source CNN.multicnntomulticnn: Each CNN is initialized with the weights of the corresponding one in the source experiment.
Optimization¶
The optimizer used in clinicadl train is Adam.
Usually, the optimizer updates the weights after one iteration, an iteration corresponding
to the processing of one batch of images.
In ClinicaDL, it is possible to accumulate the gradients with accumulation_steps during N iterations to update
the weights of the network every N iterations. This allows simulating a larger batch size
even though the computational resources are not powerful enough to allow it.
virtual_batch_size = batch_size * accumulation_steps
Evaluation¶
In some frameworks, the training loss may be approximated using the sum of the losses of the last batches of data seen by the network. In ClinicaDL, set (train or validation) performance is always evaluated on all the images of the set.
By default during training, the network performance on train and validation is evaluated at the end of each epoch.
It is possible to perform inner epoch evaluations by setting the value of evaluation_steps to the number of
weight updates before evaluation. Inner epoch evaluations allow better evaluating the progression of the network
during training.
Computation time
Setting evaluation_steps to a small value may considerably increase computation time.
Model selection¶
The selection of a model is associated to a metric evaluated on the validation set:
- autoencoders are selected based on the loss (mean squared error),
- CNNs are selected based on the balanced accuracy and the loss (cross-entropy loss).
At the end of each epoch, if the validation performance of the current state is better than the best one ever seen, the current state of the model is saved in the corresponding best model folder. Such comparison and serialization is only performed at the end of an epoch, even though inner epoch evaluations are performed.
Stopping criterion¶
By default, early stopping is enabled to save computation time. This method automatically stops training
if during patience epochs, the validation loss at the end of an epoch never became smaller than the best validation
loss ever seen * (1 - tolerance). Early stopping can be disabled by setting patience to 0.
If early stopping is disabled, or if its stopping criterion was never reached, training stops when the maximum number
of epochs epochs is reached.
Soft voting¶
For classification tasks that take as input a part of the MRI volume (patch, roi or slice), an ensemble operation is needed to obtain the label at the image level.
For example, size and stride of 50 voxels on linear preprocessing leads to the classification of 36 patches, but they are not all equally meaningful. Patches that are in the corners of the image are mainly composed of background and skull and may be misleading, whereas patches within the brain may be more useful.

Then the image-level probability of AD pAD will be:
$$ p^{AD} = {\sum_{i=0}^{35} bacc_i * p_i^{AD}}$$
where:
- piAD is the probability of AD for patch i,
- bacci is the validation balanced accuracy for patch i.
Multi-cohort¶
Starting from version 0.2.1, it is possible to use ClinicaDL's functions on several datasets at the same time.
In this case, the multi-cohort flag must be given, and the caps_directory and the tsv_path
correspond to TSV files.
The caps_directory variable must lead to a TSV file with two columns:
- cohort the name of the cohort (must correspond to the values in tsv_path),
- path the path to the corresponding CAPS hierarchy.
The tsv_path variable points to a TSV file with two columns:
- cohort the name of the cohort (must correspond to the values in caps_directory),
- path the path to the corresponding labels list, outputs of split
or kfold methods.
- diagnoses the diagnoses that will be used in the cohort. Must correspond to a single string with labels accepted by
clinicadl train (AD, BV, CN, MCI, sMCI or pMCI) separated by commas.
See the dedicated section to use custom labels.