Implementation details

This section provides implementation details on the modules used in the provided architectures, autoencoder construction, transfer learning, and training.

Deterministic algorithms

ClinicaDL lets you set the seed of the pseudo-random libraries it uses (numpy, random, and torch). This seed is also used in the training data loaders, to avoid the randomness in data loading introduced by PyTorch.

However, this is not sufficient to guarantee exact reproducibility on GPU (though the results will be closer than with two different seeds). To obtain exactly the same results in the same GPU environment, the user must specify the flag --deterministic. This forces CUDA to behave deterministically, at the cost of computation time or memory use.

Non-deterministic functions in PyTorch

The PyTorch library is currently improving the reproducibility of its methods, but not all functions have a reproducible equivalent yet. If you need deterministic behaviour, you may have to adapt your architecture or wait until PyTorch implements the deterministic version of the function you need.

Adaptive padding in pooling layers

Pooling layers reduce the size of their input feature maps. A max pooling layer has no learnable parameters: it outputs the maximum value of the part of the feature map that its kernel covers.

Here is a 2D example of the standard PyTorch layer nn.MaxPool2d:

animation of classical max pooling

Depending on the kernel size, the input size and the stride, the last column may not be used. To avoid this, pooling layers with adaptive padding (PadMaxPool3d) were implemented in ClinicaDL to exploit information from the whole feature map.

animation of max pooling with adaptive pooling
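
The effect of adaptive padding can be illustrated with a minimal 1D sketch in plain Python. This is not the PadMaxPool3d implementation: the helper names are hypothetical, and the padding rule (pad up to the next multiple of the stride, with a value that can never be the maximum) is an assumption made for illustration.

```python
def max_pool_1d(values, kernel_size, stride):
    """Standard max pooling: windows that do not fit entirely are dropped."""
    last_start = len(values) - kernel_size
    return [max(values[s:s + kernel_size]) for s in range(0, last_start + 1, stride)]


def adaptive_padding(size, stride):
    """Number of elements to add so that size becomes a multiple of stride (assumed rule)."""
    remainder = size % stride
    return 0 if remainder == 0 else stride - remainder


def pad_max_pool_1d(values, kernel_size, stride):
    """Max pooling preceded by adaptive padding, so that no input value is ignored."""
    pad = adaptive_padding(len(values), stride)
    # Assumption: pad with -inf so that a padded element never wins the max.
    padded = list(values) + [float("-inf")] * pad
    return max_pool_1d(padded, kernel_size, stride)


row = [1, 3, 2, 4, 9]
plain = max_pool_1d(row, kernel_size=2, stride=2)       # the last value (9) is ignored
padded = pad_max_pool_1d(row, kernel_size=2, stride=2)  # the last value is kept
print(plain, padded)
```

With an input of size 5 and a kernel and stride of 2, the standard pooling discards the last element, whereas the padded version covers the whole row.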

Adapt the padding... or the input!

To avoid this problem, practitioners often resize their inputs so that each dimension equals $2^n$, and use max pooling layers with a kernel size and stride of 2.

Autoencoders construction from CNN architectures

In ClinicaDL, autoencoders can be derived from a CNN architecture. In this case:

  • the encoder corresponds to the convolutional part of the CNN,
  • the decoder is composed of the transposed version of the operations used in the encoder.

Illustration of a CNN and the corresponding autoencoder

The list of the transposed version of the modules can be found below:

  • Conv3d → ConvTranspose3d
  • PadMaxPool3d → CropMaxUnpool3d (specific module of clinicadl used to reconstruct the feature map produced by pooling layers with adaptive padding)
  • MaxPool3d → MaxUnpool3d
  • Linear → Linear with in_features and out_features swapped,
  • Flatten → Reshape
  • LeakyReLU → LeakyReLU with the inverse value of alpha,
  • other → copy of itself
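
The mapping above can be sketched in plain Python, with layers described as (name, parameters) pairs instead of real modules. This is only an illustration and not the CNN_Transformer code: the function names and the dictionary layout are hypothetical, the decoder is assumed to apply the transposed layers in reverse order, and channel handling for convolutions is left out.

```python
# Layers whose transposed version is a different module (from the list above).
RENAMED = {
    "Conv3d": "ConvTranspose3d",
    "PadMaxPool3d": "CropMaxUnpool3d",
    "MaxPool3d": "MaxUnpool3d",
    "Flatten": "Reshape",
}


def transpose_layer(name, params):
    """Return the (name, params) description of the transposed layer."""
    params = dict(params)
    if name == "Linear":
        # Swap the input and output sizes.
        params["in_features"], params["out_features"] = (
            params["out_features"],
            params["in_features"],
        )
    elif name == "LeakyReLU":
        # Use the inverse value of alpha.
        params["alpha"] = 1 / params["alpha"]
    # Any other layer is copied as is.
    return RENAMED.get(name, name), params


def build_decoder(encoder):
    """Assumption: the decoder mirrors the encoder, so the layers are reversed."""
    return [transpose_layer(name, params) for name, params in reversed(encoder)]


encoder = [
    ("Conv3d", {"kernel_size": 3}),
    ("BatchNorm3d", {}),
    ("LeakyReLU", {"alpha": 0.5}),
    ("PadMaxPool3d", {}),
    ("Flatten", {}),
    ("Linear", {"in_features": 64, "out_features": 2}),
]
decoder = build_decoder(encoder)
for layer in decoder:
    print(layer)
```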

To easily transform a custom CNN architecture (without skip connections) into an autoencoder, you can use the CNN_Transformer class in clinicadl/utils/network/autoencoder/

Transfer learning

Model identity

Weights can be transferred between a source and a target network only if their layers are named the same way (i.e. the models are identical). To enable transfer learning between a CNN and an autoencoder, please use the CNN_Transformer class (see the section above).

Trainable parameters can be transferred between models. In the following list, the weights are transferred from the source task to the target task:

  • AutoEncoder to CNN: The trainable parameters of the convolutional part of the CNN (convolutions and batch normalization layers) take the values of the trainable parameters of the encoder part of the source autoencoder.
  • CNN to AutoEncoder: The encoder takes the values of the convolutional part of the CNN.
  • between identical networks: All the trainable parameters are transferred between the two models.
  • single to multi: The single network is used to initialize each network of the multi-network framework.
  • multi to multi: Each network is initialized with the weights of the corresponding one in the source experiment.
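
The name-matching requirement can be illustrated with a minimal sketch in which plain dictionaries stand in for the parameter dictionaries of the two networks. The function and all parameter names are hypothetical, and this is not how ClinicaDL itself performs the transfer; the sketch only shows why identical layer names matter.

```python
def transfer_parameters(source_state, target_state):
    """Copy into the target every parameter whose name also exists in the source."""
    new_state = dict(target_state)
    transferred = []
    for name, value in source_state.items():
        if name in new_state:
            new_state[name] = value
            transferred.append(name)
    return new_state, transferred


# Hypothetical parameter names; lists of floats stand in for weight tensors.
autoencoder_state = {
    "conv0.weight": [0.1, 0.2],
    "bn0.weight": [1.0],
    "deconv0.weight": [0.3],  # decoder only, has no counterpart in the CNN
}
cnn_state = {
    "conv0.weight": [0.0, 0.0],
    "bn0.weight": [0.0],
    "fc.weight": [0.0],  # classifier only, keeps its initial value
}

new_cnn_state, transferred = transfer_parameters(autoencoder_state, cnn_state)
print(transferred)
```

In this AutoEncoder to CNN example, only the convolutional and batch normalization parameters are copied; the fully connected layer keeps its initial value because the source has no parameter with that name.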


The optimizer used in clinicadl train is Adam.

Usually, the optimizer updates the weights after each iteration, an iteration corresponding to the processing of one batch of images. In ClinicaDL, it is possible to accumulate the gradients over N iterations with accumulation_steps, so that the weights of the network are updated every N iterations. This simulates a larger batch size when the computational resources cannot handle it directly.

virtual_batch_size = batch_size * accumulation_steps
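
The principle of gradient accumulation can be checked on a toy problem. The sketch below is a generic illustration and not ClinicaDL code: it fits y ≈ w * x with plain gradient descent instead of Adam, and it assumes that each batch gradient is divided by accumulation_steps so that the accumulated value equals the gradient of the virtual batch.

```python
def grad_mse(w, batch):
    """Gradient of the mean squared error of y ≈ w * x over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def train_one_epoch(w, data, batch_size, accumulation_steps, lr=0.01):
    """One pass over data, updating w every accumulation_steps batches."""
    batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]
    accumulated = 0.0
    n_updates = 0
    for step, batch in enumerate(batches, start=1):
        # Assumption: gradients are averaged over the accumulated batches.
        accumulated += grad_mse(w, batch) / accumulation_steps
        if step % accumulation_steps == 0:
            w -= lr * accumulated
            accumulated = 0.0
            n_updates += 1
    return w, n_updates


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

# virtual_batch_size = 2 * 2 = 4
w_accum, updates_accum = train_one_epoch(0.0, data, batch_size=2, accumulation_steps=2)
# A real batch of 4 images, without accumulation.
w_large, updates_large = train_one_epoch(0.0, data, batch_size=4, accumulation_steps=1)
print(w_accum, w_large)
```

Both runs perform a single weight update and reach the same value of w, which is the sense in which accumulation simulates a larger batch.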


In some frameworks, the training loss may be approximated using the sum of the losses of the last batches of data seen by the network. In ClinicaDL, set (train or validation) performance is always evaluated on all the images of the set.

By default, during training, the network performance on the train and validation sets is evaluated at the end of each epoch. It is possible to perform inner epoch evaluations by setting evaluation_steps to the number of weight updates between two evaluations. Inner epoch evaluations give a finer view of the progression of the network during training.

Computation time

Setting evaluation_steps to a small value may considerably increase computation time.

Model selection

The selection of a model is associated with a metric evaluated on the validation set.

At the end of each epoch, if the validation performance of the current state is better than the best one seen so far, the current state of the model is saved in the corresponding best model folder. This comparison and serialization are only performed at the end of an epoch, even when inner epoch evaluations are performed.

Stopping criterion

By default, early stopping is enabled to save computation time. Training stops automatically if, for patience consecutive epochs, the validation loss at the end of the epoch never drops below the best validation loss seen so far * (1 - tolerance). Early stopping can be disabled by setting patience to 0.

If early stopping is disabled, or if its stopping criterion is never reached, training stops when the maximum number of epochs (epochs) is reached.
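
The stopping criterion described above can be sketched as follows. The class and function names are hypothetical and the code is a minimal reading of the rule, not the ClinicaDL implementation; in particular, it assumes that the best loss is only replaced when the new loss passes the tolerance threshold.

```python
class EarlyStopping:
    """Stop when the validation loss has not improved enough for `patience` epochs."""

    def __init__(self, patience, tolerance=0.0):
        self.patience = patience
        self.tolerance = tolerance
        self.best = None
        self.num_bad_epochs = 0

    def step(self, val_loss):
        """Register the loss of one epoch and return True if training should stop."""
        if self.patience == 0:
            return False  # early stopping is disabled
        if self.best is None or val_loss < self.best * (1 - self.tolerance):
            self.best = val_loss
            self.num_bad_epochs = 0
        else:
            self.num_bad_epochs += 1
        return self.num_bad_epochs >= self.patience


def epochs_run(val_losses, patience, tolerance):
    """Number of epochs performed; len(val_losses) plays the role of `epochs`."""
    stopper = EarlyStopping(patience, tolerance)
    for epoch, loss in enumerate(val_losses, start=1):
        if stopper.step(loss):
            return epoch
    return len(val_losses)


losses = [1.0, 0.8, 0.79, 0.81, 0.80, 0.7]
stopped_at = epochs_run(losses, patience=2, tolerance=0.05)
not_stopped = epochs_run(losses, patience=0, tolerance=0.05)
print(stopped_at, not_stopped)
```

With a tolerance of 0.05, the losses 0.79 and 0.81 are not below 0.8 * 0.95, so two epochs without sufficient improvement follow the best one and training stops after the fourth epoch. With patience set to 0, all six epochs are run.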

Image-level results

For classification or regression tasks that take as input a part of the MRI volume (patch, roi or slice), an ensemble operation is needed to obtain the label at the image level.

For the classification task, soft-voting was implemented because not all inputs are equally meaningful. For example, patches in the corners of the image are mainly composed of background and skull and may be misleading, whereas patches within the brain may be more useful.

Comparison of meaningful and misleading patches

The image-level probability of label 1, $p^{1}$, is then:

$$ p^{1} = {\sum_{i=0}^{35} bacc_i * p_i^{1}}$$

where:

  • $p_i^{1}$ is the probability of label 1 for patch i,
  • $bacc_i$ is the validation balanced accuracy for patch i.

For the regression task, hard-voting is used: the value of the output at the image level is simply the average of the values of all image parts.
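
The two ensemble operations can be sketched in a few lines of plain Python. The function names are hypothetical and this is not ClinicaDL code. One assumption is added to the soft-voting formula: the balanced accuracies are divided by their sum so that the result stays between 0 and 1, which coincides with the formula when the weights already sum to 1.

```python
def soft_voting(probabilities, balanced_accuracies):
    """Image-level probability of label 1, weighted by validation balanced accuracy."""
    total = sum(balanced_accuracies)
    # Assumption: weights are normalized so that they sum to 1.
    weights = [bacc / total for bacc in balanced_accuracies]
    return sum(w * p for w, p in zip(weights, probabilities))


def hard_voting(values):
    """Image-level regression output: plain average over all image parts."""
    return sum(values) / len(values)


# Three patches: the first one is the most reliable on the validation set.
p_label_1 = [0.8, 0.4, 0.5]
bacc = [0.9, 0.5, 0.6]
image_probability = soft_voting(p_label_1, bacc)

# Regression outputs of three patches.
image_value = hard_voting([70.0, 72.0, 74.0])
print(image_probability, image_value)
```

Here the image-level probability is pulled towards 0.8 because the first patch has the highest balanced accuracy, whereas an unweighted average of the three probabilities would be lower.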


By default, all images are used as input of a single network. With the --multi flag, one network is trained per image part.

The flag --multi cannot be used if the number of parts per image is 1 (for example in image mode or in roi mode if there is only one region).


Starting from version 0.2.1, it is possible to use ClinicaDL's functions on several datasets at the same time. In this case, the multi-cohort flag must be given, and both CAPS_DIRECTORY and TSV_DIRECTORY must point to TSV files.

The CAPS_DIRECTORY argument must lead to a TSV file with two columns:

  • cohort: the name of the cohort (must correspond to the values in TSV_DIRECTORY),
  • path: the path to the corresponding CAPS hierarchy.

The TSV_DIRECTORY argument points to a TSV file with three columns:

  • cohort: the name of the cohort (must correspond to the values in CAPS_DIRECTORY),
  • path: the path to the corresponding labels list, outputs of split or kfold methods,
  • diagnoses: the diagnoses that will be used in the cohort. Must correspond to a single string with labels separated by commas.
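
As an illustration, the sketch below builds two such TSV files in memory and checks that their cohort names match. The cohort names, paths and diagnosis labels are hypothetical examples, and the helper functions are not part of ClinicaDL.

```python
import csv
import io

# Hypothetical content of the file given as CAPS_DIRECTORY.
CAPS_TSV = (
    "cohort\tpath\n"
    "cohortA\t/data/cohortA/caps\n"
    "cohortB\t/data/cohortB/caps\n"
)

# Hypothetical content of the file given as TSV_DIRECTORY.
LABELS_TSV = (
    "cohort\tpath\tdiagnoses\n"
    "cohortA\t/data/cohortA/labels\tAD,CN\n"
    "cohortB\t/data/cohortB/labels\tCN\n"
)


def read_tsv(text):
    """Parse tab-separated text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def check_cohorts(caps_rows, label_rows):
    """Return the sorted cohort names, or raise if the two files disagree."""
    caps_cohorts = {row["cohort"] for row in caps_rows}
    label_cohorts = {row["cohort"] for row in label_rows}
    if caps_cohorts != label_cohorts:
        raise ValueError("Cohort names differ between the two TSV files")
    return sorted(caps_cohorts)


caps_rows = read_tsv(CAPS_TSV)
label_rows = read_tsv(LABELS_TSV)
cohorts = check_cohorts(caps_rows, label_rows)

# The diagnoses column is a single string with labels separated by commas.
diagnoses = {row["cohort"]: row["diagnoses"].split(",") for row in label_rows}
print(cohorts, diagnoses)
```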