resume - Resume a prematurely stopped job
This functionality resumes a prematurely stopped job that was launched with the
clinicadl train or clinicadl random-search generate tasks.
The files that are used by this function are the following:
- maps.json describes the training parameters used to create the model,
- checkpoint.pth.tar contains the last version of the weights of the network,
- optimizer.pth.tar contains the last version of the parameters of the optimizer,
- training.tsv contains the successive values of the metrics during training.
These files are organized in model_path using the MAPS format.
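Before resuming, it can help to confirm that these four files are actually present. The following is a minimal sketch, not part of ClinicaDL: the helper names find_resume_files and missing_resume_files are hypothetical, and because the exact location of each file inside the MAPS folder is not described here, the search is simply recursive.

```python
from pathlib import Path

# File names taken from the list above. Their exact location inside the
# MAPS folder is not stated here, so the whole tree is searched (assumption).
RESUME_FILES = (
    "maps.json",
    "checkpoint.pth.tar",
    "optimizer.pth.tar",
    "training.tsv",
)


def find_resume_files(model_path):
    """Map each expected file name to the sorted list of matches under model_path."""
    root = Path(model_path)
    return {name: sorted(root.rglob(name)) for name in RESUME_FILES}


def missing_resume_files(model_path):
    """Return the expected file names that were not found anywhere under model_path."""
    found = find_resume_files(model_path)
    return [name for name, paths in found.items() if not paths]
```

An empty list returned by missing_resume_files(model_path) only means that each file name exists at least once; it does not check that the files are complete or readable.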
You should also ensure that the data at tsv_path and caps_dir in maps.json
are still present and correspond to the ones used during training.
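The presence of these two locations can be checked with a short script. This is a sketch under assumptions: check_data_paths is a hypothetical helper, and it assumes that tsv_path and caps_dir are stored as top-level keys of maps.json. It only tests that the paths exist; it cannot tell whether their content still matches what was used during training.

```python
import json
from pathlib import Path


def check_data_paths(maps_json):
    """Report which of tsv_path and caps_dir are absent or point to nothing.

    Assumes both entries are top-level keys of maps.json (not confirmed here).
    Returns a dict mapping each problematic key to a short description.
    """
    params = json.loads(Path(maps_json).read_text())
    problems = {}
    for key in ("tsv_path", "caps_dir"):
        value = params.get(key)
        if value is None:
            problems[key] = "not found in maps.json"
        elif not Path(value).exists():
            problems[key] = f"path does not exist: {value}"
    return problems
```

An empty dictionary means both paths were found on disk.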
Prerequisites
Please check in the commandline.json file in the results folder which
preprocessing needs to be performed. If it has not been performed, execute the
preprocessing pipeline as well as clinicadl extract to obtain the tensor
versions of the images.
Running the task
This task can be run with the following command line:
clinicadl resume <model_path>
where model_path (str) is a path to the MAPS folder of the model.
By default, the arguments corresponding to computational resources will be the same
as the ones defined in commandline.json. However, it is possible to change them
by using the following options:
- --nproc (int) changes the number of workers used by the DataLoader.
- --use_cpu (bool) forces using CPUs.
- --use_gpu (bool) forces using a GPU.
- --batch_size (int) changes the size of the batch used in the DataLoader.
- --evaluation_steps (int) changes the number of iterations to perform before computing an evaluation.
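When the task is launched from a script, the command line and its options can be assembled programmatically. The sketch below is illustrative only: build_resume_command is a hypothetical helper, it assumes that --use_cpu and --use_gpu are passed as bare flags without a value, and the rule that both cannot be given together is an assumption of this example rather than documented behaviour.

```python
def build_resume_command(model_path, nproc=None, batch_size=None,
                         evaluation_steps=None, use_cpu=False, use_gpu=False):
    """Build the argument list for `clinicadl resume <model_path>`.

    Options left at their default are omitted, so the values stored in
    commandline.json keep applying.
    """
    # Assumption of this sketch: forcing CPU and GPU at once is rejected.
    if use_cpu and use_gpu:
        raise ValueError("use_cpu and use_gpu cannot both be set")

    cmd = ["clinicadl", "resume", str(model_path)]
    valued_options = (
        ("--nproc", nproc),
        ("--batch_size", batch_size),
        ("--evaluation_steps", evaluation_steps),
    )
    for flag, value in valued_options:
        if value is not None:
            cmd += [flag, str(value)]
    # Assumption: boolean options are bare flags that take no value.
    if use_cpu:
        cmd.append("--use_cpu")
    if use_gpu:
        cmd.append("--use_gpu")
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the Python standard library, provided clinicadl is installed and available on the PATH.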
Outputs
The outputs are formatted according to the MAPS format.
Note
The files checkpoint.pth.tar and optimizer.pth.tar are automatically removed as soon
as the stopping criterion is reached, and the performance of the models is
evaluated on the training and validation datasets.