clinicadl.callbacks.MonitorCallback

class clinicadl.callbacks.MonitorCallback(num_measurements: int = 100, warmup_iterations: int = 10, enabled: bool = True)

Monitors computational statistics during a training phase.

The statistics are summarized in the MAPS in <maps>/training/split-<split_idx>/summary.log; the detailed measurements are in <maps>/training/split-<split_idx>/logs/computational.tsv.

The following statistics are recorded for different phases of the training (GPU statistics will be reported only if GPUs are used):

Time (s): total duration of the phase;
GPU Time (s): duration of GPU computation during the phase;
GPU Max Memory (MB): maximum GPU memory occupied during the phase.

You will also find global statistics:

Throughput (images/s): the number of images processed per second, which is equal to the batch size divided by the iteration time;
GPU throughput (images/s): the number of images processed per second by the GPU, which is equal to the batch size divided by the iteration GPU time.
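
As a worked illustration of the two throughput definitions above, the following minimal sketch computes them from a batch size and per-iteration timings. The function name `throughput` and the numeric values are hypothetical and are not part of ClinicaDL.

```python
def throughput(batch_size: int, iteration_time_s: float) -> float:
    """Images processed per second: batch size divided by the iteration time."""
    if iteration_time_s <= 0:
        raise ValueError("iteration_time_s must be positive")
    return batch_size / iteration_time_s


# Hypothetical measurements for one iteration (illustrative values only).
batch_size = 32
iteration_time_s = 0.5       # total duration of the iteration
iteration_gpu_time_s = 0.25  # duration of GPU computation in that iteration

overall = throughput(batch_size, iteration_time_s)       # 64.0 images/s
gpu_only = throughput(batch_size, iteration_gpu_time_s)  # 128.0 images/s
```

Because the GPU time of an iteration is only part of its total duration, the GPU throughput in this example is higher than the overall throughput.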

Parameters:

num_measurements (int, default=100) – The number of measurements to perform for averaging the statistics.
Note
- Some statistics, such as the total training time, are not measured num_measurements times.
- All the epochs will be measured.
warmup_iterations (int, default=10) – The number of batches to wait before starting the monitoring. It is particularly important when working with GPUs, on which the first calculations can take significantly longer and therefore skew the measurement.
enabled (bool, default=True) – Whether to activate monitoring.
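
To make the roles of `num_measurements` and `warmup_iterations` concrete, here is a minimal sketch of a timer that ignores the first iterations and then keeps a bounded number of measurements for averaging. `IterationTimer` is a hypothetical helper, not the ClinicaDL implementation. How `MonitorCallback` actually spreads its measurements over the training is not stated here, so this sketch simply keeps the first ones after the warm-up.

```python
import time


class IterationTimer:
    """Hypothetical helper: times iterations, skipping an initial warm-up."""

    def __init__(self, num_measurements=100, warmup_iterations=10):
        self.num_measurements = num_measurements
        self.warmup_iterations = warmup_iterations
        self.n_iterations = 0  # batches seen so far, warm-up included
        self.durations = []    # measured durations, in seconds
        self._start = 0.0

    def start(self):
        self._start = time.perf_counter()

    def stop(self):
        elapsed = time.perf_counter() - self._start
        self.n_iterations += 1
        past_warmup = self.n_iterations > self.warmup_iterations
        # Keep the measurement only after the warm-up and while under the cap.
        if past_warmup and len(self.durations) < self.num_measurements:
            self.durations.append(elapsed)

    def mean(self):
        return sum(self.durations) / len(self.durations)


timer = IterationTimer(num_measurements=3, warmup_iterations=2)
for _ in range(10):
    timer.start()
    sum(range(1000))  # stand-in for one training iteration
    timer.stop()
```

In this run, the first two iterations are discarded as warm-up, the next three are measured, and the remaining five are counted in `n_iterations` but not timed into the average.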

property n_iterations: int

Number of batches already passed to the neural network.

on_exception(*, maps: Maps, state: TrainerState, exception: Exception, **kwargs) → None

Called when an exception interrupts an execution of the Trainer.

Parameters:

model (Model) – The model associated to the Trainer.
maps (Maps) – The MAPS associated to the Trainer.
state (TrainerState) – The current state of the Trainer.
exception (Exception) – The exception that has been raised.

on_train_start(*, split: Split, optimization: OptimizationConfig, computational: ComputationalConfig, **kwargs) → None

Called once at the beginning of Trainer.train if resume=False.

When training is resumed, on_resume() is called instead.

Parameters:

model (Model) – The model associated to the Trainer.
maps (Maps) – The MAPS associated to the Trainer.
state (TrainerState) – The current state of the Trainer.
split (Split) – The clinicadl.split.Split on which training is performed.
optimizers (dict[str, torch.optim.Optimizer]) – The current torch.optim.Optimizer, as returned by Model.backward_step.
grad_scaler (torch.amp.GradScaler) – The torch.amp.GradScaler used to scale gradients.
optimization (OptimizationConfig) – The optimization specifications of the training phase.
metrics (MetricsHandler) – The validation metrics to compute.
callbacks (CallbacksHandler) – The callbacks passed to the Trainer.
computational (ComputationalConfig) – The clinicadl.train.ComputationalConfig defining the computational specifications of the training phase.

on_resume(*, split: Split, optimization: OptimizationConfig, computational: ComputationalConfig, **kwargs) → None

Called once when Trainer.train resumes a training run.

More precisely, this method will be called just before loading the checkpoints.

Parameters:

model (Model) – The model associated to the Trainer.
maps (Maps) – The MAPS associated to the Trainer.
state (TrainerState) – The current state of the Trainer.
split (Split) – The clinicadl.split.Split on which training is performed.
optimizers (dict[str, torch.optim.Optimizer]) – The current torch.optim.Optimizer, as returned by Model.backward_step.
grad_scaler (torch.amp.GradScaler) – The torch.amp.GradScaler used to scale gradients.
optimization (OptimizationConfig) – The optimization specifications of the training phase.
metrics (MetricsHandler) – The validation metrics to compute.
callbacks (CallbacksHandler) – The callbacks passed to the Trainer.
computational (ComputationalConfig) – The clinicadl.train.ComputationalConfig defining the computational specifications of the training phase.
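
The relationship between on_train_start and on_resume can be sketched as follows: exactly one of the two hooks runs at the beginning of a training, depending on whether it is resumed. `RecordingCallback` and `begin_training` are hypothetical stand-ins written for this illustration; they do not reproduce the Trainer's code.

```python
class RecordingCallback:
    """Hypothetical callback that records which hooks were called."""

    def __init__(self):
        self.events = []

    def on_train_start(self, **kwargs):
        self.events.append("on_train_start")

    def on_resume(self, **kwargs):
        self.events.append("on_resume")


def begin_training(callback, resume, **kwargs):
    """Stand-in for the start of Trainer.train: calls exactly one hook."""
    if resume:
        callback.on_resume(**kwargs)
    else:
        callback.on_train_start(**kwargs)


fresh = RecordingCallback()
begin_training(fresh, resume=False)

resumed = RecordingCallback()
begin_training(resumed, resume=True)
```

A callback that needs the same setup in both cases would therefore have to perform it in both hooks.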

on_epoch_start(**kwargs) → None

Called at the beginning of an epoch in Trainer.train.

Parameters:

model (Model) – The model associated to the Trainer.
maps (Maps) – The MAPS associated to the Trainer.
state (TrainerState) – The current state of the Trainer.

on_forward_step_start(**kwargs) → None

Called every time Model.forward_step will be called in Trainer.train.

Parameters:

model (Model) – The model associated to the Trainer.
maps (Maps) – The MAPS associated to the Trainer.
state (TrainerState) – The current state of the Trainer.
batch (BatchType) – The batch input to Model.forward_step.

on_backward_step_start(**kwargs) → None

Called every time Model.backward_step will be called in Trainer.train.

Note

This event is equivalent to on_forward_step_end.

Parameters:

model (Model) – The model associated to the Trainer.
maps (Maps) – The MAPS associated to the Trainer.
state (TrainerState) – The current state of the Trainer.
loss (LossType) – The loss output by Model.forward_step and input by Model.backward_step.
grad_scaler (torch.amp.GradScaler) – The torch.amp.GradScaler used to scale gradients.

on_backward_step_end(**kwargs) → None

Called every time Model.backward_step has just been called in Trainer.train.

Parameters:

model (Model) – The model associated to the Trainer.
maps (Maps) – The MAPS associated to the Trainer.
state (TrainerState) – The current state of the Trainer.

on_optimization_step_start(**kwargs) → None

Called every time Model.optimization_step will be called in Trainer.train.

Parameters:

model (Model) – The model associated to the Trainer.
maps (Maps) – The MAPS associated to the Trainer.
state (TrainerState) – The current state of the Trainer.
optimizers (dict[str, torch.optim.Optimizer]) – The current torch.optim.Optimizer, as returned by Model.backward_step.
grad_scaler (torch.amp.GradScaler) – The torch.amp.GradScaler used to scale gradients.

on_optimization_step_end(**kwargs) → None

Called every time Model.optimization_step has just been called in Trainer.train.

Parameters:

model (Model) – The model associated to the Trainer.
maps (Maps) – The MAPS associated to the Trainer.
state (TrainerState) – The current state of the Trainer.
optimizers (dict[str, torch.optim.Optimizer]) – The current torch.optim.Optimizer, as returned by Model.backward_step.
grad_scaler (torch.amp.GradScaler) – The torch.amp.GradScaler used to scale gradients.

on_batch_end(*, state: TrainerState, **kwargs) → None

Called every time the processing of a batch is completed during a training, validation, or test phase.

Note

This event may be redundant with other events: e.g., in evaluation phases, it is equivalent to on_evaluation_end().

Parameters:

model (Model) – The model associated to the Trainer.
maps (Maps) – The MAPS associated to the Trainer.
state (TrainerState) – The current state of the Trainer.
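
The signature above names only `state` and collects everything else in `**kwargs`, even though `model` and `maps` are also documented as parameters. A minimal sketch of that keyword-only pattern is shown below. `BatchCounter` is hypothetical, and a plain dict stands in for TrainerState, whose attributes are not described in this section.

```python
class BatchCounter:
    """Hypothetical callback: names only the keyword arguments it uses."""

    def __init__(self):
        self.seen_batches = []

    def on_batch_end(self, *, state, **kwargs):
        # `model`, `maps`, etc. arrive in kwargs and are ignored here.
        self.seen_batches.append(state["batch"])


counter = BatchCounter()
# Stand-in for a caller that passes every documented argument by keyword.
for i in range(3):
    counter.on_batch_end(model=None, maps=None, state={"batch": i})
```

With this pattern, a caller can pass the same full set of keyword arguments to every callback, and each hook picks out only what it needs.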

on_validation_start(**kwargs) → None

Called at the beginning of every validation loop in Trainer.train.

Not to be confused with on_validate_start().

Parameters:

model (Model) – The model associated to the Trainer.
maps (Maps) – The MAPS associated to the Trainer.
state (TrainerState) – The current state of the Trainer.
dataloader (DataLoader) – The dataloader on which validation is performed.
metrics (MetricsHandler) – The validation metrics to compute.

on_evaluation_step_start(*, state: TrainerState, **kwargs) → None

Called every time Model.evaluation_step will be called in Trainer.train, Trainer.validate, or Trainer.test.

Parameters:

model (Model) – The model associated to the Trainer.
maps (Maps) – The MAPS associated to the Trainer.
state (TrainerState) – The current state of the Trainer.
batch (BatchType) – The batch input to Model.evaluation_step.

on_metrics_computation_start(*, state: TrainerState, **kwargs) → None

Called every time Model.evaluation_step has been called and metrics will now be computed.

Note

This event is equivalent to on_evaluation_step_end.

Parameters:

model (Model) – The model associated to the Trainer.
maps (Maps) – The MAPS associated to the Trainer.
state (TrainerState) – The current state of the Trainer.
output (Batch) – The batch output by Model.evaluation_step.
metrics (MetricsHandler) – The metrics to compute.

on_metrics_computation_end(*, state: TrainerState, **kwargs) → None

Called every time metrics have just been computed on a batch.

Parameters:

model (Model) – The model associated to the Trainer.
maps (Maps) – The MAPS associated to the Trainer.
state (TrainerState) – The current state of the Trainer.
detailed_metrics_df (pandas.DataFrame) – The evaluation metrics on the batch.

on_validation_end(**kwargs) → None

Called at the end of every validation loop in Trainer.train.

Not to be confused with on_validate_end().

Parameters:

model (Model) – The model associated to the Trainer.
maps (Maps) – The MAPS associated to the Trainer.
state (TrainerState) – The current state of the Trainer.
metrics (MetricsHandler) – The validation metrics computed.

on_epoch_end(**kwargs) → None

Called at the end of an epoch in Trainer.train.

Parameters:

model (Model) – The model associated to the Trainer.
maps (Maps) – The MAPS associated to the Trainer.
state (TrainerState) – The current state of the Trainer.

on_train_end(*, maps: Maps, state: TrainerState, **kwargs) → None

Called once at the end of Trainer.train.

Parameters:

model (Model) – The model associated to the Trainer.
maps (Maps) – The MAPS associated to the Trainer.
state (TrainerState) – The current state of the Trainer.
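
To see how the training hooks documented above fit together, the sketch below replays them in the order their descriptions suggest for a non-resumed training. It is an illustration only: `EventRecorder` and `fake_train` are hypothetical, no model is involved, and running one validation loop per epoch is an assumption rather than documented Trainer behavior.

```python
class EventRecorder:
    """Hypothetical callback that records the name of every hook called on it."""

    def __init__(self):
        self.events = []

    def __getattr__(self, name):
        # Only reached for attributes not found normally, i.e. the on_* hooks.
        if not name.startswith("on_"):
            raise AttributeError(name)

        def hook(**kwargs):
            self.events.append(name)

        return hook


def fake_train(cb, n_epochs=1, n_train_batches=1, n_val_batches=1):
    """Stand-in for Trainer.train (assumed ordering, see the text above)."""
    cb.on_train_start()
    for _ in range(n_epochs):
        cb.on_epoch_start()
        for _ in range(n_train_batches):
            cb.on_forward_step_start()
            cb.on_backward_step_start()  # same moment as on_forward_step_end
            cb.on_backward_step_end()
            cb.on_optimization_step_start()
            cb.on_optimization_step_end()
            cb.on_batch_end()
        cb.on_validation_start()
        for _ in range(n_val_batches):
            cb.on_evaluation_step_start()
            cb.on_metrics_computation_start()  # same moment as on_evaluation_step_end
            cb.on_metrics_computation_end()
            cb.on_batch_end()
        cb.on_validation_end()
        cb.on_epoch_end()
    cb.on_train_end()


recorder = EventRecorder()
fake_train(recorder)
```

After the call, `recorder.events` lists the hook names in call order, which makes it easy to check where on_batch_end falls relative to the training and evaluation steps.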

state_dict() → Mapping[str, Any]

Returns a checkpoint of the current state of the callback.

Returns:

Mapping[str, Any] – The current state as a dict.

load_state_dict(state_dict: Mapping[str, Any]) → None

Sets the callback to a given state.

Parameters:

state_dict (Mapping[str, Any]) – The desired state of the Callback, as returned by state_dict().
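
The checkpointing contract described by state_dict() and load_state_dict() can be sketched as a round trip: the mapping returned by one instance restores an equivalent state in another. `CountingCallback` is a hypothetical example whose only state is a batch counter; what MonitorCallback actually stores in its state dict is not documented here.

```python
from typing import Any, Mapping


class CountingCallback:
    """Hypothetical callback whose only state is a batch counter."""

    def __init__(self):
        self.n_iterations = 0

    def on_batch_end(self, **kwargs):
        self.n_iterations += 1

    def state_dict(self) -> Mapping[str, Any]:
        return {"n_iterations": self.n_iterations}

    def load_state_dict(self, state_dict: Mapping[str, Any]) -> None:
        self.n_iterations = state_dict["n_iterations"]


original = CountingCallback()
for _ in range(5):
    original.on_batch_end()
checkpoint = original.state_dict()

# A fresh instance picks up where the original left off.
restored = CountingCallback()
restored.load_state_dict(checkpoint)
```

Keeping the state in a plain mapping of built-in types, as in this sketch, makes it straightforward to save alongside other checkpoints.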