clinicadl.callbacks.MonitorCallback
- class clinicadl.callbacks.MonitorCallback(num_measurements: int = 100, warmup_iterations: int = 10, enabled: bool = True)
Monitors computational statistics during a training phase.
The statistics are summarized in the MAPS in <maps>/training/split-<split_idx>/summary.log, and the details can be found in <maps>/training/split-<split_idx>/logs/computational.tsv.
The following statistics are recorded for different phases of the training (GPU statistics are reported only if GPUs are used):
Time (s): total duration of the phase;
GPU Time (s): duration of GPU computation during the phase;
GPU Max Memory (MB): maximum GPU memory occupied during the phase.
You will also find global statistics:
Throughput (images/s): the number of images processed per second, which is equal to the batch size divided by the iteration time;
GPU throughput (images/s): the number of images processed per second by the GPU, which is equal to the batch size divided by the iteration GPU time.
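The two throughput definitions above can be illustrated with a minimal sketch. The function name `throughput` and the timing values are hypothetical and only serve the example; this is not the library's implementation.

```python
def throughput(batch_size: int, iteration_time_s: float) -> float:
    """Images processed per second: batch size divided by iteration time."""
    if iteration_time_s <= 0:
        raise ValueError("iteration_time_s must be positive")
    return batch_size / iteration_time_s


# Hypothetical measurements for one iteration on a batch of 32 images.
wall_time_s = 0.5   # total duration of the iteration
gpu_time_s = 0.25   # duration of GPU computation within the iteration

print(throughput(32, wall_time_s))  # 64.0 images/s (Throughput)
print(throughput(32, gpu_time_s))   # 128.0 images/s (GPU throughput)
```

Since the GPU time of an iteration is at most its total duration, the GPU throughput is at least as high as the overall throughput.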
- Parameters:
num_measurements (int, default=100) – The number of measurements to perform for averaging the statistics.
Note
Some statistics, like the total training time, are not measured num_measurements times. All the epochs will be measured.
warmup_iterations (int, default=10) – The number of batches to wait before starting the monitoring. It is particularly important when working with GPUs, on which the first calculations can take significantly longer and therefore skew the measurement.
enabled (bool, default=True) – Whether to activate monitoring.
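To show how these three parameters can interact, here is a minimal sketch of a timer that skips warm-up batches and then records a bounded number of measurements. `IterationTimer` is a hypothetical helper written for this example. It assumes that measurements are taken on the first iterations following the warm-up, which the documentation above does not specify.

```python
import time
from statistics import mean


class IterationTimer:
    """Hypothetical helper (not part of clinicadl) mimicking the parameters above."""

    def __init__(self, num_measurements=100, warmup_iterations=10, enabled=True):
        self.num_measurements = num_measurements
        self.warmup_iterations = warmup_iterations
        self.enabled = enabled
        self.seen = 0          # number of iterations observed so far
        self.durations = []    # recorded iteration times, in seconds
        self._start = None

    def _measuring(self):
        # Measure only when enabled, after warm-up, and while the budget is not spent.
        return (
            self.enabled
            and self.seen >= self.warmup_iterations
            and len(self.durations) < self.num_measurements
        )

    def start(self):
        self._start = time.perf_counter() if self._measuring() else None

    def stop(self):
        if self._start is not None:
            self.durations.append(time.perf_counter() - self._start)
        self.seen += 1

    def average(self):
        return mean(self.durations) if self.durations else None


timer = IterationTimer(num_measurements=3, warmup_iterations=2)
for _ in range(10):
    timer.start()
    sum(range(1000))  # stand-in for one training iteration
    timer.stop()

print(len(timer.durations))  # 3
```

With `enabled=False`, the same loop records nothing and `average()` returns `None`.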
- on_exception(*, maps: Maps, state: TrainerState, exception: Exception, **kwargs) -> None
Called when an exception interrupts an execution of the Trainer.
- Parameters:
model (Model) – The model associated with the Trainer.
state (TrainerState) – The current state of the Trainer.
exception (Exception) – The exception that has been raised.
- on_train_start(*, split: Split, optimization: OptimizationConfig, computational: ComputationalConfig, **kwargs) -> None
Called once at the beginning of Trainer.train if resume=False.
If resuming a training, on_resume() will be called instead.
- Parameters:
model (Model) – The model associated with the Trainer.
state (TrainerState) – The current state of the Trainer.
split (Split) – The clinicadl.split.Split on which training is performed.
optimizers (dict[str, torch.optim.Optimizer]) – The current torch.optim.Optimizer, as returned by Model.backward_step.
grad_scaler (torch.amp.GradScaler) – The torch.amp.GradScaler used to scale gradients.
optimization (OptimizationConfig) – The optimization specifications of the training phase.
metrics (MetricsHandler) – The validation metrics to compute.
callbacks (CallbacksHandler) – The callbacks passed to the Trainer.
computational (ComputationalConfig) – The clinicadl.train.ComputationalConfig defining the computational specifications of the training phase.
- on_resume(*, split: Split, optimization: OptimizationConfig, computational: ComputationalConfig, **kwargs) -> None
Called once when Trainer.train is resuming a training.
More precisely, this method will be called just before loading the checkpoints.
- Parameters:
model (Model) – The model associated with the Trainer.
state (TrainerState) – The current state of the Trainer.
split (Split) – The clinicadl.split.Split on which training is performed.
optimizers (dict[str, torch.optim.Optimizer]) – The current torch.optim.Optimizer, as returned by Model.backward_step.
grad_scaler (torch.amp.GradScaler) – The torch.amp.GradScaler used to scale gradients.
optimization (OptimizationConfig) – The optimization specifications of the training phase.
metrics (MetricsHandler) – The validation metrics to compute.
callbacks (CallbacksHandler) – The callbacks passed to the Trainer.
computational (ComputationalConfig) – The clinicadl.train.ComputationalConfig defining the computational specifications of the training phase.
- on_epoch_start(**kwargs) -> None
Called at the beginning of an epoch in Trainer.train.
- Parameters:
model (Model) – The model associated with the Trainer.
state (TrainerState) – The current state of the Trainer.
- on_forward_step_start(**kwargs) -> None
Called every time Model.forward_step is about to be called in Trainer.train.
- Parameters:
model (Model) – The model associated with the Trainer.
state (TrainerState) – The current state of the Trainer.
batch (BatchType) – The batch input to Model.forward_step.
- on_backward_step_start(**kwargs) -> None
Called every time Model.backward_step is about to be called in Trainer.train.
Note
This event is equivalent to on_forward_step_end.
- Parameters:
model (Model) – The model associated with the Trainer.
state (TrainerState) – The current state of the Trainer.
loss (LossType) – The loss output by Model.forward_step and input to Model.backward_step.
grad_scaler (torch.amp.GradScaler) – The torch.amp.GradScaler used to scale gradients.
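Because the start of the backward step coincides with the end of the forward step, a single event can close one timed phase and open the next. The sketch below illustrates that idea with a hypothetical `PhaseTimer` class. It is not clinicadl code, and the mapping of events to calls in the comments is only an assumption.

```python
import time


class PhaseTimer:
    """Hypothetical helper (not clinicadl code) that times consecutive phases."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._current = None   # name of the phase being timed, if any
        self._start = None     # clock value when the current phase started
        self.totals = {}       # accumulated duration per phase, in seconds

    def switch(self, phase):
        """Close the running phase, if any, and open `phase` (None only closes)."""
        now = self._clock()
        if self._current is not None:
            elapsed = now - self._start
            self.totals[self._current] = self.totals.get(self._current, 0.0) + elapsed
        self._current, self._start = phase, now


timer = PhaseTimer()
timer.switch("forward")    # e.g. on_forward_step_start
timer.switch("backward")   # e.g. on_backward_step_start, which also ends the forward phase
timer.switch(None)         # e.g. on_backward_step_end
print(sorted(timer.totals))  # ['backward', 'forward']
```

Passing the clock as an argument keeps the helper easy to test with a deterministic time source.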
- on_backward_step_end(**kwargs) -> None
Called every time Model.backward_step has just been called in Trainer.train.
- Parameters:
model (Model) – The model associated with the Trainer.
state (TrainerState) – The current state of the Trainer.
- on_optimization_step_start(**kwargs) -> None
Called every time Model.optimization_step is about to be called in Trainer.train.
- Parameters:
model (Model) – The model associated with the Trainer.
state (TrainerState) – The current state of the Trainer.
optimizers (dict[str, torch.optim.Optimizer]) – The current torch.optim.Optimizer, as returned by Model.backward_step.
grad_scaler (torch.amp.GradScaler) – The torch.amp.GradScaler used to scale gradients.
- on_optimization_step_end(**kwargs) -> None
Called every time Model.optimization_step has just been called in Trainer.train.
- Parameters:
model (Model) – The model associated with the Trainer.
state (TrainerState) – The current state of the Trainer.
optimizers (dict[str, torch.optim.Optimizer]) – The current torch.optim.Optimizer, as returned by Model.backward_step.
grad_scaler (torch.amp.GradScaler) – The torch.amp.GradScaler used to scale gradients.
- on_batch_end(*, state: TrainerState, **kwargs) -> None
Called every time the processing of a batch is completed during a training, validation, or test phase.
Note
This event may be redundant with other events: e.g., in evaluation phases, it is equivalent to on_evaluation_end().
- Parameters:
model (Model) – The model associated with the Trainer.
state (TrainerState) – The current state of the Trainer.
- on_validation_start(**kwargs) -> None
Called at the beginning of every validation loop in Trainer.train.
Not to be confused with on_validate_start().
- Parameters:
model (Model) – The model associated with the Trainer.
state (TrainerState) – The current state of the Trainer.
dataloader (DataLoader) – The dataloader on which validation is performed.
metrics (MetricsHandler) – The validation metrics to compute.
- on_evaluation_step_start(*, state: TrainerState, **kwargs) -> None
Called every time Model.evaluation_step is about to be called in Trainer.train, Trainer.validate, or Trainer.test.
- Parameters:
model (Model) – The model associated with the Trainer.
state (TrainerState) – The current state of the Trainer.
batch (BatchType) – The batch input to Model.evaluation_step.
- on_metrics_computation_start(*, state: TrainerState, **kwargs) -> None
Called every time Model.evaluation_step has been called and metrics are about to be computed.
Note
This event is equivalent to on_evaluation_step_end.
- Parameters:
model (Model) – The model associated with the Trainer.
state (TrainerState) – The current state of the Trainer.
output (Batch) – The batch output by Model.evaluation_step.
metrics (MetricsHandler) – The metrics to compute.
- on_metrics_computation_end(*, state: TrainerState, **kwargs) -> None
Called every time metrics have just been computed on a batch.
- Parameters:
model (Model) – The model associated with the Trainer.
state (TrainerState) – The current state of the Trainer.
detailed_metrics_df (pandas.DataFrame) – The evaluation metrics on the batch.
- on_validation_end(**kwargs) -> None
Called at the end of every validation loop in Trainer.train.
Not to be confused with on_validate_end().
- Parameters:
model (Model) – The model associated with the Trainer.
state (TrainerState) – The current state of the Trainer.
metrics (MetricsHandler) – The validation metrics computed.
- on_epoch_end(**kwargs) -> None
Called at the end of an epoch in Trainer.train.
- Parameters:
model (Model) – The model associated with the Trainer.
state (TrainerState) – The current state of the Trainer.
- on_train_end(*, maps: Maps, state: TrainerState, **kwargs) -> None
Called once at the end of Trainer.train.
- Parameters:
model (Model) – The model associated with the Trainer.
state (TrainerState) – The current state of the Trainer.