Amazon SageMaker Debugger automates the debugging of machine learning training jobs. You can run your existing training script unchanged (the Zero Script Change experience) and use Debugger's built-in Hooks and Rules to capture tensors, or build custom Hooks and Rules to control which tensors are captured. Debugger saves the captured tensors to an Amazon S3 bucket so that they are available for analysis, all through a flexible and powerful API.
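To make the Hook and Rule roles concrete, here is a minimal conceptual sketch in plain Python. It is not the smdebug or SageMaker API: `ToyHook`, `vanishing_gradient_rule`, the tensor names, and the threshold are all hypothetical, chosen only to show a hook recording tensors at intervals and a rule scanning the saved values for an anomaly.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical stand-ins for illustration only; these are NOT smdebug classes.


@dataclass
class ToyHook:
    """Records named tensor values at every `save_interval`-th step."""

    save_interval: int = 1
    saved: Dict[int, Dict[str, List[float]]] = field(default_factory=dict)

    def capture(self, step: int, tensors: Dict[str, List[float]]) -> None:
        if step % self.save_interval == 0:
            # Copy the values so later in-place updates do not alter the record.
            self.saved[step] = {name: list(vals) for name, vals in tensors.items()}


def vanishing_gradient_rule(hook: ToyHook, threshold: float = 1e-6) -> List[int]:
    """Return the saved steps at which every gradient entry is below `threshold`."""
    flagged = []
    for step in sorted(hook.saved):
        grads = [
            value
            for name, vals in hook.saved[step].items()
            if name.startswith("gradient/")
            for value in vals
        ]
        if grads and all(abs(g) < threshold for g in grads):
            flagged.append(step)
    return flagged


hook = ToyHook(save_interval=2)
for step in range(6):
    # Fake training loop: gradients shrink by a factor of 1000 each step.
    scale = 10.0 ** (-3 * step)
    hook.capture(
        step,
        {"gradient/dense": [0.5 * scale, -0.2 * scale], "weight/dense": [1.0, 2.0]},
    )

flagged_steps = vanishing_gradient_rule(hook)
print(flagged_steps)
```

In this toy setup the hook and the rule are decoupled: the rule only reads what the hook saved, so it could run separately from the training loop.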
The smdebug library powers Debugger by reading the saved tensors from the S3 bucket while the training job runs. smdebug retrieves and filters the tensors that Debugger saves, such as gradients, weights, and biases.
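The retrieve-and-filter idea can be sketched without any dependency on smdebug. The example below assumes saved tensors are held in a plain dictionary keyed by name and step; the dictionary layout, the tensor names, and the helpers `select_tensors` and `values_at_step` are hypothetical and do not reflect how smdebug stores or exposes data.

```python
import re
from typing import Dict, List

# Hypothetical in-memory stand-in for saved tensors: name -> {step: values}.
saved_tensors: Dict[str, Dict[int, List[float]]] = {
    "gradients/dense/kernel": {0: [0.4, -0.1], 10: [0.2, -0.05]},
    "gradients/dense/bias": {0: [0.03], 10: [0.01]},
    "dense/kernel": {0: [1.0, 2.0], 10: [0.9, 2.1]},
    "dense/bias": {0: [0.0], 10: [0.1]},
}


def select_tensors(tensors: Dict[str, Dict[int, List[float]]], pattern: str) -> List[str]:
    """Return the sorted tensor names that match the regular expression."""
    regex = re.compile(pattern)
    return sorted(name for name in tensors if regex.search(name))


def values_at_step(
    tensors: Dict[str, Dict[int, List[float]]], pattern: str, step: int
) -> Dict[str, List[float]]:
    """Return the values of matching tensors at `step`, skipping unsaved steps."""
    return {
        name: tensors[name][step]
        for name in select_tensors(tensors, pattern)
        if step in tensors[name]
    }


gradient_names = select_tensors(saved_tensors, r"^gradients/")
bias_at_10 = values_at_step(saved_tensors, r"bias$", 10)
print(gradient_names)
print(bias_at_10)
```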
Debugger helps you develop better, faster, and cheaper models by requiring only minimal changes to your estimator, tracing tensors, catching anomalies during training, and supporting iterative model pruning.
Debugger supports TensorFlow, PyTorch, MXNet, and XGBoost frameworks.