Amazon SageMaker autopilot: a white box AutoML solution at scale
We present Amazon SageMaker Autopilot: a fully managed system that provides an automatic machine learning solution. Given a tabular dataset and the target column name, Autopilot identifies the problem type, analyzes the data and produces a diverse set of complete ML pipelines, which are tuned to generate a leaderboard of candidate models that the customer can choose from. The diversity allows users to balance between different needs such as model accuracy vs. latency. By exposing not only the final models but the way they are trained, meaning the pipelines, we allow to customize the generated training pipeline, thus catering the need of users with different levels of expertise. This trait is crucial for users and is the main novelty of Autopilot; it provides a solution that on one hand is not fully black-box and can be further worked on, while on the other hand is not a do it yourself solution, requiring expertise in all aspects of machine learning. This paper describes the different components in the eco-system of Autopilot, emphasizing the infrastructure choices that allow scalability, high quality models, editable ML pipelines, consumption of artifacts of offline meta-learning, and a convenient integration with the entire SageMaker system allowing these trained models to be used in a production setting.