Using machine learning for virtual-machine placement in the cloud
In tests, a new way to allocate virtual machines across servers outperforms baselines by 10%.
In the cloud, load balancing, or distributing tasks evenly across servers, is essential to providing reliable service. It prevents individual servers from getting overloaded, which degrades their performance.
The simplest way to prevent server overloads is to cap the number of tasks assigned to each server. But this may result in inefficient resource use, as tasks can vary greatly in their computational demands. The ideal approach to load-balancing would allocate tasks to the minimum number of servers required to prevent overloads.
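That "minimum number of servers" ideal is essentially one-dimensional bin packing. The sketch below illustrates the idea with a simple first-fit heuristic; the task demands and server capacity are illustrative values, not figures from our work.

```python
# First-fit bin packing: assign each task to the first server with
# room, opening a new server only when necessary.

def first_fit(demands, capacity):
    """Return the per-server loads after placing every task demand."""
    servers = []
    for d in demands:
        for i, load in enumerate(servers):
            if load + d <= capacity:
                servers[i] += d  # reuse an existing server
                break
        else:
            servers.append(d)    # open a new server
    return servers

# Tasks with varying demands fit on 2 servers of capacity 10,
# whereas a fixed cap of 2 tasks per server would need 3 servers.
loads = first_fit([6, 4, 3, 3, 2, 2], capacity=10)
print(len(loads))  # → 2
```

Because demands vary, packing by actual resource use can serve the same load with fewer servers than a fixed per-server task cap.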
Last week, at the Conference on Machine Learning and Systems (MLSys), we presented a new algorithm for optimizing task distribution, called FirePlace. FirePlace is built around a decision-tree machine learning model, which we train using simulations based on historical data.
In experiments, we found that FirePlace outperformed both more-complex models, such as long short-term memory models and reinforcement learning models, and simpler baselines that have proved effective in practice, such as the power-of-two algorithm.
The name FirePlace comes from the Firecracker virtual machine (VM), which is used by Amazon Web Services’ (AWS) Lambda service. Lambda provides function execution as a service, sparing customers from provisioning infrastructure themselves and lowering their costs, since they are billed for function execution duration.
In cloud computing, virtual machines enable secure execution of customer code by moderating that code’s access to server operating systems. Traditionally, a cloud computing service might allot one VM to each application running on its servers. Firecracker, however, allots a separate VM to each function.
Firecracker VMs are secure and lightweight and can be packed densely into servers. Their small size gives them efficiency advantages, but it also makes them less predictable: the resource consumption of a large program is easier to estimate than the resource consumption of a single program function. Optimizing the placement of Firecracker VMs required a new approach to load balancing; hence FirePlace.
FirePlace uses a decision tree model that takes as input the resource consumption status of multiple servers in the fleet; to ensure that the model can deliver a decision within milliseconds, those servers are randomly sampled. The model’s output is the assignment of a new VM to one of the input servers.
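The placement flow described above can be sketched as follows. The feature names, the sample size, and the stand-in model are illustrative assumptions, not FirePlace's actual configuration; the point is the shape of the interface: sample servers, featurize their status, and let a trained model pick one.

```python
import random

def place_vm(fleet, model, sample_size=8, rng=random):
    """Pick a server for a new VM.

    fleet: list of dicts with per-server resource stats.
    model: any object with predict(features) -> index into the sample.
    """
    # Randomly sample a handful of servers so the decision stays fast.
    sample = rng.sample(range(len(fleet)), min(sample_size, len(fleet)))
    features = [[fleet[s]["cpu"], fleet[s]["mem"]] for s in sample]
    choice = model.predict(features)  # index within the sample
    return sample[choice]             # fleet-wide server id

# Stand-in "model": choose the sampled server with the lowest CPU load.
class LeastCpuModel:
    def predict(self, features):
        return min(range(len(features)), key=lambda i: features[i][0])

fleet = [{"cpu": random.random(), "mem": random.random()} for _ in range(100)]
server_id = place_vm(fleet, LeastCpuModel())
```

In practice the stand-in model would be replaced by the trained decision tree; the sampling step is what bounds the model's input size regardless of fleet size.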
Training by simulation
To train the model, we use historical data about real Firecracker VMs’ resource consumption, represented as time series. During training, when the model is presented with a new VM to place, each of the currently allocated VMs is at a particular step in its time series. We run a simulation to compute those VMs’ future resource consumption, and on that basis, we can optimize the placement of the new VM. The optimized placement then becomes the training label for the current input.
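The label-generation step above can be sketched as follows. Here, each candidate server's resident VMs are replayed forward along their historical time series, and the placement that keeps the simulated peak load lowest becomes the label; the peak-load scoring rule is an illustrative assumption.

```python
def label_placement(servers, new_vm_series, horizon=10):
    """Return the index of the server that minimizes simulated peak load.

    servers: list of servers, each a list of per-VM future usage series.
    new_vm_series: forecast usage of the VM being placed.
    """
    best, best_peak = 0, float("inf")
    for i, vm_series in enumerate(servers):
        # Simulate total load on server i at each future step.
        peak = max(
            sum(s[t] for s in vm_series) + new_vm_series[t]
            for t in range(horizon)
        )
        if peak < best_peak:
            best, best_peak = i, peak
    return best  # becomes the training label for this input

servers = [
    [[0.5] * 10, [0.3] * 10],  # server 0: two resident VMs
    [[0.2] * 10],              # server 1: one resident VM
]
label = label_placement(servers, [0.4] * 10)  # → 1 (less loaded server)
```

The key property is that the label is computed from the VMs' *future* consumption, which the simulation knows but the model at inference time does not; the model learns to approximate that foresight from current status alone.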
In our experiments, our baseline was the surprisingly effective power-of-two algorithm, which is widely used in cloud computing. It randomly picks two servers as potential recipients for a new VM, then selects the less loaded of the two.
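The power-of-two baseline is simple enough to state in a few lines. In this sketch, each server's load is reduced to a single illustrative number:

```python
import random

def power_of_two(loads, rng=random):
    """Sample two distinct servers and return the less loaded one."""
    a, b = rng.sample(range(len(loads)), 2)
    return a if loads[a] <= loads[b] else b

loads = [5, 1, 9, 3]
chosen = power_of_two(loads)
loads[chosen] += 1  # assign the new VM there
```

Despite its simplicity, comparing just two random candidates rather than one is known to sharply reduce the worst-case server load, which is why this baseline is hard to beat in practice.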
We also compared our approach to ones that used neural networks, a long short-term memory (LSTM) network and a temporal convolutional network (TCN), trained to predict the future resource consumption of a given VM on the basis of its resource consumption up to that time.
Finally, we compared our system to one that used reinforcement learning to learn optimal placement of a VM, given its previous decisions about VM placement. The learned model performed well on smaller datasets, but as we increased the number of VMs to place, the complexity of the problem grew, and the reinforcement learning models failed to converge to a competitive solution.
We evaluated these approaches according to how many servers they needed to serve a given load, given a fixed limit on server overloads; the lower the number of servers, the better. FirePlace improved upon the power-of-two baseline algorithm by 10%. The LSTM and TCN approaches were too inaccurate to be competitive.
Lambda has begun to introduce the FirePlace approach in production, where, over time, it can provide real-world validation of our experimental results.