Event stream classification with limited labeled data for e-commerce monitoring
2021
Monitoring and diagnosing large software systems is crucial to keeping modern businesses running without interruption. Reliability engineers must rely on automatic event processing to identify and mitigate potential disruptions to system health that originate in the underlying computer networks. Because obtaining impact labels for individual events is expensive, system operators usually maintain only a small, representative labeled dataset, which makes it hard for machine learning practitioners to train models on large-scale event streams. By formulating the problem within the multiple instance learning (MIL) framework, we propose an event classification approach that can be trained effectively with this limited information. Our evaluation results show a potential 65% reduction in the minutes network reliability engineers spend on disruption investigations when the proposed model is used. By automatically quantifying network impact, the proposed approach streamlines the investigation process and reduces the risk of unnecessary wake-up calls for on-call reliability engineers and resolver personnel.