CADENCE: Conditional Anomaly Detection for Events Using Noise-Contrastive Estimation
Many interactions between computer systems and users are logged as event records, such as login events, API call records, and bank transaction records. These records often consist of high-dimensional categorical variables, such as user name, zip code, and autonomous system number. In this work, we consider anomaly detection for such data sets, where each record consists of multi-dimensional, potentially very high-cardinality, categorical variables. Our proposed technique, named Cadence, combines neural networks, low-dimensional representation learning, and noise-contrastive estimation. Our approach is based on estimating the conditional probability density functions governing observed events, which are assumed to be mostly normal. This conditional modeling approach allows Cadence to consider each event in its own context, thereby significantly improving its accuracy. We evaluate our proposed method using both synthetic and real-world data sets. Our results show that Cadence performs significantly better than existing methods at real-world anomaly detection tasks.
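To make the idea of conditional density estimation with noise-contrastive estimation concrete, the following is a minimal sketch and not the Cadence architecture itself. It assumes events with only two categorical fields, a context (for example, a user) and a target (for example, a country), and it replaces the neural network with a simple bilinear embedding scorer. The names train_nce and anomaly_score, the synthetic data, and all hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_nce(contexts, targets, n_ctx, n_tgt, dim=8, k=5,
              epochs=20, lr=0.1, seed=0):
    """Fit s(c, x) = U[c] . V[x] + b[x] so that s approximates log p(x | c).

    Each observed (c, x) pair is contrasted with k noise targets drawn
    from a smoothed unigram distribution q over targets.
    """
    rng = np.random.default_rng(seed)
    U = 0.3 * rng.standard_normal((n_ctx, dim))   # context embeddings
    V = 0.3 * rng.standard_normal((n_tgt, dim))   # target embeddings
    b = np.zeros(n_tgt)                           # per-target bias
    counts = np.bincount(targets, minlength=n_tgt) + 1.0
    q = counts / counts.sum()                     # noise distribution
    log_kq = np.log(k * q)
    for _ in range(epochs):
        for i in rng.permutation(len(contexts)):
            c = contexts[i]
            # Index 0 is the observed target; the rest are noise samples.
            xs = np.concatenate(([targets[i]], rng.choice(n_tgt, size=k, p=q)))
            labels = np.zeros(k + 1)
            labels[0] = 1.0
            scores = V[xs] @ U[c] + b[xs]
            # Gradient of the logistic NCE loss with respect to each score.
            g = sigmoid(scores - log_kq[xs]) - labels
            grad_u = g @ V[xs]
            # np.add.at accumulates correctly when xs contains duplicates.
            np.add.at(V, xs, -lr * np.outer(g, U[c]))
            np.add.at(b, xs, -lr * g)
            U[c] -= lr * grad_u
    return U, V, b

def anomaly_score(U, V, b, c, x):
    """Higher means more anomalous: approximately -log p(x | c)."""
    return -(U[c] @ V[x] + b[x])

# Synthetic events (an assumption for illustration): the target usually
# equals the context, and 5% of events have a uniformly random target.
data_rng = np.random.default_rng(1)
n_events = 800
contexts = data_rng.integers(0, 4, size=n_events)
is_random = data_rng.random(n_events) < 0.05
targets = np.where(is_random, data_rng.integers(0, 4, size=n_events), contexts)

U, V, b = train_nce(contexts, targets, n_ctx=4, n_tgt=4)
print("typical pair (0, 0):", anomaly_score(U, V, b, 0, 0))
print("unusual pair (0, 1):", anomaly_score(U, V, b, 0, 1))
```

In this sketch the score is conditional on the context, so a target value that is common overall can still be flagged when it is rare for the specific context in which it appears. That is the property the conditional modeling approach relies on.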