An evaluation benchmark for generative AI in the security domain
2024
As computing environments become increasingly complex and distributed, the volume and complexity of security data generated across various systems have grown exponentially. Extracting useful insights from this data is crucial for effective security analytics, anomaly detection, and threat identification. However, there is a lack of comprehensive evaluation benchmarks for assessing the performance of large language models trained on security log data, which hinders progress in this domain. This paper proposes a comprehensive evaluation benchmark for security data that addresses this critical gap. The benchmark is easily adaptable to any security log dataset and comprises four diverse categories of tasks: supervised evaluations, unsupervised evaluations, anomaly detection, and semantic similarity evaluations. By providing a standardized evaluation framework, the benchmark enables objective comparison and reproducible assessment of state-of-the-art embedding models across various computing environments and security log sources.
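To make the semantic similarity category concrete, the sketch below shows one way such a task could score an embedding model on security log data. This is purely illustrative and not the paper's implementation: the names `score_semantic_similarity` and `toy_embed`, the log lines, and the gold scores are all hypothetical, and the scoring metric (Spearman correlation between cosine similarities and human-labelled similarity) is a common convention assumed here, not one specified by the source.

```python
# Illustrative sketch of a semantic-similarity evaluation for security log
# embeddings. All names and data are hypothetical, not the paper's API.
import numpy as np
from scipy.stats import spearmanr


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def score_semantic_similarity(embed, log_pairs, gold_scores) -> float:
    """Spearman correlation between model similarities and gold labels.

    embed: callable mapping a log line (str) to a 1-D numpy embedding.
    log_pairs: list of (log_a, log_b) string pairs.
    gold_scores: human-assigned similarity score for each pair.
    """
    predicted = [cosine_similarity(embed(a), embed(b)) for a, b in log_pairs]
    correlation, _ = spearmanr(predicted, gold_scores)
    return correlation


if __name__ == "__main__":
    # Toy embedding model and data, purely for demonstration; a real run
    # would plug in the embedding model under evaluation here.
    rng = np.random.default_rng(0)
    vocab_vectors = {}

    def toy_embed(line: str) -> np.ndarray:
        # Average of random-but-fixed per-word vectors.
        words = line.lower().split()
        vecs = [vocab_vectors.setdefault(w, rng.normal(size=16)) for w in words]
        return np.mean(vecs, axis=0)

    pairs = [
        ("failed login from 10.0.0.5", "authentication failure from 10.0.0.5"),
        ("failed login from 10.0.0.5", "failed login from 10.0.0.6"),
        ("failed login from 10.0.0.5", "disk usage at 80 percent"),
    ]
    gold = [0.9, 0.8, 0.1]
    print("Spearman correlation:", score_semantic_similarity(toy_embed, pairs, gold))
```

The other task categories (supervised, unsupervised, and anomaly detection evaluations) would follow the same pattern, swapping in the metric appropriate to each task.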