Detecting robotic and compromised IPs in digital advertising
2023
Regardless of whether the intent behind non-human traffic on sponsored advertising pages is malicious or benign, failure to detect such unwanted traffic degrades advertiser performance metrics. Invalid (i.e., robotic) ad traffic is frequently driven by IP addresses (or address ranges) that are exclusively dedicated to VPNs, hosting or proxy services, and Tor networks, as well as by unknown or residential IPs that make up bot networks set up to inflict maximum damage on a targeted group of advertisers. Sophisticated invalid traffic distributes ad activity across millions of IPs, switches back and forth between residential IPs with extremely short dwell times, and hides behind genuine human traffic by operating from compromised or mixed IPs (those sending both human and bot traffic). To mitigate rapidly evolving bot IP traffic, we propose an unsupervised model that generates robust IP embeddings from a mixture of autoencoder network experts; these embeddings can then be segregated by basic heuristics to flag entirely invalid IPs. Our contribution further includes the development of a new proxy label and a supervised network that harnesses IP, search query, and product embeddings to detect mixed IPs that send invalid traffic only to specific sponsored search or product listing pages. Our proposed two-component IP detection system improves the detection rate of suspicious IP traffic by 25% over a classical supervised model baseline.
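To make the unsupervised component more concrete, the following is a minimal sketch of how a mixture of autoencoder experts could turn per-IP feature vectors into embeddings. The abstract does not specify the architecture, so everything here is an assumption for illustration: the softmax gate, the single-layer experts, the gate-weighted combination of expert codes, the class and function names, and the random input features. The weights are untrained; the sketch shows only the forward pass.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


class AutoencoderExpert:
    """Single-layer autoencoder with random, untrained weights (illustrative)."""

    def __init__(self, n_features, n_latent):
        self.w_enc = rng.normal(0.0, 0.1, size=(n_features, n_latent))
        self.w_dec = rng.normal(0.0, 0.1, size=(n_latent, n_features))

    def encode(self, x):
        return np.tanh(x @ self.w_enc)

    def decode(self, z):
        return z @ self.w_dec


class MixtureOfAutoencoders:
    """Hypothetical mixture: a softmax gate weights each expert per input row."""

    def __init__(self, n_features, n_latent, n_experts):
        self.experts = [
            AutoencoderExpert(n_features, n_latent) for _ in range(n_experts)
        ]
        self.w_gate = rng.normal(0.0, 0.1, size=(n_features, n_experts))

    def gate(self, x):
        # Shape (n_rows, n_experts); each row sums to 1.
        return softmax(x @ self.w_gate)

    def embed(self, x):
        # Gate-weighted average of the expert codes, shape (n_rows, n_latent).
        g = self.gate(x)
        codes = np.stack([e.encode(x) for e in self.experts], axis=1)
        return (g[:, :, None] * codes).sum(axis=1)

    def reconstruction_error(self, x):
        # Mean squared error of the gate-weighted reconstruction, per row.
        g = self.gate(x)
        recons = np.stack(
            [e.decode(e.encode(x)) for e in self.experts], axis=1
        )
        mixed = (g[:, :, None] * recons).sum(axis=1)
        return ((x - mixed) ** 2).mean(axis=1)


# Made-up feature vectors standing in for per-IP traffic statistics.
x = rng.normal(size=(5, 8))
model = MixtureOfAutoencoders(n_features=8, n_latent=3, n_experts=4)
embeddings = model.embed(x)
errors = model.reconstruction_error(x)
```

In a trained system of this kind, the embeddings (and possibly the per-IP reconstruction error) would be the inputs to the simple heuristics that separate entirely invalid IPs; which heuristics are used is not stated in the abstract.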