Learning geolocation by accurately matching customer addresses via graph-based active learning
We propose a novel adaptation of graph-based active learning for customer address resolution (de-duplication), with the aim of determining whether two addresses represent the same physical building. For delivery systems, better address resolution benefits multiple downstream systems such as geocoding, route planning and delivery time estimation, leading to a more efficient and reliable delivery experience for both customers and delivery agents. Our approach jointly leverages address text, past delivery information and concepts from graph theory to retrieve informative and diverse record pairs to label. We empirically show the effectiveness of our approach on a manually curated dataset of addresses from India (IN) and the United Arab Emirates (UAE). We achieve a 9.3% absolute improvement in recall over the existing production system, averaged across IN and UAE, while preserving 95% precision. We also introduce delivery point (DP) geocode learning for cold-start addresses as a downstream application of address resolution. In addition to the offline evaluation, we performed online A/B experiments, which show that when the production model is augmented with actively learnt record pairs, delivery precision improved by 7.84% and delivery defects fell by 12.32% on average across shipments from IN and UAE.
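
The offline result above is stated as recall at a fixed precision level, which can be made concrete with a small sketch. The Python below is illustrative only and is not the authors' evaluation code: the function name `recall_at_precision`, the score and label inputs, and the toy data are all assumptions. It assumes a matcher that emits a similarity score per address pair, with pairs at or above a threshold predicted as the same building, and reports the highest recall reachable while precision stays at or above the required floor.

```python
from typing import List, Tuple


def recall_at_precision(
    scores: List[float], labels: List[int], min_precision: float = 0.95
) -> Tuple[float, float]:
    """Return (best_recall, threshold) over thresholds with precision >= min_precision.

    Hypothetical helper: a pair is predicted "same building" when its
    score is >= threshold; labels are 1 (same building) or 0 (different).
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive pair")

    # Sort pairs by descending score and lower the threshold step by step.
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    best_recall, best_threshold = 0.0, float("inf")
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every pair tied at this score before measuring.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        precision = tp / (tp + fp)
        recall = tp / total_pos
        if precision >= min_precision and recall > best_recall:
            best_recall, best_threshold = recall, threshold
    return best_recall, best_threshold


# Toy data (made up for illustration, not from the paper).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
toy_labels = [1, 1, 0, 1, 0]
toy_recall, toy_threshold = recall_at_precision(toy_scores, toy_labels, 0.95)
```

Under this reading, an absolute improvement in recall is the difference between two such recall values (candidate minus baseline) measured at the same precision floor.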