AddressBind: Cross-modal alignment of addresses and geocodes
2025
Accurately mapping addresses to geolocations is a challenging and important problem with many real-world applications, such as delivery logistics, map building, and path finding. High-quality embeddings of geospatial data (e.g., addresses, geocodes) that are grounded in the real world play an important role in the success of modeling tasks such as geocoding and address resolution/matching. Existing state-of-the-art (SOTA) approaches [9] transform the address embedding space to mimic real-world proximity via a triplet loss, but they require triplet engineering, which is error-prone and difficult to scale. In this work, we propose to embed address and geocode data in the same embedding space, which enables late fusion of cross-modal semantics and removes the dependency on triplet creation. Our proposed model outperforms SOTA baselines (including Multilingual-E5-Large-Instruct [32], a top model on the MTEB leaderboard) by improving geolocation accuracy and reducing geocode outliers across geographies with diverse writing standards. We also observe significant intrinsic gains in the quality of address embeddings, and the approach supports jointly aligning additional geospatial modalities.
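As an illustration of what embedding addresses and geocodes in a shared space can look like, the following is a minimal sketch and not the model described above. The abstract does not state the training objective; the sketch assumes a CLIP-style symmetric contrastive loss over paired (address, geocode) embeddings, which needs only matched pairs rather than engineered triplets. All function names, the temperature value, and the toy vectors (which stand in for encoder outputs) are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity_matrix(address_vecs, geocode_vecs):
    """Entry [i][j] is the cosine similarity of address i and geocode j."""
    a = [l2_normalize(v) for v in address_vecs]
    g = [l2_normalize(v) for v in geocode_vecs]
    return [[sum(x * y for x, y in zip(ai, gj)) for gj in g] for ai in a]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(address_vecs, geocode_vecs, temperature=0.07):
    """Assumed objective: average of address-to-geocode and geocode-to-address losses.

    Row i of each input is assumed to describe the same real-world location;
    every other row in the batch acts as a negative.
    """
    sims = cosine_similarity_matrix(address_vecs, geocode_vecs)
    logits = [[s / temperature for s in row] for row in sims]
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


def nearest_geocode(address_vec, geocode_vecs):
    """Late-fusion lookup: index of the geocode embedding closest to an address."""
    sims = cosine_similarity_matrix([address_vec], geocode_vecs)[0]
    return max(range(len(sims)), key=sims.__getitem__)


# Toy stand-ins for encoder outputs (hypothetical values).
addresses = [[1.0, 0.0], [0.0, 1.0]]
geocodes_aligned = [[1.0, 0.0], [0.0, 1.0]]
geocodes_swapped = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = symmetric_contrastive_loss(addresses, geocodes_aligned)
loss_swapped = symmetric_contrastive_loss(addresses, geocodes_swapped)
```

Under this assumed objective, correctly paired embeddings yield a much lower loss than mismatched ones, and once the two modalities share a space, resolving an address reduces to a nearest-neighbor search over geocode embeddings.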
Research areas