Replication as learning: Scalable knowledge distillation for multimodal enterprise agents

Sabrina Zhang; Elham Alipour; Tom Jin; Alex Moschos; Eugene Kim; Miriam Teng

2026

Enterprise environments differ fundamentally from the clean settings assumed in LLM research: knowledge is distributed across heterogeneous sources, often incomplete or inconsistent, and key procedural logic is implicitly encoded in artifacts rather than explicitly documented. In such settings, retrieval-based approaches are insufficient, as no single source contains the full workflow. We propose a replication-driven knowledge distillation framework for scalable learning in multimodal agents. The agent learns by reverse-engineering validated artifacts (e.g., Excel workbooks), reconstructing the underlying data pipeline, and distilling the inferred logic into structured knowledge (claims, procedures, and domain patterns). This enables synthesis and validation across noisy sources and supports reuse in future tasks. We evaluate on 120 simulated enterprise environments with multimodal inputs (SQL, spreadsheets, documentation, messaging apps, emails, images, PDFs, CSV files) and controlled noise. Our method consistently outperforms retrieval-based baselines on both task execution and conceptual understanding, and remains robust under environmental drift.
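As a rough illustration of the replicate-then-distill idea described above, the following Python sketch orders a workbook's formula cells by dependency and turns the result into a procedure and a set of claims. It is not the authors' implementation: the class names (`Claim`, `Procedure`, `KnowledgeBase`), the functions `reconstruct_pipeline` and `distill`, the cell-to-formula dictionary, and the file name `report.xlsx` are all hypothetical, and a real system would parse actual workbooks and handle far richer formulas.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Claim:
    statement: str
    source: str  # artifact the claim was inferred from


@dataclass
class Procedure:
    name: str
    steps: list[str]


@dataclass
class KnowledgeBase:
    claims: list[Claim] = field(default_factory=list)
    procedures: list[Procedure] = field(default_factory=list)
    patterns: list[str] = field(default_factory=list)  # left unpopulated in this sketch


# Simplified cell-reference matcher (assumption: A1-style refs, no sheet names).
CELL_REF = re.compile(r"[A-Z]+[0-9]+")


def reconstruct_pipeline(formulas: dict[str, str]) -> list[str]:
    """Order formula cells so each appears after the formula cells it references."""
    ordered: list[str] = []
    seen: set[str] = set()

    def visit(cell: str) -> None:
        # Skip raw-input cells (no formula) and cells already visited.
        if cell in seen or cell not in formulas:
            return
        seen.add(cell)
        for ref in CELL_REF.findall(formulas[cell]):
            visit(ref)
        ordered.append(cell)

    for cell in sorted(formulas):
        visit(cell)
    return ordered


def distill(artifact_name: str, formulas: dict[str, str]) -> KnowledgeBase:
    """Turn the reconstructed pipeline into one procedure plus one claim per cell."""
    kb = KnowledgeBase()
    order = reconstruct_pipeline(formulas)
    kb.procedures.append(
        Procedure(
            name=f"rebuild {artifact_name}",
            steps=[f"{cell} {formulas[cell]}" for cell in order],
        )
    )
    for cell in order:
        expression = formulas[cell].lstrip("=")
        kb.claims.append(
            Claim(statement=f"{cell} is computed as {expression}", source=artifact_name)
        )
    return kb


# Hypothetical workbook: B1 depends on C1, which depends on raw inputs A1 and A2.
kb = distill("report.xlsx", {"B1": "=C1*0.8", "C1": "=A1*A2"})
```

Here the dependency ordering stands in for "reconstructing the underlying data pipeline", and the generated claims and procedure stand in for the structured knowledge that the agent would store for reuse. Validation against other noisy sources is not modeled.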
