- 2025: Entity matching (EM), which identifies whether two data records refer to the same real-world entity, is crucial for knowledge base construction and for enhancing data-driven AI systems. Recent advances in language models (LMs) have shown great potential for resolving entities with rich textual attributes. However, their performance depends heavily on how structured entities are "talked" through serialized text…
- Web and mobile systems exhibit constant distribution shifts as services, users, and threats evolve, severely degrading the performance of threat-detection models trained on prior distributions. Fast model adaptation with minimal new data is essential for maintaining reliable security measures. A key challenge in this context is the lack of ground truth, which undermines the ability of existing…
- 2025: A small subset of dimensions within language Transformers’ representation spaces emerges as "outliers" during pretraining, encoding critical knowledge sparsely. We extend previous findings on emergent outliers to Encoder-Decoder Transformers and instruction-finetuned models, and tackle the problem of distilling a student Transformer from a larger teacher Transformer. Knowledge distillation reduces model…
- As the demand for online A/B testing continues to rise at tech companies, the opportunity cost of conducting these experiments becomes increasingly significant. Consequently, there is a rising need for an efficient continuous monitoring system capable of terminating experiments early when necessary. The existing literature and tools focus primarily on early termination of experiments with evidently significant…
- Tabular data is one of the most common data formats found on the web, used in domains like finance, banking, e-commerce, and medicine. Although deep neural networks (DNNs) have demonstrated outstanding performance on homogeneous data such as visual, audio, and textual data, tree ensemble methods such as Gradient Boosted Decision Trees (GBDTs) are often the go-to choice for supervised machine learning problems…
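The entity-matching teaser above turns on how a structured record is serialized into text for a language model. A minimal sketch of one common serialization scheme (the `COL … VAL …` pattern; the helper name and the product records are illustrative, not taken from the abstract):

```python
# Minimal entity-matching pair serialization (illustrative scheme).

def serialize(record: dict) -> str:
    """Flatten a structured record into text using the common
    "COL <attribute> VAL <value>" pattern."""
    return " ".join(f"COL {k} VAL {v}" for k, v in record.items())

# Hypothetical product records with rich textual attributes.
left = {"title": "iPhone 14 Pro 128GB", "brand": "Apple"}
right = {"title": "Apple iPhone 14 Pro (128 GB)", "brand": "Apple"}

# The serialized pair is what the language model actually "sees";
# different serialization choices change downstream matching accuracy.
pair = f"{serialize(left)} [SEP] {serialize(right)}"
```

Because the model only ever observes this flattened string, attribute ordering, separators, and naming all become modeling decisions.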
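The distillation teaser can be illustrated with the classic knowledge-distillation objective: a temperature-smoothed soft-target term plus a hard-label term. A NumPy sketch of the standard Hinton-style loss (the hyperparameters `T` and `alpha` are illustrative; the abstract's own variant may differ):

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T yields a smoother distribution."""
    z = np.exp(logits / T - np.max(logits / T))
    return z / z.sum()

def kd_loss(student_logits, teacher_logits, label, T=2.0, alpha=0.5):
    """Weighted sum of a soft-target term (cross-entropy against the
    teacher's temperature-smoothed distribution, rescaled by T^2) and
    a hard-label cross-entropy term."""
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T))
    soft = -(p_teacher * log_p_student).sum() * (T * T)
    hard = -np.log(softmax(student_logits)[label])
    return alpha * soft + (1 - alpha) * hard
```

The T² rescaling keeps the soft-target gradients on the same scale as the hard-label gradients as the temperature changes.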
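The A/B-testing teaser concerns continuously monitoring an experiment and stopping it early. One classical building block for this kind of sequential decision (illustrative only; the abstract does not name its method) is Wald's sequential probability ratio test for Bernoulli outcomes, with all rates and error levels below chosen for the example:

```python
import math

def sprt_log_lr(successes, trials, p0, p1):
    """Log-likelihood ratio of Bernoulli data under H1 (rate p1) vs H0 (rate p0)."""
    failures = trials - successes
    return (successes * math.log(p1 / p0)
            + failures * math.log((1 - p1) / (1 - p0)))

def sprt_decision(successes, trials, p0=0.10, p1=0.12, alpha=0.05, beta=0.20):
    """Wald's SPRT: stop as soon as the log-likelihood ratio crosses a
    boundary, otherwise keep collecting data."""
    upper = math.log((1 - beta) / alpha)  # cross above -> accept H1 (effect)
    lower = math.log(beta / (1 - alpha))  # cross below -> accept H0 (no effect)
    llr = sprt_log_lr(successes, trials, p0, p1)
    if llr >= upper:
        return "stop: effect detected"
    if llr <= lower:
        return "stop: no effect"
    return "continue"
```

Checking the boundaries after every batch of observations is what lets a monitoring system terminate clearly significant (or clearly futile) experiments early while controlling error rates.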
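The tabular-data teaser contrasts DNNs with gradient-boosted trees. A toy sketch of the GBDT idea, boosting depth-1 trees on residuals for squared loss (all names are illustrative; production systems such as XGBoost or LightGBM add regularization, deeper trees, and histogram-based splitting):

```python
import numpy as np

def fit_stump(X, r):
    """Depth-1 regression tree: best (feature, threshold, left mean,
    right mean) for squared error on the residuals r."""
    best_err, best = np.inf, None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:  # exclude max so both sides are nonempty
            mask = X[:, j] <= t
            lm, rm = r[mask].mean(), r[~mask].mean()
            err = ((r[mask] - lm) ** 2).sum() + ((r[~mask] - rm) ** 2).sum()
            if err < best_err:
                best_err, best = err, (j, t, lm, rm)
    return best

def gbdt_fit(X, y, n_rounds=50, lr=0.1):
    """Gradient boosting for squared loss: each round fits a stump to the
    current residuals and adds a shrunken correction to the prediction."""
    pred = np.full(len(y), y.mean())
    for _ in range(n_rounds):
        stump = fit_stump(X, y - pred)
        if stump is None:  # no usable split (e.g. constant features)
            break
        j, t, lm, rm = stump
        pred = pred + lr * np.where(X[:, j] <= t, lm, rm)
    return pred
```

Because each tree conditions on raw feature thresholds rather than learned embeddings, this family handles the heterogeneous column types of tabular data well, which is part of why GBDTs remain the go-to baseline there.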
Related content
- July 09, 2023: Finding that 70% of attention heads and 20% of feed-forward networks can be excised with minimal effect on in-context learning suggests that large language models are undertrained.
- July 05, 2023: Amazon Research Award recipient Shrikanth Narayanan is on a mission to create inclusive human-AI conversational experiences.
- June 21, 2023: The senior applied science manager envisions machine learning as the path to a better experience for Amazon customers.
- June 12, 2023: The company’s work, supported by the Amazon Alexa Fund, has applications in areas ranging from perfumes to disease detection.
- June 05, 2023: Learn about the science behind the brand-new NHL EDGE IQ stat that debuted in April 2023.
- June 02, 2023: In a plenary talk, the Berkeley professor and Distinguished Amazon Scholar will argue that AI research should borrow concepts from economics and focus on social collectives.