- AISTATS 2025: Low precision (LP) datatypes such as MXFP4 can accelerate matrix multiplications (GEMMs) and reduce training costs. However, directly using MXFP4 instead of BF16 during training significantly degrades model quality. In this work, we present the first near-lossless training recipe that uses MXFP4 GEMMs, which are 2× faster than FP8 on supported hardware. Our key insight is to compute unbiased gradient estimates [a hedged sketch of one standard unbiased-rounding technique follows this list]
- 2025: We present unexpected findings from a large-scale benchmark study evaluating Conditional Average Treatment Effect (CATE) estimation algorithms, i.e., CATE models. By running 16 modern CATE models on 12 datasets and 43,200 sampled variants generated through diverse observational sampling strategies, we find that: (a) 62% of CATE estimates have a higher Mean Squared Error (MSE) than a trivial zero-effect [the MSE-versus-zero-baseline comparison is sketched after this list]
- User modeling in large e-commerce platforms aims to optimize user experiences by incorporating various customer activities. Traditional models targeting a single task often focus on specific business metrics, neglecting comprehensive user behavior and thus limiting their effectiveness. To develop more generalized user representations, some existing work adopts Multi-task Learning (MTL) approaches.
- Large Language Models (LLMs) are known to hallucinate and generate non-factual outputs which can undermine user trust. Traditional methods to directly mitigate hallucinations, such as representation editing and contrastive decoding, often require additional training data and involve high implementation complexity. While ensemble-based approaches harness multiple LLMs to tap into the "wisdom of crowds",
- 2025: General-purpose language models (LMs) are aligned to diverse user intents, but fall short when it comes to specific applications. While finetuning is the default method for customized alignment, human annotations are often unavailable in various customization scenarios. Based on the observation that one of the main issues of LM customization is constraint adherence, we investigate the feasibility of using
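The first item above hinges on computing unbiased gradient estimates in a low-precision format. The excerpt does not show the paper's actual recipe, so what follows is only a minimal sketch of stochastic rounding, one standard way to make quantized values unbiased in expectation; the `stochastic_round` helper and the toy 4-level grid are illustrative stand-ins, not the real MXFP4 value set.

```python
import numpy as np

def stochastic_round(x, grid, rng):
    """Round each entry of x to one of its two neighboring grid points,
    with probabilities chosen so that E[rounded value] == x."""
    x = np.asarray(x, dtype=np.float64)
    grid = np.sort(np.asarray(grid, dtype=np.float64))
    x = np.clip(x, grid[0], grid[-1])
    hi_idx = np.clip(np.searchsorted(grid, x, side="left"), 1, len(grid) - 1)
    lo, hi = grid[hi_idx - 1], grid[hi_idx]
    # The probability of rounding up grows linearly with the distance from the
    # lower grid point; this linear choice is what makes the estimate unbiased.
    p_up = np.where(hi > lo, (x - lo) / (hi - lo), 0.0)
    return np.where(rng.random(x.shape) < p_up, hi, lo)

rng = np.random.default_rng(0)
grid = [-1.0, -0.25, 0.25, 1.0]                    # toy 4-level grid, not MXFP4
grads = rng.normal(0.0, 0.3, size=100_000)         # stand-in gradient values
q = stochastic_round(grads, grid, rng)
print(abs(q.mean() - grads.mean()))                # close to 0: unbiased on average
```

Deterministic round-to-nearest, by contrast, introduces a systematic bias that accumulates over many training steps, which is the usual motivation for stochastic rounding in low-precision training.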
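The second item's headline number rests on a simple criterion: an estimator's mean squared error against the true individual effects, compared with the MSE of always predicting zero effect. A small illustration on synthetic data follows; all names and numbers are invented for the example and are not taken from the benchmark.

```python
import numpy as np

def mse(estimate, truth):
    """Mean squared error between estimated and true individual treatment effects."""
    estimate, truth = np.asarray(estimate, float), np.asarray(truth, float)
    return float(np.mean((estimate - truth) ** 2))

rng = np.random.default_rng(0)
tau = rng.normal(0.0, 0.5, size=1_000)             # synthetic "true" CATEs
tau_hat = tau + rng.normal(0.0, 1.0, size=1_000)   # hypothetical noisy model estimates

model_mse = mse(tau_hat, tau)
zero_mse = mse(np.zeros_like(tau), tau)            # trivial baseline: predict no effect
print(f"model MSE = {model_mse:.3f}, zero-effect baseline MSE = {zero_mse:.3f}")
# When model MSE exceeds the zero-effect MSE, the learned estimator is doing worse
# than predicting "no effect for anyone", which is the failure mode the study counts.
```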
Related content
- November 27, 2020: A privacy-preserving version of the popular XGBoost machine learning algorithm would let customers feel even more secure about uploading sensitive data to the cloud.
- November 25, 2020: Alexa Fund company releases updated and streamlined skill for Alexa that includes "AI Lullaby" soundscape with vocals, music, and voiceovers by Grimes.
- November 25, 2020: Watch the recorded panel discussion that aired the week of NeurIPS 2020.
- November 19, 2020: AI models exceed human performance on public data sets; modified training and testing could help ensure that they aren’t exploiting shortcuts.
- November 18, 2020: Team works to address the needs of 600 million people online who together speak more than 22 Indian languages with over 19,500 dialects.
- October 22, 2020: Amazon distinguished scientist and Max Planck Institute director receives honor from German newspaper WELT.