Customer-obsessed science
May 15, 2026 · 5 min read
A new scaling law that relates particular architectural choices to loss helps identify models that improve throughput by up to 47% with no loss of accuracy.
Featured news
- CVPR 2023 Workshop on New Frontiers for Zero-Shot Image Captioning Evaluation, 2023: In this report, we introduce the NICE (New frontiers for zero-shot Image Captioning Evaluation) project and share the results and outcomes of the 2023 challenge. This project is designed to challenge the computer vision community to develop robust image captioning models that advance the state of the art in both accuracy and fairness. Through the challenge, the image captioning models were tested using…
- EMNLP 2023 Eighth Conference on Machine Translation (WMT23), 2023: Neural metrics trained on human evaluations of MT tend to correlate well with human judgments, but their behavior is not fully understood. In this paper, we perform a controlled experiment comparing a baseline metric that has not been trained on human evaluations (Prism) to a trained version of the same metric (Prism+FT). Surprisingly, we find that Prism+FT becomes more robust to machine-translated references…
- NeurIPS 2023 Workshop on Optimization for Machine Learning (OPT2023), 2023: Contrastive Language-Image Pre-training (CLIP) has shown remarkable success in the field of multimodal learning by enabling joint understanding of text and images. In this paper, we introduce a novel method called Multi-head CLIP, inspired by Stein Variational Gradient Descent (SVGD) and Sharpness-aware Minimization (SAM). Our approach aims to enhance CLIP's learning capability by encouraging the model…
- NeurIPS 2023 Workshop on Robustness of Zero/Few-shot Learning in Foundation Models (R0-FoMo), 2023: Recent advances in multimodal foundation models have demonstrated remarkable in-context learning capabilities for diverse vision-language tasks. However, the existing literature has mainly focused on few-shot learning tasks similar to their NLP counterparts. It remains unclear whether these foundation models can also address classical vision challenges such as few-shot classification, which in some settings (e.g.…
- NeurIPS 2023: This study focuses on the evaluation of the Open Question Answering (Open-QA) task, which can directly estimate the factuality of large language models (LLMs). Current automatic evaluation methods have shown limitations, indicating that human evaluation still remains the most reliable approach. We introduce a new task, Evaluating QA Evaluation (QA-Eval), and the corresponding dataset EVOUNA, designed to…
Collaborations
Whether you're a faculty member or a student, there are a number of ways you can engage with Amazon.