Search - Amazon Science

Evaluating large language models on controlled generation tasks

Jiao Sun, Yufei Tian, Wangchunshu Zhou, Nan Xu, Qian Hu, Rahul Gupta, John Wieting, Nanyun Peng, Xuezhe Ma

EMNLP 2023

2023

While recent studies have looked into the abilities of large language models in various benchmark tasks, few studies have looked into the controllability of large language models on generation tasks. We present a systematic and extensive analysis of the controllability of large language models on ten benchmarks, including a new simple yet challenging numerical planning benchmark with different granularities

Conversational AI

Dual-stage procurement with forecast updating for seasonal products at Amazon.com

Alvaro Maggiar, Alp Muharremoglu, Ali Sadighian

Social Science Research Network

2023

We present a procurement model for highly seasonal products, such as toys or fashion products, which are usually purchased through a combination of import and domestic buys, months ahead of typical vendor lead times, with potential riskier buys just-in-time. This procurement process differs significantly from the more common just-in-time buying, or repeated dual sourcing, since the decision to split orders

Operations research and optimization

Leveraging sparse and shared feature activations for disentangled representation learning

Marco Fumero, Florian Wenzel, Luca Zancato, Alessandro Achille, Emanuele Rodolà, Stefano Soatto, Bernhard Schölkopf, Francesco Locatello

NeurIPS 2023

2023

Research on recovering the latent factors of variation of high dimensional data has so far focused on simple synthetic settings. Mostly building on unsupervised and weakly-supervised objectives, prior work missed out on the positive implications for representation learning on real world data. In this work, we propose to leverage knowledge extracted from a diversified set of supervised tasks to learn a common

Computer vision

Formal methods in industry

Maurice H. ter Beek, Rod Chapman, Rance Cleaveland, Hubert Garavel, Rong Gu, Ivo ter Horst, Jeroen J. A. Keiren, Thierry Lecomte, Michael Leuschel, Kristin Y. Rozier, Augusto Sampaio, Cristina Seceleanu, Martyn Thomas, Tim A. C. Willemse, Lijun Zhang

Computer Science Curricula

2023

Formal methods encompass a wide choice of techniques and tools for the specification, development, analysis, and verification of software and hardware systems. Formal methods are widely applied in industry, in activities ranging from the elicitation of requirements and the early design phases all the way to the deployment, configuration, and runtime monitoring of actual systems. Formal methods allow one

Automated reasoning

Are you talking to [‘xem’] or [‘x’, ‘em’]? On tokenization and addressing misgendering in LLMs with pronoun tokenization parity

Anaelia Ovalle, Ninareh Mehrabi, Palash Goyal, Jwala Dhamala, Kai-Wei Chang, Richard Zemel, Aram Galstyan, Yuval Pinter, Rahul Gupta

NeurIPS 2023

2023

A large body of NLP research has documented the ways gender biases manifest and amplify within large language models (LLMs), though this research has predominantly operated within a gender binary-centric context. A growing body of work has identified the harmful limitations of this gender-exclusive framing; many LLMs cannot correctly and consistently refer to persons outside the gender binary, especially

Conversational AI

Customer long term propensity driven Prime Video page composition

Venkataramana Kini, Ravi Divvela, Devendra Yadav, Zhen Wen, Fei Wang

RecSys 2023

2023

The Prime Video Homepage provides customers with several carousels to explore the diverse catalog. Each of these carousels is constructed around a certain theme. It’s not only important to compose the page with individual carousels relevant to the customer, but also balance different customer and business aspects. The Prime Video business positions itself as an entertainment hub with diverse content types

Search and information retrieval

Enhancing uncertainty-based hallucination detection with stronger focus

Tianhang Zhang, Lin Qiu, Qipeng Guo, Cheng Deng, Yue Zhang, Zheng Zhang, Chenghu Zhou, Xinbing Wang, Luoyi Fu

EMNLP 2023

2023

Large Language Models (LLMs) have gained significant popularity for their impressive performance across diverse fields. However, LLMs are prone to hallucinate untruthful or nonsensical outputs that fail to meet user expectations in many real-world applications. Existing works for detecting hallucinations in LLMs either rely on external knowledge for reference retrieval or require sampling multiple responses

Conversational AI

CrossCodeEval: A diverse and multilingual benchmark for cross-file code completion

Yangruibo Ding, Zijian Wang, Wasi Ahmad, Hantian Ding, Ming Tan, Nihal Jain, Murali Krishna Ramanathan, Ramesh Nallapati, Parminder Bhatia, Dan Roth, Bing Xiang

NeurIPS 2023

2023

Code completion models have made significant progress in recent years, yet current popular evaluation datasets, such as HumanEval and MBPP, predominantly focus on code completion tasks within a single file. This oversimplified setting falls short of representing the real-world software development scenario where repositories span multiple files with numerous cross-file dependencies, and accessing and understanding

Machine learning

Faithful model evaluation for model-based metrics

Palash Goyal, Qian Hu, Rahul Gupta

EMNLP 2023

2023

Statistical significance testing is used in natural language processing (NLP) to determine whether the results of a study or experiment are likely to be due to chance or if they reflect a genuine relationship. A key step in significance testing is the estimation of the confidence interval, which is a function of sample variance. Sample variance calculation is straightforward when evaluating against ground truth

Conversational AI

MeSa: Masked, geometric, and supervised pre-training for monocular depth estimation

Muhammad Osama Khan, Junbang Liang, Chun-Kai Wang, Shan Yang, Yu (Michael) Lou

NeurIPS 2023 Workshop on Self-Supervised Learning — Theory and Practice

2023

Pre-training has been an important ingredient in developing strong monocular depth estimation models in recent years. For instance, self-supervised learning (SSL) is particularly effective by alleviating the need for large datasets with dense ground-truth depth maps. However, despite these improvements, our study reveals that the later layers of the SOTA SSL method are actually suboptimal. By examining

Computer vision

Efficient toxic content detection by bootstrapping and distilling large language models

Jiang Zhang, Qiong Wu, Yiming Xu, Cheng Cao, Zheng Du, Konstantinos Psounis

AAAI 2024

2023

Toxic content detection is crucial for online services to remove inappropriate content that violates community standards. To automate the detection process, prior works have proposed a variety of machine learning (ML) approaches to train Language Models (LMs) for toxic content detection. However, both their accuracy and transferability across datasets are limited. Recently, Large Language Models (LLMs)

Conversational AI

Pre-trained recommender systems: A causal debiasing perspective

Ziqian Lin, Hao Ding, Nghia Hoang, Branislav Kveton, Anoop Deoras, Hao Wang

WSDM 2024

2023

Recent studies on pre-trained vision/language models have demonstrated the practical benefit of a new, promising solution-building paradigm in AI where models can be pre-trained on broad data describing a generic task space and then adapted successfully to solve a wide range of downstream tasks, even when training data is severely limited (e.g., in zero- or few-shot learning scenarios). Inspired by such

Machine learning

In-school and/or out-of-school computer science learning influence on CS career interests, mediated by having role-models

Chen Chen, Jonathan Rothwell, Pedrito Maynard-Zhang

Computer Science Education

2023

Background and Context: Both in- and out-of-school computer science (CS) learning opportunities are expanding, but their influences on CS career interests are unclear. Method: To investigate, we applied multinomial propensity score weighting analysis on a 2021 U.S. nationally representative sample of 4,116 5th-to-12th-grade students. Findings: The odds of expressing CS career interest increase by 171%,

Background summarization of event timelines

Adithya Pratapa, Kevin Small, Markus Dreyer

EMNLP 2023

2023

Generating concise summaries of news events is a challenging natural language processing task. While journalists often curate timelines to highlight key sub-events, newcomers to a news event face challenges in catching up on its historical context. In this paper, we address this need by introducing the task of background news summarization, which complements each timeline update with a background summary

Conversational AI

Two-pass endpoint detection for speech recognition

Anirudh Raju, Aparna Khare, Di He, Ilya Sklyar, Long Chen, Sam Alptekin, Viet Anh Tranh, Zhe Zhang, Colin Vaz, Venkatesh Ravichandran, Roland Maas, Ariya Rastrow

ASRU 2023

2023

Endpoint (EP) detection is a key component of far-field speech recognition systems that assist the user through voice commands. The endpoint detector has to trade off between accuracy and latency, since waiting longer reduces the cases of users being cut off early. We propose a novel two-pass solution for endpointing, where the utterance endpoint detected from a first-pass endpointer is verified by a second-pass

Conversational AI

A preliminary study on associated learning for ASR

Pin-Jui Ku, Phani Sankar Nidadavolu, Brian King, Pegah Ghahremani, I-Fan Chen

2023 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)

2023

In this paper, we propose the first successful implementation of associated learning (AL) to automatic speech recognition (ASR). AL has been shown to provide better label noise robustness, faster training convergence, and flexibility on model complexity than back-propagation (BP) in classification tasks. However, extending the learning approach to autoregressive models such as ASR, where model outputs are

Conversational AI

Exploring the use of contrastive language-image pre-training for human posture classification: insights from yoga pose analysis

Andrzej D. Dobrzycki, Ana M. Bernardos, Luca Bergesio, Andrzej Pomirski, Daniel Sáez-Trigueros

Advanced Methods and Applications with Deep Learning in Object Recognition

2023

Accurate human posture classification in images and videos is crucial for automated applications across various fields, including work safety, physical rehabilitation, sports training, or daily assisted living. Recently, multimodal learning methods, such as Contrastive Language-Image Pretraining (CLIP), have advanced significantly in jointly understanding images and text. This study aims to assess the effectiveness

Computer vision

Explainable AI using expressive Boolean formulas

Gili Rosenberg, Kyle Brubaker, Martin Schuetz, Grant Salton, Jason Zhu, Elton Yechao Zhu, Sima E. Borujeni, Serdar Kadıoğlu, Helmut Katzgraber

Machine Learning and Knowledge Extraction

2023

We propose and implement an interpretable machine learning classification model for Explainable AI (XAI) based on expressive Boolean formulas. Potential applications include credit scoring and diagnosis of medical conditions. The Boolean formula defines a rule with tunable complexity (or interpretability), according to which input data are classified. Such a formula can include any operator that can be

Quantum technologies

Christopher Staley

Applied Scientist

Fortuna: A library for uncertainty quantification

Gianluca Detommaso, Alberto Gasparin, Oleg Smirnov, Thomas Pinder, Christian Leibig, Paul Scemama

2023

Proper estimation of predictive uncertainty is fundamental in applications that involve critical decisions. Uncertainty can be used to assess the reliability of model predictions, trigger human intervention, or decide whether a model can be safely deployed in the wild. We introduce Fortuna, an open-source library for uncertainty quantification. Fortuna provides calibration methods, such as conformal prediction

Machine learning
