The 10 most viewed publications of 2022

From a look back at Amazon Redshift to personalized complementary product recommendation, these are the most viewed publications authored by Amazon scientists and collaborators in 2022.

  1. In 2013, Amazon Web Services revolutionized the data warehousing industry by launching Amazon Redshift, the first fully managed, petabyte-scale, enterprise-grade cloud data warehouse. Amazon Redshift made it simple and cost-effective to efficiently analyze large volumes of data using existing business intelligence tools.

    Related content
    Two authors of Amazon Redshift research paper that will be presented at leading international forum for database researchers reflect on how far the first petabyte scale cloud data warehouse has advanced since it was announced ten years ago.

    This cloud service was a significant leap from the traditional on-premise data warehousing solutions, which were expensive, not elastic, and required significant expertise to tune and operate. Customers embraced Amazon Redshift and it became the fastest growing service in AWS. Today, tens of thousands of customers use Redshift in AWS’s global infrastructure to process exabytes of data daily.

    Read the rest of the abstract and download the paper

  2. In this work, we demonstrate that multilingual large-scale sequence-to-sequence (seq2seq) models, pre-trained on a mixture of denoising and Causal Language Modeling (CLM) tasks, are more efficient few-shot learners than decoder-only models on various tasks. In particular, we train a 20 billion parameter multilingual seq2seq model called Alexa Teacher Model (AlexaTM 20B) and show that it achieves state-of-the-art (SOTA) performance on one-shot summarization tasks, outperforming a much larger 540B PaLM decoder model.

    Related content
    With an encoder-decoder architecture — rather than decoder only — the Alexa Teacher Model excels other large language models on few-shot tasks such as summarization and machine translation.

    AlexaTM 20B also achieves SOTA in one-shot machine translation, especially for low-resource languages, across almost all language pairs supported by the model (Arabic, English, French, German, Hindi, Italian, Japanese, Marathi, Portuguese, Spanish, Tamil, and Telugu) on Flores-101 dataset. We also show in zero-shot setting, AlexaTM 20B outperforms GPT3 (175B) on SuperGLUE and SQuADv2 datasets and provides SOTA performance on multilingual tasks such as XNLI, XCOPA, Paws-X, and XWinograd. Overall, our results present a compelling case for seq2seq models as a powerful alternative to decoder-only models for Large-scale Language Model (LLM) training.

    Read and download the paper

  3. Amazon DynamoDB is a NoSQL cloud database service that provides consistent performance at any scale. Hundreds of thousands of customers rely on DynamoDB for its fundamental properties: consistent performance, availability, durability, and a fully managed serverless experience. In 2021, during the 66-hour Amazon Prime Day shopping event, Amazon systems - including Alexa, the Amazon.com sites, and Amazon fulfillment centers, made trillions of API calls to DynamoDB, peaking at 89.2 million requests per second, while experiencing high availability with single-digit millisecond performance.

    Related content
    Prioritizing predictability over efficiency, adapting data partitioning to traffic, and continuous verification are a few of the principles that help ensure stability, availability, and efficiency.

    Since the launch of DynamoDB in 2012, its design and implementation have evolved in response to our experiences operating it. The system has successfully dealt with issues related to fairness, traffic imbalance across partitions, monitoring, and automated system operations without impacting availability or performance. Reliability is essential, as even the slightest disruption can significantly impact customers. This paper presents our experience operating DynamoDB at a massive scale and how the architecture continues to evolve to meet the ever-increasing demands of customer workloads.

    Read and download the paper

  4. We approach instantaneous mapping, converting images to a top-down view of the world, as a translation problem. We show how a novel form of transformer network can be used to map from images and video directly to an overhead map or bird’s-eye-view (BEV) of the world, in a single end-to-end network. We assume a 1-1 correspondence between a vertical scanline in the image, and rays passing through the camera location in an overhead map.

    Related content
    Reformulating the mapping problem to take advantage of sequence-to-sequence Transformers improves performance by an average of 15%.

    This lets us formulate map generation from an image as a set of sequence-to-sequence translations. Posing the problem as translation allows the network to use the context of the image when interpreting the role of each pixel. This constrained formulation, based upon a strong physical grounding of the problem, leads to a restricted transformer network that is convolutional in the horizontal direction only. The structure allows us to make efficient use of data when training, and obtains state-of-the-art results for instantaneous mapping of three large-scale datasets, including a 15% and 30% relative gain against existing best performing methods on the nuScenes and Argoverse datasets, respectively.

    Read and download the paper

  5. A/B tests, also known as online controlled experiments, have been used at scale by data-driven enterprises to guide decisions and test innovative ideas. Meanwhile, non-stationarity, such as the time-of-day effect, can commonly arise in various business metrics. We show that inadequately addressing non-stationarity can cause A/B tests to be statistically inefficient or invalid, leading to wrong conclusions. To address these issues, we develop a new framework that provides appropriate modeling and adequate statistical analysis for non-stationary A/B tests. Without changing the infrastructure for any existing A/B test procedure, we propose a new estimator that views time as a continuous covariate to perform post stratification with a sample-dependent number of stratification levels. We prove central limit theorem in a natural limiting regime under non-stationarity, so that valid large-sample statistical inference is available. We show that the proposed estimator achieves the optimal asymptotic variance among all estimators. When the experiment design phase of an A/B test allows, we propose a new time-grouped randomization approach to make a better balance on treatment and control assignments in presence of time non-stationarity. A brief account of numerical experiments are conducted to illustrate the theoretical analysis.

    Read and download the paper

  6. We present results from a large-scale experiment on pretraining encoders with non-embedding parameter counts ranging from 700M to 9.3B, their subsequent distillation into smaller models ranging from 17M-170M parameters, and their application to the Natural Language Understanding (NLU) component of a virtual assistant system. Though we train using 70% spoken-form data, our teacher models perform comparably to XLM-R and mT5 when evaluated on the written-form Cross-lingual Natural Language Inference (XNLI) corpus. We perform a second stage of pretraining on our teacher models using in-domain data from our system, improving error rates by 3.86% relative for intent classification and 7.01% relative for slot filling. We find that even a 170M-parameter model distilled from our Stage 2 teacher model has 2.88% better intent classification and 7.69% better slot filling error rates when compared to the 2.3B-parameter teacher trained only on public data (Stage 1), emphasizing the importance of in-domain data for pretraining. When evaluated offline using labeled NLU data, our 17M-parameter Stage 2 distilled model outperforms both XLM-R Base (85M params) and DistillBERT (42M params) by 4.23% to 6.14%, respectively. Finally, we present results from a full virtual assistant experimentation platform, where we find that models trained using our pretraining and distillation pipeline outperform models distilled from 85M parameter teachers by 3.74%-4.91% on an automatic measurement of full-system user dissatisfaction.

    Read and download the paper

  7. Bayesian optimization (BO) is a widely popular approach for the hyperparameter optimization (HPO) in machine learning. At its core, BO iteratively evaluates promising configurations until a user-defined budget, such as wall-clock time or number of iterations, is exhausted. While the final performance after tuning heavily depends on the provided budget, it is hard to pre-specify an optimal value in advance.

    Related content
    Paper presents a criterion for halting the hyperparameter optimization process.

    In this work, we propose an effective and intuitive termination criterion for BO that automatically stops the procedure if it is sufficiently close to the global optimum. Our key insight is that the discrepancy between the true objective (predictive performance on test data) and the computable target (validation performance) suggests stopping once the sub-optimality in optimizing the target is dominated by the statistical estimation error. Across an extensive range of real-world HPO problems and baselines, we show that our termination criterion achieves a better trade-off between the test performance and optimization time. Additionally, we find that overfitting may occur in the context of HPO, which is arguably an overlooked problem in the literature, and show how our termination criterion helps to mitigate this phenomenon on both small and large datasets.

    Read and download the paper

  8. Online advertising opportunities are sold through auctions, billions of times every day across the web. Advertisers who participate in those auctions need to decide on a bidding strategy: how much they are willing to bid for a given impression opportunity.

    Related content
    Paper introduces a unified view of the learning-to-bid problem and presents AuctionGym, a simulation environment that enables reproducible validation of new solutions.

    Deciding on such a strategy is not a straightforward task, because of the interactive and reactive nature of the repeated auction mechanism. Indeed, an advertiser does not observe counterfactual outcomes of bid amounts that were not submitted, and successful advertisers will adapt their own strategies based on bids placed by competitors. These characteristics complicate effective learning and evaluation of bidding strategies based on logged data alone.

    Read the rest of the abstract and download the paper

  9. The fundamental challenge of drawing causal inference is that counterfactual outcomes are not fully observed for any unit. Furthermore, in observational studies, treatment assignment is likely to be confounded. Many statistical methods have emerged for causal inference under unconfoundedness conditions given pre-treatment covariates, including: propensity score-based methods, prognostic score-based methods, and doubly robust methods. Unfortunately for applied researchers, there is no ‘one-size-fits-all’ causal method that can perform optimally universally. In practice, causal methods are primarily evaluated quantitatively on handcrafted simulated data. Such datagenerative procedures can be of limited value because they are typically stylized models of reality. They are simplified for tractability and lack the complexities of real-world data. For applied researchers, it is critical to understand how well a method performs for data at hand. Our work introduces a deep generative model-based framework, Credence, to validate causal inference methods. The framework’s novelty stems from its ability to generate synthetic data anchored at the empirical distribution for the observed sample, and therefore virtually indistinguishable from the latter. The approach allows the user to specify ground truth for the form and magnitude of causal effects and confounding bias as functions of covariates. Thus simulated data sets are used to evaluate the potential performance of various causal estimation methods when applied to data similar to the observed sample. We demonstrate Credence’s ability to accurately assess the relative performance of causal estimation techniques in an extensive simulation study and two real-world data applications from Lalonde and Project STAR studies.

    Read and download the paper

  10. Complementary product recommendation aims at providing product suggestions that are often bought together to serve a joint demand. Existing work mainly focuses on modeling product relationships at a population level, but does not consider personalized preferences of different customers. In this paper, we propose a framework for personalized complementary product recommendation capable of recommending products that fit the demand and preferences of the customers. Specifically, we model product relations and user preferences with a graph attention network and a sequential behavior transformer, respectively. The two networks are cast together through personalized re-ranking and contrastive learning, in which the user and product embedding are learned jointly in an end-to-end fashion. The system recognizes different customer interests by learning from their purchase history and the correlations among customers and products. Experimental results demonstrate that our model benefits from learning personalized information and outperforms non-personalized methods on real production data.

    Read and download the paper

Related content

US, WA, Seattle
Amazon is seeking an experienced, self-directed data scientist to support the research and analytical needs of Amazon Web Services' Sales teams. This is a unique opportunity to invent new ways of leveraging our large, complex data streams to automate sales efforts and to accelerate our customers' journey to the cloud. This is a high-visibility role with significant impact potential. You, as the right candidate, are adept at executing every stage of the machine learning development life cycle in a business setting; from initial requirements gathering to through final model deployment, including adoption measurement and improvement. You will be working with large volumes of structured and unstructured data spread across multiple databases and can design and implement data pipelines to clean and merge these data for research and modeling. Beyond mathematical understanding, you have a deep intuition for machine learning algorithms that allows you to translate business problems into the right machine learning, data science, and/or statistical solutions. You’re able to pick up and grasp new research and identify applications or extensions within the team. You’re talented at communicating your results clearly to business owners in concise, non-technical language. Key job responsibilities • Work with a team of analytics & insights leads, data scientists and engineers to define business problems. • Research, develop, and deliver machine learning & statistical solutions in close partnership with end users, other science and engineering teams, and business stakeholders. • Use AWS services like SageMaker to deploy scalable ML models in the cloud. • Examples of projects include modeling usage of AWS services to optimize sales planning, recommending sales plays based on historical patterns, and building a sales-facing alert system using anomaly detection.
US, WA, Seattle
We are a team of doers working passionately to apply cutting-edge advances in deep learning in the life sciences to solve real-world problems. As a Senior Applied Science Manager you will participate in developing exciting products for customers. Our team rewards curiosity while maintaining a laser-focus in bringing products to market. Competitive candidates are responsive, flexible, and able to succeed within an open, collaborative, entrepreneurial, startup-like environment. At the leading edge of both academic and applied research in this product area, you have the opportunity to work together with a diverse and talented team of scientists, engineers, and product managers and collaborate with others teams. Location is in Seattle, US Embrace Diversity Here at Amazon, we embrace our differences. We are committed to furthering our culture of inclusion. We have ten employee-led affinity groups, reaching 40,000 employees in over 190 chapters globally. We have innovative benefit offerings, and host annual and ongoing learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences. Amazon’s culture of inclusion is reinforced within our 14 Leadership Principles, which remind team members to seek diverse perspectives, learn and be curious, and earn trust Balance Work and Life Our team puts a high value on work-life balance. It isn’t about how many hours you spend at home or at work; it’s about the flow you establish that brings energy to both parts of your life. We believe striking the right balance between your personal and professional life is critical to life-long happiness and fulfillment. We offer flexibility in working hours and encourage you to find your own balance between your work and personal lives Mentor & Grow Careers Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects based on what will help each team member develop into a better-rounded engineer and enable them to take on more complex tasks in the future. Key job responsibilities • Manage high performing engineering and science teams • Hire and develop top-performing engineers, scientists, and other managers • Develop and execute on project plans and delivery commitments • Work with business, data science, software engineer, biological, and product leaders to help define product requirements and with managers, scientists, and engineers to execute on them • Build and maintain world-class customer experience and operational excellence for your deliverables
US, Virtual
The Amazon Economics Team is hiring Interns in Economics. We are looking for detail-oriented, organized, and responsible individuals who are eager to learn how to work with large and complicated data sets. Some knowledge of econometrics, as well as basic familiarity with Stata, R, or Python is necessary. Experience with SQL, UNIX, Sawtooth, and Spark would be a plus. These are full-time positions at 40 hours per week, with compensation being awarded on an hourly basis. You will learn how to build data sets and perform applied econometric analysis at Internet speed collaborating with economists, data scientists and MBAʼs. These skills will translate well into writing applied chapters in your dissertation and provide you with work experience that may help you with placement. Roughly 85% of interns from previous cohorts have converted to full time economics employment at Amazon. If you are interested, please send your CV to our mailing list at econ-internship@amazon.com.
US, WA, Seattle
Amazon internships are full-time (40 hours/week) for 12 consecutive weeks with start dates in May - July 2023. Our internship program provides hands-on learning and building experiences for students who are interested in a career in hardware engineering. This role will be based in Seattle, and candidates must be willing to work in-person. Corporate Projects (CPT) is a team that sits within the broader Corporate Development organization at Amazon. We seek to bring net-new, strategic projects to life by working together with customers and evolving projects from ZERO-to-ONE. To do so, we deploy our resources towards proofs-of-concept (POCs) and pilot programs and develop them from high-level ideas (the ZERO) to tangible short-term results that provide validating signal and a path to scale (the ONE). We work with our customers to develop and create net-new opportunities by relentlessly scouring all of Amazon and finding new and innovative ways to strengthen and/or accelerate the Amazon Flywheel. CPT seeks an Applied Science intern to work with a diverse, cross-functional team to build new, innovative customer experiences. Within CPT, you will apply both traditional and novel scientific approaches to solve and scale problems and solutions. We are a team where science meets application. A successful candidate will be a self-starter comfortable with ambiguity, strong attention to detail, and the ability to work in a fast-paced, ever-changing environment. As an Applied Science Intern, you will own the design and development of end-to-end systems. You’ll have the opportunity to create technical roadmaps, and drive production level projects that will support Amazon Science. You will work closely with Amazon scientists, and other science interns to develop solutions and deploy them into production. The ideal scientist must have the ability to work with diverse groups of people and cross-functional teams to solve complex business problems.
US, WA, Seattle
Amazon internships are full-time (40 hours/week) for 12 consecutive weeks with start dates in May - July 2023. Our internship program provides hands-on learning and building experiences for students who are interested in a career in hardware engineering. This role will be based in Seattle, and candidates must be willing to work in-person. Corporate Projects (CPT) is a team that sits within the broader Corporate Development organization at Amazon. We seek to bring net-new, strategic projects to life by working together with customers and evolving projects from ZERO-to-ONE. To do so, we deploy our resources towards proofs-of-concept (POCs) and pilot programs and develop them from high-level ideas (the ZERO) to tangible short-term results that provide validating signal and a path to scale (the ONE). We work with our customers to develop and create net-new opportunities by relentlessly scouring all of Amazon and finding new and innovative ways to strengthen and/or accelerate the Amazon Flywheel. CPT seeks an Applied Science intern to work with a diverse, cross-functional team to build new, innovative customer experiences. Within CPT, you will apply both traditional and novel scientific approaches to solve and scale problems and solutions. We are a team where science meets application. A successful candidate will be a self-starter comfortable with ambiguity, strong attention to detail, and the ability to work in a fast-paced, ever-changing environment. As an Applied Science Intern, you will own the design and development of end-to-end systems. You’ll have the opportunity to create technical roadmaps, and drive production level projects that will support Amazon Science. You will work closely with Amazon scientists, and other science interns to develop solutions and deploy them into production. The ideal scientist must have the ability to work with diverse groups of people and cross-functional teams to solve complex business problems.
US, CA, Palo Alto
The Amazon Search team creates powerful, customer-focused search solutions and technologies. Whenever a customer visits an Amazon site worldwide and types in a query or browses through product categories, Amazon Search services go to work. We design, develop, and deploy high performance, fault-tolerant distributed search systems used by millions of Amazon customers every day. We’re seeking a Principal Scientist with a deep expertise in Search Science. Your responsibilities will include everything from developing and prototyping innovative machine learning, and deep learning algorithms to implementing, testing, and supporting full solutions in a production environment. We are looking for innovators who can contribute to advancing search technology on what’s scientifically possible while remaining committed to creating world-class products. Joining this team, you’ll experience the benefits of working in a dynamic, entrepreneurial environment, while leveraging the resources of Amazon.com (AMZN), Earth's most customer-centric company one of the world's leading internet companies. We provide a highly customer-centric, team-oriented environment in our offices located in Palo Alto, California. Key job responsibilities As a hands-on leader of this team, you’ll be responsible for defining key research questions, identifying relevant data, adopting or proposing innovative machine learning solutions conducting rigorous experiments, publishing results and working with the engineering team to deploy these solutions. As a strategic leader, you will identify investment opportunities, develop long term strategies, and propose, prioritize and deliver on goals. You’ll also participate in organizational planning, hiring, mentorship and leadership development. You will be technically fearless and with a passion for building scalable science and engineering solutions. You will serve as a key scientific resource in full-cycle development (conception, design, implementation, testing to documentation, delivery, and maintenance). About the team Starting in 2009, the Visual Search & Augmented Reality team has thus far launched many visual search solutions on the Amazon App that use computer vision and machine learning/deep learning to help customers complete their shopping missions more easily; multiple internal teams at Amazon (devices, Kindle, Seller services, etc.) also use our libraries and APIs to deliver solutions to their own customers. We are a full stack shop, and our team capabilities cover the whole solution spectrum, ranging across applied science, large scale engineering services, product management, UX design, and mobile app development for iOS and Android.
LU, Luxembourg
&ltHire Relocation Requisition - not for posting> Provides insights to leadership on improving Supply Chain cost and Speed by using Data Science and Analytics techniques. Build Dashboards and models to industrialize these findings at scale.
US, VA, Arlington
The People eXperience and Technology Central Science Team (PXTCS) uses economics, behavioral science, statistics, and machine learning to proactively identify mechanisms and process improvements which simultaneously improve Amazon and the lives, wellbeing, and the value of work to Amazonians. We are an interdisciplinary team that combines the talents of science and engineering to develop and deliver solutions that measurably achieve this goal. We are looking for economists who are able to work with business partners to hone complex problems into specific, scientific questions, and test those questions to generate insights. The ideal candidate will work with engineers and computer scientists to estimate models and algorithms on large scale data, design pilots and measure their impact, and transform successful prototypes into improved policies and programs at scale. We are looking for creative thinkers who can combine a strong technical economic toolbox with a desire to learn from other disciplines, and who know how to execute and deliver on big ideas as part of an interdisciplinary technical team. Ideal candidates will work closely with business partners to develop science that solves the most important business challenges. They will work in a team setting with individuals from diverse disciplines and backgrounds. They will serve as an ambassador for science and a scientific resource for business teams, so that scientific processes permeate throughout the HR organization to the benefit of Amazonians and Amazon. Ideal candidates will own the data analysis, modeling, and experimentation that is necessary for estimating and validating models. They will work closely with engineering teams to develop scalable data resources to support rapid insights, and take successful models and findings into production as new products and services. They will be customer-centric and will communicate scientific approaches and findings to business leaders, listening to and incorporate their feedback, and delivering successful scientific solutions. Key job responsibilities Use causal inference methods to evaluate the impact of policies on employee outcomes. Examine how external labor market and economic conditions impact Amazon's ability to hire and retain talent. Use scientifically rigorous methods to develop and recommend career paths for employees. A day in the life Work with teammates to apply economic methods to business problems. This might include identifying the appropriate research questions, writing code to implement a DID analysis or estimate a structural model, or writing and presenting a document with findings to business leaders. Our economists also collaborate with partner teams throughout the process, from understanding their challenges, to developing a research agenda that will address those challenges, to help them implement solutions. About the team We are a multidisciplinary team that combines the talents of science and engineering to develop innovative solutions to make Amazon Earth's Best Employer.
US, WA, Seattle
The People eXperience and Technology Central Science Team (PXTCS) uses economics, behavioral science, statistics, and machine learning to proactively identify mechanisms and process improvements which simultaneously improve Amazon and the lives, wellbeing, and the value of work to Amazonians. We are an interdisciplinary team that combines the talents of science and engineering to develop and deliver solutions that measurably achieve this goal. We are looking for economists who are able to apply economic methods to address business problems. The ideal candidate will work with engineers and computer scientists to estimate models and algorithms on large scale data, design pilots and measure their impact, and transform successful prototypes into improved policies and programs at scale. We are looking for creative thinkers who can combine a strong technical economic toolbox with a desire to learn from other disciplines, and who know how to execute and deliver on big ideas as part of an interdisciplinary technical team. Ideal candidates will work in a team setting with individuals from diverse disciplines and backgrounds. They will work with teammates to develop scientific models and conduct the data analysis, modeling, and experimentation that is necessary for estimating and validating models. They will work closely with engineering teams to develop scalable data resources to support rapid insights, and take successful models and findings into production as new products and services. They will be customer-centric and will communicate scientific approaches and findings to business leaders, listening to and incorporate their feedback, and delivering successful scientific solutions. Key job responsibilities Use causal inference methods to evaluate the impact of policies on employee outcomes. Examine how external labor market and economic conditions impact Amazon's ability to hire and retain talent. Use scientifically rigorous methods to develop and recommend career paths for employees. A day in the life Work with teammates to apply economic methods to business problems. This might include identifying the appropriate research questions, writing code to implement a DID analysis or estimate a structural model, or writing and presenting a document with findings to business leaders. Our economists also collaborate with partner teams throughout the process, from understanding their challenges, to developing a research agenda that will address those challenges, to help them implement solutions. About the team We are a multidisciplinary team that combines the talents of science and engineering to develop innovative solutions to make Amazon Earth's Best Employer.
US, WA, Seattle
Amazon is looking for talented Postdoctoral Scientists to join our global Science teams for a one-year, full-time research position. Postdoctoral Scientists will innovate as members of Amazon’s key global Science teams, including: AWS, Alexa AI, Alexa Shopping, Amazon Style, CoreAI, Last Mile, and Supply Chain Optimization Technologies. Postdoctoral Scientists will join one of may central, global science teams focused on solving research-intense business problems by leveraging Machine Learning, Econometrics, Statistics, and Data Science. Postdoctoral Scientists will work at the intersection of ML and systems to solve practical data driven optimization problems at Amazon scale. Postdocs will raise the scientific bar across Amazon by diving deep into exploratory areas of research to enhance the customer experience and improve efficiencies. Please note: This posting is one of several Amazon Postdoctoral Scientist postings. Please only apply to a maximum of 2 Amazon Postdoctoral Scientist postings that are relevant to your technical field and subject matter expertise. Key job responsibilities * Work closely with a senior science advisor, collaborate with other scientists and engineers, and be part of Amazon’s vibrant and diverse global science community. * Publish your innovation in top-tier academic venues and hone your presentation skills. * Be inspired by challenges and opportunities to invent cutting-edge techniques in your area(s) of expertise.