The "Amazon Redshift re-invented" research paper will be presented at a leading database conference next month. Two of the paper's authors, Rahul Pathak (top right), vice president of analytics at AWS, and Ippokratis Pandis (bottom right), an AWS senior principal engineer, discuss the origins of Redshift, how the system has evolved in the past decade, and where they see the service evolving in the years ahead.

Amazon Redshift: Ten years of continuous reinvention

Two authors of the Amazon Redshift research paper that will be presented at a leading international forum for database researchers reflect on how far the first petabyte-scale cloud data warehouse has advanced since it was announced ten years ago.

Nearly ten years ago, in November 2012 at the first-ever Amazon Web Services (AWS) re:Invent, Andy Jassy, then AWS senior vice president, announced the preview of Amazon Redshift, the first fully managed, petabyte-scale cloud data warehouse. The service represented a significant leap forward from traditional on-premises data warehousing solutions, which were expensive, inflexible, and required significant human and capital resources to operate.

In a blog post on November 28, 2012, Werner Vogels, Amazon chief technology officer, highlighted the news: “Today, we are excited to announce the limited preview of Amazon Redshift, a fast and powerful, fully managed, petabyte-scale data warehouse service in the cloud.”

Further in the post, Vogels added, “The result of our focus on performance has been dramatic. Amazon.com’s data warehouse team has been piloting Amazon Redshift and comparing it to their on-premise data warehouse for a range of representative queries against a two billion row data set. They saw speedups ranging from 10x – 150x!”

That’s why, on the day of the announcement, Rahul Pathak, then a senior product manager, and the entire Amazon Redshift team were confident the product would be popular.

“But we didn’t really understand how popular,” he recalls.

“At preview we asked customers to sign up and give us some indication of their data volume and workloads,” Pathak, now vice president of Relational Engines at AWS, said. “Within about three days we realized that we had ten times more demand for Redshift than we had planned for the entire first year of the service. So we scrambled right after re:Invent to accelerate our hardware orders to ensure we had enough capacity on the ground for when the product became generally available in early 2013. If we hadn’t done that preview, we would have been caught short.”

The Redshift team has been sprinting to keep pace with customer demand ever since. Today, the service is used by tens of thousands of customers to process exabytes of data daily. In June a subset of the team will present the paper “Amazon Redshift re-invented” at a leading international forum for database researchers, practitioners, and developers, the ACM SIGMOD/PODS Conference in Philadelphia.

The paper highlights four key areas where Amazon Redshift has evolved in the past decade, provides an overview of the system architecture, describes its high-performance transactional storage and compute layers, details how smart autonomics are provided, and discusses how AWS and Redshift make it easy for customers to use the best set of services to meet their needs.

Amazon Science recently connected with two of the paper’s authors, Pathak and Ippokratis Pandis, an AWS senior principal engineer, to discuss the origins of Redshift, how the system has evolved over the past decade, and where they see the service evolving in the years ahead.

  1. Q. 

    Can you provide some background on the origin story for Redshift? What were customers seeking, and how did the initial version address those needs?

    A. 

    Rahul: We had been meeting with customers who in the years leading up to the launch of Amazon Redshift had moved just about every workload they had to the cloud except for their data warehouse. In many cases, it was the last thing they were running on premises, and they were still dealing with all of the challenges of on-premises data warehouses. They were expensive, had punitive licensing, were hard to scale, and customers couldn’t analyze all of their data. Customers told us they wanted to run data warehousing at scale in the cloud, that they didn’t want to compromise on performance or functionality, and that it had to be cost-effective enough for them to analyze all of their data.

    So, this is what we started to build, operating under the code name Cookie Monster. This was at a time when customers’ data volumes were exploding, and not just from relational databases, but from a wide variety of sources. One of our early private beta customers tried it and the results came back so fast they thought the system was broken. It was about 10 to 20 times faster than what they had been using before. Another early customer was pretty unhappy with gaps in our early functionality. When I heard about their challenges, I got in touch, understood their feedback, and incorporated it into the service before we made it generally available in February 2013. This customer soon turned into one of our biggest advocates.

    When we launched the service and announced our pricing at $1,000 per terabyte per year, people just couldn’t believe we could offer a product with that much capability at such a low price point. The fact that you could provision a data warehouse in minutes instead of months also caught everyone’s attention. It was a real game-changer for this industry segment.

    Ippokratis: I was at IBM Research at the time working on database technologies there, and we recognized that providing data warehousing as a cloud service was a game changer. It was disruptive. We were working with customers’ on-premises systems where it would take us several days or weeks to resolve an issue, whereas with a cloud data warehouse like Redshift, it would take minutes. It was also apparent that the rate of innovation would accelerate in the cloud.

    In the on-premises world, it was taking months if not years to get new functionality into a software release, whereas in the cloud new capabilities could be introduced in weeks, without customers having to change a single line of code in their consuming applications. The Redshift announcement was an inflection point; I got really interested in the cloud, and cloud data warehouses, and eventually joined Amazon [Ippokratis joined the Redshift team as a principal engineer in Oct. 2015].

  2. Q. 

    How has Amazon Redshift evolved in the decade since its launch?

    A. 

    Ippokratis: As we highlight in the paper, the service has evolved at a rapid pace in response to customers’ needs. We focused on four main areas: 1) customers’ demand for high-performance execution of increasingly complex analytical queries; 2) our customers’ need to process more data and significantly increase the number of users who need to derive insights from that data; 3) customers’ need for us to make the system easier to use; and 4) our customers’ desire to integrate Redshift with other AWS services, and the AWS ecosystem. That’s a lot, so we’ll provide some examples across each dimension.

    Offering the leading price performance has been our primary focus since Rahul first began working on what would become Redshift. From the beginning, the team has focused on making core query execution latency as low as possible so customers can run more workloads, issue more jobs into the system, and run their daily analysis. To do this, Redshift generates highly optimized C++ code for each query, sends that code to the compute nodes of the parallel database, and executes it there. This makes Redshift unique in the way it executes queries, and it has always been the core of the service.

    We have never stopped innovating here to deliver our customers the best possible performance. Another thing that’s been interesting to me is that in the traditional business intelligence (BI) world, you optimize your system for very long-running jobs. But as we observe the behavior of our customers in aggregate, what’s surprising is that 90 percent of our queries among the billions we run daily in our service execute in less than one second. That’s not what people had traditionally expected from a data warehouse, and that has changed the areas of the code that we optimize.
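
[Editor's illustration: the code-generation approach Ippokratis describes can be sketched in a few lines. This is an analogy under stated assumptions, not Redshift's implementation: Redshift emits C++, whereas this sketch emits Python source, and the names `generate_query_source`, `compile_query`, and `run_on_slices` are hypothetical. It shows the general idea of producing code specialized to one query, compiling it once, and running the same compiled code over every slice of the data.]

```python
# Illustrative analogy only: Redshift generates C++; this sketch generates Python.
# All function names here are hypothetical and chosen for the example.


def generate_query_source(column, threshold):
    """Return source code for a function specialized to one filter-and-sum query."""
    return (
        "def compiled_query(rows):\n"
        "    total = 0\n"
        "    for row in rows:\n"
        f"        if row[{column!r}] > {threshold!r}:\n"
        f"            total += row[{column!r}]\n"
        "    return total\n"
    )


def compile_query(column, threshold):
    """Compile the generated source once and return the resulting callable."""
    namespace = {}
    source = generate_query_source(column, threshold)
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["compiled_query"]


def run_on_slices(query, slices):
    """Run the same compiled function on each data slice and merge the partial sums."""
    return sum(query(rows) for rows in slices)


# Two slices standing in for data spread across compute nodes.
slices = [
    [{"amount": 5}, {"amount": 12}],
    [{"amount": 30}, {"amount": 7}],
]

query = compile_query("amount", 10)
result = run_on_slices(query, slices)
print(result)
```

[The column name and threshold are baked into the generated function, so the inner loop carries no per-row interpretation of the query plan. That is the property code generation is meant to buy.]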

    Rahul: As Ippokratis mentioned, the second area we focused on in the paper was customers’ need to process more data and to use that data to drive value throughout the organization. Analytics has always been super important, but eight or ten years ago it wasn’t necessarily mission critical for customers in the same way transactional databases were. That has definitely shifted. Today, core business processes rely on Redshift being highly available and performant. The biggest architectural change in the past decade in support of this goal was the introduction of Redshift Managed Storage, which allowed us to separate compute and storage, and focus a lot of innovation in each area.

    Diagram of the Redshift Managed Storage
    The Redshift managed storage layer (RMS) is designed for a durability of 99.999999999% and 99.99% availability over a given year, across multiple availability zones. RMS manages both user data and transaction metadata.

    Another big trend has been the desire of customers to query across and integrate disparate datasets. Redshift was the first data warehouse in the cloud to query Amazon S3 data; that was with Redshift Spectrum in 2017. Then we demonstrated the ability to run a query that scanned an exabyte of data in S3 as well as data in the cluster. That was a game changer.

    Customers like NASDAQ have used this extensively to query data that’s on local disk for the highest performance, but also take advantage of Redshift’s ability to integrate with the data lake and query their entire history of data with high performance. In addition to querying the data lake, integrated querying of transactional data stores like Aurora and RDS has been another big innovation, so customers can really have a high-performance analytics system that’s capable of transparently querying all of the data that matters to them without having to manage these complex integration processes that other systems require.

    Illustration of how a query flows through Redshift.
    This diagram from the research paper illustrates how a query flows through Redshift. The sequence is described in detail on pages 2 and 3 of the paper.

    Ippokratis: The third area we focused on in the paper was ease of use. One change that stands out for me is that on-premises data warehousing required IT departments to have a DBA (database administrator) who would be responsible for maintaining the environment. Over the past decade, the expectation from customers has evolved. Now, if you are offering data warehousing as a service, the systems must be capable of auto tuning, auto healing, and auto optimizing. This has become a big area of focus for us where we incorporate machine learning and automation into the system to make it easier to use, and to reduce the amount of involvement required of administrators.

    Rahul: In terms of ease of use, three innovations come to mind. One is concurrency scaling. As with workload management, customers previously had to manually tweak concurrency, reset clusters, or manually split workloads. Now, the system automatically provisions new resources and scales up and down without customers having to take any action. This is a great example of how Redshift has gotten much more dynamic and elastic.

    The second ease-of-use innovation is automated table optimization. This is another place where the system is able to observe workloads and data layouts and automatically suggest how data should be sorted and distributed across nodes in the cluster. This is great because it’s a continuously learning system, and workloads are never static in time.

    Customers are always adding more datasets, and adding more users, so what was optimal yesterday might not be optimal tomorrow. Redshift observes this and modifies what's happening under the covers to balance that. This was the focus of a really interesting graph optimization paper that we wrote a few years ago about how to choose optimal distribution keys, which determine how data is laid out within a multi-node parallel-processing system. We've coupled this with automated optimization and then table encoding. In an analytics system, how you compress data has a big impact because the less data you scan, the faster your queries go. Customers had to reason about this in the past. Now Redshift can automatically determine how to encode data correctly to deliver the best possible performance for the data and the workload.
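
[Editor's illustration: a minimal sketch of why column encoding matters for scan cost. Run-length and raw are both encodings Redshift offers, but the selection rule below is an assumption made for the example (store whichever form holds fewer items) and does not describe how Redshift actually chooses. The function names and sample columns are hypothetical.]

```python
# Illustrative sketch only: the "fewer stored items wins" rule is an assumed
# heuristic for this example, not Redshift's encoding-selection logic.


def run_length_encode(values):
    """Collapse consecutive repeated values into [value, count] pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs


def choose_encoding(values):
    """Pick the representation that stores fewer items (a crude size proxy)."""
    runs = run_length_encode(values)
    encoded_size = 2 * len(runs)  # each run stores one value and one count
    if encoded_size < len(values):
        return "runlength", runs
    return "raw", list(values)


def count_matches(encoding, data, target):
    """Count occurrences of target, visiting runs instead of rows when possible."""
    if encoding == "runlength":
        return sum(count for value, count in data if value == target)
    return sum(1 for value in data if value == target)


# A low-cardinality column with long runs, and a column of unique values.
status = ["shipped"] * 6 + ["pending"] * 3 + ["shipped"]
order_id = [101, 102, 103, 104]

status_encoding, status_data = choose_encoding(status)
id_encoding, id_data = choose_encoding(order_id)

print(status_encoding, len(status_data))
print(id_encoding, len(id_data))
print(count_matches(status_encoding, status_data, "shipped"))
```

[The repetitive column shrinks from ten stored values to three runs, so a count over it touches three items instead of ten. The column of unique values gains nothing from run-length encoding and stays raw, which is why the right encoding depends on the data.]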

    The third innovation I want to highlight here is Amazon Redshift Serverless, which we launched in public preview at re:Invent last fall. Redshift Serverless removes all of the management of instances and clusters, so customers can focus on getting to insights from data faster and not spend time managing infrastructure. With Redshift Serverless, customers can simply provision an endpoint and begin to interact with their data, and Redshift Serverless will auto scale and automatically manage the system to essentially remove all of that complexity from customers.

    Customers can just focus on their data, set limits to manage their budgets, and we deliver optimal performance between those limits. This is another massive step forward in terms of ease of use because it eliminates any operations for customers. The early response to the preview has been tremendous. Thousands of customers have been excited to put Amazon Redshift Serverless through its paces over the past few months, and we’re excited about making it generally available in the near future.

    Amazon Redshift architecture diagram
    The Amazon Redshift architecture as presented in the research paper.

    Ippokratis: A fourth area of focus in the paper is on integration with other AWS services, and the AWS ecosystem. Integration is another area where customer behavior has evolved from traditional BI use cases. Today, cloud data warehouses are a central hub with tight integration with a broader set of AWS services. We provided the ability for customers to join data from the warehouse with the data lake. Then customers said they needed access to high-velocity business data in operational databases like Aurora and RDS, so we provided access to these operational data stores. Then we added support for streams, as well as integration with SageMaker and Lambda so customers can run machine learning training and inference without moving their data, and do generic compute. As a result, we’ve converted the traditional BI system into a well-integrated set of AWS services.

    Rahul: One big area of integration has been with our machine-learning ecosystem. With Redshift ML we have enabled anyone who knows SQL to take advantage of all of our machine-learning innovation. We built the ability to create a model from the SQL prompt, which gets the data into Amazon S3 and calls Amazon SageMaker, to use automated machine learning to build the most appropriate model to provide predictions on the data.

    This model is compiled efficiently and brought back into the data warehouse for customers to run very high-performance parallel inferences with no additional compute and no extra cost. The beauty of this integration is that every innovation we make within SageMaker means that Redshift ML gets better as well. This is just another means by which customers benefit from us connecting our services together.

    Another big area for integration has been data sharing. Once we separated storage and compute layers with RA3 instances, we could enable data sharing, giving customers the ability to share data with clusters in the same account, in other accounts, or across regions. This allows us to separate consumers from producers of data, which enables things like modern data mesh architectures. Customers can share data without copying it, so the shared data stays transactionally consistent across accounts.

    For example, users within a data-science organization can securely work from the shared data, as can users within the reporting or marketing organization. We’ve also integrated data sharing with AWS Data Exchange, so now customers can search for — and subscribe to — third-party datasets that are live, up to date, and can be queried immediately in Redshift. This has been another game changer from the perspective of setting data free, enabling data monetization for third-party providers, and secure and live data access and licensing for subscribers for high-performance analytics within and across organizations. The fact that Redshift is part of an incredibly rich data ecosystem is a huge win for customers, and in keeping with customers’ desire to make data more pervasively available across the company.

  3. Q. 

    You indicate in the paper that Redshift innovation is continuing at an accelerated pace. How do you see the cloud data warehouse segment evolving – and more specifically Redshift – over the next several years?

    A. 

    Rahul: A few things will continue to be true as we head into the future. Customers will be generating ever-larger amounts of data, and they’re going to want to analyze that data more cost effectively. Data volumes are growing exponentially, but obviously customers don't want their costs growing exponentially. This requires that we continue to innovate, and find new levels of performance to ensure that the cost of processing a unit of data continues to go down.

    We’ll continue innovating in software, in hardware, in silicon, and in using machine learning to make sure we deliver on that promise for customers. We’ve delivered on that promise for the past 10 years, and we’ll focus on making sure we deliver on that promise into the future.

    Also, customers are always going to want better availability, they’re always going to want their data to be secure, and they’re always going to want more integrations with more data sources, and we intend to continue to deliver on all of those. What will stay the same is our ability to offer the best-in-segment price performance and capabilities, and the best-in-segment integration and security because they will always deliver value for customers.

    Ippokratis: It has been an incredible journey; we have been rebuilding the plane as we’ve been flying it with customers onboard, and this would not have happened without the support of AWS leadership, but most importantly the tremendous engineers, managers, and product people who have worked on the team.

    As we did in the paper, I want to recognize the contributions of Nate Binkert and Britt Johnson, who have passed, but whose words of wisdom continue to guide us. We’ve taken data warehousing, what we learned from books in school [Ippokratis earned his PhD in electrical and computer engineering from Carnegie Mellon University], and brought it to the cloud. In the process, we’ve been able to innovate, and write new pages in the book. I’m very proud of what the team has accomplished, but equally excited about all the things we’re going to do to improve Redshift in the future.

Research areas

Related content

US, CA, Palo Alto
The Sponsored Products and Brands team at Amazon Ads is re-imagining the advertising landscape through industry leading generative AI technologies, revolutionizing how millions of customers discover products and engage with brands across Amazon.com and beyond. We are at the forefront of re-inventing advertising experiences, bridging human creativity with artificial intelligence to transform every aspect of the advertising lifecycle from ad creation and optimization to performance analysis and customer insights. We are a passionate group of innovators dedicated to developing responsible and intelligent AI technologies that balance the needs of advertisers, enhance the shopping experience, and strengthen the marketplace. If you're energized by solving complex challenges and pushing the boundaries of what's possible with AI, join us in shaping the future of advertising. Amazon Ads Response Prediction team is your choice, if you want to join a highly motivated, collaborative, and fun-loving team with a strong entrepreneurial spirit and bias for action. We are seeking an experienced and motivated Machine Learning Applied Scientist who loves to innovate at the intersection of customer experience, deep learning, and high-scale machine-learning systems. Amazon Advertising operates at the intersection of eCommerce and advertising, and is investing heavily in building a world-class advertising business. We are defining and delivering a collection of self-service performance advertising products that drive discovery and sales. Our products are strategically important to our Retail and Marketplace businesses driving long-term growth. We deliver billions of ad impressions and millions of clicks daily and are breaking fresh ground to create world-class products to improve both shopper and advertiser experience. With a broad mandate to experiment and innovate, we grow at an unprecedented rate with a seemingly endless range of new opportunities. 
We are looking for a talented Machine Learning Applied Scientist for our Amazon Ads Response Prediction team to grow the business. We are providing advanced real-time machine learning services to connect shoppers with right ads on all platforms and surfaces worldwide. Through the deep understanding of both shoppers and products, we help shoppers discover new products they love, be the most efficient way for advertisers to meet their customers, and helps Amazon continuously innovate on behalf of all customers. Key job responsibilities As a Machine Learning Applied Scientist, you will: * Conduct deep data analysis to derive insights to the business, and identify gaps and new opportunities * Develop scalable and effective machine-learning models and optimization strategies to solve business problems * Run regular A/B experiments, gather data, and perform statistical analysis * Work closely with software engineers to deliver end-to-end solutions into production * Improve the scalability, efficiency and automation of large-scale data analytics, model training, deployment and serving * Conduct research on new machine-learning modeling to optimize all aspects of Sponsored Products and Brands business About the team We are pioneers in applying advanced machine learning and generative AI algorithms in Sponsored Products and Brands business. We empower every customer with a customized discovery experiences from back-end optimization (such as customized response prediction models) to front-end CX innovation (such as widgets), to help shoppers feel understood and shop efficiently on and off Amazon.
US, WA, Seattle
The Sponsored Products and Brands team at Amazon Ads is re-imagining the advertising landscape through industry leading generative AI technologies, revolutionizing how millions of customers discover products and engage with brands across Amazon.com and beyond. We are at the forefront of re-inventing advertising experiences, bridging human creativity with artificial intelligence to transform every aspect of the advertising lifecycle from ad creation and optimization to performance analysis and customer insights. We are a passionate group of innovators dedicated to developing responsible and intelligent AI technologies that balance the needs of advertisers, enhance the shopping experience, and strengthen the marketplace. If you're energized by solving complex challenges and pushing the boundaries of what's possible with AI, join us in shaping the future of advertising. Key job responsibilities We are looking for an Applied Science Manager to lead the Insights & Prompt Generation vertical within the Conversational Discovery Experiences (CAX) team in Sponsored Products and Brands (SPB). This team owns prompt generation, quality, personalization, and coverage for Sponsored Prompts, a new conversational ad format powered by large language models (LLMs) that helps shoppers discover products across Amazon.com. As an Applied Science Manager, you will lead a team of applied scientists and engineers to build and scale the prompt generation pipeline, develop new prompt themes and quality frameworks, and drive coverage expansion across all surfaces. You will own the science roadmap for prompt generation and personalization. You will define the metrics that measure prompt effectiveness and drive experimentation to improve CTR, helpfulness, and advertiser outcomes. This role requires strong technical depth in NLP, LLMs, and information retrieval, combined with the ability to manage and grow a science team, set research direction, and influence product strategy. 
You will work across organizational boundaries with engineering, product, and business teams to translate science investments into measurable business impact.
US, CA, Pasadena
The Amazon Center for Quantum Computing in Pasadena, CA, is looking to hire an Applied Scientist specializing in Testing of Control Systems hardware. Working alongside other scientists and engineers, you will validate hardware and software systems performing the control and readout functions for Amazon quantum processors. Working effectively within a cross-functional team environment is critical. The ideal candidate will have an established background in test engineering applicable to large mixed-signal systems. Diverse Experiences Amazon values diverse experiences. Even if you do not meet all of the preferred qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying. Work/Life Balance We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud. Inclusive Team Culture Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empower us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness. Mentorship and Career Growth We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional. 
Key job responsibilities Our scientists and engineers collaborate across diverse teams and projects to offer state of the art, cost effective solutions for the control of Amazon quantum processor systems. You’ll bring a passion for innovation and collaboration to: Develop automated test scripts for mid-volume electronics manufacturing, utilizing high-speed test equipment such as Gsps oscilloscopes, logic analyzers, and network analyzers. Design and implement test plans for high-speed, mixed-signal PCAs and instrument assemblies, covering analog/digital interfaces, ADCs/DACs, FPGAs, and power distribution systems. Develop test requirements and coverage matrices with hardware and software stakeholders, including optimization of test coverage vs test time. Analyze test data to identify failure root causes and trends, implement corrective actions, and drive design-for-testability (DFT) enhancements. Drive continuous test improvement to improve test accuracy, improve final product reliability, and adapt to new measurement requirements.
US, WA, Seattle
This role will contribute to developing the Economics and Science products and services in the Fee domain, with specialization in supply chain systems and fees. Through the lens of economics, you will develop causal links for how Amazon, Sellers and Customers interact. You will be a key and senior scientist, advising Amazon leaders how to price our services. You will work on developing frameworks and scalable, repeatable models supporting optimal pricing and policy in the two-sided marketplace that is central to Amazon's business. The pricing for Amazon services is complex. You will partner with science and technology teams across Amazon including Advertising, Supply Chain, Operations, Prime, Consumer Pricing, and Finance. We are looking for an experienced Economist to improve our understanding of seller Economics, enhance our ability to estimate the causal impact of fees, and work with partner teams to design pricing policy changes. In this role, you will provide guidance to scientists to develop econometric models to influence our fee pricing worldwide. You will lead the development of causal models to help isolate the impact of fee and policy changes from other business actions, using experiments when possible, or observational data when not. Key job responsibilities The ideal candidate will have extensive Economics knowledge, demonstrated strength in practical and policy relevant structural econometrics, strong collaboration skills, proven ability to lead highly ambiguous and large projects, and a drive to deliver results. They will work closely with Economists, Data / Applied Scientists, Strategy Analysts, Data Engineers, and Product leads to integrate economic insights into policy and systems production. Familiarity with systems and services that constitute seller supply chains is a plus but not required. About the team The Stores Economics and Sciences team is a central science team that supports Amazon's Retail and Supply Chain leadership. 
We tackle some of Amazon's most challenging economics and machine learning problems, where our mandate is to impact the business on massive scale.
US, WA, Seattle
WW Amazon Stores Finance Science (ASFS) works to leverage science and economics to drive improved financial results, foster data backed decisions, and embed science within Finance. ASFS is focused on developing products that empower controllership, improve business decisions and financial planning by understanding financial drivers, and innovate science capabilities for efficiency and scale. We are looking for a data scientist to lead high visibility initiatives for forecasting Amazon Stores' financials. You will develop new science-based forecasting methodologies and build scalable models to improve financial decision making and planning for senior leadership up to VP and SVP level. You will build new ML and statistical models from the ground up that aim to transform financial planning for Amazon Stores. We prize creative problem solvers with the ability to draw on an expansive methodological toolkit to transform financial decision-making with science. The ideal candidate combines data-science acumen with strong business judgment. You have versatile modeling skills and are comfortable owning and extracting insights from data. You are excited to learn from and alongside seasoned scientists, engineers, and business leaders. You are an excellent communicator and effectively translate technical findings into business action. Key job responsibilities Demonstrating thorough technical knowledge, effective exploratory data analysis, and model building using industry standard ML models Working with technical and non-technical stakeholders across every step of science project life cycle Collaborating with finance, product, data engineering, and software engineering teams to create production implementations for large-scale ML models Innovating by adapting new modeling techniques and procedures Presenting research results to our internal research community