Amazon Redshift re-invented research paper and photos of Rahul Pathak, vice president of analytics at AWS, and Ippokratis Pandis, AWS senior principal engineer
The "Amazon Redshift re-invented" research paper will be presented at a leading database conference next month. Two of the paper's authors, Rahul Pathak (top right), vice president of analytics at AWS, and Ippokratis Pandis (bottom right), an AWS senior principal engineer, discuss the origins of Redshift, how the system has evolved in the past decade, and where they see the service evolving in the years ahead.

Amazon Redshift: Ten years of continuous reinvention

Two authors of Amazon Redshift research paper that will be presented at leading international forum for database researchers reflect on how far the first petabyte scale cloud data warehouse has advanced since it was announced ten years ago.

Nearly ten years ago, in November 2012 at the first-ever Amazon Web Services (AWS) re:Invent, Andy Jassy, then AWS senior vice president, announced the preview of Amazon Redshift, the first fully managed, petabyte-scale cloud data warehouse. The service represented a significant leap forward from traditional on-premises data warehousing solutions, which were expensive, inflexible, and required significant human and capital resources to operate.

In a blog post on November 28, 2012, Werner Vogels, Amazon chief technical officer, highlighted the news: “Today, we are excited to announce the limited preview of Amazon Redshift, a fast and powerful, fully managed, petabyte-scale data warehouse service in the cloud.”

Further in the post, Vogels added, “The result of our focus on performance has been dramatic. Amazon.com’s data warehouse team has been piloting Amazon Redshift and comparing it to their on-premise data warehouse for a range of representative queries against a two billion row data set. They saw speedups ranging from 10x – 150x!”

That’s why, on the day of the announcement, Rahul Pathak, then a senior product manager, and the entire Amazon Redshift team were confident the product would be popular.

“But we didn’t really understand how popular,” he recalls.

“At preview we asked customers to sign up and give us some indication of their data volume and workloads,” Pathak, now vice president of Relational Engines at AWS, said. “Within about three days we realized that we had ten times more demand for Redshift than we had planned for the entire first year of the service. So we scrambled right after re:Invent to accelerate our hardware orders to ensure we had enough capacity on the ground for when the product became generally available in early 2013. If we hadn’t done that preview, we would have been caught short.”

The Redshift team has been sprinting to keep apace of customer demand ever since. Today, the service is used by tens of thousands of customers to process exabytes of data daily. In June a subset of the team will present the paper “Amazon Redshift re-invented ” at a leading international forum for database researchers, practitioners, and developers, the ACM SIGMOD/PODS Conference in Philadelphia.

Related content
Amazon DynamoDB was introduced 10 years ago today; one of its key contributors reflects on its origins, and discusses the 'never-ending journey' to make DynamoDB more secure, more available and more performant.

The paper highlights four key areas where Amazon Redshift has evolved in the past decade, provides an overview of the system architecture, describes its high-performance transactional storage and compute layers, details how smart autonomics are provided, and discusses how AWS and Redshift make it easy for customers to use the best set of services to meet their needs.

Amazon Science recently connected with two of the paper’s authors, Pathak, and Ippokratis Pandis, an AWS senior principal engineer, to discuss the origins of Redshift, how the system has evolved over the past decade, and where they see the service evolving in the years ahead.

  1. Q. 

    Can you provide some background on the origin story for Redshift? What were customers seeking, and how did the initial version address those needs?

    A. 

    Rahul: We had been meeting with customers who in the years leading up to the launch of Amazon Redshift had moved just about every workload they had to the cloud except for their data warehouse. In many cases, it was the last thing they were running on premises, and they were still dealing with all of the challenges of on-premises data warehouses. They were expensive, had punitive licensing, were hard to scale, and customers couldn’t analyze all of their data. Customers told us they wanted to run data warehousing at scale in the cloud, that they didn’t want to compromise on performance or functionality, and that it had to be cost-effective enough for them to analyze all of their data.

    So, this is what we started to build, operating under the code name Cookie Monster. This was at a time when customers’ data volumes were exploding, and not just from relational databases, but from a wide variety of sources. One of our early private beta customers tried it and the results came back so fast they thought the system was broken. It was about 10 to 20 times faster than what they had been using before. Another early customer was pretty unhappy with gaps in our early functionality. When I heard about their challenges, I got in touch, understood their feedback, and incorporated it into the service before we made it generally available in February 2013. This customer soon turned into one of our biggest advocates.

    When we launched the service and announced our pricing at $1000 a terabyte per year, people just couldn’t believe we could offer a product with that much capability at such a low price point. The fact that you could provision a data warehouse in minutes instead of months also caught everyone’s attention. It was a real game-changer for this industry segment.

    Ippokratis: I was at IBM Research at the time working on database technologies there, and we recognized that providing data warehousing as a cloud service was a game changer. It was disruptive. We were working with customers’ on-premises systems where it would take us several days or weeks to resolve an issue, whereas with a cloud data warehouse like Redshift, it would take minutes. It was also apparent that the rate of innovation would accelerate in the cloud.

    In the on-premises world, it was taking months if not years to get new functionality into a software release, whereas in the cloud new capabilities could be introduced in weeks, without customers having to change a single line of code in their consuming applications. The Redshift announcement was an inflection point; I got really interested in the cloud, and cloud data warehouses, and eventually joined Amazon [Ippokratis joined the Redshift team as a principal engineer in Oct. 2015].

  2. Q. 

    How has Amazon Redshift evolved over the past decade since the launch nearly 10 years ago?

    A. 

    Ippokratis: As we highlight in the paper, the service has evolved at a rapid pace in response to customers’ needs. We focused on four main areas: 1) customers’ demand for high-performance execution of increasingly complex analytical queries; 2) our customers’ need to process more data and significantly increase the number of users who need to derive insights from that data; 3) customers’ need for us to make the system easier to use; and 4) our customers’ desire to integrate Redshift with other AWS services, and the AWS ecosystem. That’s a lot, so we’ll provide some examples across each dimension.

    Related publication
    Enterprise companies use spatial data for decision optimization and gain new insights regarding the locality of their business and services. Industries rely on efficiently combining spatial and business data from different sources, such as data warehouses, geospatial information systems, transactional systems, and data lakes, where spatial data can be found in structured or unstructured form. In this demonstration

    Offering the leading price performance has been our primary focus since Rahul first began working on what would become Redshift. From the beginning, the team has focused on making core query execution latency as low as possible so customers can run more workloads, issue more jobs into the system, and run their daily analysis. To do this, Redshift generates C++ code that is highly optimized and then sends it to the distributor in the parallel database and executes this highly optimized code. This makes Redshift unique in the way it executes queries, and it has always been the core of the service.

    We have never stopped innovating here to deliver our customers the best possible performance. Another thing that’s been interesting to me is that in the traditional business intelligence (BI) world, you optimize your system for very long-running jobs. But as we observe the behavior of our customers in aggregate, what’s surprising is that 90 percent of our queries among the billions we run daily in our service execute in less than one second. That’s not what people had traditionally expected from a data warehouse, and that has changed the areas of the code that we optimize.

    Rahul: As Ippokratis mentioned, the second area we focused on in the paper was customers’ need to process more data and to use that data to drive value throughout the organization. Analytics has always been super important, but eight or ten years ago it wasn’t necessarily mission critical for customers in the same way transactional databases were. That has definitely shifted. Today, core business processes rely on Redshift being highly available and performant. The biggest architectural change in the past decade in support of this goal was the introduction of Redshift Managed Storage, which allowed us to separate compute and storage, and focus a lot of innovation in each area.

    Diagram of the Redshift Managed Storage
    The Redshift managed storage layer (RMS) is designed for a durability of 99.999999999% and 99.99% availability over a given year, across multiple availability zones. RMS manages both user data as well as transaction metadata.

    Another big trend has been the desire of customers to query across and integrate disparate datasets. Redshift was the first data warehouse in the cloud to query Amazon S3 data, that was with Redshift Spectrum in 2017. Then we demonstrated the ability to run a query that scanned an exabyte of data in S3 as well as data in the cluster. That was a game changer.

    Customers like NASDAQ have used this extensively to query data that’s on local disk for the highest performance, but also take advantage of Redshift’s ability to integrate with the data lake and query their entire history of data with high performance. In addition to querying the data lake, integrated querying of transactional data stores like Aurora and RDS has been another big innovation, so customers can really have a high-performance analytics system that’s capable of transparently querying all of the data that matters to them without having to manage these complex integration processes that other systems require.

    Illustration of how a query flows through Redshift.
    This diagram from the research paper illustrates how a query flows through Redshift. The sequence is described in detail on pages 2 and 3 of the paper.

    Ippokratis: The third area we focused on in the paper was ease of use. One change that stands out for me is that on-premises data warehousing required IT departments to have a DBA (data base administrator) who would be responsible for maintaining the environment. Over the past decade, the expectation from customers has evolved. Now, if you are offering data warehousing as a service, the systems must be capable of auto tuning, auto healing, and auto optimizing. This has become a big area of focus for us where we incorporate machine learning and automation into the system to make it easier to use, and to reduce the amount of involvement required of administrators.

    Rahul: In terms of ease of use, three innovations come to mind. One is concurrency scaling. Similar to workload management, customers would previously have to manually tweak concurrency or reset clusters of the manually split workloads. Now, the system automatically provisions new resources and scales up and down without customers having to take any action. This is a great example of how Redshift has gotten much more dynamic and elastic.

    The second ease of use innovation is automated table optimization. This is another place where the system is able to observe workloads and data layouts and automatically suggest how data should be sorted and distributed across nodes in the cluster. This is great because it’s a continuously learning system so workloads are never static in time.

    Related publication
    How should we split data among the nodes of a distributed data warehouse in order to boost performance for a forecasted workload? In this paper, we study the effect of different data partitioning schemes on the overall network cost of pairwise joins. We describe a generally-applicable data distribution framework initially designed for Amazon Redshift, a fully-managed petabyte-scale data warehouse in the

    Customers are always adding more datasets, and adding more users, so what was optimal yesterday might not be optimal tomorrow. Redshift observes this and modifies what's happening under the covers to balance that. This was the focus of a really interesting graph optimization paper that we wrote a few years ago about how to analyze for optimal distribution keys for how data is laid out within a multi-node parallel-processing system. We've coupled this with automated optimization and then table encoding. In an analytics system, how you compress data has a big impact because the less data you scan, the faster your queries go. Customers had to reason about this in the past. Now Redshift can automatically determine how to encode data correctly to deliver the best possible performance for the data and the workload.

    The third innovation I want to highlight here is Amazon Redshift Serverless, which we launched in public preview at re:Invent last fall. Redshift Serverless removes all of the management of instances and clusters, so customers can focus on getting to insights from data faster and not spend time managing infrastructure. With Redshift Serverless, customers can simply provision an endpoint and begin to interact with their data, and Redshift Serverless will auto scale and automatically manage the system to essentially remove all of that complexity from customers.

    Customers can just focus on their data, set limits to manage their budgets, and we deliver optimal performance between those limits. This is another massive step forward in terms of ease of use because it eliminates any operations for customers. The early response to the preview has been tremendous. Thousands of customers have been excited to put Amazon Redshift Serverless through its paces over the past few months, and we’re excited about making it generally available in the near future.

    Amazon Redshift architecture diagram
    The Amazon Redshift architecture as presented in the research paper.

    Ippokratis: A fourth area of focus in the paper is on integration with other AWS services, and the AWS ecosystem. Integration is another area where customer behavior has evolved from traditional BI use cases. Today, cloud data warehouses are a central hub with tight integration with a broader set of AWS services. We provided the ability for customers to join data from the warehouse with the data lake. Then customers said they needed access to high-velocity business data in operational databases like Aurora and RDS, so we provided access to these operational data stores. Then we added support for streams, as well as integration with SageMaker and Lambda so customers can run machine learning training and inference without moving their data, and do generic compute. As a result, we’ve converted the traditional BI system into a well-integrated set of AWS services.

    Rahul: One big area of integration has been with our machine-learning ecosystem. With Redshift ML we have enabled anyone who knows SQL to take advantage of all of our machine-learning innovation. We built the ability to create a model from the SQL prompt, which gets the data into Amazon S3 and calls Amazon SageMaker, to use automated machine learning to build the most appropriate model to provide predictions on the data.

    This model is compiled efficiently and brought back into the data warehouse for customers to run very high-performance parallel inferences with no additional compute or no extra cost. The beauty of this integration is that every innovation we make within SageMaker means that Redshift ML gets better as well. This is just another means by which customers benefit from us connecting our services together.

    Related content
    Amazon researchers describe new method for distributing database tables across servers.

    Another big area for integration has been data sharing. Once we separated storage and compute layers with RA3 instances, we could enable data sharing, giving customers the ability to share data with clusters in the same account, and other accounts, or across regions. This allows us to separate consumers from producers of data, which enables things like modern data mesh architectures. Customers can share data without data copying, so they are transactionally consistent across accounts.

    For example, users within a data-science organization can securely work from the shared data, as can users within the reporting or marketing organization. We’ve also integrated data sharing with AWS Data Exchange, so now customers can search for — and subscribe to — third-party datasets that are live, up to date, and can be queried immediately in Redshift. This has been another game changer from the perspective of setting data free, enabling data monetization for third-party providers, and secure and live data access and licensing for subscribers for high-performance analytics within and across organizations. The fact that Redshift is part of an incredibly rich data ecosystem is a huge win for customers, and in keeping with customers’ desire to make data more pervasively available across the company.

  3. Q. 

    You indicate in the paper that Redshift innovation is continuing at an accelerated pace.  How do you see the cloud data warehouse segment evolving – and more specifically Redshift – over the next several years?

    A. 

    Rahul: A few things will continue to be true as we head into the future. Customers will be generating ever more amounts of data, and they’re going to want to analyze that data more cost effectively. Data volumes are growing exponentially, but obviously customers don't want their costs growing exponentially. This requires that we continue to innovate, and find new levels of performance to ensure that the cost of processing a unit of data continues to go down.

    We’ll continue innovating in software, in hardware, in silicon, and in using machine learning to make sure we deliver on that promise for customers. We’ve delivered on that promise for the past 10 years, and we’ll focus on making sure we deliver on that promise into the future.

    I’m very proud of what the team has accomplished, but equally as excited about all the things we’re going to do to improve Redshift in the future.
    Ippokratis Pandis

    Also, customers are always going to want better availability, they’re always going to want their data to be secure, and they’re always going to want more integrations with more data sources, and we intend to continue to deliver on all of those. What will stay the same is our ability to offer the-best in-segment price performance and capabilities, and the best-in-segment integration and security because they will always deliver value for customers.

    Ippokratis: It has been an incredible journey; we have been rebuilding the plane as we’ve been flying it with customers onboard, and this would not have happened without the support of AWS leadership, but most importantly the tremendous engineers, managers, and product people who have worked on the team.

    As we did in the paper, I want to recognize the contributions of Nate Binkert and Britt Johnson, who have passed, but whose words of wisdom continue to guide us. We’ve taken data warehousing, what we learned from books in school (Ippokratis earned his PhD in electrical and computer engineering from Carnegie Mellon University) and brought it to the cloud. In the process, we’ve been able to innovate, and write new pages in the book. I’m very proud of what the team has accomplished, but equally as excited about all the things we’re going to do to improve Redshift in the future.

Research areas

Related content

AU, VIC, Melbourne
Are you excited about leveraging state-of-the-art Computer Vision algorithms and large datasets to solve real-world problems? Join Amazon as an Applied Scientist Intern and be at the forefront of AI innovation! As an Applied Scientist Intern, you'll work in a fast-paced, cross-disciplinary team of pioneering researchers. You'll tackle complex problems, developing solutions that either build on existing academic and industrial research or stem from your own innovative thinking. Your work may even find its way into customer-facing products, making a real-world impact. Please note: This internship is a duration of 6 months full time with a start date in Jan-March 2027. The successful intern is required to be based in Melbourne and relocation allowance will be provided if you are based outside of Melbourne. Key job responsibilities - Develop novel solutions and build prototypes - Work on complex problems in Computer Vision and Machine Learning - Contribute to research that could significantly impact Amazon's operations - Collaborate with a diverse team of experts in a fast-paced environment - Collaborate with scientists on writing and submitting papers to Tier-1 conferences (e.g., CVPR, ICCV, NeurIPS, ICML) - Present your research findings to both technical and non-technical audiences Key Opportunities - Collaborate with leading machine learning researchers - Access Amazon tools and hardware (large GPU clusters) - Address challenges at an unparalleled scale - Become a disruptor, innovator, and problem solver in the field of computer vision - Potentially deliver solutions to production in customer-facing applications - Opportunities to become an FTE after the internship Join us in shaping the future of AI at Amazon. Apply now and turn your research into real-world solutions!
IN, KA, Bengaluru
Are you passionate about solving complex business problems at scale through Generative AI? Do you want to help build intelligent systems that reason, act, and learn from minimal supervision? If so, we have an exciting opportunity for you on Amazon's Trustworthy Shopping Experience (TSE) team. At TSE, our vision is to guarantee customers a worry-free shopping experience by earning their trust that the products they buy are safe, authentic, and compliant with regulations and policy. We do this in close partnership with our selling partners, empowering them with best-in-class tools and expertise to offer a high-quality, compliant selection that customers trust. As an Applied Scientist I, you will bring subject matter expertise in at least one relevant discipline (e.g., NLP, computer vision, representation learning, agentic architecture) to contribute to next-generation agentic AI solutions that automate complex manual investigation processes at Amazon scale. Working alongside senior scientists, you will map business goals—such as reducing cost-of-serving while maintaining trust and safety standards—to well-defined scientific problems and metrics. You will invent, refine, and experiment with solutions spanning agentic reasoning, self-supervised representation learning, few-shot adaptation, multimodal understanding, and model compression. With guidance from senior scientists, you will stay current on research trends and benchmark your results against the state of the art. You will help design and execute experiments to identify optimal solutions, initiating the development and implementation of small components with team guidance. You will write secure, stable, testable, and well-documented production code at the level of an SDE I, rigorously evaluating models and quantifying performance. You will handle data in accordance with Amazon policies, troubleshoot issues to root cause, and ensure your work does not put the company at risk. Your scope of influence will typically be at the self-level, with the possibility of mentoring interns. You will participate in team design and prioritization discussions, learn the business context behind TSE's products, and escalate problems with proposed solutions. You will publish internal technical reports and may contribute to peer-reviewed publications and external review activities when aligned with business needs. This role offers a unique opportunity to contribute to end-to-end AI development—from research through production—with your contributions serving hundreds of millions of customers within months, not years. Key job responsibilities • Contribute to the design and development of agentic AI systems with multi-step reasoning, autonomous task execution, and multimodal intelligence, including feedback and memory mechanisms, leveraging reinforcement learning techniques for agent decision-making and policy optimization, with input and guidance from senior scientists • Help productionize models built on top of SFT (Supervised Fine-tuning) and RFT (Reinforced Fine-tuning) approaches, as well as few-shot approaches based on multimodal datasets spanning text, images, and structured data, applying mathematical optimization techniques to improve efficiency, resource allocation, and decision-making in complex workflows, working alongside senior scientists to identify optimal solutions • Contribute to building production-ready deep learning and conventional ML solutions, including multimodal fusion and cross-modal alignment techniques that seamlessly connect visual, textual, and relational understanding, to support automation requirements within your team's scope • Help identify customer and business problems; use reasonable assumptions, data, and customer requirements to solve well-defined scientific problems involving multimodal inputs such as unstructured text, documents, product images, and relational data, developing representations that capture complementary signals across modalities and mapping business goals to scientific metrics • May co-author research papers for peer-reviewed internal and/or external venues, including contributions in areas such as multimodal representation learning and vision-language modeling, and contribute to the wider scientific community by reviewing research submissions, when aligned with business needs • Prototype rapidly, iterate based on feedback, and deliver small components at SDE I level—including multimodal data pipelines and inference modules—that integrate into production-scale systems • Write secure, stable, testable, maintainable, and well-documented code, balancing model capability, deployment cost, and resource usage across multimodal architectures while understanding state-of-the-art data structures, algorithms, and performance tradeoffs • Rigorously test code and evaluate models across individual and combined modalities, quantifying their performance; troubleshoot issues, research root causes, and thoroughly resolve defects, leaving systems more maintainable • Participate in team design, scoping, and prioritization discussions through clear verbal and written communication; seek to learn the business context, science, and engineering behind your team's products, including how multimodal signals contribute to trust and safety decisions • Participate in engineering best practices with peer reviews; clearly document approaches and communicate design decisions; publish internal technical reports to institutionalize scientific learning • Help train and mentor scientist interns; identify and escalate problems with proposed solutions, taking ownership or ensuring clear hand-off to the right owner About the team Trustworthy Shopping Experience Product team in TSE is responsible for the human-in-the-loop products and technology used in the risk investigations at Amazon. The team is also responsible for reducing the cost of performing the investigations, by automating wherever possible and optimizing the experience where manual interventions are needed. The team leverages state-of-the art technology and GenAI to deliver the products and associated goals.
US, WA, Redmond
We are searching for a talented candidate with expertise in orbital mechanics and spaceflight navigation, including LEO Satellite Orbit Determination. This position requires experience in simulation and analysis of spacecraft orbital mechanics and sequential orbit determination methods, including Extended Kalman Filters (EKF) and/or Unscented Kalman Filter (UKF). Strong analysis skills are required to develop engineering studies of complex large-scale dynamical systems. This position requires demonstrated expertise in computational analysis automation and tool development. Key job responsibilities - Perform spacecraft maneuver or navigation analysis in support of multi-disciplinary trades within the Amazon Leo team. - Contribute to prototype software development of flight algorithms. - Test and assess navigation software for integration into flight systems. - Assess and trouble-shoot the performance of Leo on-board GNSS hardware and software systems. - Work closely with GNC engineers to manage on-orbit performance and develop flight dynamics operations processes. Export Control Requirement: Due to applicable export control laws and regulations, candidates must be a U.S. citizen or national, U.S. permanent resident (i.e., current Green Card holder), or lawfully admitted into the U.S. as a refugee or granted asylum. A day in the life - Interacting with GNC teams to evaluate and troubleshoot satellite issues. - Working within the Flight Dynamics Research team to prioritize tasks. - Performing analysis, simulation, testing and documentation to address assigned tasks.
US, CA, Pasadena
The Amazon Web Services (AWS) Center for Quantum Computing in Pasadena, CA, is looking to hire a Research Scientist with experience in semiconductor process development who will aid in AWS’s effort to bring cloud quantum computing services to its worldwide customer base. You will join a multi-disciplinary team of scientists, and hardware and software engineers working at the forefront of quantum computing. Through your work inside and outside of the cleanroom environment in the fabrication research and development group, you will solve problems related to developing next-generation quantum processors. Candidates must have a demonstrated background in sound scientific and engineering principles, and must have excellent data analysis, bias for action, problem solving, and communication skills, and be highly motivated and curious to research and learn new technical topics as needed. As a research scientist you will be expected to work on new ideas and stay abreast of novel approaches in fabricating and packaging superconducting quantum processors. Working effectively within a team environment is critical. Key job responsibilities Responsibilities include developing novel processes to fabricate high-coherence superconducting qubits; developing advanced 3DI interconnect and routing technologies for integrating superconducting quantum technologies; analyzing inline metrology and electrical test data; writing production standard operating procedures to transfer newly-developed processes to production teams; interacting with project leads to provide feedback that continuously improves different processes. A day in the life The candidate will develop novel technologies using micro-/nano-fabrication techniques inside the cleanroom (independently or in collaboration with other scientists and engineers) for next-generation quantum computing. Outside the cleanroom, the candidate will plan experiments, analyze data, and conceive future innovations. About the team AWS Utility Computing (UC) provides product innovations — from foundational services such as Amazon’s Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2), to consistently released new product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Internet of Things (Iot), Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services. Diverse Experiences AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying. Why AWS? Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating — that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses. Inclusive Team Culture Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empower us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (diversity) conferences, inspire us to never stop embracing our uniqueness. Mentorship & Career Growth We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional. Work/Life Balance We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud. Hybrid Work We value innovation and recognize this sometimes requires uninterrupted time to focus on a build. We also value in-person collaboration and time spent face-to-face. Our team affords employees options to work in the office every day or in a flexible, hybrid work model near one of our U.S. Amazon offices.
IN, KA, Bengaluru
The Trust CX Innovations team is looking for an Applied Scientist with strong background in Generative AI space to build solutions that help in upholding customer trust for Alexa+. As an Applied Scientist in Trust CX innovations, you will be at the forefront of developing innovative solutions to critical challenges in AI trust and privacy. You'll lead research in trust-preserving machine learning techniques. We are working on revolutionizing the way Amazonians work and collaborate. You will help us achieve new heights of productivity through the power of advanced generative AI technologies. Key job responsibilities - Lead research initiatives in generative AI, focusing on LLMs, multimodal models, and frontier AI capabilities - Develop innovative approaches for model optimization, including prompt engineering, few-shot learning, and efficient fine-tuning - Pioneer new methods for AI safety, alignment, and responsible AI development - Design and execute sophisticated experiments to evaluate model performance and behavior - Lead the development of production-ready AI solutions that scale efficiently - Collaborate with product teams to translate research innovations into practical applications - Guide engineering teams in implementing AI models and systems at scale - Author technical papers for top-tier conferences - File patents for novel AI technologies and applications A day in the life You will be working with a group of talented scientists on researching algorithm and running experiments to test scientific proposal/solutions to improve our trust-preserving experiences. This will involve collaboration with partner teams including engineering, PMs, data annotators, and other scientists to discuss data quality, policy, and model development. You work closely with partner teams across Alexa to deliver platform features that require cross-team leadership. About the team Who We Are: Trust CX Innovations is a strategic innovation team within Amazon Devices & Services that focuses on advancing AI technology while prioritizing customer trust and experience. Our team operates at the intersection of artificial intelligence, privacy engineering and customer-centric design. Our Mission: To pioneer trustworthy AI innovations that delight customers while setting new standards for privacy and responsible technology development. We aim to transform how Amazon builds AI products by creating solutions that balance innovation with customer trust.
AU, NSW, Sydney
AWS Networking operates one of the largest and most complex networks on the planet. The team you'd join is responsible for the availability of that network — measuring how it performs for customers, predicting where it is most likely to degrade, and reshaping how we operate it as the workload grows. We are in the middle of a significant change in how network operations are run. Lessons from our recent work on automation, AI, and ML — including agentic systems that triage and mitigate incidents alongside engineers — are feeding into a broader rethink of where humans focus, where automation takes over, and how we measure whether either is working. We are looking for a Data Scientist to join the team in Sydney to drive the data science strategy behind that change. You will define the metrics that matter, own the evidence the team uses to make decisions, and measure whether each decision delivered the outcomes we expected. You'll be the data science voice on a team of senior network and software engineers — the person who decides what we measure, how we measure it, and what the numbers actually mean. Concretely, that means setting the analytical bar for the program, designing risk and reliability models against telemetry from millions of network devices, surfacing the patterns that drive customer-impact incidents, and turning that analysis into the dashboards and metrics our leaders use to set priorities. It also means owning the evaluations that tell us when a new piece of automation — including the agents we are rolling out to support engineers on the front line — is actually moving the needle on availability, and not just adding noise. If you are a scientist who wants to shape how a tier-one production network is run — using data to drive program strategy, not just to support it — at a scale no academic lab or startup can match, and you're at your best as the data science voice embedded in a team of engineers, this is the team for you. Key job responsibilities - Define and drive the data science strategy for the program — the metrics, the experiments, and what counts as evidence that a change worked - Lead the design and deployment of predictive risk and reliability models for network availability, using device failures, alarm telemetry, ticket data, and traffic signals - Own the evidence behind program decisions: where availability is at risk, where automation is ready to expand, where engineering effort has the highest leverage. Defend recommendations to senior technical and business audiences - Design and own the operational analytics and dashboards (Amazon QuickSight, Amazon CloudWatch, Python) used by senior leadership to track network health and the impact of operational change - Design and run experiments to evaluate the automation we are rolling out — including agentic systems supporting engineers on incidents — measuring whether each rollout improved availability - Drive data quality and classification improvements — event categorisation, root-cause attribution — so the program's metrics rest on solid ground - Build and own event-driven scoring pipelines (Python, SQL, AWS Lambda, Amazon S3, Amazon Athena) that keep the decide / measure / improve loop running - Bring statistical rigour to the engineers you partner with — review experiment designs, push back on unsupported assumptions, and raise the bar on how the team uses evidence A day in the life You might start the morning defining how the team will measure a new initiative — the success metrics, the counterfactual, the bar for calling it a win. By mid-morning you're with the engineering team turning a proposal into a decision: walking through trade-offs, pushing back where the data doesn't support an assumption. The afternoon is outcome measurement — refining the evaluation pipeline that tracks last week's rollout, updating the CloudWatch dashboard senior leadership uses to gate the next expansion, and prepping the data for an upcoming Director review. About the team We sit inside AWS Networking with a strong Sydney presence and a remit that spans network availability, the data and analytics that support it, and the automation we are building to change how operations are done. You'd be the data science voice in a small, senior team of network and software engineers in Sydney, partnering with the broader network engineering organisation across Seattle and Dublin. Small team, high autonomy, direct line to senior leadership, and a roadmap with real production impact rather than research demos.
IN, KA, Bengaluru
Interested to build the next generation Financial systems that can handle billions of dollars in transactions? Interested to build highly scalable next generation systems that could utilize Amazon Cloud? Massive data volume + complex business rules in a highly distributed and service oriented architecture, a world class information collection and delivery challenge. Our challenge is to deliver the software systems which accurately capture, process, and report on the huge volume of financial transactions that are generated each day as millions of customers make purchases, as thousands of Vendors and Partners are paid, as inventory moves in and out of warehouses, as commissions are calculated, and as taxes are collected in hundreds of jurisdictions worldwide. Key job responsibilities • Understand the business and discover actionable insights from large volumes of data through application of machine learning, statistics or causal inference. • Analyse and extract relevant information from large amounts of Amazon’s historical transactions data to help automate and optimize key processes • Research, develop and implement novel machine learning and statistical approaches for anomaly, theft, fraud, abusive and wasteful transactions detection. • Use machine learning and analytical techniques to create scalable solutions for business problems. • Identify new areas where machine learning can be applied for solving business problems. • Partner with developers and business teams to put your models in production. • Mentor other scientists and engineers in the use of ML techniques. A day in the life • Understand the business and discover actionable insights from large volumes of data through application of machine learning, statistics or causal inference. • Analyse and extract relevant information from large amounts of Amazon’s historical transactions data to help automate and optimize key processes • Research, develop and implement novel machine learning and statistical approaches for anomaly, theft, fraud, abusive and wasteful transactions detection. • Use machine learning and analytical techniques to create scalable solutions for business problems. • Identify new areas where machine learning can be applied for solving business problems. • Partner with developers and business teams to put your models in production. • Mentor other scientists and engineers in the use of ML techniques. About the team The FinAuto TFAW(theft, fraud, abuse, waste) team is part of FGBS Org and focuses on building applications utilizing machine learning models to identify and prevent theft, fraud, abusive and wasteful(TFAW) financial transactions across Amazon. Our mission is to prevent every single TFAW transaction. As a Machine Learning Scientist in the team, you will be driving the TFAW Sciences roadmap, conduct research to develop state-of-the-art solutions through a combination of data mining, statistical and machine learning techniques, and coordinate with Engineering team to put these models into production. You will need to collaborate effectively with internal stakeholders, cross-functional teams to solve problems, create operational efficiencies, and deliver successfully against high organizational standards.
IN, KA, Bengaluru
Do you want to join an innovative team of scientists who use machine learning and statistical techniques to create state-of-the-art solutions for providing better value to Amazon’s customers? Do you want to build and deploy advanced algorithmic systems that help optimize millions of transactions every day? Are you excited by the prospect of analyzing and modeling terabytes of data to solve real world problems? Do you like to own end-to-end business problems/metrics and directly impact the profitability of the company? Do you like to innovate and simplify? If yes, then you may be a great fit to join the Machine Learning and Data Sciences team for India Consumer Businesses. If you have an entrepreneurial spirit, know how to deliver, love to work with data, are deeply technical, highly innovative and long for the opportunity to build solutions to challenging problems that directly impact the company's bottom-line, we want to talk to you. Major responsibilities - Use machine learning and analytical techniques to create scalable solutions for business problems - Analyze and extract relevant information from large amounts of Amazon’s historical business data to help automate and optimize key processes - Design, development, evaluate and deploy innovative and highly scalable models for predictive learning - Research and implement novel machine learning and statistical approaches - Work closely with software engineering teams to drive real-time model implementations and new feature creations - Work closely with business owners and operations staff to optimize various business operations - Establish scalable, efficient, automated processes for large scale data analyses, model development, model validation and model implementation - Mentor other scientists and engineers in the use of ML techniques Key job responsibilities Use machine learning and analytical techniques to create scalable solutions for business problems Analyze and extract relevant information from large amounts of Amazon’s historical business data to help automate and optimize key processes Design, develop, evaluate and deploy, innovative and highly scalable ML models Work closely with software engineering teams to drive real-time model implementations Work closely with business partners to identify problems and propose machine learning solutions Establish scalable, efficient, automated processes for large scale data analyses, model development, model validation and model maintenance Work proactively with engineering teams and product managers to evangelize new algorithms and drive the implementation of large-scale complex ML models in production Leading projects and mentoring other scientists, engineers in the use of ML techniques About the team International Machine Learning Team is responsible for building novel ML solutions that attack India first (and other Emerging Markets across MENA and LatAm) problems and impact the bottom-line and top-line of India business. Learn more about our team from https://www.amazon.science/working-at-amazon/how-rajeev-rastogis-machine-learning-team-in-india-develops-innovations-for-customers-worldwide
IN, KA, Bengaluru
Do you want to join an innovative team of scientists who use machine learning and statistical techniques to create state-of-the-art solutions for providing better value to Amazon’s customers? Do you want to build and deploy advanced algorithmic systems that help optimize millions of transactions every day? Are you excited by the prospect of analyzing and modeling terabytes of data to solve real world problems? Do you like to own end-to-end business problems/metrics and directly impact the profitability of the company? Do you like to innovate and simplify? If yes, then you may be a great fit to join the Machine Learning and Data Sciences team for India Consumer Businesses. If you have an entrepreneurial spirit, know how to deliver, love to work with data, are deeply technical, highly innovative and long for the opportunity to build solutions to challenging problems that directly impact the company's bottom-line, we want to talk to you. Major responsibilities - Use machine learning and analytical techniques to create scalable solutions for business problems - Analyze and extract relevant information from large amounts of Amazon’s historical business data to help automate and optimize key processes - Design, development, evaluate and deploy innovative and highly scalable models for predictive learning - Research and implement novel machine learning and statistical approaches - Work closely with software engineering teams to drive real-time model implementations and new feature creations - Work closely with business owners and operations staff to optimize various business operations - Establish scalable, efficient, automated processes for large scale data analyses, model development, model validation and model implementation - Mentor other scientists and engineers in the use of ML techniques A day in the life You will solve real-world problems by getting and analyzing large amounts of data, generate insights and opportunities, design simulations and experiments, and develop statistical and ML models. The team is driven by business needs, which requires collaboration with other Scientists, Engineers, and Product Managers across the International Emerging Stores organization. You will prepare written and verbal presentations to share insights to audiences of varying levels of technical sophistication. About the team Central Machine Learning team works closely with the IES business and engineering teams in building ML solutions that create an impact for Emerging Marketplaces. This is a great opportunity to leverage your machine learning and data mining skills to create a direct impact on millions of consumers and end users.
IN, KA, Bengaluru
Do you want to join an innovative team of scientists who use machine learning and statistical techniques to create state-of-the-art solutions for providing better value to Amazon’s customers? Do you want to build and deploy advanced ML systems that help optimize millions of transactions every day? Are you excited by the prospect of analyzing and modeling terabytes of data to solve real-world problems? Do you like to own end-to-end business problems/metrics and directly impact the profitability of the company? Do you like to innovate and simplify? If yes, then you may be a great fit to join the Machine Learning team for India Consumer Businesses. Machine Learning, Big Data and related quantitative sciences have been strategic to Amazon from the early years. Amazon has been a pioneer in areas such as recommendation engines, ecommerce fraud detection and large-scale optimization of fulfillment center operations. As Amazon has rapidly grown and diversified, the opportunity for applying machine learning has exploded. We have a very broad collection of practical problems where machine learning systems can dramatically improve the customer experience, reduce cost, and drive speed and automation. These include product bundle recommendations for millions of products, safeguarding financial transactions across by building the risk models, improving catalog quality via extracting product attribute values from structured/unstructured data for millions of products, enhancing address quality by powering customer suggestions We are developing state-of-the-art machine learning solutions to accelerate the Amazon India growth story. Amazon India is an exciting place to be at for a machine learning practitioner. We have the eagerness of a fresh startup to absorb machine learning solutions, and the scale of a mature firm to help support their development at the same time. As part of the India Machine Learning team, you will get to work alongside brilliant minds motivated to solve real-world machine learning problems that make a difference to millions of our customers. We encourage thought leadership and blue ocean thinking in ML. Key job responsibilities Use machine learning and analytical techniques to create scalable solutions for business problems Analyze and extract relevant information from large amounts of Amazon’s historical business data to help automate and optimize key processes Design, develop, evaluate and deploy, innovative and highly scalable ML models Work closely with software engineering teams to drive real-time model implementations Work closely with business partners to identify problems and propose machine learning solutions Establish scalable, efficient, automated processes for large scale data analyses, model development, model validation and model maintenance Work proactively with engineering teams and product managers to evangelize new algorithms and drive the implementation of large-scale complex ML models in production Leading projects and mentoring other scientists, engineers in the use of ML techniques About the team International Machine Learning Team is responsible for building novel ML solutions that attack India first (and other Emerging Markets across MENA and LatAm) problems and impact the bottom-line and top-line of India business. Learn more about our team from https://www.amazon.science/working-at-amazon/how-rajeev-rastogis-machine-learning-team-in-india-develops-innovations-for-customers-worldwide