The "Amazon Redshift re-invented" research paper will be presented at a leading database conference next month. Two of the paper's authors, Rahul Pathak (top right), vice president of analytics at AWS, and Ippokratis Pandis (bottom right), an AWS senior principal engineer, discuss the origins of Redshift, how the system has evolved in the past decade, and where they see the service evolving in the years ahead.

Amazon Redshift: Ten years of continuous reinvention

Two authors of the Amazon Redshift research paper, which will be presented at a leading international forum for database researchers, reflect on how far the first petabyte-scale cloud data warehouse has advanced since it was announced ten years ago.

Nearly ten years ago, in November 2012 at the first-ever Amazon Web Services (AWS) re:Invent, Andy Jassy, then AWS senior vice president, announced the preview of Amazon Redshift, the first fully managed, petabyte-scale cloud data warehouse. The service represented a significant leap forward from traditional on-premises data warehousing solutions, which were expensive, inflexible, and required significant human and capital resources to operate.

In a blog post on November 28, 2012, Werner Vogels, Amazon chief technology officer, highlighted the news: “Today, we are excited to announce the limited preview of Amazon Redshift, a fast and powerful, fully managed, petabyte-scale data warehouse service in the cloud.”

Further in the post, Vogels added, “The result of our focus on performance has been dramatic. Amazon.com’s data warehouse team has been piloting Amazon Redshift and comparing it to their on-premise data warehouse for a range of representative queries against a two billion row data set. They saw speedups ranging from 10x – 150x!”

That’s why, on the day of the announcement, Rahul Pathak, then a senior product manager, and the entire Amazon Redshift team were confident the product would be popular.

“But we didn’t really understand how popular,” he recalls.

“At preview we asked customers to sign up and give us some indication of their data volume and workloads,” Pathak, now vice president of Relational Engines at AWS, said. “Within about three days we realized that we had ten times more demand for Redshift than we had planned for the entire first year of the service. So we scrambled right after re:Invent to accelerate our hardware orders to ensure we had enough capacity on the ground for when the product became generally available in early 2013. If we hadn’t done that preview, we would have been caught short.”

The Redshift team has been sprinting to keep pace with customer demand ever since. Today, the service is used by tens of thousands of customers to process exabytes of data daily. In June, a subset of the team will present the paper “Amazon Redshift re-invented” at a leading international forum for database researchers, practitioners, and developers, the ACM SIGMOD/PODS Conference in Philadelphia.

The paper highlights four key areas where Amazon Redshift has evolved in the past decade, provides an overview of the system architecture, describes its high-performance transactional storage and compute layers, details how smart autonomics are provided, and discusses how AWS and Redshift make it easy for customers to use the best set of services to meet their needs.

Amazon Science recently connected with two of the paper’s authors, Pathak and Ippokratis Pandis, an AWS senior principal engineer, to discuss the origins of Redshift, how the system has evolved over the past decade, and where they see the service evolving in the years ahead.

  1. Q. 

    Can you provide some background on the origin story for Redshift? What were customers seeking, and how did the initial version address those needs?

    A. 

    Rahul: We had been meeting with customers who in the years leading up to the launch of Amazon Redshift had moved just about every workload they had to the cloud except for their data warehouse. In many cases, it was the last thing they were running on premises, and they were still dealing with all of the challenges of on-premises data warehouses. They were expensive, had punitive licensing, were hard to scale, and customers couldn’t analyze all of their data. Customers told us they wanted to run data warehousing at scale in the cloud, that they didn’t want to compromise on performance or functionality, and that it had to be cost-effective enough for them to analyze all of their data.

    So, this is what we started to build, operating under the code name Cookie Monster. This was at a time when customers’ data volumes were exploding, and not just from relational databases, but from a wide variety of sources. One of our early private beta customers tried it and the results came back so fast they thought the system was broken. It was about 10 to 20 times faster than what they had been using before. Another early customer was pretty unhappy with gaps in our early functionality. When I heard about their challenges, I got in touch, understood their feedback, and incorporated it into the service before we made it generally available in February 2013. This customer soon turned into one of our biggest advocates.

    When we launched the service and announced our pricing at $1,000 per terabyte per year, people just couldn’t believe we could offer a product with that much capability at such a low price point. The fact that you could provision a data warehouse in minutes instead of months also caught everyone’s attention. It was a real game-changer for this industry segment.

    Ippokratis: I was at IBM Research at the time working on database technologies there, and we recognized that providing data warehousing as a cloud service was a game changer. It was disruptive. We were working with customers’ on-premises systems where it would take us several days or weeks to resolve an issue, whereas with a cloud data warehouse like Redshift, it would take minutes. It was also apparent that the rate of innovation would accelerate in the cloud.

    In the on-premises world, it was taking months if not years to get new functionality into a software release, whereas in the cloud new capabilities could be introduced in weeks, without customers having to change a single line of code in their consuming applications. The Redshift announcement was an inflection point; I got really interested in the cloud, and cloud data warehouses, and eventually joined Amazon [Ippokratis joined the Redshift team as a principal engineer in Oct. 2015].

  2. Q. 

    How has Amazon Redshift evolved since its launch nearly 10 years ago?

    A. 

    Ippokratis: As we highlight in the paper, the service has evolved at a rapid pace in response to customers’ needs. We focused on four main areas: 1) customers’ demand for high-performance execution of increasingly complex analytical queries; 2) our customers’ need to process more data and significantly increase the number of users who need to derive insights from that data; 3) customers’ need for us to make the system easier to use; and 4) our customers’ desire to integrate Redshift with other AWS services, and the AWS ecosystem. That’s a lot, so we’ll provide some examples across each dimension.

    Offering the leading price performance has been our primary focus since Rahul first began working on what would become Redshift. From the beginning, the team has focused on making core query execution latency as low as possible so customers can run more workloads, issue more jobs into the system, and run their daily analysis. To do this, Redshift generates highly optimized C++ code, sends it to the distributor in the parallel database, and executes it. This makes Redshift unique in the way it executes queries, and it has always been the core of the service.
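
The code-generation approach described above can be pictured with a toy example. The sketch below is written in Python rather than C++ and is not Redshift's implementation; the two-column table, the in-memory "slices", and the names `generate_query_source`, `compile_query`, and `run_on_slices` are all assumptions made for illustration. It emits source code specialized to one query, compiles it once, and runs the compiled function over each partition of the data.

```python
from typing import Callable, Iterable, List, Tuple

Row = Tuple[int, float]  # (region_id, amount): a hypothetical two-column table


def generate_query_source(min_amount: float, region_id: int) -> str:
    """Emit source specialized for one query:
    SELECT SUM(amount) FROM sales WHERE amount >= ? AND region_id = ?
    The constants are baked into the generated code, so nothing is
    interpreted per row at run time."""
    return (
        "def run(rows):\n"
        "    total = 0.0\n"
        "    for region, amount in rows:\n"
        f"        if amount >= {min_amount!r} and region == {region_id!r}:\n"
        "            total += amount\n"
        "    return total\n"
    )


def compile_query(source: str) -> Callable[[Iterable[Row]], float]:
    """Compile the generated source once and return the specialized function."""
    namespace: dict = {}
    exec(compile(source, "<generated-query>", "exec"), namespace)
    return namespace["run"]


def run_on_slices(query: Callable[[Iterable[Row]], float],
                  slices: List[List[Row]]) -> float:
    """Stand-in for shipping compiled code to parallel workers:
    run it on every slice and combine the partial sums."""
    return sum(query(rows) for rows in slices)


# Two hypothetical data slices, as if the table were spread over two nodes.
slices = [
    [(1, 10.0), (2, 99.0), (1, 3.0)],
    [(1, 25.0), (1, 5.0)],
]
query = compile_query(generate_query_source(min_amount=5.0, region_id=1))
result = run_on_slices(query, slices)
print(result)
```

The design point is that the filter constants and column positions are fixed when the code is generated, so the per-row loop does only the work this particular query needs.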

    We have never stopped innovating here to deliver our customers the best possible performance. Another thing that’s been interesting to me is that in the traditional business intelligence (BI) world, you optimize your system for very long-running jobs. But as we observe the behavior of our customers in aggregate, what’s surprising is that of the billions of queries we run daily in our service, 90 percent execute in less than one second. That’s not what people had traditionally expected from a data warehouse, and that has changed the areas of the code that we optimize.

    Rahul: As Ippokratis mentioned, the second area we focused on in the paper was customers’ need to process more data and to use that data to drive value throughout the organization. Analytics has always been super important, but eight or ten years ago it wasn’t necessarily mission critical for customers in the same way transactional databases were. That has definitely shifted. Today, core business processes rely on Redshift being highly available and performant. The biggest architectural change in the past decade in support of this goal was the introduction of Redshift Managed Storage, which allowed us to separate compute and storage, and focus a lot of innovation in each area.

    Diagram of the Redshift Managed Storage
    The Redshift managed storage layer (RMS) is designed for a durability of 99.999999999% and 99.99% availability over a given year, across multiple availability zones. RMS manages both user data and transaction metadata.
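
As a rough worked example of what these design targets mean, the arithmetic below converts the two percentages into yearly figures. It assumes a 365-day year and, for durability, reads the figure as an annual per-object loss probability applied to a hypothetical ten million stored objects. The helper names and that interpretation are assumptions for illustration, not statements from the paper.

```python
from fractions import Fraction

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def unavailable_minutes_per_year(availability_percent: str) -> Fraction:
    """Minutes per year a service may be unavailable at the given availability."""
    return (1 - Fraction(availability_percent) / 100) * MINUTES_PER_YEAR


def expected_objects_lost_per_year(durability_percent: str, object_count: int) -> Fraction:
    """Expected yearly losses, treating durability as a per-object annual probability."""
    return (1 - Fraction(durability_percent) / 100) * object_count


downtime = unavailable_minutes_per_year("99.99")
losses = expected_objects_lost_per_year("99.999999999", 10_000_000)

print(float(downtime))  # minutes of unavailability allowed per year
print(float(losses))    # expected objects lost per year out of ten million
```

Under those assumptions, 99.99% availability leaves room for about 52.6 minutes of unavailability per year, and the durability target corresponds to an expected loss of one object in ten thousand years across ten million objects.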

    Another big trend has been the desire of customers to query across and integrate disparate datasets. Redshift was the first data warehouse in the cloud to query Amazon S3 data, with Redshift Spectrum in 2017. Then we demonstrated the ability to run a query that scanned an exabyte of data in S3 as well as data in the cluster. That was a game changer.

    Customers like NASDAQ have used this extensively to query data that’s on local disk for the highest performance, but also take advantage of Redshift’s ability to integrate with the data lake and query their entire history of data with high performance. In addition to querying the data lake, integrated querying of transactional data stores like Aurora and RDS has been another big innovation, so customers can really have a high-performance analytics system that’s capable of transparently querying all of the data that matters to them without having to manage these complex integration processes that other systems require.

    Illustration of how a query flows through Redshift.
    This diagram from the research paper illustrates how a query flows through Redshift. The sequence is described in detail on pages 2 and 3 of the paper.

    Ippokratis: The third area we focused on in the paper was ease of use. One change that stands out for me is that on-premises data warehousing required IT departments to have a DBA (database administrator) who would be responsible for maintaining the environment. Over the past decade, the expectation from customers has evolved. Now, if you are offering data warehousing as a service, the systems must be capable of auto tuning, auto healing, and auto optimizing. This has become a big area of focus for us where we incorporate machine learning and automation into the system to make it easier to use, and to reduce the amount of involvement required of administrators.

    Rahul: In terms of ease of use, three innovations come to mind. One is concurrency scaling. Similar to workload management, customers would previously have to manually tweak concurrency, reset clusters, or manually split workloads. Now, the system automatically provisions new resources and scales up and down without customers having to take any action. This is a great example of how Redshift has gotten much more dynamic and elastic.

    The second ease-of-use innovation is automated table optimization. This is another place where the system is able to observe workloads and data layouts and automatically suggest how data should be sorted and distributed across nodes in the cluster. This is great because workloads are never static in time, and it’s a continuously learning system.
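
A toy version of the distribution choice described above can make the trade-off concrete. The sketch below is a brute-force search over a hypothetical three-table schema, not the optimizer Redshift uses; the table names, the candidate columns, and the "rows moved" weights are invented for illustration. It treats a join as free when both tables are distributed on the join columns, and otherwise charges the rows that would have to cross the network.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# (left_table, left_column, right_table, right_column, rows_moved_if_not_colocated)
Join = Tuple[str, str, str, str, int]


def network_cost(assignment: Dict[str, str], joins: List[Join]) -> int:
    """Total rows shuffled for a workload under a given distribution-key choice."""
    cost = 0
    for left, left_col, right, right_col, rows_moved in joins:
        colocated = assignment[left] == left_col and assignment[right] == right_col
        if not colocated:
            cost += rows_moved
    return cost


def best_distribution_keys(candidates: Dict[str, List[str]],
                           joins: List[Join]) -> Tuple[Dict[str, str], int]:
    """Try every combination of candidate keys and keep the cheapest one."""
    tables = sorted(candidates)
    best: Optional[Dict[str, str]] = None
    best_cost: Optional[int] = None
    for combo in product(*(candidates[t] for t in tables)):
        assignment = dict(zip(tables, combo))
        cost = network_cost(assignment, joins)
        if best_cost is None or cost < best_cost:
            best, best_cost = assignment, cost
    assert best is not None and best_cost is not None
    return best, best_cost


# Hypothetical workload: orders joins customers far more heavily than products.
candidates = {
    "orders": ["customer_id", "product_id"],
    "customers": ["customer_id"],
    "products": ["product_id"],
}
joins = [
    ("orders", "customer_id", "customers", "customer_id", 900),
    ("orders", "product_id", "products", "product_id", 300),
]
keys, cost = best_distribution_keys(candidates, joins)
print(keys, cost)
```

Exhaustive search only works for a handful of tables. If the join weights change as the workload shifts, rerunning the search can pick a different key, which is the reason a static, hand-chosen layout goes stale.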

    Customers are always adding more datasets, and adding more users, so what was optimal yesterday might not be optimal tomorrow. Redshift observes this and modifies what's happening under the covers to balance that. This was the focus of a really interesting graph optimization paper that we wrote a few years ago about how to analyze for optimal distribution keys for how data is laid out within a multi-node parallel-processing system. We've coupled this with automated optimization and then table encoding. In an analytics system, how you compress data has a big impact because the less data you scan, the faster your queries go. Customers had to reason about this in the past. Now Redshift can automatically determine how to encode data correctly to deliver the best possible performance for the data and the workload.
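
To illustrate why the encoding choice depends on the data, the sketch below estimates the stored size of a column under three simplified schemes and picks the smallest. It is a toy model, not Redshift's encoding logic: the scheme names, the 4-byte run counter, the 2-byte dictionary code, and the one-byte-per-character sizing are all assumptions chosen to keep the arithmetic easy to follow.

```python
from itertools import groupby
from typing import Dict, List, Tuple


def raw_size(values: List[str]) -> int:
    """Bytes to store every value as-is (one byte per character assumed)."""
    return sum(len(v) for v in values)


def run_length_size(values: List[str], count_bytes: int = 4) -> int:
    """Bytes to store one (value, run length) pair per run of equal neighbors."""
    return sum(len(value) + count_bytes for value, _run in groupby(values))


def dictionary_size(values: List[str], code_bytes: int = 2) -> int:
    """Bytes to store each distinct value once plus a fixed-size code per row."""
    return sum(len(v) for v in set(values)) + code_bytes * len(values)


def choose_encoding(values: List[str]) -> Tuple[str, Dict[str, int]]:
    """Return the scheme with the smallest estimated size, plus all estimates."""
    sizes = {
        "raw": raw_size(values),
        "run_length": run_length_size(values),
        "dictionary": dictionary_size(values),
    }
    best = min(sizes, key=lambda name: sizes[name])
    return best, sizes


# Three hypothetical columns with the same row count but different shapes.
sorted_status = ["shipped"] * 1000 + ["pending"] * 1000   # long runs
mixed_status = ["shipped", "pending"] * 1000               # few values, no runs
unique_ids = [f"id{i:06d}" for i in range(2000)]           # all distinct

for name, column in [("sorted_status", sorted_status),
                     ("mixed_status", mixed_status),
                     ("unique_ids", unique_ids)]:
    print(name, choose_encoding(column))
```

The same 2,000 rows favor a different scheme in each case, which is the kind of per-column reasoning customers previously had to do by hand.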

    The third innovation I want to highlight here is Amazon Redshift Serverless, which we launched in public preview at re:Invent last fall. Redshift Serverless removes all of the management of instances and clusters, so customers can focus on getting to insights from data faster and not spend time managing infrastructure. With Redshift Serverless, customers can simply provision an endpoint and begin to interact with their data, and Redshift Serverless will auto scale and automatically manage the system to essentially remove all of that complexity from customers.

    Customers can just focus on their data, set limits to manage their budgets, and we deliver optimal performance between those limits. This is another massive step forward in terms of ease of use because it eliminates any operations for customers. The early response to the preview has been tremendous. Thousands of customers have been excited to put Amazon Redshift Serverless through its paces over the past few months, and we’re excited about making it generally available in the near future.

    Amazon Redshift architecture diagram
    The Amazon Redshift architecture as presented in the research paper.

    Ippokratis: A fourth area of focus in the paper is on integration with other AWS services, and the AWS ecosystem. Integration is another area where customer behavior has evolved from traditional BI use cases. Today, cloud data warehouses are a central hub with tight integration with a broader set of AWS services. We provided the ability for customers to join data from the warehouse with the data lake. Then customers said they needed access to high-velocity business data in operational databases like Aurora and RDS, so we provided access to these operational data stores. Then we added support for streams, as well as integration with SageMaker and Lambda so customers can run machine learning training and inference without moving their data, and do generic compute. As a result, we’ve converted the traditional BI system into a well-integrated set of AWS services.

    Rahul: One big area of integration has been with our machine-learning ecosystem. With Redshift ML we have enabled anyone who knows SQL to take advantage of all of our machine-learning innovation. We built the ability to create a model from the SQL prompt, which gets the data into Amazon S3 and calls Amazon SageMaker, to use automated machine learning to build the most appropriate model to provide predictions on the data.

    This model is compiled efficiently and brought back into the data warehouse for customers to run very high-performance parallel inferences with no additional compute and no extra cost. The beauty of this integration is that every innovation we make within SageMaker means that Redshift ML gets better as well. This is just another means by which customers benefit from us connecting our services together.

    Another big area for integration has been data sharing. Once we separated storage and compute layers with RA3 instances, we could enable data sharing, giving customers the ability to share data with clusters in the same account, in other accounts, or across regions. This allows us to separate consumers from producers of data, which enables things like modern data mesh architectures. Customers can share data without copying it, so the data stays transactionally consistent across accounts.

    For example, users within a data-science organization can securely work from the shared data, as can users within the reporting or marketing organization. We’ve also integrated data sharing with AWS Data Exchange, so now customers can search for — and subscribe to — third-party datasets that are live, up to date, and can be queried immediately in Redshift. This has been another game changer from the perspective of setting data free, enabling data monetization for third-party providers, and secure and live data access and licensing for subscribers for high-performance analytics within and across organizations. The fact that Redshift is part of an incredibly rich data ecosystem is a huge win for customers, and in keeping with customers’ desire to make data more pervasively available across the company.

  3. Q. 

    You indicate in the paper that Redshift innovation is continuing at an accelerated pace. How do you see the cloud data warehouse segment evolving – and more specifically Redshift – over the next several years?

    A. 

    Rahul: A few things will continue to be true as we head into the future. Customers will be generating ever-larger amounts of data, and they’re going to want to analyze that data more cost effectively. Data volumes are growing exponentially, but obviously customers don't want their costs growing exponentially. This requires that we continue to innovate, and find new levels of performance to ensure that the cost of processing a unit of data continues to go down.

    We’ll continue innovating in software, in hardware, in silicon, and in using machine learning to make sure we deliver on that promise for customers. We’ve delivered on that promise for the past 10 years, and we’ll focus on making sure we deliver on that promise into the future.

    Also, customers are always going to want better availability, they’re always going to want their data to be secure, and they’re always going to want more integrations with more data sources, and we intend to continue to deliver on all of those. What will stay the same is our ability to offer the best-in-segment price performance and capabilities, and the best-in-segment integration and security because they will always deliver value for customers.

    Ippokratis: It has been an incredible journey; we have been rebuilding the plane as we’ve been flying it with customers onboard, and this would not have happened without the support of AWS leadership, but most importantly the tremendous engineers, managers, and product people who have worked on the team.

    As we did in the paper, I want to recognize the contributions of Nate Binkert and Britt Johnson, who have passed, but whose words of wisdom continue to guide us. We’ve taken data warehousing, what we learned from books in school [Ippokratis earned his PhD in electrical and computer engineering from Carnegie Mellon University], and brought it to the cloud. In the process, we’ve been able to innovate, and write new pages in the book. I’m very proud of what the team has accomplished, but equally excited about all the things we’re going to do to improve Redshift in the future.

Research areas

Related content

US, CA, San Diego
Do you want to join an innovative team of scientists who use deep learning, natural language processing, large language models to help Amazon provide the best seller experience across the entire Seller life cycle, including recruitment, growth, support and provide the best customer and seller experience by automatically mitigating risk? Do you want to build advanced algorithmic systems that help manage the trust and safety of millions of customer interactions every day? Are you excited by the prospect of analyzing and modeling terabytes of data and creating state-of-the-art algorithms to solve real world problems? Are you excited by the opportunity to leverage GenAI and innovate on top of the state-of-the-art large language models to improve customer and seller experience? Do you like to build end-to-end business solutions and directly impact the profitability of the company? Do you like to innovate and simplify processes? If yes, then you may be a great fit to join the Machine Learning Accelerator team in the Amazon Selling Partner Services (SPS) group. Key job responsibilities The scope of an Applied Scientist III in the Selling Partner Services (SPS) Machine Learning Accelerator (MLA) team is to research and prototype Machine Learning applications that solve strategic business problems across SPS domains. Additionally, the scientist collaborates with engineers and business partners to design and implement solutions at scale when they are determined to be of broad benefit to SPS organizations. They develop large-scale solutions for high impact projects, introduce tools and other techniques that can be used to solve problems from various perspectives, and show depth and competence in more than one area. They influence the team’s technical strategy by making insightful contributions to the team’s priorities, approach and planning. 
They develop and introduce tools and practices that streamline the work of the team, and they mentor junior team members and participate in hiring. We are open to hiring candidates to work out of one of the following locations: San Diego, CA, USA
US, WA, Seattle
Amazon is looking for a strategic, innovative science leader within the Global Talent and Compensation (GTMC) organization to lead an interdisciplinary team charged with developing data-driven solutions to model, automate, and inform high judgement decision making by bringing together science and technology in consumer grade internal talent products. GTMC delivers employee-focused experiences by providing scalable and responsive mechanisms for employees, as well as listening and signaling mechanisms for managers and leaders. They do this through intelligent, flexible, and extensible products and scalable data and science services. They set out to deliver a singular experience supporting multiple employee talent journeys (e.g., onboarding, evaluation, compensation, movement, promotion, exit), to generate and capture signals from product data, surface outliers, increase personalization, and improve the efficacy of “next best action” recommendations, for 1.6 million Amazonians around the world. In this role you will lead multiple research teams across the disciplines of Talent Management, Diversity Equity and Inclusion, and Compensation. You will interface with the most senior leaders at Amazon to develop and deliver on a strategic research roadmap that crosses all lines of Amazon businesses (e.g., Consumer, AWS, Devices, Advertising). This role will then partner with engineering and product management leader to deliver the outcomes of this research in production environments. Successful candidates will have an established background expertise in machine learning with some experience in applying this expertise to the fields of talent management, product management and/or software development. We are open to hiring candidates to work out of one of the following locations: Seattle, WA, USA
IN, KA, Bangalore
Are you interested in changing the Digital Reading Experience? We are from Kindle Books Team looking for a set of Scientists to take the reading experience in Kindle to next level with a set of innovations! We envision Kindle as the place where readers find the best manifestation of all written content optimized with features that enable them to get the most out of reading, and creators are able to realize their vision to customers quickly and at scale. Every time customers open their content, regardless of surface, they start or restart their reading in a familiar, useful and engaging place. We achieve this by building a strong foundation of core experiences and act as a force multiplier and partner for content creators (directly or indirectly) to easily innovate on top of Kindle's purpose built content experience stack in a simple and extensible way. We will achieve this by providing a best-in-class reading experience, unique content experiences, and remaining agile in meeting the evolving needs and preferences of our users. Our goal is to foster long-lasting reading habits and make us the preferred destination for enriching literary experiences. We are building a In The Book Science team and looking for Scientists, who are passionate about Reading and are willing to take Reading to the next level. Every Book is a complex structure with different entities, layout, format and semantics, with more than 17MM eBooks in our catalog. We are looking for experts in all domains like core NLP, Generative AI, CV and Deep Learning Techniques for unlocking capabilities like analysis, enhancement, curation, moderation, translation, transformation and generation in Books based on Content structure, features, Intent & Synthesis. Scientists will focus on Inside the book content and semantically learn the different entities to enhance the Reading experience overall (Kindle & beyond). 
They have an opportunity to influence in 2 major phases of life-cycle - Publishing (Creation of Books process) and Reading experience (building engaging features & representation in the book thereby driving reading engagement). Key job responsibilities - 5+ years of building machine learning models for business application experience - PhD, or Master's degree and 6+ years of applied research experience - Knowledge of programming languages such as C/C++, Python, Java or Perl - Experience programming in Java, C++, Python or related language - You have expertise in one of the applied science disciplines, such as machine learning, natural language processing, computer vision, Deep learning - You are able to use reasonable assumptions, data, and customer requirements to solve problems. - You initiate the design, development, execution, and implementation of smaller components with input and guidance from team members. - You work with SDEs to deliver solutions into production to benefit customers or an area of the business. - You assume responsibility for the code in your components. You write secure, stable, testable, maintainable code with minimal defects. - You understand basic data structures, algorithms, model evaluation techniques, performance, and optimality tradeoffs. - You follow engineering and scientific method best practices. You get your designs, models, and code reviewed. You test your code and models thoroughly - You participate in team design, scoping and prioritization discussions. You are able to map a business goal to a scientific problem and map business metrics to technical metrics. - You invent, refine and develop your solutions to ensure they are meeting customer needs and team goals. You keep current with research trends in your area of expertise and scrutinize your results. 
- Experience in mentoring junior scientists A day in the life You will be working with a group of talented scientists on researching algorithm and running experiments to test solutions to improve our experience. This will involve collaboration with partner teams including engineering, PMs, data annotators, and other scientists to discuss data quality, model development and productionizing the same. You will mentor other scientists, review and guide their work, help develop roadmaps for the team. We are open to hiring candidates to work out of one of the following locations: Banagalore, KA, IND | Bangalore, IND | Bangalore, KA, IND
IN, KA, Bangalore