Scaling graph-neural-network training with CPU-GPU clusters

In tests, the new approach is 15 to 18 times as fast as its predecessors.

Graphs are a useful way to represent data, since they capture connections between data items, and graph neural networks (GNNs) are an increasingly popular way to work on graphs. Common industry applications of GNNs include recommendation, search, and fraud detection.

The graphs used in industry applications are usually massive, with billions of nodes and hundreds of billions or even trillions of edges. Training GNNs on graphs of this scale requires massive memory storage and computational power, with correspondingly long training times and large energy footprints.


In a paper we’re presenting at this year’s KDD, my colleagues and I describe a new approach to distributed training of GNNs that uses both CPUs and GPUs, optimizing the allocation of tasks to different processor types to minimize training times.

In tests, our approach, DistDGLv2, offered an 18-fold speedup over Euler, another distributed GNN training framework, on the same hardware. DistDGLv2 also achieved a speedup of up to 15-fold over distributed CPU training in a cluster of the same size.

Graph neural networks

In the GNN setting, graph nodes typically represent objects, and the graph edges represent relationships between objects. Both nodes and edges may have associated features — data such as object properties or types of relationships between objects.


For each node in a graph, a GNN produces a vector representation (an embedding) that encodes information about the node and its neighborhood (often its one- or two-hop neighborhood, but sometimes larger regions). With the large graphs common in industrial applications, it can be time-consuming to factor in all of a node’s one-hop neighbors, let alone its more distant neighbors. So when producing node embeddings, GNNs often use minibatches of nodes sampled from the target node’s neighborhood.
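
To make this concrete, here is a minimal sketch, in PyTorch, of how a single GraphSAGE-style layer turns a target node’s features and its sampled neighbors’ features into an embedding. This is not DistDGLv2 code; the layer name, tensor shapes, and dimensions are our own illustrative choices.

```python
# Minimal sketch of one GraphSAGE-style mean-aggregation layer over sampled
# neighbors (illustrative only; names and shapes are assumptions).
import torch
import torch.nn as nn

class MeanSAGELayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(2 * in_dim, out_dim)

    def forward(self, self_feat, neigh_feat):
        # self_feat:  (batch, in_dim)          features of the target nodes
        # neigh_feat: (batch, fanout, in_dim)  features of their sampled neighbors
        agg = neigh_feat.mean(dim=1)              # aggregate the sampled neighborhood
        h = torch.cat([self_feat, agg], dim=1)    # combine self and neighborhood info
        return torch.relu(self.linear(h))         # embedding of each target node

# Toy usage: 4 target nodes, 10 sampled neighbors each, 16-dimensional features.
layer = MeanSAGELayer(16, 32)
embeddings = layer(torch.randn(4, 16), torch.randn(4, 10, 16))   # shape (4, 32)
```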

There is a large body of research on minibatch sampling; one example is our global-neighbor-sampling technique, presented at KDD 2021. In our new paper, we implement the popular minibatch-sampling algorithm proposed in GraphSAGE, shown in the figure below. It first samples the target nodes (such as the blue node) and then samples their neighbors (such as the red and orange nodes). DistDGLv2, however, has the flexibility to implement other sampling algorithms.

An example of the minibatch sampling procedure.
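
For illustration, the sketch below implements this two-level sampling scheme in plain Python on an adjacency-list graph. It follows the spirit of the GraphSAGE sampler but is not the DistDGLv2 implementation, and all function and variable names are our own.

```python
# Illustrative two-hop neighbor sampling: pick seed (target) nodes, then sample a
# fixed fanout of neighbors per hop. Not the DistDGLv2 implementation.
import random

def sample_minibatch(adj, train_nodes, batch_size, fanouts):
    """adj: dict node -> list of neighbors; fanouts: neighbors per hop, e.g. [10, 25]."""
    seeds = random.sample(train_nodes, batch_size)    # target nodes (the "blue" node)
    layers = [seeds]
    frontier = seeds
    for fanout in fanouts:                            # one-hop ("red"), then two-hop ("orange")
        sampled = []
        for u in frontier:
            neighbors = adj[u]
            sampled.extend(random.sample(neighbors, min(fanout, len(neighbors))))
        frontier = list(set(sampled))
        layers.append(frontier)
    return layers                                     # the nodes needed at each hop

# Toy usage: a ring graph with 100 nodes, 4 seeds per batch, fanout 2 per hop.
adj = {i: [(i - 1) % 100, (i + 1) % 100] for i in range(100)}
batch = sample_minibatch(adj, list(range(100)), batch_size=4, fanouts=[2, 2])
```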

DistDGLv2

DistDGLv2 has three main components:

  • a distributed key-value database (KVStore) to store node/edge features and learnable embeddings;
  • a distributed graph store to keep the partitioned graphs for minibatch sampling; and
  • a set of trainers to run forward and backward computation on minibatches to estimate the gradients of the model parameters.

To optimize the use of computational resources and scale to very large graphs, we divide these components between CPUs and GPUs. The distributed KVStore and graph store use CPU memory, and CPUs generate the minibatches. The trainers read the minibatch data into GPUs for minibatch computations.
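
This division of labor can be sketched as a standard PyTorch training loop. The sketch is illustrative only: `sample_minibatch` stands in for DistDGLv2’s CPU-side sampling and feature lookup, and a linear model stands in for a GNN.

```python
# Sketch of the CPU/GPU split: the CPU side assembles a minibatch, the trainer
# moves it to the GPU for the forward/backward pass. Names are assumptions.
import torch
import torch.nn as nn

def train_loop(sample_minibatch, model, optimizer, num_iters=100, device='cpu'):
    model = model.to(device)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(num_iters):
        feats, labels = sample_minibatch()                    # built on the CPU from the stores
        feats, labels = feats.to(device), labels.to(device)   # minibatch moves to the GPU ...
        loss = loss_fn(model(feats), labels)                  # ... where the computation runs
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Toy usage with random "minibatches" and a linear model standing in for a GNN.
model = nn.Linear(16, 4)
train_loop(lambda: (torch.randn(32, 16), torch.randint(0, 4, (32,))),
           model, torch.optim.Adam(model.parameters()),
           device='cuda' if torch.cuda.is_available() else 'cpu')
```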

Method overview.

The key to accelerating minibatch training in DistDGLv2 is efficiently moving minibatches from CPU to GPU. To do this, DistDGLv2 deploys three strategies:


  • It uses the METIS graph-partitioning program (codeveloped by Amazon senior principal scientist George Karypis) to generate graph partitions with minimal edge cuts, and it collocates data with computation to reduce network communication (a partitioning sketch follows this list);
  • It builds an asynchronous minibatch training pipeline to overlap computation and data movement across all hardware;
  • It moves as many computations as possible to the GPU to take advantage of GPUs’ computational power.
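
As a rough sketch of the first strategy, DGL exposes an offline partitioning utility that calls METIS. The snippet below shows the general shape of such a call; argument names may vary across DGL versions, and the toy graph is our own stand-in.

```python
# Hedged sketch of offline METIS partitioning via DGL's partitioning utility.
import dgl
import torch

# A small random graph standing in for a massive industry graph.
g = dgl.rand_graph(10_000, 200_000)
g.ndata['feat'] = torch.randn(g.num_nodes(), 16)

# Partition into 4 parts with METIS to minimize edge cuts; each partition is later
# served by the KVStore and graph-store servers on one machine.
dgl.distributed.partition_graph(
    g, graph_name='toy_graph', num_parts=4,
    out_path='partitions/', part_method='metis')
```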

To collocate data with computation, DistDGLv2 runs KVStore servers, distributed graph store servers, and trainers on the same set of machines. When a graph partition is loaded, its node and edge features go to the KVStore, and the graph structure goes to the graph store server. Each trainer is assigned a training set in which most training nodes and edges belong to the graph partition stored on the same machine, so most of the data associated with a minibatch comes from the local machine during training.
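
A highly simplified view of that assignment follows; it is not DistDGLv2’s actual logic, and the array and function names are our own. Given a mapping from nodes to partitions, each machine’s trainers take the training nodes whose partition is stored locally.

```python
# Sketch of collocating training nodes with their graph partition (illustrative only).
import numpy as np

def local_training_nodes(train_nodes, node_to_partition, machine_id):
    """Keep the training nodes whose partition lives on this machine, so that most
    neighbor sampling and feature lookups stay local."""
    train_nodes = np.asarray(train_nodes)
    return train_nodes[node_to_partition[train_nodes] == machine_id]

# Toy usage: 10 training nodes spread over 2 partitions/machines.
node_to_partition = np.array([0, 0, 1, 1, 0, 1, 0, 1, 0, 1])
print(local_training_nodes(range(10), node_to_partition, machine_id=0))  # -> [0 1 4 6 8]
```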


DistDGLv2 implements the second and third strategies by splitting the minibatch pipeline into seven stages, five of which help prepare a minibatch. We keep as many stages as possible on the GPU to take advantage of GPUs’ computational power, while running the minibatch sampling stages on the CPU in a separate thread. This allows us to overlap minibatch computation on the GPU with minibatch sampling on the CPU.

As illustrated in the figure below, we run the last four stages on the GPU; some of those stages are still involved in minibatch preparation.
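
One generic PyTorch building block for overlapping data movement with GPU computation is pinned host memory combined with non-blocking host-to-device copies, sketched below. This illustrates the general pattern rather than DistDGLv2’s pipeline code, and the helper name is our own.

```python
# Generic pattern for asynchronous CPU->GPU copies (illustrative only).
import torch

def to_gpu_async(tensors, device='cuda'):
    # pin_memory() places the tensors in page-locked host memory, which is required
    # for the copy to proceed asynchronously while the GPU runs other kernels.
    return [t.pin_memory().to(device, non_blocking=True) for t in tensors]

if torch.cuda.is_available():
    feats, labels = torch.randn(1024, 128), torch.randint(0, 10, (1024,))
    feats_gpu, labels_gpu = to_gpu_async([feats, labels])
    torch.cuda.synchronize()   # make sure the copies have finished before using them
```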

In addition, we further overlap network communication and CPU computation by having the sampling pipeline “look ahead” and sample multiple minibatches simultaneously. Thus, while one minibatch is being generated, a CPU that is waiting on remote neighbor sampling (from another machine) or a feature copy (to a GPU) can move on to another minibatch and sample neighbors or copy data locally. In this way, we effectively hide network communication latency.
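
The sketch below illustrates the look-ahead idea with a thread pool that keeps several minibatches in flight at once. It is a simplified stand-in for DistDGLv2’s sampling pipeline, and all names are our own.

```python
# Look-ahead minibatch prefetching sketch: several batches are sampled concurrently,
# so waiting on remote sampling or feature copies for one batch does not stall training.
from collections import deque
from concurrent.futures import ThreadPoolExecutor

def prefetching_batches(sample_minibatch, num_batches, lookahead=4):
    """Yield minibatches while keeping up to `lookahead` of them being sampled concurrently."""
    with ThreadPoolExecutor(max_workers=lookahead) as pool:
        pending = deque(pool.submit(sample_minibatch)
                        for _ in range(min(lookahead, num_batches)))
        submitted = len(pending)
        while pending:
            batch = pending.popleft().result()      # oldest batch; may wait on remote sampling
            if submitted < num_batches:             # keep the pipeline full
                pending.append(pool.submit(sample_minibatch))
                submitted += 1
            yield batch

# Toy usage: pretend each minibatch takes a little while to sample remotely.
import time
def fake_sample():
    time.sleep(0.05)          # stands in for remote neighbor sampling and feature copies
    return "minibatch"

for batch in prefetching_batches(fake_sample, num_batches=5):
    print(batch)
```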

The minibatch training pipeline, with a blowup of the minibatch generation step.

With these optimizations, DistDGLv2 can effectively perform distributed GNN training in a cluster of CPUs and GPUs. We demonstrate the efficiency of DistDGLv2 on a cluster of g4dn.metal instances with various GNN workloads. DistDGLv2’s performance relative to CPU-only methods indicates that, for distributed GNN minibatch training on massive graphs, GPUs can be more effective than CPUs.

A comparison of minibatch and full-graph training on the same hardware.

Researchers have also proposed full-graph training for GNN models, which runs forward and backward computation on the entire graph. We compared minibatch training and full-graph training on the same graph datasets with the same hardware and found that minibatch training is much more efficient, with the speed gap widening as the graphs grow larger.

On a graph built from the OGBN-papers100M dataset, which has 100 million nodes, minibatch training is about 100 times as fast. After six days of training, full-graph training still cannot reach the accuracy of minibatch training, which attains state-of-the-art performance in 1.5 hours on the same CPU hardware.
