Long-form-video understanding and synthesis

Four CVPR papers from Prime Video examine a broad set of topics related to efficient model training for understanding and synthesizing long-form cinematic content.

At this year’s Conference on Computer Vision and Pattern Recognition (CVPR), Prime Video presented four papers that illustrate the broad range of cutting-edge problems we work on.

In one paper, “Movies2Scenes: Using movie metadata to learn scene representation”, we present a novel contrastive-learning approach that uses only commonly available movie metadata to learn a general-purpose scene representation. On a diverse set of tasks evaluated using multiple benchmark datasets, models that use our representations consistently outperform models using existing state-of-the-art representations.

Notably, our learned representation offers an average improvement of 7.9% on the seven classification tasks and 9.7% on the two regression tasks in the Long-Form Video Understanding (LVU) dataset. This effort is an important step toward the first foundation model for general-purpose movie understanding.

In another paper, “Selective structured state-spaces for long-form video understanding”, we build on the recently proposed structured-state-space-sequence (S4) model, introducing a novel selective S4 (S5) model that employs a lightweight mask generator to adaptively select informative image tokens, resulting in more efficient and accurate modeling of long-term spatiotemporal dependencies in videos. Our approach is consistently more accurate than the previous state-of-the-art model, by as much as 9.6%, while reducing the memory footprint by 23%.


In a third paper, "Dynamic inference with grounding based vision and language models", we explore the problem of computational redundancy in large vision-and-language models, addressing it by dynamically skipping network layers, dropping input tokens, and fusing multimodal tokens, conditioned on the input image-text pair. Our results show that we can improve the run-time efficiency of state-of-the-art models by up to 50% on multiple downstream tasks, with an accuracy drop of only 0.3%.

Lastly, our paper "LEMaRT: Label-efficient masked region transform for image harmonization" addresses the large amounts of labeled data required to train image harmonization models, which modify content from different source images so that it blends together better in composite images. To this end, our method automatically generates training data by simulating the defects in appearance that image harmonization models are expected to remove. Our method outperforms previous state-of-the-art approaches by a margin of 0.4 dB (a mean-squared-error improvement of roughly 9%) when it is fine-tuned on only 50% of the training data from one of the standard benchmarks (iHarmony4), and by 1.0 dB (an MSE improvement of roughly 21%) when it is trained on the full training dataset.

Toward a foundation model for movie understanding

The term “foundation model” generally refers to (i) a single large model that is (ii) trained on large amounts of mostly unlabeled data and can (iii) drive a number of downstream tasks. While several general-purpose visual and textual foundation models exist (e.g., BERT, GPT-4, CLIP, DALL-E 2), no foundation model geared specifically to movie understanding had been proposed before our work.

This is partly because directly applying existing visual or textual foundation models to movie understanding has limited effectiveness, given the large domain gap between cinematic content and the web-crawled images and text used to train those models. Factors such as the limited accessibility of large-scale cinematic content, the computational resources required to process it, and the lack of benchmark datasets for evaluating downstream applications add to the challenge of building a foundation model for movie understanding.


To address these challenges, we propose a novel model trained on over five million scenes, automatically identified from thousands of movies and comprising more than 45 million frames. Our model requires no manual annotations and relies only on commonly available movie-level information (genre, synopsis, etc.). The scene representations from our model can be applied to improve the performance of a diverse set of downstream tasks, a key step toward building a foundation model for movie understanding.

We use movie metadata to define a measure of movie similarity and use that similarity measure to identify data pairs for contrastive learning. In contrastive learning, a model is trained on both positive pairs — examples that are similar in the relevant way — and negative pairs. During training, the model learns to produce data representations that pull positive pairs together and push negative pairs apart.

Often, the positive pairs are created by augmenting existing examples — say, re-cropping them, reversing them, or re-coloring them. By instead using scenes from movies that are considered similar to each other (see below), we ensure that our positive scene pairs are not only visually similar but also semantically coherent, providing a much richer set of geometric and thematic augmentations than traditional augmentation approaches.
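To make this concrete, here is a minimal sketch, in PyTorch, of how a metadata-based similarity measure could feed a contrastive objective. The Jaccard-over-genres similarity, the function names, and the batch construction are illustrative assumptions on our part, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def movie_similarity(meta_a, meta_b):
    # Hypothetical similarity measure: Jaccard overlap of genre sets.
    # The paper draws on several kinds of movie-level metadata.
    a, b = set(meta_a["genres"]), set(meta_b["genres"])
    return len(a & b) / max(len(a | b), 1)

def scene_contrastive_loss(anchors, positives, temperature=0.07):
    """InfoNCE-style loss over a batch of scene embeddings: each anchor
    is paired with a scene from a similar movie; all other scenes in
    the batch act as negatives."""
    anchors = F.normalize(anchors, dim=-1)      # (B, d)
    positives = F.normalize(positives, dim=-1)  # (B, d)
    logits = anchors @ positives.t() / temperature  # (B, B) similarity matrix
    labels = torch.arange(anchors.shape[0])         # positives on the diagonal
    return F.cross_entropy(logits, labels)
```

During batch assembly, scene pairs drawn from movies whose movie_similarity exceeds some threshold would be treated as positives.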

Overview of our approach.

As the video below shows, our learned scene representation effectively places thematically similar scenes close to each other.

Qualitative examples of similar-scene pairs found using our approach.

In the examples below, we compare our representation with the commonly used CLIP visual representation for scene retrieval, using place-labeled scenes in the Long-Form Video Understanding (LVU) dataset. Given a query scene, our representation captures appearance as well as semantic concepts to retrieve similar scenes more effectively, while CLIP captures only local appearance-based patterns. For overall retrieval precision on six categories of places, our representation offers a 22.7% improvement over CLIP.

A comparison of our video representation method and one of its predecessors, CLIP, on the task of place retrieval using the Long-Form Video Understanding (LVU) dataset.
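Once scene embeddings are computed, retrieval of this kind reduces to a nearest-neighbor search in the embedding space. The NumPy sketch below illustrates the procedure; the array shapes and the assumption of a pretrained scene encoder are ours.

```python
import numpy as np

def retrieve_similar_scenes(query_emb, scene_embs, k=5):
    """Rank stored scene embeddings by cosine similarity to a query scene.
    `query_emb` has shape (d,); `scene_embs` has shape (num_scenes, d)."""
    q = query_emb / np.linalg.norm(query_emb)
    s = scene_embs / np.linalg.norm(scene_embs, axis=1, keepdims=True)
    scores = s @ q                   # cosine similarity per stored scene
    return np.argsort(-scores)[:k]   # indices of the k most similar scenes
```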

Quantitatively, our learned representation exhibits an average improvement of 7.9% on the seven classification tasks and 9.7% on the two regression tasks of the LVU dataset. Furthermore, using our newly collected MCD dataset at Prime Video, we compare our learned scene representation with state-of-the-art models pretrained on action recognition and image classification datasets. Our scene representation outperforms the alternatives by margins ranging from 3.8% to 50.9% across different models and tasks.

Reducing model complexity for long-form-video understanding

At Prime Video, we’re developing state-of-the-art AI models for cinematic-content understanding to facilitate a variety of downstream use cases. A key technical challenge is the effective modeling of complex spatiotemporal dependencies, particularly in long-form videos such as movies and TV episodes.

Various shots from the movie Stuart Little, showing the complex spatiotemporal dependencies of cinematic content.

Previously proposed convolutional and recurrent neural networks struggle to learn long-term dependencies. In part this is because of exploding or vanishing gradients, where cascading adjustments to model weights grow too large or too small as information is incorporated over long durations. Vision transformers can use self-attention to address this challenge, attending to particular prior frames of video when interpreting the current frame. But this is computationally expensive, as it requires pairwise computations between the current frame and its predecessors.


The recently proposed structured-state-space-sequence (S4) model, with its linear complexity, offers a promising direction in this space; however, we empirically demonstrate that treating all image tokens equally, as the S4 model does, can adversely affect a model’s efficiency and accuracy.
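To see where the linear complexity comes from, consider the discrete state-space recurrence at the heart of S4-style models. The naive loop below is only a schematic (S4's actual implementation uses structured parameterizations and convolutional computation), but it shows that cost grows linearly with sequence length, unlike self-attention's quadratic pairwise cost.

```python
import torch

def ssm_scan(A, B, C, u):
    """Schematic state-space recurrence: x_k = A x_{k-1} + B u_k, y_k = C x_k.
    A: (n, n), B: (n, d_in), C: (d_out, n), u: (L, d_in). One fixed-cost
    state update per time step, so total cost is O(L)."""
    x = torch.zeros(A.shape[0])
    ys = []
    for u_k in u:                 # L iterations, each O(1) in sequence length
        x = A @ x + B @ u_k       # update the hidden state
        ys.append(C @ x)          # read out the output
    return torch.stack(ys)
```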

To address this challenge, we present a novel selective S4 (i.e., S5) model that employs a lightweight mask generator to adaptively select informative image tokens, resulting in more efficient and accurate modeling of long-term spatiotemporal dependencies in videos. Unlike previous methods, which used mask-based token reduction in transformers, our S5 model avoids the dense self-attention calculation by following the guidance of the momentum-updated S4 model. This enables our model to efficiently discard less informative tokens and adapt to various long-form-video-understanding tasks more effectively.
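The sketch below shows roughly how such adaptive token selection could look in PyTorch. The scoring network, the way the guidance features enter, and the fixed top-k selection rule are illustrative assumptions on our part, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MaskGenerator(nn.Module):
    """Hypothetical lightweight mask generator: scores each image token and
    keeps only the top-k most informative ones before the S4 layers."""
    def __init__(self, dim, keep_ratio=0.5):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)  # one importance score per token
        self.keep_ratio = keep_ratio

    def forward(self, tokens, guidance):
        # tokens:   (B, N, dim) image tokens
        # guidance: (B, N, dim) features from the momentum-updated S4 model
        scores = self.scorer(tokens + guidance).squeeze(-1)  # (B, N)
        k = max(1, int(tokens.shape[1] * self.keep_ratio))
        top = scores.topk(k, dim=1).indices                  # (B, k)
        idx = top.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])
        return tokens.gather(1, idx)  # only selected tokens reach the S4 model
```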

At left is an illustration of our S5 model (a). We introduce a “mask generator” that enacts a selective token-picking strategy, leveraging the feature representations from the momentum S4 model, which is updated by the S4 model in a moving-average manner. At right is an illustration of the proposed pretraining framework using long-short masked contrastive learning (b), which initializes our S5 model to enhance robustness.

However, as with most token-reduction methods, informative image tokens may be dropped incorrectly. To improve the robustness and the temporal horizon of our model, we propose a novel long-short masked contrastive-learning (LSMCL) approach that enables our model to predict longer temporal contexts using shorter input videos.
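A heavily simplified sketch of what such an objective might look like follows; the clip lengths, masking scheme, and use of a single shared encoder are our assumptions for illustration, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def lsmcl_loss(encoder, video_tokens, short_len=16, long_len=64, mask_ratio=0.5):
    """Sketch of a long-short masked contrastive objective: the embedding of
    a short, masked clip is pulled toward the embedding of a longer clip
    from the same video. Assumes `encoder` pools a clip to a (B, d) vector
    and that the token sequence is at least `long_len` steps long."""
    B, T, D = video_tokens.shape
    start = torch.randint(0, T - long_len + 1, (1,)).item()
    long_clip = video_tokens[:, start:start + long_len]
    # Mask a random fraction of the short clip's tokens.
    mask = torch.rand(B, short_len, 1) < mask_ratio
    short_clip = long_clip[:, :short_len].masked_fill(mask, 0.0)
    z_short = F.normalize(encoder(short_clip), dim=-1)  # (B, d)
    z_long = F.normalize(encoder(long_clip), dim=-1)    # (B, d)
    logits = z_short @ z_long.t() / 0.07                # (B, B)
    labels = torch.arange(B)                            # matched clips on diagonal
    return F.cross_entropy(logits, labels)
```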

We present extensive comparative results using three challenging long-form video-understanding datasets (LVU, COIN, and Breakfast), demonstrating that our approach is consistently more accurate than the previous state-of-the-art S4 model, by as much as 9.6% on one dataset, with a memory footprint that’s 23% smaller.

Dynamic inference of multimodal models using reinforcement learning

Transformer models that operate over multiple data modalities, together with large-scale pretraining approaches, have led to significant progress on joint image-and-language models. However, these models impose high computational costs and therefore offer low run-time efficiency, making them difficult to apply to Prime Video’s large catalogue.

Although approaches such as pruning, knowledge distillation, and quantization can help address this challenge, they can incur significant drops in accuracy (e.g., ≥ 1% at ≥ 50% model compression rates), as they are primarily designed to reduce model parameters, not to improve run-time efficiency.


To address this challenge, we propose a model that saves computation by dynamically skipping layers of a multimodal network; pruning input tokens from either the language backbone, the image backbone, or both; and fusing tokens from the separate backbones, conditioned on the input image-text pair.

Most multimodal transformer models include multihead self-attention and feed-forward network layers, which can be skipped for some inputs. Additionally, we remove redundant tokens at different levels of the backbones and fuse the image tokens with the language tokens in an adaptive manner. To learn policies for dynamic inference, we train agents using reinforcement learning.
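As an illustration of the layer-skipping component, here is a hedged PyTorch sketch of a transformer block whose attention and feed-forward sublayers can be bypassed per input. The pooling-based policy and Bernoulli gating are stand-ins for the learned RL policy, not the MDETR or GLIP internals.

```python
import torch
import torch.nn as nn

class DynamicSkipBlock(nn.Module):
    """Illustrative transformer block whose self-attention and feed-forward
    sublayers can each be skipped per input, based on a learned policy."""
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))
        self.policy = nn.Linear(dim, 2)  # two binary decisions: attn, ffn

    def forward(self, x):
        # Summarize the input, then sample skip decisions. In RL training,
        # these sampled actions would be rewarded for preserving accuracy
        # while reducing computation.
        summary = x.mean(dim=1)                                      # (B, dim)
        gate = torch.bernoulli(torch.sigmoid(self.policy(summary)))  # (B, 2)
        attn_out, _ = self.attn(x, x, x)
        x = x + gate[:, 0, None, None] * attn_out   # gate = 0 drops the sublayer
        x = x + gate[:, 1, None, None] * self.ffn(x)
        # (A real implementation would bypass the computation itself
        # when a gate is zero, which is where the savings come from.)
        return x
```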

Our results demonstrate that we can improve the run-time efficiency of the state-of-the-art models MDETR and GLIP by up to 50% on the tasks of referring-expression comprehension, segmentation, and visual question-answering, with a maximum accuracy drop of only 0.3%.

Accuracy-vs.-frames-per-second (a and b) and accuracy-vs.-GFLOPS (c and d) comparisons of the evaluated models. Our proposed method comfortably outperforms the alternative approaches on both measures of computational efficiency while maintaining high accuracy.

Improving label efficiency of image harmonization models

Image harmonization is an important component of the broader problem of image composition, where new images are created by extracting foreground regions from one image and transferring them to another image in a photorealistic manner.


The main technical challenge for image harmonization is the appearance mismatch between the foreground extracted from the source image and the background of the destination image. Image harmonization aims to adjust the appearance of the foreground to make it compatible with the background. However, training traditional models for image harmonization requires a large amount of labeled data, which is costly and time-consuming to obtain.

To address this challenge, we introduce a novel approach to pretraining image harmonization models, LEMaRT, which automatically generates training data by simulating the types of defects that image harmonization models are expected to remove. LEMaRT takes an image as input, selects a region in that image, and applies a set of appearance transformations to it. We use these modified images, along with the original images, to pretrain our image harmonization model. Furthermore, we introduce an image harmonization model, SwinIH, by retrofitting the previously proposed Swin Transformer with a combination of local and global self-attention mechanisms.
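The following PIL/NumPy sketch shows the flavor of this data-generation step. The specific transforms (brightness and hue shifts) and their parameters are illustrative; the paper uses a broader set of appearance transformations.

```python
import numpy as np
from PIL import Image, ImageEnhance

def make_composite(image, box, brightness=1.3, hue_shift=12):
    """Sketch of LEMaRT-style pretraining-data generation: perturb the
    appearance of one region so the model can learn to undo the mismatch."""
    x0, y0, x1, y1 = box
    region = image.crop(box)
    # Simulate an appearance mismatch: brightness and hue perturbations.
    region = ImageEnhance.Brightness(region).enhance(brightness)
    hsv = np.array(region.convert("HSV"), dtype=np.uint8)
    hsv[..., 0] = (hsv[..., 0].astype(int) + hue_shift) % 256
    region = Image.fromarray(hsv, "HSV").convert("RGB")
    composite = image.copy()
    composite.paste(region, (x0, y0))
    # Training pair: the model learns to map the perturbed composite
    # back to the photorealistic original.
    return composite, image
```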

Given an image, our approach applies a set of transformations (e.g., brightness, hue adjustment) to obtain a transformed image that is combined with the original image to form a composite. These composite images are used to pretrain our image harmonization transformer model. As shown in the figure, our model is capable of reconstructing photorealistic outputs.

Pretraining our SwinIH model with the LEMaRT approach results in a new state of the art for image harmonization, while being label-efficient, i.e., requiring less annotated data for fine-tuning than existing methods. Notably, on the iHarmony4 dataset, SwinIH outperforms the previous state of the art, SCS-Co, by a margin of 0.4 dB when it is fine-tuned on only 50% of the training data and by 1.0 dB when it is trained on the full training dataset.

Using our LEMaRT pretraining scheme, our image harmonization model (SwinIH) surpasses state-of-the-art (SOTA) counterparts with less than 40% of the training data from iHarmony4 for fine-tuning.

Qualitative comparisons suggest that LEMaRT is better at color correction than prior methods, thanks to the pretraining process, during which LEMaRT learns the distribution of photorealistic images.

Qualitative comparison between our method, LEMaRT (SwinIH), and three state-of-the-art methods (RainNet, iS2AM, DHT+) on the iHarmony4 dataset.
