How computer vision will help Amazon customers shop online

Three papers at CVPR present complementary methods to improve product discovery.

The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) is the premier conference in the field of computer vision, and the Amazon papers accepted there this year range in topic from neural-architecture search to human-pose tracking to handwritten-text generation.

But retail sales are still at the heart of what Amazon does, and three of Amazon’s 10 CVPR papers report ways in which computer vision could help customers shop for clothes.

One paper describes a system that lets customers sharpen a product query by describing variations on a product image. The customer could, for instance, alter the image by typing or saying “I want it to have a light floral pattern”.

A second paper reports a system that suggests items to complement those the customer has already selected, based on features such as color, style, and texture.

The third paper reports a system that can synthesize an image of a model wearing clothes from different product pages, to demonstrate how they would work together as an ensemble. All three systems use neural networks.

Outfit composite.png
A query image (left) is combined with images from different product pages to produce a synthetic composite (right).

Visiolinguistic product discovery

Using text to refine an image that matches a product query poses three main challenges. The first is finding a way to fuse textual descriptions and image features into a single representation. The second is performing that fusion at different levels of resolution: the customer should be able to say something as abstract as “Something more formal” or as precise as “change the neck style”. And the third is training the network to preserve some image features while following customers' instructions to change others.

Yanbei Chen, a graduate student at Queen Mary University of London, who was an intern at Amazon when the work was done; Chen’s advisor, professor of visual computation Shaogang Gong; and Loris Bazzani, a senior computer vision scientist at Amazon, address these challenges with a neural network that’s trained on triples of inputs: a source image, a textual revision, and a target image that matches the revision.

Essentially, the three inputs pass through three different neural networks in parallel. But at three distinct points in the pipeline, the current representation of the source image is fused with the current representation of the text, and the fused representation is correlated with the current representation of the target image.

Because the lower levels of a neural network tend to represent lower-level features of the input (such as textures and colors) and higher levels higher-level features (such as sleeve length or tightness of fit), using this “hierarchical matching” objective to train the model ensures that it can handle textual modifications of different resolutions.

Visiolinguistic architecture.png
A new system that enables textual modification of product images fuses visual and linguistic information at three different levels of a neural network, to accommodate different degrees of textual granularity.
Apparel images from the Fashion IQ data set (Xiaoxiao Guo, et al.), used with permission under the Community Data License Agreement.

Each fusion of linguistic and visual representations is performed by a neural network with two components. One component uses a joint attention mechanism to identify visual features that should be the same in the source and target images. The other is a transformer network that uses self-attention to identify features that should change.

In tests, the researchers found that the new system could find a valid match to a textual modification 58% more frequently than its best-performing predecessor.

Complementary-item retrieval

In the past, researchers have developed systems that took outfit items as inputs and predicted their compatibility, but these systems were not optimized for large-scale data retrieval.

Amazon applied scientist Yen-Liang Lin and his colleagues wanted a system that would enable product discovery at scale, and they wanted it to take multiple inputs, so that a customer could, for instance, select shirt, pants, and jacket and receive a recommendation for shoes.

The network they devised takes as inputs any number of garment images, together with a vector indicating the category of each — such as shirt, pants, or jacket. It also takes the category vector of the item the customer seeks.

The images pass through a convolutional neural network that produces a vector representation of each. Each representation then passes through a set of “masks”, which attenuate some representation features and amplify others.

The masks are learned during training, and the resulting representations encode product information (such as color and style) relevant to only a subset of complementary items. That is, some of the representations that result from the masking — called subspace representations — will be relevant to shoes, others to handbags, others to hats, and so on.

Complementarity network.png
The architecture of the neural network used for complementary-item retrieval. From vectors representing the product categories of both input items and a target item, the network produces a set of weights (w1 – wk) that indicate which input-item features should be prioritized in selecting a complementary item.

In parallel, another network takes as input the category for each input image and the category of the target item. Its output is a set of weights, for prioritizing the subspace representations.

The network is trained using an evaluation criterion that operates on the entire outfit. Each training example includes an outfit, an item that goes well with that outfit, and a group of items that do not.

Once the network has been trained, it can produce a vector representation of every item in a catalogue. Finding the best complement for a particular outfit is then just a matter of looking up the corresponding vectors.

In experiments that used two standard measures in the literature on garment complementarity — fill-in-the-blank accuracy and compatibility area under the curve — the researchers’ system outperformed its three top predecessors, while enabling much more efficient item retrieval.

Virtual try-on network

Previously, researchers have trained machine learning systems to synthesize images of figures wearing clothes from different sources by using training data that featured the same garment photographed from different perspectives. But that kind of data is extremely labor intensive to produce.

Senior applied scientist Assaf Neuberger and his colleagues at Amazon’s Lab126 instead built a system that can be trained on single images, using generative adversarial networks, or GANs. A GAN has a component known as a discriminator, which, during training, learns to distinguish network-generated images from real images. Simultaneously, the generator learns to fool the discriminator.

The researchers’ system has three components. The first is the shape generation network, whose inputs are a query image, which will serve as the template for the final image, and any number of reference images, which depict clothes that will be transferred to the model from the query image.

Complementarity system.png
Amazon researchers’ “virtual try-on network” uses a three-step process to synthesize an image of a model wearing garments from different sources.

In preprocessing, established techniques segment all the input images and compute the query figure’s body model, which represents pose and body shape. The segments selected for inclusion in the final image pass to the shape generation network, which combines them with the body model and updates the query image’s shape representation. That shape representation passes to a second network, called the appearance generation network.

The architecture of the appearance generation network is much like that of the shape generation network, except that it encodes information about texture and color rather than shape. The representation it produces is combined with the shape representation to produce a photorealistic visualization of the query model wearing the reference garments.

The third component of the network fine-tunes the parameters of the appearance generation network to preserve features such as logos or distinctive patterns without compromising the silhouette of the model.

The outputs of the new system are more natural looking than those of previous systems. In the figure below, the first column is the query image, the second the reference image, the third the output of the best-performing previous system, and the fourth and fifth the outputs of the new system, without and with appearance refinement, respectively.

Logos.png
From left to right: query samples, reference samples, the previous system’s output, and the new system’s outputs, without and with the appearance refinement network.

Research areas

Related content

DE, Berlin
AWS AI is looking for passionate, talented, and inventive Applied Scientists with a strong machine learning background to help build industry-leading Conversational AI Systems. Our mission is to provide a delightful experience to Amazon’s customers by pushing the envelope in Natural Language Understanding (NLU), Dialog Systems including Generative AI with Large Language Models (LLMs) and Applied Machine Learning (ML). As part of our AI team in Amazon AWS, you will work alongside internationally recognized experts to develop novel algorithms and modeling techniques to advance the state-of-the-art in human language technology. Your work will directly impact millions of our customers in the form of products and services that make use language technology. You will gain hands on experience with Amazon’s heterogeneous text, structured data sources, and large-scale computing resources to accelerate advances in language understanding. We are hiring in all areas of human language technology and code generation. We are open to hiring candidates to work out of one of the following locations: Berlin, DEU
US, MA, North Reading
Working at Amazon Robotics Are you inspired by invention? Is problem solving through teamwork in your DNA? Do you like the idea of seeing how your work impacts the bigger picture? Answer yes to any of these and you’ll fit right in here at Amazon Robotics. We are a smart, collaborative team of doers that work passionately to apply cutting-edge advances in robotics and software to solve real-world challenges that will transform our customers’ experiences in ways we can’t even imagine yet. We invent new improvements every day. We are Amazon Robotics and we will give you the tools and support you need to invent with us in ways that are rewarding, fulfilling and fun. Position Overview The Amazon Robotics (AR) Software Research and Science team builds and runs simulation experiments and delivers analyses that are central to understanding the performance of the entire AR system. This includes operational and software scaling characteristics, bottlenecks, and robustness to “chaos monkey” stresses -- we inform critical engineering and business decisions about Amazon’s approach to robotic fulfillment. We are seeking an enthusiastic Data Scientist to design and implement state-of-the-art solutions for never-before-solved problems. The DS will collaborate closely with other research and robotics experts to design and run experiments, research new algorithms, and find new ways to improve Amazon Robotics analytics to optimize the Customer experience. They will partner with technology and product leaders to solve business problems using scientific approaches. They will build new tools and invent business insights that surprise and delight our customers. They will work to quantify system performance at scale, and to expand the breadth and depth of our analysis to increase the ability of software components and warehouse processes. They will work to evolve our library of key performance indicators and construct experiments that efficiently root cause emergent behaviors. They will engage with software development teams and warehouse design engineers to drive the evolution of the AR system, as well as the simulation engine that supports our work. Inclusive Team Culture Here at Amazon, we embrace our differences. We are committed to furthering our culture of inclusion. We have 12 affinity groups (employee resource groups) with more than 87,000 employees across hundreds of chapters around the world. We have innovative benefit offerings and host annual and ongoing learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences. Amazon’s culture of inclusion is reinforced within our 14 Leadership Principles, which reminds team members to seek diverse perspectives, learn and be curious, and earn trust. Flexibility It isn’t about which hours you spend at home or at work; it’s about the flow you establish that brings energy to both parts of your life. We offer flexibility and encourage you to find your own balance between your work and personal lives. Mentorship & Career Growth We care about your career growth too. Whether your goals are to explore new technologies, take on bigger opportunities, or get to the next level, we'll help you get there. Our business is growing fast and our people will grow with it. A day in the life Amazon offers a full range of benefits that support you and eligible family members, including domestic partners and their children. Benefits can vary by location, the number of regularly scheduled hours you work, length of employment, and job status such as seasonal or temporary employment. The benefits that generally apply to regular, full-time employees include: 1. Medical, Dental, and Vision Coverage 2. Maternity and Parental Leave Options 3. Paid Time Off (PTO) 4. 401(k) Plan If you are not sure that every qualification on the list above describes you exactly, we'd still love to hear from you! At Amazon, we value people with unique backgrounds, experiences, and skillsets. If you’re passionate about this role and want to make an impact on a global scale, please apply! A day in the life Amazon offers a full range of benefits that support you and eligible family members, including domestic partners and their children. Benefits can vary by location, the number of regularly scheduled hours you work, length of employment, and job status such as seasonal or temporary employment. The benefits that generally apply to regular, full-time employees include: 1. Medical, Dental, and Vision Coverage 2. Maternity and Parental Leave Options 3. Paid Time Off (PTO) 4. 401(k) Plan If you are not sure that every qualification on the list above describes you exactly, we'd still love to hear from you! At Amazon, we value people with unique backgrounds, experiences, and skillsets. If you’re passionate about this role and want to make an impact on a global scale, please apply! We are open to hiring candidates to work out of one of the following locations: North Reading, MA, USA
LU, Luxembourg
Pooling Req - JKU Linz Pooling Req - JKU Linz Pooling Req - JKU Linz Pooling Req - JKU Linz Pooling Req - JKU Linz Pooling Req - JKU Linz Pooling Req - JKU Linz Pooling Req - JKU Linz Pooling Req - JKU Linz Pooling Req - JKU Linz We are open to hiring candidates to work out of one of the following locations: Luxembourg, LUX
US, WA, Bellevue
Are you excited about developing generative AI, reinforcement learning and foundation models? Are you looking for opportunities to build and deploy them on real problems at truly vast scale? At Amazon Fulfillment Technologies and Robotics, we are on a mission to build high-performance autonomous decision systems that perceive and act to further improve our world-class customer experience - at Amazon scale. We are looking for an Applied Scientist who will help us build next level simulation and optimization systems with the help of generative AI and LLMs. Together, we will be pushing beyond the state of the art in simulation and optimization of one of the most complex systems in the world: Amazon's Fulfillment Network. Key job responsibilities In this role, you will dive deep into our fulfillment network, understand complex processes and channel your insights to build large scale machine learning models (LLMs, graph neural nets and reinforcement learning) that will be able to understand and optimize the state and future of our buildings, network and orders. You will face a high level of research ambiguity and problems that require creative, ambitious, and inventive solutions. You will work with and in a team of applied scientists to solve cutting edge problems going beyond the published state of the art that will drive transformative change on a truly global scale. A day in the life In this role, you will dive deep into our fulfillment network, understand complex processes and channel your insights to build large scale machine learning models (LLMs, graph neural nets and reinforcement learning) that will be able to understand and optimize the state and future of our buildings, network and orders. You will face a high level of research ambiguity and problems that require creative, ambitious, and inventive solutions. You will work with and in a team of applied scientists to solve cutting edge problems going beyond the published state of the art that will drive transformative change on a truly global scale. A day in the life Amazon offers a full range of benefits that support you and eligible family members, including domestic partners and their children. Benefits can vary by location, the number of regularly scheduled hours you work, length of employment, and job status such as seasonal or temporary employment. The benefits that generally apply to regular, full-time employees include: 1. Medical, Dental, and Vision Coverage 2. Maternity and Parental Leave Options 3. Paid Time Off (PTO) 4. 401(k) Plan If you are not sure that every qualification on the list above describes you exactly, we'd still love to hear from you! At Amazon, we value people with unique backgrounds, experiences, and skillsets. If you’re passionate about this role and want to make an impact on a global scale, please apply! About the team Amazon Fulfillment Technologies (AFT) powers Amazon’s global fulfillment network. We invent and deliver software, hardware, and data science solutions that orchestrate processes, robots, machines, and people. We harmonize the physical and virtual world so Amazon customers can get what they want, when they want it. The AFT AI team has deep expertise developing cutting edge AI solutions at scale and successfully applying them to business problems in the Amazon Fulfillment Network. These solutions typically utilize machine learning and computer vision techniques, applied to text, sequences of events, images or video from existing or new hardware. We influence each stage of innovation from inception to deployment, developing a research plan, creating and testing prototype solutions, and shepherding the production versions to launch. We are open to hiring candidates to work out of one of the following locations: Bellevue, WA, USA
CN, Shanghai
亚马逊云科技上海人工智能实验室OpenSearch 研发团队正在招募应用科学实习生-多模态检索与生成方向实习生。OpenSearch是一个开源的搜索和数据分析套件, 它旨在为数据密集型应用构建解决方案,内置高性能、开发者友好的工具,并集成了强大的机器学习、数据处理功能,可以为客户提供灵活的数据探索、丰富和可视化功能,帮助客户从复杂的数据中发现有价值的信息。OpenSearch是现有AWS托管服务(AWS OpenSearch)的基础,OpenSearch核心团队负责维护OpenSearch代码库,他们的目标是使OpenSearch安全、高效、可扩展、可扩展并永远开源。 点击下方链接查看申请手册获得更多信息: https://amazonexteu.qualtrics.com/CP/File.php?F=F_55YI0e7rNdeoB6e Key job responsibilities 在这个实习期间,你将有机会: 1. 研究最新的搜索相关性人工智能算法。 2. 探索大模型技术在数据分析与可视化上的应用。 3. 了解主流搜索引擎Lucene的原理和应用。深入了解前沿自然语言处理技术和底层索引性能调优的结合。 4. 学习亚马逊云上的各种云服务。 5. 参与产品需求讨论,提出技术实现方案。 6. 与国内外杰出的开发团队紧密合作,学习代码开发和审查的流程。 We are open to hiring candidates to work out of one of the following locations: Shanghai, CHN
CN, Shanghai
亚马逊云科技上海人工智能实验室OpenSearch 研发团队正在招募应用科学家实习,方向是服务器端开发。OpenSearch是一个开源的搜索和数据分析套件, 它旨在为数据密集型应用构建解决方案,内置高性能、开发者友好的工具,并集成了强大的机器学习、数据处理功能,可以为客户提供灵活的数据探索、丰富和可视化功能,帮助客户从复杂的数据中发现有价值的信息。OpenSearch是现有AWS托管服务(AWS OpenSearch)的基础,OpenSearch核心团队负责维护OpenSearch代码库,他们的目标是使OpenSearch安全、高效、可扩展、可扩展并永远开源。 点击下方链接查看申请手册获得更多信息: https://amazonexteu.qualtrics.com/CP/File.php?F=F_55YI0e7rNdeoB6e Key job responsibilities 在这个实习期间,你将有机会: 1. 使用Java/Kotlin等服务器端技术编写高质量,高性能,安全,可维护和可测试的代码。 2. 了解主流搜索引擎Lucene的原理和应用。 3. 学习亚马逊云上的各种云服务。 4. 参与产品需求讨论,提出技术实现方案。 5. 与国内外杰出的开发团队紧密合作,学习代码开发和审查的流程。 6. 应用先进的人工智能和机器学习技术提升用户体验。 We are open to hiring candidates to work out of one of the following locations: Shanghai, CHN
CN, Shanghai
亚马逊云科技上海人工智能实验室OpenSearch 研发团队正在招募应用科学家实习,方向是服务器端开发。OpenSearch是一个开源的搜索和数据分析套件, 它旨在为数据密集型应用构建解决方案,内置高性能、开发者友好的工具,并集成了强大的机器学习、数据处理功能,可以为客户提供灵活的数据探索、丰富和可视化功能,帮助客户从复杂的数据中发现有价值的信息。OpenSearch是现有AWS托管服务(AWS OpenSearch)的基础,OpenSearch核心团队负责维护OpenSearch代码库,他们的目标是使OpenSearch安全、高效、可扩展、可扩展并永远开源。 点击下方链接查看申请手册获得更多信息: https://amazonexteu.qualtrics.com/CP/File.php?F=F_55YI0e7rNdeoB6e Key job responsibilities 在这个实习期间,你将有机会: • 使用HTML、CSS和TypeScript/Javascript等前端技术开发用户界面。 • 学习使用Node.js 为用户界面提供服务接口。 • 了解并实践工业级前端产品的开发/部署/安全审查/发布流程。 • 了解并实践前端框架React的使用。 • 参与产品需求讨论,提出技术实现方案。 • 与国内外杰出的开发团队紧密合作,学习代码开发和审查的流程。 • 编写高质量,高性能,安全,可维护和可测试的代码。 • 应用先进的人工智能和机器学习技术提升用户体验。 We are open to hiring candidates to work out of one of the following locations: Shanghai, CHN
US, WA, Seattle
Amazon is one of the most popular sites in the US. Our product search engine, one of the most heavily used services in the world, indexes billions of products and serves hundreds of millions of customers world-wide. Our team leads the science and analytics efforts for the search page and we own multiple aspects of understanding how we can measure customer satisfaction with our experiences. This include building science based insights and novel metrics to define and track customer focused aspects. We are working on a new measurement framework to better quantify and qualify the quality of the search customer experience and are looking for a Senior Applied Scientist to lead the development and implementation of different signals for this framework and tackle new and uncharted territories for search engines using LLMs. Key job responsibilities We are looking for an experienced Sr. Applied Scientist to lead LLM based signals development and data analytics and drive critical product decisions for Amazon Search. In a fast-paced and ambiguous environment, you will perform multiple large, complex, and business critical analyses that will inform product design and business priorities. You will design and build AI based science solutions to allow routine inspection and deep business understanding as the search customer experience is being transformed. Keeping a department-wide view, you will focus on the highest priorities and constantly look for scale and automation, while making technical trade-offs between short term and long-term needs. With your drive to deliver results, you will quickly analyze data and understand the current business challenges to assess the feasibility of different science projects as well as help shape the analytics roadmap of the Science and Analytics team for Search CX. Your desire to learn and be curious will help us look around corners for improvement opportunities and more efficient metrics development. In this role, you will partner with data engineers, business intelligence engineers, product managers, software engineers, economists, and other scientists. A day in the life You are have expertise in Machine learning and statistical models. You are comfortable with a higher degree of ambiguity, knows when and how to be scrappy, build quick prototypes and proofs of concepts, innate ability to see around corners and know what is coming, define a long-term science vision, and relish the idea of solving problems that haven’t been solved at scale. As part of our journey to learn about our data, some opportunities may be a dead end and you will balancing unknowns with delivering results for our customers. Along the way, you’ll learn a ton, have fun and make a positive impact at scale. About the team Joining this team, you’ll experience the benefits of working in a dynamic, entrepreneurial environment, while leveraging the resources of Amazon.com (AMZN), Earth's most customer-centric company and one of the world's leading internet companies. We provide a highly customer-centric, and team-oriented environment. We are open to hiring candidates to work out of one of the following locations: Seattle, WA, USA
US, MA, Westborough
The Research Team at Amazon Robotics is seeking a passionate Applied Scientist, with a strong track record of industrial research, innovation leadership, and technology transfer, with a focus on ML Applications. At Amazon Robotics, we apply cutting edge advancements in robotics, software development, Big Data, ML and AI to solve real-world challenges that will transform our customers’ experiences in ways we can’t even imagine yet. We operate hundreds of buildings that employ hundreds of thousands of robots teaming up to perform sophisticated, large-scale missions. There are a lot of exciting opportunities ahead of us that can be unlocked by scientific research. Amazon Robotics has a dedicated focus on research and development to continuously explore new opportunities to extend its product lines into new areas. As you could imagine, data is at the heart of our innovation. This role will be participating in creating the ML and AI roadmap, leading science initiatives, and shipping ML products. Key job responsibilities You will be responsible for: - Thinking Big and ideating with Data Science team, other Science teams, and stakeholders across the organization to co-create the ML roadmap. - Collaborating with customers and cross-functional stakeholder teams to help the team identify, disambiguate, and define key problems. - Independently innovating, creating, and iterating ML solutions for given business problems. Especially, using techniques such as Computer Vision, Deep Learning, Causal Inference, etc. - Collaborating with other Science, Tech, Ops, and Business leaders to ship and iterate ML products. - Promoting best practices and mentoring junior team members on problem solving and communication. - Leading state-of-the-art research work and pursuing internal/external scientific publications. A day in the life You will co-create ML/AI roadmap. You will help team identify business opportunities. You will prototype, iterate ML/AI solutions. You will drive communication with stakeholders to implement and ship ML solutions. e.g., computer vision, deep learning, explainable AI, causal inference, reinforcement learning, etc. You will mentor and guide junior team members in delivering projects and business impact. You will work with the team and lead scientific publications. Amazon offers a full range of benefits that support you and eligible family members, including domestic partners and their children. Benefits can vary by location, the number of regularly scheduled hours you work, length of employment, and job status such as seasonal or temporary employment. The benefits that generally apply to regular, full-time employees include: 1. Medical, Dental, and Vision Coverage 2. Maternity and Parental Leave Options 3. Paid Time Off (PTO) 4. 401(k) Plan If you are not sure that every qualification on the list above describes you exactly, we'd still love to hear from you! At Amazon, we value people with unique backgrounds, experiences, and skillsets. If you’re passionate about this role and want to make an impact on a global scale, please apply! About the team You will join a scientifically and demographically diverse research/science team. Our multi-disciplinary team includes scientists with backgrounds in planning/scheduling, grasping/manipulation, machine learning, statistical analysis, and operations research. We develop novel algorithms and machine learning models and apply them to real-word robotic warehouses, including: - Planning/coordinating the paths of thousands of robtos - Dynamic task allocation to thousands of robots. - Learning how to manipulate products sold by Amazon. - Co-designing an optimizing robotic logistics processes. Our team also serves as a hub to foster innovation and support scientists across Amazon Robotics. In addition, we coordinate research engagements with academia. We are open to hiring candidates to work out of one of the following locations: Westborough, MA, USA
US, CA, Sunnyvale
Amazon is looking for a passionate, talented, and inventive Applied Scientists with a strong machine learning background to help build industry-leading Speech and Language technology. Our mission is to provide a delightful experience to Amazon’s customers by pushing the envelope in Automatic Speech Recognition (ASR), Machine Translation (MT), Natural Language Understanding (NLU), Machine Learning (ML) and Computer Vision (CV). As part of our AI team in Amazon AGI, you will work alongside internationally recognized experts to develop novel algorithms and modeling techniques to advance the state-of-the-art in human language technology. Your work will directly impact millions of our customers in the form of products and services that make use of speech and language technology. You will gain hands on experience with Amazon’s heterogeneous speech, text, and structured data sources, and large-scale computing resources to accelerate advances in spoken language understanding. We are hiring in all areas of human language technology: ASR, MT, NLU, text-to-speech (TTS), and Dialog Management, in addition to Computer Vision. We are open to hiring candidates to work out of one of the following locations: Bellevue, WA, USA | San Francisco, CA, USA | Seattle, WA, USA | Sunnyvale, CA, USA