Converting text to images for product discovery

New AI model enables iterative refinement of results and better color matching.

Generative adversarial networks (GANs), which were first introduced in 2014, have proven remarkably successful at generating synthetic images. A GAN consists of two networks, one that tries to produce convincing fakes, and one that tries to distinguish fakes from real examples. The two networks are trained together, and the competition between them can converge quickly on a useful generative model.
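To make the competition concrete, here is a minimal sketch of the two loss functions in the standard GAN formulation. It assumes the discriminator outputs a probability that its input is real, and it uses the common non-saturating generator loss; the function names are illustrative, and the exact losses used in our paper may differ.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy that the discriminator tries to minimize.

    d_real: discriminator outputs (probability of "real") on real examples.
    d_fake: discriminator outputs on generated examples.
    """
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating generator loss: low when fakes are scored as real."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# A discriminator that cannot tell real from fake outputs 0.5 everywhere.
confused = discriminator_loss([0.5, 0.5], [0.5, 0.5])
# A discriminator that is mostly right has a lower loss ...
sharp = discriminator_loss([0.9, 0.8], [0.1, 0.2])
# ... and the generator's loss is correspondingly higher.
fooled_rarely = generator_loss([0.1, 0.2])
fooled_often = generator_loss([0.5, 0.5])
print(confused, sharp, fooled_rarely, fooled_often)
```

In training, the two losses are minimized in alternation, so an improvement in one network raises the loss of the other.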

In a paper that was accepted to IEEE’s Winter Conference on Applications of Computer Vision, we describe a new use of GANs to generate examples of clothing that match textual product descriptions. The idea is that a shopper could use a visual guide to refine a text query until it reliably retrieved the product they were looking for.

So, for instance, a shopper could search on “women’s black pants”, then add the word “petite”, then the word “capri”, and with each new word, the images on-screen would adjust accordingly. The ability to retain old visual features while adding new ones is one of the novelties of our system. The other is a color model that yields images whose colors better match the textual inputs.

Figure (ReStGAN.gif): The output of our image generator (bottom) and that of the traditional StackGAN model (top). Ours better preserves existing visual features when new ones are added and renders color more accurately. Credit: Stacy Reilly

We compared our model’s performance with that of four baseline systems that use a popular text-to-image GAN called StackGAN. We used two metrics that are common in studies of image-generating GANs, inception score and Fréchet inception distance. Depending on the image attribute, our model’s inception scores were between 22% and 100% higher than those of the best-performing baselines, while its Fréchet inception distance was 81% lower. (For Fréchet inception distance, lower is better.)
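For reference, the Fréchet inception distance is usually defined as follows. The symbols are the conventional ones rather than notation from our paper: the means and covariances are those of feature vectors that a pretrained image classifier extracts from real images (subscript r) and generated images (subscript g).

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

The distance is zero when the two feature distributions have identical means and covariances, which is why lower values indicate generated images that are statistically closer to real ones.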

Our model is in fact a modification of StackGAN. StackGAN simplifies the problem of synthesizing an image by splitting it into two parts: first, it generates a low-res image directly from text; second, it upsamples that image to produce a higher-res version, with added texture and more-natural coloration. Each of these procedures has its own GAN, and stacking the two GANs gives the model its name.

We add another component to this model: a long short-term memory, or LSTM. LSTMs are neural networks that process sequential inputs in order. The output corresponding to a given input factors in both the inputs and the outputs that preceded it. Training an LSTM together with a GAN in an adversarial setting enables our network to refine images as successive words are added to the text inputs. Because an LSTM is an example of a recurrent neural network, we call our system ReStGAN, for recurrent StackGAN.
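The sketch below illustrates the recurrence that makes incremental refinement possible. It is a toy LSTM cell with random, untrained weights and made-up three-dimensional word embeddings, all of them assumptions for illustration; it does not reproduce ReStGAN's architecture or show how the hidden state conditions the image generator.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyLSTMCell:
    """A single LSTM cell with randomly initialized (untrained) weights."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        width = input_size + hidden_size
        # One weight matrix and bias per gate: input (i), forget (f),
        # output (o), and candidate cell value (g).
        self.weights = {
            gate: [[rng.uniform(-0.5, 0.5) for _ in range(width)]
                   for _ in range(hidden_size)]
            for gate in "ifog"
        }
        self.biases = {gate: [0.0] * hidden_size for gate in "ifog"}

    def _affine(self, gate, vec):
        return [sum(w * v for w, v in zip(row, vec)) + b
                for row, b in zip(self.weights[gate], self.biases[gate])]

    def step(self, x, h, c):
        """Consume one input vector; return the new hidden and cell states."""
        z = x + h  # concatenate the input with the previous hidden state
        i = [sigmoid(v) for v in self._affine("i", z)]
        f = [sigmoid(v) for v in self._affine("f", z)]
        o = [sigmoid(v) for v in self._affine("o", z)]
        g = [math.tanh(v) for v in self._affine("g", z)]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

# Hypothetical word embeddings, invented for this example.
EMBEDDINGS = {
    "women's": [1.0, 0.0, 0.0],
    "black": [0.0, 1.0, 0.0],
    "pants": [0.0, 0.0, 1.0],
    "petite": [0.5, 0.5, 0.0],
}

def encode(words, cell):
    """Return the hidden state after each word of the query."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    states = []
    for word in words:
        h, c = cell.step(EMBEDDINGS[word], h, c)
        states.append(h)
    return states

cell = TinyLSTMCell(input_size=3, hidden_size=4)
query = ["women's", "black", "pants"]
states = encode(query, cell)
refined = encode(query + ["petite"], cell)
swapped = encode(["black", "women's", "pants"], cell)
```

Adding "petite" leaves the first three hidden states unchanged and appends a fourth that depends on everything before it, while reordering the words changes the result. That is the property that lets each new word modify an existing representation rather than start from scratch.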

Synthesizing an image from a text description is a difficult challenge, and to make it more manageable, we restricted ourselves to three similar product classes: pants, jeans, and shorts. We also standardized the images used to train our model, removing backgrounds and cropping and re-sizing images so that they were alike in shape and scale.

Auxiliaries

The training of our model was largely unsupervised, meaning that the training data consisted mainly of product titles and standardized images, which didn’t require any additional human annotation. But to increase the stability of the system, we use an auxiliary classifier to classify images generated by our model according to three properties: apparel type (pants, jeans, or shorts), color, and whether they depict men’s, women’s, or unisex clothing. The auxiliary classifier provides additional feedback during training and helps the model handle the complexity introduced by sequential inputs.

In most AI systems that process text — including ours — textual inputs are embedded, or mapped to points in a representational space such that words with similar meanings tend to cluster together. Traditional word embeddings group color terms together, but not in a way that matches human perceptual experience. The way we encode color is another innovation of our work.

Figure (Color accuracy.png): Six different images, all generated from the text string “women’s black pants”. The three on the left were produced by our model, the three on the right by a standard StackGAN model. Credit: Shiv Surya

We cluster colors in a representational space called LAB, which was explicitly designed so that the distance between points corresponds to perceived color differences. Using that clustering, we create a lookup table that maps visually similar colors to the same features of the textual descriptions. This mapping ensures that the images we generate will yield slightly different shades of the same color, rather than completely different colors. It also makes the training of the model more manageable by reducing the number of color categories that it needs to learn.
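The idea of a perceptual lookup can be sketched as follows. The conversion uses the standard sRGB-to-CIELAB formulas with a D65 white point; the four cluster centers are hand-picked stand-ins chosen for illustration, not the clusters learned in our work, and nearest-center assignment by Euclidean distance is an assumption about how such a table might be queried.

```python
import math

def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIELAB (D65 white point)."""
    def linearize(channel):
        v = channel / 255.0
        return v / 12.92 if v <= 0.04045 else ((v + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(ch) for ch in rgb)
    # Linear sRGB -> CIE XYZ.
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    # Normalize by the D65 reference white, then apply the LAB nonlinearity.
    xn, yn, zn = 0.95047, 1.0, 1.08883
    delta = 6.0 / 29.0

    def f(t):
        if t > delta ** 3:
            return t ** (1.0 / 3.0)
        return t / (3.0 * delta ** 2) + 4.0 / 29.0

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return (116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz))

# Hypothetical cluster centers; a real system would learn these from data.
CLUSTER_CENTERS = {
    "black": srgb_to_lab((0, 0, 0)),
    "white": srgb_to_lab((255, 255, 255)),
    "red": srgb_to_lab((200, 30, 30)),
    "blue": srgb_to_lab((30, 60, 180)),
}

def nearest_cluster(rgb):
    """Map a color to the label of the closest cluster center in LAB space."""
    lab = srgb_to_lab(rgb)
    return min(CLUSTER_CENTERS,
               key=lambda name: math.dist(lab, CLUSTER_CENTERS[name]))

charcoal = nearest_cluster((20, 20, 25))
navy = nearest_cluster((0, 0, 128))
white_lab = srgb_to_lab((255, 255, 255))
```

Here charcoal falls into the same cluster as pure black, and navy falls in with the other blue, so different shades share one color category.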

Inception score — one of the two metrics we used in our experiments — evaluates images according to two criteria: recognizability and diversity. The recognizability score is based on the confidence of an existing computer vision model in classifying the image. We used three different inception scores, based on the three characteristics a classifier is trained to identify: type, color, and gender.
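As a rough illustration, the standard inception score can be computed from a classifier's per-image class probabilities as the exponential of the average KL divergence between each image's prediction and the average prediction. The sketch below assumes that standard definition; the toy probability vectors are invented, and the attribute-specific scores in our paper would use the outputs of the corresponding classifier.

```python
import math

def inception_score(predictions):
    """Compute the inception score from per-image class probabilities.

    predictions: list of probability lists, one per image, each summing to 1.
    """
    n = len(predictions)
    k = len(predictions[0])
    # Marginal class distribution over all images (captures diversity).
    marginal = [sum(p[j] for p in predictions) / n for j in range(k)]
    total_kl = 0.0
    for p in predictions:
        # KL divergence of one image's prediction from the marginal;
        # zero-probability terms contribute nothing.
        total_kl += sum(pj * math.log(pj / mj)
                        for pj, mj in zip(p, marginal) if pj > 0)
    return math.exp(total_kl / n)

# Confident predictions spread evenly over three classes: the best case.
confident_diverse = inception_score([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
# Confident but every image lands in the same class: no diversity.
collapsed = inception_score([[1, 0, 0], [1, 0, 0], [1, 0, 0]])
# Diverse on average but every individual prediction is a coin toss.
unrecognizable = inception_score([[1 / 3, 1 / 3, 1 / 3]] * 3)
```

The score reaches its maximum, the number of classes, only when images are both recognizable and diverse; it falls to 1 when either property is missing.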

On the type and gender inception scores, ReStGAN yielded 22% and 27% improvements, respectively, over the scores of the best-performing StackGAN models. But on the color inception score, the improvement was 100%, indicating the utility of our color model.

About the authors
Arijit Biswas is a senior applied scientist in Amazon Alexa AI's Natural Understanding group.
Shiv Surya is an applied scientist in the Amazon Performance Advertising Technology group.
