Adversarial training produces synthetic data for machine learning

One-sided label smoothing and techniques to prevent mode collapse improve performance of sentiment analysis models.

Sentiment analysis is the attempt, computationally, to determine from someone’s words how he or she feels about something. It has a host of applications, in market research, media analysis, customer service, and product recommendation, among other things.

Sentiment classifiers are typically machine learning systems, and any given application of sentiment analysis may suffer from a lack of annotated data for training purposes. In a paper I’m presenting at the International Conference on Acoustics, Speech, and Signal Processing, I describe my efforts to build a system that will generate synthetic training data for applications of sentiment analysis where real data is scarce.

Although the results I report are modest — augmenting training sets with my synthetic data improved sentiment classifiers’ accuracy by around 2% — they do demonstrate the viability of the approach. And the paper includes an analysis of the data generated by my system that could point toward techniques for improving its quality.

One of my first design decisions was not to try to generate actual text. Instead, the generator produces embeddings, a uniform way of representing text strings that is ubiquitous in natural-language-understanding applications.

Embeddings represent texts as points — vectors — in a high-dimensional space, such that texts with similar meanings are grouped together. Most common embeddings are based on analyses of huge bodies of text, in which words are judged to have similar meanings if they commonly co-occur with the same groups of other words.

tsne.png._CB469614397_.png
A 2-D projection of the high-dimensional embeddings used to train a sentiment classifier, indicating that the synthetic data (orange) does not yet capture the diversity of the natural data (blue).

The embeddings of any number of words can be averaged to produce a new point in the embedding space, so the length of the embedding vector is fixed, no matter the length of the corresponding text string.

To train my embedding generator, I used a generative adversarial network (GAN), an instance of an increasingly popular machine learning technique called adversarial training. The standard GAN consists of two neural networks, a generator and a discriminator. The discriminator is trained to distinguish between real data and the fake data produced by the generator; at the same time, the generator is trained to fool the discriminator. Hence the adversarial relationship.

In my case, the inputs to the discriminator, both real and fake, had two components: an embedding vector and a one-hot vector. A one-hot vector is a string of zeroes with, somewhere among them, a single one. The location of the one corresponds to a particular property — in this case, the sentiment of the text that the embedding vector (supposedly) represents.

My goal was to train the generator to produce synthetic data that could augment the training data for another neural network, a sentiment classifier. The addition of data produced by this type of plain-vanilla GAN, however, did not improve the sentiment classifier’s performance.

So I made several modifications to the system. The first was to equip it with a simple sentiment classifier, trained only on the real data that the generator was intended to augment. During training, the generator tried not only to fool the discriminator but also to produce one-hot vectors that matched the outputs of the simple classifier. This ensured greater consistency between the semantic content of the synthetic embeddings and the associated sentiments.

The other modifications addressed a problem called mode collapse, common in GANs. If the generator stumbles across a type of output that will reliably fool the discriminator, it has an incentive to restrict itself to outputs of that type. But this leads to very homogenous outputs, and homogenous data is not useful for training neural networks.

In my experiments, I was using two types of data, for both training and testing. One data set consisted of product reviews, the other of movie reviews. The training sets were small, to mimic the case in which training data is scarce.

To combat mode collapse, I first trained the GAN on a much larger set of texts, labeled according to sentiment — a set of Twitter posts, commonly used as a benchmark in the field. The tweets were shorter than the reviews and covered a wider range of topics, but they primed the generator to produce more diverse embeddings and the discriminator to recognize more subtle distinctions. After training the GAN on the tweets, I then fine-tuned it on review data.

As is typical in machine learning, I retrained the GAN several times on the same training data, until further training no longer improved its performance. With each pass through the training data, I added a different, random noise pattern to each training example; the data looked somewhat different each time around. That discouraged the generator from keying in on a single trick for fooling the discriminator.

Finally, I also used a technique called one-sided label smoothing. During training, instead of labeling the inputs to the GAN as 0 or 1 — fake or real — I label them as 0 or .9 — fake or 90% likely to be real. If the discriminator is never more than 90% confident in its classification of real inputs, the generator will keep exploring new options, in an attempt to wring out that extra 10% of certainty.

With these modifications, data produced by the generator led to slight improvements in the sentiment classifier’s performance, 1.6% on the movie reviews and 1.7% on the product reviews.

After each experiment, I used a technique called t-SNE (t-distributed stochastic neighbor embedding) to project the high-dimensional embeddings into a two-dimensional space. As can be seen in the figure above, the fake data never exhibited as much diversity as the real data.

However, after my modifications to the GAN, the data diversity did improve, which suggests a correlation between the diversity of the synthetic data and the performance of the sentiment classifier. GANs were first introduced in 2014, and some more recent GAN architectures appear to do a better job at preventing mode collapse than the architecture I used. In future work, my colleagues and I will explore some of those architectures, as well as experimenting with other techniques for diversifying the generator’s outputs.

Related content

US, NY, New York
We are seeking a Robotics/AI Motor Control Scientist to develop cutting-edge machine learning algorithms for motor control systems in robots. In this role, you will focus on creating and optimizing intelligent motor control strategies to enable robots to perform complex, whole-body tasks. Your contributions will be essential in advancing robotics by enabling fluid, reliable, and safe interactions between robots and their environments. Key job responsibilities - Develop controllers that leverage reinforcement learning, imitation learning, or other advanced AI techniques to achieve natural, robust, and adaptive motor behaviors - Collaborate with multi-disciplinary teams to integrate motor control systems with robotic hardware, ensuring alignment with real-world constraints such as actuator dynamics and energy efficiency - Use simulation and real-world testing to refine and validate control algorithms - Stay updated on advancements in robotics, AI, and control systems to apply advanced techniques to robotic motion challenges - Lead technical projects from conception through production deployment - Mentor junior scientists and engineers - Bridge research initiatives with practical engineering implementation About the team Fauna Robotics, an Amazon company, is building capable, safe, and genuinely delightful robots for everyday life. Our goal is simple: make robots people actually want to live and interact with in everyday human spaces. We believe that future won’t arrive until building for robotics becomes far more accessible. Today, too much effort is spent reinventing the fundamentals. We’re changing that by developing tightly integrated hardware and software systems that make it faster, safer, and more intuitive to create real-world robotic products. Our work spans the full stack: mechanical design, control systems, dynamic modeling, and intelligent software. The focus is not just functionality, but experience. We’re building robots that feel responsive, expressive, and genuinely useful. At Fauna, you’ll work at the frontier of this space, helping define how robots move, manipulate, and interact with people in natural environments. It’s an opportunity to solve hard problems across hardware and software with a team focused on making robotics accessible and joyful to build. If you care about making robotics real for everyone and building systems that are as delightful as they are capable, we’re interested in hearing from you. an opportunity to solve hard problems across hardware and software with a team focused on making robotics accessible and joyful to build. If you care about making robotics real for everyone and building systems that are as delightful as they are capable, we’re interested in hearing from you.
US, WA, Seattle
Amazon.com strives to be Earth's most customer-centric company where customers can shop in our stores to find and discover anything they want to buy. We hire the world's brightest minds, offering them a fast paced, technologically sophisticated and friendly work environment. Economists in the Forecasting, Macroeconomics & Finance field document, interpret and forecast Amazon business dynamics. This track is well suited for economists adept at combining times-series statistical methods with strong economic analysis and intuition. This track could be a good fit for candidates with research experience in: macroeconometrics and/or empirical macroeconomics; international macroeconomics; time-series econometrics; forecasting; financial econometrics and/or empirical finance; and the use of micro and panel data to improve and validate traditional aggregate models. Economists at Amazon are expected to work directly with our senior management and scientists from other fields on key business problems faced across Amazon, including retail, cloud computing, third party merchants, search, Kindle, streaming video, and operations. The Forecasting, Macroeconomics & Finance field utilizes methods at the frontier of economics to develop formal models to understand the past and the present, predict the future, and identify relevant risks and opportunities. For example, we analyze the internal and external drivers of growth and profitability and how these drivers interact with the customer experience in the short, medium and long-term. We build econometric models of dynamic systems, using our world class data tools, formalizing problems using rigorous science to solve business issues and further delight customers.
US, WA, Seattle
Amazon.com strives to be Earth's most customer-centric company where customers can shop in our stores to find and discover anything they want to buy. We hire the world's brightest minds, offering them a fast paced, technologically sophisticated and friendly work environment. Economists at Amazon partner closely with senior management, business stakeholders, scientist and engineers, and economist leadership to solve key business problems ranging from Amazon Web Services, Kindle, Prime, inventory planning, international retail, third party merchants, search, pricing, labor and employment planning, effective benefits (health, retirement, etc.) and beyond. Amazon Economists build econometric models using our world class data systems and apply approaches from a variety of skillsets – applied macro/time series, applied micro, econometric theory, empirical IO, empirical health, labor, public economics and related fields are all highly valued skillsets at Amazon. You will work in a fast moving environment to solve business problems as a member of either a cross-functional team embedded within a business unit or a central science and economics organization. You will be expected to develop techniques that apply econometrics to large data sets, address quantitative problems, and contribute to the design of automated systems around the company.
US, WA, Seattle
Amazon.com strives to be Earth's most customer-centric company where customers can shop in our stores to find and discover anything they want to buy. We hire the world's brightest minds, offering them a fast paced, technologically sophisticated and friendly work environment. Economists at Amazon partner closely with senior management, business stakeholders, scientist and engineers, and economist leadership to solve key business problems ranging from Amazon Web Services, Kindle, Prime, inventory planning, international retail, third party merchants, search, pricing, labor and employment planning, effective benefits (health, retirement, etc.) and beyond. Amazon Economists build econometric models using our world class data systems and apply approaches from a variety of skillsets – applied macro/time series, applied micro, econometric theory, empirical IO, empirical health, labor, public economics and related fields are all highly valued skillsets at Amazon. You will work in a fast moving environment to solve business problems as a member of either a cross-functional team embedded within a business unit or a central science and economics organization. You will be expected to develop techniques that apply econometrics to large data sets, address quantitative problems, and contribute to the design of automated systems around the company.
US, WA, Seattle
Amazon.com strives to be Earth's most customer-centric company where customers can shop in our stores to find and discover anything they want to buy. We hire the world's brightest minds, offering them a fast paced, technologically sophisticated and friendly work environment. Economists at Amazon partner closely with senior management, business stakeholders, scientist and engineers, and economist leadership to solve key business problems ranging from Amazon Web Services, Kindle, Prime, inventory planning, international retail, third party merchants, search, pricing, labor and employment planning, effective benefits (health, retirement, etc.) and beyond. Amazon Economists build econometric models using our world class data systems and apply approaches from a variety of skillsets – applied macro/time series, applied micro, econometric theory, empirical IO, empirical health, labor, public economics and related fields are all highly valued skillsets at Amazon. You will work in a fast moving environment to solve business problems as a member of either a cross-functional team embedded within a business unit or a central science and economics organization. You will be expected to develop techniques that apply econometrics to large data sets, address quantitative problems, and contribute to the design of automated systems around the company.
US, WA, Seattle
Amazon.com strives to be Earth's most customer-centric company where customers can shop in our stores to find and discover anything they want to buy. We hire the world's brightest minds, offering them a fast paced, technologically sophisticated and friendly work environment. Economists at Amazon partner closely with senior management, business stakeholders, scientist and engineers, and economist leadership to solve key business problems ranging from Amazon Web Services, Kindle, Prime, inventory planning, international retail, third party merchants, search, pricing, labor and employment planning, effective benefits (health, retirement, etc.) and beyond. Amazon Economists build econometric models using our world class data systems and apply approaches from a variety of skillsets – applied macro/time series, applied micro, econometric theory, empirical IO, empirical health, labor, public economics and related fields are all highly valued skillsets at Amazon. You will work in a fast moving environment to solve business problems as a member of either a cross-functional team embedded within a business unit or a central science and economics organization. You will be expected to develop techniques that apply econometrics to large data sets, address quantitative problems, and contribute to the design of automated systems around the company.
US, WA, Seattle
Economists in the Forecasting, Macroeconomics & Finance field document, interpret and forecast Amazon business dynamics. This track is well suited for economists adept at combining times-series statistical methods with strong economic analysis and intuition. This track could be a good fit for candidates with research experience in: macroeconometrics and/or empirical macroeconomics; international macroeconomics; time-series econometrics; forecasting; financial econometrics and/or empirical finance; and the use of micro and panel data to improve and validate traditional aggregate models. Economists at Amazon are expected to work directly with our senior management and scientists from other fields on key business problems faced across Amazon, including retail, cloud computing, third party merchants, search, Kindle, streaming video, and operations. The Forecasting, Macroeconomics & Finance field utilizes methods at the frontier of economics to develop formal models to understand the past and the present, predict the future, and identify relevant risks and opportunities. For example, we analyze the internal and external drivers of growth and profitability and how these drivers interact with the customer experience in the short, medium and long-term. We build econometric models of dynamic systems, using our world class data tools, formalizing problems using rigorous science to solve business issues and further delight customers.
US, WA, Seattle
Amazon.com strives to be Earth's most customer-centric company where customers can shop in our stores to find and discover anything they want to buy. We hire the world's brightest minds, offering them a fast paced, technologically sophisticated and friendly work environment. Economists at Amazon partner closely with senior management, business stakeholders, scientist and engineers, and economist leadership to solve key business problems ranging from Amazon Web Services, Kindle, Prime, inventory planning, international retail, third party merchants, search, pricing, labor and employment planning, effective benefits (health, retirement, etc.) and beyond. Amazon Economists build econometric models using our world class data systems and apply approaches from a variety of skillsets – applied macro/time series, applied micro, econometric theory, empirical IO, empirical health, labor, public economics and related fields are all highly valued skillsets at Amazon. You will work in a fast moving environment to solve business problems as a member of either a cross-functional team embedded within a business unit or a central science and economics organization. You will be expected to develop techniques that apply econometrics to large data sets, address quantitative problems, and contribute to the design of automated systems around the company.
US, WA, Seattle
Amazon.com strives to be Earth's most customer-centric company where customers can shop in our stores to find and discover anything they want to buy. We hire the world's brightest minds, offering them a fast paced, technologically sophisticated and friendly work environment. Economists at Amazon partner closely with senior management, business stakeholders, scientist and engineers, and economist leadership to solve key business problems ranging from Amazon Web Services, Kindle, Prime, inventory planning, international retail, third party merchants, search, pricing, labor and employment planning, effective benefits (health, retirement, etc.) and beyond. Amazon Economists build econometric models using our world class data systems and apply approaches from a variety of skillsets – applied macro/time series, applied micro, econometric theory, empirical IO, empirical health, labor, public economics and related fields are all highly valued skillsets at Amazon. You will work in a fast moving environment to solve business problems as a member of either a cross-functional team embedded within a business unit or a central science and economics organization. You will be expected to develop techniques that apply econometrics to large data sets, address quantitative problems, and contribute to the design of automated systems around the company.
US, WA, Seattle
Amazon.com strives to be Earth's most customer-centric company where customers can shop in our stores to find and discover anything they want to buy. We hire the world's brightest minds, offering them a fast paced, technologically sophisticated and friendly work environment. Economists in the Forecasting, Macroeconomics & Finance field document, interpret and forecast Amazon business dynamics. This track is well suited for economists adept at combining times-series statistical methods with strong economic analysis and intuition. This track could be a good fit for candidates with research experience in: macroeconometrics and/or empirical macroeconomics; international macroeconomics; time-series econometrics; forecasting; financial econometrics and/or empirical finance; and the use of micro and panel data to improve and validate traditional aggregate models. Economists at Amazon are expected to work directly with our senior management and scientists from other fields on key business problems faced across Amazon, including retail, cloud computing, third party merchants, search, Kindle, streaming video, and operations. The Forecasting, Macroeconomics & Finance field utilizes methods at the frontier of economics to develop formal models to understand the past and the present, predict the future, and identify relevant risks and opportunities. For example, we analyze the internal and external drivers of growth and profitability and how these drivers interact with the customer experience in the short, medium and long-term. We build econometric models of dynamic systems, using our world class data tools, formalizing problems using rigorous science to solve business issues and further delight customers.