A user-controllable framework that unifies style transfer methods

A diversity of outputs ensures that a style transfer model can satisfy any user’s tastes.

February 14, 2023

Neural style transfer is the use of neural networks to transfer the style of one input image — say, a famous painting — to another input image — say, a backyard photograph.

Researchers have proposed a number of different techniques for doing style transfer, but which one works best? There’s no right answer to that question; viewers’ opinions differ. In the results reported in prior papers on style transfer, the most-preferred methods rarely receive more than two-thirds of reviewers’ votes, while the least-preferred methods rarely receive less than 5%.

In a paper we presented at this year’s meeting of the Association for the Advancement of Artificial Intelligence (AAAI), my colleagues and I describe a new style transfer model that can output multiple options, controlled by a model parameter that the user selects.

We show that most prior approaches to style transfer can be rewritten in a standardized form that we call the assign-and-mix model. The model’s “assign” step involves an assignment matrix, which maps features of one input image to features of the other. In the paper, we show that the differences between style transfer techniques generally come down to the entropy of the assignment matrix, or the diversity of the matrix’s values.

Top: a content image (a swing) and a style image (van Gogh's *Starry Night*); bottom: four candidate images generated by the Amazon researchers' new style transfer model.

Finally, we show that, given a user-specified setting of the input parameter, an algorithm called Sinkhorn-Knopp can efficiently calculate the associated assignment matrix, enabling a diversity of outputs from the same style transfer model.

In a series of experiments, we compared our approach to its predecessors. We found that, according to standard metrics, our method did a better job of preserving the content of the content input and the style of the style input, and it produced more diverse outputs. We also conducted a study with 10 human evaluators and found that — at a particular setting of our diversity parameter — subjects preferred images generated by our method to those produced by other methods.

Assign and mix

In style transfer, the first step is to pass both the content example and the style example to the same visual encoder, which is typically pretrained on a broad object recognition task. The encoder produces a representation of each image, in which each image region has an associated feature vector.

The assignment for a particular point in the new image may be a single vector from the style encoding, or it may be a weighted combination of vectors. In the first case, the assignment matrix is binary: every matrix entry is either 0 or 1. This is a minimal-entropy assignment.

By contrast, if every point in the new image is a weighted combination of every vector in the style encoding, the assignment matrix has higher entropy. Some existing style transfer approaches use binary assignment matrices, and others use high-entropy matrices; our method can approximate both.
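
To make the two extremes concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the helper names (`row_entropy`, `mean_entropy`, `assign`) are hypothetical, the feature vectors are toy values, and measuring entropy as the mean Shannon entropy of each row is one plausible reading of "the entropy of the assignment matrix."

```python
import math

def row_entropy(weights):
    """Shannon entropy (in nats) of one row of assignment weights."""
    return -sum(w * math.log(w) for w in weights if w > 0)

def mean_entropy(matrix):
    """Average row entropy; an assumed way to summarize the whole matrix."""
    return sum(row_entropy(row) for row in matrix) / len(matrix)

def assign(matrix, style_features):
    """Build each new feature vector as a weighted combination of style vectors."""
    dim = len(style_features[0])
    return [
        [sum(w * s[d] for w, s in zip(row, style_features)) for d in range(dim)]
        for row in matrix
    ]

# Toy style encoding: three regions, two feature dimensions each.
style_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Binary assignment: each new-image point copies exactly one style vector.
binary = [[1.0, 0.0, 0.0],
          [0.0, 0.0, 1.0]]

# High-entropy assignment: each point averages every style vector equally.
soft = [[1 / 3, 1 / 3, 1 / 3],
        [1 / 3, 1 / 3, 1 / 3]]

print(mean_entropy(binary), assign(binary, style_features))
print(mean_entropy(soft), assign(soft, style_features))
```

In this toy setup, the binary matrix has zero entropy and simply copies style vectors, while the uniform matrix reaches the maximum row entropy (log 3 for three style vectors) and gives every point the same averaged feature.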

After the assignment step, we proceed to the mixing phase, which corresponds to approach (2), above. In this phase, we step through the encoding of the new, synthetic image, and for each image region, we measure the distance between its encoding and that of the original content example. Then we mix in the feature vectors from the original content encoding, in proportion to the degree of divergence. This ensures that the new image preserves the content of the original.
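
The mixing step can be sketched as follows. This is a hedged illustration only: the article does not say which distance or weighting the model uses, so the Euclidean distance and the weight `dist / (1 + dist)` are assumptions chosen to keep the weight between 0 and 1, and the function name `mix` is hypothetical.

```python
import math

def mix(stylized, content):
    """Blend content features back in, more strongly where the stylized
    encoding has drifted further from the content encoding."""
    mixed = []
    for s_vec, c_vec in zip(stylized, content):
        dist = math.dist(s_vec, c_vec)   # divergence for this image region
        alpha = dist / (1.0 + dist)      # assumed weighting, always in [0, 1)
        mixed.append([(1 - alpha) * s + alpha * c for s, c in zip(s_vec, c_vec)])
    return mixed

# Two regions: the first already matches the content, the second has diverged.
stylized = [[1.0, 0.0], [0.0, 0.0]]
content = [[1.0, 0.0], [3.0, 4.0]]

print(mix(stylized, content))
```

A region whose stylized encoding already matches the content is left unchanged, while a region that has diverged is pulled most of the way back toward the content features.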

The proposed approach. Epsilon is the parameter used to control the range of entropy values for the assignment matrix; *f_s→c* is the reconstruction of the content image, using features of the style image, produced by the assignment matrix.

The computational bottleneck in this process is the creation of multiple assignment matrices with different degrees of entropy. In our paper, however, we show that the Sinkhorn-Knopp algorithm, which rewrites matrices in a standardized form that can be solved efficiently, can be applied to the problem of constructing assignment matrices.
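
As a rough illustration of how one parameter can steer the entropy of an assignment matrix, the sketch below runs Sinkhorn-Knopp row and column rescaling on a toy cost matrix. It assumes the standard entropy-regularized optimal transport setup with uniform row and column sums; how epsilon enters the paper's model may differ. The function names and the cost values are hypothetical.

```python
import math

def sinkhorn_plan(cost, epsilon, n_iters=200):
    """Entropy-regularized transport plan with uniform row and column sums."""
    n, m = len(cost), len(cost[0])
    # Kernel: low-cost pairs get large weights; epsilon sets how sharply.
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Rescale rows so each sums to 1/n, then columns so each sums to 1/m.
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

def to_assignment(plan):
    """Rescale a plan with uniform row sums so that each row sums to 1."""
    n = len(plan)
    return [[n * p for p in row] for row in plan]

def entropy(matrix):
    """Total Shannon entropy (in nats) of the matrix entries."""
    return -sum(p * math.log(p) for row in matrix for p in row if p > 0)

# Toy costs: content region i is closest to style region i.
cost = [[0.0, 1.0, 1.0],
        [1.0, 0.0, 1.0],
        [1.0, 1.0, 0.0]]

sharp = to_assignment(sinkhorn_plan(cost, epsilon=0.05))   # near-binary
soft = to_assignment(sinkhorn_plan(cost, epsilon=100.0))   # near-uniform

print(entropy(sharp), entropy(soft))
```

With a small epsilon, the result is close to a binary assignment; with a large epsilon, the weights spread almost evenly across all style regions. Sweeping epsilon between those values yields a family of assignment matrices from the same inputs.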

In the paper, we rewrite three prior style transfer methods in the assign-and-mix format. We selected those methods because their assignment matrices cover the full spectrum of entropies. Our method should therefore also be able to approximate the outputs of any style transfer model whose assignment matrix entropy falls within a more limited range.

About the Author

Yue (Rex) Wu

Yue Wu is a senior applied scientist in the Alexa AI organization.
