Generating High-Quality Images
Alternative Losses: Least Squares GAN and Wasserstein GAN
Wasserstein GAN, Arjovsky et al. (2017), Gulrajani et al. (2017)
The Earth-Mover (EM) or Wasserstein-1 distance
$$W(P_r, P_\theta) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x,y)\sim\gamma}\big[\, \|x - y\| \,\big]$$
where $\Pi(P_r, P_g)$ denotes the set of all $\gamma(x, y)$ whose marginals are $P_r$ and $P_g$, respectively.
According to Kantorovich-Rubinstein duality,
$$W(P_r, P_\theta) = \sup_{\|f\|_L \le 1} \mathbb{E}_{x \sim P_r}\big[f(x)\big] - \mathbb{E}_{x \sim P_\theta}\big[f(x)\big]$$
where the supremum is over all 1-Lipschitz functions $f : \mathcal{X} \to \mathbb{R}$.
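In WGAN the critic $D$ plays the role of $f$: it is kept (approximately) 1-Lipschitz, and the generator is trained to shrink the estimated distance. One way to write the resulting objective, following Arjovsky et al. (2017), is
$$\min_{G} \max_{\|D\|_L \le 1} \; \mathbb{E}_{x \sim P_r}\big[D(x)\big] - \mathbb{E}_{z \sim p(z)}\big[D(G(z))\big],$$
which motivates the training recipe listed below.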
No log in the losses. We do not apply a sigmoid at D's output; the critic returns an unbounded score.
Weight clipping is applied to D's parameters (to keep the critic approximately Lipschitz).
D is trained more often than G (several critic updates per generator update).
Use RMSProp instead of Adam.
Very low learning rate (α = 0.00005).
(Improved WGAN, Gulrajani et al., 2017) An alternative to clipping weights: penalize the norm of the gradient of the critic with respect to its input (see the sketch below).
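A minimal PyTorch sketch of one WGAN training iteration under these settings. The toy MLP shapes, `z_dim = 100`, and the random `real_batch` are placeholder assumptions, not the slides' code; the hyperparameters (RMSProp, lr = 5e-5, n_critic = 5, clip = 0.01, λ = 10 for the gradient penalty) follow the slide and the cited papers.

```python
import torch

z_dim, batch_size, n_critic, clip_value = 100, 64, 5, 0.01

# Toy networks (placeholder shapes); note: no sigmoid at the critic output.
critic = torch.nn.Sequential(torch.nn.Linear(784, 256), torch.nn.ReLU(),
                             torch.nn.Linear(256, 1))
generator = torch.nn.Sequential(torch.nn.Linear(z_dim, 256), torch.nn.ReLU(),
                                torch.nn.Linear(256, 784))

# RMSProp instead of Adam, very low learning rate (alpha = 0.00005).
opt_d = torch.optim.RMSprop(critic.parameters(), lr=5e-5)
opt_g = torch.optim.RMSprop(generator.parameters(), lr=5e-5)

real_batch = torch.randn(batch_size, 784)  # stand-in for a batch of real data


def gradient_penalty(critic, real, fake, lam=10.0):
    """(Improved WGAN) Penalize the critic's gradient norm at points
    interpolated between real and fake samples, instead of clipping weights."""
    eps = torch.rand(real.size(0), 1)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grad = torch.autograd.grad(critic(interp).sum(), interp, create_graph=True)[0]
    return lam * ((grad.norm(2, dim=1) - 1) ** 2).mean()


# Train the critic more often than the generator (n_critic steps per G step).
for _ in range(n_critic):
    z = torch.randn(batch_size, z_dim)
    fake = generator(z).detach()
    # No log in the losses: maximize E[D(real)] - E[D(fake)].
    loss_d = -(critic(real_batch).mean() - critic(fake).mean())
    # WGAN-GP alternative: loss_d += gradient_penalty(critic, real_batch, fake)
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()
    # Weight clipping keeps the critic approximately Lipschitz.
    for p in critic.parameters():
        p.data.clamp_(-clip_value, clip_value)

# One generator step: maximize E[D(G(z))], i.e. minimize -E[D(G(z))].
z = torch.randn(batch_size, z_dim)
loss_g = -critic(generator(z)).mean()
opt_g.zero_grad()
loss_g.backward()
opt_g.step()
```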