GANs that work well empirically
Ömer Sümer
Department of Computer Science
University of Tübingen, Germany
ICERM, Generative Models Discussion Group
Outline
1. Improved Techniques for Training GANs
   - DC-GAN and Improved Techniques
2. Generating High-Quality Images
   - Alternative Losses: Least Squares GAN and Wasserstein GAN
   - (1) Progressive GAN
   - (2) Style GAN
3. Paired and Unpaired Image-to-Image Translation
   - Paired Translation
   - Unpaired Translation
   - More Recent Examples
4. Conclusion
Improved Techniques for Training GANs
How did they get to this point?
Increasingly realistic faces generated by GAN methods.
Brundage et al., “The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and
Mitigation,” Report, Future of Humanity Institute, University of Oxford, February 2018.
Recap: Generative Adversarial Nets (Goodfellow et al., 2014)
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log(1 - D(G(z)))\big]
Discriminator:
Linear(784,1024) → Linear(1024,512) → Linear(512,256) → Linear(256,1) → Sigmoid()
Generator:
Linear(100,256) → Linear(256,512) → Linear(512,1024) → Linear(1024,784) → Tanh()
Network architecture
Both D and G are made up of fully connected layers.
All hidden activations are LeakyReLU(0.2).
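
A minimal PyTorch sketch of this fully connected GAN (MNIST-sized images, 28×28 = 784, latent dimension 100). The generator's last layer is assumed to map 1024 → 784, since the corresponding line on the slide is garbled:

import torch.nn as nn

# Discriminator: maps a flattened image to a probability of being real.
D = nn.Sequential(
    nn.Linear(784, 1024), nn.LeakyReLU(0.2),
    nn.Linear(1024, 512), nn.LeakyReLU(0.2),
    nn.Linear(512, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),      # probability that the input is real
)

# Generator: maps a 100-dim noise vector to a flattened image.
G = nn.Sequential(
    nn.Linear(100, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 512), nn.LeakyReLU(0.2),
    nn.Linear(512, 1024), nn.LeakyReLU(0.2),
    nn.Linear(1024, 784), nn.Tanh(),      # pixel values scaled to [-1, 1]
)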
Samples: MNIST (a), TFD (b), CIFAR-10 fully connected model (c), CIFAR-10 conv/deconv model (d).
DC-GAN, Radford et al. (2016)
Replace any pooling layers with strided convolutions (in D) and fractional-strided convolutions (in G).
Remove fully connected hidden layers.
Use batch normalization in both D and G.
Use ReLU activations in all layers of G except the output, which uses tanh(); use LeakyReLU activations in all layers of D.
DC-GAN generator architecture
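
A minimal sketch of a DC-GAN-style generator following the guidelines above (fractional-strided convolutions, no fully connected hidden layers, batch norm, ReLU inside and tanh at the output); the channel counts here are illustrative, not the paper's exact configuration:

import torch.nn as nn

G = nn.Sequential(
    # latent z: (N, 100, 1, 1) -> (N, 256, 4, 4)
    nn.ConvTranspose2d(100, 256, 4, stride=1, padding=0, bias=False),
    nn.BatchNorm2d(256), nn.ReLU(),
    # (N, 256, 4, 4) -> (N, 128, 8, 8)
    nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(128), nn.ReLU(),
    # (N, 128, 8, 8) -> (N, 64, 16, 16)
    nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(64), nn.ReLU(),
    # (N, 64, 16, 16) -> (N, 3, 32, 32), pixel values in [-1, 1]
    nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
)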
Generated bedrooms and face arithmetic in the latent space.
Improved Techniques, Salimans et al. (NeurIPS 2016)
Feature matching: let f(x) denote the activations on an intermediate layer of the discriminator (or of another pre-trained net); the generator is trained to minimize (sketched below)

\Big\| \mathbb{E}_{x \sim p_{\text{data}}} f(x) - \mathbb{E}_{z \sim p_z} f(G(z)) \Big\|_2^2
Minibatch discrimination (to encourage diversity): let f(x_i) \in \mathbb{R}^A and let T \in \mathbb{R}^{A \times B \times C} be a tensor, with f(x_i) \times T = M_i \in \mathbb{R}^{B \times C}. Compute the L_1 distances between the rows of the M_i and apply a negative exponential:

c_b(x_i, x_j) = \exp\big(-\|M_{i,b} - M_{j,b}\|_{L_1}\big) \in \mathbb{R}
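
Minimal sketches of these two tricks under the notation above; `features` (an accessor for an intermediate activation of D), and the shapes f: (N, A), T: (A, B*C), are assumptions for illustration:

import torch

def feature_matching_loss(features, real_x, fake_x):
    # Match the mean intermediate features of real and generated batches.
    m_real = features(real_x).mean(dim=0).detach()
    m_fake = features(fake_x).mean(dim=0)
    return ((m_real - m_fake) ** 2).sum()

def minibatch_discrimination(f, T, B, C):
    M = (f @ T).view(-1, B, C)                           # M_i in R^{B x C}
    # Pairwise L1 distances between rows M_{i,b} and M_{j,b}: (N, N, B)
    l1 = (M.unsqueeze(0) - M.unsqueeze(1)).abs().sum(dim=3)
    # c_b(x_i, x_j) = exp(-||.||_{L1}); summed over the batch these become
    # extra per-sample features appended to the discriminator input.
    return torch.exp(-l1).sum(dim=1)                     # (N, B)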
Historical Averaging: a penalty so that the learning rule scales well in time (loosely inspired by fictitious play, which can find equilibria):

\Big\| \theta - \frac{1}{t} \sum_{i=1}^{t} \theta[i] \Big\|^2
(One-sided) label smoothing: in D's loss, the targets for real images are smoothed (e.g., from 1 to 0.9) to avoid overconfidence (sketched below).
Other practical tricks to stabilize DC-GAN-like networks:
https://github.com/soumith/ganhacks
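
Minimal sketches of the last two tricks, assuming D ends with a sigmoid and that `params` and `running_avg` are matching lists of parameters and their historical means (both names are illustrative):

import torch
import torch.nn.functional as F

def historical_averaging_penalty(params, running_avg, t):
    # || theta - (1/t) * sum_i theta[i] ||^2, keeping the mean online.
    penalty = 0.0
    for p, avg in zip(params, running_avg):
        avg.mul_((t - 1) / t).add_(p.detach() / t)   # update running mean
        penalty = penalty + ((p - avg) ** 2).sum()
    return penalty

def smoothed_d_loss(d_real, d_fake):
    # One-sided label smoothing: real targets 0.9, fake targets stay at 0.
    return (F.binary_cross_entropy(d_real, torch.full_like(d_real, 0.9))
            + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))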
Generating High-Quality Images
Alternative Losses: Least Squares GAN and Wasserstein GAN
Least Squares GAN, Mao et al. (2016)
Regular GANs formulate D as a classifier with a sigmoid cross-entropy (real/fake) loss, which in many applications causes a vanishing gradient problem. LSGAN therefore replaces it with a least squares loss.
In return: higher-quality images and more stable training.
LS-GAN loss

\min_D V_{\text{LSGAN}}(D) = \frac{1}{2} \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[(D(x) - b)^2\big] + \frac{1}{2} \mathbb{E}_{z \sim p_z(z)}\big[(D(G(z)) - a)^2\big]

\min_G V_{\text{LSGAN}}(G) = \frac{1}{2} \mathbb{E}_{z \sim p_z(z)}\big[(D(G(z)) - c)^2\big]

where b = 1 (real), a = 0 (fake), and c = 1 (as we want to fool D).
They show that minimizing this loss amounts to minimizing the Pearson χ²-divergence.
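
A sketch of these objectives with a = 0, b = 1, c = 1, assuming d_real and d_fake are D's raw (unbounded) scores on real and generated batches:

def lsgan_d_loss(d_real, d_fake):
    # 1/2 E[(D(x) - 1)^2] + 1/2 E[(D(G(z)) - 0)^2]
    return 0.5 * ((d_real - 1.0) ** 2).mean() + 0.5 * (d_fake ** 2).mean()

def lsgan_g_loss(d_fake):
    # 1/2 E[(D(G(z)) - 1)^2]
    return 0.5 * ((d_fake - 1.0) ** 2).mean()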
Generated images, LSUN bedrooms.
Wasserstein GAN, Arjovsky et al. (2017), Gulrajani et al. (2017)
The Earth-Mover (EM) or Wasserstein-1 distance:

W(P_r, P_\theta) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x, y) \sim \gamma}\big[\|x - y\|\big]

where \Pi(P_r, P_g) denotes the set of all joint distributions \gamma(x, y) whose marginals are P_r and P_g.
According to the Kantorovich-Rubinstein duality,

W(P_r, P_\theta) = \sup_{\|f\|_L \le 1} \mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{x \sim P_\theta}[f(x)]

where the supremum is over all 1-Lipschitz functions f : \mathcal{X} \to \mathbb{R}.
No log in the losses; the critic D outputs a raw score (no sigmoid at the output).
The critic's weights are clipped to enforce the Lipschitz constraint.
D is trained more often than G.
Use RMSProp instead of Adam, with a very low learning rate (α = 0.00005).
(Improved) An alternative to clipping weights: penalize the norm of the gradient of the critic with respect to its input (a sketch follows below).
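
A sketch of one critic update in the improved, gradient-penalty variant; `critic`, `G`, `opt_c`, `real`, `z`, and image-shaped (N, C, H, W) inputs are illustrative assumptions, while lambda_gp = 10 follows Gulrajani et al.:

import torch

def critic_step(critic, G, opt_c, real, z, lambda_gp=10.0):
    fake = G(z).detach()
    # Wasserstein estimate: the critic should score real high, fake low.
    loss = critic(fake).mean() - critic(real).mean()
    # Gradient penalty on random interpolates between real and fake samples.
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grads = torch.autograd.grad(critic(x_hat).sum(), x_hat,
                                create_graph=True)[0]
    gp = ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()
    loss = loss + lambda_gp * gp
    opt_c.zero_grad()
    loss.backward()
    opt_c.step()
    return loss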
(1) Progressive GAN
Progressive Growing of GANs, Karras et al. (ICLR 2018)
Why is generating high-resolution images (say, 1024×1024) so difficult?
1. At larger resolutions it is easier to distinguish the generated images from real ones, and this causes gradient problems.
2. Networks trained at large resolutions must use smaller minibatches due to memory constraints, making training more unstable.
Using a single NVIDIA Tesla P100 GPU, training takes 96 hours to show the network 6.4 million images; the authors argue that without the progressive idea it would take 520 hours to reach the same point.
Doubling and halving the resolution in D and G.
What led to this photo-realism? Karras et al. (ICLR 2018)
Progressive growth
Revised training parameters
Minibatch stddev: add the across-minibatch standard deviation as an additional feature map.
Equalized learning rate: initialize weights trivially from a normal distribution with unit variance, and normalize at runtime by a per-layer constant c (He et al., 2015): \hat{w}_i = w_i / c.
Pixelwise feature normalization in G:

b_{x,y} = a_{x,y} \Big/ \sqrt{\frac{1}{N} \sum_{j=0}^{N-1} \big(a^j_{x,y}\big)^2 + \varepsilon}

(Minibatch stddev and pixelwise normalization are sketched below.)
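
Minimal sketches of two of these tricks, assuming (N, C, H, W) tensors with batch size N > 1:

import torch
import torch.nn as nn

class MinibatchStdDev(nn.Module):
    # Append the standard deviation, computed across the minibatch and
    # averaged over all features, as one extra constant feature map.
    def forward(self, x):
        std = x.std(dim=0).mean()
        extra = std.expand(x.size(0), 1, x.size(2), x.size(3))
        return torch.cat([x, extra], dim=1)

class PixelNorm(nn.Module):
    # b_{x,y} = a_{x,y} / sqrt(mean_j (a^j_{x,y})^2 + eps), over channels.
    def forward(self, x, eps=1e-8):
        return x / torch.sqrt((x ** 2).mean(dim=1, keepdim=True) + eps)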
Visual Quality Comparison, Karras et al. (ICLR 2018)
(2) Style GAN
Style GAN, Karras et al. (2018)
What is style?
"The generator starts from a learned constant input and adjusts the style of the image at each convolution layer based on the latent code, therefore directly controlling the strength of image features at different scales."
1. coarse (4²-8²): pose, general hair style, face shape
2. middle (16²-32²): facial features, hair style, eyes open/closed
3. fine (64²-1024²): color scheme (eye, hair and skin) and micro features
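
These per-layer style adjustments are implemented with adaptive instance normalization (AdaIN); a minimal sketch, where the affine mapping from the latent code w to a per-channel scale and bias is an illustrative stand-in for the paper's learned affine transforms:

import torch.nn as nn

class AdaIN(nn.Module):
    def __init__(self, channels, w_dim):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels)
        self.style = nn.Linear(w_dim, 2 * channels)   # per-channel scale/bias

    def forward(self, x, w):
        # Normalize each feature map, then modulate it with the style of w.
        scale, bias = self.style(w).chunk(2, dim=1)   # each (N, C)
        return ((1 + scale[:, :, None, None]) * self.norm(x)
                + bias[:, :, None, None])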
Paired and Unpaired Image-to-Image Translation
Paired Translation
What is the idea of image-to-image translation?
G : z → y
Standard GANs learn a mapping from random noise z to an output image y:

\mathcal{L}_{\text{GAN}}(G, D) = \mathbb{E}_y[\log D(y)] + \mathbb{E}_{x,z}[\log(1 - D(G(x, z)))]

G : {x, z} → y
Conditional GANs learn a mapping from observed data x and random noise z to an output image:

\mathcal{L}_{\text{cGAN}}(G, D) = \mathbb{E}_{x,y}[\log D(x, y)] + \mathbb{E}_{x,z}[\log(1 - D(x, G(x, z)))]

Adding an L2 or (the better one) L1 reconstruction term on G,

\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\big[\|y - G(x, z)\|_1\big]

gives the full objective \mathcal{L}_{\text{cGAN}}(G, D) + \lambda \mathcal{L}_{L1}(G).
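
A sketch of the generator objective in the non-saturating form commonly used in practice, assuming D ends with a sigmoid and sees the conditioning image and a candidate output concatenated on the channel axis; the G(x, z) call signature is an assumption, while lam = 100 follows the pix2pix paper:

import torch
import torch.nn.functional as F

def pix2pix_g_loss(D, G, x, y, z, lam=100.0):
    fake = G(x, z)
    d_out = D(torch.cat([x, fake], dim=1))
    # Non-saturating cGAN term: push D toward labeling the fake as real.
    adv = F.binary_cross_entropy(d_out, torch.ones_like(d_out))
    # L1 reconstruction term keeps the output close to the paired target.
    return adv + lam * F.l1_loss(fake, y)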
pix2pix, Isola et al. (2016)
Paired Image-to-Image Translation (pix2pix)
Some working examples of pix2pix.
Unpaired Translation
CycleGAN, Zhu et al. (2017)
Paired translation (pix2pix) is good, but often we do not have paired training data!
The idea of cycle-consistency.
\mathcal{L}_{\text{GAN}}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{\text{data}}(y)}[\log D_Y(y)] + \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log(1 - D_Y(G(x)))]

\mathcal{L}_{\text{cyc}}(G, F) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\|F(G(x)) - x\|_1\big] + \mathbb{E}_{y \sim p_{\text{data}}(y)}\big[\|G(F(y)) - y\|_1\big]

The full objective is

\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{\text{GAN}}(G, D_Y, X, Y) + \mathcal{L}_{\text{GAN}}(F, D_X, Y, X) + \lambda \mathcal{L}_{\text{cyc}}(G, F)
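
A sketch of the cycle-consistency term: translate X → Y → X and Y → X → Y and penalize the L1 reconstruction error in both directions. The paper's backward generator F is renamed F_net here to avoid clashing with torch.nn.functional:

import torch.nn.functional as F

def cycle_loss(G, F_net, x, y):
    return (F.l1_loss(F_net(G(x)), x)     # x -> G(x) -> F(G(x)) ≈ x
            + F.l1_loss(G(F_net(y)), y))  # y -> F(y) -> G(F(y)) ≈ y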
Sample translations.
Some examples I played with (1)
My idea was to try CycleGAN between unseen forensic sketches and real pictures...
There is no public dataset of unseen forensic sketches, so I tried it between seen ones (CUHK Face Sketch FERET database).
Maybe if we had many unseen sketches from one artist, we could learn a model that maps his/her sketches to the real appearance and geometric distribution.
Some examples I played with (2)
From facial landmarks to appearance (only L1 regression).
More Recent Examples
Learning to combine appearance and pose.
Liqian Ma, Qianru Sun, Stamatios Georgoulis, Luc Van Gool, Bernt Schiele, Mario Fritz;
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
Learning to combine appearance and pose (dancing).
Caroline Chan, Shiry Ginosar, Tinghui Zhou, Alexei A. Efros, “Everybody Dance Now,”
http://arxiv.org/abs/1808.07371
Conclusion
There are many qualitative and quantitative measures for comparing generated images.
Besides producing ever more realistic images, there are a few main directions from an application point of view:
- demonstrating the power of unsupervised representations, e.g., via the performance of D's features in image retrieval and classification tasks;
- adding an adversarial term to supervised learning to improve performance (for instance, in pose estimation, facial keypoint detection, person identification, etc.);
- augmenting data, particularly for under-represented modes.
GANs originally impose no structure on their latent space, but recent works try to disentangle it into meaningful factors in an unsupervised manner.