Overview of Deep Generative Models

Division of Applied Math, Brown University

Guangyao (Stannis) Zhou

March 5th, 2019

Outline

Overview

Discriminative vs. Generative

  • Objects of interest:
    • Latent variable \(z\)
    • Observed variable \(x\)
  • Discriminative models: model \(p(z|x)\)
    • Logistic regression, neural networks, etc.
  • Generative models: model \(p(x, z)=p(z)p(x|z)\)
    • Markov random fields, Bayesian networks
    • Deep generative models

Deep Generative Models

  • Focus on easy sampling
  • Popular Frameworks
    • Generative Adversarial Networks (GANs)
    • Variational Autoencoders (VAEs)
    • Flow-based Generative Models

Commonalities and Differences

  • Commonalities: Model the stochastic generation process
    • Base distribution transformed by a neural network
  • Differences: How to learn the parameters
    • GANs: Adversarial learning
    • VAEs: Variational Inference
    • Flow-based models: Maximum Likelihood

Generative Adversarial Networks

Stochastic Generation Process

  • Base distribution: some simple distribution \(p_z(z)\)
    • Example: Multivariate Gaussian
    • Easy to sample from
  • Generator: differentiable transformation \(G(z; \theta_g)\)
  • The actual random variable:
    • \(x = G(z; \theta_g)\) where \(z\sim p_z(z)\)
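
To make this generation process concrete, here is a minimal NumPy sketch (not from the talk) in which a hypothetical one-hidden-layer network stands in for the generator \(G(z; \theta_g)\):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical generator parameters theta_g: a one-hidden-layer network
# standing in for an arbitrary differentiable transformation G(z; theta_g).
d_z, d_h, d_x = 16, 64, 2
theta_g = {
    "W1": 0.1 * rng.standard_normal((d_z, d_h)),
    "b1": np.zeros(d_h),
    "W2": 0.1 * rng.standard_normal((d_h, d_x)),
    "b2": np.zeros(d_x),
}

def G(z, theta):
    """Differentiable transformation of a base sample z."""
    h = np.tanh(z @ theta["W1"] + theta["b1"])
    return h @ theta["W2"] + theta["b2"]

# Base distribution p_z(z): multivariate Gaussian, easy to sample from.
z = rng.standard_normal((5, d_z))

# The actual random variable: x = G(z; theta_g) with z ~ p_z(z).
x = G(z, theta_g)
print(x.shape)  # (5, 2)
```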

Adversarial Training

  • Two different distributions:
    • Generator distribution \(p_g(x)\)
    • Data distribution \(p_{\text{data}}(x)\)
  • Use a discriminator to train the model (see the sketch below)
    • Can be thought of as a simple two-category classifier
    • Binary classification log-likelihood as the loss function
    • Samples from both distributions to train the discriminator
    • Samples from the generator distribution to train the generator
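
As a hedged illustration of how the two objectives use samples, the toy NumPy sketch below assumes a hypothetical logistic-regression discriminator and made-up stand-ins for \(p_{\text{data}}(x)\) and \(p_g(x)\); it only computes the standard binary-classification log-likelihood for the discriminator and the common non-saturating loss for the generator, without any training loop:

```python
import numpy as np

rng = np.random.default_rng(0)

def D(x, theta_d):
    """Hypothetical logistic-regression discriminator:
    probability that x came from the data distribution."""
    logits = x @ theta_d["w"] + theta_d["b"]
    return 1.0 / (1.0 + np.exp(-logits))

# Toy stand-ins for the two distributions.
x_data = rng.standard_normal((64, 2)) + 3.0   # samples from p_data(x)
x_fake = rng.standard_normal((64, 2))         # samples x = G(z), i.e. from p_g(x)
theta_d = {"w": rng.standard_normal(2), "b": 0.0}

eps = 1e-12  # numerical safety inside the logs

# Discriminator objective: binary-classification log-likelihood,
# computed from samples of BOTH distributions (maximized w.r.t. theta_d).
disc_loglik = (np.log(D(x_data, theta_d) + eps).mean()
               + np.log(1.0 - D(x_fake, theta_d) + eps).mean())

# Generator objective: uses only samples from the generator distribution;
# the (non-saturating) generator tries to make D(G(z)) large.
gen_loss = -np.log(D(x_fake, theta_d) + eps).mean()

print(disc_loglik, gen_loss)
```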

Variational Autoencoders

Stochastic Generation Process

  • Base distribution: some simple distribution \(p_z(z)\)
    • Example: Multivariate Gaussian
    • Easy to sample from
  • Differentiable transformations \(\mu(z; \theta)\) and \(\sigma(z; \theta)\)
  • The actual random variable:
    • \(x_i\sim N(\mu_i(z; \theta), \sigma_i^2(z; \theta))\)
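
A minimal sketch of this decoder-side sampling, assuming hypothetical networks for \(\mu(z; \theta)\) and \(\sigma(z; \theta)\):

```python
import numpy as np

rng = np.random.default_rng(0)
d_z, d_h, d_x = 8, 32, 4

# Hypothetical decoder parameters theta shared by mu(z; theta) and sigma(z; theta).
theta = {
    "W": 0.1 * rng.standard_normal((d_z, d_h)),
    "W_mu": 0.1 * rng.standard_normal((d_h, d_x)),
    "W_logsig": 0.1 * rng.standard_normal((d_h, d_x)),
}

def decoder(z, theta):
    """Differentiable transformations mu(z; theta) and sigma(z; theta)."""
    h = np.tanh(z @ theta["W"])
    mu = h @ theta["W_mu"]
    sigma = np.exp(h @ theta["W_logsig"])  # keep sigma positive
    return mu, sigma

# Base distribution p_z(z): multivariate Gaussian.
z = rng.standard_normal(d_z)

# Each coordinate x_i ~ N(mu_i(z; theta), sigma_i(z; theta)^2).
mu, sigma = decoder(z, theta)
x = mu + sigma * rng.standard_normal(d_x)
print(x)
```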

Variational Inference

  • Ideal scenario: maximize \(\log p_\theta(x)=\log \int p_\theta(x, z) dz\)
    • The integral is usually intractable
  • Instead, look at the "Evidence Lower Bound (ELBO)": \[\begin{align*} \log p_\theta(x) &= \log \int p_\theta(x, z) dz = \log \int q_\phi(z|x) \frac{p_\theta(x, z)}{q_\phi(z|x)} dz\\ &\geq \int q_\phi(z|x) \log \frac{p_\theta(x, z)}{q_\phi(z|x)} dz \end{align*}\]

Variational Inference

  • Maximize the ELBO, the tightest lower bound
    • Equivalent to minimizing \(D(q_\phi(z|x)||p_\theta(z|x))\) (see the decomposition below)
  • "Amortized" version of mean-field approximation
    • More differentiable transformations
      • \(\mu(x; \phi)\) and \(\sigma(x; \phi)\)
    • \(q_\phi(z_i|x) = N(z_i|\mu_i(x; \phi), \sigma_i^2(x; \phi))\)
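
To see why (this step is not spelled out on the slide, but it is the standard decomposition): the gap between \(\log p_\theta(x)\) and the ELBO is exactly the KL divergence from \(q_\phi(z|x)\) to the true posterior, so for fixed \(\theta\), maximizing the ELBO over \(\phi\) minimizes that divergence. \[\begin{align*} \log p_\theta(x) &= \int q_\phi(z|x) \log \frac{p_\theta(x, z)}{q_\phi(z|x)} dz + \int q_\phi(z|x) \log \frac{q_\phi(z|x)}{p_\theta(z|x)} dz\\ &= \text{ELBO}(\theta, \phi) + D(q_\phi(z|x)||p_\theta(z|x)) \end{align*}\]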

Reparametrization trick

  • Want to maximize using gradient descent \[\begin{equation*} \int q_\phi(z|x) \log \frac{p_\theta(x, z)}{q_\phi(z|x)} dz \end{equation*}\]
  • Difficulty: estimating the gradients

Reparametrization trick

  • A naive approach for estimating the gradient w.r.t. \(\phi\): \[\begin{align*} &\nabla_{\phi}\int q_\phi(z|x) \log \frac{p_\theta(x, z)}{q_\phi(z|x)} dz \\ =& \int q_\phi(z|x) \nabla_{\phi} \log q_{\phi}(z|x)\left[\log \frac{p_\theta(x, z)}{q_\phi(z|x)} - 1\right] dz \end{align*}\]
  • Draw Monte Carlo samples \(z^{(i)}\sim q_{\phi}(z|x), i=1,\cdots, n\) to estimate the gradients: \[\begin{equation*} \frac{1}{n}\sum_{i=1}^n \nabla_{\phi} \log q_{\phi}(z^{(i)}|x)\left[\log \frac{p_\theta(x, z^{(i)})}{q_\phi(z^{(i)}|x)} - 1\right] \end{equation*}\]
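
As a concrete (and entirely toy) illustration of this score-function-style estimator, the NumPy sketch below assumes a made-up one-dimensional model \(p_\theta(x,z) = N(z|0,1)\,N(x|z,1)\) and \(q_\phi(z|x) = N(z|\mu, \sigma^2)\) with \(\phi = (\mu, \sigma)\), so that \(\nabla_\phi \log q_\phi(z|x)\) can be written by hand:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in model (not from the talk):
#   p_theta(x, z) = N(z | 0, 1) * N(x | z, 1),  q_phi(z|x) = N(z | mu, sigma^2),
# with variational parameters phi = (mu, sigma) and a fixed observation x.
phi_mu, phi_sigma = 0.5, 1.2
x = 1.0

def log_q(z):
    return -0.5 * ((z - phi_mu) / phi_sigma) ** 2 - np.log(phi_sigma) - 0.5 * np.log(2 * np.pi)

def log_p_joint(z):
    log_prior = -0.5 * z ** 2 - 0.5 * np.log(2 * np.pi)
    log_lik = -0.5 * (x - z) ** 2 - 0.5 * np.log(2 * np.pi)
    return log_prior + log_lik

def grad_log_q(z):
    """Hand-derived gradient of log q_phi(z|x) w.r.t. phi = (mu, sigma)."""
    d_mu = (z - phi_mu) / phi_sigma ** 2
    d_sigma = ((z - phi_mu) ** 2 - phi_sigma ** 2) / phi_sigma ** 3
    return np.stack([d_mu, d_sigma], axis=-1)

# Monte Carlo estimate of the gradient from samples z^(i) ~ q_phi(z|x).
n = 10_000
z = phi_mu + phi_sigma * rng.standard_normal(n)
weights = log_p_joint(z) - log_q(z) - 1.0        # the [log p/q - 1] factor above
grad_estimate = (grad_log_q(z) * weights[:, None]).mean(axis=0)
print(grad_estimate)  # typically a rather noisy estimate
```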

Reparametrization trick

  • Problems with this naive approach:
    • Sometimes we can't evaluate \(\nabla_{\phi}\log q_{\phi}(z|x)\)
    • Even if we can evaluate \(\nabla_{\phi}\log q_{\phi}(z|x)\), the resulting estimator usually has very high variance
    • Can't easily make use of automatic differentiation
  • Solution: reparametrization trick
    • View \(q_\phi(z|x)\) as a parameterless base distribution \(p(\epsilon)\) transformed by a differentiable transformation \(g_\phi(\epsilon, x)\)

Reparametrization trick

  • For VAEs, \(p(\epsilon) = N(0, I)\), and \[\begin{equation*} g_\phi(\epsilon, x) = \mu(x; \phi) + \sigma(x; \phi) \odot \epsilon \end{equation*}\] where \(\odot\) represents elementwise multiplication.
  • Sample \(\epsilon^{(i)}, i=1, \cdots, n\) from \(p(\epsilon)\)
  • Use the objective \[\begin{equation*} \frac{1}{n}\sum_{i=1}^n \log \frac{p_\theta(x, g_\phi(\epsilon^{(i)}, x))}{q_\phi(g_\phi(\epsilon^{(i)}, x)|x)} \end{equation*}\]
  • Can easily estimate gradients w.r.t. \(\theta\) and \(\phi\) using backpropagation.
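
Here is a hedged sketch of the same toy one-dimensional model under the reparametrization trick; in a real VAE an autodiff framework would backpropagate through this objective, so the finite difference at the end is only there to show that the reparametrized estimate is an ordinary differentiable function of \(\phi\):

```python
import numpy as np

rng = np.random.default_rng(0)

# Same toy model as before: p_theta(x, z) = N(z|0,1) N(x|z,1),
# q_phi(z|x) = N(z | phi_mu, phi_sigma^2), with a fixed observation x.
x = 1.0

def log_q(z, phi_mu, phi_sigma):
    return -0.5 * ((z - phi_mu) / phi_sigma) ** 2 - np.log(phi_sigma) - 0.5 * np.log(2 * np.pi)

def log_p_joint(z):
    return (-0.5 * z ** 2 - 0.5 * np.log(2 * np.pi)) + (-0.5 * (x - z) ** 2 - 0.5 * np.log(2 * np.pi))

def elbo_estimate(phi_mu, phi_sigma, eps):
    """Reparametrized Monte Carlo ELBO: z = g_phi(eps, x) = mu + sigma * eps."""
    z = phi_mu + phi_sigma * eps
    return np.mean(log_p_joint(z) - log_q(z, phi_mu, phi_sigma))

# Sample the parameterless base distribution p(eps) = N(0, I) once; the
# objective is then an ordinary deterministic, differentiable function of phi.
eps = rng.standard_normal(10_000)
print(elbo_estimate(0.5, 1.2, eps))

# In a real VAE, backpropagation would give gradients w.r.t. phi (and theta)
# directly; the finite difference below just checks smoothness in phi_mu.
h = 1e-5
grad_mu = (elbo_estimate(0.5 + h, 1.2, eps) - elbo_estimate(0.5 - h, 1.2, eps)) / (2 * h)
print(grad_mu)
```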

Flow-based Generative Models

Stochastic Generation Process

  • Base distribution: some simple distribution \(p_z(z)\)
    • Example: Multivariate Gaussian
    • Easy to sample from
  • Reversible differentiable transformation \(x = G(z)\), for which we can easily compute the determinant of the Jacobian, \(|\det J_G(z)|\)
  • Exact probability density function given by the change of variables formula \[\begin{equation*} p_x(x) = \frac{p_z(z)}{|\det J_G(z)|}, \text{ where } z = G^{-1}(x) \end{equation*}\]
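
A minimal sketch (assuming a made-up elementwise affine map as the flow) of both directions, sampling via \(x = G(z)\) and exact density evaluation via the change of variables formula:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3

# Made-up flow: an elementwise affine map x = G(z) = a * z + b. Its Jacobian is
# diagonal, so |det J_G(z)| is just the product of |a_i| (independent of z here).
a = np.array([0.5, 2.0, 1.5])
b = np.array([1.0, -1.0, 0.0])

def G(z):
    return a * z + b

def G_inv(x):
    return (x - b) / a

def log_abs_det_jacobian():
    return np.sum(np.log(np.abs(a)))

def log_p_z(z):
    # Base distribution: standard multivariate Gaussian.
    return -0.5 * np.sum(z ** 2) - 0.5 * d * np.log(2 * np.pi)

# Sampling direction: z ~ p_z(z), x = G(z).
z = rng.standard_normal(d)
x = G(z)

# Density direction: p_x(x) = p_z(G^{-1}(x)) / |det J_G(z)|, in log space.
log_p_x = log_p_z(G_inv(x)) - log_abs_det_jacobian()
print(x, log_p_x)
```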

Maximum Likelihood Estimation

  • Direct access to exact likelihood
  • Train with maximum likelihood
  • More details in the next meeting!
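
Concretely, for a dataset \(x^{(1)}, \cdots, x^{(n)}\) and a flow \(G\) with parameters \(\theta\), the change of variables formula above turns maximum likelihood estimation into \[\begin{equation*} \max_\theta \sum_{i=1}^n \left[\log p_z\left(G^{-1}(x^{(i)})\right) - \log \left|\det J_G\left(G^{-1}(x^{(i)})\right)\right|\right] \end{equation*}\]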

Discussions