Overview of Deep Generative Models

Division of Applied Math, Brown University

Guangyao (Stannis) Zhou

March 5th, 2019

Outline

Overview

Discriminative vs. Generative

  • Objects of interest:
    • Latent variable \(z\)
    • Observed variable \(x\)
  • Discriminative models: model \(p(z|x)\)
    • Logistic regression, neural networks, etc.
  • Generative models: model \(p(x, z)=p(z)p(x|z)\)
    • Markov random fields, Bayesian networks
    • Deep generative models

Deep Generative Models

  • Focus on easy sampling
  • Popular Frameworks
    • Generative Adversarial Networks (GANs)
    • Variational Autoencoders (VAEs)
    • Flow-based Generative Models

Commonalities and Differences

  • Commonalities: Model the stochastic generation process
    • Base distribution transformed by a neural network
  • Differences: How to learn the parameters
    • GANs: Adversarial learning
    • VAEs: Variational Inference
    • Flow-based models: Maximum Likelihood

Generative Adversarial Networks

Stochastic Generation Process

  • Base distribution: some simple distribution \(p_z(z)\)
    • Example: Multivariate Gaussian
    • Easy to sample from
  • Generator: differentiable transformation \(G(z; \theta_g)\)
  • The actual random variable:
    • \(x = G(z; \theta_g)\) where \(z\sim p_z(z)\)
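
To make this generation process concrete, here is a minimal NumPy sketch (not from the talk) in which a hypothetical one-hidden-layer network stands in for the generator \(G(z; \theta_g)\):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical generator parameters theta_g: a one-hidden-layer network
# standing in for an arbitrary differentiable transformation G(z; theta_g).
d_z, d_h, d_x = 16, 64, 2
theta_g = {
    "W1": 0.1 * rng.standard_normal((d_z, d_h)),
    "b1": np.zeros(d_h),
    "W2": 0.1 * rng.standard_normal((d_h, d_x)),
    "b2": np.zeros(d_x),
}

def G(z, theta):
    """Differentiable transformation of a base sample z."""
    h = np.tanh(z @ theta["W1"] + theta["b1"])
    return h @ theta["W2"] + theta["b2"]

# Base distribution p_z(z): multivariate Gaussian, easy to sample from.
z = rng.standard_normal((5, d_z))

# The actual random variable: x = G(z; theta_g) with z ~ p_z(z).
x = G(z, theta_g)
print(x.shape)  # (5, 2)
```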

Adversarial Training

  • Two different distributions:
    • Generator distribution \(p_g(x)\)
    • Data distribution \(p_{\text{data}}(x)\)
  • Use a discriminator to train the model (see the sketch below)
    • Can be thought of as a simple two-category classifier
    • Binary classification log-likelihood as the loss function
    • Samples from both distributions to train the discriminator
    • Samples from the generator distribution to train the generator
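
As a hedged illustration of how the two objectives use samples, the toy NumPy sketch below assumes a hypothetical logistic-regression discriminator and made-up stand-ins for \(p_{\text{data}}(x)\) and \(p_g(x)\); it only computes the standard binary-classification log-likelihood for the discriminator and the common non-saturating loss for the generator, without any training loop:

```python
import numpy as np

rng = np.random.default_rng(0)

def D(x, theta_d):
    """Hypothetical logistic-regression discriminator:
    probability that x came from the data distribution."""
    logits = x @ theta_d["w"] + theta_d["b"]
    return 1.0 / (1.0 + np.exp(-logits))

# Toy stand-ins for the two distributions.
x_data = rng.standard_normal((64, 2)) + 3.0   # samples from p_data(x)
x_fake = rng.standard_normal((64, 2))         # samples x = G(z), i.e. from p_g(x)
theta_d = {"w": rng.standard_normal(2), "b": 0.0}

eps = 1e-12  # numerical safety inside the logs

# Discriminator objective: binary-classification log-likelihood,
# computed from samples of BOTH distributions (maximized w.r.t. theta_d).
disc_loglik = (np.log(D(x_data, theta_d) + eps).mean()
               + np.log(1.0 - D(x_fake, theta_d) + eps).mean())

# Generator objective: uses only samples from the generator distribution;
# the (non-saturating) generator tries to make D(G(z)) large.
gen_loss = -np.log(D(x_fake, theta_d) + eps).mean()

print(disc_loglik, gen_loss)
```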

Variational Autoencoders

Stochastic Generation Process

  • Base distribution: some simple distribution \(p_z(z)\)
    • Example: Multivariate Gaussian
    • Easy to sample from
  • Differentiable transformations \(\mu(z; \theta)\) and \(\sigma(z; \theta)\)
  • The actual random variable:
    • \(x_i\sim N(\mu_i(z; \theta), \sigma_i^2(z; \theta))\)
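
A minimal sketch of this decoder-side sampling, assuming hypothetical networks for \(\mu(z; \theta)\) and \(\sigma(z; \theta)\):

```python
import numpy as np

rng = np.random.default_rng(0)
d_z, d_h, d_x = 8, 32, 4

# Hypothetical decoder parameters theta shared by mu(z; theta) and sigma(z; theta).
theta = {
    "W": 0.1 * rng.standard_normal((d_z, d_h)),
    "W_mu": 0.1 * rng.standard_normal((d_h, d_x)),
    "W_logsig": 0.1 * rng.standard_normal((d_h, d_x)),
}

def decoder(z, theta):
    """Differentiable transformations mu(z; theta) and sigma(z; theta)."""
    h = np.tanh(z @ theta["W"])
    mu = h @ theta["W_mu"]
    sigma = np.exp(h @ theta["W_logsig"])  # keep sigma positive
    return mu, sigma

# Base distribution p_z(z): multivariate Gaussian.
z = rng.standard_normal(d_z)

# Each coordinate x_i ~ N(mu_i(z; theta), sigma_i(z; theta)^2).
mu, sigma = decoder(z, theta)
x = mu + sigma * rng.standard_normal(d_x)
print(x)
```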

Variational Inference

  • Ideal scenario: maximize \(\log p_\theta(x)=\log \int p_\theta(x, z) dz\)
    • The integral is usually intractable
  • Instead, look at the "Evidence Lower Bound (ELBO)": \[\begin{align*} \log p_\theta(x) &= \log \int p_\theta(x, z) dz = \log \int q_\phi(z|x) \frac{p_\theta(x, z)}{q_\phi(z|x)} dz\\ &\geq \int q_\phi(z|x) \log \frac{p_\theta(x, z)}{q_\phi(z|x)} dz \end{align*}\]

Variational Inference

  • Maximize the ELBO, the tightest lower bound
    • Equivalent to minimizing \(D(q_\phi(z|x)||p_\theta(z|x))\) (see the decomposition below)
  • "Amortized" version of mean-field approximation
    • More differentiable transformations
      • \(\mu(x; \phi)\) and \(\sigma(x; \phi)\)
    • \(q_\phi(z_i|x) = N(z_i|\mu_i(x; \phi), \sigma_i^2(x; \phi))\)
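
To see why (this step is not spelled out on the slide, but it is the standard decomposition): the gap between \(\log p_\theta(x)\) and the ELBO is exactly the KL divergence from \(q_\phi(z|x)\) to the true posterior, so for fixed \(\theta\), maximizing the ELBO over \(\phi\) minimizes that divergence. \[\begin{align*} \log p_\theta(x) &= \int q_\phi(z|x) \log \frac{p_\theta(x, z)}{q_\phi(z|x)} dz + \int q_\phi(z|x) \log \frac{q_\phi(z|x)}{p_\theta(z|x)} dz\\ &= \text{ELBO}(\theta, \phi) + D(q_\phi(z|x)||p_\theta(z|x)) \end{align*}\]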

Reparametrization trick

  • Want to maximize using gradient descent \[\begin{equation*} \int q_\phi(z|x) \log \frac{p_\theta(x, z)}{q_\phi(z|x)} dz \end{equation*}\]
  • Difficulty: estimating the gradients

Reparametrization trick

  • A naive approach for estimating the gradient w.r.t. \(\phi\): \[\begin{align*} &\nabla_{\phi}\int q_\phi(z|x) \log \frac{p_\theta(x, z)}{q_\phi(z|x)} dz \\ =& \int q_\phi(z|x) \nabla_{\phi} \log q_{\phi}(z|x)\left[\log \frac{p_\theta(x, z)}{q_\phi(z|x)} - 1\right] dz \end{align*}\]
  • Draw Monte Carlo samples \(z^{(i)}\sim q_{\phi}(z|x), i=1,\cdots, n\) to estimate the gradients: \[\begin{equation*} \frac{1}{n}\sum_{i=1}^n \nabla_{\phi} \log q_{\phi}(z^{(i)}|x)\left[\log \frac{p_\theta(x, z^{(i)})}{q_\phi(z^{(i)}|x)} - 1\right] \end{equation*}\]
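
As a concrete (and entirely toy) illustration of this score-function-style estimator, the NumPy sketch below assumes a made-up one-dimensional model \(p_\theta(x,z) = N(z|0,1)\,N(x|z,1)\) and \(q_\phi(z|x) = N(z|\mu, \sigma^2)\) with \(\phi = (\mu, \sigma)\), so that \(\nabla_\phi \log q_\phi(z|x)\) can be written by hand:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in model (not from the talk):
#   p_theta(x, z) = N(z | 0, 1) * N(x | z, 1),  q_phi(z|x) = N(z | mu, sigma^2),
# with variational parameters phi = (mu, sigma) and a fixed observation x.
phi_mu, phi_sigma = 0.5, 1.2
x = 1.0

def log_q(z):
    return -0.5 * ((z - phi_mu) / phi_sigma) ** 2 - np.log(phi_sigma) - 0.5 * np.log(2 * np.pi)

def log_p_joint(z):
    log_prior = -0.5 * z ** 2 - 0.5 * np.log(2 * np.pi)
    log_lik = -0.5 * (x - z) ** 2 - 0.5 * np.log(2 * np.pi)
    return log_prior + log_lik

def grad_log_q(z):
    """Hand-derived gradient of log q_phi(z|x) w.r.t. phi = (mu, sigma)."""
    d_mu = (z - phi_mu) / phi_sigma ** 2
    d_sigma = ((z - phi_mu) ** 2 - phi_sigma ** 2) / phi_sigma ** 3
    return np.stack([d_mu, d_sigma], axis=-1)

# Monte Carlo estimate of the gradient from samples z^(i) ~ q_phi(z|x).
n = 10_000
z = phi_mu + phi_sigma * rng.standard_normal(n)
weights = log_p_joint(z) - log_q(z) - 1.0        # the [log p/q - 1] factor above
grad_estimate = (grad_log_q(z) * weights[:, None]).mean(axis=0)
print(grad_estimate)  # typically a rather noisy estimate
```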

Reparametrization trick

  • Problems with this naive approach:
    • Sometimes we can't evaluate \(\nabla_{\phi}\log q_{\phi}(z|x)\)
    • Even if we can evaluate \(\nabla_{\phi}\log q_{\phi}(z|x)\), the resulting estimator usually has very high variance
    • Can't easily make use of automatic differentiation
  • Solution: reparametrization trick
    • View \(q_\phi(z|x)\) as a parameterless base distribution \(p(\epsilon)\) transformed by a differentiable transformation \(g_\phi(\epsilon, x)\)

Reparametrization trick

  • For VAEs, \(p(\epsilon) = N(0, I)\), and \[\begin{equation*} g_\phi(\epsilon, x) = \mu(x; \phi) + \sigma(x; \phi) \odot \epsilon \end{equation*}\] where \(\odot\) represents elementwise multiplication.
  • Sample \(\epsilon^{(i)}, i=1, \cdots, n\) from \(p(\epsilon)\)
  • Use the objective \[\begin{equation*} \frac{1}{n}\sum_{i=1}^n \log \frac{p_\theta(x, g_\phi(\epsilon^{(i)}, x))}{q_\phi(g_\phi(\epsilon^{(i)}, x)|x)} \end{equation*}\]
  • Can easily estimate gradients w.r.t. \(\theta\) and \(\phi\) using backpropagation.
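
Here is a hedged sketch of the same toy one-dimensional model under the reparametrization trick; in a real VAE an autodiff framework would backpropagate through this objective, so the finite difference at the end is only there to show that the reparametrized estimate is an ordinary differentiable function of \(\phi\):

```python
import numpy as np

rng = np.random.default_rng(0)

# Same toy model as before: p_theta(x, z) = N(z|0,1) N(x|z,1),
# q_phi(z|x) = N(z | phi_mu, phi_sigma^2), with a fixed observation x.
x = 1.0

def log_q(z, phi_mu, phi_sigma):
    return -0.5 * ((z - phi_mu) / phi_sigma) ** 2 - np.log(phi_sigma) - 0.5 * np.log(2 * np.pi)

def log_p_joint(z):
    return (-0.5 * z ** 2 - 0.5 * np.log(2 * np.pi)) + (-0.5 * (x - z) ** 2 - 0.5 * np.log(2 * np.pi))

def elbo_estimate(phi_mu, phi_sigma, eps):
    """Reparametrized Monte Carlo ELBO: z = g_phi(eps, x) = mu + sigma * eps."""
    z = phi_mu + phi_sigma * eps
    return np.mean(log_p_joint(z) - log_q(z, phi_mu, phi_sigma))

# Sample the parameterless base distribution p(eps) = N(0, I) once; the
# objective is then an ordinary deterministic, differentiable function of phi.
eps = rng.standard_normal(10_000)
print(elbo_estimate(0.5, 1.2, eps))

# In a real VAE, backpropagation would give gradients w.r.t. phi (and theta)
# directly; the finite difference below just checks smoothness in phi_mu.
h = 1e-5
grad_mu = (elbo_estimate(0.5 + h, 1.2, eps) - elbo_estimate(0.5 - h, 1.2, eps)) / (2 * h)
print(grad_mu)
```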

Flow-based Generative Models

Stochastic Generation Process

  • Base distribution: some simple distribution \(p_z(z)\)
    • Example: Multivariate Gaussian
    • Easy to sample from
  • Reversible differentiable transformation \(x = G(z)\), for which we can easily compute the determinant of the Jacobian, \(|\det J_G(z)|\)
  • Exact probability density function given by the change of variables formula \[\begin{equation*} p_x(x) = \frac{p_z(z)}{|\det J_G(z)|}, \text{ where } z = G^{-1}(x) \end{equation*}\]
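
A minimal sketch (assuming a made-up elementwise affine map as the flow) of both directions, sampling via \(x = G(z)\) and exact density evaluation via the change of variables formula:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3

# Made-up flow: an elementwise affine map x = G(z) = a * z + b. Its Jacobian is
# diagonal, so |det J_G(z)| is just the product of |a_i| (independent of z here).
a = np.array([0.5, 2.0, 1.5])
b = np.array([1.0, -1.0, 0.0])

def G(z):
    return a * z + b

def G_inv(x):
    return (x - b) / a

def log_abs_det_jacobian():
    return np.sum(np.log(np.abs(a)))

def log_p_z(z):
    # Base distribution: standard multivariate Gaussian.
    return -0.5 * np.sum(z ** 2) - 0.5 * d * np.log(2 * np.pi)

# Sampling direction: z ~ p_z(z), x = G(z).
z = rng.standard_normal(d)
x = G(z)

# Density direction: p_x(x) = p_z(G^{-1}(x)) / |det J_G(z)|, in log space.
log_p_x = log_p_z(G_inv(x)) - log_abs_det_jacobian()
print(x, log_p_x)
```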

Maximum Likelihood Estimation

  • Direct access to exact likelihood
  • Train with maximum likelihood
  • More details in the next meeting!
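
Concretely, for a dataset \(x^{(1)}, \cdots, x^{(n)}\) and a flow \(G\) with parameters \(\theta\), the change of variables formula above turns maximum likelihood estimation into \[\begin{equation*} \max_\theta \sum_{i=1}^n \left[\log p_z\left(G^{-1}(x^{(i)})\right) - \log \left|\det J_G\left(G^{-1}(x^{(i)})\right)\right|\right] \end{equation*}\]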

Discussions