Overview of Deep Generative Models
Division of Applied Math, Brown University
Guangyao (Stannis) Zhou
March 5th, 2019
Discriminative vs. Generative
- Objects of interest:
- Latent variable \(z\)
- Observed variable \(x\)
- Discriminative models: model \(p(z|x)\)
- Logistic regression, neural networks, etc.
- Generative models: model \(p(x, z)=p(z)p(x|z)\)
- Markov random fields, Bayesian networks
- Deep generative models
Deep Generative Models
- Focus on easy sampling
- Popular Frameworks
- Generative Adversarial Networks (GANs)
- Variational Autoencoders (VAEs)
- Flow-based Generative Models
Commonalities and Differences
- Commonalities: Model the stochastic generation process
- Base distribution transformed by a neural network
- Differences: How to learn the parameters
- GANs: Adversarial learning
- VAEs: Variational Inference
- Flow-based models: Maximum Likelihood
Generative Adversarial Networks
Stochastic Generation Process
- Base distribution: some simple distribution \(p_z(z)\)
- Example: Multivariate Gaussian
- Easy to sample from
- Generator: differentiable transformation \(G(z; \theta_g)\)
- The actual random variable:
- \(x = G(z; \theta_g)\) where \(z\sim p_z(z)\)
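A minimal PyTorch sketch of this sampling process; the two-layer MLP generator and the dimensions are illustrative placeholders, not a particular GAN architecture:

```python
import torch
import torch.nn as nn

# Base distribution p_z(z): a standard multivariate Gaussian (easy to sample from).
latent_dim, data_dim = 16, 2

# Generator G(z; theta_g): a differentiable transformation of the latent code.
# The two-layer MLP and the dimensions are arbitrary illustrative choices.
G = nn.Sequential(
    nn.Linear(latent_dim, 64),
    nn.ReLU(),
    nn.Linear(64, data_dim),
)

# The actual random variable: x = G(z; theta_g) with z ~ p_z(z).
z = torch.randn(128, latent_dim)  # 128 samples from the base distribution
x = G(z)                          # 128 samples from the generator distribution p_g(x)
```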
Adversarial Training
- Two different distributions:
- Generator distribution \(p_g(x)\)
- Data distribution \(p_{\text{data}}(x)\)
- Use a discriminator to train the model
- Can be thought of as a simple two-category classifier
- Likelihood as the loss function
- Samples from both distributions to train the discriminator
- Samples from the generator distribution to train the generator
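A sketch of one round of adversarial training under common assumptions: binary cross-entropy (the classifier's likelihood) as the loss and the widely used non-saturating generator objective; the networks, real-data batch, and hyperparameters are placeholders:

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 2
G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 64), nn.ReLU(), nn.Linear(64, 1))  # logit of "x is real"

bce = nn.BCEWithLogitsLoss()  # likelihood loss of the two-category classifier
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)

x_real = torch.randn(128, data_dim)  # placeholder for a batch from p_data(x)

# Discriminator step: uses samples from both distributions.
x_fake = G(torch.randn(128, latent_dim)).detach()
loss_d = bce(D(x_real), torch.ones(128, 1)) + bce(D(x_fake), torch.zeros(128, 1))
opt_d.zero_grad()
loss_d.backward()
opt_d.step()

# Generator step: uses samples from the generator distribution only.
x_fake = G(torch.randn(128, latent_dim))
loss_g = bce(D(x_fake), torch.ones(128, 1))  # non-saturating generator loss
opt_g.zero_grad()
loss_g.backward()
opt_g.step()
```

Note that the discriminator update sees samples from both \(p_{\text{data}}(x)\) and \(p_g(x)\), while the generator update only sees samples from \(p_g(x)\), mirroring the bullets above.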
Variational Autoencoders
Stochastic Generation Process
- Base distribution: some simple distribution \(p_z(z)\)
- Example: Multivariate Gaussian
- Easy to sample from
- Differentiable transformations \(\mu(z; \theta)\) and \(\sigma(z; \theta)\)
- The actual random variable:
- \(x_i\sim N(\mu_i(z; \theta), \sigma_i^2(z; \theta))\)
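A minimal sketch of this decoder, assuming a shared MLP trunk with separate \(\mu\) and \(\log\sigma\) heads; the architecture and dimensions are illustrative:

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 2

# Differentiable transformations mu(z; theta) and sigma(z; theta), implemented
# here as two heads on a shared MLP trunk; log(sigma) is used to keep sigma > 0.
trunk = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU())
mu_head = nn.Linear(64, data_dim)
log_sigma_head = nn.Linear(64, data_dim)

# Generation: z from the base distribution, then x_i ~ N(mu_i(z), sigma_i(z)^2).
z = torch.randn(128, latent_dim)
h = trunk(z)
mu, sigma = mu_head(h), log_sigma_head(h).exp()
x = mu + sigma * torch.randn_like(mu)  # one sample of x per latent code
```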
Variational Inference
- Ideal scenario: maximize \(\log p_\theta(x)=\log \int p_\theta(x, z) dz\)
- The integral is usually intractable
- Instead, work with the "Evidence Lower Bound (ELBO)", obtained via Jensen's inequality:
\[\begin{align*}
\log p_\theta(x) &= \log \int p_\theta(x, z) dz = \log \int q_\phi(z|x) \frac{p_\theta(x, z)}{q_\phi(z|x)} dz\\
&\geq \int q_\phi(z|x) \log \frac{p_\theta(x, z)}{q_\phi(z|x)} dz
\end{align*}\]
Variational Inference
- Maximize the ELBO, the tightest lower bound on \(\log p_\theta(x)\)
- Equivalent to minimizing \(D(q_\phi(z|x)||p_\theta(z|x))\), the KL divergence to the true posterior (see the derivation below)
- "Amortized" version of the mean-field approximation
- More differentiable transformations
- \(\mu(x; \phi)\) and \(\sigma(x; \phi)\)
- \(q_\phi(z_i|x) = N(z_i|\mu_i(x; \phi), \sigma_i^2(x; \phi))\)
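The gap between \(\log p_\theta(x)\) and the ELBO can be written out using only the definitions above:
\[\begin{align*}
\log p_\theta(x) - \int q_\phi(z|x) \log \frac{p_\theta(x, z)}{q_\phi(z|x)} dz
&= \int q_\phi(z|x) \log \frac{q_\phi(z|x)\, p_\theta(x)}{p_\theta(x, z)} dz\\
&= \int q_\phi(z|x) \log \frac{q_\phi(z|x)}{p_\theta(z|x)} dz = D(q_\phi(z|x)||p_\theta(z|x)) \geq 0
\end{align*}\]
Hence maximizing the ELBO over \(\phi\) (with \(\log p_\theta(x)\) fixed in \(\phi\)) is the same as minimizing the KL divergence to the posterior, and the bound is tight exactly when \(q_\phi(z|x) = p_\theta(z|x)\).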
Reparametrization trick
- Want to maximize the following objective using gradient-based optimization
\[\begin{equation*}
\int q_\phi(z|x) \log \frac{p_\theta(x, z)}{q_\phi(z|x)} dz
\end{equation*}\]
- Difficulty: estimating the gradients
Reparametrization trick
- A naive approach for estimating the gradient w.r.t. \(\phi\):
\[\begin{align*}
&\nabla_{\phi}\int q_\phi(z|x) \log \frac{p_\theta(x, z)}{q_\phi(z|x)} dz \\
=& \int q_\phi(z|x) \nabla_{\phi} \log q_{\phi}(z|x)\left[\log \frac{p_\theta(x, z)}{q_\phi(z|x)} - 1\right] dz
\end{align*}\]
- Draw Monte Carlo samples \(z^{(i)}\sim q_{\phi}(z|x), i=1,\cdots, n\) to estimate the gradients:
\[\begin{equation*}
\frac{1}{n}\sum_{i=1}^n \nabla_{\phi} \log q_{\phi}(z^{(i)}|x)\left[\log \frac{p_\theta(x, z^{(i)})}{q_\phi(z^{(i)}|x)} - 1\right]
\end{equation*}\]
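A toy sketch of this naive (score-function) estimator in one dimension, assuming \(q_\phi(z|x) = N(z; m, s^2)\) with \(\phi = (m, \log s)\) and a simple fixed Gaussian joint \(p_\theta(x, z)\); the entire setup is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 1-D setup: q_phi(z|x) = N(z; m, s^2) with phi = (m, log_s),
# and a fixed toy joint log p_theta(x, z) = log N(z; 0, 1) + log N(x; z, 1).
x, m, log_s = 1.0, 0.0, 0.0
s = np.exp(log_s)

def log_q(z):
    return -0.5 * ((z - m) / s) ** 2 - log_s - 0.5 * np.log(2 * np.pi)

def log_joint(z):
    return -0.5 * z ** 2 - 0.5 * (x - z) ** 2 - np.log(2 * np.pi)

# Score-function estimator: (1/n) sum_i grad_phi log q(z_i) * [log(p/q)(z_i) - 1].
n = 10_000
z = rng.normal(m, s, size=n)                  # z^(i) ~ q_phi(z|x)
score_m = (z - m) / s ** 2                    # d/dm      log q(z)
score_log_s = ((z - m) / s) ** 2 - 1.0        # d/dlog_s  log q(z)
weight = log_joint(z) - log_q(z) - 1.0
grad_m = np.mean(score_m * weight)
grad_log_s = np.mean(score_log_s * weight)
print(grad_m, grad_log_s)  # noisy estimates; the variance is typically large
```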
Reparametrization trick
- Problems with this naive approach:
- Sometimes we can't evaluate \(\nabla_{\phi}\log q_{\phi}(z|x)\)
- Even if we can evaluate \(\nabla_{\phi}\log q_{\phi}(z|x)\), the resulting estimator usually has very high variance
- Can't easily make use of automatic differentiation
- Solution: reparametrization trick
- View \(q_\phi(z|x)\) as a parameterless base distribution \(p(\epsilon)\) transformed by a differentiable transformation \(g_\phi(\epsilon, x)\)
Reparametrization trick
- For VAEs, \(p(\epsilon) = N(0, I)\), and
\[\begin{equation*}
g_\phi(\epsilon, x) = \mu(x; \phi) + \sigma(x; \phi) \odot \epsilon
\end{equation*}\]
where \(\odot\) represents elementwise multiplication.
- Sample \(\epsilon^{(i)}, i=1, \cdots, n\) from \(p(\epsilon)\)
- Use the objective
\[\begin{equation*}
\frac{1}{n}\sum_{i=1}^n \log \frac{p_\theta(x, g_\phi(\epsilon^{(i)}, x))}{q_\phi(g_\phi(\epsilon^{(i)}, x)|x)}
\end{equation*}\]
- Can easily estimate gradients w.r.t. \(\theta\) and \(\phi\) using backpropagation.
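A sketch of the reparametrized objective on the same illustrative one-dimensional setup as above, with the gradients w.r.t. \(\phi\) obtained by backpropagation:

```python
import math
import torch

torch.manual_seed(0)

# Same illustrative 1-D setup: q_phi(z|x) = N(z; m, s^2), phi = (m, log_s),
# log p_theta(x, z) = log N(z; 0, 1) + log N(x; z, 1), with x = 1.0.
x = torch.tensor(1.0)
m = torch.tensor(0.0, requires_grad=True)
log_s = torch.tensor(0.0, requires_grad=True)

def log_q(z):
    return -0.5 * ((z - m) / log_s.exp()) ** 2 - log_s - 0.5 * math.log(2 * math.pi)

def log_joint(z):
    return -0.5 * z ** 2 - 0.5 * (x - z) ** 2 - math.log(2 * math.pi)

# Reparametrization: z = g_phi(eps, x) = m + s * eps with eps ~ p(eps) = N(0, 1),
# so all randomness comes from a parameterless base distribution.
eps = torch.randn(10_000)
z = m + log_s.exp() * eps

# Monte Carlo objective from the slide; gradients w.r.t. phi flow through z.
objective = (log_joint(z) - log_q(z)).mean()
objective.backward()
print(m.grad, log_s.grad)  # typically much lower variance than the naive estimator
```

Because the noise \(\epsilon\) is parameterless, the Monte Carlo objective is a deterministic, differentiable function of \(\phi\), which is what makes plain backpropagation applicable.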
Flow-based Generative Models
Stochastic Generation Process
- Base distribution: some simple distribution \(p_z(z)\)
- Example: Multivariate Gaussian
- Easy to sample from
- Invertible differentiable transformation \(x = G(z)\), for which we can easily calculate the determinant of the Jacobian, \(|\det J_G(z)|\)
- Exact probability density function given by
\[\begin{equation*}
p_x(x) = \frac{p_z(z)}{|\det J_G(z)|}, \text{ where } z = G^{-1}(x)
\end{equation*}\]
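A minimal sketch of this computation, using a single elementwise affine map as a stand-in for \(G\); real flow models compose many such invertible layers with tractable Jacobian determinants:

```python
import math
import torch

torch.manual_seed(0)
dim = 2

# Base distribution p_z(z): standard Gaussian.
def log_p_z(z):
    return (-0.5 * z ** 2 - 0.5 * math.log(2 * math.pi)).sum(dim=-1)

# Toy invertible transformation x = G(z) = exp(log_a) * z + b (elementwise affine),
# whose Jacobian is diagonal, so log|det J_G(z)| = sum(log_a).
log_a = torch.randn(dim)
b = torch.randn(dim)

def G(z):
    return log_a.exp() * z + b

def G_inv(x):
    return (x - b) / log_a.exp()

# Exact density via change of variables: log p_x(x) = log p_z(G^{-1}(x)) - log|det J_G|.
def log_p_x(x):
    return log_p_z(G_inv(x)) - log_a.sum()

x = G(torch.randn(128, dim))   # sampling: transform base samples
print(log_p_x(x).mean())       # exact log-likelihood, directly usable for training
```

The same quantity evaluated on training data is the exact log-likelihood that maximum likelihood training optimizes.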
Maximum Likelihood Estimation
- Direct access to exact likelihood
- Train with maximum likelihood
- More details in the next meeting!