Computability of Conditional Probability

Division of Applied Math, Brown University

Guangyao (Stannis) Zhou

April 25th, 2019

Outline

Overview

Computability of Probability Distributions

  • Class of computable probability distributions
    • Distributions that a probabilistic Turing machine can sample from
    • Samples can be generated to arbitrary accuracy
  • Formalize the concept of computability using the theory of computable metric spaces
  • Central message:

    There exist computable joint distributions with noncomputable conditional distributions

A Simple Example

  • Latent random variables:
    • \(C\sim\text{Bernoulli}(\frac{1}{2})\)
    • \(U\sim\text{Uniform}([0,1])\)
    • \(N\sim\text{Geometric}(\frac{1}{2})\)
  • \(\{r_i\}_{i\in\mathbb{N}}\): computable enumeration of \(\mathbb{Q}\cap(0, 1)\)
  • Observed random variable: \[\begin{equation} X = \begin{cases} U & \text{if }C = 1 \\ r_N & \text{otherwise} \end{cases} \end{equation}\]
  • Claim: \(\mathbb{P}(C|X)\) is not "computable"
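The generative model above can be simulated directly. The following is a minimal sketch, not code from the source. It rests on a few assumptions. The enumeration `rational_at` is hypothetical: it lists reduced fractions by increasing denominator, whereas the slide only requires some computable enumeration \(\{r_i\}\). \(N\) is taken to be supported on \(\{1, 2, \dots\}\). Finally, a Python float stands in for \(U\).

```python
import random
from fractions import Fraction
from math import gcd

def rational_at(n):
    """Return r_n (n >= 1) from a hypothetical enumeration of Q ∩ (0, 1):
    1/2, 1/3, 2/3, 1/4, 3/4, 1/5, ... (reduced fractions, by denominator)."""
    count = 0
    q = 2
    while True:
        for p in range(1, q):
            if gcd(p, q) == 1:
                count += 1
                if count == n:
                    return Fraction(p, q)
        q += 1

def sample_cx(rng):
    """Draw (C, X) from the example model; X is an exact Fraction when C = 0."""
    c = rng.randint(0, 1)            # C ~ Bernoulli(1/2)
    if c == 1:
        return c, rng.random()       # U ~ Uniform([0, 1]), float stand-in
    n = 1
    while rng.random() < 0.5:        # N ~ Geometric(1/2) on {1, 2, ...}
        n += 1
    return c, rational_at(n)         # X = r_N

rng = random.Random(0)
samples = [sample_cx(rng) for _ in range(10_000)]
share_c0 = sum(c == 0 for c, _ in samples) / len(samples)   # fraction of draws with C = 0
```

A float is itself a (dyadic) rational number. The simulation therefore cannot reproduce the rational/irrational distinction that drives the example; only the exact `Fraction` type marks which branch produced \(X\).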

A First Look at the Simple Example

  • Some direct calculations
    • \(\mathbb{P}(C=0|X\text{ rational}) = 1\)
    • \(\mathbb{P}(C=0|X\text{ irrational}) = 0\)
  • The conditional "probability kernel":
    • \(k(x, \{0\}) = \begin{cases}1 & x\text{ rational }\\0 & x\text{ irrational}\end{cases}\)
    • The Dirichlet function, a nowhere continuous function
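  • Every version of this kernel must equal 1 at each rational (each rational is an atom of the distribution of \(X\)) and 0 at almost every irrational, so every version is nowhere continuous; since computability implies continuity, no version is computable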
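To see concretely why no program can output \(k(x,\{0\})\), recall that a program only ever reads finitely many rational approximations \(x_i\) with \(|x_i - x| < 2^{-i}\). The sketch below is an illustration under stated assumptions: the helpers `name_of_x` and `shared_prefix` are hypothetical. It builds approximations of the irrational point \(x=\sqrt{2}/2\), then checks that the first \(n+1\) of them are also valid approximations of the rational \(q = x_n\), because \(|x_i - q| < 2^{-(i+1)} + 2^{-(n+1)} \le 2^{-i}\) for \(i \le n\). After reading them, a program cannot tell whether the answer should be 0 (for \(x\)) or 1 (for \(q\)).

```python
from fractions import Fraction
from math import isqrt

def name_of_x(i):
    """i-th rational approximation of x = sqrt(2)/2, with |term - x| < 2^-(i+1).

    Uses floor(x * 2^m) / 2^m with m = i + 1; since x * 2^m = sqrt(2^(2m-1)),
    the floor equals isqrt(2^(2m-1)), so the arithmetic is exact.
    """
    m = i + 1
    return Fraction(isqrt(2 ** (2 * m - 1)), 2 ** m)

def shared_prefix(n):
    """Return the first n+1 approximations of x and the rational q = x_n.

    The prefix followed by q, q, q, ... is a valid approximation sequence
    for q, while the full sequence name_of_x(0), name_of_x(1), ... is one for x.
    """
    prefix = [name_of_x(i) for i in range(n + 1)]
    return prefix, prefix[-1]

prefix, q = shared_prefix(10)
# k(q, {0}) = 1 since q is rational; k(x, {0}) = 0 since x is irrational.
consistent = all(abs(t - q) < Fraction(1, 2 ** i) for i, t in enumerate(prefix))
```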

Computable Probability Theory

Computably Enumerable (c.e.) Sets and Computable Reals

  • A (potentially countably infinite) set is computably enumerable (c.e.) when there is a program that eventually outputs every element of the set
  • A real number \(r\) is c.e. when \(\{q\in\mathbb{Q}: q < r\}\) is c.e., and computable when both \(\{q\in\mathbb{Q}: q < r\}\) and \(\{q\in\mathbb{Q}: q > r\}\) are c.e.
  • Intuitively, \(r\) is computable if we can write a program that approximates \(r\) to any requested accuracy; a c.e. real can in general only be approximated from below, without a computable bound on the error
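For \(r=\sqrt{2}\), both cuts are even decidable: \(q<\sqrt{2}\) iff \(q<0\) or \(q^2<2\). The sketch below is a minimal illustration; the function names are hypothetical. It turns the lower-cut test into rational bounds of width at most \(2^{-n}\) by bisection, using exact rational arithmetic throughout.

```python
from fractions import Fraction

def below_sqrt2(q):
    """Decide membership in the lower cut {q in Q : q < sqrt(2)} exactly."""
    return q < 0 or q * q < 2

def approximate(below, lo, hi, n):
    """Bisect [lo, hi] until hi - lo <= 2^-n, keeping the invariant lo < r <= hi."""
    while hi - lo > Fraction(1, 2 ** n):
        mid = (lo + hi) / 2
        if below(mid):      # mid < r
            lo = mid
        else:               # mid >= r
            hi = mid
    return lo, hi

lo, hi = approximate(below_sqrt2, Fraction(1), Fraction(2), 30)
```

When the cuts are only c.e. rather than decidable, the same idea works by running the two enumerations in parallel until they produce rationals on both sides of \(r\) that are close enough. For a real that is only c.e., just the lower enumeration is available, so the approximations increase toward \(r\) with no stopping rule.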

Computable Metric Spaces

  • A computable metric space is a triple \((S, \delta, \mathcal{D})\)
    • \(\delta\) is a metric on the set \(S\)
    • \((S, \delta)\) is a complete separable metric space
    • \(\mathcal{D} = \{s_i\}_{i\in\mathbb{N}}\) is an enumeration of a dense subset of \(S\), called ideal points
    • The real numbers \(\delta(s_i, s_j)\) are computable, uniformly in \(i\) and \(j\)
  • The ideal balls of \(S\): \[\begin{equation} \mathcal{B}_S = \{B(s_i, q_j): s_i\in\mathcal{D}, q_j\in\mathbb{Q}, q_j > 0\} \end{equation}\] \(B(s_i, q_j)\) denotes the ball of radius \(q_j\) centered at \(s_i\).
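As a concrete instance, assume \(S=[0,1]\) with \(\delta(x,y)=|x-y|\) and with the rationals in \([0,1]\) as ideal points. Distances between ideal points are then exact rationals. The sketch below enumerates the ideal balls \(B(s_i,q_j)\) along the diagonals \(i+j=d\), so every ball appears at some finite position. The particular orderings of \(\mathcal{D}\) and of the positive rationals are hypothetical choices.

```python
from fractions import Fraction
from itertools import count, islice
from math import gcd

def rationals_in_unit_interval():
    """Enumerate Q ∩ [0, 1] without repetition: 0, 1, 1/2, 1/3, 2/3, 1/4, ..."""
    yield Fraction(0)
    yield Fraction(1)
    for q in count(2):
        for p in range(1, q):
            if gcd(p, q) == 1:
                yield Fraction(p, q)

def positive_rationals():
    """Enumerate positive rationals p/q in lowest terms, by increasing p + q."""
    for total in count(2):
        for p in range(1, total):
            q = total - p
            if gcd(p, q) == 1:
                yield Fraction(p, q)

def ideal_balls():
    """Yield (center s_i, radius q_j) for every ideal ball of ([0, 1], |x - y|)."""
    centers, radii = [], []
    center_gen, radius_gen = rationals_in_unit_interval(), positive_rationals()
    for d in count():                 # diagonal d holds all pairs with i + j = d
        centers.append(next(center_gen))
        radii.append(next(radius_gen))
        for i in range(d + 1):
            yield centers[i], radii[d - i]

first_balls = list(islice(ideal_balls(), 21))
```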

Computable Point

  • A point \(x\in S\) is computable when there's a program that enumerates a sequence \(\{x_i\}\subset\mathcal{D}\) where \[\delta(x_i, x)<2^{-i}, \forall i\]
  • Intuitively, a point is computable when we can write a program that approximates the point to arbitrary accuracy
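For example, assume \(S=\mathbb{R}\) with \(\delta(x,y)=|x-y|\) and the rationals as ideal points. Then \(e\) is a computable point. The sketch below returns \(x_i=\sum_{k=0}^{m}1/k!\), where \(m\) is chosen so that the tail bound \(\sum_{k>m}1/k!\le 2/(m+1)!\) falls below \(2^{-i}\). The function name `e_approx` is hypothetical.

```python
from fractions import Fraction
from math import factorial

def e_approx(i):
    """Return a rational x_i with |x_i - e| < 2^-i (a partial sum of sum_k 1/k!)."""
    m = 0
    # The tail after the term 1/m! is at most 2/(m+1)!; stop once that is < 2^-i.
    while Fraction(2, factorial(m + 1)) >= Fraction(1, 2 ** i):
        m += 1
    return sum(Fraction(1, factorial(k)) for k in range(m + 1))

approximations = [e_approx(i) for i in range(5)]
```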

Computable Partial Function

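  • An open set \(U\subset S\) is c.e. open when there is some c.e. set \(E\subset\mathbb{N}\) such that \(U=\cup_{i\in E}B^S_i\), where \(\{B^S_i\}_{i\in\mathbb{N}}\) is an enumeration of the ideal balls \(\mathcal{B}_S\)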
  • Assume \((S, \delta_S, \mathcal{D}_S)\) and \((T, \delta_T, \mathcal{D}_T)\) are computable metric spaces, and \(\{B_n\}_{n\in\mathbb{N}}\) is an enumeration of \(\mathcal{B}_T\)
  • A function \(f: S \rightarrow T\) is said to be computable on \(R\subset S\) if there is a computable sequence \(\{U_n\}_{n\in\mathbb{N}}\) of c.e. open sets \(U_n \subset S\) such that \[f^{-1}(B_n)\cap R=U_n\cap R, \forall n\in\mathbb{N}\]
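  • Computability implies continuity: if \(f\) is computable on \(R\), then every \(f^{-1}(B_n)\cap R\) is relatively open in \(R\), so \(f\) restricted to \(R\) is continuous
  • Intuition: given a program that approximates \(x\) to arbitrary accuracy, we can write a program that approximates \(f(x)\) to arbitrary accuracy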
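The intuition in the last bullet can be made concrete for \(f(x)=x^2\) on \([0,1]\). Assume \(x\) is given as a function returning rationals with \(|x_i-x|<2^{-i}\). Since \(|x_k^2-x^2|=|x_k-x|\,|x_k+x|<3\cdot 2^{-k}\), querying \(x\) at precision \(k=i+2\) gives \(f(x)\) to within \(2^{-i}\). This sketch uses hypothetical names (`square_name`, `third_name`). It illustrates the approximation intuition only and does not implement the formal c.e.-open-set definition above.

```python
from fractions import Fraction

def square_name(x_name):
    """Given approximations of x in [0, 1], return approximations of f(x) = x**2."""
    def f_name(i):
        xk = x_name(i + 2)      # |xk - x| < 2^-(i+2) and |xk + x| <= 3
        return xk * xk          # so |xk^2 - x^2| < 3 * 2^-(i+2) < 2^-i
    return f_name

def third_name(i):
    """Hypothetical approximations of x = 1/3: floor(2^i / 3) / 2^i."""
    return Fraction(2 ** i // 3, 2 ** i)

y_name = square_name(third_name)
```

Each output query reads only one finite-precision approximation of the input, which is the mechanism behind "computability implies continuity".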

Computable Random Variables

  • Underlying probability space: \((\{0, 1\}^{\infty}, \mathcal{F}, \mathbb{P})\)
    • \(\mathbb{P}\): (infinite) product measure of the uniform distribution on \(\{0, 1\}\)
    • An infinite sequence of fair coin tosses
  • Random variable \(X\) taking values in computable metric space \(S\)
    • \(X\) is a function from \((\{0, 1\}^{\infty}, \mathcal{F}, \mathbb{P})\) to \(S\)
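    • \(X\) is a computable random variable if this function is computable on some subset of \(\{0, 1\}^{\infty}\) of \(\mathbb{P}\)-measure one
  • Standard random variables with computable parameters (e.g., Bernoulli, uniform, geometric, Gaussian) are computable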
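The sketch below shows how two of these become functions of the coin-flip sequence \(\omega\). It assumes \(\omega\) is supplied as a Python function `omega(k)` returning the \(k\)-th bit, \(k \ge 1\); this interface and all names in the block are hypothetical. Uniform\(([0,1])\) is \(\sum_k \omega_k 2^{-k}\), and its \(i\)-th approximation depends only on finitely many bits. Geometric\((\frac{1}{2})\) reads bits until the first 1. That search terminates except on the all-zeros sequence, a set of \(\mathbb{P}\)-measure zero.

```python
import random
from fractions import Fraction

def uniform_name(omega):
    """Return approximations of U(omega) = sum_{k>=1} omega(k) 2^-k.

    The i-th approximation reads only bits 1..i+1; the ignored tail is at
    most 2^-(i+1) < 2^-i.
    """
    def name(i):
        return sum(Fraction(omega(k), 2 ** k) for k in range(1, i + 2))
    return name

def geometric(omega):
    """N(omega) = position of the first 1 bit, so P(N = n) = 2^-n for n >= 1.

    Runs forever only on the all-zeros sequence, an event of probability 0.
    """
    k = 1
    while omega(k) == 0:
        k += 1
    return k

def random_bits(seed):
    """Pseudo-random stand-in for omega: bits are drawn lazily and memoized."""
    rng, bits = random.Random(seed), {}
    def omega(k):
        if k not in bits:
            bits[k] = rng.getrandbits(1)
        return bits[k]
    return omega

u = uniform_name(random_bits(42))
n = geometric(random_bits(7))
```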

Computable Probability Measures

  • \((S, \delta, \mathcal{D}_S)\): a computable metric space
  • \(\mathcal{M}_1(S)\): set of all (Borel) probability measures on \(S\)
  • Define \(\mathcal{D}_P\subset \mathcal{M}_1(S)\) to be the set of probability measures that are concentrated on a finite subset of \(\mathcal{D}_S\) and have rational measure for each atom
  • Using the so-called Prokhorov metric \(\delta_P\), we can define a computable metric space \((\mathcal{M}_1(S), \delta_P, \mathcal{D}_P)\)
  • We say \(\mu\in\mathcal{M}_1(S)\) is a computable probability measure if it is a computable point in this space
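For instance, assume \(S=[0,1]\) with rational ideal points. Let \(\nu_n\) be the ideal measure that puts mass \(1/n\) on each midpoint \((2k+1)/(2n)\). It can be coupled with the uniform distribution \(\lambda\) so that every point moves by at most \(1/(2n)\), which gives \(\delta_P(\lambda,\nu_n)\le 1/(2n)\). Taking \(n=2^i\) yields a sequence of ideal points with \(\delta_P<2^{-i}\), so \(\lambda\) is a computable probability measure. The dictionary representation of an ideal measure and the function name are assumptions of this sketch.

```python
from fractions import Fraction

def uniform_measure_name(i):
    """Return an ideal measure within Prokhorov distance 2^-i of Uniform([0, 1]).

    The measure is a dict {rational atom: rational mass}. It puts mass 1/n on
    each midpoint (2k + 1) / (2n) with n = 2^i, so delta_P <= 1/(2n) = 2^-(i+1).
    """
    n = 2 ** i
    return {Fraction(2 * k + 1, 2 * n): Fraction(1, n) for k in range(n)}

nu = uniform_measure_name(5)
```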

Computable Probability Measures

  • All computable random variables induce computable probability measures
  • For any computable probability measure \(\mu\), we can construct a sequence of i.i.d. computable random variables with distribution \(\mu\)
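One standard way to get infinitely many independent random variables from a single coin-flip sequence is to split \(\omega\) into disjoint substreams. The \(j\)-th substream takes the bits at positions \(\langle j,k\rangle\) for a pairing function \(\langle\cdot,\cdot\rangle\). This is sketched here as an assumption, not necessarily the construction used in the paper. Disjoint sets of fair coin flips are independent, so applying the same computable sampler to each substream gives an i.i.d. sequence. The names are hypothetical. Bits are indexed from 0 in this block, and a finite list stands in for the infinite stream.

```python
import random

def cantor_pair(j, k):
    """Bijection N x N -> N given by (j, k) -> (j + k)(j + k + 1)/2 + k."""
    return (j + k) * (j + k + 1) // 2 + k

def substream(omega, j):
    """The j-th substream of omega: its bit k is omega(<j, k>).

    Different j read disjoint positions of omega, so the substreams are
    independent sequences of fair coin flips.
    """
    return lambda k: omega(cantor_pair(j, k))

def geometric(stream):
    """Example sampler: number of flips up to and including the first 1 (>= 1)."""
    k = 0
    while stream(k) == 0:
        k += 1
    return k + 1

rng = random.Random(1)
bits = [rng.getrandbits(1) for _ in range(100_000)]
omega = bits.__getitem__                 # finite stand-in for an infinite stream
draws = [geometric(substream(omega, j)) for j in range(100)]   # i.i.d. Geometric(1/2)
```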

Computable Conditional Distributions

Conditioning w.r.t. a Single Event

  • Let \(S\) be a measurable space and let \(\mu\in\mathcal{M}_1(S)\)
  • For an event \(A\subset S\) with \(\mu(A) > 0\), the conditional probability of a measurable set \(B\subset S\) given \(A\) is \[\mu(B|A)=\frac{\mu(B\cap A)}{\mu(A)}\]
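As a worked instance of the formula, assume \(\mu\) is uniform on the faces of a fair die, with \(A=\{2,4,6\}\) and \(B=\{4,5,6\}\). Then \(\mu(B\cap A)=\mu(\{4,6\})=1/3\) and \(\mu(A)=1/2\), so \(\mu(B|A)=2/3\). The small sketch below encodes a finitely supported measure as a dictionary; that representation is an assumption.

```python
from fractions import Fraction

def condition(mu, A):
    """Return mu(.|A) for a finite measure mu given as {point: mass}; needs mu(A) > 0."""
    mass_A = sum(w for s, w in mu.items() if s in A)
    if mass_A == 0:
        raise ValueError("mu(A) must be positive")
    return {s: w / mass_A for s, w in mu.items() if s in A}

die = {face: Fraction(1, 6) for face in range(1, 7)}
cond = condition(die, {2, 4, 6})
prob_B = sum(w for s, w in cond.items() if s >= 4)   # mu(B | A) with B = {4, 5, 6}
```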

Conditioning w.r.t. a Single Event

  • Assume \((S, \mu)\) is a computable probability space
    • \(\mu\) is a computable probability measure
    • \(A\subset S\) is called almost decidable if there are c.e. open sets \(U\subset A\) and \(V\subset S\setminus A\) such that \(\mu(U)+\mu(V)=1\)
    • \(\mu\) is computable \(\iff\) \(\mu(A)\) is a computable real whenever \(A\) is almost decidable
  • If \(A\) is almost decidable and \(\mu(A)>0\), then \(\mu(\cdot|A)\) is a computable probability measure
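A natural way to sample from \(\mu(\cdot|A)\) is rejection: draw from \(\mu\) and keep the draw only if it lands in \(A\). This is an illustrative route, not necessarily the proof technique of the paper. Almost decidability makes the membership test effective. We refine the draw until it is certified to lie in \(U\) or in \(V\), and this terminates with probability 1 because \(\mu(U)+\mu(V)=1\). The sketch assumes \(\mu=\text{Uniform}([0,1])\) built from fair bits and \(A=[0,1/3)\), with \(U=(0,1/3)\) and \(V=(1/3,1)\). The helper names are hypothetical.

```python
import random
from fractions import Fraction

THIRD = Fraction(1, 3)

def sample_conditioned(rng, precision):
    """Rejection sampler for Uniform([0, 1]) conditioned on A = [0, 1/3).

    Returns a dyadic rational within 2^-precision of an exact draw from mu(. | A).
    """
    while True:
        lo, width = Fraction(0), Fraction(1)   # the draw x lies in [lo, lo + width]
        decided = None
        while decided is None or width > Fraction(1, 2 ** precision):
            width /= 2
            lo += width * rng.getrandbits(1)    # reveal the next fair bit of x
            if decided is None:
                if lo + width < THIRD:
                    decided = True               # x < 1/3: certified in A
                elif lo > THIRD:
                    decided = False              # x > 1/3: certified outside A
            if decided is False:
                break                            # reject and redraw
        if decided:
            return lo

rng = random.Random(0)
draws = [sample_conditioned(rng, 20) for _ in range(2000)]
```

The loop only stalls if \(x = 1/3\) exactly, the boundary of \(A\), which has probability 0.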

Probability Kernel

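  • Let \(S\) and \(T\) be measurable spaces; from here on, \(\mathcal{B}_T\) denotes the measurable (Borel) sets of \(T\) rather than its ideal balls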
  • A function \(k: S\times\mathcal{B}_T\rightarrow[0, 1]\) is called a probability kernel when
    • \(\forall s\in S\), \(k(s, \cdot)\) is a probability measure on \(T\)
    • \(\forall B\in\mathcal{B}_T\), the function \(k(\cdot, B)\) is measurable

Regular Conditional Distribution

  • Let \(X\) and \(Y\) be random variables taking values in measurable spaces \(S\) and \(T\), and let \(\mathbb{P}_X\) denote the distribution of \(X\)
  • A probability kernel \(k\) is called a regular version of the conditional distribution \(\mathbb{P}(Y\in\cdot|X)\) when \[\mathbb{P}(X\in A, Y\in B)=\int_A k(x, B)\mathbb{P}_X(d x)\] for all measurable sets \(A\subset S\) and \(B\subset T\)
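For discrete \(X\) the integral becomes a sum, \(\mathbb{P}(X\in A,Y\in B)=\sum_{x\in A}k(x,B)\,\mathbb{P}(X=x)\). Wherever \(\mathbb{P}(X=x)>0\), \(k(x,\cdot)\) is just the conditional pmf. The sketch checks this identity exhaustively over all pairs of subsets for a small hypothetical joint pmf; the numbers are made up for illustration.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical joint pmf of (X, Y) on {0, 1, 2} x {"a", "b"}
joint = {(0, "a"): Fraction(1, 8), (0, "b"): Fraction(1, 8),
         (1, "a"): Fraction(1, 4), (1, "b"): Fraction(0),
         (2, "a"): Fraction(1, 8), (2, "b"): Fraction(3, 8)}
xs, ys = (0, 1, 2), ("a", "b")
p_x = {x: sum(joint[x, y] for y in ys) for x in xs}

def kernel(x, B):
    """k(x, B) = P(Y in B | X = x); well defined since every P(X = x) > 0 here."""
    return sum(joint[x, y] for y in B) / p_x[x]

def subsets(items):
    """All subsets of a finite collection, as tuples."""
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

identity_holds = all(
    sum(joint[x, y] for x in A for y in B) == sum(kernel(x, B) * p_x[x] for x in A)
    for A in subsets(xs) for B in subsets(ys)
)
```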

Computable Probability Kernel

  • \(S\) and \(T\): computable metric spaces
  • \(k: S\times\mathcal{B}_T\rightarrow[0, 1]\): a probability kernel from \(S\) to \(T\)
  • Define \(\phi_k: S\rightarrow \mathcal{M}_1(T)\) to be \(\phi_k(s) = k(s, \cdot)\)
  • \(k\) is a computable probability kernel when \(\phi_k\) is a computable function

Negative and Positive Examples

Back to the Simple Example

  • Latent random variables:
    • \(C\sim\text{Bernoulli}(\frac{1}{2})\)
    • \(U\sim\text{Uniform}([0,1])\)
    • \(N\sim\text{Geometric}(\frac{1}{2})\)
  • \(\{r_i\}_{i\in\mathbb{N}}\): computable enumeration of \(\mathbb{Q}\cap(0, 1)\)
  • Observed random variable: \[\begin{equation} X = \begin{cases} U & \text{if }C = 1 \\ r_N & \text{otherwise} \end{cases} \end{equation}\]

Back to the Simple Example

  • Want to study \(\mathbb{P}(C|X)\)
  • In the previous notation, \(S=[0, 1]\) and \(T=\{0, 1\}\)
  • \(\mathcal{M}_1(T)\) can be identified with \([0, 1]\) via \(\nu\mapsto\nu(\{0\})\)
  • The conditional probability kernel:
    • \(k(x, \{0\}) = \begin{cases}1 & x\text{ rational }\\0 & x\text{ irrational}\end{cases}\)
    • The Dirichlet function, a nowhere continuous function
  • Every version of the kernel is nowhere continuous, so the conditional distribution is not computable

More Negative Examples

  • An example where the conditional distribution has a version that's continuous almost everywhere, yet still not computable
  • An example where the conditional distribution has a version that's continuous everywhere, yet still not computable

Positive Examples

  • Conditioning on discrete random variables is always computable
  • Conditioning is computable when there is a bounded computable conditional density

Positive Examples

  • Conditioning on noisy observations is computable:
    • Assume \(U, V\) are computable random variables
    • Assume \(E\) is a computable random variable which has a bounded computable density and is independent of \(U, V\)
    • Can think of \(U + E\) as the corruption of an idealized measurement \(U\) by an independent source of additive error \(E\)
    • \(\mathbb{P}(U, V|U + E)\) is computable
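The noisy-observation result suggests a simple way to approximate the posterior numerically. The sketch uses likelihood weighting, a standard Monte Carlo technique swapped in for illustration; it is not the computability proof. Draw \((U,V)\) from the prior and weight each draw by the noise density \(p_E(x-U)\) at the observed value \(x\). The boundedness of the density keeps the weights well behaved. The model is hypothetical: \(U\) uniform on \(\{0,1\}\), \(V=U\), and \(E\sim\mathcal{N}(0,1)\). For this model the exact answer \(\mathbb{P}(U=1\mid U+E=x)=\varphi(x-1)/(\varphi(x)+\varphi(x-1))\), with \(\varphi\) the standard normal density, is available as a check.

```python
import math
import random

def normal_density(z):
    """Standard normal density; bounded above by 1/sqrt(2*pi)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def posterior_mean_v(x, n, rng):
    """Estimate E[V | U + E = x] by likelihood weighting with n prior draws."""
    total_w = total_wv = 0.0
    for _ in range(n):
        u = rng.randint(0, 1)        # U ~ uniform on {0, 1}
        v = u                        # V = U in this toy model
        w = normal_density(x - u)    # weight = p_E(x - U)
        total_w += w
        total_wv += w * v
    return total_wv / total_w

def exact_posterior(x):
    """P(U = 1 | U + E = x) for this toy model."""
    return normal_density(x - 1) / (normal_density(x) + normal_density(x - 1))

rng = random.Random(0)
estimate, exact = posterior_mean_v(2.0, 20_000, rng), exact_posterior(2.0)
```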

Discussions