Computability of Conditional Probability
Division of Applied Math, Brown University
Guangyao (Stannis) Zhou
April 25th, 2019
Computability of Probability Distributions
A Simple Example
- Latent random variables:
- \(C\sim\text{Bernoulli}(\frac{1}{2})\)
- \(U\sim\text{Uniform}([0,1])\)
- \(N\sim\text{Geometric}(\frac{1}{2})\)
- \(\{r_i\}_{i\in\mathbb{N}}\): computable enumeration of \(\mathbb{Q}\cap(0, 1)\)
- Observed random variable:
\[\begin{equation}
X = \begin{cases}
U & \text{if }C = 1 \\
r_N & \text{otherwise}
\end{cases}
\end{equation}\]
- Claim: \(\mathbb{P}(C|X)\) is not "computable"
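The generative model above can be sketched as a small sampler. This is an illustration only, not part of the original talk: the function names, the particular enumeration of the rationals, the independence of \(C\), \(U\), \(N\), and the convention that \(\text{Geometric}(\frac{1}{2})\) is supported on \(\{0, 1, 2, \dots\}\) are all assumptions.

```python
import random
from fractions import Fraction
from itertools import islice
from math import gcd


def rationals_in_unit_interval():
    """Enumerate the rationals in (0, 1) without repetition: 1/2, 1/3, 2/3, 1/4, 3/4, ..."""
    q = 2
    while True:
        for p in range(1, q):
            if gcd(p, q) == 1:  # skip non-reduced duplicates such as 2/4
                yield Fraction(p, q)
        q += 1


def sample_geometric(rng):
    """Number of tails before the first head of a fair coin (assumed support 0, 1, 2, ...)."""
    n = 0
    while rng.random() < 0.5:
        n += 1
    return n


def sample_c_and_x(rng):
    """Draw (C, X): X is uniform if C = 1, and the rational r_N otherwise."""
    c = 1 if rng.random() < 0.5 else 0
    if c == 1:
        return c, rng.random()  # floating-point stand-in for U ~ Uniform([0, 1])
    n = sample_geometric(rng)
    return c, next(islice(rationals_in_unit_interval(), n, None))


rng = random.Random(0)
draws = [sample_c_and_x(rng) for _ in range(5)]
```

Note that the sampler returns \(C\) alongside \(X\): a floating-point value is always rational, so \(C\) cannot be read off from a finite-precision \(X\), which is the heart of the claim.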
A First Look at the Simple Example
- Some direct calculations
- \(\mathbb{P}(C=0|X\text{ rational}) = 1\)
- \(\mathbb{P}(C=0|X\text{ irrational}) = 0\)
- The conditional "probability kernel":
- \(k(x, \{0\}) = \begin{cases}1 & x\text{ rational }\\0 & x\text{ irrational}\end{cases}\)
- The Dirichlet function, a nowhere continuous function
- Computability implies continuity, so discontinuity implies noncomputability
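The direct calculations can be spelled out as follows. This short derivation assumes that \(C\), \(U\), \(N\) are independent and uses only the fact that \(\mathbb{Q}\) has Lebesgue measure zero:

\[\begin{aligned}
\mathbb{P}(C=1, X\in\mathbb{Q}) &= \mathbb{P}(C=1)\,\mathbb{P}(U\in\mathbb{Q}) = \tfrac{1}{2}\cdot 0 = 0,\\
\mathbb{P}(C=0, X\in\mathbb{Q}) &= \mathbb{P}(C=0)\,\mathbb{P}(r_N\in\mathbb{Q}) = \tfrac{1}{2}\cdot 1 = \tfrac{1}{2},\\
\mathbb{P}(C=0\mid X\in\mathbb{Q}) &= \frac{1/2}{0 + 1/2} = 1.
\end{aligned}\]

The irrational case is symmetric: \(\mathbb{P}(C=0, X\notin\mathbb{Q}) = 0\) while \(\mathbb{P}(X\notin\mathbb{Q}) = \tfrac{1}{2}\), so \(\mathbb{P}(C=0\mid X\notin\mathbb{Q}) = 0\).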
Computable Probability Theory
Computably Enumerable (c.e.) and C.e. Reals
- A (possibly countably infinite) set is computably enumerable (c.e.) when there is a program that eventually outputs every element of the set, and nothing else
- A real number \(r\) is a c.e. real when \(\{q\in\mathbb{Q}: q < r\}\) is c.e.; it is computable when both \(\{q\in\mathbb{Q}: q < r\}\) and \(\{q\in\mathbb{Q}: q > r\}\) are c.e.
- Intuitively, \(r\) is computable if we can write a program that approximates \(r\) to arbitrary accuracy
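As a concrete illustration, the sketch below enumerates the rationals below and above \(\sqrt{2}\) using exact rational arithmetic only. The choice of \(\sqrt{2}\) and all function names are assumptions made for this example.

```python
from fractions import Fraction
from itertools import islice


def all_rationals():
    """Enumerate every rational (with repetitions): stage n covers p/q with q <= n and |p/q| <= n."""
    n = 1
    while True:
        for q in range(1, n + 1):
            for p in range(-n * q, n * q + 1):
                yield Fraction(p, q)
        n += 1


def lower_cut_sqrt2():
    """Enumerate the rationals q with q < sqrt(2)."""
    for q in all_rationals():
        if q < 0 or q * q < 2:
            yield q


def upper_cut_sqrt2():
    """Enumerate the rationals q with q > sqrt(2)."""
    for q in all_rationals():
        if q > 0 and q * q > 2:
            yield q


def bracket(n_terms):
    """Best lower and upper bounds on sqrt(2) among the first n_terms of each enumeration."""
    lo = max(islice(lower_cut_sqrt2(), n_terms))
    hi = min(islice(upper_cut_sqrt2(), n_terms))
    return lo, hi


lo, hi = bracket(200)
```

Running both enumerations side by side squeezes \(\sqrt{2}\) between `lo` and `hi`, and the gap shrinks as more terms are read. That is the sense in which having both enumerations yields approximations to arbitrary accuracy.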
Computable Metric Spaces
- A computable metric space is a triple \((S, \delta, \mathcal{D})\)
- \(\delta\) is a metric on the set \(S\)
- \((S, \delta)\) is a complete separable metric space
- \(\mathcal{D} = \{s_i\}_{i\in\mathbb{N}}\) is an enumeration of a dense subset of \(S\), called ideal points
- The real numbers \(\delta(s_i, s_j)\) are computable, uniformly in \(i\) and \(j\)
- The ideal balls of \(S\):
\[\begin{equation}
\mathcal{B}_S = \{B(s_i, q_j): s_i\in\mathcal{D}, q_j\in\mathbb{Q}, q_j > 0\}
\end{equation}\]
Here \(B(s_i, q_j)\) denotes the open ball of radius \(q_j\) centered at \(s_i\).
Computable Point
- A point \(x\in S\) is computable when there's a program that enumerates a sequence \(\{x_i\}\subset\mathcal{D}\) where \[\delta(x_i, x)<2^{-i}, \forall i\]
- Intuitively, a point is computable when we can write a program that approximates the point to arbitrary accuracy
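For instance, in \(\mathbb{R}\) with the rationals as ideal points, the number \(e\) is a computable point. The sketch below is one possible program (the name `e_approx` and the stopping rule are choices made for this example); it relies on the tail bound \(\sum_{n>m} 1/n! < 2/(m+1)!\).

```python
import math
from fractions import Fraction


def e_approx(i):
    """Return a rational x_i with |x_i - e| < 2**-i, using the series e = sum of 1/n!."""
    total = Fraction(0)
    factorial = 1  # holds n! for the current n
    n = 0
    while True:
        total += Fraction(1, factorial)  # add the term 1/n!
        # The tail after term n is below 2/(n+1)!, so stop once (n+1)! >= 2**(i+1).
        next_factorial = factorial * (n + 1)
        if next_factorial >= 2 ** (i + 1):
            return total
        factorial = next_factorial
        n += 1


# Compare the rational approximation against the floating-point constant.
error = abs(float(e_approx(20)) - math.e)
```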
Computable Partial Function
- An open set \(U\) is c.e. open when there is some c.e. set \(E\subset\mathbb{N}\) such that \(U=\cup_{i\in E}B_i\), where \(\{B_i\}_{i\in\mathbb{N}}\) enumerates the ideal balls of the space containing \(U\)
- Assume \((S, \delta_S, \mathcal{D}_S)\) and \((T, \delta_T, \mathcal{D}_T)\) are computable metric spaces, and \(\{B_n\}_{n\in\mathbb{N}}\) is an enumeration of \(\mathcal{B}_T\)
- A function \(f: S \rightarrow T\) is said to be computable on \(R\subset S\) if there is a computable sequence \(\{U_n\}_{n\in\mathbb{N}}\) of c.e. open sets \(U_n \subset S\) such that \[f^{-1}(B_n)\cap R=U_n\cap R, \forall n\in\mathbb{N}\]
- Computability implies continuity
- Intuition: given a computable \(x\), we can write a program that approximates \(f(x)\) to arbitrary accuracy
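The intuition can be made concrete for \(f(x) = x^2\) on \([0, 1]\). In this sketch a point is represented by a function returning rational approximations; that representation and all names are assumptions of the example. Each output approximation is produced from a single, finite-precision query of the input, which is why a computable function must be continuous.

```python
from fractions import Fraction


def third(i):
    """Approximations to the point 1/3: a dyadic rational within 2**-i of 1/3."""
    scale = 2 ** (i + 1)
    return Fraction(scale // 3, scale)  # truncation error is below 2**-(i+1)


def square_on_unit_interval(x_approx):
    """Lift x -> x**2 on [0, 1] to approximations.

    x_approx(i) must return a rational within 2**-i of some x in [0, 1].
    The returned y_approx satisfies |y_approx(k) - x**2| < 2**-k.
    """
    def y_approx(k):
        q = x_approx(k + 2)  # only finitely much information about x is used
        # |x**2 - q**2| = |x - q| * |x + q| < 2**-(k+2) * 3 < 2**-k
        return q * q
    return y_approx


ninth = square_on_unit_interval(third)
```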
Computable Random Variables
- Underlying probability space: \((\{0, 1\}^{\infty}, \mathcal{F}, \mathbb{P})\)
- \(\mathbb{P}\): (infinite) product measure of the uniform distribution on \(\{0, 1\}\)
- An infinite sequence of fair coin tosses
- Random variable \(X\) taking values in computable metric space \(S\)
- \(X\) is a function from \((\{0, 1\}^{\infty}, \mathcal{F}, \mathbb{P})\) to \(S\)
- \(X\) is a computable random variable if this function is computable
- The commonly used random variables (e.g. Bernoulli, uniform, geometric) are all computable
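A sketch of how two such random variables can be read off a sequence of fair coin tosses is given below. Representing a sample point by a finite prefix of bits, and the function names, are assumptions of this example.

```python
from fractions import Fraction


def uniform_approx(bits, n):
    """Read n tosses; the result is at most 2**-n away from U = 0.b1 b2 b3 ... (binary)."""
    return sum(Fraction(b, 2 ** (i + 1)) for i, b in enumerate(bits[:n]))


def geometric(bits):
    """Number of 0s before the first 1 (support 0, 1, 2, ...); needs only finitely many tosses."""
    for n, b in enumerate(bits):
        if b == 1:
            return n
    raise ValueError("no 1 found in the supplied prefix")


tosses = [1, 0, 1, 1, 0, 0, 1, 0]
u_3 = uniform_approx(tosses, 3)   # uses only the first three tosses
n_geo = geometric([0, 0, 1, 0])   # stops at the first 1
```

In both cases any requested accuracy of the output is reached after reading a finite prefix of the input, which is what computability of the map from \(\{0, 1\}^{\infty}\) to \(S\) requires.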
Computable Probability Measures
- \((S, \delta, \mathcal{D}_S)\): a computable metric space
- \(\mathcal{M}_1(S)\): set of all (Borel) probability measures on \(S\)
- Define \(\mathcal{D}_P\subset \mathcal{M}_1(S)\) to be the set of probability measures that are concentrated on a finite subset of \(\mathcal{D}_S\) and assign rational mass to each atom
- Using the so-called Prokhorov metric \(\delta_P\), we can define a computable metric space \((\mathcal{M}_1(S), \delta_P, \mathcal{D}_P)\)
- We say \(\mu\in\mathcal{M}_1(S)\) is a computable probability measure if it's a computable point in this space
- Every computable random variable induces a computable probability measure (its distribution)
- For any computable probability measure \(\mu\), we can construct an i.i.d. sequence of computable random variables with distribution \(\mu\)
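To make the ideal points of \(\mathcal{M}_1(S)\) concrete, the sketch below builds finitely supported measures with rational atoms and rational masses that approximate \(\text{Uniform}([0,1])\). The construction and names are assumptions of this example. The map \(u\mapsto\lceil nu\rceil/n\) moves no point by more than \(1/n\), so the Prokhorov distance between `ideal_uniform(n)` and the uniform distribution is at most \(1/n\).

```python
from fractions import Fraction


def ideal_uniform(n):
    """Finitely supported measure with atoms k/n (k = 1..n), each of rational mass 1/n."""
    return {Fraction(k, n): Fraction(1, n) for k in range(1, n + 1)}


def measure_of_interval(mu, a, b):
    """Mass that the finitely supported measure mu assigns to the half-open interval (a, b]."""
    return sum(w for x, w in mu.items() if a < x <= b)


mu_8 = ideal_uniform(8)
half = measure_of_interval(mu_8, Fraction(0), Fraction(1, 2))
```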
Computable Conditional Distributions
Conditioning w.r.t. a Single Event
- Let \(S\) be a measurable space and let \(\mu\in\mathcal{M}_1(S)\)
- For some event \(A\subset S\) s.t. \(\mu(A) > 0\), the conditional probability of \(B\subset S\) given \(A\) is \[\mu(B|A)=\frac{\mu(B\cap A)}{\mu(A)}\]
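For a finitely supported measure the formula can be evaluated exactly. The following sketch uses a fair die as a hypothetical example; the function name and the dictionary representation are assumptions.

```python
from fractions import Fraction


def conditional(mu, b, a):
    """mu(B | A) = mu(B and A) / mu(A) for a finitely supported mu (dict: outcome -> mass)."""
    mass_a = sum(w for x, w in mu.items() if x in a)
    if mass_a == 0:
        raise ValueError("conditioning event has measure zero")
    mass_ab = sum(w for x, w in mu.items() if x in a and x in b)
    return mass_ab / mass_a


die = {face: Fraction(1, 6) for face in range(1, 7)}
p_even_given_high = conditional(die, b={2, 4, 6}, a={4, 5, 6})
```

The explicit check for \(\mu(A) = 0\) marks exactly the case that the elementary definition does not cover and that the later slides on kernels address.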
Conditioning w.r.t. a Single Event
- Assume \((S, \mu)\) is a computable probability space
- \(\mu\) is a computable probability measure
- \(A\subset S\) is called almost decidable if there are c.e. open sets \(U\subset A\) and \(V\subset S\setminus A\) with \(\mu(U)+\mu(V)=1\)
- \(\mu\) is computable \(\iff\) \(\mu(A)\) is a computable real whenever \(A\) is almost decidable
- \(A\) is almost decidable and \(\mu(A)>0\) \(\implies\) \(\mu(\cdot|A)\) is a computable probability measure
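As a small worked example (not from the original slides), take \(S=[0,1]\) with \(\mu\) the uniform distribution and \(A=[0,\tfrac{1}{2}]\). Then

\[U = [0, \tfrac{1}{2}) \subset A, \qquad V = (\tfrac{1}{2}, 1] \subset S\setminus A, \qquad \mu(U)+\mu(V) = \tfrac{1}{2}+\tfrac{1}{2} = 1.\]

Both \(U\) and \(V\) are unions of ideal balls with rational centers and radii, hence c.e. open, so \(A\) is almost decidable. Only the boundary point \(\tfrac{1}{2}\), which has measure zero, is left undecided, and \(\mu(B|A) = 2\mu(B\cap A)\).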
Probability Kernel
- Let \(S\) and \(T\) be measurable spaces
- A function \(k: S\times\mathcal{B}_T\rightarrow[0, 1]\), where \(\mathcal{B}_T\) here denotes the \(\sigma\)-algebra on \(T\) (not the ideal balls), is called a probability kernel when
- \(\forall s\in S\), \(k(s, \cdot)\) is a probability measure on \(T\)
- \(\forall B\in\mathcal{B}_T\), the function \(k(\cdot, B)\) is measurable
Regular Conditional Distribution
- Let \(X\) and \(Y\) be random variables in measurable spaces \(S\) and \(T\)
- A probability kernel \(k\) is called a regular version of the conditional distribution \(\mathbb{P}(Y\in\cdot|X)\) when \[\mathbb{P}(X\in A, Y\in B)=\int_A k(x, B)\,\mathbb{P}_X(dx)\] for all measurable sets \(A\subset S\) and \(B\subset T\), where \(\mathbb{P}_X\) is the distribution of \(X\)
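In the discrete case the integral becomes a finite sum and the defining identity can be checked directly. The joint distribution below is hypothetical, chosen only for illustration, and the helper names are assumptions.

```python
from fractions import Fraction

# Hypothetical joint distribution of (X, Y) on {0, 1, 2} x {"a", "b"}.
joint = {
    (0, "a"): Fraction(1, 8), (0, "b"): Fraction(1, 8),
    (1, "a"): Fraction(1, 4), (1, "b"): Fraction(0),
    (2, "a"): Fraction(1, 6), (2, "b"): Fraction(1, 3),
}


def marginal_x(x):
    """P(X = x), the distribution of X evaluated at the atom x."""
    return sum(w for (xx, _), w in joint.items() if xx == x)


def kernel(x, b):
    """k(x, B) = P(X = x, Y in B) / P(X = x), defined where P(X = x) > 0."""
    return sum(w for (xx, y), w in joint.items() if xx == x and y in b) / marginal_x(x)


def lhs(a, b):
    """P(X in A, Y in B), computed from the joint distribution."""
    return sum(w for (x, y), w in joint.items() if x in a and y in b)


def rhs(a, b):
    """The integral of k(x, B) over A with respect to the distribution of X."""
    return sum(kernel(x, b) * marginal_x(x) for x in a)
```

Here every atom of \(X\) has positive mass, so the kernel is determined everywhere. In general \(k(x, \cdot)\) is only determined up to a \(\mathbb{P}_X\)-null set of \(x\), which is why one speaks of versions.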
Computable Probability Kernel
- \(S\) and \(T\): computable metric spaces
- \(k: S\times\mathcal{B}_T\rightarrow[0, 1]\): a probability kernel from \(S\) to \(T\)
- Define \(\phi_k: S\rightarrow \mathcal{M}_1(T)\) to be \(\phi_k(s) = k(s, \cdot)\)
- \(k\) is a computable probability kernel when \(\phi_k\) is a computable function
Negative and Positive Examples
Back to the Simple Example
- Latent random variables:
- \(C\sim\text{Bernoulli}(\frac{1}{2})\)
- \(U\sim\text{Uniform}([0,1])\)
- \(N\sim\text{Geometric}(\frac{1}{2})\)
- \(\{r_i\}_{i\in\mathbb{N}}\): computable enumeration of \(\mathbb{Q}\cap(0, 1)\)
- Observed random variable:
\[\begin{equation}
X = \begin{cases}
U & \text{if }C = 1 \\
r_N & \text{otherwise}
\end{cases}
\end{equation}\]
- Want to study \(\mathbb{P}(C|X)\)
- In the notation above, \(S=[0, 1]\) and \(T=\{0, 1\}\)
- \(\mathcal{M}_1(T)\) can be identified with \([0, 1]\) via \(\mu\mapsto\mu(\{0\})\)
- The conditional probability kernel:
- \(k(x, \{0\}) = \begin{cases}1 & x\text{ rational }\\0 & x\text{ irrational}\end{cases}\)
- The Dirichlet function, a nowhere continuous function
- Discontinuity implies noncomputability
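One way to see the discontinuity numerically is to condition on a small interval around \(x\), which is an event of positive probability. The sketch below is an illustration under stated assumptions: \(C\), \(U\), \(N\) are independent, \(\mathbb{P}(N=n)=2^{-(n+1)}\) for \(n\geq 0\), and the rationals are enumerated as \(1/2, 1/3, 2/3, 1/4, \dots\). Because only finitely many rationals are inspected, the result is a lower bound.

```python
from fractions import Fraction
from itertools import islice
from math import gcd


def rationals_in_unit_interval():
    """Assumed enumeration r_0, r_1, ... of the rationals in (0, 1): 1/2, 1/3, 2/3, 1/4, ..."""
    q = 2
    while True:
        for p in range(1, q):
            if gcd(p, q) == 1:
                yield Fraction(p, q)
        q += 1


def posterior_c0_lower_bound(a, b, n_terms):
    """Lower bound on P(C = 0 | a < X < b) for 0 <= a < b <= 1.

    Only the first n_terms rationals are inspected, so the mass of the
    event {C = 0, a < X < b} is under-estimated.
    """
    mass_c0 = Fraction(0)
    for n, r in enumerate(islice(rationals_in_unit_interval(), n_terms)):
        if a < r < b:
            mass_c0 += Fraction(1, 2) * Fraction(1, 2 ** (n + 1))  # P(C = 0) * P(N = n)
    mass_c1 = Fraction(1, 2) * (b - a)  # P(C = 1) * P(a < U < b)
    return mass_c0 / (mass_c0 + mass_c1)


eps = Fraction(1, 1000)
near_half = posterior_c0_lower_bound(Fraction(1, 2) - eps, Fraction(1, 2) + eps, 10)
```

Shrinking the interval around a fixed rational \(r_n\) drives the posterior of \(C=0\) to 1, whereas for Lebesgue-almost every \(x\) it tends to 0. No finite-precision observation of \(X\) separates the two cases.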
More Negative Examples
- An example where the conditional distribution has a version that's continuous almost everywhere, yet still not computable
- An example where the conditional distribution has a version that's continuous everywhere, yet still not computable
Positive Examples
- Conditioning on discrete random variables is always computable
- Conditioning is computable when there's a conditional density
- Conditioning on noisy observations is computable:
- Assume \(U, V\) are computable random variables
- Assume \(E\) is a computable random variable which has a bounded computable density and is independent of \(U, V\)
- Think of \(U + E\) as an idealized measurement \(U\) corrupted by an independent source of additive error \(E\)
- \(\mathbb{P}(U, V|U + E)\) is computable
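Applied to the simple example, with \(X\) in the role of \(U\) and \(C\) in the role of \(V\), the posterior of \(C\) given a noisy observation becomes a smooth function that Bayes' rule can evaluate. The sketch below assumes Gaussian noise \(E\sim\text{Normal}(0, \sigma^2)\), the enumeration \(1/2, 1/3, 2/3, \dots\) of the rationals, \(\mathbb{P}(N=n)=2^{-(n+1)}\), and a truncation of the mixture to finitely many atoms. None of these choices come from the original slides.

```python
import math
from itertools import islice


def rationals_in_unit_interval():
    """Assumed enumeration of the rationals in (0, 1), returned as floats."""
    q = 2
    while True:
        for p in range(1, q):
            if math.gcd(p, q) == 1:
                yield p / q
        q += 1


def gaussian_pdf(z, sigma):
    """Density of Normal(0, sigma**2) at z."""
    return math.exp(-0.5 * (z / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def gaussian_cdf(z, sigma):
    """Distribution function of Normal(0, sigma**2) at z."""
    return 0.5 * (1 + math.erf(z / (sigma * math.sqrt(2))))


def posterior_c0_given_noisy(y, sigma=0.1, n_terms=40):
    """Approximate P(C = 0 | X + E = y), truncating the mixture to n_terms atoms."""
    # Density of X + E given C = 0: mixture over atoms r_n with weights 2**-(n+1).
    dens_c0 = sum(
        2.0 ** -(n + 1) * gaussian_pdf(y - r, sigma)
        for n, r in enumerate(islice(rationals_in_unit_interval(), n_terms))
    )
    # Density of X + E given C = 1: Uniform([0, 1]) convolved with the noise.
    dens_c1 = gaussian_cdf(y, sigma) - gaussian_cdf(y - 1, sigma)
    return dens_c0 / (dens_c0 + dens_c1)  # the equal prior weights 1/2 cancel


p_half = posterior_c0_given_noisy(0.5)
```

Unlike the Dirichlet function, this posterior varies continuously with the observation \(y\), which is consistent with the positive result stated above.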