Computability of Conditional Probability

Division of Applied Math, Brown University

Guangyao (Stannis) Zhou

April 25th, 2019

Outline

Overview
Computable Probability Theory
Computable Conditional Distributions
Negative and Positive Examples
Discussions

Overview

Computability of Probability Distributions

Class of computable probability distributions
- A probabilistic Turing machine
- Can sample to arbitrary accuracy
Formalize the concept of computability using the theory of computable metric spaces
Central message:
There exist computable joint distributions with noncomputable conditional distributions

A Simple Example

Latent random variables:
- \(C\sim\text{Bernoulli}(\frac{1}{2})\)
- \(U\sim\text{Uniform}([0,1])\)
- \(N\sim\text{Geometric}(\frac{1}{2})\)
\(\{r_i\}_{i\in\mathbb{N}}\): computable enumeration of \(\mathbb{Q}\cap(0, 1)\)
Observed random variable: \[\begin{equation} X = \begin{cases} U & \text{if }C = 1 \\ r_N & \text{otherwise} \end{cases} \end{equation}\]
Claim: \(\mathbb{P}(C|X)\) is not "computable"
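The joint distribution of \((C, X)\) is easy to sample, which is what makes the claim surprising. A minimal sketch, assuming one particular enumeration of \(\mathbb{Q}\cap(0,1)\) (by increasing denominator) and \(N\) counted from 0; the function names and the 32-bit truncation of \(U\) are illustrative choices, not part of the original construction:

```python
import random
from fractions import Fraction
from math import gcd

def rationals_in_unit_interval():
    """Yield each rational in (0, 1) exactly once: 1/2, 1/3, 2/3, 1/4, 3/4, ..."""
    q = 2
    while True:
        for p in range(1, q):
            if gcd(p, q) == 1:
                yield Fraction(p, q)
        q += 1

def nth_rational(n):
    """Return r_n for the enumeration above (0-indexed, an assumed convention)."""
    gen = rationals_in_unit_interval()
    for _ in range(n):
        next(gen)
    return next(gen)

def sample_x(rng, bits=32):
    """Draw (C, X). When C = 1, U is only produced to `bits` binary digits."""
    c = rng.getrandbits(1)                      # C ~ Bernoulli(1/2)
    if c == 1:
        x = Fraction(rng.getrandbits(bits), 2**bits)   # finite-precision U
    else:
        n = 0                                   # N ~ Geometric(1/2) on {0, 1, ...}
        while rng.getrandbits(1) == 0:
            n += 1
        x = nth_rational(n)
    return c, x
```

Note that any finite-precision output of \(U\) is itself a rational number, so a finite prefix of \(X\) never reveals whether \(X\) is rational.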

A First Look at the Simple Example

Some direct calculations
- \(\mathbb{P}(C=0|X\text{ rational}) = 1\)
- \(\mathbb{P}(C=0|X\text{ irrational}) = 0\)
The conditional "probability kernel":
- \(k(x, \{0\}) = \begin{cases}1 & x\text{ rational }\\0 & x\text{ irrational}\end{cases}\)
- The Dirichlet function, a nowhere continuous function
Computability implies continuity, so discontinuity implies noncomputability

Computable Probability Theory

Computably Enumerable (c.e.) and C.e. Reals

A (potentially countably infinite) set is computably enumerable (c.e.) when there is a program that outputs every element of the set eventually
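A real number \(r\) is c.e. when \(\{q\in\mathbb{Q}: q < r\}\) is c.e., and computable when both \(\{q\in\mathbb{Q}: q < r\}\) and \(\{q\in\mathbb{Q}: q > r\}\) are c.e.
Intuitively, \(r\) is c.e. if we can write a program that approximates \(r\) from below, and computable if we can write a program that approximates \(r\) to arbitrary accuracy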
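As an illustration of approximating a real to arbitrary accuracy, the sketch below brackets \(\sqrt{2}\) between rationals by bisection. The function name `sqrt2_bounds` is hypothetical; the only fact used is that comparing \(q^2\) with 2 is decidable for rational \(q\):

```python
from fractions import Fraction

def sqrt2_bounds(k):
    """Return rationals (lo, hi) with lo < sqrt(2) < hi and hi - lo <= 2**-k."""
    lo, hi = Fraction(1), Fraction(2)
    while hi - lo > Fraction(1, 2**k):
        mid = (lo + hi) / 2
        # mid is rational, so mid*mid == 2 never happens
        if mid * mid < 2:
            lo = mid
        else:
            hi = mid
    return lo, hi
```

Running the loop for increasing \(k\) lists rationals below \(\sqrt{2}\) (the values of `lo`) and rationals above it (the values of `hi`), and both sequences converge to \(\sqrt{2}\).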

Computable Metric Spaces

A computable metric space is a triple \((S, \delta, \mathcal{D})\)
- \(\delta\) is a metric on the set \(S\)
- \((S, \delta)\) is a complete separable metric space
- \(\mathcal{D} = \{s_i\}_{i\in\mathbb{N}}\) is an enumeration of a dense subset of \(S\), called ideal points
- The real numbers \(\delta(s_i, s_j)\) are computable, uniformly in \(i\) and \(j\)
The ideal balls of \(S\): \[\begin{equation} \mathcal{B}_S = \{B(s_i, q_j): s_i\in\mathcal{D}, q_j\in\mathbb{Q}, q_j > 0\} \end{equation}\] \(B(s_i, q_j)\) denotes the ball of radius \(q_j\) centered at \(s_i\).

Computable Point

A point \(x\in S\) is computable when there's a program that enumerates a sequence \(\{x_i\}\subset\mathcal{D}\) where \[\delta(x_i, x)<2^{-i}, \forall i\]
Intuitively, a point is computable when we can write a program that approximates the point to arbitrary accuracy

Computable Partial Function

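An open set \(U\subset S\) is c.e. open when there is some c.e. set \(E\subset\mathbb{N}\) such that \(U=\cup_{i\in E}B_i\), where \(\{B_i\}_{i\in\mathbb{N}}\) is an enumeration of the ideal balls \(\mathcal{B}_S\)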
Assume \((S, \delta_S, \mathcal{D}_S)\) and \((T, \delta_T, \mathcal{D}_T)\) are computable metric spaces, and \(\{B_n\}_{n\in\mathbb{N}}\) is an enumeration of \(\mathcal{B}_T\)
A function \(f: S \rightarrow T\) is said to be computable on \(R\subset S\) if there is a computable sequence \(\{U_n\}_{n\in\mathbb{N}}\) of c.e. open sets \(U_n \subset S\) such that \[f^{-1}(B_n)\cap R=U_n\cap R, \forall n\in\mathbb{N}\]

Computability implies continuity
Intuition: given a computable \(x\), we can write a program to approximate \(f(x)\) to arbitrary accuracy
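The intuition can be made concrete for \(f(x)=x^2\) on \([0,1]\). In this sketch a point \(x\) is represented by a function `approx(i)` returning a rational within \(2^{-i}\) of \(x\); that representation and the helper names are assumptions for illustration. To get \(f(x)\) within \(2^{-k}\) it suffices to query \(x\) at precision \(k+2\), since \(|y^2-x^2| = |y-x|\,|y+x| < 2^{-(k+2)}\cdot 4\):

```python
from fractions import Fraction

def oracle_for(x):
    """Hypothetical helper: approx(i) is a dyadic rational with |approx(i) - x| < 2**-i."""
    x = Fraction(x)
    def approx(i):
        scale = 2**i
        # floor(x * 2**i) / 2**i
        return Fraction(x.numerator * scale // x.denominator, scale)
    return approx

def square(approx, k):
    """Return a rational within 2**-k of x**2, for x in [0, 1] given by `approx`."""
    y = approx(k + 2)      # finite precision on the input suffices
    return y * y
```

No analogous rule exists for a discontinuous function such as the Dirichlet function: no finite-precision approximation of \(x\) determines whether \(x\) is rational.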

Computable Random Variables

Underlying probability space: \((\{0, 1\}^{\infty}, \mathcal{F}, \mathbb{P})\)
- \(\mathbb{P}\): (infinite) product measure of the uniform distribution on \(\{0, 1\}\)
- An infinite sequence of fair coin tosses
Random variable \(X\) taking values in computable metric space \(S\)
- \(X\) is a function from \((\{0, 1\}^{\infty}, \mathcal{F}, \mathbb{P})\) to \(S\)
- \(X\) is a computable random variable if this function is computable
Commonly used random variables (e.g., Bernoulli, uniform, and geometric) are computable
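A minimal sketch of two random variables written as functions of a stream of fair coin tosses. The function names are hypothetical, and the geometric variable is assumed to take values in \(\{0, 1, 2, \dots\}\). The point is that each output, at any requested precision, depends on only finitely many bits:

```python
from fractions import Fraction
from itertools import islice

def uniform_approx(bits, k):
    """Approximate U = sum_j bits[j] * 2**-(j+1) using only the first k bits.

    The result differs from U by at most 2**-k.
    """
    value = Fraction(0)
    for j, b in enumerate(islice(bits, k)):
        value += Fraction(b, 2**(j + 1))
    return value

def geometric(bits):
    """Number of 0s before the first 1: Geometric(1/2) on {0, 1, 2, ...}."""
    n = 0
    for b in bits:
        if b == 1:
            return n
        n += 1
    raise ValueError("bit stream ended before the first 1")
```

For example, `uniform_approx(iter([1, 0, 1]), 3)` reads three tosses and returns `Fraction(5, 8)`.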

Computable Probability Measures

\((S, \delta, \mathcal{D}_S)\): a computable metric space
\(\mathcal{M}_1(S)\): set of all (Borel) probability measures on \(S\)
Define \(\mathcal{D}_P\subset \mathcal{M}_1(S)\) to be the set of probability measures that are concentrated on a finite subset of \(\mathcal{D}_S\) and assign rational mass to each atom
Using the so-called Prokhorov metric \(\delta_P\), we can define a computable metric space \((\mathcal{M}_1(S), \delta_P, \mathcal{D}_P)\)
We say \(\mu\in\mathcal{M}_1(S)\) is a computable probability measure if it's a computable point in this space

Computable Probability Measures

All computable random variables induce computable probability measures
For any computable probability measure, we can construct a sequence of i.i.d. computable random variables with that distribution

Computable Conditional Distributions

Conditioning w.r.t. a Single Event

Let \(S\) be a measurable space and let \(\mu\in\mathcal{M}_1(S)\)
For some event \(A\subset S\) s.t. \(\mu(A) > 0\), the conditional probability of \(B\subset S\) given \(A\) is \[\mu(B|A)=\frac{\mu(B\cap A)}{\mu(A)}\]
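The ratio formula can be applied to the simple example from the overview, with \(A=\{a<X<b\}\) and \(B=\{C=0\}\). The sketch below assumes the enumeration of \(\mathbb{Q}\cap(0,1)\) by increasing denominator and \(\mathbb{P}(N=n)=2^{-(n+1)}\) for \(n\ge 0\); both are illustrative conventions. Since \(\mathbb{P}(C=0, a<X<b)=\frac{1}{2}m\) with \(m=\sum_{n:\,a<r_n<b}2^{-(n+1)}\) and \(\mathbb{P}(C=1, a<X<b)=\frac{1}{2}(b-a)\), the conditional probability is \(m/(m+b-a)\), and truncating the sum gives rigorous lower and upper bounds:

```python
from fractions import Fraction
from math import gcd

def rationals_in_unit_interval():
    """Yield each rational in (0, 1) exactly once: 1/2, 1/3, 2/3, 1/4, 3/4, ..."""
    q = 2
    while True:
        for p in range(1, q):
            if gcd(p, q) == 1:
                yield Fraction(p, q)
        q += 1

def posterior_c0_given_interval(a, b, n_terms):
    """Bounds (lo, hi) on P(C=0 | a < X < b) for rationals 0 <= a < b <= 1."""
    a, b = Fraction(a), Fraction(b)
    gen = rationals_in_unit_interval()
    mass = Fraction(0)
    for n in range(n_terms):
        r = next(gen)
        if a < r < b:
            mass += Fraction(1, 2**(n + 1))     # P(N = n)
    tail = Fraction(1, 2**n_terms)              # total mass of the omitted terms
    width = b - a
    lo = mass / (mass + width)
    hi = (mass + tail) / (mass + tail + width)
    return lo, hi
```

The bounds tighten as `n_terms` grows, and the value they enclose lies strictly between 0 and 1 for every interval, in contrast to the pointwise values \(\mathbb{P}(C=0|X)\in\{0,1\}\).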

Conditioning w.r.t. a Single Event

Assume \((S, \mu)\) is a computable probability space
- \(\mu\) is a computable probability measure
- \(A\subset S\) is called almost decidable if there are c.e. open sets \(U\subset A\) and \(V\subset S\setminus A\) such that \(\mu(U)+\mu(V)=1\)
- \(\mu\) is computable \(\iff\) \(\mu(A)\) is a c.e. real whenever \(A\) is almost decidable
\(A\) is almost decidable \(\implies\) \(\mu(\cdot|A)\) is a computable probability measure

Probability Kernel

Let \(S\) and \(T\) be measurable spaces
A function \(k: S\times\mathcal{B}_T\rightarrow[0, 1]\), where \(\mathcal{B}_T\) here denotes the \(\sigma\)-algebra of \(T\), is called a probability kernel when
- \(\forall s\in S\), \(k(s, \cdot)\) is a probability measure on \(T\)
- \(\forall B\in\mathcal{B}_T\), the function \(k(\cdot, B)\) is measurable

Regular Conditional Distribution

Let \(X\) and \(Y\) be random variables in measurable spaces \(S\) and \(T\)
A probability kernel \(k\) is called a regular version of the conditional distribution \(\mathbb{P}(Y\in\cdot|X)\) when \[\mathbb{P}(X\in A, Y\in B)=\int_A k(x, B)\mathbb{P}_X(d x)\] for all measurable sets \(A\subset S\) and \(B\subset T\), where \(\mathbb{P}_X\) denotes the distribution of \(X\)

Computable Probability Kernel

\(S\) and \(T\): computable metric spaces
\(k: S\times\mathcal{B}_T\rightarrow[0, 1]\): a probability kernel from \(S\) to \(T\)
Define \(\phi_k: S\rightarrow \mathcal{M}_1(T)\) to be \(\phi_k(s) = k(s, \cdot)\)
\(k\) is a computable probability kernel when \(\phi_k\) is a computable function

Negative and Positive Examples

Back to the Simple Example

Latent random variables:
- \(C\sim\text{Bernoulli}(\frac{1}{2})\)
- \(U\sim\text{Uniform}([0,1])\)
- \(N\sim\text{Geometric}(\frac{1}{2})\)
\(\{r_i\}_{i\in\mathbb{N}}\): computable enumeration of \(\mathbb{Q}\cap(0, 1)\)
Observed random variable: \[\begin{equation} X = \begin{cases} U & \text{if }C = 1 \\ r_N & \text{otherwise} \end{cases} \end{equation}\]

Back to the Simple Example

Want to study \(\mathbb{P}(C|X)\)
In the earlier notation, \(S=[0, 1]\), \(T=\{0, 1\}\)
\(\mathcal{M}_1(T)\) can be identified with \([0, 1]\) via \(\mu\mapsto\mu(\{0\})\)
The conditional probability kernel:
- \(k(x, \{0\}) = \begin{cases}1 & x\text{ rational }\\0 & x\text{ irrational}\end{cases}\)
- The Dirichlet function, a nowhere continuous function
Discontinuity implies noncomputability
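A short check that this kernel satisfies the defining identity of a regular conditional distribution for \(B=\{0\}\), writing \(\mathbb{P}_X\) for the distribution of \(X\) and assuming the convention \(\mathbb{P}(N=n)=2^{-(n+1)}\) for \(n\ge 0\). For any measurable \(A\subset[0,1]\):
\[\int_A k(x, \{0\})\,\mathbb{P}_X(d x) = \mathbb{P}(X\in A\cap\mathbb{Q}) = \mathbb{P}(C=0, r_N\in A) + \mathbb{P}(C=1, U\in A\cap\mathbb{Q})\]
The second term vanishes because \(U\) is uniform and \(\mathbb{Q}\) has Lebesgue measure zero, and \(r_N\) is always rational, so
\[\int_A k(x, \{0\})\,\mathbb{P}_X(d x) = \mathbb{P}(C=0, X\in A) = \frac{1}{2}\sum_{n:\, r_n\in A}2^{-(n+1)}\]
The case \(B=\{1\}\) follows by taking complements, since \(k(x,\{1\}) = 1-k(x,\{0\})\).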

More Negative Examples

An example where the conditional distribution has a version that's continuous almost everywhere, yet still not computable
An example where the conditional distribution has a version that's continuous everywhere, yet still not computable

Positive Examples

Conditioning on discrete random variables is always computable
Conditioning is computable when there's a computable conditional density

Positive Examples

Conditioning on noisy observations is computable:
- Assume \(U, V\) are computable random variables
- Assume \(E\) is a computable random variable which has a bounded computable density and is independent of \(U, V\)
- Can think of \(U + E\) as the corruption of an idealized measurement \(U\) by independent source of additive error \(E\)
- \(\mathbb{P}(U, V|U + E)\) is computable
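To see the effect of noise on the simple example, take the observed \(X\) in the role of \(U\), the latent \(C\) in the role of \(V\), and assume \(E\sim\mathcal{N}(0,\sigma^2)\). The Gaussian noise, the value \(\sigma=0.1\), the enumeration of the rationals, and the function names are all illustrative assumptions. The posterior given \(X+E=y\) then follows from Bayes' rule with the two likelihoods \(\sum_n 2^{-(n+1)}p_E(y-r_n)\) and \(\int_0^1 p_E(y-u)\,du\), and truncating the sum gives bounds (exact up to floating-point rounding):

```python
import math
from fractions import Fraction
from math import gcd

def rationals_in_unit_interval():
    """Yield each rational in (0, 1) exactly once: 1/2, 1/3, 2/3, 1/4, 3/4, ..."""
    q = 2
    while True:
        for p in range(1, q):
            if gcd(p, q) == 1:
                yield Fraction(p, q)
        q += 1

def normal_pdf(z, sigma):
    return math.exp(-0.5 * (z / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(z, sigma):
    return 0.5 * (1.0 + math.erf(z / (sigma * math.sqrt(2))))

def posterior_c0_given_noisy(y, sigma=0.1, n_terms=40):
    """Bounds (lo, hi) on P(C=0 | X + E = y) with E ~ Normal(0, sigma**2)."""
    gen = rationals_in_unit_interval()
    like0 = 0.0                                  # likelihood of y when C = 0
    for n in range(n_terms):
        r = float(next(gen))
        like0 += 2.0 ** -(n + 1) * normal_pdf(y - r, sigma)
    # Omitted terms have total weight 2**-n_terms and density at most pdf(0).
    tail = 2.0 ** -n_terms * normal_pdf(0.0, sigma)
    # Likelihood of y when C = 1: integral of pdf(y - u) over u in [0, 1].
    like1 = normal_cdf(y, sigma) - normal_cdf(y - 1.0, sigma)
    lo = like0 / (like0 + like1)
    hi = (like0 + tail) / (like0 + tail + like1)
    return lo, hi
```

The bounded density of \(E\) is what controls the tail term, so the gap between `lo` and `hi` shrinks geometrically in `n_terms` and the posterior varies continuously with \(y\).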
