Based on Probability and Statistics by DeGroot and Schervish
Unsectioned
(43 cards)
Convergence in expectation
\(X_n \rightarrow X\) in expectation if \(E(|X_n - X|) \rightarrow 0\). Implies \(E(X_n) \rightarrow E(X)\).
Variance of a random variable vector
$$\text{Var}(\mathbf{X}) = E(\mathbf{X}\mathbf{X}^\top) - \mathbf{\mu}\mathbf{\mu}^\top$$
PCA error
With \(\mathbf{x} = \mathbf{\mu} + \sum_{j=1}^p z_j\mathbf{v}_j\) and K-term approximation \(\mathbf{\hat{x}} = \mathbf{\mu} + \sum_{j=1}^K z_j\mathbf{v}_j\):
$$\begin{aligned}e_K &= \Vert \mathbf{x} - \mathbf{\hat{x}} \Vert^2 = \sum_{j=K+1}^p z_j^2,\\E(e_K) &= \sum_{j=K+1}^p E[\mathbf{v}_j^\top(\mathbf{x} - \mathbf{\mu})(\mathbf{x} - \mathbf{\mu})^\top\mathbf{v}_j]\\&= \sum_{j=K+1}^p \mathbf{v}_j^\top S_x \mathbf{v}_j\\&= \sum_{j=K+1}^p \lambda_j\end{aligned}$$
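A minimal numpy sketch (not part of the deck; data and variable names are made up) illustrating these identities: project onto the top \(K\) eigenvectors of the sample covariance and check that the mean reconstruction error equals the sum of the trailing eigenvalues.
```python
import numpy as np

rng = np.random.default_rng(0)
n, p, K = 5000, 5, 2
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))  # correlated sample data

mu = X.mean(axis=0)
S = (X - mu).T @ (X - mu) / n          # sample covariance (1/n convention, as on the cards)
lam, V = np.linalg.eigh(S)             # eigenvalues in ascending order
lam, V = lam[::-1], V[:, ::-1]         # sort descending

Z = (X - mu) @ V[:, :K]                # PCA transform: z_j = v_j^T (x - mu)
X_hat = mu + Z @ V[:, :K].T            # K-term inverse transform
e_K = np.sum((X - X_hat) ** 2, axis=1) # squared reconstruction error per sample

print(e_K.mean(), lam[K:].sum())       # the two numbers agree (up to floating point)
```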
Variance of random vector whose components are i.i.d.
$$\sigma^2I$$
PDF of multivariable function of \(\mathbf{X}\) with joint PDF \(f_X(x)\)
Analogous to the scalar case \(f_Y(y) = f_X(h(y))|h'(y)|\). For \(\mathbf{Y} = g(\mathbf{X})\), let \(\mathbf{X} = h(\mathbf{Y})\) be the inverse:
$$f_Y(\mathbf{y}) = f_X(h(\mathbf{y}))\left|\det{\frac{\partial h(y)}{\partial y}}\right|$$
Dirac delta function
$$\delta(x) = \begin{cases}+\infty &\text{if }x = 0,\\0 &\text{if }x \neq 0,\end{cases}$$
$$\int_a^b \delta(x - x_0) \,dx = \begin{cases}1 &\text{if }x_0 \in [a,b],\\0 &\text{if }x_0 \not\in [a,b],\end{cases}$$
If \(f(x)\) is continuous at \(x_0\):
$$\int_{-\infty}^\infty f(x) \delta(x - x_0) \,dx = f(x_0)$$
Expectation and variance of \(\mathbf{Y} = \mathbf{AX} + \mathbf{b}\)
$$\begin{aligned}E(\mathbf{Y}) &= \mathbf{A}E(\mathbf{X}) + \mathbf{b},\\\text{Var}(\mathbf{Y}) &= \mathbf{A}\text{Var}(\mathbf{X})\mathbf{A}^\top\end{aligned}$$
Proportion of variance (PoV)
Fraction of variance explained by first \(k\) PCs:
$$\text{PoV}(k) = 1 - \frac{\sum_{i=k+1}^p \lambda_i}{\sum_{i=1}^p \lambda_i} = \frac{\sum_{i=1}^k \lambda_i}{\sum_{i=1}^p \lambda_i}$$
Joint PDF of \(Y = (Y_1, Y_2)\) where \(Y_1\) and \(Y_2\) are functions of \(X = (X_1, X_2)\)
With inverse transform \(X_1 = s_1(Y_1, Y_2)\), \(X_2 = s_2(Y_1, Y_2)\):
$$f_Y(y_1, y_2) = f_X(s_1(y_1, y_2), s_2(y_1, y_2))\,|J|,$$
where \(J\) is the Jacobian determinant of the inverse transform:
$$J = \det\begin{bmatrix}\frac{\partial s_1}{\partial y_1} & \frac{\partial s_1}{\partial y_2}\\\frac{\partial s_2}{\partial y_1} & \frac{\partial s_2}{\partial y_2}\end{bmatrix}$$
Perron-Frobenius theorem
For finite, aperiodic, irreducible Markov chains with transition matrix \(\mathbf{P}\):
the eigenvalue of largest magnitude is \(1\) (all other eigenvalues have modulus strictly less than \(1\)), the corresponding left eigenvector can be normalized to a unique stationary distribution \(\mathbf{\pi}\) with strictly positive entries satisfying \(\mathbf{\pi}^\top\mathbf{P} = \mathbf{\pi}^\top\), and \(\mathbf{\alpha}_k^\top = \mathbf{\alpha}_0^\top\mathbf{P}^k \rightarrow \mathbf{\pi}^\top\) for any initial distribution \(\mathbf{\alpha}_0\)
Find eigenvalues for matrix \(A\)
Roots of \(\det(\lambda I - A)\)
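A quick worked example (not from the deck): for \(A = \begin{bmatrix}2&1\\1&2\end{bmatrix}\),
$$\det(\lambda I - A) = (\lambda - 2)^2 - 1 = (\lambda - 1)(\lambda - 3),$$
so the eigenvalues are \(\lambda = 1\) and \(\lambda = 3\).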
PCA transform
$$z_j = \mathbf{v}_j^\top(\mathbf{x} - \mathbf{\mu})$$
$$\frac{\partial \mathbf{v}^\top\mathbf{A}\mathbf{v}}{\partial v}$$
$$2\mathbf{A}\mathbf{v}$$
for symmetric \(\mathbf{A}\) (in general, \((\mathbf{A} + \mathbf{A}^\top)\mathbf{v}\))
Linear difference equation
With equations of the form \(\theta_{k+1} = c\theta_k + b\), where \(c\) and \(b\) are constants, \(\theta_k\) can be solved with:
$$\theta_k = Ac^k + \frac{b}{1-c},$$
where \(A\) is a constant that can be solved by plugging in \(\theta_0\)
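As a check of this form (my own verification, assuming \(c \neq 1\)): substituting gives
$$c\theta_k + b = Ac^{k+1} + \frac{cb}{1-c} + b = Ac^{k+1} + \frac{b}{1-c} = \theta_{k+1},$$
and setting \(k = 0\) gives \(A = \theta_0 - \frac{b}{1-c}\).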
PCA inverse transform
$$\mathbf{x} \approx \mathbf{\mu} + \sum_{j=1}^K z_j\mathbf{v}_j$$
Covariance matrix \(\mathbf{S}\) and its inverse for bivariate Gaussian random vector
$$\begin{aligned}\mathbf{S} &= \begin{bmatrix}\sigma_1^2 & \sigma_{12}\\\sigma_{12} & \sigma_2^2\end{bmatrix}\\\mathbf{S}^{-1} &= \frac1{\sigma_1^2\sigma_2^2(1 - \rho^2)}\begin{bmatrix}\sigma_2^2 & -\rho\sigma_1\sigma_2\\-\rho\sigma_1\sigma_2 & \sigma_1^2\end{bmatrix}\end{aligned}$$
Positive definite and positive semi-definite matrices
\(\mathbf{Q} \in \mathbb{R}^{N \times N}\) is:
positive definite if \(\mathbf{x}^\top\mathbf{Q}\mathbf{x} > 0\) for all \(\mathbf{x} \neq \mathbf{0}\) (equivalently, all eigenvalues \(> 0\)),
positive semi-definite if \(\mathbf{x}^\top\mathbf{Q}\mathbf{x} \geqslant 0\) for all \(\mathbf{x}\) (all eigenvalues \(\geqslant 0\))
Mean squared distance
$$\begin{aligned}E(\Vert \mathbf{X} - \mathbf{Y} \Vert^2) &= \sum_j E\left[(X_j - Y_j)^2\right]\\E(\Vert \mathbf{X} - \mathbf{\mu} \Vert^2) &= \text{Tr}(\mathbf{S})\\&=\sum_j \lambda_j\end{aligned}$$
Expected value of absolute value of standard Gaussian variable \(Z\)
$$E(|Z|) = \sqrt{\frac2{\pi}}$$
Compute HMM non-causal estimate
Run the forward recursion to get \(\alpha_k(i)\) and the backward recursion to get \(\beta_k(i)\), then combine:
$$P(X_k = i | y_0^{T-1}) = \frac{\alpha_k(i)\beta_k(i)}{\sum_j\alpha_k(j)\beta_k(j)}$$
Stationary Markov evolution
With initial state distribution \(\mathbf{\alpha}_0\):
$$\mathbf{\alpha}_{k+1}^\top = \mathbf{\alpha}_k^\top \mathbf{P}$$
$$\mathbf{\alpha}_k^\top = \mathbf{\alpha}_0^\top \mathbf{P}^k$$
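A small numpy sketch (illustrative only; the transition matrix is made up) of this evolution, and of the Perron-Frobenius limit: for an aperiodic, irreducible chain, \(\mathbf{\alpha}_k\) approaches the stationary distribution.
```python
import numpy as np

# Example 3-state stationary (homogeneous) transition matrix; each row sums to 1.
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.1, 0.3, 0.6]])
alpha0 = np.array([1.0, 0.0, 0.0])                 # start in state 0

alpha_k = alpha0 @ np.linalg.matrix_power(P, 100)  # alpha_k^T = alpha_0^T P^k
print(alpha_k)                                     # close to the stationary distribution

# Stationary distribution: left eigenvector of P for eigenvalue 1, normalized to sum to 1.
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmax(np.real(w))])
print(pi / pi.sum())
```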
Test for whether \(\mathbf{X} = (X_1, \ldots, X_d)\) is jointly Gaussian
\(\mathbf{X} = (X_1, \ldots, X_d)\) is jointly Gaussian iff linear combinations of \(X_j\) are Gaussian, or
$$Z = \mathbf{a}^\top\mathbf{X} = \sum_ia_iX_i$$
is a scalar Gaussian for all vectors \(\mathbf{a} \in \mathbb{R}^d\)
Singular value decomposition (SVD)
SVD is \(\mathbf{X} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^\top\):
\(\mathbf{U}\) and \(\mathbf{V}\) have orthonormal columns (\(\mathbf{U}^\top\mathbf{U} = \mathbf{I}\), \(\mathbf{V}^\top\mathbf{V} = \mathbf{I}\)) and \(\mathbf{\Sigma}\) is diagonal with non-negative singular values \(\sigma_1 \geqslant \sigma_2 \geqslant \cdots \geqslant 0\)
Forward term \(\alpha_k(i)\) with regards to Hidden Markov Model:
\(\rightarrow X_{k-1} \rightarrow X_k \rightarrow X_{k+1} \rightarrow\)
and
\(X_{k-1} \rightarrow Y_{k-1}\)
\(X_k \rightarrow Y_k\)
\(X_{k+1} \rightarrow Y_{k+1}\)
$$\alpha_k(i) = P(X_k = i, y_0^k)$$
Irreducible set of states
A set of states is irreducible if every pair of states in the set communicates
Valid covariance matrix
\(S\) must be symmetric and positive semi-definite; in particular \(\det(S) \geqslant 0\)
Properties of \(N\) orthonormal eigenvectors: \(\mathbf{V} = [\mathbf{v}_1, \ldots, \mathbf{v}_N] \in \mathbb{R}^{N \times N}\)
$$\mathbf{V}^\top\mathbf{V} = \mathbf{V}\mathbf{V}^\top = \mathbf{I}, \qquad \mathbf{V}^{-1} = \mathbf{V}^\top,$$
so \(\mathbf{V}\) is an orthogonal matrix
Determinant of \(\begin{bmatrix}a&b&c\\d&e&f\\g&h&i\end{bmatrix}\)
$$aei + bfg + cdh - gec - hfa - idb$$
Go diagonally down from \(a\), \(b\), and \(c\), multiply and add. Go diagonally up from \(g\), \(h\), and \(i\), multiply and subtract.
Sample variance matrix
With \(n\) samples each having \(p\) features, \(\mathbf{x}_i \in \mathbb{R}^p\):
$$\begin{aligned}\mathbf{S}_x &= \frac1{n} \sum_{i=1}^n (\mathbf{x}_i - \overline{\mathbf{x}})(\mathbf{x}_i - \overline{\mathbf{x}})^\top\\&=E\left[(\mathbf{x}_i - \overline{\mathbf{x}})(\mathbf{x}_i - \overline{\mathbf{x}})^\top\right]\end{aligned}$$
Non-causal estimate
$$P(X_k = i | y_0^{T-1}) = \frac{\alpha_k(i)\beta_k(i)}{\sum_j\alpha_k(j)\beta_k(j)}$$
Inverse of \(A = \begin{bmatrix}a&b\\c&d\end{bmatrix}\)
$$\frac1{\text{det}(A)}\begin{bmatrix}d&-b\\-c&a\end{bmatrix}$$
Inverse doesn't exist if \(A\) is singular (determinant is \(0\))
HMM backward term recursion
$$\beta_k(i) = \sum_j \beta_{k+1}(j)\gamma_{k+1}(j)P_k(i,j)$$
with initial condition
$$\beta_{T-1}(i) = 1$$
(since the observations run through \(y_{T-1}\), there is nothing left to explain after time \(T-1\))
Matrix square root
With the eigenvalue decomposition of \(S = UDU^\top\):
$$S^\frac12 = UD^\frac12U^\top$$
Vector Gaussian mixture model mean and variance
Just like a scalar mixture distribution:
$$\begin{aligned}E(\mathbf{x}) &= \sum_i \mathbf{\mu}_iq_i = \mathbf{\mu}\\\text{Var}(\mathbf{x}) &= \sum_i q_i\left[\mathbf{S}_i + (\mathbf{\mu} - \mathbf{\mu}_i)(\mathbf{\mu} - \mathbf{\mu}_i)^\top\right]\\&= \mathbf{S}\end{aligned}$$
Multivariable Gaussian PDF with sample variance
$$f_X(\mathbf{x}) = \frac{e^{-\frac12(\mathbf{x} - \mathbf{\mu})^\top\mathbf{S}^{-1}(\mathbf{x} - \mathbf{\mu})}}{(2\pi)^{\frac{d}2}\sqrt{\text{det}(\mathbf{S})}},$$
where \(d\) is the dimension of \(\mathbf{X}\) and \(\mathbf{S}\) is the sample variance matrix
Stochastic matrix
Satisfies \(P_{ij} \geqslant 0\) for all \(i, j\), with every row summing to one: \(\sum_j P_{ij} = 1\)
Generate i.i.d. multivariable Gaussian samples \(\mathbf{x}_i \in \mathbb{R}^d, \mathbf{x}_i\sim\mathcal{N}(\mu, S)\)
Generate \(\mathbf{z}_i \in \mathbb{R}^d\) with i.i.d. standard normal components, then set
$$\mathbf{x}_i = \mathbf{\mu} + S^\frac12\mathbf{z}_i,$$
using the matrix square root \(S^\frac12 = UD^\frac12U^\top\)
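A numpy sketch of this recipe (illustrative; \(\mu\) and \(S\) below are made-up values):
```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 2, 10000
mu = np.array([1.0, -2.0])
S = np.array([[2.0, 0.8],
              [0.8, 1.0]])

# Matrix square root via the eigenvalue decomposition S = U D U^T.
w, U = np.linalg.eigh(S)
S_half = U @ np.diag(np.sqrt(w)) @ U.T

Z = rng.standard_normal((n, d))   # rows are i.i.d. N(0, I)
X = mu + Z @ S_half               # each row ~ N(mu, S), since S_half is symmetric

print(X.mean(axis=0))             # approximately mu
print(np.cov(X, rowvar=False))    # approximately S
```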
HMM forward term recursion
$$\alpha_{k+1}(j) = \gamma_{k+1}(j) \sum_i P_k(i, j)\alpha_k(i)$$
where
$$\begin{aligned}P_k(i, j) &= P(X_{k+1} = j | X_k = i)\\\gamma_{k+1}(j) &= P(y_{k+1} | X_{k+1} = j)\end{aligned}$$
with initial condition
$$\alpha_0(i) = P(X_0 = i)\gamma_0(i)$$
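A compact numpy sketch (illustrative; the transition, emission, and initial probabilities are made up) that combines the forward and backward recursions from these cards into the non-causal estimate \(P(X_k = i \mid y_0^{T-1})\):
```python
import numpy as np

P = np.array([[0.8, 0.2],        # P[i, j] = P(X_{k+1} = j | X_k = i)
              [0.3, 0.7]])
B = np.array([[0.9, 0.1],        # B[i, y] = P(Y_k = y | X_k = i)
              [0.2, 0.8]])
pi0 = np.array([0.5, 0.5])       # P(X_0 = i)
y = [0, 0, 1, 1, 0]              # observations y_0^{T-1}
T = len(y)

# Forward: alpha_0(i) = P(X_0 = i) gamma_0(i); alpha_{k+1}(j) = gamma_{k+1}(j) sum_i P(i,j) alpha_k(i)
alpha = np.zeros((T, 2))
alpha[0] = pi0 * B[:, y[0]]
for k in range(T - 1):
    alpha[k + 1] = B[:, y[k + 1]] * (alpha[k] @ P)

# Backward: beta_{T-1}(i) = 1; beta_k(i) = sum_j P(i,j) gamma_{k+1}(j) beta_{k+1}(j)
beta = np.ones((T, 2))
for k in range(T - 2, -1, -1):
    beta[k] = P @ (B[:, y[k + 1]] * beta[k + 1])

# Non-causal estimate: P(X_k = i | y_0^{T-1}) proportional to alpha_k(i) beta_k(i)
posterior = alpha * beta
posterior /= posterior.sum(axis=1, keepdims=True)
print(posterior)
```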
Find eigenvectors of matrix \(\mathbf{A}\)
For each eigenvalue \(\lambda_i\), solve \((\mathbf{A} - \lambda_i I)\mathbf{v} = \mathbf{0}\) for a nonzero \(\mathbf{v}\) (the null space of \(\mathbf{A} - \lambda_i I\))
Properties of symmetric matrix \(S\)
All eigenvalues are real, eigenvectors for distinct eigenvalues are orthogonal, and \(S\) has the eigenvalue decomposition \(S = UDU^\top\) with \(U\) orthogonal (orthonormal eigenvectors) and \(D\) diagonal (eigenvalues)
$$\frac{\partial \mathbf{v}^\top\mathbf{v}}{\partial v}$$
$$2\mathbf{v}$$
Backward term \(\beta_k(i)\) with regards to Hidden Markov Model:
\(\rightarrow X_{k-1} \rightarrow X_k \rightarrow X_{k+1} \rightarrow\)
and
\(X_{k-1} \rightarrow Y_{k-1}\)
\(X_k \rightarrow Y_k\)
\(X_{k+1} \rightarrow Y_{k+1}\)
$$\beta_k(i) = P(y_{k+1}^{T-1}|X_k = i)$$
Stationary transition matrix
Also called homogeneous; the transition probabilities do not depend on the time step \(k\), and the chain is commonly drawn as a state transition diagram:
$$P_{ij}(k) = P_{ij}$$
Chapter 1
(2 cards)
Probability axioms
\(P(A) \geqslant 0\) for every event \(A\); \(P(S) = 1\) for the sample space \(S\); and for every infinite sequence of disjoint events \(A_1, A_2, \ldots\):
$$P\left(\bigcup_{i=1}^\infty A_i\right) = \sum_{i=1}^\infty P(A_i)$$
Bonferroni inequality
$$P\left(\bigcap_{i=1}^n A_i\right) \ge 1 - \sum_{i=1}^n P\left(A_i^c\right)$$
Chapter 3
(12 cards)
Inverse CDF method to generate random variables
Generate \(U \sim \text{Uniform}(0, 1)\) and set \(X = F^{-1}(U)\); then \(X\) has CDF \(F\)
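A minimal numpy example (my own, not from the deck): sampling an \(\text{Exponential}(\lambda)\) variable via its inverse CDF \(F^{-1}(u) = -\ln(1-u)/\lambda\).
```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0
u = rng.uniform(size=100_000)    # U ~ Uniform(0, 1)
x = -np.log(1.0 - u) / lam       # X = F^{-1}(U) has CDF F(x) = 1 - exp(-lam * x)

print(x.mean(), 1 / lam)         # sample mean ~ 1/lambda
print(x.var(), 1 / lam**2)       # sample variance ~ 1/lambda^2
```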
Convergence in distribution
\(X_n \rightarrow X\) in distribution if, at every point \(x\) where \(F_X(x)\) is continuous:
$$F_{X_n}(x) = P(X_n \leqslant x) \to F_X(x) = P(X \leqslant x)$$
Convolution of two PDFs \(f_X(x)\) and \(f_Y(y)\)
$$f_Z(z) = (f_X * f_Y)(z) = \int_{-\infty}^\infty f_X(t)f_Y(z - t) \,dt$$
PDF for an additive channel
If \(Y = X + W\) with \(X, W\) independent, then
$$f_{Y|X}(y|x) = f_W(y-x)$$
3 methods to compute PDF of \(Y = g(X)\)
PDF of an invertible function of a random variable
Let \(Y = g(X)\) with \(g(x)\) invertible so that \(X = g^{-1}(Y)\):
$$f_Y(y) = f_X(g^{-1}(y)) \cdot \left|\frac{\partial g^{-1}(y)}{\partial y}\right|$$
Relationship between a PDF and a probability
$$P(X \in [a, a + \epsilon]) \approx f_X(a) \cdot \epsilon$$
or
$$f_X(a) = \lim_{\epsilon \to 0} \frac{P(X \in [a, a + \epsilon])}{\epsilon}$$
Leibniz rule
$$\begin{aligned}\frac{d}{dz}\int_{a(z)}^{b(z)} h(x, z) \,dx &= h(b(z), z)b'\\&- h(a(z), z)a'\\&+ \int_{a(z)}^{b(z)} \frac{\partial h(x, z)}{\partial z} \,dx\end{aligned}$$
Inverse CDF method of computing PDF for a single random variable
Convergence in Probability
Random variables \(X_n \to X\) in probability if \(\forall \epsilon > 0\):
$$\lim_{n \to \infty} P(|X_n - X| \geqslant \epsilon) = 0$$
or, given any \(\epsilon\) and \(\delta > 0\), \(\exists N > 0\) such that:
$$P(|X_n - X| \geqslant \epsilon) < \delta, \forall n > N$$
PDF of a linear function of a random variable
Let \(X\) be a random variable with PDF \(f_X(x)\) and \(Y = aX + b\) with PDF \(f_Y(y)\):
$$f_Y(y) = \frac1{|a|}f_X\left(\frac{y - b}{a}\right)\text{ for }-\infty < y < \infty$$
Quantile function of the distribution of \(X\)
\(F^{-1}(p)\) is defined as the smallest value \(x\) such that \(F(x) \geqslant p\) for \(0 < p < 1\)
Chapter 4
(20 cards)
Optimal estimate and resulting MSE for constant estimator
$$\begin{aligned}\hat{Y} &= E(Y),\\\text{MSE} &= \text{Var}(Y)\end{aligned}$$
Optimal parameters for linear estimator \(\hat{Y} = \beta_1X + \beta_0\)
$$\begin{aligned}\beta_1 &= \frac{\sigma_{XY}}{\sigma_X^2},\\\beta_0 &= E(Y) - \beta_1E(X)\end{aligned}$$
Jensen's inequality
Let \(g\) be a convex function and let \(\mathbf{X}\) be a random vector with finite mean:
$$E[g(\mathbf{X})] \geqslant g(E(\mathbf{X}))$$
Minimum MSE for linear estimation
$$\sigma_Y^2(1 - \rho_{XY}^2)$$
$$\int_0^1 p^k(1-p)^l \,dp$$
$$\frac{k!l!}{(k + l + 1)!}$$
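A quick check (not from the deck), with \(k = 1\) and \(l = 2\):
$$\int_0^1 p(1-p)^2 \,dp = \frac12 - \frac23 + \frac14 = \frac1{12} = \frac{1!\,2!}{4!}$$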
Convex function
For every \(\alpha \in (0, 1)\) and every \(\mathbf{x}\) and \(\mathbf{y}\),
$$g[\alpha\mathbf{x} + (1 - \alpha)\mathbf{y}] \leqslant \alpha g(\mathbf{x}) + (1 - \alpha) g(\mathbf{y})$$
(the chord lies above the graph)
Law of Total Probability for Variance
$$\begin{aligned}\text{Var}_Y(Y) &= E_X[\text{Var}_{Y|X}(Y|X)]\\&+ \text{Var}_X[E_{Y|X}(Y|X)]\end{aligned}$$
Sample mean of i.i.d. \(X_i\) with \(E(X) = \mu\) and \(\text{Var}(X) = \sigma^2\)
$$\begin{aligned}S_n &= \frac1{n} \sum_{i=1}^n X_i,\\E(S_n) &= \mu,\\\text{Var}(S_n) &= \frac{\sigma^2}{n}\end{aligned}$$
Variance of sum of pairwise uncorrelated \(a_1X_1, \ldots, a_dX_d\)
$$\text{Var}(a_1X_1 + \ldots + a_dX_d) = \sum_{i=1}^d a_i^2\text{Var}(X_i)$$
Moment generating function
Let \(X\) be a random variable. For each real number \(t\):
$$\psi(t) = E(e^{tX})$$
Expectation of non-negative integer random variable
$$E(X) = \sum_{n=1}^\infty Pr(X \geqslant n)$$
Covariance of linear relationship \(Y = aX + b\)
$$\sigma_{XY} = a\sigma_X^2$$
Special case of nested conditional expectation with \(g(X,Y) = h(Y) \cdot f(X)\)
$$E_{XY}[h(Y) \cdot f(X)] = E_Y[h(Y) \cdot E_{X|Y}(f(X)|Y)]$$
Cauchy-Schwarz Inequality
\(X\) and \(Y\) are random variables with finite variance:
$$[\text{Cov}(X, Y)]^2 \leqslant \sigma_X^2\sigma_Y^2$$
and
$$-1 \leqslant \rho(X, Y) \leqslant 1$$
Expectation of non-negative random variable with c.d.f. \(F\)
$$E(X) = \int_0^\infty [1 - F(x)] \,dx$$
Law of Total Probability for Expectations
$$E_X[E_{Y|X}(Y|X)] = E_Y(Y)$$
Nested conditional expectation of joint random variables
$$E_{XY}[g(X,Y)] = E_X[E_{Y|X}(g(X,Y)|X)]$$
Mixture distribution
Let \(X = 1, 2, \ldots, M\) with \(P(X=i)=q_i, E(Y|X=i)=\mu_i, \text{Var}(Y|X=i)=\sigma_i^2\):
$$\begin{aligned}E(Y) &= \mu_y = \sum_i q_i\mu_i,\\\text{Var}(Y) &= \sum_i q_i\left[\sigma_i^2+(\mu_i-\mu_y)^2\right]\end{aligned}$$
Schwarz Inequality
$$[E(UV)]^2 \leqslant E(U^2)E(V^2)$$
Optimal estimate and resulting MSE for MMSE estimator
$$\begin{aligned}\hat{Y} &= E(Y|X),\\\text{MSE} &= E_X[\text{Var}_{Y|X}(Y|X)]\end{aligned}$$
Chapter 5
(11 cards)
Relationship of Poisson and Binomial distributions as \(n \to \infty\)
Total number of arrivals:
$$\lim_{n \to \infty} \text{Binom}\left(n, \frac{\lambda}{n}\right) = \text{Poisson}(\lambda)$$
Correlation and independence within jointly Gaussian vector
If \(\mathbf{X}\) is jointly Gaussian, uncorrelated components are independent, so within a jointly Gaussian vector uncorrelated \(\iff\) independent (not true for arbitrary distributions)
Maclaurin series for \(e^x\)
$$e^x = \sum_{k=0}^\infty \frac{x^k}{k!}$$
Central Limit Theorem
Let \(Z_i\) be i.i.d. random variables, \(\mu = E(Z_i)\), and \(\sigma^2 = \text{Var}(Z_i)\):
\(\frac1{\sqrt{n}}\sum_{i=1}^n (Z_i - \mu)\) converges in distribution to \(\mathcal{N}(0, \sigma^2)\) as \(n \to \infty\); consequently, for large \(n\), approximately:
$$\bar{Z} \sim \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right),\qquad \sum_i Z_i \sim \mathcal{N}(n\mu, n\sigma^2)$$
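A quick Monte Carlo illustration (not part of the deck; parameters are arbitrary): standardized sums of i.i.d. \(\text{Uniform}(0,1)\) variables behave approximately like a standard normal.
```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 50, 100_000
Z = rng.uniform(size=(trials, n))   # i.i.d. Uniform(0,1): mu = 1/2, sigma^2 = 1/12
mu, sigma = 0.5, np.sqrt(1 / 12)

W = (Z.sum(axis=1) - n * mu) / (sigma * np.sqrt(n))  # standardized sums
print(W.mean(), W.std())            # approximately 0 and 1
print(np.mean(W <= 1.0))            # approximately Phi(1) = 0.8413
```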
Tail bounds on Standard Normal CDF
$$\frac1{\sqrt{2\pi}z}\left(1 - \frac1{z^2}\right)e^{-\frac{z^2}2} \leqslant 1 - \Phi(z) \leqslant \frac1{\sqrt{2\pi}z}e^{-\frac{z^2}2}$$
for \(z > 0\)
Negative binomial distribution
The number \(X\) of failures that occur before the \(r\)th success has p.f.:
$$f(x|r, p) = \binom{r + x - 1}{x} p^r (1 - p)^x$$
for \(x = 0, 1, 2, \ldots\) or \(0\) otherwise.
$$\begin{aligned}E(X) &= \frac{r(1 - p)}{p},\\\text{Var}(X) &= \frac{r(1 - p)}{p^2}\end{aligned}$$
2nd moment of normal distribution
$$\mu^2 + \sigma^2$$
Exponential distribution
Time between events in a Poisson point process (events occur continuously and independently at a constant average rate). With \(\lambda\) representing event rate:
$$f(x; \lambda) = \begin{cases}\lambda e^{-\lambda x},&x \geqslant 0\\0,&\text{ otherwise}.\end{cases}$$
Mean is \(\frac1{\lambda}\), variance is \(\frac1{\lambda^2}\)
Linear combination of bivariate normal variables. Let \(X_1\) and \(X_2\) have a bivariate normal distribution with correlation \(\rho\); what are the mean and variance of:
$$a_1X_1 + a_2X_2 + b$$
$$\begin{aligned}\text{mean}&=a_1\mu_1 + a_2\mu_2 + b\\\text{variance}&=a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + 2a_1a_2\rho\sigma_1\sigma_2\end{aligned}$$
\(e^x\) as a limit
$$e^x = \lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n$$
Moments of exponential variables
$$E(X^n) = \frac{n!}{\lambda^n}$$
Chapter 6
(7 cards)
Approximation of \(P(S_n \leqslant c)\) where \(S_n = X_1 + \ldots + X_n\) and \(X_i\) are i.i.d. with mean \(\mu\) and variance \(\sigma^2\)
Use the Central Limit Theorem:
$$P(S_n \leqslant c) \approx \Phi\left(\frac{c - n\mu}{\sigma\sqrt{n}}\right)$$
Weak law of large numbers
If the \(X_k\) are uncorrelated with the same (finite) mean and variance, then \(\overline{X}_n \rightarrow E(X_k) = \mu\) in probability
Strong law of large numbers
If \(X_k\) are i.i.d. and \(E(|X_k|) < \infty\), then \(\overline{X}_n \rightarrow E(X)\) almost surely. Also applies to functions, \(\overline{g(X)}_n \rightarrow E[g(X)]\) if \(E(|g(X)|) < \infty\).
Chebyshev inequality
If \(X\) is a random variable for which \(\text{Var}(X)\) exists, \(\forall t>0\):
$$P(|X - E(X)| > t) \leqslant \frac{\text{Var}(X)}{t^2}$$
Delta method
Let \(Y_1, Y_2, \ldots\) be a sequence of random variables, \(F^*\) be a continuous CDF, \(\theta\) be a real number, and \(a_1, a_2, \ldots\) be a sequence of positive numbers increasing to \(\infty\).
If \(a_n(Y_n - \theta)\) converges in distribution to \(F^*\) and \(\alpha\) is a function with continuous derivative such that \(\alpha'(\theta) \ne 0\), then the following converges in distribution to \(F^*\):
$$\frac{a_n}{\alpha'(\theta)}(\alpha(Y_n) - \alpha(\theta))$$
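A standard special case (my own worked example): if \(\sqrt{n}(\overline{X}_n - \mu)\) converges in distribution to \(\mathcal{N}(0, \sigma^2)\) and \(\alpha(y) = y^2\) with \(\mu \neq 0\), then taking \(a_n = \sqrt{n}\), \(\theta = \mu\), and \(\alpha'(\mu) = 2\mu\) shows that
$$\sqrt{n}\left(\overline{X}_n^2 - \mu^2\right)$$
converges in distribution to \(\mathcal{N}(0, 4\mu^2\sigma^2)\).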
Almost sure convergence
\(X_n \rightarrow X\) almost surely if
$$P\left(\lim_{n \to \infty} X_n = X\right) = 1$$
Markov inequality
If \(X\) is a random variable with \(P(X \geqslant 0) = 1\), \(\forall t>0\):
$$P(X \geqslant t) \leqslant \frac{E(X)}{t}$$