# Probability and Statistics (Classic Version), 4th Edition

by DeGroot and Schervish

Scott Mueller (lvl 19)

Last updated

3 years ago

Date created

Oct 5, 2020

## Cards (95)

Unsectioned

(43 cards)

Convergence in expectation

Front

$$X_n \rightarrow X$$ in expectation if $$E(|X_n - X|) \rightarrow 0$$. Implies $$E(X_n) \rightarrow E(X)$$.

Back

Variance of a random variable vector

Front

$$\text{Var}(\mathbf{X}) = E(\mathbf{X}\mathbf{X}^\top) - \mathbf{\mu}\mathbf{\mu}^\top$$

Back

PCA error

Front

With $$\mathbf{x} = \mathbf{\mu} + \sum_{j=1}^p z_j\mathbf{v}_j$$ and K-term approximation $$\mathbf{\hat{x}} = \mathbf{\mu} + \sum_{j=1}^K z_j\mathbf{v}_j$$:

\begin{aligned}e_K &= \Vert \mathbf{x} - \mathbf{\hat{x}} \Vert^2 = \sum_{j=K+1}^p z_j^2,\\E(e_K) &= \sum_{j=K+1}^p E[\mathbf{v}_j^\top(\mathbf{x} - \mathbf{\mu})(\mathbf{x} - \mathbf{\mu})^\top\mathbf{v}_j]\\&= \sum_{j=K+1}^p \mathbf{v}_j^\top S_x \mathbf{v}_j\\&= \sum_{j=K+1}^p \lambda_j\end{aligned}

Back

Variance of random vector whose components are i.i.d.

Front

$$\sigma^2I$$

Back

PDF of multivariable function of $$\mathbf{X}$$ with joint PDF $$f_X(x)$$

Front

As in the scalar case $$f_Y(y) = f_X(h(y))|h'(y)|$$: for $$\mathbf{Y} = g(\mathbf{X})$$ with inverse $$\mathbf{X} = h(\mathbf{Y})$$,

$$f_Y(\mathbf{y}) = f_X(h(\mathbf{y}))\left|\det{\frac{\partial h(\mathbf{y})}{\partial \mathbf{y}}}\right|$$

Back

Dirac delta function

Front

$$\delta(x) = \begin{cases}+\infty &\text{if }x = 0,\\0 &\text{if }x \neq 0,\end{cases}$$

$$\int_a^b \delta(x - x_0) \,dx = \begin{cases}1 &\text{if }x_0 \in [a,b],\\0 &\text{if }x_0 \not\in [a,b],\end{cases}$$

If $$f(x)$$ is continuous at $$x_0$$:

$$\int_{-\infty}^\infty f(x) \delta(x - x_0) \,dx = f(x_0)$$

Back

Expectation and variance of $$\mathbf{Y} = \mathbf{AX} + \mathbf{b}$$

Front

\begin{aligned}E(\mathbf{Y}) &= \mathbf{A}E(\mathbf{X}) + \mathbf{b},\\\text{Var}(\mathbf{Y}) &= \mathbf{A}\text{Var}(\mathbf{X})\mathbf{A}^\top\end{aligned}

Back

Proportion of variance (PoV)

Front

Fraction of variance explained by first $$k$$ PCs:

$$\text{PoV}(k) = 1 - \frac{\sum_{i=k+1}^p \lambda_i}{\sum_{i=1}^p \lambda_i} = \frac{\sum_{i=1}^k \lambda_i}{\sum_{i=1}^p \lambda_i}$$

Back
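A minimal sketch of the PoV formula above, assuming the eigenvalues of the covariance matrix are already available as a plain list. The function name `proportion_of_variance` is hypothetical, chosen only for illustration.

```python
def proportion_of_variance(eigenvalues, k):
    """Fraction of total variance explained by the k largest eigenvalues."""
    lam = sorted(eigenvalues, reverse=True)  # PCs are ordered by decreasing eigenvalue
    return sum(lam[:k]) / sum(lam)


# Illustrative eigenvalues (made up for this example)
pov_two = proportion_of_variance([4.0, 3.0, 2.0, 1.0], 2)  # (4 + 3) / 10
```

A common use is to pick the smallest $$k$$ whose PoV exceeds a chosen threshold such as 0.9.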

Joint PDF of $$Y = (Y_1, Y_2)$$ where $$Y_1$$ and $$Y_2$$ are functions of $$X = (X_1, X_2)$$

Front
1. Invert mapping: $$(X_1, X_2) = h(Y_1, Y_2)$$
2. Take Jacobian: $$J = \frac{\partial h(y_1, y_2)}{\partial y}$$
3. $$f_Y(y_1, y_2) = f_X(h(y_1, y_2))|\det{J}|$$
Back

Perron-Frobenius theorem

Front

For finite, aperiodic, irreducible Markov chains with transition matrix $$\mathbf{P}$$:

• Eigenvalues $$\lambda_1, \ldots, \lambda_M$$ in decreasing magnitude
• $$\lambda_1$$ = 1
• Unique left eigenvector $$\mathbf{\alpha}^\top = \mathbf{\alpha}^\top\mathbf{P}$$
• Unique stationary distribution
• Eigenvector for $$\lambda_1$$ of $$\mathbf{P}^\top$$
• Normalize by dividing eigenvector by sum of elements
Back

Find eigenvalues for matrix $$A$$

Front

Roots of $$\det(\lambda I - A)$$

Back

PCA transform

Front

$$z_j = \mathbf{v}_j^\top(\mathbf{x} - \mathbf{\mu})$$

Back

$$\frac{\partial \mathbf{v}^\top\mathbf{A}\mathbf{v}}{\partial v}$$

Front

$$2\mathbf{A}\mathbf{v}$$ for symmetric $$\mathbf{A}$$; in general, $$(\mathbf{A} + \mathbf{A}^\top)\mathbf{v}$$

Back

Linear difference equation

Front

With equations of the form $$\theta_{k+1} = c\theta_k + b$$, where $$c \neq 1$$ and $$b$$ are constants, $$\theta_k$$ can be solved with:

$$\theta_k = Ac^k + \frac{b}{1-c},$$

where $$A$$ is a constant found by plugging in $$\theta_0$$: $$A = \theta_0 - \frac{b}{1-c}$$

Back
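The closed form can be checked against direct iteration. This is a small sketch, assuming $$c \neq 1$$ (the fixed point $$\frac{b}{1-c}$$ is undefined otherwise); both function names are hypothetical.

```python
def solve_linear_difference(c, b, theta0, k):
    """Closed form of theta_{k+1} = c * theta_k + b, valid for c != 1."""
    if c == 1:
        raise ValueError("closed form requires c != 1")
    fixed_point = b / (1 - c)
    A = theta0 - fixed_point  # from theta_0 = A * c**0 + b / (1 - c)
    return A * c**k + fixed_point


def iterate_linear_difference(c, b, theta0, k):
    """Apply the recursion k times, for comparison with the closed form."""
    theta = theta0
    for _ in range(k):
        theta = c * theta + b
    return theta


closed = solve_linear_difference(0.5, 1.0, 3.0, 10)
stepped = iterate_linear_difference(0.5, 1.0, 3.0, 10)
```

For $$|c| < 1$$ the term $$Ac^k$$ decays, so $$\theta_k$$ approaches $$\frac{b}{1-c}$$ regardless of $$\theta_0$$.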

PCA inverse transform

Front

$$\mathbf{x} \approx \mathbf{\mu} + \sum_{j=1}^K z_j\mathbf{v}_j$$

Back

Covariance matrix $$\mathbf{S}$$ and its inverse for bivariate Gaussian random vector

Front

\begin{aligned}\mathbf{S} &= \begin{bmatrix}\sigma_1^2 & \sigma_{12}\\\sigma_{12} & \sigma_2^2\end{bmatrix}\\\mathbf{S}^{-1} &= \frac1{\sigma_1^2\sigma_2^2(1 - \rho^2)}\begin{bmatrix}\sigma_2^2 & -\rho\sigma_1\sigma_2\\-\rho\sigma_1\sigma_2 & \sigma_1^2\end{bmatrix}\end{aligned}

where $$\sigma_{12} = \rho\sigma_1\sigma_2$$

Back

Positive definite and positive semi-definite matrices

Front

$$\mathbf{Q} \in \mathbb{R}^{N \times N}$$ is:

• Positive semi-definite if $$\mathbf{Q} = \mathbf{Q}^\top$$ and $$\mathbf{x}^\top\mathbf{Q}\mathbf{x} \geqslant 0$$ for all $$\mathbf{x} \in \mathbb{R}^N$$, written $$\mathbf{Q} \geqslant 0$$
• Positive definite if $$\mathbf{Q} = \mathbf{Q}^\top$$ and $$\mathbf{x}^\top\mathbf{Q}\mathbf{x} > 0$$ for all $$\mathbf{x} \in \mathbb{R}^N, \mathbf{x} \neq 0$$, written $$\mathbf{Q} > 0$$
Back

Mean squared distance

Front

\begin{aligned}E(\Vert \mathbf{X} - \mathbf{Y} \Vert^2) &= \sum_j E\left[(X_j - Y_j)^2\right]\\E(\Vert \mathbf{X} - \mathbf{\mu} \Vert^2) &= \text{Tr}(\mathbf{S})\\&=\sum_j \lambda_j\end{aligned}

Back

Expected value of absolute value of standard Gaussian variable $$Z$$

Front

$$E(|Z|) = \sqrt{\frac2{\pi}}$$

Back

Compute HMM non-causal estimate

Front
1. $$\gamma_k(i) = P(y_k | X_k = i)$$ for all $$i, k$$
   1. From observation matrix
   2. $$\gamma_k(i) = 1$$ when $$k = T$$ or $$y_k$$ is missing
2. $$\alpha_0(i) = \gamma_0(i)P(X_0 = i)$$
   1. Can use vector notation, with $$\odot$$ the elementwise product: $$\mathbf{\alpha}_0 = \begin{bmatrix}\gamma_0(1)\\\gamma_0(2)\end{bmatrix} \odot \begin{bmatrix}P(X_0 = 1)\\P(X_0 = 2)\end{bmatrix}$$
3. $$\alpha_{k+1}(j) = \gamma_{k+1}(j)\sum_i P_{ij}\alpha_k(i)$$
   1. $$\mathbf{\alpha}_{k+1} = \mathbf{\gamma}_{k+1} \odot (\mathbf{P}^\top \mathbf{\alpha}_k)$$
4. $$\beta_k(i) = \sum_j P_{ij}\gamma_{k+1}(j)\beta_{k+1}(j)$$
   1. $$\mathbf{\beta}_k = \mathbf{P}(\mathbf{\gamma}_{k+1} \odot \mathbf{\beta}_{k+1})$$
   2. $$\beta_T(i) = 1$$ or $$\frac1{M}$$ for normalization
5. $$p_k(i) = \frac{\alpha_k(i)\beta_k(i)}{\sum_j\alpha_k(j)\beta_k(j)}$$
Back
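The steps above can be sketched in plain Python. This is an illustration under stated assumptions, not a reference implementation: observations are indexed $$0, \ldots, T-1$$ and $$\beta$$ is set to 1 at the last index (equivalent to $$\beta_T = 1$$ with $$\gamma_T = 1$$), a missing observation is encoded as `None`, and the names `forward_backward`, `B`, `prior`, and `obs` are hypothetical.

```python
def forward_backward(P, B, prior, obs):
    """Non-causal (smoothed) state posteriors p_k(i) for a discrete HMM.

    P[i][j] : transition probability from state i to state j
    B[i][y] : observation probability P(y | X = i)   (hypothetical layout)
    prior[i]: P(X_0 = i)
    obs     : list of observed symbols, None where an observation is missing
    """
    M, T = len(prior), len(obs)
    # Step 1: gamma_k(i), equal to 1 where the observation is missing
    gamma = [[1.0 if y is None else B[i][y] for i in range(M)] for y in obs]
    # Steps 2-3: forward terms
    alpha = [[gamma[0][i] * prior[i] for i in range(M)]]
    for k in range(T - 1):
        alpha.append([gamma[k + 1][j] * sum(P[i][j] * alpha[k][i] for i in range(M))
                      for j in range(M)])
    # Step 4: backward terms, initialized to 1 at the last index
    beta = [[1.0] * M for _ in range(T)]
    for k in range(T - 2, -1, -1):
        beta[k] = [sum(P[i][j] * gamma[k + 1][j] * beta[k + 1][j] for j in range(M))
                   for i in range(M)]
    # Step 5: normalize alpha * beta at each k
    posteriors = []
    for k in range(T):
        w = [alpha[k][i] * beta[k][i] for i in range(M)]
        total = sum(w)  # zero only if the observations are impossible under the model
        posteriors.append([x / total for x in w])
    return posteriors


# Two states observed without noise, last observation missing (made-up numbers)
post = forward_backward(
    P=[[0.9, 0.1], [0.2, 0.8]],
    B=[[1.0, 0.0], [0.0, 1.0]],
    prior=[0.5, 0.5],
    obs=[0, 1, None],
)
```

For long sequences the unnormalized $$\alpha$$ and $$\beta$$ underflow, so a practical version would rescale them at each step; that is omitted here to keep the sketch close to the card.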

Stationary Markov evolution

Front

With probability of initial state $$\mathbf{\alpha}_0$$:

$$\mathbf{\alpha}_{k+1}^\top = \mathbf{\alpha}_k^\top \mathbf{P}$$

$$\mathbf{\alpha}_k^\top = \mathbf{\alpha}_0^\top \mathbf{P}^k$$

Back
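A short sketch of the evolution $$\mathbf{\alpha}_{k+1}^\top = \mathbf{\alpha}_k^\top \mathbf{P}$$, assuming a small transition matrix stored as nested lists. The helper names and the example matrix are made up for illustration.

```python
def step(alpha, P):
    """One step of alpha_{k+1}^T = alpha_k^T P (row vector times matrix)."""
    M = len(alpha)
    return [sum(alpha[i] * P[i][j] for i in range(M)) for j in range(M)]


def evolve(alpha0, P, k):
    """State distribution after k steps, i.e. alpha_0^T P^k."""
    alpha = list(alpha0)
    for _ in range(k):
        alpha = step(alpha, P)
    return alpha


P_example = [[0.9, 0.1], [0.5, 0.5]]
alpha_200 = evolve([1.0, 0.0], P_example, 200)
```

For this irreducible, aperiodic example the iterates approach the stationary distribution $$(\frac56, \frac16)$$, which solves $$\mathbf{\alpha}^\top = \mathbf{\alpha}^\top\mathbf{P}$$.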

Test for whether $$\mathbf{X} = (X_1, \ldots, X_d)$$ is jointly Gaussian

Front

$$\mathbf{X} = (X_1, \ldots, X_d)$$ is jointly Gaussian iff linear combinations of $$X_j$$ are Gaussian, or

$$Z = \mathbf{a}^\top\mathbf{X} = \sum_ia_iX_i$$

is a scalar Gaussian for all vectors $$\mathbf{a} \in \mathbb{R}^d$$

Back

Singular value decomposition (SVD)

Front

SVD is $$\mathbf{X} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^\top$$:

• $$\mathbf{X} \in \mathbb{R}^{N \times p}$$ is data with sample mean subtracted
• $$\mathbf{U} \in \mathbb{R}^{N \times r}, \mathbf{U}^\top\mathbf{U} = \mathbf{I}_r$$
• $$\mathbf{V} \in \mathbb{R}^{p \times r}, \mathbf{V}^\top\mathbf{V} = \mathbf{I}_r$$; its columns are the eigenvectors of $$\mathbf{S}_x$$ (the PCs)
• $$\mathbf{\Sigma} = \text{diag}(\alpha_1, \ldots, \alpha_r)$$, singular values sorted descending
• Eigenvalues of $$\mathbf{S}_x$$ are $$\frac{\alpha_j^2}{N}$$
• $$\mathbf{S}_x = \frac1{N}\mathbf{X}^\top\mathbf{X} = \frac1{N}\mathbf{V}\mathbf{\Sigma}^2\mathbf{V}^\top$$
Back

Forward term $$\alpha_k(i)$$ with regard to Hidden Markov Model:

$$\rightarrow X_{k-1} \rightarrow X_k \rightarrow X_{k+1} \rightarrow$$

and

$$X_{k-1} \rightarrow Y_{k-1}$$

$$X_k \rightarrow Y_k$$

$$X_{k+1} \rightarrow Y_{k+1}$$

Front

$$\alpha_k(i) = P(X_k = i, y_0^k)$$

Back

Irreducible set of states

Front

Irreducible set if all pairs of states in the set communicate

• There is a path between any pair of states in the set
• A Markov chain is irreducible if set of all states is irreducible
• There won't be a unique steady state distribution unless the entire Markov chain is irreducible
Back

Valid covariance matrix

Front

$$S$$ must be symmetric and positive semi-definite, which implies $$\det(S) \geqslant 0$$ (a necessary condition, not a sufficient one)

Back

Properties of $$N$$ orthonormal eigenvectors: $$\mathbf{V} = [\mathbf{v}_1, \ldots, \mathbf{v}_N] \in \mathbb{R}^{N \times N}$$

Front
• Since $$\mathbf{v}_i$$ are orthonormal, $$\mathbf{V}$$ is an orthogonal matrix
• $$\mathbf{V}\mathbf{V}^\top = \mathbf{V}^\top\mathbf{V} = I$$
• Since $$\mathbf{v}_i$$ are eigenvectors: $$\mathbf{S}\mathbf{V} = \mathbf{V}\mathbf{D}$$
• $$\mathbf{D} = \text{diag}(\lambda_1, \ldots, \lambda_N)$$
• Diagonalization: $$\mathbf{S} = \mathbf{V}\mathbf{D}\mathbf{V}^\top$$
Back

Determinant of $$\begin{bmatrix}a&b&c\\d&e&f\\g&h&i\end{bmatrix}$$

Front

$$aei + bfg + cdh - gec - hfa - idb$$

Go diagonally down from $$a$$, $$b$$, and $$c$$, multiply and add. Go diagonally up from $$g$$, $$h$$, and $$i$$, multiply and subtract.

Back
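The diagonal rule translates directly into code. A minimal sketch, assuming the matrix is given as three rows of three numbers; `det3` is a hypothetical helper name.

```python
def det3(m):
    """Determinant of a 3x3 matrix using the diagonal rule."""
    (a, b, c), (d, e, f), (g, h, i) = m
    # Down-diagonals are added, up-diagonals are subtracted
    return a * e * i + b * f * g + c * d * h - g * e * c - h * f * a - i * d * b


singular = det3([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # rows are linearly dependent -> 0
diagonal = det3([[2, 0, 0], [0, 3, 0], [0, 0, 4]])  # product of the diagonal -> 24
```

Note that this shortcut is specific to 3x3 matrices and does not extend to larger sizes.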

Sample variance matrix

Front

With $$n$$ samples each having $$p$$ features, $$\mathbf{x}_i \in \mathbb{R}^p$$:

\begin{aligned}\mathbf{S}_x &= \frac1{n} \sum_{i=1}^n (\mathbf{x}_i - \overline{\mathbf{x}})(\mathbf{x}_i - \overline{\mathbf{x}})^\top\\&=E\left[(\mathbf{x}_i - \overline{\mathbf{x}})(\mathbf{x}_i - \overline{\mathbf{x}})^\top\right]\end{aligned}

Back

Non-causal estimate

Front

$$P(X_k = i | y_0^{T-1}) = \frac{\alpha_k(i)\beta_k(i)}{\sum_j\alpha_k(j)\beta_k(j)}$$

Back

Inverse of $$A = \begin{bmatrix}a&b\\c&d\end{bmatrix}$$

Front

$$\frac1{\text{det}(A)}\begin{bmatrix}d&-b\\-c&a\end{bmatrix}$$

Inverse doesn't exist if $$A$$ is singular (determinant is $$0$$)

Back
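A small sketch of the 2x2 inverse formula, assuming the matrix is given as two rows of two numbers. The name `inverse_2x2` is hypothetical, and the exact comparison with zero is a simplification (floating-point code would normally use a tolerance).

```python
def inverse_2x2(A):
    """Inverse of [[a, b], [c, d]] via the adjugate divided by the determinant."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular, no inverse exists")
    return [[d / det, -b / det], [-c / det, a / det]]


A_inv = inverse_2x2([[4.0, 7.0], [2.0, 6.0]])  # det = 10
```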

HMM backward term recursion

Front

$$\beta_k(i) = \sum_j \beta_{k+1}(j)\gamma_{k+1}(j)P_k(i,j)$$

with initial condition

$$\beta_T(i) = P(y_T | X_T = i) = 1$$

Back

Matrix square root

Front

With the eigenvalue decomposition of $$S = UDU^\top$$:

$$S^\frac12 = UD^\frac12U^\top$$

Back

Vector Gaussian mixture model mean and variance

Front

Just like a scalar mixture distribution:

\begin{aligned}E(\mathbf{x}) &= \sum_i \mathbf{\mu}_iq_i = \mathbf{\mu}\\\text{Var}(\mathbf{x}) &= \sum_i q_i\left[\mathbf{S}_i + (\mathbf{\mu} - \mathbf{\mu}_i)(\mathbf{\mu} - \mathbf{\mu}_i)^\top\right]\\&= \mathbf{S}\end{aligned}

Back

Multivariable Gaussian PDF with sample variance

Front

$$f_X(\mathbf{x}) = \frac{e^{-\frac12(\mathbf{x} - \mathbf{\mu})^\top\mathbf{S}^{-1}(\mathbf{x} - \mathbf{\mu})}}{(2\pi)^{\frac{d}2}\sqrt{\text{det}(\mathbf{S})}},$$

where $$d$$ is the length of $$\mathbf{X}$$ and $$\mathbf{S}$$ is the sample variance

Back

Stochastic matrix

Front

Satisfies

• $$P_{ij} \geqslant 0$$
• $$\forall i: \sum_j P_{ij} = 1$$
Back

Generate i.i.d. multivariable Gaussian samples $$\mathbf{x}_i \in \mathbb{R}^d, \mathbf{x}_i\sim\mathcal{N}(\mu, S)$$

Front
1. Generate i.i.d. $$\mathbf{z}_i \in \mathbb{R}^d, \mathbf{z}_i\sim\mathcal{N}(0, I)$$
   1. Components $$z_{ij}$$ are i.i.d. $$\mathcal{N}(0, 1)$$
2. $$x_i = S^\frac12z_i + \mu$$, works because:
   1. $$E(x_i) = S^\frac12E(z_i) + \mu = S^\frac120 + \mu = \mu$$
   2. $$\text{Var}(x_i) = S^\frac12\text{Var}(z_i)S^\frac12 = S^\frac12IS^\frac12 = S^\frac12S^\frac12 = S$$
Back
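The recipe above can be sketched for the special case $$d = 2$$, where the symmetric square root has a closed form: $$S^\frac12 = \frac{S + sI}{t}$$ with $$s = \sqrt{\det S}$$ and $$t = \sqrt{\text{Tr}(S) + 2s}$$. Restricting to two dimensions and to positive definite $$S$$ is an assumption made to keep the example free of external libraries; the function names are hypothetical.

```python
import math
import random


def sqrt_2x2(S):
    """Symmetric square root of a 2x2 symmetric positive definite matrix."""
    (a, b), (_, d) = S
    s = math.sqrt(a * d - b * b)   # sqrt(det S)
    t = math.sqrt(a + d + 2 * s)   # sqrt(trace S + 2 sqrt(det S))
    return [[(a + s) / t, b / t], [b / t, (d + s) / t]]


def sample_gaussian_2d(mu, S, n, seed=0):
    """Draw n samples x = S^(1/2) z + mu with z ~ N(0, I)."""
    rng = random.Random(seed)
    R = sqrt_2x2(S)
    samples = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]  # i.i.d. N(0, 1) components
        samples.append([R[0][0] * z[0] + R[0][1] * z[1] + mu[0],
                        R[1][0] * z[0] + R[1][1] * z[1] + mu[1]])
    return samples


xs = sample_gaussian_2d([1.0, -1.0], [[2.0, 1.0], [1.0, 2.0]], 20000)
```

Any matrix $$R$$ with $$RR^\top = S$$ works in place of $$S^\frac12$$; a Cholesky factor is a common choice in higher dimensions.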

HMM forward term recursion

Front

$$\alpha_{k+1}(j) = \gamma_{k+1}(j) \sum_i P_k(i, j)\alpha_k(i)$$

where

\begin{aligned}P_k(i, j) &= P(X_{k+1} = j | X_k = i)\\\gamma_{k+1}(j) &= P(y_{k+1} | X_{k+1} = j)\end{aligned}

with initial condition

$$\alpha_0(i) = P(X_0 = i)\gamma_0(i)$$

Back

Find eigenvectors of matrix $$\mathbf{A}$$

Front
1. Solve for $$\mathbf{v}_i$$ with each $$\lambda_i$$
2. $$\mathbf{A}\mathbf{v}_i = \lambda_i \mathbf{v}_i$$
3. Normalize: $$\Vert\mathbf{v}_i\Vert^2 = 1$$
Back

Properties of symmetric matrix $$S$$

Front
• $$N$$ orthonormal eigenvectors: $$\Vert v_j \Vert = 1$$ and $$v_i^\top v_j = 0$$ for $$i \ne j$$
• Eigenvalues are real: $$Sv_i = \lambda_iv_i$$
• S is positive semi-definite iff $$\lambda_i \geqslant 0$$
• S is positive definite iff $$\lambda_i > 0$$
Back

$$\frac{\partial \mathbf{v}^\top\mathbf{v}}{\partial v}$$

Front

$$2\mathbf{v}$$

Back

Backward term $$\beta_k(i)$$ with regard to Hidden Markov Model:

$$\rightarrow X_{k-1} \rightarrow X_k \rightarrow X_{k+1} \rightarrow$$

and

$$X_{k-1} \rightarrow Y_{k-1}$$

$$X_k \rightarrow Y_k$$

$$X_{k+1} \rightarrow Y_{k+1}$$

Front

$$\beta_k(i) = P(y_{k+1}^{T-1}|X_k = i)$$

Back

Stationary transition matrix

Front

Also known as homogeneous: the transition probabilities do not depend on the step $$k$$. The graphical representation is called a state transition diagram.

$$P_{ij}(k) = P_{ij}$$

Back

Chapter 1

(2 cards)

Probability axioms

Front
• For every event $$A$$, $$P(A) \ge 0$$
• $$P(S) = 1$$
• For every infinite sequence of disjoint events $$A_1, A_2, \ldots,$$
$$P\left(\bigcup_{i=1}^\infty A_i\right) = \sum_{i=1}^\infty P(A_i)$$
Back

Bonferroni inequality

Front

$$P\left(\bigcap_{i=1}^n A_i\right) \ge 1 - \sum_{i=1}^n P\left(A_i^c\right)$$

Back

Chapter 3

(12 cards)

Inverse CDF method to generate random variables

Front
1. Find CDF $$F(x) = u$$
2. Find inverse CDF $$F^{-1}(u) = x$$
3. Generate $$x = F^{-1}(u)$$ for each sample $$u$$ drawn from the standard uniform distribution
Back
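As an illustration of the three steps, here is a sketch for the exponential distribution, whose CDF $$F(x) = 1 - e^{-\lambda x}$$ inverts to $$F^{-1}(u) = -\frac{\ln(1 - u)}{\lambda}$$. The choice of distribution and the function names are assumptions made for the example.

```python
import math
import random


def exponential_cdf(x, lam):
    """Step 1: F(x) = 1 - exp(-lam * x) for x >= 0."""
    return 1.0 - math.exp(-lam * x)


def exponential_inverse_cdf(u, lam):
    """Step 2: F^{-1}(u) = -ln(1 - u) / lam for 0 <= u < 1."""
    return -math.log(1.0 - u) / lam


def sample_exponential(lam, n, seed=0):
    """Step 3: push standard uniform samples through the inverse CDF."""
    rng = random.Random(seed)
    # rng.random() lies in [0, 1), so 1 - u is never zero
    return [exponential_inverse_cdf(rng.random(), lam) for _ in range(n)]


draws = sample_exponential(2.0, 50000)
```

The sample mean of `draws` should be close to $$\frac1{\lambda} = 0.5$$, the mean of the exponential distribution.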

Convergence in distribution

Front

At every point $$x$$ where $$F_X(x)$$ is continuous:

$$F_{X_n}(x) = P(X_n \leqslant x) \to F_X(x) = P(X \leqslant x)$$

Back

Convolution of two PDFs $$f_X(x)$$ and $$f_Y(y)$$

Front

$$f_Z(z) = (f_X * f_Y)(z) = \int_{-\infty}^\infty f_X(t)f_Y(z - t) \,dt$$

Back

Front

If $$Y = X + W$$ with $$X, W$$ independent, then

$$f_{Y|X}(y|x) = f_W(y-x)$$

Back

3 methods to compute PDF of $$Y = g(X)$$

Front
• For discrete RV, inverse PMF method
• For continuous or discrete RVs, inverse CDF method
• For continuous RVs with invertible $$g(X)$$, derivative formula
Back

PDF of an invertible function of a random variable

Front

Let $$Y = g(X)$$ with $$g(x)$$ invertible so that $$X = g^{-1}(Y)$$:

$$f_Y(y) = f_X(g^{-1}(y)) \cdot \left|\frac{\partial g^{-1}(y)}{\partial y}\right|$$

Back

Relationship between a PDF and a probability

Front

$$P(X \in [a, a + \epsilon]) \approx f_X(a) \cdot \epsilon$$

or

$$f_X(a) = \lim_{\epsilon \to 0} \frac{P(X \in [a, a + \epsilon])}{\epsilon}$$

Back

Leibniz rule

Front

\begin{aligned}\frac{d}{dz}\int_{a(z)}^{b(z)} h(x, z) \,dx &= h(b(z), z)b'(z)\\&- h(a(z), z)a'(z)\\&+ \int_{a(z)}^{b(z)} \frac{\partial h(x, z)}{\partial z} \,dx\end{aligned}

Back

Inverse CDF method of computing PDF for a single random variable

Front
1. $$Y = g(X)$$
2. For increasing $$g$$: $$F_Y(y) = P(Y \leqslant y) = P(g(X) \leqslant y) = P(X \leqslant g^{-1}(y))$$ (the inequality on $$X$$ flips for decreasing $$g$$)
3. $$f_Y(y) = F_Y'(y)$$
Back

Convergence in Probability

Front

Random variables $$X_n \to X$$ in probability if $$\forall \epsilon > 0$$:

$$\lim_{n \to \infty} P(|X_n - X| \geqslant \epsilon) = 0$$

or, given any $$\epsilon$$ and $$\delta > 0$$, $$\exists N > 0$$ such that:

$$P(|X_n - X| \geqslant \epsilon) < \delta, \forall n > N$$

Back

PDF of a linear function of a random variable

Front

Let $$X$$ be a random variable with PDF $$f_X(x)$$ and $$Y = aX + b$$ with PDF $$f_Y(y)$$:

$$f_Y(y) = \frac1{|a|}f_X\left(\frac{y - b}{a}\right)\text{ for }-\infty < y < \infty$$

Back

Quantile function of the distribution of $$X$$

Front

$$F^{-1}(p)$$ is defined as the smallest value $$x$$ such that $$F(x) \geqslant p$$ for $$0 < p < 1$$

Back

Chapter 4

(20 cards)

Optimal estimate and resulting MSE for constant estimator

Front

\begin{aligned}\hat{Y} &= E(Y),\\\text{MSE} &= \text{Var}(Y)\end{aligned}

Back

Optimal parameters for linear estimator $$\hat{Y} = \beta_1X + \beta_0$$

Front

\begin{aligned}\beta_1 &= \frac{\sigma_{XY}}{\sigma_X^2},\\\beta_0 &= E(Y) - \beta_1E(X)\end{aligned}

Back
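A minimal sketch of the optimal linear estimator, assuming paired samples are available and that sample moments stand in for $$\sigma_{XY}$$, $$\sigma_X^2$$, $$E(X)$$, and $$E(Y)$$. The name `linear_estimator` is hypothetical.

```python
def linear_estimator(xs, ys):
    """Return (beta1, beta0) with beta1 = cov(X, Y) / var(X), beta0 = E(Y) - beta1 E(X)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n  # must be nonzero
    beta1 = cov_xy / var_x
    beta0 = mean_y - beta1 * mean_x
    return beta1, beta0


# Data lying exactly on y = 2x + 1, so the estimator recovers the line
beta1, beta0 = linear_estimator([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```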

Jensen's inequality

Front

Let $$g$$ be a convex function and let $$\mathbf{X}$$ be a random vector with finite mean:

$$E[g(\mathbf{X})] \geqslant g(E(\mathbf{X}))$$

Back

Minimum MSE for linear estimation

Front

$$\sigma_Y^2(1 - \rho_{XY}^2)$$

Back

$$\int_0^1 p^k(1-p)^l \,dp$$

Front

$$\frac{k!l!}{(k + l + 1)!}$$

Back

Convex function

Front

For every $$\alpha \in (0, 1)$$ and every $$\mathbf{x}$$ and $$\mathbf{y}$$,

$$g[\alpha\mathbf{x} + (1 - \alpha)\mathbf{y}] \leqslant \alpha g(\mathbf{x}) + (1 - \alpha) g(\mathbf{y})$$

Back

Law of Total Probability for Variance

Front

\begin{aligned}\text{Var}_Y(Y) &= E_X[\text{Var}_{Y|X}(Y|X)]\\&+ \text{Var}_X[E_{Y|X}(Y|X)]\end{aligned}

Back

Sample mean of i.i.d. $$X_i$$ with $$E(X) = \mu$$ and $$\text{Var}(X) = \sigma^2$$

Front

\begin{aligned}S_n &= \frac1{n} \sum_{i=1}^n X_i,\\E(S_n) &= \mu,\\\text{Var}(S_n) &= \frac{\sigma^2}{n}\end{aligned}

Back

Variance of sum of pairwise uncorrelated $$a_1X_1, \ldots, a_dX_d$$

Front

$$\text{Var}(a_1X_1 + \ldots + a_dX_d) = \sum_{i=1}^d a_i^2\text{Var}(X_i)$$

Back

Moment generating function

Front

Let $$X$$ be random variable. For each real number $$t$$,

$$\psi(t) = E(e^{tX})$$

Back

Expectation of non-negative integer random variable

Front

$$E(X) = \sum_{n=1}^\infty \Pr(X \geqslant n)$$

Back

Covariance of linear relationship $$Y = aX + b$$

Front

$$\sigma_{XY} = a\sigma_X^2$$

Back

Special case of nested conditional expectation with $$g(X,Y) = h(Y) \cdot f(X)$$

Front

$$E_{XY}[h(Y) \cdot f(X)] = E_Y[h(Y) \cdot E_{X|Y}(f(X)|Y)]$$

Back

Cauchy-Schwarz Inequality

Front

$$X$$ and $$Y$$ are random variables with finite variance:

$$[\text{Cov}(X, Y)]^2 \leqslant \sigma_X^2\sigma_Y^2$$

and

$$-1 \leqslant \rho(X, Y) \leqslant 1$$

Back

Expectation of non-negative random variable with c.d.f. $$F$$

Front

$$E(X) = \int_0^\infty [1 - F(x)] \,dx$$

Back

Law of Total Probability for Expectations

Front

$$E_X[E_{Y|X}(Y|X)] = E_Y(Y)$$

Back

Nested conditional expectation of joint random variables

Front

$$E_{XY}[g(X,Y)] = E_X[E_{Y|X}(g(X,Y)|X)]$$

Back

Mixture distribution

Front

Let $$X = 1, 2, \ldots, M$$ with $$P(X=i)=q_i, E(Y|X=i)=\mu_i, \text{Var}(Y|X=i)=\sigma_i^2$$:

\begin{aligned}E(Y) &= \mu_y = \sum_i q_i\mu_i,\\\text{Var}(Y) &= \sum_i q_i\left[\sigma_i^2+(\mu_i-\mu_y)^2\right]\end{aligned}

Back
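The mixture formulas can be evaluated directly. A short sketch, assuming the weights $$q_i$$, component means $$\mu_i$$, and component variances $$\sigma_i^2$$ are given as lists; the function name is hypothetical.

```python
def mixture_moments(q, mu, var):
    """Mean and variance of a mixture with weights q, means mu, variances var."""
    mean = sum(qi * mi for qi, mi in zip(q, mu))
    # Within-component variance plus spread of the component means
    variance = sum(qi * (vi + (mi - mean) ** 2) for qi, mi, vi in zip(q, mu, var))
    return mean, variance


# Equal mix of components centered at 0 and 2, each with unit variance
mix_mean, mix_var = mixture_moments([0.5, 0.5], [0.0, 2.0], [1.0, 1.0])
```

Here the mean is 1 and the variance is 2: one unit from within the components and one unit from the separation of their means.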

Schwarz Inequality

Front

$$[E(UV)]^2 \leqslant E(U^2)E(V^2)$$

Back

Optimal estimate and resulting MSE for MMSE estimator

Front

\begin{aligned}\hat{Y} &= E(Y|X),\\\text{MSE} &= E_X[\text{Var}_{Y|X}(Y|X)]\end{aligned}

Back

Chapter 5

(11 cards)

Relationship of Poisson and Binomial distributions as $$n \to \infty$$

Front

Total number of arrivals:

$$\lim_{n \to \infty} \text{Binom}\left(n, \frac{\lambda}{n}\right) = \text{Poisson}(\lambda)$$

Back
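The limit can be observed numerically by comparing the two probability functions at a fixed point. This is a rough check, assuming Python 3.8 or later for `math.comb`; the values of $$n$$, $$\lambda$$, and $$k$$ are arbitrary.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binom(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


lam, k = 2.0, 3
gap_small_n = abs(binomial_pmf(k, 10, lam / 10) - poisson_pmf(k, lam))
gap_large_n = abs(binomial_pmf(k, 10000, lam / 10000) - poisson_pmf(k, lam))
```

The gap shrinks as $$n$$ grows with $$np = \lambda$$ held fixed.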

Correlation and independence within jointly Gaussian vector

Front
• Gaussian random variables $$\not{\!\!\Rightarrow}$$ jointly Gaussian
• Independent Gaussian random variables $$\Rightarrow$$ jointly Gaussian
• Uncorrelated jointly Gaussian random variables $$\Rightarrow$$ independent
Back

Maclaurin series for $$e^x$$

Front

$$e^x = \sum_{k=0}^\infty \frac{x^k}{k!}$$

Back

Central Limit Theorem

Front

Let $$Z_i$$ be i.i.d. random variables with finite mean $$\mu = E(Z_i)$$ and finite variance $$\sigma^2 = \text{Var}(Z_i)$$. The first line is convergence in distribution; the other two are approximations for large $$n$$:

\begin{aligned}\frac1{\sqrt{n}}\sum_{i=1}^n (Z_i - \mu) &\xrightarrow{d} \mathcal{N}(0, \sigma^2)\text{ as }n \to \infty,\\\bar{Z} &\overset{\text{approx.}}{\sim} \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right),\\\sum_{i=1}^n Z_i &\overset{\text{approx.}}{\sim} \mathcal{N}(n\mu, n\sigma^2)\end{aligned}

Back

Tail bounds on Standard Normal CDF

Front

$$\frac1{\sqrt{2\pi}z}\left(1 - \frac1{z^2}\right)e^{-\frac{z^2}2} \leqslant 1 - \Phi(z) \leqslant \frac1{\sqrt{2\pi}z}e^{-\frac{z^2}2}$$

Back

Negative binomial distribution

Front

The number $$X$$ of failures that occur before the $$r$$th success has p.f.:

$$f(x|r, p) = \binom{r + x - 1}{x} p^r (1 - p)^x$$

for $$x = 0, 1, 2, \ldots$$ or $$0$$ otherwise.

\begin{aligned}E(X) &= \frac{r(1 - p)}{p}\\\text{Var}(X) &= \frac{r(1 - p)}{p^2}.\end{aligned}

Back

2nd moment of normal distribution

Front

$$\mu^2 + \sigma^2$$

Back

Exponential distribution

Front

Time between events in a Poisson point process (events occur continuously and independently at a constant average rate). With $$\lambda$$ representing event rate:

$$f(x; \lambda) = \begin{cases}\lambda e^{-\lambda x},&x \geqslant 0\\0,&\text{ otherwise}.\end{cases}$$

Mean is $$\frac1{\lambda}$$, variance is $$\frac1{\lambda^2}$$

Back

Mean and variance of a linear combination of bivariate normal variables. Let $$X_1$$ and $$X_2$$ have a bivariate normal distribution with means $$\mu_1, \mu_2$$, variances $$\sigma_1^2, \sigma_2^2$$, and correlation $$\rho$$. What are the mean and variance of:

$$a_1X_1 + a_2X_2 + b$$

Front

\begin{aligned}\text{mean}&=a_1\mu_1 + a_2\mu_2 + b\\\text{variance}&=a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + 2a_1a_2\rho\sigma_1\sigma_2\end{aligned}

Back

$$e^x$$ as a limit

Front

$$e^x = \lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n$$

Back

Moments of exponential variables

Front

$$E(X^n) = \frac{n!}{\lambda^n}$$

Back

Chapter 6

(7 cards)

Approximation of $$P(S_n \leqslant c)$$ where $$S_n = X_1 + \ldots + X_n$$ and $$X_i$$ are i.i.d. with mean $$\mu$$ and variance $$\sigma^2$$

Front

Use Central Limit Theorem

1. Approximate $$S_n \sim \mathcal{N}(n\mu, n\sigma^2)$$
2. Let $$Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}}$$
3. $$P(S_n \leqslant c) = P\left(Z_n \leqslant \frac{c-n\mu}{\sigma\sqrt{n}}\right) \approx \Phi\left(\frac{c - n\mu}{\sigma\sqrt{n}}\right)$$
Back
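The three steps above can be sketched with the standard library, using the identity $$\Phi(z) = \frac12\left[1 + \text{erf}\left(\frac{z}{\sqrt2}\right)\right]$$. The function names and the example numbers are assumptions made for illustration.

```python
import math


def standard_normal_cdf(z):
    """Phi(z) expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def clt_sum_cdf(c, n, mu, sigma):
    """Approximate P(S_n <= c) for a sum of n i.i.d. terms with mean mu, std sigma."""
    z = (c - n * mu) / (sigma * math.sqrt(n))  # standardize the threshold
    return standard_normal_cdf(z)


# 100 fair coin flips (mu = 0.5, sigma = 0.5): approximate P(at most 55 heads)
approx = clt_sum_cdf(55, 100, 0.5, 0.5)  # z = 1, so this is Phi(1)
```

For integer-valued sums a continuity correction (using $$c + \frac12$$) often improves the approximation; it is left out here to match the card.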

Weak law of large numbers

Front

If $$X_k$$ are uncorrelated with same mean and variance, then $$\overline{X}_n \rightarrow E(X_k) = \mu$$ in probability

Back

Strong law of large numbers

Front

If $$X_k$$ are i.i.d. and $$E(|X_k|) < \infty$$, then $$\overline{X}_n \rightarrow E(X)$$ almost surely. Also applies to functions, $$\overline{g(X)}_n \rightarrow E[g(X)]$$ if $$E(|g(X)|) < \infty$$.

Back

Chebyshev inequality

Front

If $$X$$ is a random variable for which $$\text{Var}(X)$$ exists, $$\forall t>0$$:

$$P(|X - E(X)| > t) \leqslant \frac{\text{Var}(X)}{t^2}$$

Back

Delta method

Front

Let $$Y_1, Y_2, \ldots$$ be a sequence of random variables, $$F^*$$ be a continuous CDF, $$\theta$$ be a real number, and $$a_1, a_2, \ldots$$ be a sequence of positive numbers increasing to $$\infty$$.

If $$a_n(Y_n - \theta)$$ converges in distribution to $$F^*$$ and $$\alpha$$ is a function with continuous derivative such that $$\alpha'(\theta) \ne 0$$, then the following converges in distribution to $$F^*$$:

$$\frac{a_n}{\alpha'(\theta)}(\alpha(Y_n) - \alpha(\theta))$$

Back

Almost sure convergence

Front

$$X_n \rightarrow X$$ almost surely if

$$P\left(\lim_{n \to \infty} X_n = X\right) = 1$$

Back

Markov inequality

Front

If $$X$$ is a random variable with $$P(X \geqslant 0) = 1$$, $$\forall t>0$$:

$$P(X \geqslant t) \leqslant \frac{E(X)}{t}$$

Back