Probability and Statistics (Classic Version), 4th Edition

by DeGroot and Schervish

Scott Mueller (lvl 17)
Unsectioned

Last updated

9 months ago

Date created

Oct 5, 2020

Cards (95)

Unsectioned

(43 cards)

Convergence in expectation

Front

\(X_n \rightarrow X\) in expectation if \(E(|X_n - X|) \rightarrow 0\). Implies \(E(X_n) \rightarrow E(X)\).

Back

Variance of a random variable vector

Front

$$\text{Var}(\mathbf{X}) = E(\mathbf{X}\mathbf{X}^\top) - \mathbf{\mu}\mathbf{\mu}^\top$$

Back

PCA error

Front

With \(\mathbf{x} = \mathbf{\mu} + \sum_{j=1}^p z_j\mathbf{v}_j\) and K-term approximation \(\mathbf{\hat{x}} = \mathbf{\mu} + \sum_{j=1}^K z_j\mathbf{v}_j\):

$$\begin{aligned}e_K &= \Vert \mathbf{x} - \mathbf{\hat{x}} \Vert^2 = \sum_{j=K+1}^p z_j^2,\\E(e_K) &= \sum_{j=K+1}^p E[\mathbf{v}_j^\top(\mathbf{x} - \mathbf{\mu})(\mathbf{x} - \mathbf{\mu})^\top\mathbf{v}_j]\\&= \sum_{j=K+1}^p \mathbf{v}_j^\top S_x \mathbf{v}_j\\&= \sum_{j=K+1}^p \lambda_j\end{aligned}$$

Back

Variance of random vector whose components are i.i.d.

Front

$$\sigma^2I$$

Back

PDF of multivariable function of \(\mathbf{X}\) with joint PDF \(f_X(x)\)

Front

As in the scalar case \(f_Y(y) = f_X(h(y))|h'(y)|\): let \(\mathbf{Y} = g(\mathbf{X})\) with \(g\) invertible, and let \(\mathbf{X} = h(\mathbf{Y})\) be the inverse:

$$f_Y(\mathbf{y}) = f_X(h(\mathbf{y}))\left|\det{\frac{\partial h(\mathbf{y})}{\partial \mathbf{y}}}\right|$$

Back

Dirac delta function

Front

$$\delta(x) = \begin{cases}+\infty &\text{if }x = 0,\\0 &\text{if }x \neq 0,\end{cases}$$

$$\int_a^b \delta(x - x_0) \,dx = \begin{cases}1 &\text{if }x_0 \in [a,b],\\0 &\text{if }x_0 \not\in [a,b],\end{cases}$$

If \(f(x)\) is continuous at \(x_0\):

$$\int_{-\infty}^\infty f(x) \delta(x - x_0) \,dx = f(x_0)$$

Back

Expectation and variance of \(\mathbf{Y} = \mathbf{AX} + \mathbf{b}\)

Front

$$\begin{aligned}E(\mathbf{Y}) &= \mathbf{A}E(\mathbf{X}) + \mathbf{b},\\\text{Var}(\mathbf{Y}) &= \mathbf{A}\text{Var}(\mathbf{X})\mathbf{A}^\top\end{aligned}$$

Back

Proportion of variance (PoV)

Front

Fraction of variance explained by first \(k\) PCs:

$$\text{PoV}(k) = 1 - \frac{\sum_{i=k+1}^p \lambda_i}{\sum_{i=1}^p \lambda_i} = \frac{\sum_{i=1}^k \lambda_i}{\sum_{i=1}^p \lambda_i}$$

Back
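A minimal sketch of the PoV formula, assuming the eigenvalues of the covariance matrix are already available. The function name and the example eigenvalues are hypothetical.

```python
def proportion_of_variance(eigenvalues, k):
    """Fraction of total variance captured by the k largest eigenvalues."""
    lam = sorted(eigenvalues, reverse=True)
    return sum(lam[:k]) / sum(lam)


# Hypothetical eigenvalues, for illustration only.
eigs = [4.0, 2.0, 1.0, 1.0]
pov = [proportion_of_variance(eigs, k) for k in range(1, len(eigs) + 1)]
print(pov)
```

A common use is to pick the smallest \(k\) whose PoV exceeds a chosen threshold such as 0.9.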

Joint PDF of \(Y = (Y_1, Y_2)\) where \(Y_1\) and \(Y_2\) are functions of \(X = (X_1, X_2)\)

Front
  1. Invert mapping: \((X_1, X_2) = h(Y_1, Y_2)\)
  2. Take Jacobian: \(J = \frac{\partial h(y_1, y_2)}{\partial y}\)
  3. \(f_Y(y_1, y_2) = f_X(h(y_1, y_2))|\det{J}|\)
Back

Perron-Frobenius theorem

Front

For finite, aperiodic, irreducible Markov chains with transition matrix \(\mathbf{P}\):

  • Eigenvalues \(\lambda_1, \ldots, \lambda_M\) in decreasing magnitude
  • \(\lambda_1\) = 1
  • Unique left eigenvector \(\mathbf{\alpha}^\top = \mathbf{\alpha}^\top\mathbf{P}\)
  • Unique stationary distribution
    • Eigenvector for \(\lambda_1\) of \(\mathbf{P}^\top\)
    • Normalize by dividing eigenvector by sum of elements
Back

Find eigenvalues for matrix \(A\)

Front

Roots \(\lambda\) of the characteristic equation \(\det(\lambda I - A) = 0\)

Back

PCA transform

Front

$$z_j = \mathbf{v}_j^\top(\mathbf{x} - \mathbf{\mu})$$

Back

$$\frac{\partial \mathbf{v}^\top\mathbf{A}\mathbf{v}}{\partial v}$$

Front

$$2\mathbf{A}\mathbf{v}\text{ for symmetric }\mathbf{A}\text{; in general }(\mathbf{A} + \mathbf{A}^\top)\mathbf{v}$$

Back

Linear difference equation

Front

With equations of the form \(\theta_{k+1} = c\theta_k + b\), where \(c \neq 1\) and \(b\) are constants, \(\theta_k\) can be solved with:

$$\theta_k = Ac^k + \frac{b}{1-c},$$

where \(A\) is a constant found by plugging in \(\theta_0\), giving \(A = \theta_0 - \frac{b}{1-c}\)

Back
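As a quick check, the closed form can be compared against direct iteration of the recursion. This is a sketch; the function names and the example constants are hypothetical, and it assumes \(c \neq 1\).

```python
def theta_closed_form(k, theta0, c, b):
    """theta_k = A c^k + b/(1-c), with A fixed by theta_0 (requires c != 1)."""
    fixed_point = b / (1 - c)
    A = theta0 - fixed_point
    return A * c**k + fixed_point


def theta_iterated(k, theta0, c, b):
    """Apply theta_{k+1} = c*theta_k + b exactly k times."""
    theta = theta0
    for _ in range(k):
        theta = c * theta + b
    return theta


# Example constants chosen only for illustration.
for k in range(6):
    print(k, theta_closed_form(k, 3.0, 0.5, 1.0), theta_iterated(k, 3.0, 0.5, 1.0))
```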

PCA inverse transform

Front

$$\mathbf{x} \approx \mathbf{\mu} + \sum_{j=1}^K z_j\mathbf{v}_j$$

Back

Covariance matrix \(\mathbf{S}\) and its inverse for bivariate Gaussian random vector

Front

$$\begin{aligned}\mathbf{S} &= \begin{bmatrix}\sigma_1^2 & \sigma_{12}\\\sigma_{12} & \sigma_2^2\end{bmatrix},\quad \sigma_{12} = \rho\sigma_1\sigma_2\\\mathbf{S}^{-1} &= \frac1{\sigma_1^2\sigma_2^2(1 - \rho^2)}\begin{bmatrix}\sigma_2^2 & -\rho\sigma_1\sigma_2\\-\rho\sigma_1\sigma_2 & \sigma_1^2\end{bmatrix}\end{aligned}$$

Back

Positive definite and positive semi-definite matrices

Front

\(\mathbf{Q} \in \mathbb{R}^{N \times N}\) is:

  • Positive semi-definite if \(\mathbf{Q} = \mathbf{Q}^\top\) and \(\mathbf{x}^\top\mathbf{Q}\mathbf{x} \geqslant 0\) for all \(\mathbf{x} \in \mathbb{R}^N\), written \(\mathbf{Q} \geqslant 0\)
  • Positive definite if \(\mathbf{Q} = \mathbf{Q}^\top\) and \(\mathbf{x}^\top\mathbf{Q}\mathbf{x} > 0\) for all \(\mathbf{x} \in \mathbb{R}^N, \mathbf{x} \neq 0\), written \(\mathbf{Q} > 0\)
Back

Mean squared distance

Front

$$\begin{aligned}E(\Vert \mathbf{X} - \mathbf{Y} \Vert^2) &= \sum_j E\left[(X_j - Y_j)^2\right]\\E(\Vert \mathbf{X} - \mathbf{\mu} \Vert^2) &= \text{Tr}(\mathbf{S})\\&=\sum_j \lambda_j\end{aligned}$$

Back

Expected value of absolute value of standard Gaussian variable \(Z\)

Front

$$E(|Z|) = \sqrt{\frac2{\pi}}$$

Back
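One way to see this, sketched here using the symmetry of the standard normal density about zero:

$$\begin{aligned}E(|Z|) &= \int_{-\infty}^\infty |z|\,\frac1{\sqrt{2\pi}}e^{-\frac{z^2}2} \,dz = 2\int_0^\infty z\,\frac1{\sqrt{2\pi}}e^{-\frac{z^2}2} \,dz\\&= \frac2{\sqrt{2\pi}}\left[-e^{-\frac{z^2}2}\right]_0^\infty = \frac2{\sqrt{2\pi}} = \sqrt{\frac2{\pi}}\end{aligned}$$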

Compute HMM non-causal estimate

Front
  1. \(\gamma_k(i) = P(y_k | X_k = i)\) for all \(i, k\)
    1. From observation matrix
    2. \(\gamma_k(i) = 1\) when \(k = T\) or missing
  2. \(\alpha_0(i) = \gamma_0(i)P(X_0 = i)\)
    1. Can use vector notation, with \(\otimes\) denoting the elementwise product: \(\mathbf{\alpha}_0 = \begin{bmatrix}\gamma_0(1)\\\gamma_0(2)\end{bmatrix} \otimes \begin{bmatrix}P(X_0 = 1)\\P(X_0 = 2)\end{bmatrix}\)
  3. \(\alpha_{k+1}(j) = \gamma_{k+1}(j)\sum_i P_{ij}\alpha_k(i)\)
    1. \(\mathbf{\alpha}_{k+1} = \mathbf{\gamma}_{k+1} \otimes (\mathbf{P}^\top \mathbf{\alpha}_k)\)
  4. \(\beta_k(i) = \sum_j P_{ij}\gamma_{k+1}(j)\beta_{k+1}(j)\)
    1. \(\mathbf{\beta}_k = \mathbf{P}(\mathbf{\gamma}_{k+1} \otimes \mathbf{\beta}_{k+1})\)
    2. \(\beta_T(i) = 1\) or \(\frac1{M}\) for normalization
  5. \(p_k(i) = \frac{\alpha_k(i)\beta_k(i)}{\sum_j\alpha_k(j)\beta_k(j)}\)
Back
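The steps above can be sketched in plain Python as follows. This is an illustrative sketch, not a reference implementation: the function name `smooth`, the argument layout, and the example matrices are assumptions. It takes observations \(y_0, \ldots, y_{T-1}\) and starts the backward pass from \(\beta_{T-1}(i) = 1\), which gives the same result as \(\gamma_T = \beta_T = 1\) in the steps above.

```python
def smooth(P, B, pi0, ys):
    """Return p_k(i) = P(X_k = i | y_0..y_{T-1}) for every k.

    P[i][j]: transition probability, B[i][y]: P(y | X = i),
    pi0[i]: P(X_0 = i), ys: list of observed symbols.
    """
    M, T = len(P), len(ys)
    # Step 1: observation likelihoods gamma_k(i).
    gamma = [[B[i][y] for i in range(M)] for y in ys]
    # Steps 2-3: forward recursion.
    alpha = [[gamma[0][i] * pi0[i] for i in range(M)]]
    for k in range(1, T):
        prev = alpha[-1]
        alpha.append([gamma[k][j] * sum(P[i][j] * prev[i] for i in range(M))
                      for j in range(M)])
    # Step 4: backward recursion, starting from all ones at the last index.
    beta = [[1.0] * M for _ in range(T)]
    for k in range(T - 2, -1, -1):
        beta[k] = [sum(P[i][j] * gamma[k + 1][j] * beta[k + 1][j] for j in range(M))
                   for i in range(M)]
    # Step 5: combine and normalize.
    post = []
    for k in range(T):
        w = [alpha[k][i] * beta[k][i] for i in range(M)]
        total = sum(w)
        post.append([x / total for x in w])
    return post


# Hypothetical two-state model, for illustration only.
P = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi0 = [0.5, 0.5]
ys = [0, 1, 0]
posterior = smooth(P, B, pi0, ys)
print(posterior)
```

For long sequences the unnormalized \(\alpha\) and \(\beta\) terms can underflow; rescaling each step (as hinted by the \(\frac1{M}\) option above) is one common remedy.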

Stationary Markov evolution

Front

With probability of initial state \(\mathbf{\alpha}_0\):

$$\mathbf{\alpha}_{k+1}^\top = \mathbf{\alpha}_k^\top \mathbf{P}$$

$$\mathbf{\alpha}_k^\top = \mathbf{\alpha}_0^\top \mathbf{P}^k$$

Back
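A small sketch of the update \(\mathbf{\alpha}_{k+1}^\top = \mathbf{\alpha}_k^\top \mathbf{P}\), iterated on a hypothetical two-state chain. For a finite, aperiodic, irreducible chain the iterates approach the stationary distribution, which for this example matrix is \((\frac56, \frac16)\).

```python
def step(alpha, P):
    """One update alpha_{k+1}^T = alpha_k^T P for a row-stochastic P."""
    n = len(P)
    return [sum(alpha[i] * P[i][j] for i in range(n)) for j in range(n)]


# Hypothetical transition matrix and initial distribution.
P = [[0.9, 0.1],
     [0.5, 0.5]]
alpha = [1.0, 0.0]
for _ in range(200):
    alpha = step(alpha, P)
print(alpha)
```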

Test for whether \(\mathbf{X} = (X_1, \ldots, X_d)\) is jointly Gaussian

Front

\(\mathbf{X} = (X_1, \ldots, X_d)\) is jointly Gaussian iff every linear combination of the \(X_j\) is Gaussian, that is,

$$Z = \mathbf{a}^\top\mathbf{X} = \sum_ia_iX_i$$

is a scalar Gaussian for all vectors \(\mathbf{a} \in \mathbb{R}^d\)

Back

Singular value decomposition (SVD)

Front

SVD is \(\mathbf{X} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^\top\):

  • \(\mathbf{X} \in \mathbb{R}^{N \times p}\) is data with sample mean subtracted
  • \(\mathbf{U} \in \mathbb{R}^{N \times r}, \mathbf{U}^\top\mathbf{U} = \mathbf{I}_r\)
  • \(\mathbf{V} \in \mathbb{R}^{p \times r}, \mathbf{V}^\top\mathbf{V} = \mathbf{I}_r\)
    • Eigenvectors of \(\mathbf{S}_x\) (PCs)
  • \(\mathbf{\Sigma} = \text{diag}(\alpha_1, \ldots, \alpha_r)\), singular values sorted descending
    • Eigenvalues are \(\frac{\alpha_j^2}{N}\)
  • \(\mathbf{S}_x = \frac1{N}\mathbf{X}^\top\mathbf{X} = \frac1{N}\mathbf{V}\mathbf{\Sigma}^2\mathbf{V}^\top\)
Back

Forward term \(\alpha_k(i)\) with regards to Hidden Markov Model:

\(\rightarrow X_{k-1} \rightarrow X_k \rightarrow X_{k+1} \rightarrow\)

and

\(X_{k-1} \rightarrow Y_{k-1}\)

\(X_k \rightarrow Y_k\)

\(X_{k+1} \rightarrow Y_{k+1}\)

Front

$$\alpha_k(i) = P(X_k = i, y_0^k)$$

Back

Irreducible set of states

Front

Irreducible set if all pairs of states in the set communicate

  • There is a path between any pair of states in the set
  • A Markov chain is irreducible if set of all states is irreducible
  • There won't be a unique steady state distribution unless the entire Markov chain is irreducible
Back

Valid covariance matrix

Front

\(S\) must be symmetric and positive semi-definite. This implies \(\det(S) \geqslant 0\), which is necessary but not sufficient.

Back

Properties of \(N\) orthonormal eigenvectors: \(\mathbf{V} = [\mathbf{v}_1, \ldots, \mathbf{v}_N] \in \mathbb{R}^{N \times N}\)

Front
  • Since \(\mathbf{v}_i\) are orthonormal, \(\mathbf{V}\) is an orthogonal matrix
    • \(\mathbf{V}\mathbf{V}^\top = \mathbf{V}^\top\mathbf{V} = I\)
  • Since \(\mathbf{v}_i\) are eigenvectors: \(\mathbf{S}\mathbf{V} = \mathbf{V}\mathbf{D}\)
    • \(\mathbf{D} = \text{diag}(\lambda_1, \ldots, \lambda_N)\)
  • Diagonalization: \(\mathbf{S} = \mathbf{V}\mathbf{D}\mathbf{V}^\top\)
Back

Determinant of \(\begin{bmatrix}a&b&c\\d&e&f\\g&h&i\end{bmatrix}\)

Front

$$aei + bfg + cdh - gec - hfa - idb$$

Go diagonally down from \(a\), \(b\), and \(c\), multiply and add. Go diagonally up from \(g\), \(h\), and \(i\), multiply and subtract.

Back
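The diagonal rule can be written directly as code and cross-checked against a cofactor expansion along the first row. A minimal sketch; the function names and the test matrix are hypothetical.

```python
def det3(m):
    """3x3 determinant via the diagonal rule on the card."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a*e*i + b*f*g + c*d*h - g*e*c - h*f*a - i*d*b


def det3_cofactor(m):
    """Same determinant via cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)


m = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 10]]
print(det3(m), det3_cofactor(m))
```

Note that the diagonal rule is specific to 3x3 matrices and does not extend to larger sizes.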

Sample variance matrix

Front

With \(n\) samples each having \(p\) features, \(\mathbf{x}_i \in \mathbb{R}^p\):

$$\begin{aligned}\mathbf{S}_x &= \frac1{n} \sum_{i=1}^n (\mathbf{x}_i - \overline{\mathbf{x}})(\mathbf{x}_i - \overline{\mathbf{x}})^\top\\&=E\left[(\mathbf{x}_i - \overline{\mathbf{x}})(\mathbf{x}_i - \overline{\mathbf{x}})^\top\right]\end{aligned}$$

Back

Non-causal estimate

Front

$$P(X_k = i | y_0^{T-1}) = \frac{\alpha_k(i)\beta_k(i)}{\sum_j\alpha_k(j)\beta_k(j)}$$

Back

Inverse of \(A = \begin{bmatrix}a&b\\c&d\end{bmatrix}\)

Front

$$\frac1{\text{det}(A)}\begin{bmatrix}d&-b\\-c&a\end{bmatrix}$$

Inverse doesn't exist if \(A\) is singular (determinant is \(0\))

Back

HMM backward term recursion

Front

$$\beta_k(i) = \sum_j \beta_{k+1}(j)\gamma_{k+1}(j)P_k(i,j)$$

with initial condition

$$\beta_T(i) = P(y_T | X_T = i) = 1$$

Back

Matrix square root

Front

With the eigenvalue decomposition of \(S = UDU^\top\):

$$S^\frac12 = UD^\frac12U^\top$$

Back

Vector Gaussian mixture model mean and variance

Front

Just like a scalar mixture distribution:

$$\begin{aligned}E(\mathbf{x}) &= \sum_i \mathbf{\mu}_iq_i = \mathbf{\mu}\\\text{Var}(\mathbf{x}) &= \sum_i q_i\left[\mathbf{S}_i + (\mathbf{\mu} - \mathbf{\mu}_i)(\mathbf{\mu} - \mathbf{\mu}_i)^\top\right]\\&= \mathbf{S}\end{aligned}$$

Back

Multivariable Gaussian PDF with sample variance

Front

$$f_X(\mathbf{x}) = \frac{e^{-\frac12(\mathbf{x} - \mathbf{\mu})^\top\mathbf{S}^{-1}(\mathbf{x} - \mathbf{\mu})}}{(2\pi)^{\frac{d}2}\sqrt{\text{det}(\mathbf{S})}},$$

where \(d\) is the length of \(\mathbf{X}\) and \(\mathbf{S}\) is the sample variance

Back

Stochastic matrix

Front

Satisfies

  • \(P_{ij} \geqslant 0\)
  • \(\forall i: \sum_j P_{ij} = 1\)
Back

Generate i.i.d. multivariable Gaussian samples \(\mathbf{x}_i \in \mathbb{R}^d, \mathbf{x}_i\sim\mathcal{N}(\mu, S)\)

Front
  1. Generate i.i.d. \(\mathbf{z}_i \in \mathbb{R}^d, \mathbf{z}_i\sim\mathcal{N}(0, I)\)
    1. Components \(z_{ij}\) are i.i.d. \(\mathcal{N}(0, 1)\)
  2. \(x_i = S^\frac12z_i + \mu\), works because:
    1. \(E(x_i) = S^\frac12E(z_i) + \mu = S^\frac120 + \mu = \mu\)
    2. \(\text{Var}(x_i) = S^\frac12\text{Var}(z_i)S^\frac12 = S^\frac12IS^\frac12 = S^\frac12S^\frac12 = S\)
Back
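A sketch of the two steps for \(d = 2\), using only the standard library. The card obtains \(S^\frac12\) from the eigenvalue decomposition; here, as an assumption made to keep the example short, a closed form for the symmetric square root of a 2x2 positive definite matrix is used instead: \(S^\frac12 = (S + sI)/t\) with \(s = \sqrt{\det S}\) and \(t = \sqrt{\text{Tr}(S) + 2s}\). All names and example values are hypothetical.

```python
import math
import random


def sqrtm_2x2(S):
    """Symmetric square root of a 2x2 positive definite matrix (closed form)."""
    (a, b), (_, d) = S
    s = math.sqrt(a * d - b * b)      # sqrt(det S)
    t = math.sqrt(a + d + 2 * s)      # sqrt(Tr S + 2 sqrt(det S))
    return [[(a + s) / t, b / t], [b / t, (d + s) / t]]


def sample_gaussian_2d(mu, S, n, rng):
    """Draw n samples x = S^(1/2) z + mu with z ~ N(0, I)."""
    R = sqrtm_2x2(S)
    out = []
    for _ in range(n):
        z = [rng.gauss(0, 1), rng.gauss(0, 1)]
        out.append([R[0][0] * z[0] + R[0][1] * z[1] + mu[0],
                    R[1][0] * z[0] + R[1][1] * z[1] + mu[1]])
    return out


# Example mean and covariance chosen for illustration.
mu = [1.0, -2.0]
S = [[2.0, 0.6], [0.6, 1.0]]
rng = random.Random(0)
samples = sample_gaussian_2d(mu, S, 20000, rng)
```

Any matrix \(R\) with \(RR^\top = S\) (for example a Cholesky factor) would serve equally well in step 2.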

HMM forward term recursion

Front

$$\alpha_{k+1}(j) = \gamma_{k+1}(j) \sum_i P_k(i, j)\alpha_k(i)$$

where

$$\begin{aligned}P_k(i, j) &= P(X_{k+1} = j | X_k = i)\\\gamma_{k+1}(j) &= P(y_{k+1} | X_{k+1} = j)\end{aligned}$$

with initial condition

$$\alpha_0(i) = P(X_0 = i)\gamma_0(i)$$

Back

Find eigenvectors of matrix \(\mathbf{A}\)

Front
  1. Solve for \(\mathbf{v}_i\) with each \(\lambda_i\)
  2. \(\mathbf{A}\mathbf{v}_i = \lambda_i \mathbf{v}_i\)
  3. Normalize: \(\Vert\mathbf{v}_i\Vert^2 = 1\)
Back

Properties of symmetric matrix \(S\)

Front
  • \(N\) orthonormal eigenvectors: \(\Vert v_j \Vert = 1\) and \(v_i^\top v_j = 0\) for \(i \ne j\)
  • Eigenvalues are real: \(Sv_i = \lambda_iv_i\)
  • S is positive semi-definite iff \(\lambda_i \geqslant 0\)
  • S is positive definite iff \(\lambda_i > 0\)
Back

$$\frac{\partial \mathbf{v}^\top\mathbf{v}}{\partial v}$$

Front

$$2\mathbf{v}$$

Back

Backward term \(\beta_k(i)\) with regards to Hidden Markov Model:

\(\rightarrow X_{k-1} \rightarrow X_k \rightarrow X_{k+1} \rightarrow\)

and

\(X_{k-1} \rightarrow Y_{k-1}\)

\(X_k \rightarrow Y_k\)

\(X_{k+1} \rightarrow Y_{k+1}\)

Front

$$\beta_k(i) = P(y_{k+1}^{T-1}|X_k = i)$$

Back

Stationary transition matrix

Front

Also known as homogeneous: the transition probabilities do not depend on \(k\). The graphical drawing is called a state transition diagram.

$$P_{ij}(k) = P_{ij}$$

Back

Chapter 1

(2 cards)

Probability axioms

Front
  • For every event \(A\), \(P(A) \ge 0\)
  • \(P(S) = 1\)
  • For every infinite sequence of disjoint events \(A_1, A_2, \ldots,\)
    $$P\left(\bigcup_{i=1}^\infty A_i\right) = \sum_{i=1}^\infty P(A_i)$$
Back

Bonferroni inequality

Front

$$P\left(\bigcap_{i=1}^n A_i\right) \ge 1 - \sum_{i=1}^n P\left(A_i^c\right)$$

Back

Chapter 3

(12 cards)

Inverse CDF method to generate random variables

Front
  1. Find CDF \(F(x) = u\)
  2. Find inverse CDF \(F^{-1}(u) = x\)
  3. Generate \(x\)'s from standard uniform samples
Back
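As an illustration of the three steps, take the exponential distribution with rate \(\lambda\): \(F(x) = 1 - e^{-\lambda x}\), so \(F^{-1}(u) = -\ln(1 - u)/\lambda\). The sketch below uses only the standard library; the function name, seed, and sample size are arbitrary choices.

```python
import math
import random


def sample_exponential(lam, n, rng):
    """Draw n Exp(lam) samples via x = F^{-1}(u) = -ln(1 - u) / lam."""
    # rng.random() lies in [0, 1), so 1 - u lies in (0, 1] and the log is finite.
    return [-math.log(1.0 - rng.random()) / lam for _ in range(n)]


rng = random.Random(42)
samples = sample_exponential(2.0, 50000, rng)
sample_mean = sum(samples) / len(samples)
print(sample_mean)  # should be close to 1 / lam = 0.5
```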

Convergence in distribution

Front

\(X_n \to X\) in distribution if, at every point \(x\) where \(F_X(x)\) is continuous:

$$F_{X_n}(x) = P(X_n \leqslant x) \to F_X(x) = P(X \leqslant x)$$

Back

Convolution of two PDFs \(f_X(x)\) and \(f_Y(y)\)

Front

$$f_Z(z) = (f_X * f_Y)(z) = \int_{-\infty}^\infty f_X(t)f_Y(z - t) \,dt$$

Back

PDF for an additive channel

Front

If \(Y = X + W\) with \(X, W\) independent, then

$$f_{Y|X}(y|x) = f_W(y-x)$$

Back

3 methods to compute PDF of \(Y = g(X)\)

Front
  • For discrete RV, inverse PMF method
  • For continuous or discrete RVs, inverse CDF method
  • For continuous RVs with invertible \(g(X)\), derivative formula
Back

PDF of an invertible function of a random variable

Front

Let \(Y = g(X)\) with \(g(x)\) invertible so that \(X = g^{-1}(Y)\):

$$f_Y(y) = f_X(g^{-1}(y)) \cdot \left|\frac{\partial g^{-1}(y)}{\partial y}\right|$$

Back

Relationship between a PDF and a probability

Front

$$P(X \in [a, a + \epsilon]) \approx f_X(a) \cdot \epsilon$$

or

$$f_X(a) = \lim_{\epsilon \to 0} \frac{P(X \in [a, a + \epsilon])}{\epsilon}$$

Back

Leibniz rule

Front

$$\begin{aligned}\frac{d}{dz}\int_{a(z)}^{b(z)} h(x, z) \,dx &= h(b(z), z)b'(z)\\&- h(a(z), z)a'(z)\\&+ \int_{a(z)}^{b(z)} \frac{\partial h(x, z)}{\partial z} \,dx\end{aligned}$$

Back

Inverse CDF method of computing PDF for a single random variable

Front
  1. \(Y = g(X)\)
  2. \(F_Y(y) = P(Y \leqslant y) = P(g(X) \leqslant y) = P(X \leqslant g^{-1}(y))\) (the last step assumes \(g\) is increasing; for decreasing \(g\) the inequality flips)
  3. \(f_Y(y) = F_Y'(y)\)
Back

Convergence in Probability

Front

Random variables \(X_n \to X\) in probability if \(\forall \epsilon > 0\):

$$\lim_{n \to \infty} P(|X_n - X| \geqslant \epsilon) = 0$$

or, given any \(\epsilon > 0\) and \(\delta > 0\), \(\exists N > 0\) such that:

$$P(|X_n - X| \geqslant \epsilon) < \delta, \forall n > N$$

Back

PDF of a linear function of a random variable

Front

Let \(X\) be a random variable with PDF \(f_X(x)\) and \(Y = aX + b\), \(a \neq 0\), with PDF \(f_Y(y)\):

$$f_Y(y) = \frac1{|a|}f_X\left(\frac{y - b}{a}\right)\text{ for }-\infty < y < \infty$$

Back

Quantile function of the distribution of \(X\)

Front

\(F^{-1}(p)\) is defined as the smallest value \(x\) such that \(F(x) \geqslant p\) for \(0 < p < 1\)

Back

Chapter 4

(20 cards)

Optimal estimate and resulting MSE for constant estimator

Front

$$\begin{aligned}\hat{Y} &= E(Y),\\\text{MSE} &= \text{Var}(Y)\end{aligned}$$

Back

Optimal parameters for linear estimator \(\hat{Y} = \beta_1X + \beta_0\)

Front

$$\begin{aligned}\beta_1 &= \frac{\sigma_{XY}}{\sigma_X^2},\\\beta_0 &= E(Y) - \beta_1E(X)\end{aligned}$$

Back
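A sketch of how these parameters might be computed in practice. As an assumption, sample moments stand in for the population quantities \(\sigma_{XY}\), \(\sigma_X^2\), \(E(X)\), and \(E(Y)\); the function name and the data are hypothetical.

```python
def linear_estimator(xs, ys):
    """Return (beta1, beta0) with sample moments replacing sigma_XY, sigma_X^2, E(X), E(Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sxx = sum((x - mx) ** 2 for x in xs) / n
    beta1 = sxy / sxx
    beta0 = my - beta1 * mx
    return beta1, beta0


# Made-up data lying exactly on y = 3x + 1.
beta1, beta0 = linear_estimator([0, 1, 2, 3], [1, 4, 7, 10])
print(beta1, beta0)
```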

Jensen's inequality

Front

Let \(g\) be a convex function and let \(\mathbf{X}\) be a random vector with finite mean:

$$E[g(\mathbf{X})] \geqslant g(E(\mathbf{X}))$$

Back

Minimum MSE for linear estimation

Front

$$\sigma_Y^2(1 - \rho_{XY}^2)$$

Back

$$\int_0^1 p^k(1-p)^l \,dp$$

Front

$$\frac{k!l!}{(k + l + 1)!}$$

Back
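The identity (for non-negative integers \(k\) and \(l\)) can be sanity-checked numerically. The sketch below compares the factorial formula with a midpoint-rule approximation; the function names and step count are arbitrary.

```python
from math import factorial


def beta_integral_exact(k, l):
    """k! l! / (k + l + 1)! for non-negative integers k and l."""
    return factorial(k) * factorial(l) / factorial(k + l + 1)


def beta_integral_numeric(k, l, steps=100_000):
    """Midpoint-rule approximation of the integral of p^k (1-p)^l over [0, 1]."""
    h = 1.0 / steps
    return h * sum(((i + 0.5) * h) ** k * (1 - (i + 0.5) * h) ** l
                   for i in range(steps))


print(beta_integral_exact(2, 3), beta_integral_numeric(2, 3))
```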

Convex function

Front

For every \(\alpha \in (0, 1)\) and every \(\mathbf{x}\) and \(\mathbf{y}\),

$$g[\alpha\mathbf{x} + (1 - \alpha)\mathbf{y}] \leqslant \alpha g(\mathbf{x}) + (1 - \alpha) g(\mathbf{y})$$

Back

Law of Total Probability for Variance

Front

$$\begin{aligned}\text{Var}_Y(Y) &= E_X[\text{Var}_{Y|X}(Y|X)]\\&+ \text{Var}_X[E_{Y|X}(Y|X)]\end{aligned}$$

Back

Sample mean of i.i.d. \(X_i\) with \(E(X) = \mu\) and \(\text{Var}(X) = \sigma^2\)

Front

$$\begin{aligned}S_n &= \frac1{n} \sum_{i=1}^n X_i,\\E(S_n) &= \mu,\\\text{Var}(S_n) &= \frac{\sigma^2}{n}\end{aligned}$$

Back

Variance of sum of pairwise uncorrelated \(a_1X_1, \ldots, a_dX_d\)

Front

$$\text{Var}(a_1X_1 + \ldots + a_dX_d) = \sum_{i=1}^d a_i^2\text{Var}(X_i)$$

Back

Moment generating function

Front

Let \(X\) be a random variable. For each real number \(t\),

$$\psi(t) = E(e^{tX})$$

Back

Expectation of non-negative integer random variable

Front

$$E(X) = \sum_{n=1}^\infty P(X \geqslant n)$$

Back

Covariance of linear relationship \(Y = aX + b\)

Front

$$\sigma_{XY} = a\sigma_X^2$$

Back

Special case of nested conditional expectation with \(g(X,Y) = h(Y) \cdot f(X)\)

Front

$$E_{XY}[h(Y) \cdot f(X)] = E_Y[h(Y) \cdot E_{X|Y}(f(X)|Y)]$$

Back

Cauchy-Schwarz Inequality

Front

\(X\) and \(Y\) are random variables with finite variance:

$$[\text{Cov}(X, Y)]^2 \leqslant \sigma_X^2\sigma_Y^2$$

and

$$-1 \leqslant \rho(X, Y) \leqslant 1$$

Back

Expectation of non-negative random variable with c.d.f. \(F\)

Front

$$E(X) = \int_0^\infty [1 - F(x)] \,dx$$

Back

Law of Total Probability for Expectations

Front

$$E_X[E_{Y|X}(Y|X)] = E_Y(Y)$$

Back

Nested conditional expectation of joint random variables

Front

$$E_{XY}[g(X,Y)] = E_X[E_{Y|X}(g(X,Y)|X)]$$

Back

Mixture distribution

Front

Let \(X = 1, 2, \ldots, M\) with \(P(X=i)=q_i, E(Y|X=i)=\mu_i, \text{Var}(Y|X=i)=\sigma_i^2\):

$$\begin{aligned}E(Y) &= \mu_y = \sum_i q_i\mu_i,\\\text{Var}(Y) &= \sum_i q_i\left[\sigma_i^2+(\mu_i-\mu_y)^2\right]\end{aligned}$$

Back
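A minimal sketch of the two formulas; the function name and the example component values are hypothetical.

```python
def mixture_mean_var(q, mu, var):
    """Mean and variance of a mixture with weights q, component means mu, variances var."""
    mean = sum(qi * mi for qi, mi in zip(q, mu))
    variance = sum(qi * (vi + (mi - mean) ** 2) for qi, mi, vi in zip(q, mu, var))
    return mean, variance


# Two equally likely components with means 0 and 2 and unit variances.
mean, variance = mixture_mean_var([0.5, 0.5], [0.0, 2.0], [1.0, 1.0])
print(mean, variance)
```

In this example the mixture variance exceeds each component variance because of the spread between the component means.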

Schwarz Inequality

Front

$$[E(UV)]^2 \leqslant E(U^2)E(V^2)$$

Back

Optimal estimate and resulting MSE for MMSE estimator

Front

$$\begin{aligned}\hat{Y} &= E(Y|X),\\\text{MSE} &= E_X[\text{Var}_{Y|X}(Y|X)]\end{aligned}$$

Back

Chapter 5

(11 cards)

Relationship of Poisson and Binomial distributions as \(n \to \infty\)

Front

Total number of arrivals:

$$\lim_{n \to \infty} \text{Binom}\left(n, \frac{\lambda}{n}\right) = \text{Poisson}(\lambda)$$

Back
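The limit can be illustrated numerically by comparing the two probability functions for growing \(n\). This is a sketch; the rate \(\lambda = 3\), the range of \(k\), and the helper names are arbitrary choices.

```python
from math import comb, exp, factorial


def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)


lam = 3.0
# Largest pointwise gap over k = 0..10 for each n.
gaps = {n: max(abs(binom_pmf(k, n, lam / n) - poisson_pmf(k, lam))
               for k in range(11))
        for n in (10, 100, 1000, 10000)}
for n, gap in gaps.items():
    print(n, gap)
```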

Correlation and independence within jointly Gaussian vector

Front
  • Gaussian random variables \(\not{\!\!\Rightarrow}\) jointly Gaussian
  • Independent Gaussian random variables \(\Rightarrow\) jointly Gaussian
  • Uncorrelated jointly Gaussian random variables \(\Rightarrow\) independent
Back

Maclaurin series for \(e^x\)

Front

$$e^x = \sum_{k=0}^\infty \frac{x^k}{k!}$$

Back

Central Limit Theorem

Front

Let \(Z_i\) be i.i.d. random variables, \(\mu = E(Z_i)\), and \(\sigma^2 = \text{Var}(Z_i)\):

$$\begin{aligned}\frac1{\sqrt{n}}\sum_{i=1}^n (Z_i - \mu) &\to \mathcal{N}(0, \sigma^2)\text{ in distribution as }n \to \infty,\\\bar{Z} &\approx \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right)\text{ for large }n,\\\sum_i Z_i &\approx \mathcal{N}(n\mu, n\sigma^2)\text{ for large }n\end{aligned}$$

Back

Tail bounds on Standard Normal CDF

Front

$$\frac1{\sqrt{2\pi}z}\left(1 - \frac1{z^2}\right)e^{-\frac{z^2}2} \leqslant 1 - \Phi(z) \leqslant \frac1{\sqrt{2\pi}z}e^{-\frac{z^2}2}\quad\text{for }z > 0$$

Back
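A quick numerical check of the bounds for a few positive \(z\), using the identity \(1 - \Phi(z) = \frac12\text{erfc}(z/\sqrt2)\). The helper names are hypothetical.

```python
import math


def upper_tail(z):
    """1 - Phi(z) for the standard normal, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def tail_bounds(z):
    """Return (lower, upper) bounds on 1 - Phi(z) for z > 0."""
    phi = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
    return (phi / z) * (1 - 1 / z**2), phi / z


for z in (0.5, 1.0, 2.0, 3.0, 5.0):
    lo, hi = tail_bounds(z)
    print(z, lo, upper_tail(z), hi)
```

The lower bound is only informative for \(z > 1\); both bounds tighten as \(z\) grows.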

Negative binomial distribution

Front

The number \(X\) of failures that occur before the \(r\)th success has p.f. (probability mass function):

$$f(x|r, p) = \binom{r + x - 1}{x} p^r (1 - p)^x$$

for \(x = 0, 1, 2, \ldots\) or \(0\) otherwise.

$$\begin{aligned}E(X) &= \frac{r(1 - p)}{p}\\\text{Var}(X) &= \frac{r(1 - p)}{p^2}.\end{aligned}$$

Back

2nd moment of normal distribution

Front

$$\mu^2 + \sigma^2$$

Back

Exponential distribution

Front

Time between events in a Poisson point process (events occur continuously and independently at a constant average rate). With \(\lambda\) representing event rate:

$$f(x; \lambda) = \begin{cases}\lambda e^{-\lambda x},&x \geqslant 0\\0,&\text{ otherwise}.\end{cases}$$

Mean is \(\frac1{\lambda}\), variance is \(\frac1{\lambda^2}\)

Back

Linear combination of bivariate normal mean and variance. Let \(X_1\) and \(X_2\) have a bivariate normal distribution with means \(\mu_1, \mu_2\), variances \(\sigma_1^2, \sigma_2^2\), and correlation \(\rho\). What are the mean and variance of:

$$a_1X_1 + a_2X_2 + b$$

Front

$$\begin{aligned}\text{mean}&=a_1\mu_1 + a_2\mu_2 + b\\\text{variance}&=a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + 2a_1a_2\rho\sigma_1\sigma_2\end{aligned}$$

Back

\(e^x\) as a limit

Front

$$e^x = \lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n$$

Back

Moments of exponential variables

Front

$$E(X^n) = \frac{n!}{\lambda^n}$$

Back

Chapter 6

(7 cards)

Approximation of \(P(S_n \leqslant c)\) where \(S_n = X_1 + \ldots + X_n\) and \(X_i\) are i.i.d. with mean \(\mu\) and variance \(\sigma^2\)

Front

Use Central Limit Theorem

  1. Approximate \(S_n \sim \mathcal{N}(n\mu, n\sigma^2)\)
  2. Let \(Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}}\)
  3. \(P(S_n \leqslant c) = P\left(Z_n \leqslant \frac{c-n\mu}{\sigma\sqrt{n}}\right) \approx \Phi\left(\frac{c - n\mu}{\sigma\sqrt{n}}\right)\)
Back
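The three steps can be sketched with the standard library, using \(\Phi(z) = \frac12\left[1 + \text{erf}(z/\sqrt2)\right]\). The function names and the coin-flip example are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal CDF Phi(z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def clt_cdf_approx(c, n, mu, sigma):
    """Approximate P(S_n <= c) for a sum of n i.i.d. terms with mean mu and std sigma."""
    return normal_cdf((c - n * mu) / (sigma * math.sqrt(n)))


# Example: number of heads in 100 fair coin flips (mu = 0.5, sigma = 0.5).
approx = clt_cdf_approx(55, 100, 0.5, 0.5)  # Phi(1), about 0.8413
print(approx)
```

For integer-valued sums such as this one, a continuity correction (using \(c + 0.5\)) usually brings the approximation closer to the exact probability.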

Weak law of large numbers

Front

If \(X_k\) are uncorrelated with same mean and variance, then \(\overline{X}_n \rightarrow E(X_k) = \mu\) in probability

Back

Strong law of large numbers

Front

If \(X_k\) are i.i.d. and \(E(|X_k|) < \infty\), then \(\overline{X}_n \rightarrow E(X)\) almost surely. Also applies to functions, \(\overline{g(X)}_n \rightarrow E[g(X)]\) if \(E(|g(X)|) < \infty\).

Back

Chebyshev inequality

Front

If \(X\) is a random variable for which \(\text{Var}(X)\) exists, \(\forall t>0\):

$$P(|X - E(X)| > t) \leqslant \frac{\text{Var}(X)}{t^2}$$

Back

Delta method

Front

Let \(Y_1, Y_2, \ldots\) be a sequence of random variables, \(F^*\) be a continuous CDF, \(\theta\) be a real number, and \(a_1, a_2, \ldots\) be a sequence of positive numbers increasing to \(\infty\).

If \(a_n(Y_n - \theta)\) converges in distribution to \(F^*\) and \(\alpha\) is a function with continuous derivative such that \(\alpha'(\theta) \ne 0\), then the following converges in distribution to \(F^*\):

$$\frac{a_n}{\alpha'(\theta)}(\alpha(Y_n) - \alpha(\theta))$$

Back

Almost sure convergence

Front

\(X_n \rightarrow X\) almost surely if

$$P\left(\lim_{n \to \infty} X_n = X\right) = 1$$

Back

Markov inequality

Front

If \(X\) is a random variable with \(P(X \geqslant 0) = 1\), \(\forall t>0\):

$$P(X \geqslant t) \leqslant \frac{E(X)}{t}$$

Back