Based on Probability and Statistics by DeGroot and Schervish
Unsectioned
(43 cards)
Convergence in expectation
\(X_n \rightarrow X\) in expectation if \(E(|X_n - X|) \rightarrow 0\). Implies \(E(X_n) \rightarrow E(X)\).
Variance of a random variable vector
$$\text{Var}(\mathbf{X}) = E(\mathbf{X}\mathbf{X}^\top) - \mathbf{\mu}\mathbf{\mu}^\top$$
PCA error
With \(\mathbf{x} = \mathbf{\mu} + \sum_{j=1}^p z_j\mathbf{v}_j\) and K-term approximation \(\mathbf{\hat{x}} = \mathbf{\mu} + \sum_{j=1}^K z_j\mathbf{v}_j\):
$$\begin{aligned}e_K &= \Vert \mathbf{x} - \mathbf{\hat{x}} \Vert^2 = \sum_{j=K+1}^p z_j^2,\\E(e_K) &= \sum_{j=K+1}^p E[\mathbf{v}_j^\top(\mathbf{x} - \mathbf{\mu})(\mathbf{x} - \mathbf{\mu})^\top\mathbf{v}_j]\\&= \sum_{j=K+1}^p \mathbf{v}_j^\top S_x \mathbf{v}_j\\&= \sum_{j=K+1}^p \lambda_j\end{aligned}$$
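A minimal numpy sketch (not part of the deck; data and variable names are made up) illustrating these identities: project onto the top \(K\) eigenvectors of the sample covariance and check that the mean reconstruction error equals the sum of the trailing eigenvalues.
```python
import numpy as np

rng = np.random.default_rng(0)
n, p, K = 5000, 5, 2
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))  # correlated sample data

mu = X.mean(axis=0)
S = (X - mu).T @ (X - mu) / n          # sample covariance (1/n convention, as on the cards)
lam, V = np.linalg.eigh(S)             # eigenvalues in ascending order
lam, V = lam[::-1], V[:, ::-1]         # sort descending

Z = (X - mu) @ V[:, :K]                # PCA transform: z_j = v_j^T (x - mu)
X_hat = mu + Z @ V[:, :K].T            # K-term inverse transform
e_K = np.sum((X - X_hat) ** 2, axis=1) # squared reconstruction error per sample

print(e_K.mean(), lam[K:].sum())       # the two numbers agree (up to floating point)
```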
Variance of random vector whose components are i.i.d.
$$\sigma^2I$$
PDF of multivariable function of \(\mathbf{X}\) with joint PDF \(f_X(x)\)
Analogous to the scalar case \(f_Y(y) = f_X(h(y))|h'(y)|\). For \(\mathbf{Y} = g(\mathbf{X})\), let \(\mathbf{X} = h(\mathbf{Y})\) be the inverse:
$$f_Y(\mathbf{y}) = f_X(h(\mathbf{y}))\left|\det{\frac{\partial h(y)}{\partial y}}\right|$$
Dirac delta function
$$\delta(x) = \begin{cases}+\infty &\text{if }x = 0,\\0 &\text{if }x \neq 0,\end{cases}$$
$$\int_a^b \delta(x - x_0) \,dx = \begin{cases}1 &\text{if }x_0 \in [a,b],\\0 &\text{if }x_0 \not\in [a,b],\end{cases}$$
If \(f(x)\) is continuous at \(x_0\):
$$\int_{-\infty}^\infty f(x) \delta(x - x_0) \,dx = f(x_0)$$
Expectation and variance of \(\mathbf{Y} = \mathbf{AX} + \mathbf{b}\)
$$\begin{aligned}E(\mathbf{Y}) &= \mathbf{A}E(\mathbf{X}) + \mathbf{b},\\\text{Var}(\mathbf{Y}) &= \mathbf{A}\text{Var}(\mathbf{X})\mathbf{A}^\top\end{aligned}$$
Proportion of variance (PoV)
Fraction of variance explained by first \(k\) PCs:
$$\text{PoV}(k) = 1 - \frac{\sum_{i=k+1}^p \lambda_i}{\sum_{i=1}^p \lambda_i} = \frac{\sum_{i=1}^k \lambda_i}{\sum_{i=1}^p \lambda_i}$$
Joint PDF of \(Y = (Y_1, Y_2)\) where \(Y_1\) and \(Y_2\) are functions of \(X = (X_1, X_2)\)
With inverse transform \(X_1 = s_1(Y_1, Y_2)\), \(X_2 = s_2(Y_1, Y_2)\):
$$f_Y(y_1, y_2) = f_X(s_1(y_1, y_2), s_2(y_1, y_2))\,|J|,$$
where \(J\) is the Jacobian determinant of the inverse transform:
$$J = \det\begin{bmatrix}\frac{\partial s_1}{\partial y_1} & \frac{\partial s_1}{\partial y_2}\\\frac{\partial s_2}{\partial y_1} & \frac{\partial s_2}{\partial y_2}\end{bmatrix}$$
Perron-Frobenius theorem
For finite, aperiodic, irreducible Markov chains with transition matrix \(\mathbf{P}\):
the eigenvalue of largest magnitude is \(1\) (all other eigenvalues have modulus strictly less than \(1\)), the corresponding left eigenvector can be normalized to a unique stationary distribution \(\mathbf{\pi}\) with strictly positive entries satisfying \(\mathbf{\pi}^\top\mathbf{P} = \mathbf{\pi}^\top\), and \(\mathbf{\alpha}_k^\top = \mathbf{\alpha}_0^\top\mathbf{P}^k \rightarrow \mathbf{\pi}^\top\) for any initial distribution \(\mathbf{\alpha}_0\)
Find eigenvalues for matrix \(A\)
Roots of \(\det(\lambda I - A)\)
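A quick worked example (not from the deck): for \(A = \begin{bmatrix}2&1\\1&2\end{bmatrix}\),
$$\det(\lambda I - A) = (\lambda - 2)^2 - 1 = (\lambda - 1)(\lambda - 3),$$
so the eigenvalues are \(\lambda = 1\) and \(\lambda = 3\).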
PCA transform
$$z_j = \mathbf{v}_j^\top(\mathbf{x} - \mathbf{\mu})$$
$$\frac{\partial \mathbf{v}^\top\mathbf{A}\mathbf{v}}{\partial v}$$
$$2\mathbf{A}\mathbf{v}$$
for symmetric \(\mathbf{A}\) (in general, \((\mathbf{A} + \mathbf{A}^\top)\mathbf{v}\))
Linear difference equation
With equations of the form \(\theta_{k+1} = c\theta_k + b\), where \(c\) and \(b\) are constants, \(\theta_k\) can be solved with:
$$\theta_k = Ac^k + \frac{b}{1-c},$$
where \(A\) is a constant that can be solved by plugging in \(\theta_0\)
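As a check of this form (my own verification, assuming \(c \neq 1\)): substituting gives
$$c\theta_k + b = Ac^{k+1} + \frac{cb}{1-c} + b = Ac^{k+1} + \frac{b}{1-c} = \theta_{k+1},$$
and setting \(k = 0\) gives \(A = \theta_0 - \frac{b}{1-c}\).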
PCA inverse transform
$$\mathbf{x} \approx \mathbf{\mu} + \sum_{j=1}^K z_j\mathbf{v}_j$$
Covariance matrix \(\mathbf{S}\) and its inverse for bivariate Gaussian random vector
$$\begin{aligned}\mathbf{S} &= \begin{bmatrix}\sigma_1^2 & \sigma_{12}\\\sigma_{12} & \sigma_2^2\end{bmatrix}\\\mathbf{S}^{-1} &= \frac1{\sigma_1^2\sigma_2^2(1 - \rho^2)}\begin{bmatrix}\sigma_2^2 & -\rho\sigma_1\sigma_2\\-\rho\sigma_1\sigma_2 & \sigma_1^2\end{bmatrix}\end{aligned}$$
Positive definite and positive semi-definite matrices
\(\mathbf{Q} \in \mathbb{R}^{N \times N}\) is:
positive definite if \(\mathbf{x}^\top\mathbf{Q}\mathbf{x} > 0\) for all \(\mathbf{x} \neq \mathbf{0}\) (equivalently, all eigenvalues \(> 0\)),
positive semi-definite if \(\mathbf{x}^\top\mathbf{Q}\mathbf{x} \geqslant 0\) for all \(\mathbf{x}\) (all eigenvalues \(\geqslant 0\))
Mean squared distance
$$\begin{aligned}E(\Vert \mathbf{X} - \mathbf{Y} \Vert^2) &= \sum_j E\left[(X_j - Y_j)^2\right]\\E(\Vert \mathbf{X} - \mathbf{\mu} \Vert^2) &= \text{Tr}(\mathbf{S})\\&=\sum_j \lambda_j\end{aligned}$$
Expected value of absolute value of standard Gaussian variable \(Z\)
$$E(|Z|) = \sqrt{\frac2{\pi}}$$
Compute HMM non-causal estimate
Run the forward recursion to get \(\alpha_k(i)\) and the backward recursion to get \(\beta_k(i)\), then combine:
$$P(X_k = i | y_0^{T-1}) = \frac{\alpha_k(i)\beta_k(i)}{\sum_j\alpha_k(j)\beta_k(j)}$$
Stationary Markov evolution
With initial state distribution \(\mathbf{\alpha}_0\):
$$\mathbf{\alpha}_{k+1}^\top = \mathbf{\alpha}_k^\top \mathbf{P}$$
$$\mathbf{\alpha}_k^\top = \mathbf{\alpha}_0^\top \mathbf{P}^k$$
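A small numpy sketch (illustrative only; the transition matrix is made up) of this evolution, and of the Perron-Frobenius limit: for an aperiodic, irreducible chain, \(\mathbf{\alpha}_k\) approaches the stationary distribution.
```python
import numpy as np

# Example 3-state stationary (homogeneous) transition matrix; each row sums to 1.
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.1, 0.3, 0.6]])
alpha0 = np.array([1.0, 0.0, 0.0])                 # start in state 0

alpha_k = alpha0 @ np.linalg.matrix_power(P, 100)  # alpha_k^T = alpha_0^T P^k
print(alpha_k)                                     # close to the stationary distribution

# Stationary distribution: left eigenvector of P for eigenvalue 1, normalized to sum to 1.
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmax(np.real(w))])
print(pi / pi.sum())
```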
Test for whether \(\mathbf{X} = (X_1, \ldots, X_d)\) is jointly Gaussian
\(\mathbf{X} = (X_1, \ldots, X_d)\) is jointly Gaussian iff linear combinations of \(X_j\) are Gaussian, or
$$Z = \mathbf{a}^\top\mathbf{X} = \sum_ia_iX_i$$
is a scalar Gaussian for all vectors \(\mathbf{a} \in \mathbb{R}^d\)
Singular value decomposition (SVD)
SVD is \(\mathbf{X} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^\top\):
\(\mathbf{U}\) and \(\mathbf{V}\) have orthonormal columns (\(\mathbf{U}^\top\mathbf{U} = \mathbf{I}\), \(\mathbf{V}^\top\mathbf{V} = \mathbf{I}\)) and \(\mathbf{\Sigma}\) is diagonal with non-negative singular values \(\sigma_1 \geqslant \sigma_2 \geqslant \cdots \geqslant 0\)
Forward term \(\alpha_k(i)\) with regards to Hidden Markov Model:
\(\rightarrow X_{k-1} \rightarrow X_k \rightarrow X_{k+1} \rightarrow\)
and
\(X_{k-1} \rightarrow Y_{k-1}\)
\(X_k \rightarrow Y_k\)
\(X_{k+1} \rightarrow Y_{k+1}\)
$$\alpha_k(i) = P(X_k = i, y_0^k)$$
Irreducible set of states
A set of states is irreducible if every pair of states in the set communicates
Valid covariance matrix
\(S\) must be symmetric and positive semi-definite; in particular \(\det(S) \geqslant 0\)
Properties of \(N\) orthonormal eigenvectors: \(\mathbf{V} = [\mathbf{v}_1, \ldots, \mathbf{v}_N] \in \mathbb{R}^{N \times N}\)
$$\mathbf{V}^\top\mathbf{V} = \mathbf{V}\mathbf{V}^\top = \mathbf{I}, \qquad \mathbf{V}^{-1} = \mathbf{V}^\top,$$
so \(\mathbf{V}\) is an orthogonal matrix
Determinant of \(\begin{bmatrix}a&b&c\\d&e&f\\g&h&i\end{bmatrix}\)
$$aei + bfg + cdh - gec - hfa - idb$$
Go diagonally down from \(a\), \(b\), and \(c\), multiply and add. Go diagonally up from \(g\), \(h\), and \(i\), multiply and subtract.
Sample variance matrix
With \(n\) samples each having \(p\) features, \(\mathbf{x}_i \in \mathbb{R}^p\):
$$\begin{aligned}\mathbf{S}_x &= \frac1{n} \sum_{i=1}^n (\mathbf{x}_i - \overline{\mathbf{x}})(\mathbf{x}_i - \overline{\mathbf{x}})^\top\\&=E\left[(\mathbf{x}_i - \overline{\mathbf{x}})(\mathbf{x}_i - \overline{\mathbf{x}})^\top\right]\end{aligned}$$
Non-causal estimate
$$P(X_k = i | y_0^{T-1}) = \frac{\alpha_k(i)\beta_k(i)}{\sum_j\alpha_k(j)\beta_k(j)}$$
Inverse of \(A = \begin{bmatrix}a&b\\c&d\end{bmatrix}\)
$$\frac1{\text{det}(A)}\begin{bmatrix}d&-b\\-c&a\end{bmatrix}$$
Inverse doesn't exist if \(A\) is singular (determinant is \(0\))
HMM backward term recursion
$$\beta_k(i) = \sum_j \beta_{k+1}(j)\gamma_{k+1}(j)P_k(i,j)$$
with initial condition
$$\beta_{T-1}(i) = 1$$
(since the observations run through \(y_{T-1}\), there is nothing left to explain after time \(T-1\))
Matrix square root
With the eigenvalue decomposition of \(S = UDU^\top\):
$$S^\frac12 = UD^\frac12U^\top$$
Vector Gaussian mixture model mean and variance
Just like a scalar mixture distribution:
$$\begin{aligned}E(\mathbf{x}) &= \sum_i \mathbf{\mu}_iq_i = \mathbf{\mu}\\\text{Var}(\mathbf{x}) &= \sum_i q_i\left[\mathbf{S}_i + (\mathbf{\mu} - \mathbf{\mu}_i)(\mathbf{\mu} - \mathbf{\mu}_i)^\top\right]\\&= \mathbf{S}\end{aligned}$$
Multivariable Gaussian PDF with sample variance
$$f_X(\mathbf{x}) = \frac{e^{-\frac12(\mathbf{x} - \mathbf{\mu})^\top\mathbf{S}^{-1}(\mathbf{x} - \mathbf{\mu})}}{(2\pi)^{\frac{d}2}\sqrt{\text{det}(\mathbf{S})}},$$
where \(d\) is the dimension of \(\mathbf{X}\) and \(\mathbf{S}\) is the sample variance matrix
Stochastic matrix
Satisfies \(P_{ij} \geqslant 0\) for all \(i, j\), with every row summing to one: \(\sum_j P_{ij} = 1\)
Generate i.i.d. multivariable Gaussian samples \(\mathbf{x}_i \in \mathbb{R}^d, \mathbf{x}_i\sim\mathcal{N}(\mu, S)\)
Generate \(\mathbf{z}_i \in \mathbb{R}^d\) with i.i.d. standard normal components, then set
$$\mathbf{x}_i = \mathbf{\mu} + S^\frac12\mathbf{z}_i,$$
using the matrix square root \(S^\frac12 = UD^\frac12U^\top\)
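A numpy sketch of this recipe (illustrative; \(\mu\) and \(S\) below are made-up values):
```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 2, 10000
mu = np.array([1.0, -2.0])
S = np.array([[2.0, 0.8],
              [0.8, 1.0]])

# Matrix square root via the eigenvalue decomposition S = U D U^T.
w, U = np.linalg.eigh(S)
S_half = U @ np.diag(np.sqrt(w)) @ U.T

Z = rng.standard_normal((n, d))   # rows are i.i.d. N(0, I)
X = mu + Z @ S_half               # each row ~ N(mu, S), since S_half is symmetric

print(X.mean(axis=0))             # approximately mu
print(np.cov(X, rowvar=False))    # approximately S
```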
HMM forward term recursion
$$\alpha_{k+1}(j) = \gamma_{k+1}(j) \sum_i P_k(i, j)\alpha_k(i)$$
where
$$\begin{aligned}P_k(i, j) &= P(X_{k+1} = j | X_k = i)\\\gamma_{k+1}(j) &= P(y_{k+1} | X_{k+1} = j)\end{aligned}$$
with initial condition
$$\alpha_0(i) = P(X_0 = i)\gamma_0(i)$$
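A compact numpy sketch (illustrative; the transition, emission, and initial probabilities are made up) that combines the forward and backward recursions from these cards into the non-causal estimate \(P(X_k = i \mid y_0^{T-1})\):
```python
import numpy as np

P = np.array([[0.8, 0.2],        # P[i, j] = P(X_{k+1} = j | X_k = i)
              [0.3, 0.7]])
B = np.array([[0.9, 0.1],        # B[i, y] = P(Y_k = y | X_k = i)
              [0.2, 0.8]])
pi0 = np.array([0.5, 0.5])       # P(X_0 = i)
y = [0, 0, 1, 1, 0]              # observations y_0^{T-1}
T = len(y)

# Forward: alpha_0(i) = P(X_0 = i) gamma_0(i); alpha_{k+1}(j) = gamma_{k+1}(j) sum_i P(i,j) alpha_k(i)
alpha = np.zeros((T, 2))
alpha[0] = pi0 * B[:, y[0]]
for k in range(T - 1):
    alpha[k + 1] = B[:, y[k + 1]] * (alpha[k] @ P)

# Backward: beta_{T-1}(i) = 1; beta_k(i) = sum_j P(i,j) gamma_{k+1}(j) beta_{k+1}(j)
beta = np.ones((T, 2))
for k in range(T - 2, -1, -1):
    beta[k] = P @ (B[:, y[k + 1]] * beta[k + 1])

# Non-causal estimate: P(X_k = i | y_0^{T-1}) proportional to alpha_k(i) beta_k(i)
posterior = alpha * beta
posterior /= posterior.sum(axis=1, keepdims=True)
print(posterior)
```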
Find eigenvectors of matrix \(\mathbf{A}\)
For each eigenvalue \(\lambda_i\), solve \((\mathbf{A} - \lambda_i I)\mathbf{v} = \mathbf{0}\) for a nonzero \(\mathbf{v}\) (the null space of \(\mathbf{A} - \lambda_i I\))
Properties of symmetric matrix \(S\)
All eigenvalues are real, eigenvectors for distinct eigenvalues are orthogonal, and \(S\) has the eigenvalue decomposition \(S = UDU^\top\) with \(U\) orthogonal (orthonormal eigenvectors) and \(D\) diagonal (eigenvalues)
$$\frac{\partial \mathbf{v}^\top\mathbf{v}}{\partial v}$$
$$2\mathbf{v}$$
Backward term \(\beta_k(i)\) with regards to Hidden Markov Model:
\(\rightarrow X_{k-1} \rightarrow X_k \rightarrow X_{k+1} \rightarrow\)
and
\(X_{k-1} \rightarrow Y_{k-1}\)
\(X_k \rightarrow Y_k\)
\(X_{k+1} \rightarrow Y_{k+1}\)
$$\beta_k(i) = P(y_{k+1}^{T-1}|X_k = i)$$
Stationary transition matrix
Also called homogeneous; the transition probabilities do not depend on the time step \(k\), and the chain is commonly drawn as a state transition diagram:
$$P_{ij}(k) = P_{ij}$$
Chapter 1
(2 cards)
Probability axioms
\(P(A) \geqslant 0\) for every event \(A\); \(P(S) = 1\) for the sample space \(S\); and for every infinite sequence of disjoint events \(A_1, A_2, \ldots\):
$$P\left(\bigcup_{i=1}^\infty A_i\right) = \sum_{i=1}^\infty P(A_i)$$
Bonferroni inequality
$$P\left(\bigcap_{i=1}^n A_i\right) \ge 1 - \sum_{i=1}^n P\left(A_i^c\right)$$
Chapter 3
(12 cards)
Inverse CDF method to generate random variables
Generate \(U \sim \text{Uniform}(0, 1)\) and set \(X = F^{-1}(U)\); then \(X\) has CDF \(F\)
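A minimal numpy example (my own, not from the deck): sampling an \(\text{Exponential}(\lambda)\) variable via its inverse CDF \(F^{-1}(u) = -\ln(1-u)/\lambda\).
```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0
u = rng.uniform(size=100_000)    # U ~ Uniform(0, 1)
x = -np.log(1.0 - u) / lam       # X = F^{-1}(U) has CDF F(x) = 1 - exp(-lam * x)

print(x.mean(), 1 / lam)         # sample mean ~ 1/lambda
print(x.var(), 1 / lam**2)       # sample variance ~ 1/lambda^2
```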
Convergence in distribution
\(X_n \rightarrow X\) in distribution if, at every point \(x\) where \(F_X(x)\) is continuous:
$$F_{X_n}(x) = P(X_n \leqslant x) \to F_X(x) = P(X \leqslant x)$$
Convolution of two PDFs \(f_X(x)\) and \(f_Y(y)\)
$$f_Z(z) = (f_X * f_Y)(z) = \int_{-\infty}^\infty f_X(t)f_Y(z - t) \,dt$$
PDF for an additive channel
If \(Y = X + W\) with \(X, W\) independent, then
$$f_{Y|X}(y|x) = f_W(y-x)$$
3 methods to compute PDF of \(Y = g(X)\)
PDF of an invertible function of a random variable
Let \(Y = g(X)\) with \(g(x)\) invertible so that \(X = g^{-1}(Y)\):
$$f_Y(y) = f_X(g^{-1}(y)) \cdot \left|\frac{\partial g^{-1}(y)}{\partial y}\right|$$
Relationship between a PDF and a probability
$$P(X \in [a, a + \epsilon]) \approx f_X(a) \cdot \epsilon$$
or
$$f_X(a) = \lim_{\epsilon \to 0} \frac{P(X \in [a, a + \epsilon])}{\epsilon}$$
Leibniz rule
$$\begin{aligned}\frac{d}{dz}\int_{a(z)}^{b(z)} h(x, z) \,dx &= h(b(z), z)b'\\&- h(a(z), z)a'\\&+ \int_{a(z)}^{b(z)} \frac{\partial h(x, z)}{\partial z} \,dx\end{aligned}$$
Inverse CDF method of computing PDF for a single random variable
Convergence in Probability
Random variables \(X_n \to X\) in probability if \(\forall \epsilon > 0\):
$$\lim_{n \to \infty} P(|X_n - X| \geqslant \epsilon) = 0$$
or, given any \(\epsilon\) and \(\delta > 0\), \(\exists N > 0\) such that:
$$P(|X_n - X| \geqslant \epsilon) < \delta, \forall n > N$$
PDF of a linear function of a random variable
Let \(X\) be a random variable with PDF \(f_X(x)\) and \(Y = aX + b\) with PDF \(f_Y(y)\):
$$f_Y(y) = \frac1{|a|}f_X\left(\frac{y - b}{a}\right)\text{ for }-\infty < y < \infty$$
Quantile function of the distribution of \(X\)
\(F^{-1}(p)\) is defined as the smallest value \(x\) such that \(F(x) \geqslant p\) for \(0 < p < 1\)
Chapter 4
(20 cards)
Optimal estimate and resulting MSE for constant estimator
$$\begin{aligned}\hat{Y} &= E(Y),\\\text{MSE} &= \text{Var}(Y)\end{aligned}$$
Optimal parameters for linear estimator \(\hat{Y} = \beta_1X + \beta_0\)
$$\begin{aligned}\beta_1 &= \frac{\sigma_{XY}}{\sigma_X^2},\\\beta_0 &= E(Y) - \beta_1E(X)\end{aligned}$$
Jensen's inequality
Let \(g\) be a convex function and let \(\mathbf{X}\) be a random vector with finite mean:
$$E[g(\mathbf{X})] \geqslant g(E(\mathbf{X}))$$
Minimum MSE for linear estimation
$$\sigma_Y^2(1 - \rho_{XY}^2)$$
$$\int_0^1 p^k(1-p)^l \,dp$$
$$\frac{k!l!}{(k + l + 1)!}$$
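A quick check (not from the deck), with \(k = 1\) and \(l = 2\):
$$\int_0^1 p(1-p)^2 \,dp = \frac12 - \frac23 + \frac14 = \frac1{12} = \frac{1!\,2!}{4!}$$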
Convex function
For every \(\alpha \in (0, 1)\) and every \(\mathbf{x}\) and \(\mathbf{y}\),
$$g[\alpha\mathbf{x} + (1 - \alpha)\mathbf{y}] \leqslant \alpha g(\mathbf{x}) + (1 - \alpha) g(\mathbf{y})$$
(the chord lies above the graph)
Law of Total Probability for Variance
$$\begin{aligned}\text{Var}_Y(Y) &= E_X[\text{Var}_{Y|X}(Y|X)]\\&+ \text{Var}_X[E_{Y|X}(Y|X)]\end{aligned}$$
Sample mean of i.i.d. \(X_i\) with \(E(X) = \mu\) and \(\text{Var}(X) = \sigma^2\)
$$\begin{aligned}S_n &= \frac1{n} \sum_{i=1}^n X_i,\\E(S_n) &= \mu,\\\text{Var}(S_n) &= \frac{\sigma^2}{n}\end{aligned}$$
Variance of sum of pairwise uncorrelated \(a_1X_1, \ldots, a_dX_d\)
$$\text{Var}(a_1X_1 + \ldots + a_dX_d) = \sum_{i=1}^d a_i^2\text{Var}(X_i)$$
Moment generating function
Let \(X\) be a random variable. For each real number \(t\):
$$\psi(t) = E(e^{tX})$$
Expectation of non-negative integer random variable
$$E(X) = \sum_{n=1}^\infty Pr(X \geqslant n)$$
Covariance of linear relationship \(Y = aX + b\)
$$\sigma_{XY} = a\sigma_X^2$$
Special case of nested conditional expectation with \(g(X,Y) = h(Y) \cdot f(X)\)
$$E_{XY}[h(Y) \cdot f(X)] = E_Y[h(Y) \cdot E_{X|Y}(f(X)|Y)]$$
Cauchy-Schwarz Inequality
\(X\) and \(Y\) are random variables with finite variance:
$$[\text{Cov}(X, Y)]^2 \leqslant \sigma_X^2\sigma_Y^2$$
and
$$-1 \leqslant \rho(X, Y) \leqslant 1$$
Expectation of non-negative random variable with c.d.f. \(F\)
$$E(X) = \int_0^\infty [1 - F(x)] \,dx$$
Law of Total Probability for Expectations
$$E_X[E_{Y|X}(Y|X)] = E_Y(Y)$$
Nested conditional expectation of joint random variables
$$E_{XY}[g(X,Y)] = E_X[E_{Y|X}(g(X,Y)|X)]$$
Mixture distribution
Let \(X = 1, 2, \ldots, M\) with \(P(X=i)=q_i, E(Y|X=i)=\mu_i, \text{Var}(Y|X=i)=\sigma_i^2\):
$$\begin{aligned}E(Y) &= \mu_y = \sum_i q_i\mu_i,\\\text{Var}(Y) &= \sum_i q_i\left[\sigma_i^2+(\mu_i-\mu_y)^2\right]\end{aligned}$$
Schwarz Inequality
$$[E(UV)]^2 \leqslant E(U^2)E(V^2)$$
Optimal estimate and resulting MSE for MMSE estimator
$$\begin{aligned}\hat{Y} &= E(Y|X),\\\text{MSE} &= E_X[\text{Var}_{Y|X}(Y|X)]\end{aligned}$$
Chapter 5
(11 cards)
Relationship of Poisson and Binomial distributions as \(n \to \infty\)
Total number of arrivals:
$$\lim_{n \to \infty} \text{Binom}\left(n, \frac{\lambda}{n}\right) = \text{Poisson}(\lambda)$$
Correlation and independence within jointly Gaussian vector
If \(\mathbf{X}\) is jointly Gaussian, uncorrelated components are independent, so within a jointly Gaussian vector uncorrelated \(\iff\) independent (not true for arbitrary distributions)
Maclaurin series for \(e^x\)
$$e^x = \sum_{k=0}^\infty \frac{x^k}{k!}$$
Central Limit Theorem
Let \(Z_i\) be i.i.d. random variables, \(\mu = E(Z_i)\), and \(\sigma^2 = \text{Var}(Z_i)\):
\(\frac1{\sqrt{n}}\sum_{i=1}^n (Z_i - \mu)\) converges in distribution to \(\mathcal{N}(0, \sigma^2)\) as \(n \to \infty\); consequently, for large \(n\), approximately:
$$\bar{Z} \sim \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right),\qquad \sum_i Z_i \sim \mathcal{N}(n\mu, n\sigma^2)$$
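A quick Monte Carlo illustration (not part of the deck; parameters are arbitrary): standardized sums of i.i.d. \(\text{Uniform}(0,1)\) variables behave approximately like a standard normal.
```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 50, 100_000
Z = rng.uniform(size=(trials, n))   # i.i.d. Uniform(0,1): mu = 1/2, sigma^2 = 1/12
mu, sigma = 0.5, np.sqrt(1 / 12)

W = (Z.sum(axis=1) - n * mu) / (sigma * np.sqrt(n))  # standardized sums
print(W.mean(), W.std())            # approximately 0 and 1
print(np.mean(W <= 1.0))            # approximately Phi(1) = 0.8413
```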
Tail bounds on Standard Normal CDF
$$\frac1{\sqrt{2\pi}z}\left(1 - \frac1{z^2}\right)e^{-\frac{z^2}2} \leqslant 1 - \Phi(z) \leqslant \frac1{\sqrt{2\pi}z}e^{-\frac{z^2}2}$$
for \(z > 0\)
Negative binomial distribution
The number \(X\) of failures that occur before the \(r\)th success has p.f.:
$$f(x|r, p) = \binom{r + x - 1}{x} p^r (1 - p)^x$$
for \(x = 0, 1, 2, \ldots\) or \(0\) otherwise.
$$\begin{aligned}E(X) &= \frac{r(1 - p)}{p},\\\text{Var}(X) &= \frac{r(1 - p)}{p^2}\end{aligned}$$
2nd moment of normal distribution
$$\mu^2 + \sigma^2$$
Exponential distribution
Time between events in a Poisson point process (events occur continuously and independently at a constant average rate). With \(\lambda\) representing event rate:
$$f(x; \lambda) = \begin{cases}\lambda e^{-\lambda x},&x \geqslant 0\\0,&\text{ otherwise}.\end{cases}$$
Mean is \(\frac1{\lambda}\), variance is \(\frac1{\lambda^2}\)
Linear combination of bivariate normal variables. Let \(X_1\) and \(X_2\) have a bivariate normal distribution with correlation \(\rho\); what are the mean and variance of:
$$a_1X_1 + a_2X_2 + b$$
$$\begin{aligned}\text{mean}&=a_1\mu_1 + a_2\mu_2 + b\\\text{variance}&=a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + 2a_1a_2\rho\sigma_1\sigma_2\end{aligned}$$
\(e^x\) as a limit
$$e^x = \lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n$$
Moments of exponential variables
$$E(X^n) = \frac{n!}{\lambda^n}$$
Chapter 6
(7 cards)
Approximation of \(P(S_n \leqslant c)\) where \(S_n = X_1 + \ldots + X_n\) and \(X_i\) are i.i.d. with mean \(\mu\) and variance \(\sigma^2\)
Use the Central Limit Theorem:
$$P(S_n \leqslant c) \approx \Phi\left(\frac{c - n\mu}{\sigma\sqrt{n}}\right)$$
Weak law of large numbers
If the \(X_k\) are uncorrelated with the same (finite) mean and variance, then \(\overline{X}_n \rightarrow E(X_k) = \mu\) in probability
Strong law of large numbers
If \(X_k\) are i.i.d. and \(E(|X_k|) < \infty\), then \(\overline{X}_n \rightarrow E(X)\) almost surely. Also applies to functions, \(\overline{g(X)}_n \rightarrow E[g(X)]\) if \(E(|g(X)|) < \infty\).
Chebyshev inequality
If \(X\) is a random variable for which \(\text{Var}(X)\) exists, \(\forall t>0\):
$$P(|X - E(X)| > t) \leqslant \frac{\text{Var}(X)}{t^2}$$
Delta method
Let \(Y_1, Y_2, \ldots\) be a sequence of random variables, \(F^*\) be a continuous CDF, \(\theta\) be a real number, and \(a_1, a_2, \ldots\) be a sequence of positive numbers increasing to \(\infty\).
If \(a_n(Y_n - \theta)\) converges in distribution to \(F^*\) and \(\alpha\) is a function with continuous derivative such that \(\alpha'(\theta) \ne 0\), then the following converges in distribution to \(F^*\):
$$\frac{a_n}{\alpha'(\theta)}(\alpha(Y_n) - \alpha(\theta))$$
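A standard special case (my own worked example): if \(\sqrt{n}(\overline{X}_n - \mu)\) converges in distribution to \(\mathcal{N}(0, \sigma^2)\) and \(\alpha(y) = y^2\) with \(\mu \neq 0\), then taking \(a_n = \sqrt{n}\), \(\theta = \mu\), and \(\alpha'(\mu) = 2\mu\) shows that
$$\sqrt{n}\left(\overline{X}_n^2 - \mu^2\right)$$
converges in distribution to \(\mathcal{N}(0, 4\mu^2\sigma^2)\).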
Almost sure convergence
\(X_n \rightarrow X\) almost surely if
$$P\left(\lim_{n \to \infty} X_n = X\right) = 1$$
Markov inequality
If \(X\) is a random variable with \(P(X \geqslant 0) = 1\), \(\forall t>0\):
$$P(X \geqslant t) \leqslant \frac{E(X)}{t}$$