
Cheatsheet

Basics

Mean

\[\mu = \frac{x_1 + x_2 + ... + x_N}{N} = \frac{\sum_{i = 1}^N x_i}{N}\]
\[\overline x = \frac{x_1 + x_2 + ... + x_n}{n} = \frac{\sum_{i = 1}^n x_i}{n}\]

Also see Random Variables: Expected or Mean Value

Weighted Mean

\[\mu = \frac{\sum_{i=1}^N w_i x_i}{\sum_{i=1}^N w_i}\]
\[\overline x = \frac{\sum_{i=1}^n w_i x_i}{\sum_{i=1}^n w_i}\]

Grouped Data Mean

\[\mu = \frac{\sum_{i=1}^N f_i M_i}{N}\]

where \(M_i\) is the midpoint and \(f_i\) is the frequency of each class

\[\overline x = \frac{\sum_{i=1}^n f_i M_i}{n}\]

where \(M_i\) is the midpoint and \(f_i\) is the frequency of each class

Variance

\[\sigma^2 = \frac{\sum_{i=1}^N (x_i - \mu)^2}{N} \]

Biased

\[s^2 = \frac{\sum_{i=1}^n (x_i - \overline x)^2}{n} \]

Unbiased

\[s_{n-1}^2 = \frac{\sum_{i=1}^n (x_i - \overline x)^2}{n - 1} \]

Also see Random Variables: Variance

Grouped Data Variance

\[\sigma^2 = \frac{\sum_{i=1}^N f_i(M_i - \mu)^2}{N}\]
\[s^2 = \frac{\sum_{i=1}^n f_i(M_i - \overline x)^2}{n-1}\]

Standard Deviation

\[\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^N (x_i - \mu)^2}{N}} \]
\[s = \sqrt{s_{n-1}^2} = \sqrt{\frac{\sum_{i=1}^n (x_i - \overline x)^2}{n - 1}} \]
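The mean, biased and unbiased variance, and standard deviation formulas above can be sketched in Python; the data values below are purely illustrative, and the stdlib `statistics` module is used only to cross-check the hand-rolled sums.

```python
import statistics

# Hypothetical sample data (illustrative only)
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(data)
mean = sum(data) / n                                   # x-bar = (sum x_i) / n

# Biased (population-style) variance: divide by n
var_biased = sum((x - mean) ** 2 for x in data) / n

# Unbiased sample variance: divide by n - 1
var_unbiased = sum((x - mean) ** 2 for x in data) / (n - 1)

std_unbiased = var_unbiased ** 0.5

# The stdlib agrees: statistics.pvariance divides by n, statistics.variance by n - 1
assert abs(var_biased - statistics.pvariance(data)) < 1e-12
assert abs(var_unbiased - statistics.variance(data)) < 1e-12
print(mean, var_biased, var_unbiased)  # 5.0 4.0 ~4.5714
```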

Covariance

\[\sigma_{xy} = cov(x,y) = \frac{\sum(x_i - \mu_x)(y_i - \mu_y)}{N}\]
\[s_{xy} = cov(x,y) = \frac{\sum(x_i - \overline x)(y_i - \overline y)}{n - 1}\]

Also see Random Variables: Covariance

Variance Covariance Matrix

\[vcov(x,y) = \begin{bmatrix} var(x) & cov(x,y) \\ cov(x,y) & var(y) \end{bmatrix}\]
\[var(X) = \begin{bmatrix} var(X_1) & cov(X_1, X_2) & \dots & cov(X_1, X_n) \\ cov(X_2, X_1) & var(X_2) & \dots & cov(X_2, X_n) \\ \vdots & \ & \ddots \\ cov(X_n, X_1) & cov(X_n, X_2) & \dots & var(X_n) \\ \end{bmatrix}\]
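A minimal sketch of the sample variance-covariance matrix above, using the \(n - 1\) denominator throughout; the two series are hypothetical (ys is constructed as exactly 2·xs so the resulting covariances are easy to verify by hand).

```python
# Illustrative data: ys = 2 * xs, so cov(x, y) = 2 * var(x) and var(y) = 4 * var(x)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]

n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n

def cov(a, b, ma, mb):
    # Sample covariance with the (n - 1) denominator
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)

var_x = cov(xs, xs, mx, mx)   # cov(x, x) = var(x)
var_y = cov(ys, ys, my, my)
cov_xy = cov(xs, ys, mx, my)

# 2x2 variance-covariance matrix
vcov = [[var_x, cov_xy],
        [cov_xy, var_y]]
print(vcov)  # [[2.5, 5.0], [5.0, 10.0]]
```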

Correlation

\[ r = \frac{1}{n - 1}\sum {\biggl(\frac{x_i - \overline x}{S_x}\biggl)\biggl(\frac{y_i - \overline y}{S_y}\biggl)} = \frac{1}{n - 1}\sum {(z_x)(z_y)}\]

where \(S_x\) and \(S_y\) are the standard deviations with respect to x and y and \(z_x\) and \(z_y\) are the z-scores for x and y.

Pearson Correlation Coefficient

\[\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x\sigma_y}\]

where \(\sigma_{xy}\) is the covariance and \(\sigma_x\) and \(\sigma_y\) are the standard deviations of the variables.

\[r_{xy} = \frac{s_{xy}}{s_xs_y}\]

Also see Random Variables: Correlation

Correlation Matrix

\[r(X) = \begin{bmatrix} 1 & r(X_1, X_2) & \dots & r(X_1, X_n) \\ r(X_2, X_1) & 1 & \dots & r(X_2, X_n) \\ \vdots & \ & \ddots \\ r(X_n, X_1) & r(X_n, X_2) & \dots & 1 \\ \end{bmatrix}\]

Z-Score

\[z = \frac{x - \mu}{\sigma}\]

where \(x\) is the data point, \(\mu\) is the mean and \(\sigma\) is the standard deviation.
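A minimal z-score sketch: how many standard deviations a data point lies from the mean. The numbers below are illustrative only.

```python
def z_score(x, mu, sigma):
    # (data point - mean) / standard deviation
    return (x - mu) / sigma

# Hypothetical example: score 130 against mean 100, sd 15
z = z_score(x=130.0, mu=100.0, sigma=15.0)
print(z)  # 2.0
```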

Random Variables

Discrete Random Variables

Probability Mass Function (pmf), \(p(x)\) for a discrete random variable \(X\) is given by

\[ p(x) = P(X = x) = P(\{\omega : X(\omega) = x\}) \]

where \(x\) is a value of the discrete RV \(X\) and belongs to the set of real numbers and maps to the outcomes \(\omega\), and \((X = x)\) is the corresponding event. Also, \(p(x) \geq 0\) and \(\sum_{x \in D} p(x) = 1\)

Continuous Random Variables

For any two numbers a and b, the probability density function (pdf) of a continuous RV \(X\) gives

\[ P(a \leq X \leq b) = \int_a^b f(x) \ dx \]

where \(f(x) \geq 0\) for all \(x\) and \(\int_{-\infty}^{\infty} f(x) \ dx = 1\) is the area under the entire graph of \(f(x)\).

\(f(x)dx\) is the probability that \(X\) is in an infinitesimal range around \(x\) of width \(dx\).

Also, the probability of a continuous RV at any single value is zero:

\[ P(X = c) = 0 \]

Expected or Mean value

\[ \mu_X = E(X) = \sum_{x \in D} x \cdot p(x) \]

where X is a discrete RV with set of possible values D and pmf p(x)

If \(p(x_i)=p(x_j) \ \forall \ i,j\), i.e. each outcome has the same probability and hence equal likelihood of happening, then

\[ E(X) = \frac{1}{k} \sum_{i=1}^k x_i \]

where 1/k is the probability of each of the k terms with equal likelihood

For any linear function \(h(X) = aX + b\),

\[ E[h(X)] = E(aX + b) = aE(X) + b \]

\[ \mu_X = E(X) = \int_{-\infty}^{\infty}x.f(x) \ dx \]
\[ \mu_{h(X)} = E[h(X)] = \int_{-\infty}^{\infty}h(x).f(x) \ dx \]

Rules of Expected Value

\[ E(X + Y) = E(X) + E(Y) \]
\[ E(X - Y) = E(X) - E(Y) \]
\[ E(aX + bY) = aE(X) + bE(Y) \]

If \(X\) and \(Y\) are independent,

\[ E(XY) = E(X) \cdot E(Y) \]

For any linear function \(h(X) = aX + b\),

\[ E(aX + b) = aE(X) + b \]

Variance

\[ V(X) = \sigma_X^2 = \sum_D(x - \mu)^2 . p(x) = \sum_{i=1}^n (x_i - \mu)^2 . p(x_i) \]
\[ = E[(x- \mu)^2] \]
\[ = \biggl[\sum_D x^2 .p(x)\biggl] - \mu^2 \]
\[ = E(X^2) - [E(X)]^2 \]
\[ = E(X^2) - \mu^2 \]

For any linear function \(h(X) = aX + b\),

\[ V(aX + b) = a^2 V(X) \]

\[ V(X) = \sigma_X^2 = \int_{-\infty}^{\infty}(x - \mu)^2 . f(x) \ dx \]
\[ = E[(x- \mu)^2] \]
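The discrete expectation and variance formulas above can be checked numerically. This sketch uses a fair six-sided die as an illustrative pmf and verifies the shortcut \(V(X) = E(X^2) - [E(X)]^2\) against the direct definition.

```python
# pmf of a fair six-sided die: p(x) = 1/6 for x in 1..6
pmf = {x: 1 / 6 for x in range(1, 7)}

mu = sum(x * p for x, p in pmf.items())                  # E(X)
ex2 = sum(x ** 2 * p for x, p in pmf.items())            # E(X^2)
var_direct = sum((x - mu) ** 2 * p for x, p in pmf.items())
var_shortcut = ex2 - mu ** 2                             # E(X^2) - mu^2

assert abs(var_direct - var_shortcut) < 1e-12
print(mu, var_direct)  # 3.5 and ~2.9167 (= 35/12)
```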

Standard Deviation

\[ \sigma_X = \sqrt{\sigma_X^2} \]

Rules of Variance & Standard Deviation

\[ Var(aX + bY) = a^2Var(X) + b^2Var(Y) + 2ab Cov(X,Y) \]
\[ Var(aX - bY) = a^2Var(X) + b^2Var(Y) - 2ab Cov(X,Y) \]

If \(X\) and \(Y\) are independent,

\[ Var(aX + bY) = a^2Var(X) + b^2Var(Y) \]

Standard Deviation,

\[ \sigma_{aX + bY} = \sqrt{a^2\sigma_X^2 + b^2\sigma_Y^2} \]

For any linear function \(h(X) = aX + b\),

\[ V(aX + b) = a^2\sigma_X^2 \ \ and \ \ \sigma_{aX + b} = |a|\sigma_X \]

\(E[(X - a)^2]\) is a minimum when \(a = \mu = E(X)\)

Variance of any constant is zero and if a random variable has zero variance, then it is essentially constant.

Covariance

\[ Cov(X, Y) = \sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)] = E(XY) - E(X)E(Y) \]
\[ =\sum_x\sum_y(x - \mu_X)(y - \mu_Y)p(x,y) = \biggl(\sum_x\sum_y xyp(x,y)\biggl) - \mu_X\mu_Y \]
\[ =\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(x - \mu_X)(y - \mu_Y)f(x,y) \ dx \ dy =\biggl(\int_a^b\int_c^dxyf(x,y) \ dx \ dy \biggl) - \mu_X\mu_Y \]

Rules of Covariance

\[ Cov(X, X) = E[(X - \mu_X)^2] = V(X) \]
\[ Cov(aX + b, cY + d) = acCov(X,Y) \]
\[ Cov(X, Y+Z) = Cov(X,Y) + Cov(X,Z) \]

Correlation

\[ Corr(X, Y) = \rho_{X, Y} = \frac{Cov(X, Y)}{\sigma_X \sigma_Y} \]

Rules of Correlation

  • If \(a\) and \(c\) are either both positive or both negative, \(Corr(aX + b, cY + d) = Corr(X, Y)\)
  • If \(ac \lt 0\), \(Corr(aX + b, cY + d) = -Corr(X, Y)\)
  • For any two rv’s X and Y, \(-1 \leq Corr(X, Y) \leq 1\)

Binomial RV

\[ \mu_X = E(X) = np \]

where \(n\) is the number of trials and \(p\) is the probability of success and \(q\) is the probability of failure in a single trial.

\[ \sigma_x^2 = np(1 - p) = npq \]

where \(q=1-p\)

\[ \sigma_x = \sqrt{np (1 - p)} = \sqrt{npq} \]

Bernoulli RV

\[ \mu_X = p = p * 1 + (1 - p) * 0 \]

\[ \sigma_x^2 = p(1 - p) = pq \]

where \(q=1-p\)

\[ \sigma_x = \sqrt{p (1 - p)} = \sqrt{pq} \]

Geometric RV

\[ \mu_X = E(X) = \frac{1}{p} \]
\[ \sigma^2_X = \frac{1 - p}{p^2} \]
\[ \sigma_x = \sqrt{\frac{1 - p}{p^2}} \]
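The geometric mean \(E(X) = 1/p\) can be checked numerically by summing the series \(\sum_k k \cdot p(1-p)^{k-1}\), treating \(X\) as the trial on which the first success occurs. The value of \(p\) and the truncation point below are illustrative choices.

```python
# Numerical check that E(X) = 1/p for a geometric RV with pmf p(1-p)^(k-1), k = 1, 2, ...
p = 0.25
# Truncate the infinite series; (1-p)^2000 is astronomically small here
ex = sum(k * p * (1 - p) ** (k - 1) for k in range(1, 2000))
print(round(ex, 6))  # ~4.0 = 1/p
```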

Distributions

Normal Distribution

\[ f(x; \mu, \sigma) = \frac{1}{\sqrt{2 \pi \sigma^2}}e^{-(x-\mu)^2/(2 \sigma^2)} \]

where \(-\infty \lt x \lt \infty, -\infty \lt \mu \lt \infty, 0 \lt \sigma\)

For n independent RVs,

\[ f(x_1,...,x_n;\mu,\sigma^2) = \frac{1}{\sqrt{2 \pi \sigma^2}}e^{-(x_1 -\mu)^2/(2 \sigma^2)} \cdot ... \cdot \frac{1}{\sqrt{2 \pi \sigma^2}}e^{-(x_n -\mu)^2/(2 \sigma^2)} \]
\[ = \biggl(\frac{1}{2 \pi \sigma^2}\biggl)^{n/2}e^{-\sum (x_i-\mu)^2/(2 \sigma^2)} \]
\[ F(x) = P(X \leq x) = \frac{1}{\sigma \sqrt{2 \pi}} \int_{-\infty}^x e^{-(v-\mu)^2/2 \sigma^2} dv \]
\[ P(a \leq X \leq b) = \int_a^b\frac{1}{\sqrt{2 \pi \sigma^2}}e^{-(x-\mu)^2/(2 \sigma^2)}dx \]
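The normal CDF integral above has no closed form, but it can be expressed via the error function `math.erf` from the standard library. This sketch evaluates the pdf and computes \(P(a \leq X \leq b)\) as a difference of CDF values; the interval chosen is the familiar one-standard-deviation band.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # f(x; mu, sigma) = exp(-(x - mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def normal_cdf(x, mu=0.0, sigma=1.0):
    # F(x) written in terms of the error function erf
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# P(-1 <= X <= 1) for the standard normal: the "68%" band
p = normal_cdf(1.0) - normal_cdf(-1.0)
print(round(p, 4))  # ~0.6827
```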

Standard Normal Distribution (z-Distribution), \(Z \sim N(\mu=0, \sigma = 1)\)

\[ f(z; 0, 1) = \frac{1}{\sqrt{2 \pi}}e^{-z^2/2} \]

where \(-\infty \lt z \lt \infty\)

The CDF is obtained as the area under \(\phi\), to the left of \(z\).

\[ \phi(z) = P(Z \leq z) = \int_{-\infty}^z f(y;0,1)dy \]
\[ =\frac{1}{\sqrt{2 \pi}} \int_{-\infty}^z e^{-u^2/2}du \]

where the area under the standard normal curve to the left of 0 (between \(-\infty\) and 0) is 1/2.

For any \(c \gt 0\),

\[ P(Z \gt c) = 1 - \phi(c) = \phi(-c) \]

Binomial Distribution

\[ b(x; n,p) = \begin{cases} \binom nxp^x(1-p)^{n-x}, & \ x=0,1,2,3,...,n \\ 0, & \ otherwise \end{cases} \]
\[ B(x; n,p) = P(X \leq x) = \sum_{y=0}^xb(y;n,p) \ \ x=0,1,2,...,n \]
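Rather than reading values off a cumulative binomial table, the pmf and CDF above can be computed directly; `math.comb` supplies the binomial coefficient. The parameters below are illustrative.

```python
import math

def binom_pmf(x, n, p):
    # b(x; n, p) = C(n, x) * p^x * (1 - p)^(n - x)
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def binom_cdf(x, n, p):
    # B(x; n, p) = sum of b(y; n, p) for y = 0..x
    return sum(binom_pmf(y, n, p) for y in range(x + 1))

print(binom_pmf(3, 10, 0.5))   # P(X = 3) = 120/1024 = 0.1171875
print(binom_cdf(10, 10, 0.5))  # whole support sums to 1
```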

Note: Use the cumulative binomial probability table to find the PMF and CDF values.

The binomial distribution approaches the normal distribution when \(n\) is large. With the continuity correction,

\[ P(X \leq x) = B(x; n,p) \approx (area \ under \ normal \ curve \ to \ the \ left \ of \ x + .5) \]

provided \(np \geq 10\) and \(nq \geq 10\)

Bernoulli Distribution

If \(P(X=1)=\alpha\), then \(P(X=0) = 1 - \alpha\). Hence

\[ F(x; \alpha) = \begin{cases} 0, & \ x \lt 0 \\ 1 - \alpha, & \ 0 \leq x \lt 1 \\ 1, & \ x \geq 1 \end{cases} \]

Geometric Distribution

\[ p(x; p) = (1 - p)^x p \ \ x=0,1,2,... \]

where \(x\) is the number of failures before the first success

\[ F(x; p) = P(X \leq x) = 1 - (1 - p)^{[x] + 1} \]

where \([x]\) is the largest integer \(\leq x\)

Poisson Distribution

\[ p(x; \mu) = P(X = x) = \frac{\mu^x e^{- \mu}}{x!} \ \ x=0,1,2... \]

From Maclaurin series expansion of \(e^\mu\):

\[ \sum_{x=0}^\infty \frac{\mu^x e^{-\mu}}{x!} = e^{-\mu} \sum_{x=0}^\infty \frac{\mu^x}{x!} = e^{-\mu} \cdot e^{\mu} = 1 \]

Probability that \(k\) events will be observed during any particular time interval of length \(t\):

\[ P_k(t) = \frac{e^{-\alpha t}(\alpha t)^k}{k!} \]

where \(\mu = \alpha t\) and \(\alpha\) is the rate of the event process, the expected number of events occurring in unit time. Also see Poisson Probability

\[ P(X_1 \leq t) = 1 - P(X_1 \gt t) = 1 - e^{-\alpha t} \]

where \(X_1\) is the waiting time until the first event

Also,

\[ F(x; \mu) = P(X \leq x) = e^{-\mu}\sum_{k=0}^{[x]} \frac{\mu^k}{k!} \]

And,

\[ F_X(x) = \frac{\Gamma(x + 1, \mu)}{x!} \]

where \(\Gamma\) is the upper incomplete gamma function, a special function that is normally defined in terms of an integral

\[ \Gamma(s,x) = \int_x^\infty t^{s - 1}e^{-t} dt \]


\[ E(X) = \mu \]
\[ V(X) = \mu \]

The Poisson distribution approaches the normal distribution as \(\mu\) becomes large, where \(\frac{X - \mu}{\sqrt\mu}\) is the standardized random variable.

Student’s t-Distribution

Degrees of Freedom

\[df = n - 1\]

where \(n\) is the sample size

Also see CI for Difference of Means and CI for Pooled Procedures for T Distribution for degrees of freedom for two sample tests.

Sampling

Finite Population Correction Factor (FPC)

\[ \frac{N - n}{N - 1} \]

applied to the variance, or

\[ \sqrt \frac{N - n}{N - 1} \]

applied to the standard deviation, when sampling without replacement from a finite population of size \(N\).

Sampling Distribution of the Sample Mean (SDSM)

\[ \mu_{\overline X} = \mu_{\overline x} = \mu \]
\[ \sigma_{\overline X}^2 = \sigma_{\overline x}^2 = \frac{\sigma^2}{n} \approx \frac{s^2}{n} \]

The standard deviation \(\sigma_{\overline x} = \frac{\sigma}{\sqrt n}\) is also called the Standard Error (SE)

\[ z = \frac{\overline x - \mu}{\frac {\sigma}{\sqrt n}} \]

Sampling Distribution of the Sample Proportion (SDSP)

\[ \mu_{\hat p} = p \]
\[ \sigma_{\hat p}^2 = \frac{p (1 - p)}{n} \]
\[ \sigma_{\hat p} = \sqrt \frac{p (1 - p)}{n} \]
\[ z_{\hat p} = \frac{\hat p - p}{\sigma_{\hat p}} \]

Sampling Distribution of the Difference of Means (SDDM)

\[ \mu_{\overline x_1 - \overline x_2} = \mu_{\overline x_1} - \mu_{\overline x_2} \]
\[ \sigma_{\overline x_1 - \overline x_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]

Pooled Procedures

\[ \sigma_p^2 = \frac{(n_1 - 1)\sigma_1^2 + (n_2 - 1)\sigma_2^2}{n_1 + n_2 - 2} \]
\[ \sigma_{\overline x_1 - \overline x_2} = \sigma_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \]

where \(\sigma_p^2\) is the pooled variance, assuming \(\sigma_1^2 = \sigma_2^2\)

Sampling Distribution of the Difference of Proportions (SDDP)

\[ \hat p_1 - \hat p_2 = \frac{x_1}{n_1} - \frac{x_2}{n_2} \]
\[ \sigma_{\hat p_1 - \hat p_2} = \sqrt{\frac{\hat p_1 (1 - \hat p_1)}{n_1} + \frac{\hat p_2 (1 - \hat p_2)}{n_2}} \]

Confidence Interval

\[ CL = 1 - \alpha \]

where CL is the Confidence Level

For a normal distribution with mean \(\mu\) and standard deviation \(\sigma\), \(\overline x\) is the point estimate of the mean \(\mu\).

A 100(1 - \(\alpha\))% CI for the \(\mu\) of a normal distribution with standard deviation \(\sigma\) is

\[\overline x - z_{\alpha/2} \cdot \frac{\sigma}{\sqrt n} \lt \mu \lt \overline x + z_{\alpha/2} \cdot \frac{\sigma}{\sqrt n}\]
\[ CI = \overline x \pm z_ {\alpha/2} \cdot \frac{\sigma}{\sqrt n} \sqrt{\frac{N - n}{N - 1}} \]
\[ CI = \overline x \pm t_{\alpha, n - 1} \cdot \frac{s}{\sqrt n} \]

Also see Confidence Interval for One and Two Tailed Tests
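A sketch of the z-based confidence interval above for a known \(\sigma\); `statistics.NormalDist().inv_cdf` supplies the critical value \(z_{\alpha/2}\) (the sample numbers are illustrative).

```python
import math
import statistics

def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    alpha = 1 - confidence
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, ~1.96 at 95%
    me = z * sigma / math.sqrt(n)                        # margin of error
    return xbar - me, xbar + me

# Hypothetical sample: x-bar = 50, sigma = 10, n = 100
lo, hi = z_confidence_interval(xbar=50.0, sigma=10.0, n=100, confidence=0.95)
print(round(lo, 2), round(hi, 2))  # ~48.04 ~51.96
```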

Bound on the error of estimation, B or Margin of Error, ME

\[ ME = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt n} \]

\[ \implies CI = \overline x \pm ME \]

Required sample size to estimate \(\mu\) with a \(100(1 - \alpha)\)% confidence and a fixed ME

\[ n = \biggl(\frac{z_{\alpha/2} \cdot \sigma}{ME}\biggl)^2 \]

Confidence Interval for the Proportion

\[ CI = \hat p \pm z_ {\alpha/2} \cdot \sqrt {\frac {\hat p (1 - \hat p)}{n}} \]

where \(\hat p\) is the proportion of the subjects that meet the criteria

See SDSP Condition for Inferences

\[ CI = \hat p \pm z_ {\alpha/2} \cdot \sqrt {\frac {\hat p (1 - \hat p)}{n}} \sqrt{\frac{N - n}{N - 1}} \]

Margin of Error

\[ z_ {\alpha/2} \cdot \sqrt{\frac {\hat p (1 - \hat p)}{n}} \]

Confidence Interval for Difference of Means

\[ CI = (\overline x_1 - \overline x_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]
\[ CI = (\overline x_1 - \overline x_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]

with degree of freedom

\[ df = \frac{\biggl(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \biggl)^2}{\frac{1}{n_1 - 1}\biggl(\frac{s_1^2}{n_1}\biggl)^2 + \frac{1}{n_2 - 1}\biggl(\frac{s_2^2}{n_2}\biggl)^2} \]

where \(se_1 = \frac{s_1}{\sqrt n_1}, se_2 = \frac{s_2}{\sqrt n_2}\)
Note: Round down when degree of freedom is not an integer, so that the estimate is more conservative

Confidence Interval for Pooled Procedures

\[ CI = (\overline x_1 - \overline x_2) \pm t_{\alpha/2} * s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \]

with degrees of freedom

\[ df = n_1 + n_2 - 2 \]

Confidence Interval for Matched-Pair Test

where \(\overline d\) is the mean difference for the matched pair.

\[ CI = \overline d \pm t_{\alpha/2} * \frac{s_d}{\sqrt n} \]

with degrees of freedom \(df = n - 1\), where \(n\) is the number of pairs

Confidence Interval for Difference of Proportions

\[ CI = (\hat p_1 - \hat p_2) \pm z_{\alpha/2}\sqrt{\frac{\hat p_1 (1 - \hat p_1)}{n_1} + \frac{\hat p_2 (1 - \hat p_2)}{n_2}} \]

Hypothesis Testing

Type 1 Error Rate

\[ \alpha=P(rejecting \ H_0|H_0) \]
\[ =P(p-value \leq \text{significance level} | H_0) \]

Power

\[ Power = 1 - \beta \]

where \(\beta\) is the probability of a Type 2 error (failing to reject a false \(H_0\))

One and Two Tailed Tests

Upper (right) tailed test:

\[ H_A: \mu \gt \mu_0 \]
\[ H_0: \mu \leq \mu_0 \]

Lower (left) tailed test:

\[ H_A: \mu \lt \mu_0 \]
\[ H_0: \mu \geq \mu_0 \]

Two tailed test:

\[ H_A: \mu \neq \mu_0 \]
\[ H_0: \mu = \mu_0 \]

Confidence Interval for One and Two Tailed Tests

For an upper tailed test, the CI gives a lower confidence bound:

Normal Distribution

\[ \mu \gt \overline x - z_{\alpha} \cdot \frac{\sigma}{\sqrt n} \]

T Distribution

\[ \mu \gt \overline x - t_{\alpha, n - 1} \cdot \frac{s}{\sqrt n} \]

For a lower tailed test, the CI gives an upper confidence bound:

Normal Distribution

\[ \mu \lt \overline x + z_{\alpha} \cdot \frac{\sigma}{\sqrt n} \]

T Distribution

\[ \mu \lt \overline x + t_{\alpha, n - 1} \cdot \frac{s}{\sqrt n} \]

For a two tailed test:

Normal Distribution

\[ CI = \overline x \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt n} \]

T Distribution

\[ CI = \overline x \pm t_{\alpha/2, n - 1} \cdot \frac{s}{\sqrt n} \]

Test Statistic

\[ z = \frac{\overline x - \mu_0}{\frac{\sigma}{\sqrt n}} \]

where \(\mu_0\) is the Null Hypothesis Mean

\[ t = \frac{\overline x - \mu_0}{\frac{s}{\sqrt n}} \]

where \(df = n - 1\) is the degrees of freedom

Test Statistic for the Proportion

\[ z = \frac{\hat p - p_0}{\sqrt{\frac{p_0 (1 - p_0)}{n}}} \]

Test Statistic for Difference of Means

\[ t = \frac{(\overline x_1 - \overline x_2) - (\mu_1 - \mu_2)}{\sqrt {\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]

with degrees of freedom as shown in CI for Difference of Means

Test Statistic for Pooled Procedures

\[ t = \frac{(\overline x_1 - \overline x_2) - (\mu_1 - \mu_2)}{s_p\sqrt {\frac{1}{n_1} + \frac{1}{n_2}}} \]

with degrees of freedom as shown in CI for Pooled Procedures

Test Statistic for Matched-Pair Test

where \(\mu_d\) is the Null Hypothesis Mean Difference

\[ t = \frac{\overline d - \mu_d}{s_d/\sqrt n} \]

Test Statistic for Difference of Proportions

\[ z = \frac{(\hat p_1 - \hat p_2) - (p_1 - p_2)}{\sqrt {\hat p (1 - \hat p)(\frac{1}{n_1} + \frac{1}{n_2})}} \]

with the pooled proportion

\[ \hat p = \frac{\hat p_1 n_1 + \hat p_2 n_2}{n_1 + n_2} = \frac{x_1 + x_2}{n_1 + n_2} \]

where \(\hat p_1 n_1\) and \(\hat p_2 n_2\) are the number of successes in each sample
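The two-proportion test statistic above, under \(H_0: p_1 = p_2\), can be sketched as follows; the success counts and sample sizes are illustrative.

```python
import math

def two_prop_z(x1, n1, x2, n2):
    # z statistic for the difference of proportions under H0: p1 = p2
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)                         # pooled proportion
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se

# Hypothetical counts: 60/100 successes vs 45/100 successes
z = two_prop_z(x1=60, n1=100, x2=45, n2=100)
print(round(z, 3))  # ~2.124
```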

p-value

For an upper tailed test (\(H_A: \mu \gt \mu_0\)), the p-value is the area under the standard normal curve to the right of \(z\).

For a lower tailed test (\(H_A: \mu \lt \mu_0\)), the p-value is the area under the standard normal curve to the left of \(z\).

For a two tailed test (\(H_A: \mu \neq \mu_0\)), the p-value is the sum of the areas under the standard normal curve to the left of \(-|z|\) and to the right of \(|z|\).

Simple Linear Regression Model

\[ y = \beta_0 + \beta_1 x + u \]

If all other factors (disturbance) are fixed, i.e. if \(\Delta u=0\), then

\[ \Delta y = \beta_1 \Delta x \]

Simplified Form (Equation of Line)

\[ y = \beta_0 + \beta_1 x = a + bx \]

Slope

\[ \beta_1 = b = \frac{n \sum {xy} - \sum x \sum y}{n \sum x^2 - (\sum x)^2} \]

where \(n\) is the number of data points

Y-Intercept

\[ \beta_0 = a = \frac{\sum y - b \sum x}{n} \]
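The slope and intercept summation formulas above map directly to code; the data points below are illustrative (chosen near the line y = 2x).

```python
# Illustrative data, roughly on y = 2x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.0, 8.1, 9.9]

n = len(xs)
sx, sy = sum(xs), sum(ys)
sxy = sum(x * y for x, y in zip(xs, ys))
sxx = sum(x * x for x in xs)

b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # slope
a = (sy - b * sx) / n                          # y-intercept

print(round(a, 4), round(b, 4))  # ~0.06 ~1.98
```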

Errors

\[ E(u) = 0 \]
\[ E(u|x)=E(u) \implies E(u|x) = 0 \]

Residual

\[ \hat u_i = y_i - \hat y_i = y_i - \hat \beta_0 - \hat \beta_1x_i \]

where \(\hat u_i\), \(\hat y_i\), \(\hat \beta_0\) and \(\hat \beta_1\) are the estimated values. Here the residual \(\hat u_i\) is different from the error term \(u\).

Residual Properties

\[ \sum_{i=1}^n \hat u_i = 0 \]
\[ \overline {\hat u} = 0 \]
\[ \sum_{i=1}^n x_i \hat u_i = 0 \]
\[ \overline y = \hat \beta_0 + \hat \beta_1 \overline x \]

Measures of Variation

Sum of Squared Residuals

\[ \sum_{i=1}^n \hat u_i^2 = \sum_{i=1}^n (y_i - \hat y_i)^2= \sum_{i=1}^n (y_i - \hat \beta_0 - \hat \beta_1 x_i)^2 \]

Explained Sum of Squares

\[ SSE = \sum_{i=1}^n(\hat y_i - \overline y)^2 \]

Residual Sum of Squares

\[ SSR = \sum_{i=1}^n \hat u_i^2 = \sum_{i=1}^n (y_i - \hat y_i)^2 \]

Total Sum of Squares

\[ SST = \sum_{i=1}^n(y_i - \overline y)^2 \]
\[ SST = SSE + SSR \]

Coefficient of Determination

\[ R^2 = 1 - \frac{SSR}{SST} \]
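The sums of squares above, and the identity SST = SSE + SSR (which holds for a least-squares fit with an intercept), can be verified on a small illustrative dataset by fitting the line first and then decomposing the variation.

```python
# Illustrative data; fit the least-squares line, then decompose variation
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.5, 5.5, 8.0]

n = len(xs)
b = (n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)) / \
    (n * sum(x * x for x in xs) - sum(xs) ** 2)
a = (sum(ys) - b * sum(xs)) / n
y_hat = [a + b * x for x in xs]

ybar = sum(ys) / n
sse = sum((yh - ybar) ** 2 for yh in y_hat)             # explained
ssr = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))    # residual
sst = sum((y - ybar) ** 2 for y in ys)                  # total

assert abs(sst - (sse + ssr)) < 1e-9   # SST = SSE + SSR for an OLS fit
r2 = 1 - ssr / sst
print(round(r2, 4))  # ~0.9757
```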

Adjusted R-Squared

\[ \overline R^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1} \]

where \(n\) is the sample size and \(k\) is the number of independent variables

Mean Square of Regression

\[ MSR = \frac{SSR}{degrees \ of \ freedom \ of \ SSR} \]

Root Mean Square Error

\[ RMSE = \sqrt {\frac{SSR}{n}} \]

Multiple Linear Regression Model

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_n x_n \]

Tolerance

\[ T = 1 - R^2 \]

where \(R^2\) is the coefficient of determination

Variance Inflation Factor

\[ VIF = \frac{1}{T} = \frac{1}{1 - R^2} \]

Durbin Watson Statistic

\[ d = \frac{\sum_{t=2}^T (e_t - e_{t-1})^2}{\sum_{t=1}^T e_t^2} \]
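The Durbin-Watson statistic above sums squared differences of consecutive residuals over the sum of squared residuals; values near 2 suggest no first-order autocorrelation. The residual series below is illustrative.

```python
def durbin_watson(e):
    # d = sum_{t=2..T} (e_t - e_{t-1})^2 / sum_{t=1..T} e_t^2
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(et ** 2 for et in e)
    return num / den

# Hypothetical residuals that alternate in sign (negative autocorrelation, d > 2)
resid = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2]
print(round(durbin_watson(resid), 3))
```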

Homoscedasticity

\[ var(u|x_1,...,x_k) = \sigma^2\]

Polynomial Regression Model

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + ... + \beta_n x_1^n \]

Chi Square Tests

Expected Values

\[ Expected_i = \frac{Observed_{x_{i,tot}} \cdot Observed_{y_{i,tot}}}{Observed_{tot}} \]

where \(Observed_{x_{i,tot}}\) and \(Observed_{y_{i,tot}}\) are the total observed counts corresponding to the ith group of the categorical variables \(x\) and \(y\) respectively and \(Observed_{tot}\) is the total count for all variables

\[ \chi^2 = \sum{\frac{(observed - expected)^2}{expected}} \]

Degrees of Freedom

\[ df = (rows - 1)(columns - 1) \]

For the goodness of fit test, Expected Values

\[ Expected_i = p_i \cdot Observed_{tot} \]

where \(p_i\) is the hypothesized proportion for the ith category and \(Observed_{tot}\) is the total count for all variables

Degrees of Freedom

\[ df = k - 1 \]

where \(k\) is the number of categories

Support Vector Regression Model

Minimize

\[ \frac{1}{2}||w||^2 + C \sum_{i=1}^m (\xi_i + \xi_i^*) \]

where \(\xi_i\) and \(\xi_i^*\) are slack variables for points above and below the \(\epsilon\)-insensitive tube and \(C \sum_{i=1}^m (\xi_i + \xi_i^*)\) is the penalty term, weighted by the regularization constant \(C\).

Logistic Regression Model

\[ logit(p_i) = ln\biggl(\frac{p_i}{1 - p_i}\biggl) = \beta_0 + \beta_1x_{1,i}+\dots+\beta_mx_{m,i} \]

where \(p_i\) is the probability of success, \(\frac{p_i}{1 - p_i}\) is the odds of success and \(ln \frac{p_i}{1 - p_i}\) is the log of the odds of success

Sigmoid Function

Solving the above for p, we get

\[ p = \frac{1}{1 + e^{-z}} \]

where \(z = \beta_0 + \beta_1x_1+ \dots\)

Likelihood Function

Intuition

Also see Mean tab for Binomial RV and Bernoulli RV

Likelihood for samples labelled as 1:

\[ \prod_{i: y_i = 1} p(x_i) \]

where \(x_i\) represents the feature vector for the \(i^{th}\) sample

Likelihood for samples labelled as 0:

\[ \prod_{i: y_i = 0} (1 - p(x_i)) \]

Overall Likelihood:

\[ L(\beta) = \prod_{i=1}^n p(x_i)^{y_i}(1 - p(x_i))^{1 - y_i} \]

Log Likelihood:

\[ ln \ L(\beta) = \sum_{i=1}^n \bigl[ y_i \ ln \ p(x_i) + (1 - y_i) \ ln(1 - p(x_i)) \bigr] \]

KMeans Clustering

WCSS

\[ WCSS = \sum_{i=1}^m distance(P_{i1}, C_1)^2 + \sum_{i=1}^m distance(P_{i2}, C_2)^2 + \dots + \sum_{i=1}^m distance(P_{in}, C_n)^2 \]

where \(P_{i1}\) is the \(i^{th}\) point in cluster 1, \(C_1\) is the center of cluster 1, \(m\) is the number of points in a cluster, \(n\) is the number of clusters and distance is the Euclidean distance between a point and the center of the cluster

Gradient Descent

\[ f'(m, b) = \begin{bmatrix} \frac{df}{dm} \\ \frac{df}{db} \end{bmatrix} = \begin{bmatrix} \frac{1}{N} \sum_i^N { -2x_i(y_i - (mx_i + b)) } \\ \frac{1}{N} \sum_i^N { -2(y_i - (mx_i + b)) } \end{bmatrix} \]

Cost Function

\[ f(m, b) = \frac{1}{N} \sum_i^N { (y_i - (mx_i + b))^2 }\]
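The gradient and cost function above can be wired into a batch gradient descent loop for the line y = mx + b. The data, learning rate, and iteration count below are illustrative choices; the points are generated from y = 2x + 1 so convergence is easy to verify.

```python
# Illustrative data generated by y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

m, b = 0.0, 0.0   # initial parameters
lr = 0.05         # learning rate (illustrative)
n = len(xs)

for _ in range(5000):
    # Partial derivatives of f(m, b) = (1/N) * sum (y_i - (m*x_i + b))^2
    dm = sum(-2 * x * (y - (m * x + b)) for x, y in zip(xs, ys)) / n
    db = sum(-2 * (y - (m * x + b)) for x, y in zip(xs, ys)) / n
    m -= lr * dm
    b -= lr * db

print(round(m, 3), round(b, 3))  # converges toward m=2, b=1
```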

Time Series Concepts

Exponential Smoothing

\[ F_{t + 1} = \alpha y_t + \alpha (1 - \alpha) y_{t -1} + \alpha (1 - \alpha)^2 y_{t -2} + \dots\]

where \(0 \leq \alpha \leq 1\) is the smoothing constant
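The weighted series above is equivalent to the recursive form \(F_{t+1} = \alpha y_t + (1 - \alpha) F_t\), which is what one implements in practice. The series, smoothing constant, and seed forecast below are illustrative.

```python
def exp_smooth(ys, alpha, f0):
    # Recursive form: F_{t+1} = alpha * y_t + (1 - alpha) * F_t
    f = f0                       # initial forecast (seed, an assumption)
    for y in ys:
        f = alpha * y + (1 - alpha) * f
    return f                     # forecast for the next period

series = [10.0, 12.0, 11.0, 13.0]
print(exp_smooth(series, alpha=0.5, f0=10.0))  # 12.0
```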

Lag k Differencing

\[ Difference = Y_t - Y_{t-k} \]

Autoregressive Model (AR)

\[ x_t = \phi_0 + \phi_1 x_{t - 1} + \epsilon_t \]

where \(t\) is the current period, \(t - 1\) is the previous period, \(x_t\) is the value for the current period, \(x_{t - 1}\) is the value for the previous period, \(\phi_1\) is the coefficient for the previous period value and \(\epsilon_t\) is the residual error and should be just some unpredictable "white noise"

and \(-1 \lt \phi_1 \lt 1\)

Pct Change

\[ \frac{x_t - x_{t - 1}}{x_{t - 1}} \]

Moving Average Model (MA)

\[ x_t = \phi_0 + \phi_1 \epsilon_{t - 1} + \epsilon_t \]

where \(t\) is the current period, \(t - 1\) is the previous period, \(\epsilon_t\) is the residual for the current period, \(\epsilon_{t - 1}\) is the residual for the previous period and \(\phi_1\) is the coefficient for the previous period value

and \(-1 \lt \phi_1 \lt 1\)