Cheatsheet
Basics
Mean
Also see Random Variables: Expected or Mean Value
Weighted Mean
Grouped Data Mean
where \(M_i\) is the midpoint and \(f_i\) is the frequency of each class
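The three mean formulas above can be sketched in plain Python; the data values below are made up for illustration:

```python
# Arithmetic mean: sum of the values divided by the count
data = [4, 8, 15, 16, 23, 42]
mean = sum(data) / len(data)

# Weighted mean: sum(w_i * x_i) / sum(w_i)
values = [70, 80, 90]
weights = [1, 2, 3]
weighted_mean = sum(w * x for w, x in zip(weights, values)) / sum(weights)

# Grouped-data mean: sum(f_i * M_i) / sum(f_i), using class midpoints M_i
midpoints = [5, 15, 25]   # M_i
freqs = [2, 5, 3]         # f_i
grouped_mean = sum(f * m for f, m in zip(freqs, midpoints)) / sum(freqs)

print(mean, weighted_mean, grouped_mean)
```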
Variance
Biased
Unbiased
Also see Random Variables: Variance
Grouped Data Variance
Standard Deviation
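A quick sketch of biased vs. unbiased variance and the standard deviation, with toy data chosen so the answers come out round:

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mu = sum(data) / n  # mean is 5

# Biased (population) variance: divide by n
var_biased = sum((x - mu) ** 2 for x in data) / n
# Unbiased (sample) variance: divide by n - 1 (Bessel's correction)
var_unbiased = sum((x - mu) ** 2 for x in data) / (n - 1)

# Standard deviation is the square root of the variance
sd_biased = math.sqrt(var_biased)

print(var_biased, var_unbiased, sd_biased)
```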
Covariance
Also see Random Variables: Covariance
Variance Covariance Matrix
Correlation
where \(S_x\) and \(S_y\) are the standard deviations of x and y, and \(z_x\) and \(z_y\) are the z-scores for x and y.
Pearson Correlation Coefficient
where \(\sigma_{xy}\) is the covariance and \(\sigma_x\) and \(\sigma_y\) are the standard deviations of the variables.
Also see Random Variables: Correlation
Correlation Matrix
Z-Score
where \(x\) is the data point, \(\mu\) is the mean and \(\sigma\) is the standard deviation.
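Covariance, Pearson correlation, and the z-score fit together in a few lines of Python; the toy y below is an exact linear function of x, so r should be 1:

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]  # perfectly linear in x
n = len(x)
mx = sum(x) / n
my = sum(y) / n

# Population covariance: mean of the products of deviations
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)

# Pearson correlation: covariance divided by the product of std devs
r = cov / (sx * sy)

# Z-score of a single data point: (x - mean) / std dev
z = (5 - mx) / sx

print(cov, r, z)
```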
Random Variables
Discrete Random Variables
The probability mass function (pmf) \(p(x)\) of a discrete random variable \(X\) is \(p(x) = P(X = x)\), where \(x\) is a real value taken by \(X\) (each value corresponds to one or more outcomes \(\omega\)) and \((X = x)\) is the corresponding event. Also, \(p(x) \geq 0\) for all \(x\), and \(\sum_x p(x) = 1\).
Continuous Random Variables
For any two numbers \(a\) and \(b\) with \(a \leq b\), the probability density function (pdf) of a continuous RV \(X\) satisfies \(P(a \leq X \leq b) = \int_a^b f(x)\,dx\), where \(f(x) \geq 0\) for all \(x\) and \(\int_{-\infty}^{\infty} f(x)\,dx = 1\) is the area under the entire graph of \(f(x)\).
\(f(x)dx\) is the probability that \(X\) is in an infinitesimal range around \(x\) of width \(dx\).
Also,
Continuous RV at any single value: \(P(X = c) = 0\)
Expected or Mean value
where X is a discrete RV with set of possible values D and pmf p(x)
If \(p(x_i)=p(x_j) \ \forall \ i,j\), i.e. each of the \(k\) outcomes has the same probability \(1/k\) and hence equal likelihood of happening, then \(E(X) = \frac{1}{k}\sum_{i=1}^{k} x_i\)
For any linear function \(h(X) = aX + b\),
Rules of Expected Value
If \(X\) and \(Y\) are independent,
For any linear function \(h(X) = aX + b\),
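The expected-value definition and the linear-function rule \(E(aX + b) = aE(X) + b\) can be checked numerically; the pmf below is a made-up example:

```python
# A discrete pmf as a dict {value: probability} (toy numbers)
pmf = {0: 0.2, 1: 0.5, 2: 0.3}
assert abs(sum(pmf.values()) - 1.0) < 1e-12  # a pmf must sum to 1

# E(X) = sum over x of x * p(x)
EX = sum(x * p for x, p in pmf.items())

# Linear-function rule: E(aX + b) = a*E(X) + b
a, b = 3, 7
E_h = sum((a * x + b) * p for x, p in pmf.items())
print(EX, E_h, a * EX + b)  # the last two agree
```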
Variance
For any linear function \(h(X) = aX + b\),
Standard Deviation
Rules of Variance & Standard Deviation
If \(X\) and \(Y\) are independent, \(V(X + Y) = V(X) + V(Y)\)
Standard Deviation,
For any linear function \(h(X) = aX + b\),
\(E[(X - a)^2]\) is a minimum when \(a = E(X) = \mu\)
Variance of any constant is zero and if a random variable has zero variance, then it is essentially constant.
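The variance rule \(V(aX + b) = a^2 V(X)\) (the shift \(b\) has no effect on spread) checked against the same toy pmf:

```python
pmf = {0: 0.2, 1: 0.5, 2: 0.3}  # toy pmf for illustration
EX = sum(x * p for x, p in pmf.items())

# V(X) = E[(X - E(X))^2]
VX = sum((x - EX) ** 2 * p for x, p in pmf.items())

# Rule: V(aX + b) = a^2 * V(X)
a, b = 3, 7
Eh = sum((a * x + b) * p for x, p in pmf.items())
Vh = sum((a * x + b - Eh) ** 2 * p for x, p in pmf.items())
print(VX, Vh, a ** 2 * VX)  # the last two agree
```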
Covariance
Rules of Covariance
Correlation
Rules of Correlation
- If \(a\) and \(c\) are either both positive or both negative, \(Corr(aX + b, cY + d) = Corr(X, Y)\)
- If \(ac \lt 0\), \(Corr(aX + b, cY + d) = -Corr(X, Y)\)
- For any two rv’s X and Y, \(-1 \leq Corr(X, Y) \leq 1\)
Binomial RV
where \(n\) is the number of trials and \(p\) is the probability of success and \(q\) is the probability of failure in a single trial.
where \(q=1-p\)
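The binomial pmf \(P(X = k) = \binom{n}{k} p^k q^{n-k}\) can be built directly, and its mean and variance recover \(np\) and \(npq\); n and p below are arbitrary:

```python
from math import comb

# Binomial pmf: P(X = k) = C(n, k) * p^k * q^(n - k), with q = 1 - p
n, p = 10, 0.3
q = 1 - p
pmf = [comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]

mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
print(mean, n * p)      # mean equals n*p
print(var, n * p * q)   # variance equals n*p*q
```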
Bernoulli RV
where \(q=1-p\)
Geometric RV
Distributions
Normal Distribution
where \(-\infty \lt x \lt \infty, -\infty \lt \mu \lt \infty, 0 \lt \sigma\)
For n independent RVs, the sum is also normally distributed, with mean \(\sum_i \mu_i\) and variance \(\sum_i \sigma_i^2\)
Standard Normal Distribution (z-Distribution), \(Z \sim N(\mu=0, \sigma = 1)\)
where \(-\infty \lt z \lt \infty\)
The CDF is obtained as the area under \(\phi\), to the left of \(z\).
where the area of the standard normal curve to the left of 0 (from \(-\infty\) to 0) is 1/2.
For any \(c \gt 0\),
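The standard normal CDF \(\Phi(z)\) can be computed without tables via the error function, \(\Phi(z) = \frac{1}{2}\left(1 + \mathrm{erf}\left(z/\sqrt{2}\right)\right)\), and the symmetry \(\Phi(-z) = 1 - \Phi(z)\) falls out immediately:

```python
import math

def phi_cdf(z):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(phi_cdf(0))                        # 0.5 (half the area lies left of 0)
print(phi_cdf(1.96))                     # ~0.975
print(phi_cdf(-1.96) + phi_cdf(1.96))    # symmetry: sums to 1
```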
Binomial Distribution
Note: Use the cumulative binomial probability table to find the PMF and CDF values
The binomial distribution is well approximated by a normal distribution with \(\mu = np\) and \(\sigma = \sqrt{npq}\), provided \(np \geq 10\) and \(nq \geq 10\)
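A numeric check of the normal approximation (with the usual continuity correction); n and p are arbitrary values satisfying the \(np, nq \geq 10\) condition:

```python
import math
from math import comb

n, p = 100, 0.5  # np = nq = 50 >= 10, so the approximation applies
q = 1 - p
mu, sigma = n * p, math.sqrt(n * p * q)

# Exact P(X <= 55) from the binomial pmf
exact = sum(comb(n, k) * p ** k * q ** (n - k) for k in range(56))

# Normal approximation with continuity correction: P(Z <= (55.5 - mu) / sigma)
z = (55.5 - mu) / sigma
approx = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
print(exact, approx)  # the two agree to about three decimal places
```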
Bernoulli Distribution
If \(P(X=1)=\alpha\), then \(P(X=0) = 1 - \alpha\); hence \(E(X) = \alpha\) and \(V(X) = \alpha(1 - \alpha)\)
Geometric Distribution
where \(x\) is the number of failures before the first success
where \([x]\) is the largest integer \(\leq x\)
Poisson Distribution
From Maclaurin series expansion of \(e^\mu\):
Probability that \(k\) events will be observed during any particular time interval of length \(t\), where \(\mu = \alpha t\) and \(\alpha\) is the rate of the event process, i.e. the expected number of events occurring in unit time. Also see Poisson Probability
Also,
And,
where \(\Gamma\) is the upper incomplete gamma function, a special function that is normally defined in terms of an integral
The Poisson distribution approaches a normal distribution as \(\mu\) grows large, where \(\frac{X - \mu}{\sqrt\mu}\) is the standardized random variable
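Building the Poisson pmf \(P(X = k) = e^{-\mu}\mu^k / k!\) term by term confirms that the probabilities sum to 1 (the Maclaurin series of \(e^\mu\)) and that mean and variance both equal \(\mu\); the \(\mu\) below is arbitrary:

```python
import math

# Poisson pmf: P(X = k) = e^(-mu) * mu^k / k!
mu = 4.0
pmf = [math.exp(-mu) * mu ** k / math.factorial(k) for k in range(100)]

mean = sum(k * p for k, p in enumerate(pmf))
var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
print(sum(pmf))   # ~1 (tail beyond k = 99 is negligible for mu = 4)
print(mean, var)  # both equal mu for a Poisson RV
```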
Student's t-Distribution
Degrees of Freedom
where \(n\) is the sample size
Also see CI for Difference of Means and CI for Pooled Procedures for T Distribution for degrees of freedom for two sample tests.
Sampling
Finite Population Correction Factor (FPC)
Sampling Distribution of the Sample Mean (SDSM)
This is also called the Standard Error (SE)
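A quick simulation (made-up parameters) showing that the spread of sample means matches the standard error \(\sigma/\sqrt{n}\):

```python
import random
import statistics

random.seed(0)
sigma, n_samp, trials = 2.0, 25, 20000

# Draw many samples of size n and record each sample mean
means = [
    statistics.fmean(random.gauss(0, sigma) for _ in range(n_samp))
    for _ in range(trials)
]

se_theory = sigma / n_samp ** 0.5        # sigma / sqrt(n) = 0.4
se_empirical = statistics.pstdev(means)  # spread of the sample means
print(se_theory, se_empirical)           # the two are close
```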
Sampling Distribution of the Sample Proportion (SDSP)
Sampling Distribution of the Difference of Means (SDDM)
Pooled Procedures
where
Sampling Distribution of the Difference of Proportions (SDDP)
Confidence Interval
where CL is the Confidence Level
For a normal distribution with mean \(\mu\) and standard deviation \(\sigma\), the sample mean \(\overline x\) is the point estimate of \(\mu\)
A \(100(1 - \alpha)\)% CI for the mean \(\mu\) of a normal distribution with known standard deviation \(\sigma\) is \(\overline x \pm z_{\alpha/2}\frac{\sigma}{\sqrt n}\)
Also see Confidence Interval for One and Two Tailed Tests
Bound on the error of estimation (B), or Margin of Error (ME)
Required sample size to estimate \(\mu\) with \(100(1 - \alpha)\)% confidence and a fixed ME: \(n = \left(\frac{z_{\alpha/2}\,\sigma}{ME}\right)^2\), rounded up
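Putting the z-interval and the sample-size formula together; the \(\overline x\), \(\sigma\), and \(n\) values are made up:

```python
import math

# 95% CI for mu with known sigma: xbar +/- z_(alpha/2) * sigma / sqrt(n)
xbar, sigma, n = 50.0, 8.0, 64
z = 1.96  # z_(alpha/2) for alpha = 0.05

me = z * sigma / math.sqrt(n)  # margin of error
ci = (xbar - me, xbar + me)
print(ci)  # roughly (48.04, 51.96)

# Required n for a fixed ME: n = (z * sigma / ME)^2, rounded up
target_me = 1.0
n_required = math.ceil((z * sigma / target_me) ** 2)
print(n_required)
```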
Confidence Interval for the Proportion
where \(\hat p\) is the sample proportion of subjects that meet the criteria
Margin of Error
Confidence Interval for Difference of Means
with degrees of freedom
where \(se_1 = \frac{s_1}{\sqrt{n_1}}, se_2 = \frac{s_2}{\sqrt{n_2}}\)
Note: Round down when the degrees of freedom is not an integer, so that the estimate is more conservative
Confidence Interval for Pooled Procedures
with degrees of freedom
Confidence Interval for Matched-Pair Test
where \(\overline d\) is the mean difference for the matched pair.
with degrees of freedom
Confidence Interval for Difference of Proportions
Hypothesis Testing
Type 1 Error Rate
Power
One and Two Tailed Tests
Confidence Interval for One and Two Tailed Tests
Normal Distribution
T Distribution
Normal Distribution
T Distribution
Normal Distribution
T Distribution
Test Statistic
where \(\mu_0\) is the Null Hypothesis Mean
where \(df\) is the degrees of freedom
Test Statistic for the Proportion
Test Statistic for Difference of Means
with degrees of freedom as shown in CI for Difference of Means
Test Statistic for Pooled Procedures
with degrees of freedom as shown in CI for Pooled Procedures
Test Statistic for Matched-Pair Test
where \(\mu_d\) is the Null Hypothesis Mean Difference
Test Statistic for Difference of Proportions
with
where \(\hat p_1 n_1\) and \(\hat p_2 n_2\) are the number of successes in each sample
p-value
The p-value is the area under the standard normal curve to the right of \(z\)
The p-value is the area under the standard normal curve to the left of \(z\)
The p-value is the sum of the areas under the standard normal curve to the left of \(-|z|\) and to the right of \(|z|\)
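The three p-value cases in code, using the erf-based standard normal CDF; the test statistic value is made up:

```python
import math

def phi(z):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = 2.1  # test statistic (arbitrary value for illustration)

p_upper = 1.0 - phi(z)              # upper-tailed: area to the right of z
p_lower = phi(z)                    # lower-tailed: area to the left of z
p_two = 2.0 * (1.0 - phi(abs(z)))   # two-tailed: both tail areas
print(p_upper, p_lower, p_two)
```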
Simple Linear Regression Model
If all other factors (disturbance) are fixed, i.e. if \(\Delta u=0\), then \(\Delta y = \beta_1 \Delta x\)
Simplified Form (Equation of Line)
Slope
where \(n\) is the number of data points
Y-Intercept
Errors
Residual
where \(\hat u\), \(\hat y_i\), \(\hat \beta_0\) and \(\hat \beta_1\) are the estimated values. Here the residual \(\hat u\) is different from the error term \(u\).
Residual Properties
Measures of Variation
Sum of Squared Residuals
Explained Sum of Squares
Residual Sum of Squares
Total Sum of Squares
Coefficient of Determination
Adjusted R-Squared
where \(n\) is the sample size and \(k\) is the number of independent variables
Mean Square of Regression
Root Mean Square Error
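The slope, intercept, and the variation measures above can be computed end to end on a toy dataset (values made up, chosen to be nearly linear):

```python
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

# Slope: sum((x - xbar)(y - ybar)) / sum((x - xbar)^2); intercept: ybar - b1*xbar
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sxx = sum((x - mx) ** 2 for x in xs)
b1 = sxy / sxx
b0 = my - b1 * mx

yhat = [b0 + b1 * x for x in xs]
sse = sum((y - yh) ** 2 for y, yh in zip(ys, yhat))  # residual sum of squares
sst = sum((y - my) ** 2 for y in ys)                 # total sum of squares
r2 = 1 - sse / sst                                   # coefficient of determination
k = 1                                                # one independent variable
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)        # adjusted R-squared
print(b1, b0, r2, adj_r2)
```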
Multiple Linear Regression Model
Tolerance
where \(R^2\) is the coefficient of determination
Variance Inflation Factor
Durbin Watson Statistic
Homoscedasticity
Polynomial Regression Model
Chi Square Tests
Expected Values
where \(Observed_{x_{itot}}\) and \(Observed_{y_{itot}}\) are the total observed counts for the \(i^{th}\) category of the categorical variables \(x\) and \(y\) respectively, and \(Observed_{tot}\) is the grand total count
Degrees of Freedom
Expected Values where \(Observed_{tot}\) is the total count for all variables
Degrees of Freedom
Support Vector Regression Model
Minimize
Logistic Regression Model
where \(p_i\) is the probability, \(\frac{p_i}{1 - p_i}\) is the odds of success and \(\ln \frac{p_i}{1 - p_i}\) is the log of the odds of success
Sigmoid Function
Solving the above for \(p\) gives \(p = \frac{1}{1 + e^{-z}}\), where \(z = \beta_0 + \beta_1x_1+ \dots\)
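The sigmoid and the log-odds are inverses of each other, which is easy to verify numerically:

```python
import math

def sigmoid(z):
    # p = 1 / (1 + e^(-z)): maps any real z to a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# ln(p / (1 - p)) recovers the original z
p = sigmoid(0.7)
z_back = math.log(p / (1 - p))
print(sigmoid(0), p, z_back)  # sigmoid(0) is exactly 0.5
```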
Likelihood Function
Intuition
Also see Mean tab for Binomial RV and Bernoulli RV
Likelihood for samples labelled as 1:
where \(x_i\) represents the feature vector for the \(i^{th}\) sample
Likelihood for samples labelled as 0:
Overall Likelihood:
Log Likelihood:
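The log likelihood \(\sum_i \left[y_i \ln p_i + (1 - y_i)\ln(1 - p_i)\right]\) sketched on made-up one-feature data with assumed coefficients (the \(\beta\) values and samples below are illustrative, not fitted):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy one-feature data with assumed coefficients beta0, beta1
b0, b1 = -1.0, 2.0
xs = [0.2, 1.5, -0.3, 2.0]
ys = [0, 1, 0, 1]  # observed labels

# Log likelihood: sum of y*ln(p) + (1 - y)*ln(1 - p) over the samples
ll = 0.0
for x, y in zip(xs, ys):
    p = sigmoid(b0 + b1 * x)
    ll += y * math.log(p) + (1 - y) * math.log(1 - p)
print(ll)  # always <= 0; fitting the model maximizes this quantity
```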
KMeans Clustering
WCSS
where \(P_{i1}\) is the \(i^{th}\) point in cluster 1, \(C_1\) is the center of cluster 1, \(m\) is the number of points in a cluster, \(n\) is the number of clusters and distance is the Euclidean distance between a point and the center of the cluster
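WCSS is the sum, over clusters, of squared Euclidean distances from each point to its cluster center; the 2-D points and centers below are toy values:

```python
# Within-cluster sum of squares (WCSS) for given clusters and centers
clusters = [
    ([(1, 1), (2, 1), (1, 2)], (4 / 3, 4 / 3)),  # (points, center) for cluster 1
    ([(8, 8), (9, 9)], (8.5, 8.5)),              # cluster 2
]

wcss = 0.0
for points, (cx, cy) in clusters:
    # Squared Euclidean distance of each point to its cluster center
    wcss += sum((px - cx) ** 2 + (py - cy) ** 2 for px, py in points)
print(wcss)
```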
Gradient Descent
Cost Function
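A minimal gradient-descent sketch on a one-dimensional quadratic cost (the cost function and learning rate are made-up illustrations):

```python
# Gradient descent on J(w) = (w - 3)^2, whose minimum is at w = 3
def grad(w):
    return 2.0 * (w - 3.0)  # dJ/dw

w, lr = 0.0, 0.1  # initial weight and learning rate
for _ in range(200):
    w -= lr * grad(w)  # step against the gradient
print(w)  # converges toward the minimizer w = 3
```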
Time Series Concepts
Exponential Smoothing
where \(0 \leq \alpha \leq 1\) is the smoothing constant
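Simple exponential smoothing \(s_t = \alpha x_t + (1 - \alpha)s_{t-1}\) as a loop; the series values are made up, and initializing with the first observation is one common convention:

```python
alpha = 0.5  # smoothing constant, 0 <= alpha <= 1
xs = [10.0, 12.0, 8.0, 11.0]

smoothed = [xs[0]]  # initialize with the first observation
for x in xs[1:]:
    # s_t = alpha * x_t + (1 - alpha) * s_(t-1)
    smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
print(smoothed)  # [10.0, 11.0, 9.5, 10.25]
```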
Lag k Differencing
Autoregressive Model (AR)
where \(t\) is the current period, \(t - 1\) is the previous period, \(x_t\) is the value for the current period, \(x_{t - 1}\) is the value for the previous period, \(\phi_1\) is the coefficient for the previous period value and \(\epsilon_t\) is the residual error and should be just some unpredictable "white noise"
and \(-1 \lt \phi_1 \lt 1\)
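An AR(1) process can be simulated directly from the recursion, and its lag-1 sample autocorrelation should come out close to \(\phi_1\) (the parameter values are arbitrary):

```python
import random

# Simulate x_t = phi1 * x_(t-1) + eps_t with white-noise residuals
random.seed(1)
phi1 = 0.6  # must satisfy -1 < phi1 < 1 for stationarity
x = [0.0]
for _ in range(500):
    eps = random.gauss(0, 1)  # unpredictable "white noise"
    x.append(phi1 * x[-1] + eps)

# Lag-1 autocorrelation of the simulated series
n = len(x)
m = sum(x) / n
num = sum((x[t] - m) * (x[t - 1] - m) for t in range(1, n))
den = sum((v - m) ** 2 for v in x)
print(num / den)  # should be close to phi1
```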
Pct Change
Moving Average Model (MA)
where \(t\) is the current period, \(t - 1\) is the previous period, \(\epsilon_t\) is the residual for the current period, \(\epsilon_{t - 1}\) is the residual for the previous period and \(\phi_1\) is the coefficient for the previous period value
and \(-1 \lt \phi_1 \lt 1\)