Classification
Classification is the process of estimating the value of the dependent categorical variable from the independent variable(s).
Logistic Regression
- Predict a categorical dependent variable from a set of independent variables
- Curve is defined by the sigmoid function
- Parameters are estimated using the Maximum Likelihood Estimation
Likelihood Function
- To estimate \(\beta\) vectors consider a sample of data points with labels either 0 or 1
- Projection of data points on the sigmoid curve gives the probability of success (probability of the point being labelled as 1) for the data point
- Likelihood for samples labeled as
1- we try to estimate \(\beta\) such that the product of probability of all data points is as close to 1 as possible
- the projection of the data point on the sigmoid curve gives the probability of success, so \(p(x)\) gives the probability of being close to 1
- Likelihood for samples labeled as
0- we try to estimate \(\beta\) such that the product of probability of all data points is as close to 0 as possible
- the projection of the data point on the sigmoid curve gives the probability of success, so \(1 - p(x)\) gives the probability of being close to 0
- The likelihood function returns the probability of how well the model(curve) predicts the label
- Combination (Product) of conditions for samples labeled as
1and0
- Combination (Product) of conditions for samples labeled as
- Likelihood for samples labeled as
- The likelihood function across multiple models (sigmoid curves) is optimized to get the maximum likelihood
K-Nearest Neighbor
- Steps
- Choose number of neighbors (K) - usually an odd number
- Take the K nearest neighbors measured by the Euclidean distance (most common distance calculation used)
- Count the number of data points in each category among the neighbors
- Predict dependent data point to be in the category with the most neighbors
Support Vector Machine (SVM)
- Helps deine the best decision boundary to separate the categories
- Determines the plane for the Maximum Margin Hyperplane (Maximum Margin Classifier)
- Equidistant from the Support Vectors
- Support Vectors are the vectors that are closest to the Maximum Margin Hyperplane
- Think of these as the vectors which are the most similar (most challenging to differentiate) across the categories
- Objective is to maximize the sum of the distances of the Maximum Margin plane from the support vectors
Using Kernels
- Mapping non linear data points to higher planes allows us to define linear boundaries to separate them
- We can then project them back to the original plane to get the non linear separator in the original plane
- Kernels help avoid the high compute intensive requirements that comes with mapping data points to higher planes
Naive Bayes
- Assumes the dependent variables to be independent ( and are not correlated)
Decision Tree Classification
- Also See Decision Tree Regression
- Uses entropy for splitting data