Classification
Model Types
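The snippets in each section below reuse the names X_train, y_train, X_test and a fitted scaler sc without defining them. A minimal setup sketch, using a synthetic stand-in dataset (the dataset and sizes here are illustrative assumptions, not from the notes):

```python
# Assumed preprocessing for the snippets below: split, then scale the
# training set with a StandardScaler `sc` fitted on training data only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

sc = StandardScaler()
X_train = sc.fit_transform(X_train)  # fit on training data only
# X_test is left unscaled here; each snippet applies sc.transform() at predict time
```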
Logistic Regression
- Predicts a categorical dependent variable from a set of independent variables
- Linear classifier
- Requires Feature Scaling
- Uses a probabilistic approach
- Useful for ranking predictions by their probability
- Provides info on the statistical significance of features
Sample Code
from sklearn.linear_model import LogisticRegression
c_lr = LogisticRegression(random_state = 0)
c_lr.fit(X_train, y_train)  # X_train is assumed already scaled
y_pred = c_lr.predict(sc.transform(X_test))  # sc: fitted StandardScaler
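Since logistic regression is probabilistic, the ranking use case above can be done with predict_proba. A self-contained sketch on a synthetic dataset (the data and variable names here are illustrative):

```python
# Sketch: rank test samples by predicted probability of the positive class.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)

c_lr = LogisticRegression(random_state=0).fit(X_train, y_train)
proba = c_lr.predict_proba(sc.transform(X_test))[:, 1]  # P(class 1) per sample
ranked = np.argsort(proba)[::-1]  # test indices, most confident class-1 first
```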
K-Nearest Neighbor
- Predicts the category of a new point from the majority class of its k nearest neighbors
- Not a linear classifier
- Requires Feature Scaling
Sample Code
from sklearn.neighbors import KNeighborsClassifier
c_knn = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)  # p=2 makes Minkowski the Euclidean distance
c_knn.fit(X_train, y_train)
y_pred = c_knn.predict(sc.transform(X_test))
Support Vector Machine (SVM)
- Works with both linear and non-linear problems
- Requires Feature Scaling
- Not preferred when the number of features is large
- Linear SVM
    - Not preferred for non-linear problems
- Kernel SVM
    - High performance on non-linear problems
    - Less biased by outliers
    - Less sensitive to overfitting
    - Good for segmentation use cases
Sample Code
from sklearn.svm import SVC
c_svc_rbf = SVC(kernel = 'rbf', random_state=0) # rbf kernel
c_svc_rbf.fit(X_train, y_train)
# Predict
y_pred_rbf = c_svc_rbf.predict(sc.transform(X_test))
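The Linear SVM variant listed above has no snippet; a minimal sketch, assuming the same setup as the other sections (synthetic data and names here are illustrative):

```python
# Sketch: linear-kernel SVM, for problems that are linearly separable.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)

c_svc_lin = SVC(kernel='linear', random_state=0)  # linear kernel
c_svc_lin.fit(X_train, y_train)
y_pred_lin = c_svc_lin.predict(sc.transform(X_test))
```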
Naive Bayes
- Assumes the independent variables are mutually independent (the "naive" assumption)
- Not a linear classifier
- Not biased by outliers
- Requires Feature Scaling
- Uses probabilistic approach
- Useful for ranking predictions by their probability
Sample Code
Gaussian Naive Bayes Classifier
from sklearn.naive_bayes import GaussianNB
c_gnb = GaussianNB()
c_gnb.fit(X_train, y_train)
# Predict
y_pred_gnb = c_gnb.predict(sc.transform(X_test))
Decision Tree Classification
- Works with both linear and non-linear problems
- Preferred for better interpretability
- Feature scaling not needed
- Not good with small datasets
- May result in overfitting
Sample Code
from sklearn.tree import DecisionTreeClassifier
c_dt = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)
c_dt.fit(X_train, y_train)
# Predict
y_pred_dt = c_dt.predict(sc.transform(X_test))  # scaling is harmless for trees; kept for consistency with the scaled X_train
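The interpretability noted above can be inspected directly: sklearn's export_text prints the learned decision rules. A sketch on a synthetic dataset (data and feature names here are illustrative):

```python
# Sketch: print the decision rules a fitted tree has learned.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=100, n_features=4, random_state=0)
c_dt = DecisionTreeClassifier(criterion='entropy', random_state=0).fit(X, y)

rules = export_text(c_dt, feature_names=[f"f{i}" for i in range(4)])
print(rules)  # each split appears as an indented "|--- feature <= threshold" line
```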
Random Forest Classification
- Works with both linear and non-linear problems
- Also see Random Forest Regression
Sample Code
from sklearn.ensemble import RandomForestClassifier
c_rf = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
c_rf.fit(X_train, y_train)
# Predict
y_pred_rf = c_rf.predict(sc.transform(X_test))
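A fitted random forest also exposes feature_importances_, handy for a quick read on which inputs drive the predictions. A sketch on synthetic data (dataset and names here are illustrative):

```python
# Sketch: inspect which features the forest relies on.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
c_rf = RandomForestClassifier(n_estimators=10, criterion='entropy',
                              random_state=0).fit(X, y)
print(c_rf.feature_importances_)  # normalized: sums to 1.0 across features
```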
XGB Classification
Sample Code
from xgboost import XGBClassifier
c_xgb = XGBClassifier()
c_xgb.fit(X_train, y_train)
y_pred_xgb = c_xgb.predict(X_test)
CatBoost Classification
Sample Code
from catboost import CatBoostClassifier
c_cb = CatBoostClassifier()
c_cb.fit(X_train, y_train)
y_pred_cb = c_cb.predict(X_test)
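Every section above ends with a y_pred vector; the usual next step is a confusion matrix and accuracy score. A self-contained sketch with logistic regression standing in for any of the classifiers (data and names here are illustrative):

```python
# Sketch: evaluate predictions against the held-out labels.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)

cm = confusion_matrix(y_test, y_pred)  # rows: true class, cols: predicted class
acc = accuracy_score(y_test, y_pred)   # fraction of correct predictions
```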