Classification
Model Types
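The snippets in each section below reuse the names X_train, y_train, X_test and a fitted scaler sc without defining them. A minimal setup sketch, using a synthetic stand-in dataset (the dataset and sizes here are illustrative assumptions, not from the notes):

```python
# Assumed preprocessing for the snippets below: split, then scale the
# training set with a StandardScaler `sc` fitted on training data only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

sc = StandardScaler()
X_train = sc.fit_transform(X_train)  # fit on training data only
# X_test is left unscaled here; each snippet applies sc.transform() at predict time
```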
Logistic Regression
- Predicts a categorical dependent variable from a set of independent variables
- Linear classifier
- Requires Feature Scaling
- Uses a probabilistic approach
- Useful for ranking predictions by their probability
- Provides info on the statistical significance of features
Sample Code
from sklearn.linear_model import LogisticRegression
c_lr = LogisticRegression(random_state = 0)
c_lr.fit(X_train, y_train)  # X_train is assumed already scaled
y_pred = c_lr.predict(sc.transform(X_test))  # sc: fitted StandardScaler
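Since logistic regression is probabilistic, the ranking use case above can be done with predict_proba. A self-contained sketch on a synthetic dataset (the data and variable names here are illustrative):

```python
# Sketch: rank test samples by predicted probability of the positive class.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)

c_lr = LogisticRegression(random_state=0).fit(X_train, y_train)
proba = c_lr.predict_proba(sc.transform(X_test))[:, 1]  # P(class 1) per sample
ranked = np.argsort(proba)[::-1]  # test indices, most confident class-1 first
```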
K-Nearest Neighbor
- Predicts the category of a new point from the majority class of its k nearest neighbors
- Not a linear classifier
- Requires Feature Scaling
Sample Code
from sklearn.neighbors import KNeighborsClassifier
c_knn = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)  # p=2 makes Minkowski the Euclidean distance
c_knn.fit(X_train, y_train)
y_pred = c_knn.predict(sc.transform(X_test))
Support Vector Machine (SVM)
- Works with both linear and non-linear problems
- Requires Feature Scaling
- Not preferred when the number of features is large
- Linear SVM
    - Not preferred for non-linear problems
- Kernel SVM
    - High performance on non-linear problems
    - Less biased by outliers
    - Less sensitive to overfitting
    - Good for segmentation use cases
Sample Code
from sklearn.svm import SVC
c_svc_rbf = SVC(kernel = 'rbf', random_state=0) # rbf kernel
c_svc_rbf.fit(X_train, y_train)
# Predict
y_pred_rbf = c_svc_rbf.predict(sc.transform(X_test))
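The Linear SVM variant listed above has no snippet; a minimal sketch, assuming the same setup as the other sections (synthetic data and names here are illustrative):

```python
# Sketch: linear-kernel SVM, for problems that are linearly separable.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)

c_svc_lin = SVC(kernel='linear', random_state=0)  # linear kernel
c_svc_lin.fit(X_train, y_train)
y_pred_lin = c_svc_lin.predict(sc.transform(X_test))
```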
Naive Bayes
- Assumes the independent variables are mutually independent (the "naive" assumption)
- Not a linear classifier
- Not biased by outliers
- Requires Feature Scaling
- Uses probabilistic approach
- Useful for ranking predictions by their probability
Sample Code
Gaussian Naive Bayes Classifier
from sklearn.naive_bayes import GaussianNB
c_gnb = GaussianNB()
c_gnb.fit(X_train, y_train)
# Predict
y_pred_gnb = c_gnb.predict(sc.transform(X_test))
Decision Tree Classification
- Works with both linear and non-linear problems
- Preferred for better interpretability
- Feature scaling not needed
- Not good with small datasets
- May result in overfitting
Sample Code
from sklearn.tree import DecisionTreeClassifier
c_dt = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)
c_dt.fit(X_train, y_train)
# Predict
y_pred_dt = c_dt.predict(sc.transform(X_test))  # scaling is harmless for trees; kept for consistency with the scaled X_train
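The interpretability noted above can be inspected directly: sklearn's export_text prints the learned decision rules. A sketch on a synthetic dataset (data and feature names here are illustrative):

```python
# Sketch: print the decision rules a fitted tree has learned.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=100, n_features=4, random_state=0)
c_dt = DecisionTreeClassifier(criterion='entropy', random_state=0).fit(X, y)

rules = export_text(c_dt, feature_names=[f"f{i}" for i in range(4)])
print(rules)  # each split appears as an indented "|--- feature <= threshold" line
```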
Random Forest Classification
- Works with both linear and non-linear problems
- Also see Random Forest Regression
Sample Code
from sklearn.ensemble import RandomForestClassifier
c_rf = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
c_rf.fit(X_train, y_train)
# Predict
y_pred_rf = c_rf.predict(sc.transform(X_test))
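A fitted random forest also exposes feature_importances_, handy for a quick read on which inputs drive the predictions. A sketch on synthetic data (dataset and names here are illustrative):

```python
# Sketch: inspect which features the forest relies on.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
c_rf = RandomForestClassifier(n_estimators=10, criterion='entropy',
                              random_state=0).fit(X, y)
print(c_rf.feature_importances_)  # normalized: sums to 1.0 across features
```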
XGB Classification
Sample Code
from xgboost import XGBClassifier
c_xgb = XGBClassifier()
c_xgb.fit(X_train, y_train)
y_pred_xgb = c_xgb.predict(X_test)
CatBoost Classification
Sample Code
from catboost import CatBoostClassifier
c_cb = CatBoostClassifier()
c_cb.fit(X_train, y_train)
y_pred_cb = c_cb.predict(X_test)
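Every section above ends with a y_pred vector; the usual next step is a confusion matrix and accuracy score. A self-contained sketch with logistic regression standing in for any of the classifiers (data and names here are illustrative):

```python
# Sketch: evaluate predictions against the held-out labels.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)

cm = confusion_matrix(y_test, y_pred)  # rows: true class, cols: predicted class
acc = accuracy_score(y_test, y_pred)   # fraction of correct predictions
```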