
Classification

Model Types

Logistic Regression

  • Predict a categorical dependent variable from a set of independent variables
  • It is a linear classifier
  • Requires Feature Scaling
  • Uses a probabilistic approach
    • Useful for ranking predictions by their probability
    • Provides information on the statistical significance of features
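The snippets below all assume that `X_train`, `y_train`, `X_test`, `y_test`, and a fitted scaler `sc` already exist. A minimal setup sketch (the synthetic dataset is a stand-in; the notes leave the real data unspecified):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the dataset these notes assume
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Fit the scaler on the training set only, then scale it;
# the later snippets call sc.transform(X_test) at prediction time
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
```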

Sample Code

Logistic Regression

from sklearn.linear_model import LogisticRegression
c_lr = LogisticRegression(random_state = 0)
# Assumes X_train was already scaled and sc is the fitted StandardScaler
c_lr.fit(X_train, y_train)
y_pred = c_lr.predict(sc.transform(X_test))
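The "ranking predictions by their probability" point above can be illustrated with `predict_proba`. A self-contained sketch (the synthetic dataset is an assumption, not from the notes):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)

c_lr = LogisticRegression(random_state=0)
c_lr.fit(X_train, y_train)

# Probability of class 1 for each scaled test sample
proba = c_lr.predict_proba(sc.transform(X_test))[:, 1]
# Test-set indices ranked from most to least likely to be class 1
ranked = np.argsort(proba)[::-1]
```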

K-Nearest Neighbor

  • Predicts the category of a new point from the categories of its k nearest neighbors
  • Not a linear classifier
  • Requires Feature Scaling

Sample Code

K Neighbors Classifier

from sklearn.neighbors import KNeighborsClassifier
# p=2 with the Minkowski metric is equivalent to Euclidean distance
c_knn = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
c_knn.fit(X_train, y_train)
y_pred = c_knn.predict(sc.transform(X_test))

Support Vector Machine (SVM)

  • Works with both linear and non-linear problems
  • Requires Feature Scaling
  • Not preferred when there is a large number of features
  • Linear SVM
    • Not preferred for non-linear problems
  • Kernel SVM
    • High performance on non-linear problems
    • Less biased by outliers (the boundary depends only on the support vectors)
    • Less sensitive to overfitting
  • Good for segmentation use cases

Sample Code

Support Vector Classifier

from sklearn.svm import SVC
c_svc_rbf = SVC(kernel = 'rbf', random_state=0)   # rbf kernel
c_svc_rbf.fit(X_train, y_train)
# Predict
y_pred_rbf = c_svc_rbf.predict(sc.transform(X_test))
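The linear-vs-kernel contrast in the bullets above can be seen on a toy non-linear dataset. A sketch using scikit-learn's `make_moons` (my choice of example, not from the notes), comparing training accuracy of the two kernels:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-moons: not linearly separable
X_m, y_m = make_moons(n_samples=300, noise=0.1, random_state=0)

linear_acc = SVC(kernel='linear', random_state=0).fit(X_m, y_m).score(X_m, y_m)
rbf_acc = SVC(kernel='rbf', random_state=0).fit(X_m, y_m).score(X_m, y_m)
# The RBF kernel separates the moons far better than a straight line can
```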

Naive Bayes

  • Assumes the independent variables are independent of each other (the "naive" assumption)
  • Not a linear classifier
  • Not biased by outliers
  • Requires Feature Scaling
  • Uses probabilistic approach
    • Useful for ranking predictions by their probability

Sample Code

Gaussian Naive Bayes Classifier

from sklearn.naive_bayes import GaussianNB
c_gnb = GaussianNB()
c_gnb.fit(X_train, y_train)
# Predict
y_pred_gnb = c_gnb.predict(sc.transform(X_test))

Decision Tree Classification

  • Works with both linear and non-linear problems
  • Preferred for better interpretability
  • Feature scaling not needed
  • Not good with small datasets
    • May result in overfitting

Sample Code

Decision Tree Classifier

from sklearn.tree import DecisionTreeClassifier
c_dt = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)
c_dt.fit(X_train, y_train)
# Predict
y_pred_dt = c_dt.predict(sc.transform(X_test))
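The "better interpretability" bullet above can be made concrete: scikit-learn can print a fitted tree as if/else rules via `export_text`. A self-contained sketch (synthetic data and feature names are assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data
X, y = make_classification(n_samples=100, n_features=4, random_state=0)
c_dt = DecisionTreeClassifier(criterion='entropy', random_state=0)
c_dt.fit(X, y)

# The learned splits as human-readable if/else rules
rules = export_text(c_dt, feature_names=['f0', 'f1', 'f2', 'f3'])
print(rules)
```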

Random Forest Classification

  • Ensemble of decision trees, each trained on a random subset of the training data
  • Less prone to overfitting than a single decision tree

Sample Code

Random Forest Classifier

from sklearn.ensemble import RandomForestClassifier
c_rf = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
c_rf.fit(X_train, y_train)
# Predict
y_pred_rf = c_rf.predict(sc.transform(X_test))

XGBoost Classification

Sample Code

XGBoost Classifier

from xgboost import XGBClassifier
c_xgb = XGBClassifier()
c_xgb.fit(X_train, y_train)
y_pred_xgb = c_xgb.predict(X_test)

CatBoost Classification

Sample Code

CatBoost Classifier

from catboost import CatBoostClassifier
c_cb = CatBoostClassifier()
c_cb.fit(X_train, y_train)
y_pred_cb = c_cb.predict(X_test)
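
Whichever classifier is used, the predictions can be evaluated the same way with a confusion matrix and accuracy score. A self-contained sketch (logistic regression and the synthetic dataset are stand-ins; any `y_pred` from the snippets above would work):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in data and a stand-in classifier
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
y_pred = LogisticRegression(random_state=0).fit(X_train, y_train).predict(X_test)

cm = confusion_matrix(y_test, y_pred)   # rows: actual class, columns: predicted class
acc = accuracy_score(y_test, y_pred)    # fraction of correct predictions
```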