User Guide
Author: Rolf Carlson, Carlson Research LLC, hrolfrc@gmail.com, License: 3-clause BSD
Make a classification problem
[32]:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score
from calfcv import CalfCV
[33]:
seed = 45
X, y = make_classification(
    n_samples=100,
    n_features=5,
    n_informative=2,
    n_redundant=2,
    n_classes=2,
    random_state=seed
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=seed)
Train and predict
[34]:
cls = CalfCV().fit(X_train, y_train)
The score for unseen data
[35]:
cls.score(X_test, y_test)
[35]:
0.92
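As a rough sanity check on this number, the sketch below works out how many test samples the score corresponds to. It assumes that `CalfCV.score` reports mean accuracy, as scikit-learn classifiers conventionally do; that is an assumption, not something shown above. It also relies on `train_test_split` holding out 25% of the samples when no size is given.

```python
import math

n_samples = 100        # from make_classification above
test_fraction = 0.25   # train_test_split default when no size is specified

# scikit-learn rounds the test set size up
n_test = math.ceil(n_samples * test_fraction)
n_train = n_samples - n_test

# Assumption: the score is mean accuracy, i.e. correct / n_test
reported_score = 0.92
n_correct = round(reported_score * n_test)

print(n_train, n_test, n_correct)  # 75 25 23
```

Under that assumption, a score of 0.92 means 23 of the 25 held-out samples were classified correctly.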
Class probabilities
We vertically stack the ground truth (top row) on the predicted probabilities of class 0 (middle row) and class 1 (bottom row). The first five training entries are shown.
[36]:
np.round(np.vstack((y_train, cls.predict_proba(X_train).T))[:, 0:5], 2)
[36]:
array([[1. , 1. , 0. , 0. , 0. ],
[0.35, 0.49, 0.73, 0.65, 0.59],
[0.65, 0.51, 0.27, 0.35, 0.41]])
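The layout of the array can be checked by hand. The sketch below copies the three displayed rows and confirms that the two probability rows sum to one in every column. It also derives a class from the larger probability, which is an assumption about how the class would be chosen, not something confirmed for `CalfCV.predict`.

```python
# Rows copied from the displayed output (first five training samples, rounded).
y_true = [1, 1, 0, 0, 0]
p_class0 = [0.35, 0.49, 0.73, 0.65, 0.59]
p_class1 = [0.65, 0.51, 0.27, 0.35, 0.41]

# The two class probabilities should sum to 1 for each sample.
column_sums = [a + b for a, b in zip(p_class0, p_class1)]
sums_to_one = all(abs(s - 1.0) < 1e-9 for s in column_sums)

# Assumption: the implied class is the one with the larger probability.
implied_class = [1 if b > a else 0 for a, b in zip(p_class0, p_class1)]

print(sums_to_one, implied_class == y_true)  # True True
```

For these five samples the implied classes match the ground truth, which is consistent with the bottom row holding the probability of class 1.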
[37]:
roc_auc_score(y_true=y_train, y_score=cls.predict_proba(X_train)[:, 1])
[37]:
0.968705547652916
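One way to read this AUC is as the fraction of (positive, negative) sample pairs in which the positive sample receives the higher class 1 probability, with ties counted as one half. The sketch below is a plain-Python illustration of that definition, not the implementation used by `roc_auc_score`. The helper name `pairwise_auc` is hypothetical, and the data are only the five rounded entries displayed above, not the full training set.

```python
def pairwise_auc(y_true, y_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count one half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# First five training entries as displayed above (rounded to two decimals).
y_first5 = [1, 1, 0, 0, 0]
p1_first5 = [0.65, 0.51, 0.27, 0.35, 0.41]

auc_first5 = pairwise_auc(y_first5, p1_first5)
print(auc_first5)  # 1.0
```

On these five entries every positive outranks every negative, so the pairwise AUC is 1.0. The value for the full training set is lower because some pairs there are ranked in the wrong order.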
Predict the classes
The ground truth for the test set is in the top row and the predicted classes are in the bottom row. The first five entries are shown.
[38]:
y_pred = cls.predict(X_test)
np.vstack((y_test, y_pred))[:, 0:5]
[38]:
array([[0, 0, 0, 1, 0],
[0, 0, 0, 1, 0]])
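The AUC computed from hard class predictions is typically lower than the AUC computed from class probabilities, because thresholding discards the ranking of samples within each predicted class. Note that the value below is computed on the test set, whereas the probability-based AUC above was computed on the training set, so the two numbers are not directly comparable.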
[39]:
roc_auc_score(y_true=y_test, y_score=y_pred)
[39]:
0.9198717948717948
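To illustrate why hard class predictions tend to give a lower AUC than continuous scores, the sketch below compares the two on a small made-up example. The labels, scores, the 0.5 threshold, and the helper `pairwise_auc` are all hypothetical and are not taken from the CalfCV run above. It also shows that, for binary hard predictions, the AUC coincides with the balanced accuracy (the mean of the true positive and true negative rates).

```python
def pairwise_auc(y_true, y_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count one half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels and scores, for illustration only.
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.6, 0.4, 0.55, 0.2]

# Thresholding at 0.5 turns scores into hard class predictions.
hard = [1 if s >= 0.5 else 0 for s in scores]

auc_scores = pairwise_auc(y_true, scores)  # 5/6, about 0.833
auc_hard = pairwise_auc(y_true, hard)      # 3.5/6, about 0.583

# Balanced accuracy of the hard predictions.
n_pos = sum(y_true)
n_neg = len(y_true) - n_pos
tpr = sum(1 for y, h in zip(y_true, hard) if y == 1 and h == 1) / n_pos
tnr = sum(1 for y, h in zip(y_true, hard) if y == 0 and h == 0) / n_neg
balanced_accuracy = (tpr + tnr) / 2

print(round(auc_scores, 3), round(auc_hard, 3), round(balanced_accuracy, 3))
```

Thresholding creates ties within each predicted class, and each tied pair contributes only one half, which is why the hard-prediction AUC drops in this example.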