The AUC from example 1 of the Calf paper

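While calfpy yields an AUC of 0.875 in Example 1 of the Calf paper [1], calfcv's CalfCV produces an AUC of 0.82 (and its Calf estimator 0.78). The calfcv figures below are computed on the training data: each model is fit and scored on the same 72 samples.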

Author: Rolf Carlson, Carlson Research LLC, hrolfrc@gmail.com

License: 3-clause BSD

Get the data

[16]:
import pandas as pd
from sklearn.metrics import roc_auc_score
from calfcv import CalfCV, Calf
[17]:
input_file = "../../data/n2.csv"
df = pd.read_csv(input_file, header=0, sep=",")

# The input data is every column except ctrl/case
X = df.loc[:, df.columns != 'ctrl/case']
# The outcomes (diagnoses) are in the first column, ctrl/case
Y = df['ctrl/case']

# The header row is the feature set
features = list(X.columns)

# label the outcomes
Y_names = Y.replace({0: 'non_psychotic', 1: 'pre_psychotic'})

# glmnet requires float64
x = X.to_numpy(dtype='float64')
y = Y.to_numpy(dtype='float64')
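The feature/outcome split above can also be written with `DataFrame.drop`. Here is a minimal sketch on a tiny hypothetical frame, in which the columns `GENE_A` and `GENE_B` are made up and not taken from n2.csv. It checks that both selections agree and that the arrays come out as float64:

```python
import pandas as pd

# Hypothetical stand-in for n2.csv: outcome first, then two feature columns
df = pd.DataFrame({
    "ctrl/case": [0, 1, 0, 1],
    "GENE_A": [1.2, 3.4, 0.5, 2.2],
    "GENE_B": [0.1, 0.0, 0.7, 0.3],
})

# Boolean-mask selection, as in the notebook
X_mask = df.loc[:, df.columns != "ctrl/case"]
# Equivalent selection that names the dropped column directly
X_drop = df.drop(columns="ctrl/case")

# Convert to float64 NumPy arrays, as the notebook does
x = X_drop.to_numpy(dtype="float64")
y = df["ctrl/case"].to_numpy(dtype="float64")

print(X_mask.equals(X_drop))  # True
print(x.dtype, x.shape)       # float64 (4, 2)
```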

Data overview

Here we look at the feature names, number of features, shape, and category balance.

[18]:
features[0:5]
[18]:
['ADIPOQ', 'SERPINA3', 'AMBP', 'A2M', 'ACE']
[19]:
x.size
[19]:
9720
[20]:
x.shape
[20]:
(72, 135)
[21]:
print(list(Y).count(1), list(Y).count(0))
32 40
[22]:
len(y)
[22]:
72
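The overview numbers are consistent with one another. `x.size` is rows times columns: 72 × 135 = 9720. The 32 cases plus 40 controls account for all 72 samples. A classifier that always predicted non_psychotic would therefore be 40/72 ≈ 55.6% accurate.

Here is a sketch of the same checks using `value_counts`. It uses a hypothetical label vector with the notebook's 32/40 split and a placeholder matrix of the notebook's shape, not the real data:

```python
import numpy as np
import pandas as pd

# Hypothetical labels with the notebook's balance: 32 cases (1), 40 controls (0)
Y = pd.Series([1] * 32 + [0] * 40, name="ctrl/case")
x = np.zeros((72, 135))  # placeholder feature matrix with the notebook's shape

counts = Y.value_counts()  # number of samples per class label
n_case, n_ctrl = counts.loc[1], counts.loc[0]

assert x.size == x.shape[0] * x.shape[1] == 9720
assert n_case + n_ctrl == len(Y) == 72

# Accuracy of always predicting the majority class (controls)
baseline = n_ctrl / len(Y)
print(n_case, n_ctrl, round(baseline, 3))  # 32 40 0.556
```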

Predict diagnoses

[23]:
y_pred = Calf().fit(x, y).predict_proba(x)
roc_auc_score(y, y_pred[:, 1])
[23]:
0.78359375
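ROC AUC can be read as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counting as one half. With the notebook's 32 cases and 40 controls there are 32 × 40 = 1280 such pairs. The AUC of 0.78359375 above therefore corresponds to 1003 of them, with ties counted as half.

Here is a minimal brute-force sketch of that pairwise definition, checked against `roc_auc_score` on made-up scores:

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def pairwise_auc(y_true, score):
    """AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    y_true = np.asarray(y_true)
    score = np.asarray(score, dtype=float)
    pos = score[y_true == 1]
    neg = score[y_true == 0]
    # Compare every positive with every negative via broadcasting
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (len(pos) * len(neg))


# Made-up labels and scores (not the notebook's data)
y = np.array([0, 0, 1, 1, 0, 1])
s = np.array([0.1, 0.4, 0.35, 0.8, 0.5, 0.5])

print(pairwise_auc(y, s), roc_auc_score(y, s))
```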

The class probabilities predicted by Calf

[24]:
y_pred
[24]:
array([[0.73105858, 0.26894142],
       [0.65967676, 0.34032324],
       [0.67352505, 0.32647495],
       [0.47684355, 0.52315645],
       [0.49403688, 0.50596312],
       [0.63393725, 0.36606275],
       [0.55006102, 0.44993898],
       [0.66281609, 0.33718391],
       [0.62506968, 0.37493032],
       [0.59501702, 0.40498298],
       [0.71191589, 0.28808411],
       [0.55821739, 0.44178261],
       [0.52240219, 0.47759781],
       [0.71537548, 0.28462452],
       [0.6175692 , 0.3824308 ],
       [0.63651384, 0.36348616],
       [0.62584695, 0.37415305],
       [0.55156817, 0.44843183],
       [0.60770431, 0.39229569],
       [0.64037289, 0.35962711],
       [0.44494103, 0.55505897],
       [0.71615345, 0.28384655],
       [0.39693984, 0.60306016],
       [0.52301264, 0.47698736],
       [0.45921654, 0.54078346],
       [0.4121336 , 0.5878664 ],
       [0.62750285, 0.37249715],
       [0.33856961, 0.66143039],
       [0.43123335, 0.56876665],
       [0.59092639, 0.40907361],
       [0.59729627, 0.40270373],
       [0.44895471, 0.55104529],
       [0.41576818, 0.58423182],
       [0.49810242, 0.50189758],
       [0.55482325, 0.44517675],
       [0.44060927, 0.55939073],
       [0.5029    , 0.4971    ],
       [0.58781519, 0.41218481],
       [0.40889401, 0.59110599],
       [0.54580572, 0.45419428],
       [0.26894142, 0.73105858],
       [0.42625083, 0.57374917],
       [0.36021738, 0.63978262],
       [0.52808518, 0.47191482],
       [0.35790598, 0.64209402],
       [0.44528908, 0.55471092],
       [0.49834102, 0.50165898],
       [0.47800375, 0.52199625],
       [0.71257885, 0.28742115],
       [0.59224187, 0.40775813],
       [0.3884741 , 0.6115259 ],
       [0.50976535, 0.49023465],
       [0.51407848, 0.48592152],
       [0.46116416, 0.53883584],
       [0.47545834, 0.52454166],
       [0.56882478, 0.43117522],
       [0.38211921, 0.61788079],
       [0.43890178, 0.56109822],
       [0.5723152 , 0.4276848 ],
       [0.39136154, 0.60863846],
       [0.41389145, 0.58610855],
       [0.36282398, 0.63717602],
       [0.40762356, 0.59237644],
       [0.36091886, 0.63908114],
       [0.36917046, 0.63082954],
       [0.4016057 , 0.5983943 ],
       [0.45714141, 0.54285859],
       [0.38248224, 0.61751776],
       [0.54324483, 0.45675517],
       [0.43954232, 0.56045768],
       [0.43985635, 0.56014365],
       [0.5057918 , 0.4942082 ]])
[25]:
y_pred = CalfCV().fit(x, y).predict_proba(x)
roc_auc_score(y, y_pred[:, 1])
[25]:
0.8242187500000001
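Both AUCs so far are computed on the same 72 samples the models were fit on. They are resubstitution estimates and are likely optimistic. An out-of-fold alternative is scikit-learn's `cross_val_predict`, in which each row is scored by a model that never saw it.

The sketch below assumes an estimator that follows the scikit-learn API. `LogisticRegression` and synthetic data stand in for the notebook's models and n2.csv. Whether `Calf` and `CalfCV` drop in unchanged is an assumption not checked in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic stand-in with the notebook's shape and roughly its class balance
x, y = make_classification(n_samples=72, n_features=135, n_informative=5,
                           weights=[40 / 72], random_state=0)

model = LogisticRegression(max_iter=1000)  # stand-in for Calf or CalfCV

# Resubstitution AUC: fit and score on the same rows, as in the notebook
train_auc = roc_auc_score(y, model.fit(x, y).predict_proba(x)[:, 1])

# Out-of-fold AUC: each row is scored by a model trained on the other folds
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
oof = cross_val_predict(model, x, y, cv=cv, method="predict_proba")[:, 1]
cv_auc = roc_auc_score(y, oof)

print(f"train AUC {train_auc:.3f}, cross-validated AUC {cv_auc:.3f}")
```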

The class probabilities predicted by CalfCV

[26]:
y_pred
[26]:
array([[0.57255396, 0.42744604],
       [0.57190911, 0.42809089],
       [0.49905191, 0.50094809],
       [0.37396104, 0.62603896],
       [0.43673843, 0.56326157],
       [0.42049986, 0.57950014],
       [0.58727104, 0.41272896],
       [0.73105858, 0.26894142],
       [0.55325599, 0.44674401],
       [0.53275445, 0.46724555],
       [0.60855296, 0.39144704],
       [0.71632012, 0.28367988],
       [0.47160497, 0.52839503],
       [0.6000901 , 0.3999099 ],
       [0.5182401 , 0.4817599 ],
       [0.61569573, 0.38430427],
       [0.44520824, 0.55479176],
       [0.61067857, 0.38932143],
       [0.43740406, 0.56259594],
       [0.50806646, 0.49193354],
       [0.45940988, 0.54059012],
       [0.5844512 , 0.4155488 ],
       [0.6167177 , 0.3832823 ],
       [0.45002242, 0.54997758],
       [0.48864786, 0.51135214],
       [0.51369577, 0.48630423],
       [0.37743322, 0.62256678],
       [0.45003696, 0.54996304],
       [0.61891498, 0.38108502],
       [0.5935698 , 0.4064302 ],
       [0.45187866, 0.54812134],
       [0.57869656, 0.42130344],
       [0.59731614, 0.40268386],
       [0.54665274, 0.45334726],
       [0.68688645, 0.31311355],
       [0.44670906, 0.55329094],
       [0.4693245 , 0.5306755 ],
       [0.38162746, 0.61837254],
       [0.37643876, 0.62356124],
       [0.57096094, 0.42903906],
       [0.51437333, 0.48562667],
       [0.38786473, 0.61213527],
       [0.46310646, 0.53689354],
       [0.39080126, 0.60919874],
       [0.43044281, 0.56955719],
       [0.39303232, 0.60696768],
       [0.42280034, 0.57719966],
       [0.51340885, 0.48659115],
       [0.26894142, 0.73105858],
       [0.40361923, 0.59638077],
       [0.36135889, 0.63864111],
       [0.39733642, 0.60266358],
       [0.33561026, 0.66438974],
       [0.3574579 , 0.6425421 ],
       [0.52855035, 0.47144965],
       [0.44461878, 0.55538122],
       [0.42918205, 0.57081795],
       [0.36705095, 0.63294905],
       [0.45142257, 0.54857743],
       [0.34602883, 0.65397117],
       [0.39609809, 0.60390191],
       [0.45418881, 0.54581119],
       [0.29541473, 0.70458527],
       [0.53079061, 0.46920939],
       [0.45560206, 0.54439794],
       [0.36134196, 0.63865804],
       [0.34390344, 0.65609656],
       [0.59190982, 0.40809018],
       [0.36853881, 0.63146119],
       [0.41398225, 0.58601775],
       [0.4007513 , 0.5992487 ],
       [0.49950364, 0.50049636]])
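The predict cells that follow return hard labels rather than probabilities. A common rule is to pick the class with the larger probability, which amounts to thresholding the second column at 0.5. The rows shown above appear consistent with that rule. The exact rule Calf and CalfCV use, including how they treat a tie at exactly 0.5, is an assumption here.

Here is a minimal sketch applied to the first five CalfCV rows, copied from the output above:

```python
import numpy as np

# First five rows of the CalfCV predict_proba output shown above
proba = np.array([
    [0.57255396, 0.42744604],
    [0.57190911, 0.42809089],
    [0.49905191, 0.50094809],
    [0.37396104, 0.62603896],
    [0.43673843, 0.56326157],
])

# Assumed rule: predict class 1 when its probability exceeds 0.5
labels = (proba[:, 1] > 0.5).astype(float)
# Equivalent here: take the column with the larger probability
labels_argmax = proba.argmax(axis=1).astype(float)

print(labels)  # [0. 0. 1. 1. 1.]
```

These agree with the first five labels of the CalfCV predict output further down.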
[27]:
y_pred = Calf().fit(x, y).predict(x)
roc_auc_score(y, y_pred)
[27]:
0.696875
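Scoring hard 0/1 labels with `roc_auc_score` produces a ROC curve with a single interior point. The AUC then reduces to the mean of sensitivity and specificity, which is the balanced accuracy. This is why the hard-label AUCs come out lower than the probability-based ones: all ranking information within each predicted class is discarded.

Here is a sketch that checks this identity on made-up labels:

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, confusion_matrix, roc_auc_score

# Made-up true labels and hard predictions (not the notebook's data)
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_hat = np.array([0, 1, 0, 0, 1, 0, 1, 0, 1, 1])

tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()
sensitivity = tp / (tp + fn)  # true positive rate
specificity = tn / (tn + fp)  # true negative rate

# AUC of hard labels equals the mean of sensitivity and specificity
auc_hard = roc_auc_score(y_true, y_hat)
print(auc_hard, (sensitivity + specificity) / 2, balanced_accuracy_score(y_true, y_hat))
```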

The classes predicted by Calf

[28]:
y_pred
[28]:
array([0., 0., 0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 1., 0., 1., 0., 1., 1., 0., 1., 1., 0., 0., 1., 1., 1.,
       0., 1., 0., 0., 1., 0., 1., 1., 1., 0., 1., 1., 1., 1., 0., 0., 1.,
       0., 0., 1., 1., 0., 1., 1., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       0., 1., 1., 0.])
[29]:
y_pred = CalfCV().fit(x, y).predict(x)
roc_auc_score(y, y_pred)
[29]:
0.709375

The classes predicted by CalfCV

[30]:
y_pred
[30]:
array([0., 0., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 1.,
       0., 1., 0., 1., 0., 0., 1., 1., 0., 1., 1., 0., 0., 1., 0., 0., 0.,
       0., 1., 1., 1., 1., 0., 0., 1., 1., 1., 1., 1., 1., 0., 1., 1., 1.,
       1., 1., 1., 0., 1., 1., 1., 1., 1., 1., 1., 1., 0., 1., 1., 1., 0.,
       1., 1., 1., 1.])

References:

[1] Jeffries, C.D., Ford, J.R., Tilson, J.L. et al. A greedy regression algorithm with coarse weights offers novel advantages. Sci Rep 12, 5440 (2022). https://doi.org/10.1038/s41598-022-09415-2