The N2 dataset
The N2 dataset provides features hypothesized to be informative for predicting progression to psychosis. This notebook gives an overview of N2.
[ ]:
# Author: Rolf Carlson, Carlson Research LLC, <hrolfrc@gmail.com>
# License: 3-clause BSD
[13]:
import pandas
Get the input file path from the calf project
[14]:
input_file_path = "../../../data/n2.csv"
Read the input file into a DataFrame
[15]:
df = pandas.read_csv(input_file_path, header=0, sep=",")
df.head()
[15]:
| | ctrl/case | ADIPOQ | SERPINA3 | AMBP | A2M | ACE | AGT | APOA1 | APOA2 | APOA4 | ... | CALCA | IL6 | LTA | CSF3 | PGF | GCG_0001 | IL1B | TGFB3 | FGF2 | MDA-LDL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1.1538 | -1.008 | 0.4650 | -0.6181 | -0.9350 | 1.7169 | 0.974 | 1.7821 | -0.2580 | ... | -0.3688 | 0.8739 | -0.2390 | 2.8335 | 0.4469 | 0.101 | 0.1688 | -0.1861 | 1.9591 | -0.0720 |
| 1 | 0 | -0.7661 | -1.039 | 1.2479 | 0.2220 | -0.7140 | 2.6709 | -0.275 | 0.1680 | 0.9759 | ... | -0.3688 | 1.5408 | 3.5482 | -0.6669 | -0.7770 | 1.015 | 0.1688 | 0.2689 | -0.3498 | -0.5491 |
| 2 | 0 | -0.2721 | -0.766 | -0.7480 | -1.0371 | 0.0459 | -0.2940 | 0.046 | 0.9320 | -0.5050 | ... | 0.2562 | 0.2060 | -0.8099 | -0.6669 | -0.7770 | 0.372 | -0.5152 | -0.1861 | -0.3498 | -0.5491 |
| 3 | 0 | -0.8201 | -1.281 | 0.4650 | -0.1980 | 0.8059 | 0.8980 | -0.954 | -0.4270 | -0.5050 | ... | -0.3688 | -0.5718 | -0.8099 | -0.6669 | -0.5050 | -0.543 | 0.1688 | 0.2689 | -0.5978 | -0.5491 |
| 4 | 0 | 0.0019 | -1.188 | -0.7090 | 0.6421 | 0.2039 | -0.3500 | -0.275 | 0.5070 | 0.2349 | ... | 0.0612 | 1.8737 | 1.0388 | -0.6669 | 0.4469 | 0.575 | -0.2332 | -0.4131 | 0.4472 | -0.5491 |
5 rows × 136 columns
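As a minimal sketch of what the `read_csv` call above does, the example below parses a tiny in-memory CSV instead of `n2.csv`. The text in `csv_text` and its values are made up for illustration. It shows that `header=0` and `sep=","` match the pandas defaults, so the first line of the file becomes the column names.

```python
import io

import pandas

# Made-up CSV text standing in for n2.csv; only the layout mirrors the real file.
csv_text = "ctrl/case,ADIPOQ,SERPINA3\n0,0.5,-1.0\n1,-0.25,2.0\n"

# Explicit arguments, as used in this notebook.
explicit = pandas.read_csv(io.StringIO(csv_text), header=0, sep=",")

# Relying on the defaults gives the same DataFrame.
default = pandas.read_csv(io.StringIO(csv_text))

print(explicit.equals(default))
print(list(explicit.columns))
```

Spelling the arguments out, as the notebook does, documents the expected file layout even though the result is the same.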
Remove the outcome column to get the independent variables
[16]:
X = df.loc[:, df.columns != 'ctrl/case']
X.head()
[16]:
| | ADIPOQ | SERPINA3 | AMBP | A2M | ACE | AGT | APOA1 | APOA2 | APOA4 | APOH | ... | CALCA | IL6 | LTA | CSF3 | PGF | GCG_0001 | IL1B | TGFB3 | FGF2 | MDA-LDL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.1538 | -1.008 | 0.4650 | -0.6181 | -0.9350 | 1.7169 | 0.974 | 1.7821 | -0.2580 | 2.6529 | ... | -0.3688 | 0.8739 | -0.2390 | 2.8335 | 0.4469 | 0.101 | 0.1688 | -0.1861 | 1.9591 | -0.0720 |
| 1 | -0.7661 | -1.039 | 1.2479 | 0.2220 | -0.7140 | 2.6709 | -0.275 | 0.1680 | 0.9759 | 0.4610 | ... | -0.3688 | 1.5408 | 3.5482 | -0.6669 | -0.7770 | 1.015 | 0.1688 | 0.2689 | -0.3498 | -0.5491 |
| 2 | -0.2721 | -0.766 | -0.7480 | -1.0371 | 0.0459 | -0.2940 | 0.046 | 0.9320 | -0.5050 | -0.0990 | ... | 0.2562 | 0.2060 | -0.8099 | -0.6669 | -0.7770 | 0.372 | -0.5152 | -0.1861 | -0.3498 | -0.5491 |
| 3 | -0.8201 | -1.281 | 0.4650 | -0.1980 | 0.8059 | 0.8980 | -0.954 | -0.4270 | -0.5050 | 1.3320 | ... | -0.3688 | -0.5718 | -0.8099 | -0.6669 | -0.5050 | -0.543 | 0.1688 | 0.2689 | -0.5978 | -0.5491 |
| 4 | 0.0019 | -1.188 | -0.7090 | 0.6421 | 0.2039 | -0.3500 | -0.275 | 0.5070 | 0.2349 | -0.6120 | ... | 0.0612 | 1.8737 | 1.0388 | -0.6669 | 0.4469 | 0.575 | -0.2332 | -0.4131 | 0.4472 | -0.5491 |
5 rows × 135 columns
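The boolean column mask used above can also be written as a drop by column name. The sketch below compares the two on a small stand-in DataFrame; `toy` and its values are hypothetical and are not taken from N2.

```python
import pandas

# Toy stand-in for the N2 DataFrame; the values are made up.
toy = pandas.DataFrame({
    "ctrl/case": [0, 0, 1],
    "ADIPOQ": [1.0, -0.5, 0.25],
    "SERPINA3": [-1.0, 0.5, 2.0],
})

# Boolean-mask selection, as in this notebook.
X_mask = toy.loc[:, toy.columns != "ctrl/case"]

# Equivalent selection that drops the outcome column by name.
X_drop = toy.drop(columns="ctrl/case")

print(X_mask.equals(X_drop))
print(list(X_drop.columns))
```

One difference worth knowing: `drop` raises a `KeyError` if the named column is missing, while the mask silently keeps every column.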
[17]:
# number of rows (data points) and columns (features)
rows, cols = X.shape
print("Number of Rows (data points): ", rows)
print("Number of Columns (features or variables): ", cols)
Number of Rows (data points): 72
Number of Columns (features or variables): 135
[18]:
Y = df['ctrl/case']
Y records whether each individual progressed to psychosis (1) or not (0). Y is a pandas Series.
[19]:
Y.head()
[19]:
0 0
1 0
2 0
3 0
4 0
Name: ctrl/case, dtype: int64
[20]:
Y.describe()
[20]:
count 72.000000
mean 0.444444
std 0.500391
min 0.000000
25% 0.000000
50% 0.000000
75% 1.000000
max 1.000000
Name: ctrl/case, dtype: float64
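Because the labels are binary, the summary above also fixes the class balance: a mean of 0.444444 over 72 labels corresponds to 32 ones and 40 zeros. The sketch below makes the counts explicit with `value_counts`. The series `labels` is a synthetic stand-in built to match that summary, not the N2 data itself.

```python
import pandas

# Synthetic labels consistent with the summary: 40 zeros followed by 32 ones.
labels = pandas.Series([0] * 40 + [1] * 32, name="ctrl/case")

# Count each class; sort_index orders the result by label (0, then 1).
counts = labels.value_counts().sort_index()

# For 0/1 labels the mean is the fraction of ones.
fraction_cases = labels.mean()

print(counts)
print(round(fraction_cases, 6))
```

On the real data, `Y.value_counts()` would give the same kind of summary directly.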
The labels of the individuals who did not progress to psychosis are stored in non_psychotic.
[21]:
non_psychotic = Y[Y == 0]
non_psychotic.head()
[21]:
0 0
1 0
2 0
3 0
4 0
Name: ctrl/case, dtype: int64
The labels of the individuals who progressed to psychosis are stored in pre_psychotic.
[22]:
pre_psychotic = Y[Y == 1]
[23]:
pre_psychotic.head()
[23]:
40 1
41 1
42 1
43 1
44 1
Name: ctrl/case, dtype: int64
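The selections above return only the labels. The same boolean masks can select the matching feature rows, since the features and labels share an index. A minimal sketch on a toy DataFrame follows; the names `toy`, `X_toy`, `Y_toy`, `X_non` and `X_pre` and all values are hypothetical.

```python
import pandas

# Toy stand-in with two controls (0) followed by two cases (1); values are made up.
toy = pandas.DataFrame({
    "ctrl/case": [0, 0, 1, 1],
    "ADIPOQ": [1.0, -1.0, 0.5, 1.5],
})
X_toy = toy.drop(columns="ctrl/case")
Y_toy = toy["ctrl/case"]

# The masks are aligned on the shared index, so they pick the matching rows.
X_non = X_toy[Y_toy == 0]
X_pre = X_toy[Y_toy == 1]

print(X_non.shape, X_pre.shape)
print(list(X_pre.index))
```

The original row index is preserved in each subset, which is why the case rows in N2 start at index 40.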
[24]:
Y_names = Y.replace({0: 'non_psychotic', 1: 'pre_psychotic'})
Y_names
[24]:
0 non_psychotic
1 non_psychotic
2 non_psychotic
3 non_psychotic
4 non_psychotic
...
67 pre_psychotic
68 pre_psychotic
69 pre_psychotic
70 pre_psychotic
71 pre_psychotic
Name: ctrl/case, Length: 72, dtype: object
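`Series.map` with the same dictionary is an alternative to `replace` for this renaming. The sketch below compares the two on a short stand-in series (`labels` and `names` are hypothetical). The practical difference is that `map` turns any code missing from the dictionary into NaN, whereas `replace` leaves it unchanged.

```python
import pandas

names = {0: "non_psychotic", 1: "pre_psychotic"}

# Short stand-in for Y; the values are made up.
labels = pandas.Series([0, 0, 1], name="ctrl/case")

via_replace = labels.replace(names)
via_map = labels.map(names)
print(list(via_replace) == list(via_map))

# A code that is not in the dictionary (2 here) shows the difference.
unexpected = pandas.Series([0, 1, 2], name="ctrl/case")
mapped = unexpected.map(names)
print(int(mapped.isna().sum()))
```

Using `map` can therefore serve as a light check that the outcome column contains only the expected codes.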