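NumPy: Loading the SKLearn Cancer Dataset into a Pandas DataFrame
In this article, we show how to use NumPy to load the breast cancer dataset from SKLearn (scikit-learn) into a Pandas DataFrame.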
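What is SKLearn's cancer dataset?
SKLearn's cancer dataset poses a binary classification problem with 569 samples and 30 features. Each sample is labeled as either a benign or a malignant tumor. The features were extracted with image-processing techniques and include the mean radius, texture, perimeter, area, smoothness, and so on.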
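As a quick check of those figures, the sketch below inspects the object returned by load_breast_cancer. The variable names (n_samples, class_names, class_counts) are our own and purely illustrative. Note that in scikit-learn's encoding, label 0 stands for malignant and label 1 for benign.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

# Feature matrix: one row per sample, one column per feature.
n_samples, n_features = data.data.shape

# target_names maps label 0 -> 'malignant' and label 1 -> 'benign'.
class_names = [str(name) for name in data.target_names]

# Count how many samples carry each label.
class_counts = np.bincount(data.target)

print(n_samples, n_features)
print(dict(zip(class_names, class_counts.tolist())))
```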
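Importing the required libraries
Before we start, import the required libraries. We will use NumPy to assemble the data and a Pandas DataFrame to process and analyze it.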
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
Loading the dataset
Using the load_breast_cancer function from SKLearn, we load the cancer dataset as NumPy arrays.
data = load_breast_cancer()
cancer = np.c_[data.target, data.data]
columns = np.append(['target'], data.feature_names)
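The code above uses np.c_ to merge the target values and the feature values column-wise into a single array, with the target as the first column. It then uses np.append to build the column names by placing 'target' in front of the feature names.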
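To make the two NumPy calls easier to follow, here is a minimal sketch on toy data. The arrays and the names feat_a and feat_b are made up for illustration and are not part of the dataset. It also shows that the merged array has a single float dtype, so integer labels are stored as floats.

```python
import numpy as np

# Toy stand-ins for data.target and data.data (values are made up).
target = np.array([0, 1, 1])
features = np.array([[1.5, 2.5],
                     [3.5, 4.5],
                     [5.5, 6.5]])

# np.c_ stacks its arguments column-wise; the 1-D target becomes the first column.
combined = np.c_[target, features]

# np.append returns a new 1-D array: 'target' followed by the feature names.
names = np.append(['target'], ['feat_a', 'feat_b'])

print(combined.shape)
print(combined.dtype)
print(names.tolist())
```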
Creating the Pandas DataFrame
With the data in a NumPy array, we can now convert it into a Pandas DataFrame by passing the array and the column names to pd.DataFrame.
df = pd.DataFrame(cancer, columns=columns)
We can now work with and analyze the dataset as a Pandas DataFrame.
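As an alternative to assembling the array by hand, scikit-learn 0.23 and later accept an as_frame parameter that returns the data as pandas objects. The sketch below assumes such a version is installed; the name df_direct is our own. In this frame the target column comes last rather than first.

```python
from sklearn.datasets import load_breast_cancer

# as_frame=True asks scikit-learn to build the pandas objects itself.
bunch = load_breast_cancer(as_frame=True)

# bunch.frame holds the 30 feature columns plus a final 'target' column.
df_direct = bunch.frame

print(df_direct.shape)
print(df_direct.columns[-1])
```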
Analyzing the dataset
Pandas DataFrame methods let us inspect the dataset, for example its first few rows and its summary statistics.
df.head()
Output:
   target  mean radius  mean texture  mean perimeter  mean area  \
0     0.0        17.99         10.38          122.80     1001.0
1     0.0        20.57         17.77          132.90     1326.0
2     0.0        19.69         21.25          130.00     1203.0
3     0.0        11.42         20.38           77.58      386.1
4     0.0        20.29         14.34          135.10     1297.0
   mean smoothness  mean compactness  mean concavity  mean concave points  mean symmetry  \
0          0.11840           0.27760         0.30010              0.14710         0.2419
1          0.08474           0.07864         0.08690              0.07017         0.1812
2          0.10960           0.15990         0.19740              0.12790         0.2069
3          0.14250           0.28390         0.24140              0.10520         0.2597
4          0.10030           0.13280         0.19800              0.10430         0.1809
   mean fractal dimension  radius error  texture error  perimeter error  \
0                 0.07871        1.0950         0.9053           8.5890
1                 0.05667        0.5435         0.7339           3.3980
2                 0.05999        0.7456         0.7869           4.5850
3                 0.09744        0.4956         1.1560           3.4450
4                 0.05883        0.7572         0.7813           5.4380
   area error  smoothness error  compactness error  concavity error  \
0      153.40          0.006399            0.04904          0.05373
1       74.08          0.005225            0.01308          0.01860
2       94.03          0.006150            0.04006          0.03832
3       27.23          0.009110            0.07458          0.05661
4       94.44          0.011490            0.02461          0.05688
   concave points error  symmetry error  fractal dimension error  worst radius  \
0               0.01587         0.03003                 0.006193         25.38
1               0.01340         0.01389                 0.003532         24.99
2               0.02058         0.02250                 0.004571         23.57
3               0.01867         0.05963                 0.009208         14.91
4               0.01885         0.01756                 0.005115         22.54
   worst texture  worst perimeter  worst area  worst smoothness  \
0          17.33           184.60      2019.0            0.1622
1          23.41           158.80      1956.0            0.1238
2          25.53           152.50      1709.0            0.1444
3          26.50            98.87       567.7            0.2098
4          16.67           152.20      1575.0            0.1374
   worst compactness  worst concavity  worst concave points  worst symmetry  \
0             0.6656           0.7119                0.2654          0.4601
1             0.1866           0.2416                0.1860          0.2750
2             0.4245           0.4504                0.2430          0.3613
3             0.8663           0.6869                0.2575          0.6638
4             0.2050           0.4000                0.1625          0.2364
   worst fractal dimension
0                  0.11890
1                  0.08902
2                  0.08758
3                  0.17300
4                  0.07678
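The target column is displayed as 0.0 rather than 0 because np.c_ merged the labels and the features into one float array. If integer labels or readable class names are preferred, one possible sketch is shown below. The diagnosis column is a hypothetical helper of our own, not something the dataset provides.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
df = pd.DataFrame(np.c_[data.target, data.data],
                  columns=np.append(['target'], data.feature_names))

# Restore the integer dtype that was lost when the arrays were merged.
df['target'] = df['target'].astype(int)

# Hypothetical helper column: label 0 -> 'malignant', label 1 -> 'benign'.
label_to_name = {i: str(name) for i, name in enumerate(data.target_names)}
df['diagnosis'] = df['target'].map(label_to_name)

print(df[['target', 'diagnosis']].head())
```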
We can use the describe method to view summary statistics for every column of the dataset.
df.describe()
Output:
           target  mean radius  mean texture  mean perimeter    mean area  \
count  569.000000   569.000000    569.000000      569.000000   569.000000
mean     0.627417    14.127292     19.289649       91.969033   654.889104
std      0.483918     3.524049      4.301036       24.298981   351.914129
min      0.000000     6.981000      9.710000       43.790000   143.500000
25%      0.000000    11.700000     16.170000       75.170000   420.300000
50%      1.000000    13.370000     18.840000       86.240000   551.100000
75%      1.000000    15.780000     21.800000      104.100000   782.700000
max      1.000000    28.110000     39.280000      188.500000  2501.000000
       mean smoothness  mean compactness  mean concavity  mean concave points  \
count       569.000000        569.000000      569.000000           569.000000
mean          0.096360          0.104341        0.088799             0.048919
std           0.014064          0.052813        0.079720             0.038803
min           0.052630          0.019380        0.000000             0.000000
25%           0.086370          0.064920        0.029560             0.020310
50%           0.095870          0.092630        0.061540             0.033500
75%           0.105300          0.130400        0.130700             0.074000
max           0.163400          0.345400        0.426800             0.201200
       mean symmetry  mean fractal dimension  radius error  texture error  \
count     569.000000              569.000000    569.000000     569.000000
mean        0.181162                0.062798      0.405172       1.216853
std         0.027414                0.007060      0.277313       0.551648
min         0.106000                0.049960      0.111500       0.360200
25%         0.161900                0.057700      0.232400       0.833900
50%         0.179200                0.061540      0.324200       1.108000
75%         0.195700                0.066120      0.478900       1.474000
max         0.304000                0.097440      2.873000       4.885000
       perimeter error  area error  smoothness error  compactness error  \
count       569.000000  569.000000        569.000000         569.000000
mean          2.866059   40.337079          0.007041           0.025478
std           2.021855   45.491006          0.003003           0.017908
min           0.757000    6.802000          0.001713           0.002252
25%           1.606000   17.850000          0.005169           0.013080
50%           2.287000   24.530000          0.006380           0.020450
75%           3.357000   45.190000          0.008146           0.032450
max          21.980000  542.200000          0.031130           0.135400
       concavity error  concave points error  symmetry error  \
count       569.000000            569.000000      569.000000
mean          0.031894              0.011796        0.020542
std           0.030186              0.006170        0.008266
min           0.000000              0.000000        0.007882
25%           0.015090              0.007638        0.015160
50%           0.025890              0.010930        0.018730
75%           0.042050              0.014710        0.023480
max           0.396000              0.052790        0.078950
       fractal dimension error  worst radius  worst texture  worst perimeter  \
count               569.000000    569.000000     569.000000       569.000000
mean                  0.003795     16.269190      25.677223       107.261213
std                   0.002646      4.833242       6.146258        33.602542
min                   0.000895      7.930000      12.020000        50.410000
25%                   0.002248     13.010000      21.080000        84.110000
50%                   0.003187     14.970000      25.410000        97.660000
75%                   0.004558     18.790000      29.720000       125.400000
max                   0.029840     36.040000      49.540000       251.200000
        worst area  worst smoothness  worst compactness  worst concavity  \
count   569.000000        569.000000         569.000000       569.000000
mean    880.583128          0.132369           0.254265         0.272188
std     569.356993          0.022832           0.157336         0.208624
min     185.200000          0.071170           0.027290         0.000000
25%     515.300000          0.116600           0.147200         0.114500
50%     686.500000          0.131300           0.211900         0.226700
75%    1084.000000          0.146000           0.339100         0.382900
max    4254.000000          0.222600           1.058000         1.252000
       worst concave points  worst symmetry  worst fractal dimension
count            569.000000      569.000000               569.000000
mean               0.114606        0.290076                 0.083946
std                0.065732        0.061867                 0.018061
min                0.000000        0.156500                 0.055040
25%                0.064930        0.250400                 0.071460
50%                0.099930        0.282200                 0.080040
75%                0.161400        0.317900                 0.092080
max                0.291000        0.663800                 0.207500
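The mean of the target column (0.627417) is the share of samples with label 1, which is benign in scikit-learn's encoding. As a possible next step, the sketch below computes the class shares directly and compares two features across the classes. The names label_share and per_class, and the choice of features, are our own.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
df = pd.DataFrame(np.c_[data.target, data.data],
                  columns=np.append(['target'], data.feature_names))

# Share of each label; for a 0/1 column the mean equals the share of 1s.
label_share = df['target'].value_counts(normalize=True)

# Per-class averages for two of the features (0.0 = malignant, 1.0 = benign).
per_class = df.groupby('target')[['mean radius', 'mean area']].mean()

print(label_share)
print(per_class)
```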