NumPy: Loading the scikit-learn Breast Cancer Dataset into a Pandas DataFrame

This article shows how to use NumPy to load the scikit-learn breast cancer dataset into a Pandas DataFrame.

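What is the scikit-learn breast cancer dataset?

The scikit-learn breast cancer dataset is a binary classification dataset with 569 samples and 30 numeric features. Each sample is labeled as malignant (target 0, 212 samples) or benign (target 1, 357 samples). The features were extracted with image processing techniques from digitized images of breast mass samples and include mean radius, texture, perimeter, area, smoothness, and so on.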
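The figures above can be checked directly against the loaded dataset. The following is a minimal sketch, assuming scikit-learn is installed; the variable names `n_samples`, `n_features`, and `class_counts` are illustrative and not part of the original article.

```python
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

# data.data is a 2-D array: one row per sample, one column per feature
n_samples, n_features = data.data.shape
print(n_samples, n_features)

# target_names is indexed by label value, so position 0 names label 0
class_counts = {}
for label, name in enumerate(data.target_names):
    class_counts[str(name)] = int((data.target == label).sum())
print(class_counts)
```

Printing `data.feature_names` in the same way lists all 30 feature names.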

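Importing the required libraries

Before we start, import the required libraries. We use NumPy to combine the labels and features into a single array, and a Pandas DataFrame to process and analyze the data.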

import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer

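Loading the dataset

The load_breast_cancer function from scikit-learn returns the dataset with its features and labels as NumPy arrays, which we then combine into one array.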

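# Load the dataset; data.data holds the features, data.target the labels
data = load_breast_cancer()
# Stack the labels as the first column in front of the 30 feature columns
cancer = np.c_[data.target, data.data]
# Column names: 'target' followed by the feature names
columns = np.append(['target'], data.feature_names)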

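The code above uses np.c_ to join the target values and the feature values column-wise into a single array, and np.append to put a 'target' column name in front of the feature names. Because a NumPy array has a single dtype, the integer labels are stored as floats (0.0 and 1.0).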
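To see what np.c_ and np.append do in isolation, here is a small sketch on made-up data. The arrays `target`, `features`, and `feature_names` are hypothetical stand-ins for data.target, data.data, and data.feature_names.

```python
import numpy as np

# Hypothetical toy data: 3 samples, 2 features
target = np.array([0, 1, 1])
features = np.array([[1.5, 2.5],
                     [3.5, 4.5],
                     [5.5, 6.5]])
feature_names = np.array(['f1', 'f2'])

# np.c_ treats the 1-D label array as a column and joins it with the matrix
combined = np.c_[target, features]
# np.append returns a new array: 'target' followed by the feature names
columns = np.append(['target'], feature_names)

print(combined.shape)
print(combined.dtype)
print(columns)
```

The result has one more column than the feature matrix, and its dtype is float64 because the integer labels are promoted to match the features.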

Creating the Pandas DataFrame

With the data in a NumPy array, we can convert it to a Pandas DataFrame by passing the array and the column names to the pd.DataFrame constructor.

df = pd.DataFrame(cancer, columns=columns)

We can now work with and analyze the dataset as a Pandas DataFrame.
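One side effect of building the array with np.c_ is that the target column ends up as float64. The sketch below shows one possible way to restore integer labels and attach readable class names. It works on a copy named `labeled`, and the `diagnosis` column is a hypothetical addition for illustration, not part of the original article.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
df = pd.DataFrame(np.c_[data.target, data.data],
                  columns=np.append(['target'], data.feature_names))

# Work on a copy so the original df keeps the layout used in the article
labeled = df.copy()

# Cast the float labels (0.0 / 1.0) back to integers
labeled['target'] = labeled['target'].astype(int)

# Hypothetical extra column: look up the class name for each integer label
labeled['diagnosis'] = data.target_names[labeled['target'].to_numpy()]

print(labeled[['target', 'diagnosis']].head())
```

In scikit-learn 0.23 and later, load_breast_cancer(as_frame=True) is an alternative that returns the data as a DataFrame directly, with the target as the last column rather than the first.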

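Analyzing the dataset

Pandas DataFrame methods let us inspect the dataset, for example its first few rows and its summary statistics. The head method returns the first five rows: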

df.head()

Output:

   target  mean radius  mean texture  mean perimeter  mean area  \
0     0.0        17.99         10.38          122.80     1001.0   
1     0.0        20.57         17.77          132.90     1326.0   
2     0.0        19.69         21.25          130.00     1203.0   
3     0.0        11.42         20.38           77.58      386.1   
4     0.0        20.29         14.34          135.10     1297.0   

   mean smoothness  mean compactness  mean concavity  mean concave points  \
0          0.11840           0.27760          0.3001              0.14710   
1          0.08474           0.07864          0.0869              0.07017   
2          0.10960           0.15990          0.1974              0.12790   
3          0.14250           0.28390          0.2414              0.10520   
4          0.10030           0.13280          0.1980              0.10430   

   mean symmetry  mean fractal dimension  radius error  texture error  \
0         0.2419                 0.07871        1.0950         0.9053   
1         0.1812                 0.05667        0.5435         0.7339   
2         0.2069                 0.05999        0.7456         0.7869   
3         0.2597                 0.09744        0.4956         1.1560   
4         0.1809                 0.05883        0.7572         0.7813   

   perimeter error  area error  smoothness error  compactness error  \
0            8.589      153.40          0.006399            0.04904   
1            3.398       74.08          0.005225            0.01308   
2            4.585       94.03          0.006150            0.04006   
3            3.445       27.23          0.009110            0.07458   
4            5.438       94.44          0.011490            0.02461   

   concavity error  concave points error  symmetry error  \
0          0.05373               0.01587         0.03003   
1          0.01860               0.01340         0.01389   
2          0.03832               0.02058         0.02250   
3          0.05661               0.01867         0.05963   
4          0.05688               0.01885         0.01756   

   fractal dimension error  worst radius  worst texture  worst perimeter  \
0                 0.006193         25.38          17.33           184.60   
1                 0.003532         24.99          23.41           158.80   
2                 0.004571         23.57          25.53           152.50   
3                 0.009208         14.91          26.50            98.87   
4                 0.005115         22.54          16.67           152.20   

   worst area  worst smoothness  worst compactness  worst concavity  \
0      2019.0            0.1622             0.6656           0.7119   
1      1956.0            0.1238             0.1866           0.2416   
2      1709.0            0.1444             0.4245           0.4504   
3       567.7            0.2098             0.8663           0.6869   
4      1575.0            0.1374             0.2050           0.4000   

   worst concave points  worst symmetry  worst fractal dimension  
0                0.2654          0.4601                  0.11890  
1                0.1860          0.2750                  0.08902  
2                0.2430          0.3613                  0.08758  
3                0.2575          0.6638                  0.17300  
4                0.1625          0.2364                  0.07678  

The describe method reports summary statistics for each column.

df.describe()

Output:

           target  mean radius  mean texture  mean perimeter    mean area  \
count  569.000000   569.000000    569.000000      569.000000   569.000000   
mean     0.627417    14.127292     19.289649       91.969033   654.889104   
std      0.483918     3.524049      4.301036       24.298981   351.914129   
min      0.000000     6.981000      9.710000       43.790000   143.500000   
25%      0.000000    11.700000     16.170000       75.170000   420.300000   
50%      1.000000    13.370000     18.840000       86.240000   551.100000   
75%      1.000000    15.780000     21.800000      104.100000   782.700000   
max      1.000000    28.110000     39.280000      188.500000  2501.000000   

       mean smoothness  mean compactness  mean concavity  mean concave points  \
count       569.000000        569.000000      569.000000           569.000000   
mean          0.096360          0.104341        0.088799             0.048919   
std           0.014064          0.052813        0.079720             0.038803   
min           0.052630          0.019380        0.000000             0.000000   
25%           0.086370          0.064920        0.029560             0.020310   
50%           0.095870          0.092630        0.061540             0.033500   
75%           0.105300          0.130400        0.130700             0.074000   
max           0.163400          0.345400        0.426800             0.201200   

       mean symmetry  mean fractal dimension  radius error  texture error  \
count     569.000000              569.000000    569.000000     569.000000   
mean        0.181162                0.062798      0.405172       1.216853   
std         0.027414                0.007060      0.277313       0.551648   
min         0.106000                0.049960      0.111500       0.360200   
25%         0.161900                0.057700      0.232400       0.833900   
50%         0.179200                0.061540      0.324200       1.108000   
75%         0.195700                0.066120      0.478900       1.474000   
max         0.304000                0.097440      2.873000       4.885000   

       perimeter error   area error  smoothness error  compactness error  \
count       569.000000   569.000000        569.000000         569.000000   
mean          2.866059    40.337079          0.007041           0.025478   
std           2.021855    45.491006          0.003003           0.017908   
min           0.757000     6.802000          0.001713           0.002252   
25%           1.606000    17.850000          0.005169           0.013080   
50%           2.287000    24.530000          0.006380           0.020450   
75%           3.357000    45.190000          0.008146           0.032450   
max          21.980000   542.200000          0.031130           0.135400   

       concavity error  concave points error  symmetry error  \
count       569.000000            569.000000      569.000000   
mean          0.031894              0.011796        0.020542   
std           0.030186              0.006170        0.008266   
min           0.000000              0.000000        0.007882   
25%           0.015090              0.007638        0.015160   
50%           0.025890              0.010930        0.018730   
75%           0.042050              0.014710        0.023480   
max           0.396000              0.052790        0.078950   

       fractal dimension error  worst radius  worst texture  worst perimeter  \
count               569.000000    569.000000     569.000000       569.000000   
mean                  0.003795     16.269190      25.677223       107.261213   
std                   0.002646      4.833242       6.146258        33.602542   
min                   0.000895      7.930000      12.020000        50.410000   
25%                   0.002248     13.010000      21.080000        84.110000   
50%                   0.003187     14.970000      25.410000        97.660000   
75%                   0.004558     18.790000      29.720000       125.400000   
max                   0.029840     36.040000      49.540000       251.200000   

        worst area  worst smoothness  worst compactness  worst concavity  \
count   569.000000        569.000000         569.000000       569.000000   
mean    880.583128          0.132369           0.254265         0.272188   
std     569.356993          0.022832           0.157336         0.208624   
min     185.200000          0.071170           0.027290         0.000000   
25%     515.300000          0.116600           0.147200         0.114500   
50%     686.500000          0.131300           0.211900         0.226700   
75%    1084.000000          0.146000           0.339100         0.382900   
max    4254.000000          0.222600           1.058000         1.252000   

       worst concave points  worst symmetry  worst fractal dimension  
count            569.000000      569.000000               569.000000  
mean               0.114606        0.290076                 0.083946  
std                0.065732        0.061867                 0.018061  
min                0.000000        0.156500                 0.055040  
25%                0.064930        0.250400                 0.071460  
50%                0.099930        0.282200                 0.080040  
75%                0.161400        0.317900                 0.092080  
max                0.291000        0.663800                 0.207500  
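Beyond head and describe, a natural next step is to compare the two classes. The sketch below rebuilds the same DataFrame, counts the samples per label, and averages two features per class. The choice of 'mean radius' and 'mean area' is only an illustration, and the names `counts` and `group_means` are not from the original article.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
df = pd.DataFrame(np.c_[data.target, data.data],
                  columns=np.append(['target'], data.feature_names))

# Number of samples per label (0.0 = malignant, 1.0 = benign)
counts = df['target'].value_counts().sort_index()
print(counts)

# Per-class averages for two illustrative features
group_means = df.groupby('target')[['mean radius', 'mean area']].mean()
print(group_means)
```

The mean of the target column reported by describe (about 0.627) is the fraction of samples labeled 1, which matches the benign share given by these counts.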
