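Pandas: Viewing Data

This chapter shows how to view the top and bottom rows of a DataFrame, display its index and column names, convert it to an array, get quick summary statistics with the describe() method, transpose the data, and sort by axis and by values.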

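Viewing the top and bottom rows of a DataFrame

Pandas DataFrame attributes and methods are covered briefly in their own chapter. The example below uses head() and tail() to show the first and last rows: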

import pandas as pd
import numpy as np

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
print(df)
print("First two rows of the DataFrame:")
print(df.head(2))
print("Last two rows of the DataFrame:")
print(df.tail(2))

The output looks like this (the values are random, so they differ from run to run):

                   A         B         C         D
2013-01-01 -0.581774  1.586989  1.567304  0.453327
2013-01-02  1.007445 -0.147305 -1.265887 -2.202020
2013-01-03  0.823082 -0.225769 -0.980484 -0.009522
2013-01-04 -1.842688 -0.541232  0.353693 -0.266758
2013-01-05 -0.777706 -0.948812 -0.089518 -1.362889
2013-01-06 -0.558783  0.816770 -1.685613 -1.257817
First two rows of the DataFrame:
                   A         B         C         D
2013-01-01 -0.581774  1.586989  1.567304  0.453327
2013-01-02  1.007445 -0.147305 -1.265887 -2.202020
Last two rows of the DataFrame:
                   A         B         C         D
2013-01-05 -0.777706 -0.948812 -0.089518 -1.362889
2013-01-06 -0.558783  0.816770 -1.685613 -1.257817
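
Because the example above draws random numbers, its values cannot be reproduced exactly. The following is a minimal sketch of the same head() and tail() calls on repeatable data. The seed value and the variable names are illustrative assumptions, not part of the original example.

```python
import numpy as np
import pandas as pd

# A seeded generator makes the random values repeatable across runs.
rng = np.random.default_rng(seed=0)
dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))

first_default = df.head()    # no argument: the first 5 rows
last_two = df.tail(2)        # the last 2 rows
all_but_last = df.head(-1)   # negative n: every row except the last one

print(first_default.shape)   # (5, 4)
print(last_two.index[0])     # label of the first of the last two rows
print(all_but_last.shape)    # (5, 4)
```

Both methods return a new DataFrame and leave df itself unchanged.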

Displaying the index and column names of a DataFrame

import pandas as pd
import numpy as np

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
print(df)
print("\n")
print("Index of the DataFrame:")
print(df.index)
print("\n")
print("Column names of the DataFrame:")
print(df.columns)

The output is as follows:

                   A         B         C         D
2013-01-01  1.356387  1.301684 -1.805471  0.207932
2013-01-02 -0.432469  0.301927 -0.781012 -0.181690
2013-01-03  0.394773  1.378638 -1.212205 -0.099059
2013-01-04 -0.882216 -0.970615  0.025216 -0.569211
2013-01-05  2.869555 -1.559304  1.344623  1.029604
2013-01-06 -0.472778  0.963512  0.763589  0.519504


Index of the DataFrame:
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')


Column names of the DataFrame:
Index(['A', 'B', 'C', 'D'], dtype='object')
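
The index and columns attributes return pandas Index objects rather than plain lists. As a small sketch, the labels can be pulled out for use in ordinary Python code. The zero-filled data and the variable names here are assumptions chosen only to keep the example deterministic.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
# Zeros instead of random numbers: only the labels matter in this sketch.
df = pd.DataFrame(np.zeros((6, 4)), index=dates, columns=list("ABCD"))

column_names = df.columns.tolist()   # plain Python list of column labels
first_label = df.index[0]            # a pandas Timestamp
n_rows, n_cols = df.shape            # (number of rows, number of columns)

print(column_names)   # ['A', 'B', 'C', 'D']
print(first_label)
print(n_rows, n_cols) # 6 4
```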

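Converting a DataFrame to an array

Data analysis with pandas and NumPy frequently involves both DataFrame and array objects. When a computation expects a NumPy array, the DataFrame has to be converted first, as shown below: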

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
print(df)
print('------------')
print(df.values)
print('------------')
print(np.array(df))

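The output is as follows:

   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9
------------
[[1 4 7]
 [2 5 8]
 [3 6 9]]
------------
[[1 4 7]
 [2 5 8]
 [3 6 9]]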
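
Besides the two conversions used above, the pandas documentation recommends DataFrame.to_numpy() over the values attribute. The sketch below is one possible illustration of it, including what happens when columns have different numeric types. The mixed DataFrame and its column names are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
arr = df.to_numpy()          # 2-D NumPy array, one row per DataFrame row
print(arr.shape)             # (3, 3)

# Hypothetical frame with one integer and one float column.
mixed = pd.DataFrame({"count": [1, 2], "ratio": [0.5, 1.5]})
mixed_arr = mixed.to_numpy()
# A NumPy array has a single dtype, so the integers are upcast to floats.
print(mixed_arr.dtype)
```

The index and column labels are not part of the resulting array, so keep them separately if they are needed later.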

The describe() method

The describe() method shows quick summary statistics for the data:

import pandas as pd
import numpy as np

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
print(df)
print(df.describe())

The output looks like this (the values are random, so they differ from run to run):

                   A         B         C         D
2013-01-01  0.957576  0.158356  0.599062  1.058058
2013-01-02 -0.594968 -3.091030 -0.740165  2.431990
2013-01-03 -0.992010 -1.896315 -1.655470 -0.624458
2013-01-04  0.690256 -0.708464 -0.710131 -0.686201
2013-01-05 -0.661400 -0.201837 -1.930525  0.349621
2013-01-06  0.448648 -0.438307  0.744847 -1.624868
              A         B         C         D
count  6.000000  6.000000  6.000000  6.000000
mean  -0.025316 -1.029599 -0.615397  0.150690
std    0.820532  1.228811  1.110048  1.450591
min   -0.992010 -3.091030 -1.930525 -1.624868
25%   -0.644792 -1.599352 -1.426644 -0.670765
50%   -0.073160 -0.573385 -0.725148 -0.137418
75%    0.629854 -0.260954  0.271764  0.880949
max    0.957576  0.158356  0.744847  2.431990
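
The result of describe() is itself a DataFrame, so individual statistics can be read with loc. Here is a minimal sketch on fixed data. The numbers and the percentile choices are assumptions picked so that the results are easy to check by hand.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 20.0, 30.0, 40.0]})
stats = df.describe()

print(stats.loc["mean", "A"])   # 2.5
print(stats.loc["50%", "B"])    # 25.0 (the median)
print(stats.loc["std", "A"])    # sample standard deviation, same as df["A"].std()

# Request other percentiles instead of the default 25%, 50%, 75%.
custom = df.describe(percentiles=[0.1, 0.9])
print(custom.index.tolist())
```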

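Transposing data

Reshaping data in Pandas is covered briefly in its own chapter. The example below transposes the df from the describe() example with the T attribute, swapping rows and columns: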

print(df.T)

The output is as follows (the middle columns are elided by the display):

   2013-01-01  2013-01-02     ...      2013-01-05  2013-01-06
A    0.957576   -0.594968     ...       -0.661400    0.448648
B    0.158356   -3.091030     ...       -0.201837   -0.438307
C    0.599062   -0.740165     ...       -1.930525    0.744847
D    1.058058    2.431990     ...        0.349621   -1.624868

[4 rows x 6 columns]
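
To make the effect of T easier to see, here is a small sketch on a DataFrame with fixed values. The row labels x, y, z and the variable names are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=["x", "y", "z"])
transposed = df.T

print(transposed.shape)             # (2, 3): rows and columns are swapped
print(transposed.index.tolist())    # ['A', 'B']
print(transposed.columns.tolist())  # ['x', 'y', 'z']

# Transposing twice gives back the original layout.
round_trip = transposed.T.equals(df)
print(round_trip)
```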

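Sorting by axis

Sorting in Pandas is covered briefly in its own chapter. The example below reuses the df from the describe() example and sorts its columns by label in descending order (axis=1 selects the column labels):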

print(df.sort_index(axis=1, ascending=False))

The output is as follows:

                   D         C         B         A
2013-01-01  1.058058  0.599062  0.158356  0.957576
2013-01-02  2.431990 -0.740165 -3.091030 -0.594968
2013-01-03 -0.624458 -1.655470 -1.896315 -0.992010
2013-01-04 -0.686201 -0.710131 -0.708464  0.690256
2013-01-05  0.349621 -1.930525 -0.201837 -0.661400
2013-01-06 -1.624868  0.744847 -0.438307  0.448648
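
sort_index() works on row labels as well as column labels. The following sketch contrasts the two axes on a tiny DataFrame. The labels and the variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"B": [1, 2, 3], "A": [4, 5, 6]}, index=["y", "z", "x"])

by_row_label = df.sort_index()                         # axis=0 is the default: sort rows by label
by_col_desc = df.sort_index(axis=1, ascending=False)   # sort columns by label, descending

print(by_row_label.index.tolist())   # ['x', 'y', 'z']
print(by_col_desc.columns.tolist())  # ['B', 'A']
print(df.index.tolist())             # ['y', 'z', 'x'], the original is unchanged
```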

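Sorting by values

sort_values() orders the rows by the values in a given column. The example below sorts the same df by column B in ascending order: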

print(df.sort_values(by='B'))

The output is as follows:

                   A         B         C         D
2013-01-02 -0.594968 -3.091030 -0.740165  2.431990
2013-01-03 -0.992010 -1.896315 -1.655470 -0.624458
2013-01-04  0.690256 -0.708464 -0.710131 -0.686201
2013-01-06  0.448648 -0.438307  0.744847 -1.624868
2013-01-05 -0.661400 -0.201837 -1.930525  0.349621
2013-01-01  0.957576  0.158356  0.599062  1.058058
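
sort_values() also accepts a sort direction and more than one column. This is a minimal sketch of both options on fixed data. The values and the variable names are assumptions chosen so the resulting order can be checked by hand.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 1, 2, 1], "B": [0.5, -1.0, -3.0, 2.0]})

# Single column, largest values first.
desc_b = df.sort_values(by="B", ascending=False)
print(desc_b["B"].tolist())   # [2.0, 0.5, -1.0, -3.0]

# Two columns: A ascending, and within equal A, B descending.
multi = df.sort_values(by=["A", "B"], ascending=[True, False])
print(multi.index.tolist())   # [3, 1, 0, 2]
```

As with sort_index(), the result is a new DataFrame and df keeps its original row order.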
