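Pandas: Viewing Data
This chapter covers several ways to inspect a DataFrame:
- display its top and bottom rows;
- show its index and column names;
- convert it to an array;
- get quick summary statistics with the describe() method;
- transpose the data;
- sort it by axis and by value.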
Displaying the top and bottom rows of a DataFrame
head(n) returns the first n rows of a DataFrame and tail(n) returns the last n rows. For example:
import pandas as pd
import numpy as np
dates = pd.date_range('20130101', periods=6)
# Random data: your numbers will differ from the output shown here
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
print(df)
print("First two rows of the DataFrame:")
print(df.head(2))
print("Last two rows of the DataFrame:")
print(df.tail(2))
The output is:
A B C D
2013-01-01 -0.581774 1.586989 1.567304 0.453327
2013-01-02 1.007445 -0.147305 -1.265887 -2.202020
2013-01-03 0.823082 -0.225769 -0.980484 -0.009522
2013-01-04 -1.842688 -0.541232 0.353693 -0.266758
2013-01-05 -0.777706 -0.948812 -0.089518 -1.362889
2013-01-06 -0.558783 0.816770 -1.685613 -1.257817
First two rows of the DataFrame:
A B C D
2013-01-01 -0.581774 1.586989 1.567304 0.453327
2013-01-02 1.007445 -0.147305 -1.265887 -2.202020
Last two rows of the DataFrame:
A B C D
2013-01-05 -0.777706 -0.948812 -0.089518 -1.362889
2013-01-06 -0.558783 0.816770 -1.685613 -1.257817
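Both methods take an optional n. It defaults to 5, and a negative n counts from the other end. The sketch below uses a small made-up DataFrame so the results are deterministic. The column name x is illustrative only.

```python
import pandas as pd

# Hypothetical example data: ten rows in a single integer column
df = pd.DataFrame({"x": range(10)})

print(df.head())      # n defaults to 5: rows 0-4
print(df.tail(3))     # the last three rows: 7, 8, 9
print(df.head(-8))    # negative n: every row except the last 8, i.e. rows 0 and 1
print(df.tail(-8))    # negative n: every row except the first 8, i.e. rows 8 and 9
```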
Displaying the DataFrame index and column names
import pandas as pd
import numpy as np
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
print(df)
print("\n")
print("DataFrame index:")
print(df.index)
print("\n")
print("DataFrame column names:")
print(df.columns)
The output is:
A B C D
2013-01-01 1.356387 1.301684 -1.805471 0.207932
2013-01-02 -0.432469 0.301927 -0.781012 -0.181690
2013-01-03 0.394773 1.378638 -1.212205 -0.099059
2013-01-04 -0.882216 -0.970615 0.025216 -0.569211
2013-01-05 2.869555 -1.559304 1.344623 1.029604
2013-01-06 -0.472778 0.963512 0.763589 0.519504
DataFrame index:
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')
DataFrame column names:
Index(['A', 'B', 'C', 'D'], dtype='object')
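The index and the column names are pandas Index objects, not plain Python lists. The sketch below shows how to turn them into lists and check the table's dimensions. The two-row data is hypothetical.

```python
import pandas as pd

# Hypothetical two-row frame with a date index
df = pd.DataFrame(
    {"A": [1, 2], "B": [3.0, 4.0]},
    index=pd.date_range("2013-01-01", periods=2),
)

# Index objects convert to plain Python lists with tolist()
cols = df.columns.tolist()
dates = df.index.strftime("%Y-%m-%d").tolist()
print(cols)      # ['A', 'B']
print(dates)     # ['2013-01-01', '2013-01-02']
print(df.shape)  # (2, 2): (number of rows, number of columns)
```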
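Converting a DataFrame to an array
Analysis and computation with pandas and NumPy often involve both DataFrame objects and NumPy arrays. When DataFrame data needs to be processed as an array, you can use the values attribute or pass the DataFrame to np.array():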
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
print(df)
print('------------')
print(df.values)       # the DataFrame's data as a NumPy ndarray
print('------------')
print(np.array(df))    # np.array() builds the same ndarray
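The output is:
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9
------------
[[1 4 7]
 [2 5 8]
 [3 6 9]]
------------
[[1 4 7]
 [2 5 8]
 [3 6 9]]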
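The pandas documentation recommends DataFrame.to_numpy() over the values attribute. The sketch below assumes pandas 0.24 or later, when to_numpy() was added. The mixed-type frames are hypothetical and only illustrate how a single common dtype is chosen for the resulting array.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# to_numpy() returns the same 2-D ndarray as df.values
arr = df.to_numpy()
print(arr.shape)                # (3, 3)

# Integer and float columns are upcast to a common float dtype
mixed = pd.DataFrame({"i": [1, 2], "f": [0.5, 1.5]})
print(mixed.to_numpy().dtype)   # float64

# A column of strings forces the whole array to dtype=object
objs = pd.DataFrame({"i": [1, 2], "s": ["a", "b"]})
print(objs.to_numpy().dtype)    # object
```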
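describe()
The describe() method shows quick summary statistics for each numeric column. These are:
- count;
- mean;
- standard deviation (std);
- minimum;
- the 25%, 50% and 75% percentiles;
- maximum.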
import pandas as pd
import numpy as np
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
print(df)
print(df.describe())
The output is:
A B C D
2013-01-01 0.957576 0.158356 0.599062 1.058058
2013-01-02 -0.594968 -3.091030 -0.740165 2.431990
2013-01-03 -0.992010 -1.896315 -1.655470 -0.624458
2013-01-04 0.690256 -0.708464 -0.710131 -0.686201
2013-01-05 -0.661400 -0.201837 -1.930525 0.349621
2013-01-06 0.448648 -0.438307 0.744847 -1.624868
A B C D
count 6.000000 6.000000 6.000000 6.000000
mean -0.025316 -1.029599 -0.615397 0.150690
std 0.820532 1.228811 1.110048 1.450591
min -0.992010 -3.091030 -1.930525 -1.624868
25% -0.644792 -1.599352 -1.426644 -0.670765
50% -0.073160 -0.573385 -0.725148 -0.137418
75% 0.629854 -0.260954 0.271764 0.880949
max 0.957576 0.158356 0.744847 2.431990
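describe() also accepts a percentiles list that controls which quantiles are reported. The median (50%) is always included. For non-numeric data, describe() reports count, unique, top and freq instead. The values below are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0]})

# Report the 10th and 90th percentiles (the 50% row is added automatically)
stats = df.describe(percentiles=[0.1, 0.9])
print(stats)

# A non-numeric Series gets count / unique / top / freq
s = pd.Series(["a", "b", "a"])
print(s.describe())
```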
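Transposing data
The T attribute (shorthand for the transpose() method) swaps rows and columns. This continues with the df from the describe() example: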
print(df.T)
The output is:
2013-01-01 2013-01-02 ... 2013-01-05 2013-01-06
A 0.957576 -0.594968 ... -0.661400 0.448648
B 0.158356 -3.091030 ... -0.201837 -0.438307
C 0.599062 -0.740165 ... -1.930525 0.744847
D 1.058058 2.431990 ... 0.349621 -1.624868
[4 rows x 6 columns]
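The sketch below uses a small hypothetical frame. It shows that T gives the same result as transpose(). For a single-dtype frame like this one, transposing twice restores the original.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=["r1", "r2", "r3"])

t = df.T                      # same result as df.transpose()
print(t.shape)                # (2, 3): rows and columns swapped
print(t.index.tolist())       # ['A', 'B']: the old column labels become the index
print(t.columns.tolist())     # ['r1', 'r2', 'r3']: the old index becomes the columns
```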
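Sorting by axis
sort_index() sorts by the labels along an axis. With axis=1 it sorts the column labels, and ascending=False puts them in descending order: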
print(df.sort_index(axis=1, ascending=False))
The output is:
D C B A
2013-01-01 1.058058 0.599062 0.158356 0.957576
2013-01-02 2.431990 -0.740165 -3.091030 -0.594968
2013-01-03 -0.624458 -1.655470 -1.896315 -0.992010
2013-01-04 -0.686201 -0.710131 -0.708464 0.690256
2013-01-05 0.349621 -1.930525 -0.201837 -0.661400
2013-01-06 -1.624868 0.744847 -0.438307 0.448648
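The example above sorts the columns. With the default axis=0, sort_index() sorts the row labels instead. The sketch below uses hypothetical string labels.

```python
import pandas as pd

df = pd.DataFrame({"v": [30, 10, 20]}, index=["c", "a", "b"])

# axis=0 (the default) sorts by the row labels
print(df.sort_index())                    # rows a, b, c
# ascending=False reverses the order
print(df.sort_index(ascending=False))     # rows c, b, a
```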
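Sorting by value
sort_values(by='B') sorts the rows by the values in column B, in ascending order by default: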
print(df.sort_values(by='B'))
The output is:
A B C D
2013-01-02 -0.594968 -3.091030 -0.740165 2.431990
2013-01-03 -0.992010 -1.896315 -1.655470 -0.624458
2013-01-04 0.690256 -0.708464 -0.710131 -0.686201
2013-01-06 0.448648 -0.438307 0.744847 -1.624868
2013-01-05 -0.661400 -0.201837 -1.930525 0.349621
2013-01-01 0.957576 0.158356 0.599062 1.058058
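sort_values() also accepts a list of columns, with a matching list of booleans for ascending. It returns a new DataFrame and leaves the original unchanged unless inplace=True is passed. The team and score columns below are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["x", "y", "x", "y"],
    "score": [3, 1, 2, 5],
})

# Sort by team ascending, then by score descending within each team
out = df.sort_values(by=["team", "score"], ascending=[True, False])
print(out)

# df itself keeps its original row order
print(df.index.tolist())   # [0, 1, 2, 3]
```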