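Python – How to Group a Pandas DataFrame by Month?
We will use groupby() to group the Pandas DataFrame, with pd.Grouper selecting the column to group on. In the example, which uses car sales records, we group by month and compute the sum of the registration price (Reg_Price) for each month.
First, suppose the following is our Pandas DataFrame with three columns –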
import pandas as pd

dataFrame = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW", "Toyota", "Nissan", "Bentley",
                "Mustang"],
        "Date_of_Purchase": [pd.Timestamp("2021-06-10"), pd.Timestamp("2021-07-11"),
                             pd.Timestamp("2021-06-25"), pd.Timestamp("2021-06-29"),
                             pd.Timestamp("2021-03-20"), pd.Timestamp("2021-01-22"),
                             pd.Timestamp("2021-01-06"), pd.Timestamp("2021-01-04"),
                             pd.Timestamp("2021-05-09")],
        "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350]
    }
)
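Grouping by a time frequency only works on a datetime-like column. The DataFrame above builds Date_of_Purchase from pd.Timestamp objects, so it already qualifies. A minimal sketch, assuming the dates instead arrive as plain strings (the frame `raw` and its two rows are hypothetical, for illustration only), might convert the column first:

```python
import pandas as pd

# Hypothetical input: purchase dates stored as plain strings
raw = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus"],
        "Date_of_Purchase": ["2021-06-10", "2021-07-11"],
        "Reg_Price": [1000, 1400],
    }
)

# Parse the strings into datetime64 values so the column can be grouped by a time frequency
raw["Date_of_Purchase"] = pd.to_datetime(raw["Date_of_Purchase"])

# Confirm the conversion before grouping
is_datetime = pd.api.types.is_datetime64_any_dtype(raw["Date_of_Purchase"])
print(raw.dtypes)
print("Datetime column:", is_datetime)
```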
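Pass a pd.Grouper to groupby() to select the Date_of_Purchase column, and set the frequency freq to "M" to group by month (pandas 2.2 and later spell this month-end alias "ME") –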
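# numeric_only=True sums only Reg_Price; min_count=1 leaves months without purchases as NaN
print("\nGroup DataFrame by month...\n", dataFrame.groupby(pd.Grouper(key='Date_of_Purchase', freq='M')).sum(numeric_only=True, min_count=1))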
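As an alternative to pd.Grouper, the month can be derived from the date column itself. The following is a minimal sketch, not the method used in this article: it converts each date to a monthly Period with dt.to_period("M") and groups on that. The names `sales`, `purchase_month` and `monthly_totals` are chosen here for illustration. Unlike the Grouper approach, this returns only the months that actually contain purchases.

```python
import pandas as pd

sales = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW", "Toyota", "Nissan",
                "Bentley", "Mustang"],
        "Date_of_Purchase": pd.to_datetime(
            ["2021-06-10", "2021-07-11", "2021-06-25", "2021-06-29", "2021-03-20",
             "2021-01-22", "2021-01-06", "2021-01-04", "2021-05-09"]
        ),
        "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350],
    }
)

# Map each purchase date to its calendar month (a Period such as 2021-06)
purchase_month = sales["Date_of_Purchase"].dt.to_period("M")

# Sum Reg_Price within each month that has at least one purchase
monthly = sales.groupby(purchase_month)["Reg_Price"].sum()

# Plain dict with string keys, convenient for printing or checking
monthly_totals = {str(month): int(total) for month, total in monthly.items()}
print(monthly_totals)
```

February and April do not appear in the result because no rows fall in those months.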
Example
The following is the code –
import pandas as pd

# DataFrame in which one of the columns is Date_of_Purchase
dataFrame = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW", "Toyota", "Nissan", "Bentley",
                "Mustang"],
        "Date_of_Purchase": [pd.Timestamp("2021-06-10"), pd.Timestamp("2021-07-11"),
                             pd.Timestamp("2021-06-25"), pd.Timestamp("2021-06-29"),
                             pd.Timestamp("2021-03-20"), pd.Timestamp("2021-01-22"),
                             pd.Timestamp("2021-01-06"), pd.Timestamp("2021-01-04"),
                             pd.Timestamp("2021-05-09")],
        "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350]
    }
)
print("DataFrame...\n", dataFrame)

# Grouper selects the Date_of_Purchase column for groupby() and buckets it by month.
# numeric_only=True sums only Reg_Price; min_count=1 leaves months without purchases as NaN
print("\nGroup DataFrame by month...\n", dataFrame.groupby(pd.Grouper(key='Date_of_Purchase', freq='M')).sum(numeric_only=True, min_count=1))
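Output
This produces the following output, with the Reg_Price total computed for each month. Months with no purchases (February and April) show NaN –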
DataFrame...
        Car Date_of_Purchase  Reg_Price
0      Audi       2021-06-10       1000
1     Lexus       2021-07-11       1400
2     Tesla       2021-06-25       1100
3  Mercedes       2021-06-29        900
4       BMW       2021-03-20       1700
5    Toyota       2021-01-22       1800
6    Nissan       2021-01-06       1300
7   Bentley       2021-01-04       1150
8   Mustang       2021-05-09       1350

Group DataFrame by month...
                  Reg_Price
Date_of_Purchase
2021-01-31           4250.0
2021-02-28              NaN
2021-03-31           1700.0
2021-04-30              NaN
2021-05-31           1350.0
2021-06-30           3000.0
2021-07-31           1400.0
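The NaN rows in the output correspond to calendar months that lie between the first and last purchase but contain no records. The sketch below is an illustration under stated assumptions rather than part of the original example: it starts from the non-empty monthly totals shown above, rebuilds the full month range with pd.period_range, and shows two ways to represent the gaps. The names `observed`, `with_gaps` and `zero_filled` are hypothetical.

```python
import pandas as pd

# Monthly Reg_Price totals for the months that have purchases (taken from the output above)
observed = pd.Series(
    [4250, 1700, 1350, 3000, 1400],
    index=pd.PeriodIndex(
        ["2021-01", "2021-03", "2021-05", "2021-06", "2021-07"], freq="M"
    ),
    name="Reg_Price",
)

# Every calendar month from the first to the last month with a purchase
all_months = pd.period_range(observed.index.min(), observed.index.max(), freq="M")

# Reindexing exposes the empty months: NaN by default, or 0 if a fill value is given
with_gaps = observed.reindex(all_months)
zero_filled = observed.reindex(all_months, fill_value=0)

empty_months = [str(month) for month in with_gaps[with_gaps.isna()].index]
print(with_gaps)
print("Months without purchases:", empty_months)
```

Whether an empty month should read as NaN or 0 depends on the report: NaN makes the absence of data explicit, while 0 keeps the column as integers and is easier to plot or total.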