Python - How to Group a Pandas DataFrame by Days?
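We will use the groupby() function to group a Pandas DataFrame, with pd.Grouper selecting the column to group on. In our example, a set of car sales records, we group by days and compute the sum of the registration prices for each date interval.
Set the frequency to a day-based interval through the freq parameter of pd.Grouper inside groupby(). With a frequency of 7D, for example, the rows are grouped into consecutive 7-day intervals, starting from the earliest date in the date column and continuing until the last date. The intervals do not restart at month boundaries.
First, let's say the following is our Pandas DataFrame with three columns: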
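To make the 7D binning concrete, here is a minimal sketch on a small hypothetical frame (the names `df`, `when`, and `value` are made up for illustration and are not part of the example that follows). It suggests that the bins are consecutive 7-day windows counted from the day of the earliest timestamp, so a bin can span a month boundary.

```python
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame(
    {
        "when": pd.to_datetime(["2021-01-30", "2021-02-02", "2021-02-08"]),
        "value": [1, 2, 4],
    }
)

# Consecutive 7-day bins; with the default origin the first bin
# starts at midnight of the earliest date (2021-01-30)
weekly = df.groupby(pd.Grouper(key="when", freq="7D"))["value"].sum()
print(weekly)
```

The first two rows land in the bin that starts on 2021-01-30, even though one of them falls in February, and the third row lands in the bin that starts on 2021-02-06.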
import pandas as pd

# dataframe with one of the columns as Date_of_Purchase
dataFrame = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW", "Toyota", "Nissan", "Bentley", "Mustang"],
        "Date_of_Purchase": [
            pd.Timestamp("2021-06-10"),
            pd.Timestamp("2021-07-11"),
            pd.Timestamp("2021-06-25"),
            pd.Timestamp("2021-06-29"),
            pd.Timestamp("2021-03-20"),
            pd.Timestamp("2021-01-22"),
            pd.Timestamp("2021-01-06"),
            pd.Timestamp("2021-01-04"),
            pd.Timestamp("2021-05-09")
        ],
        "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350]
    }
)
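Next, use Grouper inside the groupby function to select the Date_of_Purchase column. Set the frequency to 7D, i.e. one group for every 7-day interval, up to the last date in the column: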
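# Sum Reg_Price per 7-day bin; min_count=1 leaves empty bins as NaN instead of 0
print("\nGroup DataFrame by 7 days...\n", dataFrame.groupby(pd.Grouper(key='Date_of_Purchase', freq='7D'))[['Reg_Price']].sum(min_count=1))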
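As a point of comparison, the same 7-day grouping can usually be written with resample() and its on parameter. The sketch below uses a small hypothetical frame named `sales` (an assumption for illustration; only the column names mirror the example) and checks whether both spellings agree.

```python
import pandas as pd

# Hypothetical three-row frame; column names mirror the article's example
sales = pd.DataFrame(
    {
        "Date_of_Purchase": pd.to_datetime(["2021-01-04", "2021-01-06", "2021-01-22"]),
        "Reg_Price": [1150, 1300, 1800],
    }
)

# Grouping with pd.Grouper inside groupby()
by_grouper = sales.groupby(pd.Grouper(key="Date_of_Purchase", freq="7D"))["Reg_Price"].sum()

# Grouping with resample() on the same column
by_resample = sales.resample("7D", on="Date_of_Purchase")["Reg_Price"].sum()

print(by_grouper)
print(by_grouper.equals(by_resample))
```

Grouper is the more flexible choice when the date bins have to be combined with other grouping keys in one groupby() call; resample() reads more directly when time is the only key.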
Example
Following is the complete code:
import pandas as pd

# dataframe with one of the columns as Date_of_Purchase
dataFrame = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW", "Toyota", "Nissan", "Bentley", "Mustang"],
        "Date_of_Purchase": [
            pd.Timestamp("2021-06-10"),
            pd.Timestamp("2021-07-11"),
            pd.Timestamp("2021-06-25"),
            pd.Timestamp("2021-06-29"),
            pd.Timestamp("2021-03-20"),
            pd.Timestamp("2021-01-22"),
            pd.Timestamp("2021-01-06"),
            pd.Timestamp("2021-01-04"),
            pd.Timestamp("2021-05-09")
        ],
        "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350]
    }
)

print("DataFrame...\n", dataFrame)

# Grouper to select Date_of_Purchase column within groupby function
# Only Reg_Price is summed; min_count=1 leaves empty 7-day bins as NaN instead of 0
print("\nGroup DataFrame by 7 days...\n", dataFrame.groupby(pd.Grouper(key='Date_of_Purchase', freq='7D'))[['Reg_Price']].sum(min_count=1))
Output
This will produce the following output:
DataFrame...
        Car Date_of_Purchase  Reg_Price
0      Audi       2021-06-10       1000
1     Lexus       2021-07-11       1400
2     Tesla       2021-06-25       1100
3  Mercedes       2021-06-29        900
4       BMW       2021-03-20       1700
5    Toyota       2021-01-22       1800
6    Nissan       2021-01-06       1300
7   Bentley       2021-01-04       1150
8   Mustang       2021-05-09       1350
Group DataFrame by 7 days...
                  Reg_Price
Date_of_Purchase
2021-01-04           2450.0
2021-01-11              NaN
2021-01-18           1800.0
2021-01-25              NaN
2021-02-01              NaN
2021-02-08              NaN
2021-02-15              NaN
2021-02-22              NaN
2021-03-01              NaN
2021-03-08              NaN
2021-03-15           1700.0
2021-03-22              NaN
2021-03-29              NaN
2021-04-05              NaN
2021-04-12              NaN
2021-04-19              NaN
2021-04-26              NaN
2021-05-03           1350.0
2021-05-10              NaN
2021-05-17              NaN
2021-05-24              NaN
2021-05-31              NaN
2021-06-07           1000.0
2021-06-14              NaN
2021-06-21           1100.0
2021-06-28            900.0
2021-07-05           1400.0
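Most of the 7-day intervals in the output contain no purchases. If only the intervals with data are of interest, one possible follow-up is to drop the empty rows. The sketch below assumes a small hypothetical frame named `sales` and relies on sum(min_count=1) to mark empty intervals as NaN so that dropna() can remove them.

```python
import pandas as pd

# Hypothetical three-row frame; column names mirror the article's example
sales = pd.DataFrame(
    {
        "Date_of_Purchase": pd.to_datetime(["2021-01-04", "2021-01-06", "2021-01-22"]),
        "Reg_Price": [1150, 1300, 1800],
    }
)

# min_count=1 makes a bin with no rows sum to NaN rather than 0
weekly = sales.groupby(pd.Grouper(key="Date_of_Purchase", freq="7D"))[["Reg_Price"]].sum(min_count=1)

# Keep only the 7-day bins that contain at least one purchase
non_empty = weekly.dropna()
print(non_empty)
```

Here the interval starting on 2021-01-11 is empty and is dropped, which leaves the intervals starting on 2021-01-04 and 2021-01-18.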