Python - How to Group a Pandas DataFrame by Year?

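We will use groupby() to group a Pandas DataFrame, with pd.Grouper selecting the column to group on. The rows are grouped into year-based intervals, and the registration prices within each interval are summed, using the car sales records below as an example.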

First, suppose we have the following Pandas DataFrame with three columns:

import pandas as pd

# dataframe with one of the columns as Date_of_Purchase
dataFrame = pd.DataFrame(
   {
      "Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW", "Toyota", "Nissan", "Bentley", "Mustang"],

      "Date_of_Purchase": [pd.Timestamp("2021-06-10"),
         pd.Timestamp("2019-07-11"),
         pd.Timestamp("2016-06-25"),
         pd.Timestamp("2021-06-29"),
         pd.Timestamp("2020-03-20"),
         pd.Timestamp("2019-01-22"),
         pd.Timestamp("2011-01-06"),
         pd.Timestamp("2013-01-04"),
         pd.Timestamp("2014-05-09")
      ],
      "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350]
   }
)

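Next, use Grouper inside groupby() to select the Date_of_Purchase column. Set the frequency to 3Y, which groups the rows into 3-year intervals. Each group is labeled with the December 31 date that closes its interval.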

Example

The complete code is as follows:

import pandas as pd

# dataframe with one of the columns as Date_of_Purchase
dataFrame = pd.DataFrame(
   {
      "Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW", "Toyota", "Nissan", "Bentley", "Mustang"],

      "Date_of_Purchase": [pd.Timestamp("2021-06-10"),
         pd.Timestamp("2019-07-11"),
         pd.Timestamp("2016-06-25"),
         pd.Timestamp("2021-06-29"),
         pd.Timestamp("2020-03-20"),
         pd.Timestamp("2019-01-22"),
         pd.Timestamp("2011-01-06"),
         pd.Timestamp("2013-01-04"),
         pd.Timestamp("2014-05-09")
      ],

      "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350]
   }
)

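print("DataFrame...\n", dataFrame)

# Grouper selects the Date_of_Purchase column within groupby; freq='3Y' forms 3-year bins.
# axis=0 is the default, so it is omitted. Recent pandas releases prefer the alias '3YE'.
# Selecting Reg_Price keeps the string column Car out of the sum.
print("\nGroup Dataframe by 3 years...\n",
      dataFrame.groupby(pd.Grouper(key='Date_of_Purchase', freq='3Y'))[["Reg_Price"]].sum())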

Output

This produces the following output:

DataFrame...
        Car Date_of_Purchase  Reg_Price
0      Audi       2021-06-10       1000
1     Lexus       2019-07-11       1400
2     Tesla       2016-06-25       1100
3  Mercedes       2021-06-29        900
4       BMW       2020-03-20       1700
5    Toyota       2019-01-22       1800
6    Nissan       2011-01-06       1300
7   Bentley       2013-01-04       1150
8   Mustang       2014-05-09       1350

Group Dataframe by 3 years...
                  Reg_Price
Date_of_Purchase
2011-12-31             1300
2014-12-31             2500
2017-12-31             1100
2020-12-31             4900
2023-12-31             1900
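
The example above bins the rows into 3-year intervals. To get one group per calendar year instead, one possible approach is to group on the year extracted from the date column through the .dt accessor. This is a minimal sketch rather than the method used above; the names sales and yearly_total are illustrative and do not come from the original example.

```python
import pandas as pd

# Same car sales records as above; "sales" is an illustrative name.
sales = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW",
                "Toyota", "Nissan", "Bentley", "Mustang"],
        "Date_of_Purchase": pd.to_datetime(
            ["2021-06-10", "2019-07-11", "2016-06-25", "2021-06-29", "2020-03-20",
             "2019-01-22", "2011-01-06", "2013-01-04", "2014-05-09"]
        ),
        "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350],
    }
)

# Group on the calendar year of each purchase and sum only Reg_Price.
yearly_total = sales.groupby(sales["Date_of_Purchase"].dt.year)["Reg_Price"].sum()

print(yearly_total)
```

With this approach only years that actually contain a purchase appear in the result (for instance, 2019 totals 3200 from the Lexus and Toyota rows), and the index holds plain integer years rather than year-end dates.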