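Python: Group and calculate the sum of column values in a Pandas DataFrame
We will take car sale records as an example and group them by month to calculate the total registration price for each month. To compute the total, we use the sum() method.
First, suppose the following is our Pandas DataFrame with three columns: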
dataFrame = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW", "Toyota", "Nissan", "Bentley", "Mustang"],
        "Date_of_Purchase": [
            pd.Timestamp("2021-06-10"),
            pd.Timestamp("2021-07-11"),
            pd.Timestamp("2021-06-25"),
            pd.Timestamp("2021-06-29"),
            pd.Timestamp("2021-03-20"),
            pd.Timestamp("2021-01-22"),
            pd.Timestamp("2021-01-06"),
            pd.Timestamp("2021-01-04"),
            pd.Timestamp("2021-05-09")
        ],
        "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350]
    })
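In the groupby() function, use Grouper to select the Date_of_Purchase column. Set the frequency freq to "M" to group by month (pandas 2.2 and later prefer the alias "ME" for month end), then call sum() to compute the totals. Selecting Reg_Price keeps the text column Car out of the sum:
print("\nGrouped DataFrame by month...")
print(dataFrame.groupby(pd.Grouper(key='Date_of_Purchase', freq='M'))[['Reg_Price']].sum())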
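As a cross-check on the Grouper approach, the same monthly totals can be obtained by grouping on a monthly period derived from the date column. This is a minimal sketch rather than the article's own method: it uses a four-row subset of the data above, and the names sales, month, and totals_by_month are illustrative. Unlike Grouper with a frequency, this variant lists only the months that actually occur in the data.

```python
import pandas as pd

# Four-row subset of the article's data (names below are illustrative)
sales = pd.DataFrame({
    "Car": ["Audi", "Lexus", "Tesla", "BMW"],
    "Date_of_Purchase": pd.to_datetime(
        ["2021-06-10", "2021-07-11", "2021-06-25", "2021-03-20"]
    ),
    "Reg_Price": [1000, 1400, 1100, 1700],
})

# Convert each purchase date to its calendar month (a monthly Period)
month = sales["Date_of_Purchase"].dt.to_period("M")

# Sum the registration prices within each month
monthly_total = sales.groupby(month)["Reg_Price"].sum()

# Plain dict keyed by "YYYY-MM" for easy inspection
totals_by_month = {str(period): int(total) for period, total in monthly_total.items()}
print(totals_by_month)
# {'2021-03': 1700, '2021-06': 2100, '2021-07': 1400}
```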
Example
Following is the code:
import pandas as pd

# DataFrame with Date_of_Purchase as one of its columns
dataFrame = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW", "Toyota", "Nissan", "Bentley", "Mustang"],
        "Date_of_Purchase": [
            pd.Timestamp("2021-06-10"),
            pd.Timestamp("2021-07-11"),
            pd.Timestamp("2021-06-25"),
            pd.Timestamp("2021-06-29"),
            pd.Timestamp("2021-03-20"),
            pd.Timestamp("2021-01-22"),
            pd.Timestamp("2021-01-06"),
            pd.Timestamp("2021-01-04"),
            pd.Timestamp("2021-05-09")
        ],
        "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350]
    })

print("DataFrame...")
print(dataFrame)

# Use Grouper to select the Date_of_Purchase column in groupby() and sum Reg_Price for each month
# (pandas 2.2 and later prefer freq='ME' over freq='M')
print("\nGrouped DataFrame by month...")
print(dataFrame.groupby(pd.Grouper(key='Date_of_Purchase', freq='M'))[['Reg_Price']].sum())
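Output
This will produce the following output. The NaN rows are months with no purchases; recent pandas versions may report 0 for such months and print the sums as integers unless sum(min_count=1) is used:
DataFrame...
        Car  Date_of_Purchase  Reg_Price
0      Audi        2021-06-10       1000
1     Lexus        2021-07-11       1400
2     Tesla        2021-06-25       1100
3  Mercedes        2021-06-29        900
4       BMW        2021-03-20       1700
5    Toyota        2021-01-22       1800
6    Nissan        2021-01-06       1300
7   Bentley        2021-01-04       1150
8   Mustang        2021-05-09       1350

Grouped DataFrame by month...
                  Reg_Price
Date_of_Purchase
2021-01-31           4250.0
2021-02-28              NaN
2021-03-31           1700.0
2021-04-30              NaN
2021-05-31           1350.0
2021-06-30           3000.0
2021-07-31           1400.0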
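The rows for February and April appear because grouping by a monthly frequency creates a bin for every month in the date range, including months without purchases. How such an empty month is reported depends on the min_count argument of sum(). The following is a minimal sketch, not part of the original example: it uses a three-row subset of the data, the variable names are illustrative, and it passes pd.offsets.MonthEnd() as the frequency, a choice made here so the code does not depend on which month-end alias ("M" or "ME") the installed pandas accepts.

```python
import pandas as pd

# Three-row subset of the article's data; February has no purchases
sales = pd.DataFrame({
    "Date_of_Purchase": pd.to_datetime(["2021-01-22", "2021-01-06", "2021-03-20"]),
    "Reg_Price": [1800, 1300, 1700],
})

# Month-end bins, expressed as an offset object instead of a string alias
by_month = sales.groupby(
    pd.Grouper(key="Date_of_Purchase", freq=pd.offsets.MonthEnd())
)["Reg_Price"]

# Default (min_count=0): a month with no rows sums to 0
totals_zero = by_month.sum()

# min_count=1: a month needs at least one value, otherwise the result is NaN
totals_nan = by_month.sum(min_count=1)

print(totals_zero)
print(totals_nan)
```

With min_count=1 the empty month shows up as NaN, which matches the style of the output above; with the default it shows up as 0.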