How to Count the Frequency of Itemsets in a Pandas DataFrame
Use the Series.value_counts() method to count how often each value occurs in a column. First, let us create a DataFrame:
import pandas as pd

# Create the DataFrame
dataFrame = pd.DataFrame({'Car': ['BMW', 'Mercedes', 'Lamborghini', 'Audi', 'Mercedes', 'Porsche', 'Lamborghini', 'BMW'],
                          'Place': ['Delhi', 'Hyderabad', 'Chandigarh', 'Bangalore', 'Hyderabad', 'Mumbai', 'Mumbai', 'Pune'],
                          'UnitsSold': [95, 80, 80, 75, 92, 90, 95, 50]})
Call value_counts() on the Car column to count how many times each car appears:
# Count the frequency of each value in the Car column
count1 = dataFrame['Car'].value_counts()
print("\nCount in column Car")
print(count1)
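As a small aside, value_counts() can also report relative frequencies instead of raw counts through its normalize parameter. The sketch below assumes the same Car values as above; the names cars and share are illustrative and do not appear in the original example.

```python
import pandas as pd

# Same Car values as in the DataFrame above (illustrative standalone Series)
cars = pd.Series(['BMW', 'Mercedes', 'Lamborghini', 'Audi',
                  'Mercedes', 'Porsche', 'Lamborghini', 'BMW'], name='Car')

# normalize=True returns each value's share of the total instead of a raw count
share = cars.value_counts(normalize=True)
print(share)
```

With eight rows, a car that appears twice gets a share of 0.25 and a car that appears once gets 0.125.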
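The frequency of the other columns can be counted in the same way. The following is the complete code to count the frequency of itemsets in a Pandas DataFrame:
Example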
import pandas as pd
# Create the DataFrame
dataFrame = pd.DataFrame({'Car': ['BMW', 'Mercedes', 'Lamborghini', 'Audi', 'Mercedes', 'Porsche', 'Lamborghini', 'BMW'],
                          'Place': ['Delhi', 'Hyderabad', 'Chandigarh', 'Bangalore', 'Hyderabad', 'Mumbai', 'Mumbai', 'Pune'],
                          'UnitsSold': [95, 80, 80, 75, 92, 90, 95, 50]})
print("Dataframe...")
print(dataFrame)
# Count the frequency of each value in the Car column
count1 = dataFrame['Car'].value_counts()
print("\nCount in column Car")
print(count1)
# Count the frequency of each value in the Place column
count2 = dataFrame['Place'].value_counts()
print("\nCount in column Place")
print(count2)
# Count the frequency of each value in the UnitsSold column
count3 = dataFrame['UnitsSold'].value_counts()
print("\nCount in column UnitsSold")
print(count3)
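Output
This will produce the following output. The Name line and the order of tied values may vary with the pandas version; for example, pandas 2.0 and later name the result count.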
Dataframe...
           Car       Place  UnitsSold
0          BMW       Delhi         95
1     Mercedes   Hyderabad         80
2  Lamborghini  Chandigarh         80
3         Audi   Bangalore         75
4     Mercedes   Hyderabad         92
5      Porsche      Mumbai         90
6  Lamborghini      Mumbai         95
7          BMW        Pune         50

Count in column Car
BMW            2
Lamborghini    2
Mercedes       2
Audi           1
Porsche        1
Name: Car, dtype: int64

Count in column Place
Mumbai        2
Hyderabad     2
Chandigarh    1
Pune          1
Delhi         1
Bangalore     1
Name: Place, dtype: int64

Count in column UnitsSold
95    2
80    2
92    1
75    1
90    1
50    1
Name: UnitsSold, dtype: int64
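
The example above counts values one column at a time. If an "itemset" is taken to mean a combination of values across several columns (an assumption, since the original only counts single columns), one possible approach is to group by those columns and take the size of each group. The following is a minimal sketch under that assumption; pair_counts is an illustrative name.

```python
import pandas as pd

# Same data as in the example above
dataFrame = pd.DataFrame({'Car': ['BMW', 'Mercedes', 'Lamborghini', 'Audi', 'Mercedes', 'Porsche', 'Lamborghini', 'BMW'],
                          'Place': ['Delhi', 'Hyderabad', 'Chandigarh', 'Bangalore', 'Hyderabad', 'Mumbai', 'Mumbai', 'Pune'],
                          'UnitsSold': [95, 80, 80, 75, 92, 90, 95, 50]})

# Assumption: an "itemset" here is a (Car, Place) pair.
# groupby(...).size() counts the rows that fall into each pair.
pair_counts = dataFrame.groupby(['Car', 'Place']).size()
print(pair_counts)

# Look up the count for a single pair through the MultiIndex
print(pair_counts.loc[('Mercedes', 'Hyderabad')])
```

In this data only the Mercedes and Hyderabad pair occurs more than once. Recent pandas releases (1.1 and later) also offer DataFrame.value_counts() with a subset parameter, which gives similar counts sorted by frequency.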