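Pandas GroupBy: Counting Occurrences in a Column
Calling size() or count() on the result of pandas.DataFrame.groupby() gives the number of times each value occurs in a DataFrame column. The same counts can also be obtained with pandas.Series.value_counts() and pandas.Index.value_counts().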
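As a rough illustration of the value_counts() alternative mentioned above, the sketch below counts the same column both ways. The sample data and variable names are made up for this illustration. Note that value_counts() orders the result by count (largest first) by default, while groupby().size() orders it by group key.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
data = pd.DataFrame({'Teacher': ['Kakeshi', 'Iruka', 'Kakeshi', 'Guy', 'Kakeshi']})

# value_counts() on the column: sorted by count, descending, by default
by_value_counts = data['Teacher'].value_counts()

# groupby().size(): sorted by the group key instead
by_groupby = data.groupby('Teacher').size()

print(by_value_counts)
print(by_groupby)

# The two results hold the same value -> count mapping
print(by_value_counts.to_dict() == by_groupby.to_dict())
```

Either form works when only one column is involved; groupby() becomes the more natural choice once several columns are combined.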
Steps
- Import the module
- Create or import the DataFrame
- Apply groupby()
- Use either of the two methods
- Display the result
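The steps above can be sketched in a few lines. This is a minimal sketch on made-up data (the frame and the names df, counts and counts_df are assumptions for illustration); the optional reset_index(name='count') call is an extra, not one of the listed steps, and simply turns the resulting Series into a two-column DataFrame.

```python
# Step 1: import the module
import pandas as pd

# Step 2: create the DataFrame (hypothetical data for illustration)
df = pd.DataFrame({'Section': ['A', 'A', 'B', 'C', 'C', 'C']})

# Steps 3 and 4: group by the column, then count the rows in each group
counts = df.groupby('Section').size()

# Optional extra: convert the Series into a frame with a named count column
counts_df = counts.reset_index(name='count')

# Step 5: display the result
print(counts_df)
```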
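Method 1: Using pandas.DataFrame.groupby().size()
The basic approach is to pass the column name (or a list of column names) to groupby() and then call size(). The examples below show how to count occurrences in a column for different datasets.
Example 1:
In this example, we count the occurrences in each column of the dataset separately.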
# import module
import pandas as pd
# assign data
data = pd.DataFrame({'Section': ['A', 'A', 'A', 'B', 'B',
                                 'B', 'C', 'C', 'C'],
                     'Teacher': ['Kakeshi', 'Kakeshi', 'Iruka',
                                 'Kakeshi', 'Kakeshi', 'Kakeshi',
                                 'Iruka', 'Iruka', 'Guy']})
# display the DataFrame (print() is used so the script also runs outside Jupyter)
print('Data:')
print(data)
print('Occurrence counts of particular columns:')
# count occurrences of each value in the Section column
occur = data.groupby(['Section']).size()
# display the counts for the Section column
print(occur)
# count occurrences of each value in the Teacher column
occur = data.groupby(['Teacher']).size()
# display the counts for the Teacher column
print(occur)
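Output:
Example 2:
In the program below, we count the occurrences of combined columns (each Section and Teacher pair) in the same dataset used in the previous program.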
# import module
import pandas as pd
# assign data
data = pd.DataFrame({'Section': ['A', 'A', 'A', 'B', 'B', 'B',
                                 'C', 'C', 'C'],
                     'Teacher': ['Kakeshi', 'Kakeshi', 'Iruka',
                                 'Kakeshi', 'Kakeshi', 'Kakeshi',
                                 'Iruka', 'Iruka', 'Guy']})
# display the DataFrame (print() is used so the script also runs outside Jupyter)
print('Data:')
print(data)
print('Occurrence counts of combined columns:')
# count occurrences of each (Section, Teacher) combination
occur = data.groupby(['Section', 'Teacher']).size()
# display occurrences of combined columns
print(occur)
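Output:
Example 3:
Here we read data from a CSV file and count the occurrences of a single categorical column and of a combination of columns separately.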
# import module
import pandas as pd
# read data (assumes diamonds.csv is in the working directory)
data = pd.read_csv('diamonds.csv')
# display a random sample of 10 rows
print('Data:')
print(data.sample(10))
print('Occurrence counts of particular column:')
# count occurrences of each value in the cut column
occur = data.groupby(['cut']).size()
# display occurrences of the cut column
print(occur)
print('Occurrence counts of combined columns:')
# count occurrences of each (clarity, color, cut) combination
occur = data.groupby(['clarity', 'color', 'cut']).size()
# display occurrences of combined columns
print(occur)
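Output:
Method 2: Using pandas.DataFrame.groupby().count()
The basic approach is to pass the column name (or a list of column names) to groupby() and then call count(). Selecting a single column before calling count() gives one count per group. The examples below show how to count occurrences in a column for different datasets.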
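One difference between the two methods is worth seeing on a small case: size() counts every row in a group, while count() counts only non-null values in the selected column. The sketch below uses made-up data with one missing Teacher value; the frame and variable names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data with one missing Teacher value in Section A
df = pd.DataFrame({
    'Section': ['A', 'A', 'B', 'B'],
    'Teacher': ['Kakeshi', None, 'Iruka', 'Iruka'],
})

# size(): number of rows per group, missing values included
sizes = df.groupby('Section').size()

# count(): number of non-null Teacher values per group
counts = df.groupby('Section')['Teacher'].count()

print(sizes)
print(counts)
```

When the data has no missing values, the two methods return the same numbers.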
Example 1:
In this example, we count the occurrences in each column of the dataset separately.
# import module
import pandas as pd
# assign data
data = pd.DataFrame({'Section': ['A', 'A', 'A', 'B', 'B', 'B',
                                 'C', 'C', 'C'],
                     'Teacher': ['Kakeshi', 'Kakeshi', 'Iruka',
                                 'Kakeshi', 'Kakeshi', 'Kakeshi',
                                 'Iruka', 'Iruka', 'Guy']})
# display the DataFrame (print() is used so the script also runs outside Jupyter)
print('Data:')
print(data)
print('Occurrence counts of particular columns:')
# count occurrences of each Section (non-null Teacher values per Section)
occur = data.groupby(['Section'])['Teacher'].count()
# display the counts for the Section column
print(occur)
# count occurrences of each Teacher (non-null Section values per Teacher)
occur = data.groupby(['Teacher'])['Section'].count()
# display the counts for the Teacher column
print(occur)
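Output:
Example 2:
In the program below, we count the occurrences of combined columns (each Section and Teacher pair) in the same dataset used in the previous program.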
# import module
import pandas as pd
# assign data
data = pd.DataFrame({'Section': ['A', 'A', 'A', 'B', 'B', 'B',
                                 'C', 'C', 'C'],
                     'Teacher': ['Kakeshi', 'Kakeshi', 'Iruka',
                                 'Kakeshi', 'Kakeshi', 'Kakeshi',
                                 'Iruka', 'Iruka', 'Guy']})
# display the DataFrame (print() is used so the script also runs outside Jupyter)
print('Data:')
print(data)
print('Occurrence counts of combined columns:')
# both columns are grouping keys, so select one of them explicitly before count()
occur = data.groupby(['Section', 'Teacher'])['Teacher'].count()
# display occurrences of combined columns
print(occur)
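Output:
Example 3:
Here we read data from a CSV file and count the occurrences of a single categorical column and of a combination of columns separately.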
# import module
import pandas as pd
# read data (assumes diamonds.csv is in the working directory)
data = pd.read_csv('diamonds.csv')
# display a random sample of 10 rows
print('Data:')
print(data.sample(10))
print('Occurrence counts of particular column:')
# count occurrences of each value in the cut column
occur = data.groupby(['cut'])['cut'].count()
# display occurrences of the cut column
print(occur)
print('Occurrence counts of combined columns:')
# count occurrences of each (clarity, color, cut) combination
occur = data.groupby(['clarity', 'color', 'cut'])['cut'].count()
# display occurrences of combined columns
print(occur)
Output: