用Pandas计算每组的唯一值

用Pandas计算每组的唯一值

在这篇文章中,我们将用Pandas寻找并计算组/列中存在的唯一值。唯一值是指在数据集中只出现一次的不同值,或者是重复值的第一次出现也被算作唯一值。

步骤:

  • 导入pandas库。
  • 使用DataFrame()函数导入或创建数据框架,该函数将数据作为参数传递给你想创建的数据框架,并将其命名为 “df”,或者使用pandas.read_csv()函数导入数据集,其中传递数据集的路径和名称。
  • 选择你想检查或计算唯一值的列。
  • 为了找到唯一的值,我们使用pandas提供的unique()函数,并将其存储在一个变量中,让其命名为 “unique_values”。

语法: pandas.unique(df(column_name)) or df['column_name'].unique()

  • 它将给出该组/列中存在的唯一值。
  • 为了计算唯一值的数量,我们必须首先将名为 “count “的变量初始化为0,然后为 “unique_values “运行for循环,并计算循环运行的次数,将 “count “的值增加1。
  • 然后打印’count’,这个存储值是该特定组/列中存在的唯一值的数量。
  • 为了找到唯一值在特定列中重复出现的次数,我们使用Pandas提供的value_counts()函数。

语法: pandas.value_counts(df[‘column_name’] or df[‘column_name’].value_counts()

  • 这将给出每个独特值在该特定列中重复的次数。

为了更好地理解这个主题。让我们举一些例子,按照上面讨论的方法来实现这些功能。

实例1:使用pandas库创建DataFrame。

# importing library
import pandas as pd
 
# storing the data of cars in the dictionary
car_data = {'Model Name': ['Valiant',
                           'Duster 360',
                           'Merc 240D',
                           'Merc 230',
                           'Merc 280',
                           'Merc 280C',
                           'Merc 450SE',
                           'Merc 450SL',
                           'Merc 450SLC',
                           'Cadillac Fleetwood',
                           'Lincoln Continental',
                           'Chrysler Imperial',
                           'Fiat 128',
                           'Honda Civic',
                           'Toyota Corolla'],
             
            'Gear': [3, 3, 4, 4, 5, 4, 3, 3,
                     3, 3, 3, 3, 4, 4, 4],
             
            'Cylinder': [6, 8, 4, 4, 6, 6, 8,
                         8, 8, 8, 8, 8, 4, 4, 4]}
 
# creating DataFrame for the data using
# pandas DataFrame function.
car_df = pd.DataFrame(car_data)
 
# printing the dataframe
car_df

输出:

用Pandas计算每组的唯一值

例子2:打印每组中存在的唯一值。

# importing libraries
import pandas as pd
 
# storing the data of cars in the dictionary
car_data = {'Model Name': ['Valiant',
                           'Duster 360',
                           'Merc 240D',
                           'Merc 230',
                           'Merc 280',
                           'Merc 280C',
                           'Merc 450SE',
                           'Merc 450SL',
                           'Merc 450SLC',
                           'Cadillac Fleetwood',
                           'Lincoln Continental',
                           'Chrysler Imperial',
                           'Fiat 128',
                           'Honda Civic',
                           'Toyota Corolla'],
             
            'Gear': [3, 3, 4, 4, 5, 4, 3, 3,
                     3, 3, 3, 3, 4, 4, 4],
             
            'Cylinder': [6, 8, 4, 4, 6, 6, 8,
                         8, 8, 8, 8, 8, 4, 4, 4]}
 
# creating DataFrame for the data using pandas
car_df = pd.DataFrame(car_data)
 
# printing the unique values present in the Gear column
# finding unique values present
# in the Gear column using unique() function
print(f"Unique values present in Gear column are: {car_df['Gear'].unique()}")
 
# printing the unique values present
# in the Cylinder column
# finding unique values present in the
# Cylinder column using unique() function
print(f"Unique values present in Cylinder column are: {car_df['Cylinder'].unique()}")

输出:

用Pandas计算每组的唯一值

从上面的输出图像中,我们可以观察到,我们从两组中得到了三个独特的值。

例子3:寻找每组中存在的独特值的另一种方法。

# importing libraries
import pandas as pd
 
# storing the data of cars in the dictionary
car_data = {'Model Name': ['Valiant',
                           'Duster 360',
                           'Merc 240D',
                           'Merc 230',
                           'Merc 280',
                           'Merc 280C',
                           'Merc 450SE',
                           'Merc 450SL',
                           'Merc 450SLC',
                           'Cadillac Fleetwood',
                           'Lincoln Continental',
                           'Chrysler Imperial',
                           'Fiat 128',
                           'Honda Civic',
                           'Toyota Corolla'],
             
            'Gear': [3, 3, 4, 4, 5, 4, 3, 3,
                     3, 3, 3, 3, 4, 4, 4],
             
            'Cylinder': [6, 8, 4, 4, 6, 6, 8, 8,
                         8, 8, 8, 8, 4, 4, 4]}
 
# creating DataFrame for the data using pandas
car_df = pd.DataFrame(car_data)
 
# finding unique values present in the
# groups using unique() function
unique_gear = pd.unique(car_df.Gear)
unique_cyl = pd.unique(car_df.Cylinder)
 
# printing the unique values present in the Gear column
print(f"Unique values present in Gear column are: {unique_gear}")
 
# printing the unique values present in the Cylinder column
print(f"Unique values present in Cylinder column are: {unique_cyl}")

输出:

用Pandas计算每组的唯一值

输出结果是相似的,但不同的是,在这个例子中,我们通过使用pd.unique()函数建立了每个组中存在的唯一值,我们在其中传递了我们的数据框架列。

例子4:计算每个独特值重复的次数。

# importing libraries
import pandas as pd
 
# storing the data of cars in the dictionary
car_data = {'Model Name': ['Valiant',
                           'Duster 360',
                           'Merc 240D',
                           'Merc 230',
                           'Merc 280',
                           'Merc 280C',
                           'Merc 450SE',
                           'Merc 450SL',
                           'Merc 450SLC',
                           'Cadillac Fleetwood',
                           'Lincoln Continental',
                           'Chrysler Imperial',
                           'Fiat 128',
                           'Honda Civic',
                           'Toyota Corolla'],
             
            'Gear': [3, 3, 4, 4, 5, 4, 3,
                     3, 3, 3, 3, 3, 4, 4, 4],
             
            'Cylinder': [6, 8, 4, 4, 6, 6, 8,
                         8, 8, 8, 8, 8, 4, 4, 4]}
 
# creating DataFrame for the data using pandas
car_df = pd.DataFrame(car_data)
 
# counting number of times each unique values
# present in the particular group using
# value_counts() function
gear_count = pd.value_counts(car_df.Gear)
cyl_count = pd.value_counts(car_df.Cylinder)
 
# another way of obtaining the same output
g_count = car_df['Gear'].value_counts()
cy_count = car_df['Cylinder'].value_counts()
print('----Output from first method-----')
 
# printing number of times each unique
# values present in the particular group
print(gear_count)
print(cyl_count)
 
# printing output from the second method
print('----Output from second method----')
print(g_count)
print(cy_count)

输出:

用Pandas计算每组的唯一值

从上面的输出图像来看,我们从两种编写代码的方法中得到了相同的结果。

我们可以观察到,在Gear列中,我们得到的唯一值是3、4和5,分别重复了8、6和1次,而在Cylinder列中,我们得到的唯一值是8、4和6,分别重复了7、5和3次。

例子5:计算组中存在的唯一值的数量。

# importing libraries
import pandas as pd
 
# storing the data of cars in the dictionary
car_data = {'Model Name': ['Valiant',
                           'Duster 360',
                           'Merc 240D',
                           'Merc 230',
                           'Merc 280',
                           'Merc 280C',
                           'Merc 450SE',
                           'Merc 450SL',
                           'Merc 450SLC',
                           'Cadillac Fleetwood',
                           'Lincoln Continental',
                           'Chrysler Imperial',
                           'Fiat 128',
                           'Honda Civic',
                           'Toyota Corolla'],
             
            'Gear': [3, 3, 4, 4, 5, 4, 3, 3,
                     3, 3, 3, 3, 4, 4, 4],
             
            'Cylinder': [6, 8, 4, 4, 6, 6, 8,
                         8, 8, 8, 8, 8, 4, 4, 4]}
 
# creating DataFrame for the data using pandas
car_df = pd.DataFrame(car_data)
 
# finding unique values present in the particular group.
name_count = pd.unique(car_df['Model Name'])
gear_count = pd.unique(car_df.Gear)
cyl_count = pd.unique(car_df.Cylinder)
 
# initializing variable to 0 for counting
name_unique = 0
gear_unique = 0
cyl_unique = 0
 
# writing separate for loop of each groups
for item in name_count:
    name_unique += 1
 
for item in gear_count:
    gear_unique += 1
 
for item in gear_count:
    cyl_unique += 1
 
# printing the number of unique values present in each group
print(f'Number of unique values present in Model Name: {name_unique}')
print(f'Number of unique values present in Gear: {gear_unique}')
print(f'Number of unique values present in Cylinder: {cyl_unique}')

输出:

用Pandas计算每组的唯一值

从上面的输出图像中,我们可以观察到,我们在模型名称、齿轮和气缸列中分别得到15、3和3个独特的值。

Python教程

Java教程

Web教程

数据库教程

图形图像教程

大数据教程

开发工具教程

计算机教程