How to Count Distinct Values of a Pandas DataFrame Column

Let's see how to count the distinct values in a column of a Pandas DataFrame.

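Consider the tabular structure shown below, which has to be created as a DataFrame. The columns are height, weight and age. The records of 8 students form the rows.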

name	height	weight	age
Steve	165	63.5	20
Ria	165	64	22
Nivi	164	63.5	22
Jane	158	54	21
Kate	167	63.5	23
Lucy	160	62	22
Ram	158	64	20
Niki	165	64	21

The first step is to create a DataFrame for the table above. See the code snippet below.

# import library
import pandas as pd
  
# create a Dataframe
df = pd.DataFrame({ 
  'height' : [165, 165, 164, 
              158, 167, 160,
              158, 165],
    
  'weight' : [63.5, 64, 63.5,
              54, 63.5, 62,
              64, 64],
    
  'age' : [20, 22, 22, 
           21, 23, 22,
           20, 21]},
    
   index = ['Steve', 'Ria', 'Nivi', 
            'Jane', 'Kate', 'Lucy',
            'Ram', 'Niki'])
  
# show the Dataframe
df

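Output:

       height  weight  age
Steve     165    63.5   20
Ria       165    64.0   22
Nivi      164    63.5   22
Jane      158    54.0   21
Kate      167    63.5   23
Lucy      160    62.0   22
Ram       158    64.0   20
Niki      165    64.0   21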

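Method 1: Using a for loop.

Once the DataFrame has been created, one can hand-code a for loop that counts the number of unique values in a specific column. For example, suppose we want to count the unique values in the height column of the table above. The idea is to keep a variable cnt that stores the count and a list visited that stores the values seen so far. The for loop then walks through the 'height' column and, for each value, checks whether that value is already in the visited list. If the value has not been seen before, it is appended to the list and the count is increased by 1.

Below is the implementation.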

# import library
import pandas as pd
  
# create a Dataframe
df = pd.DataFrame({ 
  'height' : [165, 165, 164, 
              158, 167, 160,
              158, 165],
    
  'weight' : [63.5, 64, 63.5,
              54, 63.5, 62,
              64, 64],
    
  'age' : [20, 22, 22, 
           21, 23, 22,
           20, 21]},
    
   index = ['Steve', 'Ria', 'Nivi', 
            'Jane', 'Kate', 'Lucy',
            'Ram', 'Niki'])
  
# variable to hold the count
cnt = 0
  
# list to hold visited values
visited = []
  
# loop for counting the unique
# values in height; iterate over the
# values directly, because the index
# holds names, not integer positions
for value in df['height']:
    
    if value not in visited: 
        
        visited.append(value)
          
        cnt += 1
  
print("No.of.unique values :",
      cnt)
  
print("unique values :",
      visited)

Output:

No.of.unique values : 5
unique values : [165, 164, 158, 167, 160]
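As a variation on the loop above, the visited values can be tracked in a set, so that each membership check takes roughly constant time instead of scanning a list. This is only a minimal sketch: the names heights, seen and unique_in_order are illustrative, and the data is the height column from the example.

```python
import pandas as pd

# Same values as the 'height' column in the example
heights = pd.Series([165, 165, 164, 158, 167, 160, 158, 165], name="height")

seen = set()          # values already encountered (fast membership test)
unique_in_order = []  # unique values in order of first appearance

for value in heights:
    if value not in seen:
        seen.add(value)
        unique_in_order.append(value)

print("No.of.unique values :", len(unique_in_order))
print("unique values :", unique_in_order)
```

The separate list is kept only to preserve the order of first appearance; if just the count is needed, len(seen) is enough.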

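However, once a DataFrame grows to thousands of rows and columns, a hand-written loop like this becomes inefficient, because every iteration runs in the Python interpreter and searches the list of visited values. Pandas offers three more efficient alternatives, listed below.

pandas.unique()
DataFrame.nunique()
Series.value_counts()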

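Method 2: Using unique().

The unique function takes a one-dimensional array-like object or a Series as input and returns its unique values in order of first appearance, without sorting them. The type of the return value depends on the input: a Series such as df['height'] yields a NumPy array, while an Index passed as input yields an Index of unique values. The number of distinct values is the length of the result.

Syntax: pandas.unique(values)

Example: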

# import library
import pandas as pd
  
# create a Dataframe
df = pd.DataFrame({ 
  'height' : [165, 165, 164, 
              158, 167, 160,
              158, 165],
    
  'weight' : [63.5, 64, 63.5,
              54, 63.5, 62,
              64, 64],
    
  'age' : [20, 22, 22, 
           21, 23, 22,
           20, 21]},
    
   index = ['Steve', 'Ria', 'Nivi', 
            'Jane', 'Kate', 'Lucy',
            'Ram', 'Niki'])
  
# counting unique values
n = len(pd.unique(df['height']))
  
print("No.of.unique values :", 
      n)

Output:

No.of.unique values : 5

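Method 3: Using DataFrame.nunique().

This method returns the number of distinct values along the specified axis: axis=0 counts per column and axis=1 counts per row. With the default dropna=True, missing values are not counted. The syntax is:

Syntax: DataFrame.nunique(axis=0, dropna=True)

Example: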

# import library
import pandas as pd
  
# create a Dataframe
df = pd.DataFrame({ 
  'height' : [165, 165, 164, 
              158, 167, 160,
              158, 165],
    
  'weight' : [63.5, 64, 63.5,
              54, 63.5, 62,
              64, 64],
    
  'age' : [20, 22, 22, 
           21, 23, 22,
           20, 21]},
    
   index = ['Steve', 'Ria', 'Nivi', 
            'Jane', 'Kate', 'Lucy',
            'Ram', 'Niki'])
  
# count the unique values
# in each column
n = df.nunique(axis=0)
  
print("No.of.unique values in each column :",
      n, sep="\n")

Output:

No.of.unique values in each column :
height    5
weight    4
age       4
dtype: int64

To get the number of unique values in one specific column, call nunique() on that column.

Syntax: DataFrame.col_name.nunique()

Example:

# import library
import pandas as pd
  
# create a Dataframe
df = pd.DataFrame({ 
  'height' : [165, 165, 164, 
              158, 167, 160,
              158, 165],
    
  'weight' : [63.5, 64, 63.5,
              54, 63.5, 62,
              64, 64],
    
  'age' : [20, 22, 22, 
           21, 23, 22,
           20, 21]},
    
   index = ['Steve', 'Ria', 'Nivi', 
            'Jane', 'Kate', 'Lucy',
            'Ram', 'Niki'])
  
# count no. of unique 
# values in height column
n = df.height.nunique()
  
print("No.of.unique values in height column :",
      n)

Output:

No.of.unique values in height column : 5
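The axis and dropna parameters of nunique() are easiest to understand on data with a missing value. The sketch below uses a small hypothetical DataFrame, df_missing, which is not part of the student table above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one height is missing
df_missing = pd.DataFrame({'height': [165, 165, np.nan, 158],
                           'age': [20, 22, 22, 20]})

# By default (dropna=True) the missing value is not counted
without_nan = df_missing['height'].nunique()

# With dropna=False the missing value counts as one more distinct value
with_nan = df_missing['height'].nunique(dropna=False)

# axis=1 counts the distinct values within each row instead of each column
per_row = df_missing.nunique(axis=1)

print(without_nan, with_nan)
print(per_row.tolist())
```

Here the column holds the distinct heights 165 and 158 plus one missing entry, so the two counts differ by exactly one.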

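Method 4: Using Series.value_counts()

This method returns a Series that holds the count of each unique value in the specified column. The length of that Series is therefore the number of distinct values.

Syntax: Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)

Example: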

# import library
import pandas as pd
  
# create a Dataframe
df = pd.DataFrame({ 
  'height' : [165, 165, 164, 
              158, 167, 160,
              158, 165],
    
  'weight' : [63.5, 64, 63.5,
              54, 63.5, 62,
              64, 64],
    
  'age' : [20, 22, 22, 
           21, 23, 22,
           20, 21]},
    
   index = ['Steve', 'Ria', 'Nivi', 
            'Jane', 'Kate', 'Lucy',
            'Ram', 'Niki'])
  
  
# getting the count of each unique value
# as a list, one entry per unique value
li = list(df.height.value_counts())
  
# print the number of unique values
print("No.of.unique values :",
      len(li))

Output:

No.of.unique values : 5
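To see what value_counts() returns before its length is taken, the sketch below inspects the counts for the height values used in this article. The variable names are illustrative.

```python
import pandas as pd

# Same values as the 'height' column in the example
heights = pd.Series([165, 165, 164, 158, 167, 160, 158, 165], name='height')

# One row per distinct value; the index holds the values,
# the data holds how often each value occurs
counts = heights.value_counts()
print(counts)

# The number of rows equals the number of distinct values
print(len(counts), heights.nunique())

# normalize=True returns relative frequencies instead of raw counts
shares = heights.value_counts(normalize=True)
print(shares.loc[165])
```

When only the number of distinct values is needed, nunique() is the more direct call; value_counts() is the better fit when the frequency of each value matters as well.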