Pandas – Finding Unique Values from Multiple Columns

In this article, we discuss several ways to get the unique values across multiple columns of a pandas DataFrame.

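Method 1: Using pandas unique() and concat()

A pandas Series (a single DataFrame column) has a unique() method that returns the distinct values in that column, in order of first appearance. The first line of output below shows only the unique FirstName values. To extend this to several columns, use pd.concat() to stack the columns of interest into one Series, then call unique() on the result.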

import pandas as pd
 
# Creating a custom dataframe.
df = pd.DataFrame({'FirstName': ['Arun', 'Navneet', 'Shilpa',
                                 'Prateek', 'Pyare', 'Prateek'],
                    
                   'LastName': ['Singh', 'Yadav', 'Yadav', 'Shukla',
                                'Lal', 'Mishra'],
                    
                   'Age': [26, 25, 25, 27, 28, 30]})
 
# To get unique values in 1 series/column
print(f"Unique FN: {df['FirstName'].unique()}")
 
# Extending the idea from 1 column to multiple columns
print(f"Unique Values from 3 Columns:\
{pd.concat([df['FirstName'],df['LastName'],df['Age']]).unique()}")

Output:

Unique FN: ['Arun' 'Navneet' 'Shilpa' 'Prateek' 'Pyare']
Unique Values from 3 Columns:['Arun' 'Navneet' 'Shilpa' 'Prateek' 'Pyare' 'Singh' 'Yadav' 'Shukla'
 'Lal' 'Mishra' 26 25 27 28 30]
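
The concat-then-unique idea generalizes to any list of column names. The following is a minimal sketch under that assumption; the helper name unique_across is hypothetical and is not part of pandas.

```python
import pandas as pd


def unique_across(frame, columns):
    """Return the unique values found in the given columns.

    Hypothetical helper for illustration only; not a pandas API.
    Values are returned in order of first appearance.
    """
    # Stack the selected columns end to end into one Series, then de-duplicate.
    stacked = pd.concat([frame[col] for col in columns], ignore_index=True)
    return stacked.unique()


df = pd.DataFrame({'FirstName': ['Arun', 'Navneet', 'Shilpa',
                                 'Prateek', 'Pyare', 'Prateek'],
                   'LastName': ['Singh', 'Yadav', 'Yadav', 'Shukla',
                                'Lal', 'Mishra'],
                   'Age': [26, 25, 25, 27, 28, 30]})

names = unique_across(df, ['FirstName', 'LastName'])
print(names)

everything = unique_across(df, ['FirstName', 'LastName', 'Age'])
print(len(everything))
```

Passing the column list as an argument avoids repeating df['...'] for every column and keeps the call site short when many columns are involved.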

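Method 2: Using numpy.unique()

np.unique() returns the sorted unique elements of the array passed to it as an argument.

Note: this method has a limitation. Because np.unique() sorts its input and Python cannot compare str values with numbers, it fails when string and numeric columns are combined. If you need to combine columns of different data types, use Method 1 instead.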

import pandas as pd
import numpy as np
 
# Creating a custom dataframe.
df = pd.DataFrame({'FirstName': ['Arun', 'Navneet', 'Shilpa',
                                 'Prateek', 'Pyare', 'Prateek'],
                    
                   'LastName': ['Singh', 'Yadav', 'Yadav', 'Shukla',
                                'Lal', 'Mishra'],
                    
                   'Age': [26, 25, 25, 27, 28, 30]})
 
print(np.unique(df[['LastName', 'FirstName']].values))
 
# The line below raises a TypeError: np.unique() sorts the values,
# and the str LastName values cannot be compared with the numeric Age values
# print(np.unique(df[['LastName','Age']].values))

Output:

['Arun' 'Lal' 'Mishra' 'Navneet' 'Prateek' 'Pyare' 'Shilpa' 'Shukla'
 'Singh' 'Yadav']
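
The mixed-type limitation can be checked directly. The sketch below catches the error from np.unique() on a str column plus a numeric column, then shows one possible workaround, casting everything to str first. The workaround is an assumption added here for illustration and is not part of the original method; it changes the ages into strings.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'LastName': ['Singh', 'Yadav', 'Yadav', 'Shukla',
                                'Lal', 'Mishra'],
                   'Age': [26, 25, 25, 27, 28, 30]})

# Mixed str/int values end up in one object array, which np.unique cannot sort.
try:
    np.unique(df[['LastName', 'Age']].values)
    mixed_failed = False
except TypeError:
    mixed_failed = True
print(f"Mixed columns raised TypeError: {mixed_failed}")

# Possible workaround (illustrative assumption): cast every value to str first.
# The ages then come back as strings such as '25', not as integers.
as_text = np.unique(df[['LastName', 'Age']].astype(str).values)
print(as_text)
```

If the numeric values must stay numeric, Method 1 is the safer choice because it does not sort the values.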

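Method 3: Using Python sets

A set holds only unique values, so we convert each Series into a set and then take the union of those sets. Unlike Method 2, this works for any combination of data types. The result is unordered.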

import pandas as pd
 
 
# Creating a custom dataframe.
df = pd.DataFrame({'FirstName': ['Arun', 'Navneet', 'Shilpa',
                                 'Prateek', 'Pyare', 'Prateek'],
                    
                   'LastName': ['Singh', 'Yadav', 'Yadav', 'Shukla',
                                'Lal', 'Mishra'],
                    
                   'Age': [26, 25, 25, 27, 28, 30]})
 
# Typecasting pandas series into set and then
# taking set union (|)
print(set(df.FirstName) | set(df.LastName) | set(df.Age))

Output (a set has no defined order, so the ordering may differ between runs):

{'Singh', 'Pyare', 'Mishra', 27, 'Navneet', 'Arun', 'Lal', 'Shukla', 30, 25, 26, 'Yadav', 28, 'Shilpa', 'Prateek'}
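
When there are many columns, the union can be built from a list of column names instead of writing each set by hand. The sketch below assumes that a reproducible listing is wanted, so it sorts the result by string form; that sorting step is an addition for illustration and is not required by the method.

```python
import pandas as pd

df = pd.DataFrame({'FirstName': ['Arun', 'Navneet', 'Shilpa',
                                 'Prateek', 'Pyare', 'Prateek'],
                   'LastName': ['Singh', 'Yadav', 'Yadav', 'Shukla',
                                'Lal', 'Mishra'],
                   'Age': [26, 25, 25, 27, 28, 30]})

columns = ['FirstName', 'LastName', 'Age']

# set().union(*iterables) accepts any number of iterables, so the columns
# do not have to be spelled out one by one.
unique_values = set().union(*(df[col] for col in columns))

# Sets are unordered; sorting by string form gives a reproducible listing
# even though str and int values are mixed.
ordered = sorted(unique_values, key=str)
print(ordered)
```

Sorting with key=str sidesteps the str/int comparison problem described in Method 2, at the cost of ordering numbers as text.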
