Python – Remove Duplicate Values from a Pandas DataFrame
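To remove duplicate rows from a Pandas DataFrame, use the drop_duplicates() method. By default, a row counts as a duplicate only when its values in every column match an earlier row, and the first occurrence is kept. First, create a DataFrame with 3 columns: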
dataFrame = pd.DataFrame({'Car': ['BMW', 'Mercedes', 'Lamborghini', 'BMW', 'Mercedes', 'Porsche'],'Place': ['Delhi', 'Hyderabad', 'Chandigarh', 'Delhi', 'Hyderabad', 'Mumbai'],'UnitsSold': [95, 70, 80, 95, 70, 90]})
Remove the duplicate rows:
dataFrame = dataFrame.drop_duplicates()
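To see which rows drop_duplicates() treats as duplicates, the related DataFrame.duplicated() method returns a boolean mask over the rows. The minimal sketch below rebuilds the same sample data so it runs on its own; the names mask and deduped are only illustrative.

```python
import pandas as pd

# Same sample data as above
dataFrame = pd.DataFrame({
    'Car': ['BMW', 'Mercedes', 'Lamborghini', 'BMW', 'Mercedes', 'Porsche'],
    'Place': ['Delhi', 'Hyderabad', 'Chandigarh', 'Delhi', 'Hyderabad', 'Mumbai'],
    'UnitsSold': [95, 70, 80, 95, 70, 90],
})

# duplicated() flags each row that repeats an earlier row in every column
# (keep='first' by default, so the first occurrence is not flagged)
mask = dataFrame.duplicated()
print(mask.tolist())  # [False, False, False, True, True, False]

# Keeping only the unflagged rows matches the result of drop_duplicates()
deduped = dataFrame[~mask]
print(deduped.equals(dataFrame.drop_duplicates()))  # True
```

Rows 3 and 4 are flagged because they repeat rows 0 and 1 exactly.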
Example
Following is the complete code:
import pandas as pd
# Create a DataFrame with 3 columns
dataFrame = pd.DataFrame({'Car': ['BMW', 'Mercedes', 'Lamborghini', 'BMW', 'Mercedes', 'Porsche'], 'Place': ['Delhi', 'Hyderabad', 'Chandigarh', 'Delhi', 'Hyderabad', 'Mumbai'], 'UnitsSold': [95, 70, 80, 95, 70, 90]})
print(f"DataFrame...\n{dataFrame}")
# Count how often each value occurs in column Car
count = dataFrame['Car'].value_counts()
print("\nCount in column Car")
print(count)
# Remove duplicate rows (rows whose values match an earlier row in every column)
dataFrame = dataFrame.drop_duplicates()
print(f"\nUpdated DataFrame after removing duplicates...\n{dataFrame}")
# Count the values in column Car again after removing duplicates
count = dataFrame['Car'].value_counts()
print("\nCount in column Car")
print(count)
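drop_duplicates() also accepts keep and subset parameters that change which rows survive. The sketch below applies them to the same sample data; the result variable names are illustrative. With this particular data, subset=['Car'] keeps the same rows as the default, because every repeated car is also a fully repeated row.

```python
import pandas as pd

# Same sample data as in the complete example
dataFrame = pd.DataFrame({
    'Car': ['BMW', 'Mercedes', 'Lamborghini', 'BMW', 'Mercedes', 'Porsche'],
    'Place': ['Delhi', 'Hyderabad', 'Chandigarh', 'Delhi', 'Hyderabad', 'Mumbai'],
    'UnitsSold': [95, 70, 80, 95, 70, 90],
})

# keep='last' retains the final occurrence of each duplicated row instead of the first
last = dataFrame.drop_duplicates(keep='last')
print(last.index.tolist())  # [2, 3, 4, 5]

# keep=False drops every row that has a duplicate anywhere in the DataFrame
unique_only = dataFrame.drop_duplicates(keep=False)
print(unique_only.index.tolist())  # [2, 5]

# subset limits the comparison to the listed columns only
by_car = dataFrame.drop_duplicates(subset=['Car'])
print(by_car.index.tolist())  # [0, 1, 2, 5]
```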
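Output
This will produce the following output (with pandas 2.0 or later, value_counts() prints Name: count instead of Name: Car, and tied counts may be listed in a different order):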
DataFrame...
           Car       Place  UnitsSold
0          BMW       Delhi         95
1     Mercedes   Hyderabad         70
2  Lamborghini  Chandigarh         80
3          BMW       Delhi         95
4     Mercedes   Hyderabad         70
5      Porsche      Mumbai         90
Count in column Car
BMW            2
Mercedes       2
Porsche        1
Lamborghini    1
Name: Car, dtype: int64
Updated DataFrame after removing duplicates...
           Car       Place  UnitsSold
0          BMW       Delhi         95
1     Mercedes   Hyderabad         70
2  Lamborghini  Chandigarh         80
5      Porsche      Mumbai         90
Count in column Car
BMW            1
Porsche        1
Lamborghini    1
Mercedes       1
Name: Car, dtype: int64
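In the updated DataFrame, the surviving rows keep their original index labels (0, 1, 2, 5). If a contiguous index is preferred, the sketch below shows two ways to renumber the rows, assuming pandas 1.0 or later for the ignore_index parameter; the variable names are illustrative.

```python
import pandas as pd

# Same sample data as in the complete example
dataFrame = pd.DataFrame({
    'Car': ['BMW', 'Mercedes', 'Lamborghini', 'BMW', 'Mercedes', 'Porsche'],
    'Place': ['Delhi', 'Hyderabad', 'Chandigarh', 'Delhi', 'Hyderabad', 'Mumbai'],
    'UnitsSold': [95, 70, 80, 95, 70, 90],
})

# By default the remaining rows keep their labels, leaving a gap in the index
deduped = dataFrame.drop_duplicates()
print(deduped.index.tolist())  # [0, 1, 2, 5]

# ignore_index=True (pandas 1.0+) relabels the result as 0..n-1
renumbered = dataFrame.drop_duplicates(ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]

# Alternative that also works on older versions: reset the index afterwards
also_renumbered = dataFrame.drop_duplicates().reset_index(drop=True)
print(renumbered.equals(also_renumbered))  # True
```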