Merging two Pandas DataFrames on certain columns
We can merge two Pandas DataFrames on certain columns with the merge function by simply specifying the columns to merge on.
Syntax: DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, copy=True, indicator=False, validate=None)
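To make the most commonly used parameters concrete, here is a minimal sketch. The frames `left` and `right` and their column names are hypothetical toy data, not part of the examples that follow; the sketch only shows how `on`, `how` and `indicator` change the result.

```python
import pandas as pd

# Hypothetical toy data, chosen only to illustrate the parameters.
left = pd.DataFrame({'key': ['a', 'b', 'c'], 'x': [1, 2, 3]})
right = pd.DataFrame({'key': ['b', 'c', 'd'], 'y': [20, 30, 40]})

# how='inner' keeps only the keys that appear in both frames.
inner = left.merge(right, on='key', how='inner')

# how='outer' keeps every key; indicator=True adds a _merge column
# that records whether each row came from the left, the right, or both.
outer = left.merge(right, on='key', how='outer', indicator=True)

print(inner)
print(outer)
```

With `how='left'` or `how='right'`, only the keys of the respective frame are kept, and columns from the other frame are filled with NaN where no match exists.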
Example 1: Let's create two DataFrames and then merge them into a single DataFrame.
Create the DataFrames:
# importing modules
import pandas as pd
# creating a dataframe
df1 = pd.DataFrame({'Name': ['Raju', 'Rani', 'Geeta', 'Sita', 'Sohit'],
                    'Marks': [80, 90, 75, 88, 59]})
# creating another dataframe with different data
df2 = pd.DataFrame({'Name': ['Raju', 'Divya', 'Geeta', 'Sita'],
                    'Grade': ['A', 'A', 'B', 'A'],
                    'Rank': [3, 1, 4, 2],
                    'Gender': ['Male', 'Female', 'Female', 'Female']})
# display df1 (display() is provided by Jupyter/IPython; use print(df1) in a plain script)
display(df1)
# display df2
display(df2)
Output:
df1
df2
Now merge the DataFrames:
# applying merge
df1.merge(df2[['Name', 'Grade', 'Rank']])
Output:
Merged Dataframe
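The resulting DataFrame contains all columns of df1, but from df2 only the selected columns together with the key column Name, so its columns are Name, Marks, Grade and Rank. The two DataFrames have different numbers of rows; because merge defaults to an inner join on the shared column Name, only the rows whose Name appears in both DataFrames are kept.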
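The default behaviour described above can be checked with a short sketch. It reuses the df1 and df2 data from Example 1; the variable names `implicit` and `explicit` are introduced here only for illustration. Passing no `on` or `how` should give the same result as asking for an inner join on Name explicitly.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Raju', 'Rani', 'Geeta', 'Sita', 'Sohit'],
                    'Marks': [80, 90, 75, 88, 59]})
df2 = pd.DataFrame({'Name': ['Raju', 'Divya', 'Geeta', 'Sita'],
                    'Grade': ['A', 'A', 'B', 'A'],
                    'Rank': [3, 1, 4, 2],
                    'Gender': ['Male', 'Female', 'Female', 'Female']})

# Default call: joins on the columns the two frames share (here only Name).
implicit = df1.merge(df2[['Name', 'Grade', 'Rank']])

# The same merge with the join column and join type spelled out.
explicit = df1.merge(df2[['Name', 'Grade', 'Rank']], on='Name', how='inner')

print(implicit.equals(explicit))   # both calls produce the same frame
print(list(explicit.columns))      # Name, Marks, Grade, Rank
print(list(explicit['Name']))      # only names present in both frames
```

Spelling out `on` and `how` is usually the safer habit, since the default join columns change silently if the frames later gain another column with a shared name.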
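Example 2: In the resulting DataFrame, the Grade column of df2 is merged into df1 on the key column Name with merge type left, so every row of the left DataFrame (df1) is kept. Names that do not occur in df2 get NaN in the Grade column.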
# importing modules
import pandas as pd
# creating a dataframe
df1 = pd.DataFrame({'Name': ['Raju', 'Rani', 'Geeta', 'Sita', 'Sohit'],
                    'Marks': [80, 90, 75, 88, 59]})
# creating another dataframe with different data
df2 = pd.DataFrame({'Name': ['Raju', 'Divya', 'Geeta', 'Sita'],
                    'Grade': ['A', 'A', 'B', 'A'],
                    'Rank': [3, 1, 4, 2],
                    'Gender': ['Male', 'Female', 'Female', 'Female']})
# display df1
display(df1)
# display df2
display(df2)
# applying merge with more parameters
df1.merge(df2[['Grade', 'Name']], on='Name', how='left')
Output:
df1
df2
Merged Dataframe
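As a rough check of what the left merge in Example 2 produces, the sketch below repeats the merge and then picks out the rows of df1 that found no match in df2. The names `left_merged` and `unmatched` are introduced here only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Raju', 'Rani', 'Geeta', 'Sita', 'Sohit'],
                    'Marks': [80, 90, 75, 88, 59]})
df2 = pd.DataFrame({'Name': ['Raju', 'Divya', 'Geeta', 'Sita'],
                    'Grade': ['A', 'A', 'B', 'A'],
                    'Rank': [3, 1, 4, 2],
                    'Gender': ['Male', 'Female', 'Female', 'Female']})

# Left merge: every row of df1 is kept, in df1's order.
left_merged = df1.merge(df2[['Grade', 'Name']], on='Name', how='left')

# Rows of df1 whose Name is absent from df2 have NaN in Grade.
unmatched = left_merged[left_merged['Grade'].isna()]

print(left_merged)
print(list(unmatched['Name']))
```

Divya appears only in df2, so she is absent from the result; a left merge never adds rows for keys that exist only on the right side.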
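Example 3: In this example we merge df2 with df1. The Marks column of df1 is merged into df2, and only the rows whose key column Name appears in both DataFrames are shown.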
# importing modules
import pandas as pd
# creating a dataframe
df1 = pd.DataFrame({'Name': ['Raju', 'Rani', 'Geeta', 'Sita', 'Sohit'],
                    'Marks': [80, 90, 75, 88, 59]})
# creating another dataframe with different data
df2 = pd.DataFrame({'Name': ['Raju', 'Divya', 'Geeta', 'Sita'],
                    'Grade': ['A', 'A', 'B', 'A'],
                    'Rank': [3, 1, 4, 2],
                    'Gender': ['Male', 'Female', 'Female', 'Female']})
# display df1
display(df1)
# display df2
display(df2)
# applying merge with more parameters
df2.merge(df1[['Marks', 'Name']])
Output:
df1
df2
Merged Dataframe
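Compared with Example 1, the roles of the two frames are swapped in Example 3, which affects which columns are carried over and in what order. The sketch below is a minimal check of that, reusing the same data; the name `merged` is introduced here only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Raju', 'Rani', 'Geeta', 'Sita', 'Sohit'],
                    'Marks': [80, 90, 75, 88, 59]})
df2 = pd.DataFrame({'Name': ['Raju', 'Divya', 'Geeta', 'Sita'],
                    'Grade': ['A', 'A', 'B', 'A'],
                    'Rank': [3, 1, 4, 2],
                    'Gender': ['Male', 'Female', 'Female', 'Female']})

# df2 is now the left frame, so all of its columns come first,
# followed by the selected Marks column from df1.
merged = df2.merge(df1[['Marks', 'Name']])

print(list(merged.columns))
print(merged)
```

Because this is still the default inner join on Name, Divya (only in df2) and Rani and Sohit (only in df1) are dropped, just as in Example 1.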