R: Find Rows of One Data Frame That Are Not in Another
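Finding the rows of one data frame that do not appear in another data frame is called taking the "set difference". This article walks through two ways to do it.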
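Before moving to data frames, the idea can be sketched on plain vectors with base R's setdiff(). The vectors x and y below are made up for illustration; the point is only that the set difference depends on argument order.

```r
# Two small example vectors (made up for illustration).
x <- c(1, 2, 3, 4, 5)
y <- c(1, 2, 3)

# Elements of x that are not in y.
only_in_x <- setdiff(x, y)

# Swapping the arguments asks the opposite question: elements of y not in x.
only_in_y <- setdiff(y, x)

print(only_in_x)
print(only_in_y)
```

Here only_in_x holds 4 and 5, while only_in_y is empty because every element of y also occurs in x.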
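Method 1: Using sqldf()
This method finds the set difference with a single SQL query run against the data frames.
Syntax
sqldf("sql query")
Our query is sqldf('SELECT * FROM df1 EXCEPT SELECT * FROM df2'). It drops every row of df1 that also appears in df2 and returns only the rows that exist in df1 alone. Because EXCEPT is a set operation, a row that occurs several times in df1 appears only once in the result.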
Example 1:
library(sqldf)
df1 <- data.frame(a = 1:5, b=letters[1:5])
df2 <- data.frame(a = 1:3, b=letters[1:3])
print("df1 is ")
print(df1)
print("df2 is ")
print(df2)
res <- sqldf('SELECT * FROM df1 EXCEPT SELECT * FROM df2')
print("rows from df1 which are not in df2")
print(res)
Example 2:
library(sqldf)
df1 <- data.frame(name = c("kapil","sachin","rahul"), age=c(23,22,26))
df2 <- data.frame(name = c("kapil"), age = c(23))
print("df1 is ")
print(df1)
print("df2 is ")
print(df2)
res <- sqldf('SELECT * FROM df1 EXCEPT SELECT * FROM df2')
print("rows from df1 which are not in df2")
print(res)
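To make the semantics of the EXCEPT query more concrete, here is a minimal base-R sketch that emulates it without sqldf. The helper name rows_not_in is hypothetical (it is not part of sqldf or base R), and the sketch assumes both data frames have the same columns and that the separator "\r" never occurs in the data.

```r
# Hypothetical helper (not part of sqldf): emulate
# 'SELECT * FROM x EXCEPT SELECT * FROM y' using base R only.
rows_not_in <- function(x, y) {
  # Assumption: both data frames have identical column names, in the same order.
  stopifnot(identical(names(x), names(y)))
  # Build one string key per row; "\r" is an assumed separator not present in the data.
  key_x <- do.call(paste, c(unname(as.list(x)), sep = "\r"))
  key_y <- do.call(paste, c(unname(as.list(y)), sep = "\r"))
  # Keep the rows of x whose key does not occur among the keys of y.
  out <- x[!(key_x %in% key_y), , drop = FALSE]
  # EXCEPT returns distinct rows, so drop duplicates as well.
  out <- unique(out)
  rownames(out) <- NULL
  out
}

df1 <- data.frame(a = 1:5, b = letters[1:5])
df2 <- data.frame(a = 1:3, b = letters[1:3])
res <- rows_not_in(df1, df2)
print(res)
```

Comparing rows through string keys is a simplification: values that print identically but have different types are treated as equal, so this is a sketch of the idea rather than a replacement for the SQL query.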
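Method 2: Using setdiff()
The setdiff() function from the dplyr package finds the set difference of two data frames row by row. Base R's own setdiff() is designed for vectors and does not compare data frames by row, so load dplyr before calling it on data frames.
Syntax
setdiff(df1, df2)
It returns the rows of df1 that are not present in df2. Both data frames must have the same columns.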
Example 1:
library(dplyr)  # provides the row-wise setdiff() method for data frames
df1 <- data.frame(a = 1:5, b=letters[1:5], c= c(1,3,5,7,9))
df2 <- data.frame(a = 1:5, b=letters[1:5], c = c(2,4,6,8,10))
print("df1 is ")
print(df1)
print("df2 is ")
print(df2)
res <- setdiff(df1, df2)
print("rows from df1 which are not in df2")
print(res)
Example 2:
library(dplyr)  # provides the row-wise setdiff() method for data frames
df1 <- data.frame(name = c("kapil","sachin","rahul"), age=c(23,22,26))
df2 <- data.frame(name = c("kapil","rahul", "sachin"), age = c(23, 22, 26))
print("df1 is ")
print(df1)
print("df2 is ")
print(df2)
res <- setdiff(df1, df2)
print("rows from df1 which are not in df2")
print(res)
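As with vectors, the order of the arguments matters. The sketch below assumes the dplyr package is installed and calls its data frame method explicitly as dplyr::setdiff(), reusing the data from Example 2; the variable names only_in_df1 and only_in_df2 are chosen here for illustration.

```r
# Assumption: dplyr is installed; dplyr::setdiff() compares data frames row by row.
df1 <- data.frame(name = c("kapil", "sachin", "rahul"), age = c(23, 22, 26),
                  stringsAsFactors = FALSE)
df2 <- data.frame(name = c("kapil", "rahul", "sachin"), age = c(23, 22, 26),
                  stringsAsFactors = FALSE)

# Rows of df1 that have no identical row in df2.
only_in_df1 <- dplyr::setdiff(df1, df2)

# Rows of df2 that have no identical row in df1.
only_in_df2 <- dplyr::setdiff(df2, df1)

print(only_in_df1)
print(only_in_df2)
```

Only the row for "kapil" is identical in both data frames, so each call returns the two remaining rows of its first argument, with the ages paired as they are in that data frame.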