How to Extract a Random Sample of Rows from an R Data Frame Using Nested Conditions

In this article, we will learn how to extract a random sample of rows from a data frame in the R programming language using nested conditions.

Method 1: Using sample()

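We will use the sample() function for this task. In R, sample() draws a random sample according to the arguments supplied in the call. Its first argument is either a vector to sample from or a single positive integer n, in which case it samples from 1:n.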

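The other function we will use is which(), which lets us express the condition for the sample. which() returns the indices of the elements for which the given condition is TRUE, not the elements themselves.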
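To make the division of labour between the two functions concrete, here is a minimal sketch. The vector year and its values are chosen only for illustration, and set.seed() is added so that a run can be repeated.

```r
# Example vector (values chosen only for illustration)
year <- c(10, 51, 19, 126, 99)

# which() gives the positions where the condition is TRUE
idx <- which(year > 20)
idx
# [1] 2 4 5

# sample() then draws positions at random, without replacement by default
set.seed(1)  # fixes the random draw so the run is repeatable
picked <- sample(idx, 2)

# The drawn positions can be used to look up the values
year[picked]
```

The same pattern carries over to data frames: the sampled positions are used as row indices.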

Syntax: df[sample(which(conditions), n), ]

Parameters:

df: the data frame
n: the number of rows to sample
conditions: the condition that rows must satisfy to be sampled. Example: df$year > 5

Data frame in use:

     name year length education
1 Welcome   10     40       yes
2      to   51     NA       yes
3   Geeks   19     NA        no
4     for  126    100        no
5   Geeks   99     95       yes

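To apply this approach, first create the data frame, then pass the row indices selected by the condition to sample() and use the result to subset the data frame. The implementation below uses the data frame shown above.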

Example 1:

df <- data.frame( name = c("Welcome", "to", "Geeks",
                           "for", "Geeks"),
                  
                year = c(10, 51, 19, 126, 99),
                  
                length = c(40, NA, NA, 100, 95),
                  
                education = c("yes", "yes", "no",
                              "no", "yes") )
df
 
# Printing 2 rows
print("2 samples")
df[ sample(which (df$year > 5) ,2), ]

Output (the sampled rows vary between runs):

     name year length education
1 Welcome   10     40       yes
2      to   51     NA       yes
3   Geeks   19     NA        no
4     for  126    100        no
5   Geeks   99     95       yes
[1] "2 samples"
     name year length education
1 Welcome   10     40       yes
2      to   51     NA       yes
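One caveat with the pattern in Example 1: if the condition matches fewer than n rows, sample() raises an error, and if it matches exactly one row, sample() treats that single index as the upper bound of 1:index and may return unrelated rows. The following is a minimal sketch of a guard against both cases. The helper name sample_rows_where is hypothetical and not part of base R.

```r
# Hypothetical helper (not part of base R): sample up to n rows matching a condition
sample_rows_where <- function(df, cond, n) {
  idx <- which(cond)                 # row positions where cond is TRUE (NA dropped)
  if (length(idx) == 0) {
    return(df[0, , drop = FALSE])    # no match: return an empty data frame
  }
  n <- min(n, length(idx))           # never ask for more rows than are available
  # Sampling positions within idx avoids sample()'s special case for a
  # single number, where sample(4, 1) would draw from 1:4 instead of just 4
  df[idx[sample.int(length(idx), n)], , drop = FALSE]
}

df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99))

# Only row 4 has year > 100, so exactly that row is returned
res <- sample_rows_where(df, df$year > 100, 2)
res
```

Capping n silently is a design choice made for this sketch; raising an error instead may be preferable when the exact sample size matters.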

Example 2:

df <- data.frame( name = c("Welcome", "to", "Geeks",
                           "for", "Geeks"),
                  
                year = c(10, 51, 19, 126, 99),
                  
                length = c(40, NA, NA, 100, 95),
                  
                education = c("yes", "yes", "no",
                              "no", "yes") )
df
 
# Printing 3 rows
print("3 samples")
df[ sample(which (df$education !="no") ,3), ]

Output (the sampled rows vary between runs):

     name year length education
1 Welcome   10     40       yes
2      to   51     NA       yes
3   Geeks   19     NA        no
4     for  126    100        no
5   Geeks   99     95       yes
[1] "3 samples"
     name year length education
5   Geeks   99     95       yes
1 Welcome   10     40       yes
2      to   51     NA       yes
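The two examples above each test a single condition. A nested condition can be built by combining tests with & (and), | (or) and parentheses before handing the result to which(). The sketch below assumes the same data frame; the particular thresholds are illustrative only.

```r
df <- data.frame(
  name = c("Welcome", "to", "Geeks", "for", "Geeks"),
  year = c(10, 51, 19, 126, 99),
  length = c(40, NA, NA, 100, 95),
  education = c("yes", "yes", "no", "no", "yes")
)

# Nested condition: educated rows with year > 5, or any row with length >= 100
cond <- (df$education == "yes" & df$year > 5) | (df$length >= 100)

# which() drops NA results, so rows where the condition is NA are never sampled
rows <- which(cond)
rows
# [1] 1 2 4 5

set.seed(42)  # only to make the draw repeatable
sampled <- df[sample(rows, 2), ]
sampled
```

Row 3 is excluded because its condition evaluates to NA (FALSE | NA), which which() ignores.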

Method 2: Using the sample_n() function

The sample_n() function from the dplyr package draws a random sample of rows from a data frame.

Syntax: sample_n(x, n)

Parameters:

x: the data frame
n: the number of rows to select

Alongside sample_n(), we use the filter() function. filter(), also from dplyr, keeps the rows for which the filter expression is TRUE and drops the rest.

Syntax: filter(x, expr)

Parameters:

x: the object to filter
expr: the expression to filter on
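As a small illustration of filter() on its own, the sketch below assumes dplyr is installed and uses a two-column data frame made up for the example. Inside filter(), columns can be referred to by name, and several expressions separated by commas are combined with AND.

```r
library(dplyr)

df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99))

# Column names are used directly; comma-separated conditions are combined with AND
kept <- filter(df, year > 15, name != "to")
kept
```

Rows for which the expression evaluates to NA are dropped as well, so no separate NA handling is needed for the filtering step.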

We load the dplyr package because it provides both filter() and sample_n(). We pass our sample data frame df and our nested condition to filter() as arguments, then use sample_n() to draw n rows from the rows that satisfy the condition.

Syntax: filter(df, condition) %>% sample_n(., n)

Parameters:

df: the data frame object
condition: the nested condition. Example: name != "to"
n: the number of rows to sample

Example 1:

library(dplyr)
 
df <- data.frame( name = c("Welcome", "to", "Geeks",
                           "for", "Geeks"),
                  
                year = c(10, 51, 19, 126, 99),
                  
                length = c(40, NA, NA, 100, 95),
                  
                education = c("yes", "yes", "no",
                              "no", "yes") )
df
 
# Printing 2 rows
print("2 samples")
 
# Inside filter(), refer to columns by name rather than with df$
filter(df, name != "to") %>% sample_n(., 2)

Output (the sampled rows vary between runs):

     name year length education
1 Welcome   10     40       yes
2      to   51     NA       yes
3   Geeks   19     NA        no
4     for  126    100        no
5   Geeks   99     95       yes
[1] "2 samples"
     name year length education
1 Welcome   10     40       yes
2   Geeks   99     95       yes

Example 2:

library(dplyr)
 
df <- data.frame( name = c("Welcome", "to", "Geeks",
                           "for", "Geeks"),
                year = c(10, 51, 19, 126, 99),
                  
                length = c(40, NA, NA, 100, 95),
                  
                education = c("yes", "yes", "no",
                              "no", "yes") )
df
 
# Printing 2 rows
print("2 samples")
 
# Inside filter(), refer to columns by name rather than with df$
filter(df, year > 20) %>% sample_n(., 2)

Output (the sampled rows vary between runs):

     name year length education
1 Welcome   10     40       yes
2      to   51     NA       yes
3   Geeks   19     NA        no
4     for  126    100        no
5   Geeks   99     95       yes
[1] "2 samples"
  name year length education
1  for  126    100        no
2   to   51     NA       yes
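Both dplyr examples filter on a single condition. A nested condition can be written directly inside filter() with &, | and parentheses. The sketch below assumes dplyr 1.0.0 or later, where sample_n() is superseded by slice_sample(); the thresholds are illustrative only.

```r
library(dplyr)

df <- data.frame(
  name = c("Welcome", "to", "Geeks", "for", "Geeks"),
  year = c(10, 51, 19, 126, 99),
  length = c(40, NA, NA, 100, 95),
  education = c("yes", "yes", "no", "no", "yes")
)

set.seed(7)  # only to make the draw repeatable

# Nested condition: educated rows with year > 5,
# or rows whose length is known and at least 100
result <- df %>%
  filter((education == "yes" & year > 5) |
           (!is.na(length) & length >= 100)) %>%
  slice_sample(n = 2)

result
```

Four of the five rows pass the filter here (every row except the third), and slice_sample() draws two of them.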