Filter or Subset Rows in R Using dplyr

In this article, we filter the rows of a data frame in the R programming language using the dplyr package.

Data frame in use

Every example below builds the same data frame: three columns (id, department, and salary) and eight rows.

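Method 1: Subset or filter rows with filter()

To filter or subset rows, we use the filter() function.

Syntax:

filter(dataframe, condition)

Here, dataframe is the input data frame and condition is a logical expression; only the rows for which it is TRUE are kept.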
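To make "condition" concrete, here is a minimal sketch. The scores data frame and its columns are hypothetical and exist only for this illustration; the point is that the condition is evaluated once per row, and rows where it is FALSE or NA are dropped.

```r
library(dplyr)

# Hypothetical toy data; the NA is included on purpose
scores <- data.frame(name = c("a", "b", "c"),
                     score = c(10, NA, 30))

# The condition yields one logical value per row: TRUE, FALSE or NA
print(scores$score > 15)

# filter() keeps only the rows where the condition is TRUE,
# so the row with the missing score is dropped as well
kept <- filter(scores, score > 15)
print(kept)
```

If rows with missing values should be kept, the condition has to say so explicitly, for example with is.na(score) | score > 15.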

Example: R program to filter a data frame

# load the package
library(dplyr)
  
# create the dataframe with three columns
# id , department and salary with 8 rows
data=data.frame(id=c(7058,7059,7060,7089,7072,7078,7093,7034),
                  
                department=c('IT','sales','finance','IT','finance',
                             'sales','HR','HR'),
                  
                salary=c(34500.00,560890.78,67000.78,25000.00,
                         78900.00,25000.00,45000.00,90000))
  
# display the original data frame
print(data)
print("==========================")

# keep only the rows whose department is sales
print(filter(data, department == "sales"))

Output

The full data frame is printed first, followed by the two rows whose department is sales (ids 7059 and 7078).

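Method 2: Filter a data frame on multiple conditions

We again use the filter() function, this time passing more than one condition.

Syntax:

filter(dataframe, condition1, condition2, ..., condition_n)

Here, dataframe is the input data frame and each condition is a logical expression. Conditions separated by commas must all hold; they can also be combined explicitly with & (and) and | (or).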
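As a small sketch of the syntax above, the snippet below checks that comma-separated conditions and a single condition joined with & select the same rows. The staff data frame is a hypothetical stand-in, not the article's data.

```r
library(dplyr)

# Hypothetical data used only for this comparison
staff <- data.frame(id = c(1, 2, 3, 4),
                    department = c("sales", "sales", "IT", "HR"),
                    salary = c(20000, 30000, 40000, 50000))

# Conditions separated by commas must all be TRUE
with_comma <- filter(staff, department == "sales", salary > 27000)

# The same selection written as one condition with &
with_and <- filter(staff, department == "sales" & salary > 27000)

print(with_comma)
print(all.equal(with_comma, with_and))
```

The comma form is often easier to read when there are many conditions, while | has no comma equivalent and must be written out.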

Example: R program to filter rows on two conditions combined with AND

# load the package
library(dplyr)
  
# create the dataframe with three columns
# id , department and salary with 8 rows
data=data.frame(id=c(7058,7059,7060,7089,7072,7078,7093,7034),
                  
                department=c('IT','sales','finance','IT','finance',
                             'sales','HR','HR'),
                  
                salary=c(34500.00,560890.78,67000.78,25000.00,
                         78900.00,25000.00,45000.00,90000))
  
# display the original data frame
print(data)
print("==========================")

# keep the rows where department is sales and
# salary is greater than 27000
print(filter(data, department == "sales" & salary > 27000))

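Output

The full data frame is printed first, followed by the single row that satisfies both conditions (id 7059, salary 560890.78).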

Example: R program to filter rows with the OR operator

# load the package
library(dplyr)
  
# create the dataframe with three columns
# id , department and salary with 8 rows
data=data.frame(id=c(7058,7059,7060,7089,7072,7078,7093,7034),
                  
                department=c('IT','sales','finance','IT','finance',
                             'sales','HR','HR'),
                  
                salary=c(34500.00,560890.78,67000.78,25000.00,
                         78900.00,25000.00,45000.00,90000))
  
# display the original data frame
print(data)
print("==========================")

# keep the rows where department is IT or salary
# is greater than 27000
print(filter(data, department == "IT" | salary > 27000))

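Output

The full data frame is printed first, followed by seven rows: every row except id 7078, which is in sales and has a salary of 25000.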

Example: R program to filter rows using AND and OR together

# load the package
library(dplyr)
  
# create the dataframe with three columns
# id , department and salary with 8 rows
data=data.frame(id=c(7058,7059,7060,7089,7072,7078,7093,7034),
                  
                department=c('IT','sales','finance','IT','finance',
                             'sales','HR','HR'),
                  
                salary=c(34500.00,560890.78,67000.78,25000.00,
                         78900.00,25000.00,45000.00,90000))
  
# display the original data frame
print(data)
print("==========================")

# keep the rows where department is sales and salary
# is greater than 27000, or where salary is less than 5000
# (& is evaluated before |)
print(filter(data, department == "sales" & salary > 27000 | salary < 5000))

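Output

The full data frame is printed first, followed by one row (id 7059). No salary is below 5000, so the second alternative matches nothing.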
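Because & is evaluated before |, where the parentheses go can change which rows come back when AND and OR are mixed. The following sketch uses a hypothetical three-row staff data frame chosen so that the two readings differ.

```r
library(dplyr)

# Hypothetical data chosen so the two groupings give different results
staff <- data.frame(department = c("sales", "sales", "IT"),
                    salary = c(30000, 4000, 3000))

# Without parentheses: (sales AND salary > 27000) OR salary < 5000
no_parens <- filter(staff,
                    department == "sales" & salary > 27000 | salary < 5000)

# With parentheses: sales AND (salary > 27000 OR salary < 5000)
with_parens <- filter(staff,
                      department == "sales" & (salary > 27000 | salary < 5000))

print(no_parens)    # all three rows, including the IT row
print(with_parens)  # only the two sales rows
```

Writing the parentheses explicitly, even when they match the default precedence, makes the intended grouping visible to the reader.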

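Method 3: Using the slice_head() function

This function returns the first n rows of a data frame.

Syntax:

dataframe %>% slice_head(n = n)

Here, dataframe is the input data frame, %>% is the pipe operator, which passes the data frame on its left to the function on its right as the first argument, and n is the number of rows to return. The n argument should be passed by name.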
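To illustrate what the pipe operator does, this short sketch compares the piped call with the equivalent direct call. The staff data frame is hypothetical.

```r
library(dplyr)

# Hypothetical data for the comparison
staff <- data.frame(id = 1:5, salary = c(10, 20, 30, 40, 50))

# Piped form: staff becomes the first argument of slice_head()
piped <- staff %>% slice_head(n = 2)

# Direct form: the same call written without the pipe
direct <- slice_head(staff, n = 2)

print(piped)
print(all.equal(piped, direct))
```

The pipe pays off mainly when several steps are chained, since each step then reads left to right instead of being nested.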

Example: R program to select the first rows with slice_head()

# load the package
library(dplyr)
  
# create the dataframe with three columns
# id , department and salary with 8 rows
data=data.frame(id=c(7058,7059,7060,7089,7072,7078,7093,7034),
                  
                department=c('IT','sales','finance','IT','finance',
                             'sales','HR','HR'),
                  
                salary=c(34500.00,560890.78,67000.78,25000.00,
                         78900.00,25000.00,45000.00,90000))
  
# display the original data frame
print(data)
print("==========================")

# display the first 3 rows with slice_head
data %>% slice_head(n = 3)
print("==========================")

# display the first 5 rows with slice_head
data %>% slice_head(n = 5)
print("==========================")

# display the first row with slice_head
data %>% slice_head(n = 1)

Output

The full data frame is shown first, followed by its first 3 rows, its first 5 rows, and its first row.

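Method 4: Using the slice_tail() function

This function returns the last n rows of a data frame.

Syntax:

dataframe %>% slice_tail(n = n)

Here, dataframe is the input data frame, %>% is the pipe operator, and n is the number of rows to return, counted from the end. The n argument should be passed by name.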

Example: R program to select the last rows with slice_tail()

# load the package
library(dplyr)
  
# create the dataframe with three columns
# id , department and salary with 8 rows
data=data.frame(id=c(7058,7059,7060,7089,7072,7078,7093,7034),
                  
                department=c('IT','sales','finance','IT','finance',
                             'sales','HR','HR'),
                  
                salary=c(34500.00,560890.78,67000.78,25000.00,
                         78900.00,25000.00,45000.00,90000))
  
# display the original data frame
print(data)
print("==========================")

# display the last 3 rows with slice_tail
data %>% slice_tail(n = 3)
print("==========================")

# display the last 5 rows with slice_tail
data %>% slice_tail(n = 5)
print("==========================")

# display the last row with slice_tail
data %>% slice_tail(n = 1)

Output

The full data frame is shown first, followed by its last 3 rows, its last 5 rows, and its last row.

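Method 5: Using the top_n() function

This function returns the n rows with the largest values of a ranking column. If no column is supplied through the wt argument, the last column of the data frame is used. In current versions of dplyr, top_n() is superseded by slice_max().

Syntax:

dataframe %>% top_n(n = 5)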
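The sketch below shows how the ranking column affects top_n(). The staff data frame and its age column are hypothetical, added so that the last column is not the one we usually want to rank by; slice_max() is shown alongside as the newer equivalent.

```r
library(dplyr)

# Hypothetical data: the last column is age, not salary
staff <- data.frame(id = c(1, 2, 3, 4),
                    salary = c(100, 400, 300, 200),
                    age = c(50, 20, 30, 40))

# No wt given: top_n() ranks by the last column (age) and says so in a message
by_last <- staff %>% top_n(n = 2)

# wt names the ranking column explicitly
by_salary <- staff %>% top_n(n = 2, wt = salary)

# slice_max() always takes the ranking column as an argument
by_salary_new <- staff %>% slice_max(salary, n = 2)

print(by_last)
print(by_salary)
print(by_salary_new)
```

Naming the ranking column avoids surprises when columns are later added to or reordered in the data frame.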

Example: R program to select the top rows with top_n()

# load the package
library(dplyr)
  
# create the dataframe with three columns
# id , department and salary with 8 rows
data=data.frame(id=c(7058,7059,7060,7089,7072,7078,7093,7034),
                  
                department=c('IT','sales','finance','IT','finance',
                             'sales','HR','HR'),
                  
                salary=c(34500.00,560890.78,67000.78,25000.00,78900.00,
                         25000.00,45000.00,90000))
  
# display the original data frame
print(data)
print("==========================")

# display the 3 rows with the highest salary
# (top_n ranks by the last column when wt is not given)
data %>% top_n(n = 3)
print("==========================")

# display the 5 rows with the highest salary
data %>% top_n(n = 5)
print("==========================")

# display the row with the highest salary
data %>% top_n(n = 1)

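Output

The full data frame is shown first. For each top_n() call, dplyr reports that it is selecting by salary and returns the 3, 5, and 1 rows with the highest salaries.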

Method 6: Using the slice_sample() function

Here we use slice_sample(), which returns n rows chosen at random.

Syntax:

dataframe %>% slice_sample(n = n)
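Since the rows are drawn at random, repeated runs generally return different rows. A minimal sketch, assuming reproducible output is wanted, is to fix the random seed with base R's set.seed() before sampling; the staff data frame and the seed value 42 are arbitrary choices for illustration.

```r
library(dplyr)

# Hypothetical data: eight rows identified by id
staff <- data.frame(id = 1:8)

# Fixing the seed makes the random draw repeatable
set.seed(42)
first_draw <- staff %>% slice_sample(n = 3)

# Resetting the same seed reproduces the same draw
set.seed(42)
second_draw <- staff %>% slice_sample(n = 3)

print(first_draw)
print(identical(first_draw$id, second_draw$id))
```

Without the set.seed() calls, the two draws would usually differ.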

Example: R program to select random rows with slice_sample()

# load the package
library(dplyr)
  
# create the dataframe with three columns
# id , department and salary with 8 rows
data=data.frame(id=c(7058,7059,7060,7089,7072,7078,7093,7034),
                  
                department=c('IT','sales','finance','IT','finance',
                             'sales','HR','HR'),
                  
                salary=c(34500.00,560890.78,67000.78,25000.00,
                         78900.00,25000.00,45000.00,90000))
  
# display the original data frame
print(data)
print("==========================")

# display 3 randomly chosen rows with slice_sample
data %>% slice_sample(n = 3)
print("==========================")

# display 5 randomly chosen rows with slice_sample
data %>% slice_sample(n = 5)
print("==========================")

# display 1 randomly chosen row with slice_sample
data %>% slice_sample(n = 1)

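Output

The full data frame is shown first, followed by samples of 3, 5, and 1 rows. The rows are chosen at random, so they differ from run to run.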

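Method 7: Using the slice_max() function

This function returns the n rows with the largest values in a given column.

Syntax:

dataframe %>% slice_max(column, n = n)

Here, dataframe is the input data frame, column is the column whose values determine the ranking, and n is the number of rows to return. The n argument should be passed by name.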
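One detail worth knowing is that slice_max() keeps tied rows by default, so it can return more than n rows. The sketch below uses a hypothetical staff data frame with two equal salaries to show the effect of the with_ties argument.

```r
library(dplyr)

# Hypothetical data with a tie at salary 300
staff <- data.frame(id = c(1, 2, 3, 4),
                    salary = c(500, 300, 300, 100))

# Default (with_ties = TRUE): both rows tied for second place are kept
ties_kept <- staff %>% slice_max(salary, n = 2)

# with_ties = FALSE: exactly n rows are returned
ties_dropped <- staff %>% slice_max(salary, n = 2, with_ties = FALSE)

print(ties_kept)
print(ties_dropped)
```

The same argument is available on slice_min().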

Example: R program to select rows with slice_max()

# load the package
library(dplyr)
  
# create the dataframe with three columns
# id , department and salary with 8 rows
data=data.frame(id=c(7058,7059,7060,7089,7072,7078,7093,7034),
                  
                department=c('IT','sales','finance','IT','finance',
                             'sales','HR','HR'),
                  
                salary=c(34500.00,560890.78,67000.78,25000.00,
                         78900.00,25000.00,45000.00,90000))
  
# display the original data frame
print(data)
print("==========================")

# return the 3 rows with the highest values
# in the salary column
print(data %>% slice_max(salary, n = 3))
print("==========================")

# return the 5 rows that sort last by the department column
# (a character column; tied values can yield more than 5 rows)
print(data %>% slice_max(department, n = 5))
print("==========================")

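Output

The full data frame is printed first, followed by the 3 rows with the highest salaries (ids 7059, 7034, and 7072), and then the rows whose department values sort last.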

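Method 8: Using the slice_min() function

This function returns the n rows with the smallest values in a given column.

Syntax:

dataframe %>% slice_min(column, n = n)

Here, dataframe is the input data frame, column is the column whose values determine the ranking (smallest first), and n is the number of rows to return. The n argument should be passed by name.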

Example: R program to select rows with slice_min()

# load the package
library(dplyr)
  
# create the dataframe with three columns
# id , department and salary with 8 rows
data=data.frame(id=c(7058,7059,7060,7089,7072,7078,7093,7034),
                  
                department=c('IT','sales','finance','IT','finance',
                             'sales','HR','HR'),
                  
                salary=c(34500.00,560890.78,67000.78,25000.00,
                         78900.00,25000.00,45000.00,90000))
  
# display the original data frame
print(data)
print("==========================")

# return the 3 rows with the lowest values
# in the salary column
print(data %>% slice_min(salary, n = 3))
print("==========================")

# return the 5 rows that sort first by the department column
# (a character column; tied values can yield more than 5 rows)
print(data %>% slice_min(department, n = 5))
print("==========================")

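Output

The full data frame is printed first, followed by the 3 rows with the lowest salaries (ids 7089 and 7078 at 25000, and id 7058 at 34500), and then the rows whose department values sort first.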

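Method 9: Using the sample_frac() function

The sample_frac() function randomly selects a given fraction of the rows of a data frame (or table). The first argument is the data frame, and the second is the fraction of rows to select, for example 0.5 for half of them.

Syntax:

sample_frac(dataframe, size)

Here, dataframe is the input data frame and size is the fraction of rows to return, a value between 0 and 1.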
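As a quick check of how the fraction translates into a row count, the sketch below samples from a hypothetical eight-row data frame with fractions that divide evenly. It also shows slice_sample() with its prop argument, which is the newer dplyr way to sample a fraction of rows.

```r
library(dplyr)

# Hypothetical data: eight rows identified by id
staff <- data.frame(id = 1:8)

# Fix the seed so the example is repeatable
set.seed(1)

# Half of 8 rows
half <- sample_frac(staff, 0.5)

# A quarter of 8 rows, using slice_sample() and its prop argument
quarter <- staff %>% slice_sample(prop = 0.25)

print(nrow(half))
print(nrow(quarter))
```

When the fraction does not divide the number of rows evenly, the requested count is not a whole number and has to be rounded, so it is worth checking nrow() on the result.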

Example: R program to sample a fraction of the rows with sample_frac()

# load the package
library(dplyr)
  
# create the dataframe with three columns
# id , department and salary with 8 rows
data=data.frame(id=c(7058,7059,7060,7089,7072,7078,7093,7034),
                  
                department=c('IT','sales','finance','IT','finance',
                             'sales','HR','HR'),
                  
                salary=c(34500.00,560890.78,67000.78,25000.00,
                         78900.00,25000.00,45000.00,90000))
  
# display the original data frame
print(data)
print("==========================")

# 0.2 * 8 rows = 1.6, rounded to 2 rows
print(sample_frac(data, 0.2))
print("==========================")

# 0.4 * 8 rows = 3.2, rounded to 3 rows
print(sample_frac(data, 0.4))
print("==========================")

# 0.7 * 8 rows = 5.6, rounded to 6 rows
print(sample_frac(data, 0.7))
print("==========================")

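Output

The full data frame is printed first, followed by three random samples of the rows. The rows are chosen at random, so they differ from run to run.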