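R: Grouping by One or More Variables Using dplyr
The group_by() function splits a data frame into groups according to the values found in one or more columns. The columns to group by are passed as arguments to the function, and more than one column name may be given.
Syntax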
group_by(col1, col2, …)
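On its own, group_by() only records which rows belong together; the effect becomes visible once a verb such as summarise() is applied to the grouped data. A minimal sketch, assuming a small hypothetical data frame named scores (it is not part of the original article):

```r
library(dplyr)

# hypothetical data: a team label and a score for each row
scores <- data.frame(team  = c("x", "y", "x", "y", "x"),
                     score = c(10, 20, 30, 40, 50))

# group by team, then compute one summary row per group
team_summary <- scores %>%
  group_by(team) %>%
  summarise(n_rows = n(), mean_score = mean(score))

print(team_summary)
```

Each distinct value of team yields one row in team_summary, which is the usual reason for grouping in the first place.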
Example 1: Grouping by a single variable
# load the required library
library("dplyr")
# create a data frame (col1 is sampled at random, so its values vary between runs)
data_frame <- data.frame(col1 = sample(6:7, 9 , replace = TRUE),
col2 = letters[1:3],
col3 = c(1,4,5,1,NA,NA,2,NA,2))
print ("Original DataFrame")
print (data_frame)
print ("Modified DataFrame")
# group the rows by col1
data_frame %>% group_by(col1)
Output
[1] "Original DataFrame"
col1 col2 col3
1 6 a 1
2 7 b 4
3 7 c 5
4 6 a 1
5 7 b NA
6 6 c NA
7 6 a 2
8 6 b NA
9 7 c 2
[1] "Modified DataFrame"
# A tibble: 9 x 3
# Groups: col1 [2]
col1 col2 col3
<int> <chr> <dbl>
1 6 a 1
2 7 b 4
3 7 c 5
4 6 a 1
5 7 b NA
6 6 c NA
7 6 a 2
8 6 b NA
9 7 c 2
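The grouped tibble prints the same rows as the original data frame; only the "# Groups:" header differs, because group_by() attaches grouping metadata without reordering or changing the data. A sketch of how that metadata might be inspected, with the col1 values from the output above typed in by hand instead of drawn with sample() so that the result is reproducible:

```r
library(dplyr)

# same data as in the output above, with col1 fixed instead of random
data_frame <- data.frame(col1 = c(6L, 7L, 7L, 6L, 7L, 6L, 6L, 6L, 7L),
                         col2 = letters[1:3],
                         col3 = c(1, 4, 5, 1, NA, NA, 2, NA, 2))

grouped <- data_frame %>% group_by(col1)

print(group_vars(grouped))              # names of the grouping columns
print(n_groups(grouped))                # number of distinct groups
print(is_grouped_df(ungroup(grouped)))  # ungroup() removes the grouping again
```

Calling ungroup() once the grouped computation is finished keeps later operations from being applied per group by accident.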
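Grouping can also be done on several columns of the data frame; to do so, simply pass each column name to the function.
Example 2: Grouping by multiple columns
# load the required library
library("dplyr")
# create a data frame (col1 is sampled at random, so its values vary between runs)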
data_frame <- data.frame(col1 = sample(6:7, 9 , replace = TRUE),
col2 = letters[1:3],
col3 = c(1,4,5,1,NA,NA,2,NA,2))
print ("Original DataFrame")
print (data_frame)
print ("Modified DataFrame")
# group the rows by each combination of col1 and col2
data_frame %>% group_by(col1, col2)
Output
[1] "Original DataFrame"
col1 col2 col3
1 7 a 1
2 7 b 4
3 7 c 5
4 6 a 1
5 6 b NA
6 6 c NA
7 7 a 2
8 6 b NA
9 6 c 2
[1] "Modified DataFrame"
# A tibble: 9 x 3
# Groups: col1, col2 [6]
col1 col2 col3
<int> <chr> <dbl>
1 7 a 1
2 7 b 4
3 7 c 5
4 6 a 1
5 6 b NA
6 6 c NA
7 7 a 2
8 6 b NA
9 6 c 2
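With two grouping columns, every distinct combination of col1 and col2 forms its own group, which is why the header above reports 6 groups. The following sketch summarises each combination; it assumes the values from the output above typed in by hand (so the result does not depend on sample()) and a dplyr version of 1.0.0 or later for the .groups argument:

```r
library(dplyr)

# same data as in the output above, with col1 fixed instead of random
data_frame <- data.frame(col1 = c(7L, 7L, 7L, 6L, 6L, 6L, 7L, 6L, 6L),
                         col2 = letters[1:3],
                         col3 = c(1, 4, 5, 1, NA, NA, 2, NA, 2))

# one row per (col1, col2) combination:
# the group size and the number of non-missing col3 values
per_group <- data_frame %>%
  group_by(col1, col2) %>%
  summarise(n_rows = n(),
            non_missing = sum(!is.na(col3)),
            .groups = "drop")  # return an ungrouped tibble

print(per_group)
```

Counting non-missing values rather than averaging them avoids a NaN result for groups in which every col3 value is NA.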