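Count Unique Values by Group in R
In this article, we discuss how to count the number of unique values by group in the R programming language. Consider the following example.
Suppose you have a dataset with multiple columns, like this: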
No. | class | age | age_group |
---|---|---|---|
1 | A | 20 | YOUNG |
2 | B | 15 | KID |
3 | C | 45 | OLD |
4 | B | 14 | KID |
5 | A | 21 | YOUNG |
6 | A | 22 | YOUNG |
7 | C | 47 | OLD |
8 | A | 18 | YOUNG |
9 | B | 16 | KID |
10 | C | 50 | OLD |
11 | A | 23 | YOUNG |
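In this sample dataset, class, age, and age_group are the column names. Our task is to count the number of unique age values within each age_group.
The resulting dataset should therefore look like this: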
No. | age_group | unique values |
---|---|---|
1 | YOUNG | 5 |
2 | KID | 3 |
3 | OLD | 3 |
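Method 1: Using the aggregate() function
The aggregate() function splits the data into groups, applies a function to each group, and returns a single summary value per group.
Example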
# Count Unique values by group
# Creating dataset
# creating class column
x <- c("A","B","C","B","A","A","C","A","B","C","A")
# creating age column
y <- c(20,15,45,14,21,22,47,18,16,50,23)
# creating age_group column
z <- c("YOUNG","KID","OLD","KID","YOUNG","YOUNG",
"OLD","YOUNG","KID","OLD","YOUNG")
# creating dataframe
df <- data.frame(class=x,age=y,age_group=z)
df
# applying aggregate function: for each age_group,
# count the distinct values of age
aggregate(age ~ age_group, data = df, FUN = function(x) length(unique(x)))
Output
[Image: Output 1]
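In the sample data every age is different, so the number of unique values in each group happens to equal the number of rows in that group. The following minimal sketch uses a small hypothetical data frame (df_dup, which is not part of the original dataset) with repeated ages to illustrate how length(unique(x)) differs from a plain row count.

```r
# Hypothetical data (an assumption for illustration only):
# the age 20 appears twice in YOUNG and 15 appears twice in KID
df_dup <- data.frame(
  age       = c(20, 20, 21, 15, 15, 45),
  age_group = c("YOUNG", "YOUNG", "YOUNG", "KID", "KID", "OLD")
)

# Number of rows per group (repeated ages are counted each time)
row_counts <- aggregate(age ~ age_group, data = df_dup, FUN = length)

# Number of distinct ages per group (repeated ages are counted once)
unique_counts <- aggregate(age ~ age_group, data = df_dup,
                           FUN = function(x) length(unique(x)))

print(row_counts)
print(unique_counts)
```

Here the YOUNG group has three rows but only two distinct ages, which is the difference the anonymous function is meant to capture.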
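Method 2: Using the dplyr package and the group_by() function
dplyr is one of the most widely used R packages for data manipulation. It provides a set of tools for grouping, summarising, and otherwise transforming data frames.
Example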
# Count Unique values by group
# loading dplyr
library("dplyr")
# Creating dataset
# creating class column
x <- c("A","B","C","B","A","A","C","A","B","C","A")
# creating age column
y <- c(20,15,45,14,21,22,47,18,16,50,23)
# creating age_group column
z <- c("YOUNG","KID","OLD","KID","YOUNG","YOUNG",
"OLD","YOUNG","KID","OLD","YOUNG")
# creating dataframe
df <- data.frame(class=x,age=y,age_group=z)
# group by the age_group column, then count
# the distinct age values within each group
df %>%
  group_by(age_group) %>%
  summarise(n_distinct(age))
Output
[Image: Output 2]
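When the expression inside summarise() is left unnamed, the summary column is named after the expression itself, n_distinct(age). As a small variation, the sketch below gives the column an explicit name. The name unique_values is an illustrative choice and does not come from the original example.

```r
# Requires the dplyr package to be installed
library(dplyr)

# Same sample data as in the example above
df <- data.frame(
  class     = c("A", "B", "C", "B", "A", "A", "C", "A", "B", "C", "A"),
  age       = c(20, 15, 45, 14, 21, 22, 47, 18, 16, 50, 23),
  age_group = c("YOUNG", "KID", "OLD", "KID", "YOUNG", "YOUNG",
                "OLD", "YOUNG", "KID", "OLD", "YOUNG")
)

# Group by age_group and store the distinct-age count
# in a column with an explicit (illustrative) name
result <- df %>%
  group_by(age_group) %>%
  summarise(unique_values = n_distinct(age))

print(result)
```

For the sample data this should report 5 unique ages for YOUNG and 3 each for KID and OLD, matching the target table at the start of the article.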