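R: Comparing Adjacent Rows in a data.table
The data.table package simplifies data manipulation in R, including subsetting, grouping, and updating tables.
To compare a row with the previous row of the same group, create a new column that holds the lagged value, that is, the previous value encountered within the group. The group is specified with the by argument. The new column is built with c(NA, x[-.N]), where x is the column the lag is computed from. Inside a grouped call, .N is the number of rows in the current group, so x[-.N] drops the group's last value, and prepending NA shifts the remaining values down by one row. As a result, the first row of each group gets NA, because it has no previous value.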
Syntax
dt[, new_col_name := c(NA, x[-.N]), by = group_col]
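To see what this pattern does on data that can be checked by hand, here is a minimal sketch on a small fixed table. The names dt, grp, val, prev and prev_shift are chosen only for illustration. The sketch also compares the result with data.table's shift() function, which by default returns a one-step lag and should give the same column.

```r
library(data.table)

# Small fixed table so the result can be verified by hand (names are illustrative)
dt <- data.table(grp = c("a", "b", "a", "b", "a"),
                 val = c(10L, 20L, 30L, 40L, 50L))

# Drop the last value of each group (val[-.N]) and prepend NA,
# which shifts every value down by one row within its group
dt[, prev := c(NA, val[-.N]), by = grp]

# shift() with its default arguments also produces a one-step lag
dt[, prev_shift := shift(val), by = grp]

print(dt)
```

Here prev comes out as NA, NA, 10, 20, 30: the first row of each group has no predecessor, and every other row carries the previous value of its own group, not of the row directly above it.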
Example 1: Computing the previous value within each group
# Load the required package
library("data.table")
# Create a data.table with a grouping column (col1) and a value column (col2)
data_frame <- data.table(col1 = sample(letters[1:4], 12, replace = TRUE),
                         col2 = sample(1:6, 12, replace = TRUE))
print("original data frame")
print(data_frame)
# Compute the lag of col2 within each group of col1
data_frame[, lag := c(NA, col2[-.N]), by = col1]
print("modified data frame")
print(data_frame)
Output
[1] "original data frame"
    col1 col2
 1:    b    6
 2:    c    5
 3:    a    1
 4:    d    6
 5:    d    5
 6:    b    6
 7:    b    5
 8:    a    2
 9:    c    6
10:    a    3
11:    a    4
12:    d    1
[1] "modified data frame"
    col1 col2 lag
 1:    b    6  NA
 2:    c    5  NA
 3:    a    1  NA
 4:    d    6  NA
 5:    d    5   6
 6:    b    6   6
 7:    b    5   6
 8:    a    2   1
 9:    c    6   5
10:    a    3   2
11:    a    4   3
12:    d    1   5
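Because sample() draws random values, the table changes on every run. One possible way to make the example repeatable is to call set.seed() first. In the sketch below, the seed value 42 and the helper name make_table are arbitrary choices made for illustration. The checks at the end confirm that the first row of each group holds NA in the lag column and that the same seed yields the same data.

```r
library(data.table)

# Hypothetical helper: builds the example table from a fixed seed
make_table <- function(seed) {
  set.seed(seed)
  data.table(col1 = sample(letters[1:4], 12, replace = TRUE),
             col2 = sample(1:6, 12, replace = TRUE))
}

dt <- make_table(42)
dt[, lag := c(NA, col2[-.N]), by = col1]

# The first row of every group should have no previous value
first_lag <- dt[, .(first_lag = lag[1]), by = col1]
print(all(is.na(first_lag$first_lag)))

# The same seed reproduces the same columns
a <- make_table(42)
b <- make_table(42)
print(identical(a$col1, b$col1) && identical(a$col2, b$col2))
```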
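Next, the difference between adjacent rows of a group is computed by subtracting the new lag column from the existing column x. No by argument is needed for this step, because the lag column was already computed per group.
Syntax
dt[, diff_col := x - new_col_name]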
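As a hedged sketch of the subtraction step, the example below applies it to a small fixed table. The names dt, grp, val, prev, change and change_direct are illustrative. It also shows a one-step variant that uses base R's diff() inside the grouped call, which may be convenient when the intermediate lag column is not needed.

```r
library(data.table)

# Small fixed table (names and values are illustrative)
dt <- data.table(grp = c("a", "b", "a", "b", "a"),
                 val = c(10L, 25L, 30L, 40L, 45L))

# Two steps: build the lag within each group, then subtract it
dt[, prev := c(NA, val[-.N]), by = grp]
dt[, change := val - prev]

# One step: diff() returns successive differences within the group,
# so NA is prepended for the group's first row
dt[, change_direct := c(NA, diff(val)), by = grp]

print(dt)
```

Both change and change_direct come out as NA, NA, 20, 15, 15. The two-step form keeps the previous value available for other comparisons, while the one-step form yields only the difference.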
Example 2: Differences between adjacent rows within each group
# Load the required package
library("data.table")
# Create a data.table with a grouping column (col1) and a value column (col2)
data_frame <- data.table(col1 = sample(letters[1:4], 12, replace = TRUE),
                         col2 = sample(1:6, 12, replace = TRUE))
print("original data frame")
print(data_frame)
# Compute the lag of col2 within each group of col1
data_frame[, lag := c(NA, col2[-.N]), by = col1]
# Subtract the lag from the current value.
# := updates data_frame by reference, so data_mod refers to the same table.
data_mod <- data_frame[, difference := col2 - lag]
print("modified data frame")
print(data_mod)
Output
[1] "original data frame"
    col1 col2
 1:    a    1
 2:    d    3
 3:    d    6
 4:    d    3
 5:    d    2
 6:    b    4
 7:    d    5
 8:    c    6
 9:    d    2
10:    b    4
11:    d    1
12:    a    6
[1] "modified data frame"
    col1 col2 lag difference
 1:    a    1  NA         NA
 2:    d    3  NA         NA
 3:    d    6   3          3
 4:    d    3   6         -3
 5:    d    2   3         -1
 6:    b    4  NA         NA
 7:    d    5   2          3
 8:    c    6  NA         NA
 9:    d    2   5         -3
10:    b    4   4          0
11:    d    1   2         -1
12:    a    6   1          5
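Example 3: Differences within groups for a sequential column
# Load the required package
library("data.table")
# Create a data.table with a random grouping column (col1)
# and a sequential value column (col2)
data_frame <- data.table(col1 = sample(letters[1:4], 16, replace = TRUE),
                         col2 = 100:115)
print("original data frame")
print(data_frame)
# Compute the lag of col2 within each group of col1
data_frame[, col3 := c(NA, col2[-.N]), by = col1]
# Subtract the lag from the current value.
# := updates data_frame by reference, so data_mod refers to the same table.
data_mod <- data_frame[, difference := col2 - col3]
print("modified data frame")
print(data_mod)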
Output
[1] "original data frame"
    col1 col2
 1:    d  100
 2:    a  101
 3:    b  102
 4:    a  103
 5:    d  104
 6:    d  105
 7:    c  106
 8:    a  107
 9:    b  108
10:    a  109
11:    b  110
12:    d  111
13:    b  112
14:    d  113
15:    c  114
16:    b  115
[1] "modified data frame"
    col1 col2 col3 difference
 1:    d  100   NA         NA
 2:    a  101   NA         NA
 3:    b  102   NA         NA
 4:    a  103  101          2
 5:    d  104  100          4
 6:    d  105  104          1
 7:    c  106   NA         NA
 8:    a  107  103          4
 9:    b  108  102          6
10:    a  109  107          2
11:    b  110  108          2
12:    d  111  105          6
13:    b  112  110          2
14:    d  113  111          2
15:    c  114  106          8
16:    b  115  112          3
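To check this result without relying on random sampling, the sketch below hard-codes the col1 values from the printed output and recomputes the lag and the difference. The extra column increased is an illustrative addition and not part of the original example. It shows how the lag can also be used for a direct comparison between adjacent rows of a group.

```r
library(data.table)

# col1 values copied from the printed output; col2 is 100:115 as in the example
dt <- data.table(col1 = c("d", "a", "b", "a", "d", "d", "c", "a",
                          "b", "a", "b", "d", "b", "d", "c", "b"),
                 col2 = 100:115)

# Previous value within each group, then the difference to it
dt[, col3 := c(NA, col2[-.N]), by = col1]
dt[, difference := col2 - col3]

# Illustrative comparison: is the value larger than the previous one in its group?
# The result is NA for the first row of each group.
dt[, increased := col2 > col3]

print(dt)
```

Since col2 increases with every row, increased is TRUE wherever a previous value exists, and the difference equals the gap in col2 between consecutive rows of the same group.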