Comparing Adjacent Rows in an R data.table

The data.table package simplifies data-manipulation operations in the R programming language, such as subsetting, grouping, and updating tables.

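Indexing can be used to create a new column that holds the lag, that is, the previous value encountered within the same group. The group is specified with the "by" argument. The new column is built with the expression c(NA, x[-.N]), where x is the column from which the new values are taken. Inside each group, .N is the number of rows in that group, so x[-.N] drops the group's last value, and prepending NA shifts the remaining values down by one row. As a result, the first row of every group gets NA, because it has no previous value to compare against.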

Syntax

dt[, new_col_name := c(NA, x[-.N]), by = group_col]
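To make the shifting step concrete, here is a minimal sketch on a small table with fixed values, so the result is reproducible. The table and the column names grp, val, and prev are hypothetical and chosen only for illustration.

```r
library(data.table)

# Hypothetical toy table with fixed values (no random sampling)
dt <- data.table(
  grp = c("a", "b", "a", "b", "a"),
  val = c(10L, 20L, 30L, 40L, 50L)
)

# Within each group, .N is the group's row count, so val[-.N] drops the
# group's last value; prepending NA shifts the rest down by one row
dt[, prev := c(NA, val[-.N]), by = grp]

print(dt)
# prev is NA, NA, 10, 20, 30: rows 1 and 2 are the first rows of
# groups "a" and "b", so they have no previous value
```

Note that the lag follows the row order within each group, not the overall row order: row 3 (group "a") picks up the value from row 1, skipping row 2, which belongs to group "b".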

Example 1: Comparing adjacent rows in an R data.table.

# load the required package
library(data.table)

# create a data.table with a grouping column and a value column;
# sample() is random, so the values differ from run to run
data_frame <- data.table(col1 = sample(letters[1:4], 12, replace = TRUE),
                         col2 = sample(1:6, 12, replace = TRUE))

print("original data frame")
print(data_frame)

# lag of col2 within each col1 group; the first row of each group gets NA
data_frame[, lag := c(NA, col2[-.N]), by = col1]
print("modified data frame")
print(data_frame)

Output

[1] "original data frame"
    col1 col2
 1:    b    6
 2:    c    5
 3:    a    1
 4:    d    6
 5:    d    5
 6:    b    6
 7:    b    5
 8:    a    2
 9:    c    6
10:    a    3
11:    a    4
12:    d    1
[1] "modified data frame"
    col1 col2 lag
 1:    b    6  NA
 2:    c    5  NA
 3:    a    1  NA
 4:    d    6  NA
 5:    d    5   6
 6:    b    6   6
 7:    b    5   6
 8:    a    2   1
 9:    c    6   5
10:    a    3   2
11:    a    4   3
12:    d    1   5

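The difference between adjacent rows within a group is then computed by subtracting the lag column from the existing column x. Both columns are referenced by name inside the data.table, and rows whose lag is NA also get NA as their difference.

Syntax

dt[, diff_col := x - new_col_name]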
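The subtraction step can be sketched on a small table with fixed values, which makes the NA propagation easy to see. The table and the column names grp, val, prev, change, and same_as_prev are hypothetical; the equality flag is an extra illustration of comparing a row with its predecessor, not something the syntax above requires.

```r
library(data.table)

# Hypothetical toy table with fixed values (no random sampling)
dt <- data.table(
  grp = c("a", "b", "a", "b", "a"),
  val = c(10L, 20L, 7L, 40L, 7L)
)

# Step 1: previous value within each group
dt[, prev := c(NA, val[-.N]), by = grp]

# Step 2: difference between each row and the previous row of its group;
# no "by" is needed here because prev already respects the grouping
dt[, change := val - prev]

# Optional comparison: does the row repeat the previous value of its group?
dt[, same_as_prev := val == prev]

print(dt)
# change is NA, NA, -3, 20, 0
# same_as_prev is NA, NA, FALSE, FALSE, TRUE
```

Any arithmetic or comparison with NA yields NA, so the first row of each group stays NA in both derived columns.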

Example 2: Differences between adjacent rows of a data.table in R

# load the required package
library(data.table)

# create a data.table with a grouping column and a value column;
# sample() is random, so the values differ from run to run
data_frame <- data.table(col1 = sample(letters[1:4], 12, replace = TRUE),
                         col2 = sample(1:6, 12, replace = TRUE))

print("original data frame")
print(data_frame)

# lag of col2 within each col1 group
data_frame[, lag := c(NA, col2[-.N]), by = col1]

# difference between each value and the previous value in its group;
# := updates by reference, so data_mod and data_frame are the same table
data_mod <- data_frame[, difference := col2 - lag]
print("modified data frame")
print(data_mod)

Output

[1] "original data frame"
    col1 col2
 1:    a    1
 2:    d    3
 3:    d    6
 4:    d    3
 5:    d    2
 6:    b    4
 7:    d    5
 8:    c    6
 9:    d    2
10:    b    4
11:    d    1
12:    a    6
[1] "modified data frame"
    col1 col2 lag difference
 1:    a    1  NA         NA
 2:    d    3  NA         NA
 3:    d    6   3          3
 4:    d    3   6         -3
 5:    d    2   3         -1
 6:    b    4  NA         NA
 7:    d    5   2          3
 8:    c    6  NA         NA
 9:    d    2   5         -3
10:    b    4   4          0
11:    d    1   2         -1
12:    a    6   1          5

Example 3: Group-wise differences on a sequential column

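# load the required package
library(data.table)

# create a data.table with a random grouping column and a sequential value column
data_frame <- data.table(col1 = sample(letters[1:4], 16, replace = TRUE),
                         col2 = 100:115)

print("original data frame")
print(data_frame)

# previous value of col2 within each col1 group
data_frame[, col3 := c(NA, col2[-.N]), by = col1]

# difference between each value and the previous value in its group;
# := updates by reference, so data_mod and data_frame are the same table
data_mod <- data_frame[, difference := col2 - col3]
print("modified data frame")
print(data_mod)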

Output

[1] "original data frame"
    col1 col2
 1:    d  100
 2:    a  101
 3:    b  102
 4:    a  103
 5:    d  104
 6:    d  105
 7:    c  106
 8:    a  107
 9:    b  108
10:    a  109
11:    b  110
12:    d  111
13:    b  112
14:    d  113
15:    c  114
16:    b  115
[1] "modified data frame"
    col1 col2 col3 difference
 1:    d  100   NA         NA
 2:    a  101   NA         NA
 3:    b  102   NA         NA
 4:    a  103  101          2
 5:    d  104  100          4
 6:    d  105  104          1
 7:    c  106   NA         NA
 8:    a  107  103          4
 9:    b  108  102          6
10:    a  109  107          2
11:    b  110  108          2
12:    d  111  105          6
13:    b  112  110          2
14:    d  113  111          2
15:    c  114  106          8
16:    b  115  112          3
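As an alternative to the manual c(NA, x[-.N]) idiom used in the examples above, data.table also provides a shift() function that lags a vector by one position by default. The sketch below is not part of the original examples; it uses a hypothetical toy table (columns grp and val) and checks that both approaches agree on it.

```r
library(data.table)

# Hypothetical toy table with fixed values (no random sampling)
dt <- data.table(
  grp = c("a", "b", "a", "b", "a"),
  val = c(10L, 20L, 7L, 40L, 7L)
)

# Manual lag, as in the examples above
dt[, prev_manual := c(NA, val[-.N]), by = grp]

# shift() lags by one position by default and fills the gap with NA
dt[, prev_shift := shift(val), by = grp]

# Lag and difference in a single grouped step, without a helper column
dt[, change := val - shift(val), by = grp]

# Both lag columns should be identical on this table
stopifnot(identical(dt$prev_manual, dt$prev_shift))

print(dt)
```

With shift(), the "by" argument must be kept on the difference step, because the lag is computed inside that same expression rather than read from an existing column.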