Distance Matrix by GPU in R Programming

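Distance measures are an important tool in statistical analysis. They quantify the dissimilarity between sample data points so that it can be used in numerical computation. One popular choice of distance metric is the Euclidean distance, which is the square root of the sum of the squared differences between attributes. Specifically, for two data points p and q with n numeric attributes, the Euclidean distance between them is:

$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$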
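As a quick check of the definition, the sketch below computes the Euclidean distance by hand and compares it with dist(). The two points p and q are made-up values chosen only for illustration.

```r
# Two illustrative data points with n = 3 numeric attributes (made-up values)
p <- c(1, 2, 3)
q <- c(4, 6, 3)

# Square root of the sum of squared attribute differences
euclid_manual <- sqrt(sum((p - q)^2))

# dist() treats each row of its input as one data point
euclid_dist <- as.numeric(dist(rbind(p, q)))

print(euclid_manual)  # 5
print(euclid_dist)    # 5
```

The squared differences are 9, 16 and 0, so both values equal the square root of 25.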

The available distance measures are (written for two vectors x and y):

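Euclidean: The usual distance between the two vectors (2-norm, also known as L2): $\sqrt{\sum_i (x_i - y_i)^2}$
Maximum: The maximum distance between two corresponding components of x and y (supremum norm): $\max_i |x_i - y_i|$
Manhattan: The absolute distance between the two vectors (1-norm, also known as L1): $\sum_i |x_i - y_i|$
Canberra: $\sum_i |x_i - y_i| / (|x_i| + |y_i|)$. Terms with a zero numerator and a zero denominator are omitted from the sum and treated as if the values were missing.
Binary (also known as asymmetric binary): The vectors are regarded as binary bits, so non-zero elements are "on" and zero elements are "off". The distance is the proportion of bits in which only one is on among those in which at least one is on.
Minkowski: The p-norm, i.e. the p-th root of the sum of the p-th powers of the component differences: $\left(\sum_i |x_i - y_i|^p\right)^{1/p}$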
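The six definitions above can be sketched directly in R and compared against dist(). The vectors x and y and the power p = 3 below are illustrative assumptions. The values are non-negative and no position is zero in both vectors, so no Canberra term is dropped.

```r
# Illustrative vectors (made-up values)
x <- c(1, 0, 3, 5)
y <- c(2, 4, 0, 5)
p <- 3  # power used for the Minkowski distance

manual <- c(
  euclidean = sqrt(sum((x - y)^2)),
  maximum   = max(abs(x - y)),
  manhattan = sum(abs(x - y)),
  canberra  = sum(abs(x - y) / (abs(x) + abs(y))),
  # positions where exactly one value is non-zero, divided by
  # positions where at least one value is non-zero
  binary    = sum(xor(x != 0, y != 0)) / sum(x != 0 | y != 0),
  minkowski = sum(abs(x - y)^p)^(1 / p)
)

# Same measures via dist(); p is only used by method = "minkowski"
m <- rbind(x, y)
builtin <- sapply(names(manual), function(method) {
  as.numeric(dist(m, method = method, p = p))
})

print(cbind(manual, builtin))
stopifnot(isTRUE(all.equal(manual, builtin)))
```

Writing each measure as a single vectorized expression keeps the code close to the formulas, which makes a mismatch with dist() easy to spot.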

Implementation in R

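To compute a distance matrix in R, we can use the dist() function. dist() computes and returns the distance matrix obtained by applying the specified distance measure to the rows of a data matrix. Note that dist() belongs to the base stats package and runs on the CPU, so it serves here as the reference for the results a GPU implementation should reproduce.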
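To make the row-wise behaviour concrete, here is a minimal sketch with three made-up points in the plane. The row names p1, p2 and p3 are hypothetical labels chosen for this illustration.

```r
# Three illustrative points, one per row (made-up values)
pts <- rbind(p1 = c(0, 0), p2 = c(3, 4), p3 = c(6, 8))

d <- dist(pts)   # Euclidean distance by default
print(d)

# A "dist" object stores only the lower triangle: n * (n - 1) / 2 values
length(d)        # 3 for n = 3 rows

# as.matrix() expands it to the full symmetric matrix with a zero diagonal
full <- as.matrix(d)
print(full)
```

Here the distance between p1 and p2 is 5, between p1 and p3 is 10, and between p2 and p3 is 5.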

Syntax

dist(x, method = "euclidean", diag = FALSE, upper = FALSE, p = 2)

Parameters

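x: a numeric matrix, a data frame, or a "dist" object.

method: the distance measure to be used. It must be one of "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski". Any unambiguous substring can be given.

diag: a logical value indicating whether the diagonal of the distance matrix should be printed by print.dist.

upper: a logical value indicating whether the upper triangle of the distance matrix should be printed by print.dist.

p: the power of the Minkowski distance.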
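The following sketch illustrates how these parameters behave, under the assumption of a small two-row input with made-up values. It checks that an unambiguous prefix selects the same method, that p = 1 and p = 2 reduce Minkowski to Manhattan and Euclidean, and that diag and upper affect printing only.

```r
# Illustrative two-row input (made-up values)
m <- rbind(x = c(0, 0, 1, 1, 1, 1),
           y = c(1, 0, 1, 1, 0, 1))

# An unambiguous prefix of the method name is accepted
d_full   <- dist(m, method = "manhattan")
d_prefix <- dist(m, method = "man")
stopifnot(isTRUE(all.equal(as.numeric(d_full), as.numeric(d_prefix))))

# p matters only for "minkowski": p = 1 is Manhattan, p = 2 is Euclidean
mink1 <- as.numeric(dist(m, method = "minkowski", p = 1))
mink2 <- as.numeric(dist(m, method = "minkowski", p = 2))
stopifnot(isTRUE(all.equal(mink1, as.numeric(d_full))))
stopifnot(isTRUE(all.equal(mink2, as.numeric(dist(m, method = "euclidean")))))

# diag and upper change how the object is printed, not the stored values
d_plain <- dist(m)
d_shown <- dist(m, diag = TRUE, upper = TRUE)
stopifnot(isTRUE(all.equal(as.numeric(d_plain), as.numeric(d_shown))))
print(d_shown)
```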

Example

# 150 random values arranged in 5 rows (30 columns);
# the number of values must be a multiple of nrow
x <- matrix(rnorm(150), nrow = 5)
dist(x)
dist(x, diag = TRUE)
dist(x, upper = TRUE)
m <- as.matrix(dist(x))
d <- as.dist(m)
stopifnot(d == dist(x))
 
# showing all the six distance measures
x <- c(0, 0, 1, 1, 1, 1)
y <- c(1, 0, 1, 1, 0, 1)
 
dist(rbind(x, y), method = "binary")
 
dist(rbind(x, y), method = "canberra")
 
dist(rbind(x, y), method = "manhattan")
 
dist(rbind(x, y), method = "euclidean")
 
dist(rbind(x, y), method = "maximum")
 
dist(rbind(x, y), method = "minkowski")

Output

> dist(x)
         1        2        3        4
2 6.772630                           
3 7.615303 7.390410                  
4 6.460424 6.759275 7.773421         
5 6.551426 7.688254 7.886380 7.039102

> dist(x, diag = TRUE)
         1        2        3        4        5
1 0.000000                                    
2 6.772630 0.000000                           
3 7.615303 7.390410 0.000000                  
4 6.460424 6.759275 7.773421 0.000000         
5 6.551426 7.688254 7.886380 7.039102 0.000000

> dist(x, upper = TRUE)
         1        2        3        4        5
1          6.772630 7.615303 6.460424 6.551426
2 6.772630          7.390410 6.759275 7.688254
3 7.615303 7.390410          7.773421 7.886380
4 6.460424 6.759275 7.773421          7.039102
5 6.551426 7.688254 7.886380 7.039102 

> dist(rbind(x, y), method = "binary")
    x
y 0.4

> dist(rbind(x, y), method = "canberra")
    x
y 2.4

> dist(rbind(x, y), method = "manhattan")
  x
y 2

> dist(rbind(x, y), method = "euclidean")
         x
y 1.414214

> dist(rbind(x, y), method = "maximum")
  x
y 1

> dist(rbind(x, y), method = "minkowski")
         x
y 1.414214
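Two of these results are worth verifying by hand. The sketch below reproduces the binary value 0.4 and the Canberra value 2.4 for the same x and y. It relies on the documented behaviour of dist() that, when terms are dropped from a Canberra sum, the sum is scaled up in proportion to the number of columns actually used. The variable names terms and used are hypothetical helpers. The Minkowski result equals the Euclidean one because p defaults to 2.

```r
x <- c(0, 0, 1, 1, 1, 1)
y <- c(1, 0, 1, 1, 0, 1)

# Binary: 2 positions have exactly one non-zero value,
# 5 positions have at least one non-zero value
binary_manual <- sum(xor(x != 0, y != 0)) / sum(x != 0 | y != 0)  # 2 / 5

# Canberra: position 2 gives 0/0 (NaN) and is dropped.
# The remaining 5 terms sum to 2, and the sum is scaled by
# (total columns) / (columns used) = 6 / 5
terms <- abs(x - y) / (abs(x) + abs(y))
used  <- !is.nan(terms)
canberra_manual <- sum(terms[used]) * length(x) / sum(used)

print(binary_manual)    # 0.4
print(canberra_manual)  # 2.4

stopifnot(
  isTRUE(all.equal(canberra_manual,
                   as.numeric(dist(rbind(x, y), method = "canberra"))))
)
```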