Working with Sparse Matrices in R

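A sparse matrix is a matrix in which most elements are zero, so only a small number of entries are nonzero. Storing such data in a fully dense matrix wastes memory and slows down computation. Sparse matrix data structures are therefore optimized to store only the nonzero entries and to reduce the time needed to access them.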

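Creating a Sparse Matrix

The Matrix package, which is distributed with R as a recommended package, provides classes for creating and working with sparse matrices.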

library(Matrix)

The following snippet illustrates the use of the Matrix package.

# loading the Matrix package
library(Matrix)
  
# declaring matrix of 1000 rows and 1000 cols
mat1 <- Matrix(0, nrow = 1000, 
                  ncol = 1000, 
                  sparse = TRUE)
  
# setting the value at 1st row 
# and 1st col to 5
mat1[1, 1] <- 5
  
print ("Size of sparse mat1")
print (object.size(mat1))

Output

[1] "Size of sparse mat1"
5440 bytes

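The sparse matrix takes far less space because only the nonzero values are stored. By comparison, a dense 1000 x 1000 matrix of doubles needs about 8 MB. The exact byte count reported above can differ between versions of R and Matrix.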
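To make the saving concrete, the following minimal sketch compares the memory footprint of an all-zero 1000 x 1000 matrix in dense and in sparse form. The variable names are illustrative and not part of the article's example, and no byte counts are shown because the sparse size depends on the installed versions of R and Matrix.

```r
library(Matrix)

n <- 1000

# dense storage: every one of the n * n doubles is kept (8 bytes each)
dense_zero <- matrix(0, nrow = n, ncol = n)

# sparse storage: only bookkeeping vectors, no explicit zeros
sparse_zero <- Matrix(0, nrow = n, ncol = n, sparse = TRUE)

dense_bytes <- as.numeric(object.size(dense_zero))
sparse_bytes <- as.numeric(object.size(sparse_zero))

print(dense_bytes)                 # at least 8 million bytes
print(sparse_bytes)                # far smaller
print(dense_bytes / sparse_bytes)  # how many times smaller the sparse form is
```

The ratio grows with the matrix size as long as the number of nonzero entries stays small.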

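Constructing a Sparse Matrix from a Dense Matrix

A dense matrix can be created with R's built-in matrix() function and then passed to the as() function, which converts it to a sparse class. The call has the following form.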

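Syntax: as(dense_matrix, Class)

Parameters

dense_matrix : a numeric or logical matrix.

Class : the name of the target class. With "sparseMatrix", a numeric matrix is converted to dgCMatrix, which is the compressed sparse column (CSC) format. Another available class is dgRMatrix, which stores the matrix in compressed sparse row (CSR) format.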

The following snippet shows the conversion of a dense matrix to a sparse matrix.

library(Matrix)
  
# construct a matrix with values
#   0 with probability 0.80
#   6 with probability 0.10
#   7 with probability 0.10
set.seed(0)
rows <- 4L
cols <- 6L
vals <- sample(
  x = c(0, 6, 7), 
  prob = c(0.8, 0.1, 0.1), 
  size = rows * cols, 
  replace = TRUE
)
  
dense_mat <- matrix(vals, nrow = rows)
print("Dense Matrix")
print(dense_mat)
  
# Convert to sparse 
sparse_mat <- as(dense_mat, 
                "sparseMatrix")
print("Sparse Matrix")
print(sparse_mat)

Output

[1] "Dense Matrix"
    [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    7    6    0    0    0    0
[2,]    0    0    0    0    0    6
[3,]    0    7    0    0    6    0
[4,]    0    6    0    0    0    0
[1] "Sparse Matrix"
4 x 6 sparse Matrix of class "dgCMatrix"

[1,] 7 6 . . . .
[2,] . . . . . 6
[3,] . 7 . . 6 .
[4,] . 6 . . . .
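To see what the column and row formats actually store, the sketch below converts a small hand-written matrix both ways and prints the internal slots. It assumes the virtual class name "RsparseMatrix" as the target for row-oriented storage, which the article itself does not use, and the 3 x 3 matrix is made up for illustration.

```r
library(Matrix)

# small non-symmetric example so the stored entries are easy to follow
dense_mat <- matrix(c(7, 0, 0,
                      0, 0, 6,
                      0, 5, 0), nrow = 3, byrow = TRUE)

# column-oriented (CSC) storage, converted the same way as in the article
csc <- as(dense_mat, "sparseMatrix")
print(class(csc))
print(csc@i)  # zero-based row index of each stored value
print(csc@p)  # column pointers: where each column starts in i and x
print(csc@x)  # stored values, column by column

# row-oriented (CSR) storage via the virtual class "RsparseMatrix" (assumption)
csr <- as(dense_mat, "RsparseMatrix")
print(class(csr))
print(csr@j)  # zero-based column index of each stored value
print(csr@x)  # stored values, row by row
```

Both objects represent the same matrix; only the order in which the three nonzero values are laid out in memory differs.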

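Operations on Sparse Matrices

Various arithmetic and binding operations can be performed on sparse matrices.

Addition and Subtraction of a Scalar

A scalar is added to or subtracted from every element of the sparse matrix. Because the zero entries change as well, the result is a dense matrix. The code below shows the use of the + and - operators.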

# Loading Library
library(Matrix)
  
# construct a matrix with values
#   0 with probability 0.85
#  10 with probability 0.15
   
set.seed(0)
rows <- 4L
cols <- 6L
vals <- sample(
  x = c(0, 10), 
  prob = c(0.85, 0.15), 
  size = rows * cols, 
  replace = TRUE
)
dense_mat <- matrix(vals, nrow = rows)
  
# Convert to sparse 
sparse_mat <- as(dense_mat, "sparseMatrix")
print("Sparse Matrix")
print(sparse_mat)
print("Addition")
  
# adding a scalar value 5 
# to the sparse matrix 
print(sparse_mat + 5)
print("Subtraction")
  
# subtracting a scalar value 1 
# from the sparse matrix 
print(sparse_mat - 1)

Output

[1] "Sparse Matrix"
4 x 6 sparse Matrix of class "dgCMatrix"

[1,] 10 10 . .  .  .
[2,]  .  . . .  . 10
[3,]  . 10 . . 10  .
[4,]  . 10 . .  .  .
[1] "Addition"
4 x 6 Matrix of class "dgeMatrix"
    [,1] [,2] [,3] [,4] [,5] [,6]
[1,]   15   15    5    5    5    5
[2,]    5    5    5    5    5   15
[3,]    5   15    5    5   15    5
[4,]    5   15    5    5    5    5
[1] "Subtraction"
4 x 6 Matrix of class "dgeMatrix"
    [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    9    9   -1   -1   -1   -1
[2,]   -1   -1   -1   -1   -1    9
[3,]   -1    9   -1   -1    9   -1
[4,]   -1    9   -1   -1   -1   -1
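The switch from sparse to dense storage seen in the output above can also be checked programmatically. The following sketch uses a made-up 2 x 3 matrix and the is() function to test class membership. The last part shows one possible way to shift only the stored values through the x slot so that the result stays sparse; this is an assumption about what a reader might want, and it is a different operation from adding the scalar to every element.

```r
library(Matrix)

dense_mat <- matrix(c(10, 0,  0,
                       0, 0, 10), nrow = 2, byrow = TRUE)
sparse_mat <- as(dense_mat, "sparseMatrix")

# adding a nonzero scalar touches every element, zeros included
shifted <- sparse_mat + 5

print(is(sparse_mat, "sparseMatrix"))  # TRUE
print(is(shifted, "sparseMatrix"))     # FALSE
print(is(shifted, "denseMatrix"))      # TRUE

# alternative (not equivalent): shift only the stored nonzero values
shift_stored <- sparse_mat
shift_stored@x <- shift_stored@x + 5
print(shift_stored)
```

For large matrices the dense result can be far bigger than the sparse input, so it is worth checking whether the shift is really needed on the zero entries.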

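Multiplication or Division by a Scalar

These operations affect only the nonzero elements of the matrix, because zero multiplied or divided by a nonzero scalar is still zero. The result is therefore also a sparse matrix.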

library(Matrix)

# construct a matrix with values
#   0 with probability 0.85
#  10 with probability 0.15
set.seed(0)
rows <- 4L
cols <- 6L
vals <- sample(
  x = c(0, 10), 
  prob = c(0.85, 0.15), 
  size = rows * cols, 
  replace = TRUE
)
dense_mat <- matrix(vals, nrow = rows)
  
# Convert to sparse 
sparse_mat <- as(dense_mat, "sparseMatrix")
print("Sparse Matrix")
print(sparse_mat)
print("Multiplication")
  
# multiplying the sparse matrix 
# by the scalar value 10 
print(sparse_mat * 10)
print("Division")
  
# dividing the sparse matrix 
# by the scalar value 10 
print(sparse_mat / 10)

Output

[1] "Sparse Matrix"
4 x 6 sparse Matrix of class "dgCMatrix"

[1,] 10 10 . .  .  .
[2,]  .  . . .  . 10
[3,]  . 10 . . 10  .
[4,]  . 10 . .  .  .
[1] "Multiplication"
4 x 6 sparse Matrix of class "dgCMatrix"

[1,] 100 100 . .   .   .
[2,]   .   . . .   . 100
[3,]   . 100 . . 100   .
[4,]   . 100 . .   .   .
[1] "Division"
4 x 6 sparse Matrix of class "dgCMatrix"

[1,] 1 1 . . . .
[2,] . . . . . 1
[3,] . 1 . . 1 .
[4,] . 1 . . . .
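That scaling leaves the sparsity pattern untouched can be confirmed by looking at the stored values directly. The sketch below uses a made-up 2 x 3 matrix and reads the x slot of the resulting dgCMatrix; the variable names are illustrative.

```r
library(Matrix)

dense_mat <- matrix(c(10, 0,  0,
                       0, 0, 10), nrow = 2, byrow = TRUE)
sparse_mat <- as(dense_mat, "sparseMatrix")

scaled <- sparse_mat * 10

print(is(scaled, "sparseMatrix"))  # the result is still sparse
print(length(sparse_mat@x))        # number of stored values before scaling
print(length(scaled@x))            # same number after scaling
print(scaled@x)                    # only the stored values were multiplied
```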

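Matrix Multiplication

Matrices can be multiplied with each other whether they are sparse or dense. However, the number of columns of the first matrix must equal the number of rows of the second matrix.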

library(Matrix)
  
# construct a matrix with values
#   0 with probability 0.85
#  10 with probability 0.15
set.seed(0)
rows <- 4L
cols <- 6L
vals <- sample(
  x = c(0, 10), 
  prob = c(0.85, 0.15), 
  size = rows * cols, 
  replace = TRUE
)
dense_mat <- matrix(vals, nrow = rows)
  
# Convert to sparse 
sparse_mat <- as(dense_mat, "sparseMatrix")
print("Sparse Matrix")
print(sparse_mat)
  
# computing transpose of matrix 
transpose_mat <- t(sparse_mat)
  
# computing multiplication of matrix
# and its transpose
mul_mat <- sparse_mat %*% transpose_mat
print("Multiplication of Matrices")
print(mul_mat)

Output

[1] "Sparse Matrix"
4 x 6 sparse Matrix of class "dgCMatrix"                  
[1,] 10 10 . .  .  .
[2,]  .  . . .  . 10
[3,]  . 10 . . 10  .
[4,]  . 10 . .  .  .
[1] "Multiplication of Matrices"
4 x 4 sparse Matrix of class "dgCMatrix"

[1,] 200   . 100 100
[2,]   . 100   .   .
[3,] 100   . 200 100
[4,] 100   . 100 100
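The dimension rule can be illustrated with two small matrices of different shapes. In this sketch the matrices a and b are made up for the example, and tryCatch() is used to show that a product with mismatched dimensions is rejected.

```r
library(Matrix)

# a is 2 x 3, b is 3 x 2, so a %*% b is defined and has dimension 2 x 2
a <- as(matrix(c(2, 0, 0,
                 0, 0, 3), nrow = 2, byrow = TRUE), "sparseMatrix")
b <- as(matrix(c(4, 0,
                 0, 0,
                 0, 5), nrow = 3, byrow = TRUE), "sparseMatrix")

prod_ab <- a %*% b
print(dim(prod_ab))
print(as.matrix(prod_ab))

# a %*% a is not defined: 3 columns on the left, 2 rows on the right
result <- tryCatch(a %*% a, error = function(e) "non-conformable")
print(result)
```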

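Multiplication by a Vector

A sparse matrix can also be multiplied by a one-dimensional vector to rescale the data. With the * operator the multiplication is element-wise, and the vector is recycled down the columns of the matrix: the first row is multiplied by the first element of the vector, the second row by the second element, and so on, starting over when the end of the vector is reached. In the example below the vector has length 2 and the matrix has 4 rows, so rows 1 and 3 are multiplied by 3 and rows 2 and 4 by 2.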

library(Matrix)
  
# construct a matrix with values
#   0 with probability 0.85
#  10 with probability 0.15
set.seed(0)
rows <- 4L
cols <- 6L
vals <- sample(
  x = c(0, 10), 
  prob = c(0.85, 0.15), 
  size = rows * cols, 
  replace = TRUE
)
dense_mat <- matrix(vals, nrow = rows)
  
# Convert to sparse 
sparse_mat <- as(dense_mat, "sparseMatrix")
print("Sparse Matrix")
print(sparse_mat)
  
# declaring a vector 
vec <- c(3, 2)
print("Multiplication by vector")
print(sparse_mat * vec)

Output

[1] "Sparse Matrix"
4 x 6 sparse Matrix of class "dgCMatrix"                  
[1,] 10 10 . .  .  .
[2,]  .  . . .  . 10
[3,]  . 10 . . 10  .
[4,]  . 10 . .  .  .
[1] "Multiplication by vector"
4 x 6 sparse Matrix of class "dgCMatrix"

[1,] 30 30 . .  .  .
[2,]  .  . . .  . 20
[3,]  . 30 . . 30  .
[4,]  . 20 . .  .  .
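Element-wise scaling with * should not be confused with the matrix-vector product, which uses %*% and needs a vector with one entry per column. The sketch below rebuilds the matrix from the output above by hand and shows both operations; the vectors row_scale and vec are made up for the example.

```r
library(Matrix)

dense_mat <- matrix(c(10, 10, 0, 0,  0,  0,
                       0,  0, 0, 0,  0, 10,
                       0, 10, 0, 0, 10,  0,
                       0, 10, 0, 0,  0,  0), nrow = 4, byrow = TRUE)
sparse_mat <- as(dense_mat, "sparseMatrix")

# element-wise: one entry per row, so row i is multiplied by row_scale[i]
row_scale <- c(3, 2, 3, 2)
scaled <- sparse_mat * row_scale
print(scaled)

# matrix-vector product: the vector length must equal ncol(sparse_mat)
vec <- c(1, 2, 3, 4, 5, 6)
product <- sparse_mat %*% vec
print(dim(product))             # a 4 x 1 result
print(as.matrix(product)[, 1])  # one weighted row sum per row
```

The element-wise result keeps the 4 x 6 shape of the input, while the matrix-vector product collapses each row to a single number.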

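Combining Matrices

Matrices can be combined with vectors or other matrices using column binding, cbind(), or row binding, rbind(). With rbind(), the number of rows of the result is the sum of the rows of the inputs; with cbind(), the number of columns of the result is the sum of the columns of the inputs.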

library(Matrix)
  
# construct a matrix with values
#   0 with probability 0.85
#  10 with probability 0.15
set.seed(0)
rows <- 4L
cols <- 6L
vals <- sample(
  x = c(0, 10), 
  prob = c(0.85, 0.15), 
  size = rows * cols, 
  replace = TRUE
)
dense_mat <- matrix(vals, nrow = rows)
  
# Convert to sparse 
sparse_mat <- as(dense_mat, "sparseMatrix")
print("Sparse Matrix")
print(sparse_mat)
  
# combining matrix through rows
row_bind <- rbind(sparse_mat,
                  sparse_mat)
  
# printing matrix after row bind
print ("Row Bind")
print (row_bind)

Output

[1] "Sparse Matrix"
4 x 6 sparse Matrix of class "dgCMatrix"

[1,] 10 10 . .  .  .
[2,]  .  . . .  . 10
[3,]  . 10 . . 10  .
[4,]  . 10 . .  .  .
[1] "Row Bind"
8 x 6 sparse Matrix of class "dgCMatrix"

[1,] 10 10 . .  .  .
[2,]  .  . . .  . 10
[3,]  . 10 . . 10  .
[4,]  . 10 . .  .  .
[5,] 10 10 . .  .  .
[6,]  .  . . .  . 10
[7,]  . 10 . . 10  .
[8,]  . 10 . .  .  .
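Column binding works the same way as the row binding shown above. The following sketch uses a made-up 2 x 3 matrix and binds it once to itself and once to a plain numeric vector; only the resulting dimensions are checked.

```r
library(Matrix)

dense_mat <- matrix(c(10, 0,  0,
                       0, 0, 10), nrow = 2, byrow = TRUE)
sparse_mat <- as(dense_mat, "sparseMatrix")

# binding two matrices by column: 3 + 3 = 6 columns
col_bind <- cbind(sparse_mat, sparse_mat)
print(dim(col_bind))
print(col_bind)

# binding a vector as an extra column: 3 + 1 = 4 columns
with_vec <- cbind(sparse_mat, c(1, 2))
print(dim(with_vec))
```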

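Properties of Sparse Matrices

NA Values
NA values are not treated as zeros, so a sparse matrix stores them explicitly, just like nonzero values. As in ordinary R arithmetic, an NA entry yields NA in any operation it takes part in.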

library(Matrix)
  
# declaring original matrix 
mat <- matrix(data = c(5.5, 0, NA, 
                         0, 0, NA), nrow = 3)
print("Original Matrix")
print(mat)
sparse_mat <- as(mat, "sparseMatrix")
print("Sparse Matrix")
print(sparse_mat)

Output

[1] "Original Matrix"
    [,1] [,2]
[1,]  5.5    0
[2,]  0.0    0
[3,]   NA   NA
[1] "Sparse Matrix"
3 x 2 sparse Matrix of class "dgCMatrix"

[1,] 5.5  .
[2,] .    .
[3,]  NA NA
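How the NA entries are stored, and how they behave in arithmetic, can be inspected through the x slot of the sparse matrix. This sketch reuses the matrix from the example above; the variable doubled is introduced only for illustration.

```r
library(Matrix)

mat <- matrix(data = c(5.5, 0, NA,
                         0, 0, NA), nrow = 3)
sparse_mat <- as(mat, "sparseMatrix")

# the NA entries are stored explicitly next to the nonzero value
print(sparse_mat@x)
print(sum(is.na(sparse_mat@x)))  # number of stored NA entries

# NA propagates through arithmetic, as it does for dense matrices
doubled <- sparse_mat * 2
print(doubled[1, 1])
print(is.na(doubled[3, 1]))

# summaries can skip the NA entries explicitly
print(sum(sparse_mat@x, na.rm = TRUE))
```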