Converting Factors to Numeric and Numeric to Factors in R

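A factor is a data structure for categorizing data or representing categorical data; the distinct categories it can hold are called levels. A factor is stored as a vector of integers, with a label attached to each unique integer. Although factors look like character vectors, they are integers underneath, so care is needed when treating them as strings. A factor accepts only a limited set of distinct values.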
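The integer-plus-label storage described above can be inspected directly. The following is a minimal sketch; the `sizes` data and the `sizes2` copy are made up for illustration and are not part of the original examples.

```r
# A small factor built from a character vector (illustrative data)
sizes <- factor(c("small", "large", "medium", "small"))

levels(sizes)       # the distinct labels, sorted alphabetically by default
as.integer(sizes)   # the integer codes stored underneath
typeof(sizes)       # "integer"
class(sizes)        # "factor"

# Assigning a value that is not an existing level produces NA (with a warning)
sizes2 <- sizes
sizes2[1] <- "huge"
is.na(sizes2[1])
```

The last two lines show what "a limited set of distinct values" means in practice: a value outside the existing levels is not stored.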

Converting a Factor to Numeric

Sometimes you need to convert a factor to numbers or text explicitly. The as.character() and as.numeric() functions do this. Converting a factor to numeric takes two steps:

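Step 1: Convert the data vector into a factor.

The factor() function is used to create and modify factors in R.

Step 2: Convert the factor into a numeric vector with as.numeric(). When a factor is converted to a numeric vector, the integer codes corresponding to the factor levels are returned.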

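Example: Take a data vector 'V' of directions, convert it into a factor, and then convert that factor to numeric. By default the levels are sorted alphabetically (East, North, South), so East maps to 1, North to 2, and South to 3.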

# Data Vector 'V'
V = c("North", "South", "East", "East")
 
# Convert vector 'V' into a factor
drn <- factor(V)
 
# Converting a factor into a numeric vector
as.numeric(drn)

Output:

[1] 2 3 1 1

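Factors whose levels are numbers: If the levels of a factor are numbers, first convert the factor to a character vector and then to numeric, so that the original values are recovered instead of the level codes. If the levels are text labels, the intermediate character step is of no use: as.numeric() applied to an alphabetic string returns NA.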

Example: Suppose we are recording the cost of soap from various brands, with the values 29, 28, 210, 28, 29.

# Creating a Factor
soap_cost <- factor(c(29, 28, 210, 28, 29))
 
# Converting Factor to numeric
as.numeric(as.character(soap_cost))

Output:

[1]  29  28 210  28  29

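By contrast, applying as.numeric() directly to the same factor returns the level codes (28 is level 1, 29 is level 2, 210 is level 3), not the original values: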

# Creating a Factor
soap_cost <- factor(c(29, 28, 210, 28, 29))
 
# Converting Factor to Numeric
as.numeric(soap_cost)

Output:

[1] 2 1 3 1 2
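The two conversions can be compared side by side. This is a minimal sketch reusing the soap_cost factor from above; the names `codes`, `values`, `values2`, and `letters_f` are introduced here for illustration only. The `as.numeric(levels(f))[f]` form is an alternative idiom that converts each level once and then indexes by the factor.

```r
soap_cost <- factor(c(29, 28, 210, 28, 29))

# Level codes, not the original values
codes <- as.numeric(soap_cost)

# Original values, recovered through the character labels
values <- as.numeric(as.character(soap_cost))

# Same result: convert the levels once, then index by the factor
values2 <- as.numeric(levels(soap_cost))[soap_cost]

# Labels that are not numbers cannot be parsed and become NA
letters_f <- factor(c("a", "b", "a"))
not_numbers <- suppressWarnings(as.numeric(as.character(letters_f)))
not_numbers
```

suppressWarnings() is used only to keep the output quiet; without it R warns that NAs were introduced by coercion.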

Converting Numeric Values to a Factor

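To convert numeric values into a factor, we use the cut() function. cut() divides the range of the numeric vector to be converted (call it x) into intervals and codes each value of x according to the interval it falls in. The leftmost interval corresponds to level one, the next leftmost to level two, and so on.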

Syntax: cut.default(x, breaks, labels = NULL, include.lowest = FALSE, right = TRUE, dig.lab = 3)

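where:

When a single number is given through the breaks argument, the output factor is created by dividing the range of the variable into that number of equal-length intervals.
include.lowest indicates whether an x[i] equal to the lowest (for right = TRUE) or highest (for right = FALSE) breaks value should be included. right indicates whether the intervals should be closed on the right and open on the left, or vice versa.
dig.lab is used when labels are not supplied. It determines the number of digits used in formatting the break numbers.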
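The effect of right and include.lowest is easiest to see with explicit break points. The following is a minimal sketch; the vector `x` and the breaks `brk` are made-up values chosen so that some observations sit exactly on a break.

```r
x   <- c(0, 5, 10, 15, 20)
brk <- c(0, 10, 20)

# Default: intervals closed on the right, so 0 falls outside and becomes NA
right_closed <- cut(x, breaks = brk)

# include.lowest = TRUE closes the first interval on the left as well
with_lowest <- cut(x, breaks = brk, include.lowest = TRUE)

# right = FALSE: intervals closed on the left, so 20 falls outside and becomes NA
left_closed <- cut(x, breaks = brk, right = FALSE)

levels(right_closed)   # "(0,10]"  "(10,20]"
levels(with_lowest)    # "[0,10]"  "(10,20]"
levels(left_closed)    # "[0,10)"  "[10,20)"
```

When break points are supplied by hand, values lying exactly on the outermost break can silently turn into NA, which is why include.lowest is often set to TRUE in that case.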

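Example 1: Suppose we have an employee data set made up of age, salary, and gender. To create a factor from age with three equally spaced levels, we can write the following in R.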

# Creating vectors
age <- c(40, 49, 48, 40, 67, 52, 53) 
salary <- c(103200, 106200, 150200, 10606, 10390, 14070, 10220)
gender <- c("male", "male", "transgender",
            "female", "male", "female", "transgender")
 
# Creating data frame named employee
employee<- data.frame(age, salary, gender) 
 
# Creating a factor corresponding to age
# with three equally spaced levels
wfact = cut(employee$age, 3)
table(wfact)

Output:

wfact
(40,49] (49,58] (58,67] 
      4       2       1

Example 2: We now label the levels Young, Medium, and Aged.

# Creating vectors
age <- c(40, 49, 48, 40, 67, 52, 53) 
salary <- c(103200, 106200, 150200, 10606, 10390, 14070, 10220)
gender <- c("male", "male", "transgender",
            "female", "male", "female", "transgender")
 
# Creating data frame named employee
employee<- data.frame(age, salary, gender) 
 
# Creating a factor corresponding to age with labels
wfact = cut(employee$age, 3, labels=c('Young', 'Medium', 'Aged'))
table(wfact)

Output:

wfact
 Young Medium   Aged 
     4      2      1

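The next example uses rnorm() to generate normally distributed random values. rnorm() takes three arguments:

n: the number of random values to generate
mean: the mean of the distribution; defaults to 0 if not given
sd: the standard deviation; defaults to 1 if not given

Syntax:

rnorm(n, mean = 0, sd = 1)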

# Generating a vector with random numbers
y <- rnorm(100)
 
# the breaks pi/3*(-3:3) are 7 equally spaced points,
# which divide the range from -pi to pi into
# 6 equal-length intervals
table(cut(y, breaks = pi/3*(-3:3)))

Output:

(-3.14,-2.09] (-2.09,-1.05]     (-1.05,0]      (0,1.05]   (1.05,2.09] 
            1            11            26            48            10 
  (2.09,3.14] 
            4

In the next example, the breaks argument divides the range of the variable into 5 equal-length intervals.

age <- c(40, 49, 48, 40, 67, 52, 53) 
gender <- c("male", "male", "transgender", "female", "male", "female", "transgender")
 
# Data frame generated from the above vectors
employee<- data.frame(age, gender) 
 
# the output factor is created by the division
# of the range of variables into 5 equal-length intervals
wfact = cut(employee$age, breaks=5)
table(wfact)

Output:

wfact
  (40,45.4] (45.4,50.8] (50.8,56.2] (56.2,61.6]   (61.6,67] 
          2           2           2           0           1

Setting dig.lab = 5 shows more digits in the interval labels:

y <- rnorm(100)
table(cut(y, breaks = pi/3*(-3:3), dig.lab=5))

Output:

(-3.1416,-2.0944] (-2.0944,-1.0472]       (-1.0472,0]        (0,1.0472] 
                5                13                33                28 
  (1.0472,2.0944]   (2.0944,3.1416] 
               19                 2
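Because rnorm() draws new values on every call, the counts in the two tables above differ from run to run. A minimal sketch of a reproducible version follows; the seed value 42 and the names `brk`, `bins`, and `outside` are arbitrary choices for illustration. It also shows that values beyond the outermost breaks are coded as NA.

```r
set.seed(42)              # arbitrary seed, only to make the draw repeatable
y   <- rnorm(100)
brk <- pi / 3 * (-3:3)    # 7 break points, hence 6 intervals

bins <- cut(y, breaks = brk)
nlevels(bins)                  # 6
table(bins, useNA = "ifany")   # an NA column appears only if values fall outside the breaks

# Values below -pi or above pi do not belong to any interval
outside <- cut(c(-4, 0.5, 4), breaks = brk)
outside
```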