How to Convert a DataFrame Column from Character to Numeric in R

In this article, we will discuss how to convert a DataFrame column from character to numeric in the R programming language.

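Every data frame column has a class, which indicates the data type of the column's elements. To convert a column to another type, each of its elements must be convertible to that type. For a conversion to numeric, every element of the column must be a valid number.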

Note: sapply(df, class) returns the class of every column as a named vector.
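As a minimal sketch of that note, the snippet below inspects the column classes of a small illustrative data frame (the names `df`, `col_classes` and `char_cols` are hypothetical, chosen for this example) and then picks out the character columns, which are the candidates for conversion.

```r
# Illustrative data frame with mixed column types
df <- data.frame(
  id    = 1:3,
  score = c("4.5", "3.0", "5.0"),
  grade = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# sapply() calls class() on each column and simplifies the results
# into a named character vector
col_classes <- sapply(df, class)
print(col_classes)
# returns c(id = "integer", score = "character", grade = "character")

# Names of the columns that are candidates for conversion
char_cols <- names(col_classes)[col_classes == "character"]
print(char_cols)   # "score" "grade"
```

Note that `score` holds numbers stored as text, while `grade` holds letters; only the first can be converted without losing data.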

Method 1: Using transform()

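A character column, whether it holds single characters or longer strings, can be converted to numeric values only where each value can actually be parsed as a number. Otherwise the data is lost: as.numeric() coerces the unparsable values to missing (NA) values at run time, and R issues the warning "NAs introduced by coercion".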

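This method also illustrates the resulting data loss, with NA values inserted in place of the characters. The NAs appear because those characters have no direct numeric equivalent.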
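To see this coercion on a single vector, here is a small sketch (the vector `x` is illustrative). It uses base R's withCallingHandlers() to report the coercion warning and then muffle it, so the NA that replaces the unparsable value is easy to spot.

```r
# Character values; "b" cannot be parsed as a number
x <- c("6", "7", "b", "9.5")

# withCallingHandlers() reports the coercion warning via message()
# and then muffles it instead of letting R print it afterwards
converted <- withCallingHandlers(
  as.numeric(x),
  warning = function(w) {
    message("caught warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(converted)   # returns c(6, 7, NA, 9.5); "b" became NA
```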

# declare a data frame whose columns
# have different data types
data_frame <- data.frame(
               col1 = as.character(6 : 9),
               col2 = factor(4 : 7),
               col3 = letters[2 : 5],
               col4 = 97 : 100, stringsAsFactors = FALSE)
  
print("Original DataFrame")
print (data_frame)
  
# indicating the data type of 
# each variable
sapply(data_frame, class)
  
# converting character type 
# column to numeric
data_frame_col1 <- transform(data_frame,
                             col1 = as.numeric(col1))
print("Modified col1 DataFrame")
print (data_frame_col1)
  
# indicating the data type of 
# each variable
sapply(data_frame_col1, class)
  
# converting character type column
# to numeric
data_frame_col3 <- transform(data_frame, 
                             col3 = as.numeric(col3))
print("Modified col3 DataFrame")
print (data_frame_col3)
  
# indicating the data type of each
# variable
sapply(data_frame_col3, class)

Output:

[Output image: the original and modified data frames, with the class of each column]

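Explanation: sapply() shows that col3 of the data frame has class character, that is, it consists of single-letter values. When transform() applies as.numeric() to it, these values are converted to missing (NA) values, because letters cannot be interpreted directly as numbers. The result is data loss.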
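One way to guard against this silent data loss is to check, before accepting the result, whether any non-missing value became NA. The sketch below wraps that check in a hypothetical helper, `to_numeric_strict()` (not part of base R), which stops with an error listing the offending values instead of returning NAs.

```r
# Hypothetical helper (not part of base R): convert a character vector to
# numeric, but stop with an error if any non-missing value fails to parse
to_numeric_strict <- function(x) {
  out <- suppressWarnings(as.numeric(x))
  bad <- !is.na(x) & is.na(out)   # values that parsing turned into NA
  if (any(bad)) {
    stop("cannot convert to numeric: ",
         paste(unique(x[bad]), collapse = ", "))
  }
  out
}

df <- data.frame(col1 = as.character(6:9), col3 = letters[2:5],
                 stringsAsFactors = FALSE)

df$col1 <- to_numeric_strict(df$col1)   # succeeds: 6 7 8 9
res <- tryCatch(to_numeric_strict(df$col3),
                error = function(e) conditionMessage(e))
print(res)   # "cannot convert to numeric: b, c, d, e"
```

Failing loudly is a design choice: the caller decides how to handle bad values rather than discovering NAs later in the analysis.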

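The conversion can also be done without stringsAsFactors = FALSE: the characters are first converted to a factor with as.factor(), and the factor is then converted to the numeric data type with as.numeric(). (Before R 4.0.0, data.frame() converted strings to factors by default, which is why the output below lists col1 and col3 as "factor"; from R 4.0.0 onward they are "character".) Even in this case, the information about the actual strings is lost and the data becomes ambiguous: as.numeric() returns each value's level code, and the levels are simply the distinct values sorted lexicographically.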
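The same level-code behavior is a common pitfall when numbers are stored as a factor. In this sketch (the vector is illustrative), calling as.numeric() directly on the factor returns the level codes, while going through as.character() first, or indexing the converted levels by the factor, recovers the actual values.

```r
# Numbers stored as factor levels (as data.frame() did by default before R 4.0.0)
f <- factor(c("10", "20", "5"))
print(levels(f))    # "10" "20" "5": sorted as strings, not as numbers

codes  <- as.numeric(f)                 # 1 2 3: level codes, not values
values <- as.numeric(as.character(f))   # 10 20 5: the actual numbers

# Equivalent: convert each level once, then index by the factor's codes
values2 <- as.numeric(levels(f))[f]
```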

# declare a data frame whose columns
# have different data types
data_frame <- data.frame(
               col1 = as.character(6 : 9),
               col2 = factor(4 : 7),
               col3 = c("Geeks", "For", "Geeks", "Gooks"),
               col4 = 97 : 100)
print("Original DataFrame")
print (data_frame)
  
# indicating the data type of
# each variable
sapply(data_frame, class)
  
# converting character type column 
# to numeric
data_frame_col3 <- transform(data_frame,
                             col3 = as.numeric(as.factor(col3)))
print("Modified col3 DataFrame")
print (data_frame_col3)
  
# indicating the data type of each
# variable
sapply(data_frame_col3, class)

Output:

[1] "Original DataFrame"
  col1 col2  col3 col4
1    6    4 Geeks   97
2    7    5   For   98
3    8    6 Geeks   99
4    9    7 Gooks  100
   col1      col2      col3      col4
"factor"  "factor"  "factor" "integer"
[1] "Modified col3 DataFrame"
  col1 col2 col3 col4
1    6    4    2   97
2    7    5    1   98
3    8    6    2   99
4    9    7    3  100
   col1      col2      col3      col4
"factor"  "factor" "numeric" "integer"

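Explanation: the first and third strings in col3 are identical, so they are assigned the same numeric value. The distinct strings are sorted in ascending order and then mapped to consecutive integers. "For" is the lexicographically smallest string, so it is assigned 1; both instances of "Geeks" are mapped to 2; and "Gooks" is assigned 3. As a result, the class of col3 becomes numeric.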
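Because the strings themselves are gone after this conversion, it can help to keep the code-to-string mapping alongside the numeric column. A minimal sketch, assuming you are free to store an extra lookup table (the names `lookup` and `decoded` are illustrative):

```r
x <- c("Geeks", "For", "Geeks", "Gooks")
f <- factor(x)
print(levels(f))           # "For" "Geeks" "Gooks"

codes <- as.numeric(f)     # 2 1 2 3

# The mapping lives in levels(f); saving it lets you decode later
lookup <- data.frame(code = seq_along(levels(f)), value = levels(f),
                     stringsAsFactors = FALSE)
print(lookup)

# Decode the numeric codes back into the original strings
decoded <- levels(f)[codes]
print(identical(decoded, x))   # TRUE
```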

Method 2: Using apply()

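The apply() function in R applies a function over the rows or columns of a matrix or data frame, so a single conversion can be applied to several columns at once. The function can be built-in or user-defined, depending on the user's needs. Note that apply() first converts a data frame to a matrix with as.matrix(), so the selected columns are coerced to a common type before the function is applied.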

Syntax: apply(X, MARGIN, FUN)

Parameters:

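X – the data frame (or matrix) to apply the function to
MARGIN – the dimension to apply the function over: 1 for rows, 2 for columns
FUN – the function to apply, built-in or user-defined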
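The sketch below illustrates the matrix coercion mentioned above and then shows a different technique, lapply() over a column subset, which works on the data frame directly and keeps each column's type. This lapply() variant is an alternative to the apply() approach in this method, not a replacement for it; the data frame mirrors the example that follows, without col4.

```r
df <- data.frame(
  col1 = as.character(6:9),
  col2 = as.character(4:7),
  col3 = c("Geeks", "For", "Geeks", "Gooks"),
  stringsAsFactors = FALSE
)

# apply() turns the data frame into a matrix first, so FUN always
# receives character vectors here
apply_classes <- apply(df, 2, class)
print(apply_classes)   # "character" for every column

# lapply() works column by column on the data frame itself;
# selecting columns by name avoids relying on their positions
cols <- c("col1", "col2")
df[cols] <- lapply(df[cols], function(x) as.numeric(as.character(x)))

new_classes <- sapply(df, class)
print(new_classes)     # col1, col2 "numeric"; col3 "character"
```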

Example:

# declare a data frame whose columns
# have different data types
data_frame <- data.frame(
               col1 = as.character(6:9),
               col2 = as.character(4:7),
               col3 = c("Geeks","For","Geeks","Gooks"),
               col4 = letters[1:4])
  
print("Original DataFrame")
print (data_frame)
  
# indicating the data type of each
# variable
sapply(data_frame, class)
  
# defining the vector of columns to 
# convert to numeric
vec <- c(1,2)
  
# apply the conversion on columns
# drop = FALSE keeps a data frame even when only one column is selected
data_frame[ , vec] <- apply(data_frame[ , vec, drop = FALSE], 2,
                            function(x) as.numeric(as.character(x)))
print("Modified DataFrame")
print (data_frame)
  
# indicating the data type of each variable
sapply(data_frame, class)

Output:

[1] "Original DataFrame"
  col1 col2  col3 col4
1    6    4 Geeks    a
2    7    5   For    b
3    8    6 Geeks    c
4    9    7 Gooks    d
   col1     col2     col3     col4
"factor" "factor" "factor" "factor"
[1] "Modified DataFrame"
  col1 col2  col3 col4
1    6    4 Geeks    a
2    7    5   For    b
3    8    6 Geeks    c
4    9    7 Gooks    d
    col1      col2      col3      col4
"numeric" "numeric"  "factor"  "factor"