Regularization in R
Regularization is a form of regression that shrinks, or constrains, the coefficient estimates towards zero. In this technique, a penalty on the model's parameters is added to the loss function in order to reduce the degrees of freedom of the given model. The concept of regularization can broadly be divided into the following categories:
- Ridge regression
- Lasso regression
- Elastic net regression
Implementation in R
To perform regularization in R, we need to install a few packages before we start. The required packages are:
- the glmnet package, for ridge and lasso regression
- the dplyr package, for data cleaning
- the psych package, for computing the trace of a matrix (the tr() function)
- the caret package, for elastic net regression
To install these packages we use install.packages() from the R console. Once the packages are installed successfully, we load them into our R script with the library() command, as shown in the sketch below. To implement regularized regression we then follow one of the three regularization techniques.
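A minimal sketch of this setup step (the package names are exactly the ones listed above; nothing else is assumed):

# Install the required packages (only needs to be done once)
install.packages(c("glmnet", "dplyr", "psych", "caret"))
# Load them into the current R session
library(glmnet)
library(dplyr)
library(psych)
library(caret)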
Ridge Regression
Ridge regression is a modified version of linear regression and is also known as L2 regularization. Unlike ordinary linear regression, its loss function is modified so as to minimize the model's complexity, which is done by adding a penalty term equal to the square of the magnitude of the coefficients. To implement ridge regression in R we will use the glmnet package, and the cv.glmnet() function will be used to determine the ridge result (i.e., to choose lambda by cross-validation).
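As a rough illustration of this idea (not part of the glmnet package; ridge_loss below is a made-up helper used only to show the objective), ridge regression minimises the usual residual sum of squares plus lambda times the sum of squared coefficients:

# Illustrative only: the ridge objective for a given coefficient vector beta
ridge_loss <- function(beta, X, y, lambda) {
  rss <- sum((y - X %*% beta)^2)    # ordinary least-squares part
  penalty <- lambda * sum(beta^2)   # L2 penalty on the coefficient magnitudes
  rss + penalty                     # larger lambda => stronger shrinkage towards 0
}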
Example
In this example we implement the ridge regression technique on the mtcars dataset. The task is to predict miles per gallon (mpg) from the other attributes of the cars. We use the set.seed() function so that the results are reproducible. We will set the value of lambda in three ways:
- by performing 10-fold cross-validation
- based on information criteria (AIC and BIC)
- as the optimal lambda according to each of the two criteria
# Regularization
# Ridge Regression in R
# Load libraries, get data & set
# seed for reproducibility
set.seed(123)
library(glmnet)
library(dplyr)
library(psych)
data("mtcars")
# Center y, X will be standardized
# in the modelling function
y <- mtcars %>% select(mpg) %>%
scale(center = TRUE, scale = FALSE) %>%
as.matrix()
X <- mtcars %>% select(-mpg) %>% as.matrix()
# Perform 10-fold cross-validation to select lambda
lambdas_to_try <- 10^seq(-3, 5, length.out = 100)
# Setting alpha = 0 implements ridge regression
ridge_cv <- cv.glmnet(X, y, alpha = 0,
lambda = lambdas_to_try,
standardize = TRUE, nfolds = 10)
# Plot cross-validation results
plot(ridge_cv)
# Best cross-validated lambda
lambda_cv <- ridge_cv$lambda.min
# Fit final model, get its sum of squared
# residuals and multiple R-squared
model_cv <- glmnet(X, y, alpha = 0, lambda = lambda_cv,
standardize = TRUE)
y_hat_cv <- predict(model_cv, X)
ssr_cv <- t(y - y_hat_cv) %*% (y - y_hat_cv)
rsq_ridge_cv <- cor(y, y_hat_cv)^2
# Selecting lambda based on information criteria (AIC / BIC)
X_scaled <- scale(X)
aic <- c()
bic <- c()
for (lambda in seq(lambdas_to_try))
{
# Run model
model <- glmnet(X, y, alpha = 0,
lambda = lambdas_to_try[lambda],
standardize = TRUE)
# Extract coefficients and residuals (remove first
# row for the intercept)
betas <- as.vector((as.matrix(coef(model))[-1, ]))
resid <- y - (X_scaled %*% betas)
# Compute hat-matrix and degrees of freedom
ld <- lambdas_to_try[lambda] * diag(ncol(X_scaled))
H <- X_scaled %*% solve(t(X_scaled) %*% X_scaled + ld) %*%
  t(X_scaled)
df <- tr(H)
# Compute information criteria
aic[lambda] <- nrow(X_scaled) * log(t(resid) %*% resid) +
  2 * df
bic[lambda] <- nrow(X_scaled) * log(t(resid) %*% resid) +
  2 * df * log(nrow(X_scaled))
}
# Plot information criteria against tried values of lambdas
plot(log(lambdas_to_try), aic, col = "orange", type = "l",
ylim = c(190, 260), ylab = "Information Criterion")
lines(log(lambdas_to_try), bic, col = "skyblue3")
legend("bottomright", lwd = 1, col = c("orange", "skyblue3"),
legend = c("AIC", "BIC"))
# Optimal lambdas according to both criteria
lambda_aic <- lambdas_to_try[which.min(aic)]
lambda_bic <- lambdas_to_try[which.min(bic)]
# Fit final models, get their sum of
# squared residuals and multiple R-squared
model_aic <- glmnet(X, y, alpha = 0, lambda = lambda_aic,
standardize = TRUE)
y_hat_aic <- predict(model_aic, X)
ssr_aic <- t(y - y_hat_aic) %*% (y - y_hat_aic)
rsq_ridge_aic <- cor(y, y_hat_aic)^2
model_bic <- glmnet(X, y, alpha = 0, lambda = lambda_bic,
standardize = TRUE)
y_hat_bic <- predict(model_bic, X)
ssr_bic <- t(y - y_hat_bic) %*% (y - y_hat_bic)
rsq_ridge_bic <- cor(y, y_hat_bic)^2
# The higher the lambda, the more the
# coefficients are shrunk towards zero.
res <- glmnet(X, y, alpha = 0, lambda = lambdas_to_try,
standardize = FALSE)
plot(res, xvar = "lambda")
legend("bottomright", lwd = 1, col = 1:6,
legend = colnames(X), cex = .7)
Output
Lasso Regression
Next comes lasso regression, also known as L1 regularization; LASSO stands for Least Absolute Shrinkage and Selection Operator. It too is a modified version of linear regression in which the loss function is again modified to minimize the model's complexity, this time by limiting the sum of the absolute values of the model's coefficients. In R we can implement lasso regression using the same glmnet package that we used for ridge regression.
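For comparison with the ridge objective sketched earlier (again purely illustrative; lasso_loss is a made-up helper), the only change is that the penalty uses absolute values instead of squares, which is what allows coefficients to be shrunk exactly to zero:

# Illustrative only: the lasso objective for a given coefficient vector beta
lasso_loss <- function(beta, X, y, lambda) {
  sum((y - X %*% beta)^2) + lambda * sum(abs(beta))   # L1 penalty
}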
Example
In this example we again use the mtcars dataset, and here too we set the lambda value in the same way as in the previous example.
# Regularization
# Lasso Regression
# Load libraries, get data & set
# seed for reproducibility
set.seed(123)
library(glmnet)
library(dplyr)
library(psych)
data("mtcars")
# Center y, X will be standardized in the modelling function
y <- mtcars %>% select(mpg) %>% scale(center = TRUE,
scale = FALSE) %>%
as.matrix()
X <- mtcars %>% select(-mpg) %>% as.matrix()
# Perform 10-fold cross-validation to select lambda
lambdas_to_try <- 10^seq(-3, 5, length.out = 100)
# Setting alpha = 1 implements lasso regression
lasso_cv <- cv.glmnet(X, y, alpha = 1,
lambda = lambdas_to_try,
standardize = TRUE, nfolds = 10)
# Plot cross-validation results
plot(lasso_cv)
# Best cross-validated lambda
lambda_cv <- lasso_cv$lambda.min
# Fit final model, get its sum of squared
# residuals and multiple R-squared
model_cv <- glmnet(X, y, alpha = 1, lambda = lambda_cv,
standardize = TRUE)
y_hat_cv <- predict(model_cv, X)
ssr_cv <- t(y - y_hat_cv) %*% (y - y_hat_cv)
rsq_lasso_cv <- cor(y, y_hat_cv)^2
# The higher the lambda, the more the
# coefficients are shrunk towards zero.
res <- glmnet(X, y, alpha = 1, lambda = lambdas_to_try,
standardize = FALSE)
plot(res, xvar = "lambda")
legend("bottomright", lwd = 1, col = 1:6,
legend = colnames(X), cex = .7)
Output
If we compare the lasso and ridge regression techniques we will notice that the two are broadly similar, but they differ in a few respects:
- Unlike ridge, lasso can set some of its coefficients exactly to zero (see the sketch after this list).
- In ridge, the coefficients of correlated predictors end up similar to one another, whereas in lasso one of the correlated predictors gets a large coefficient and the rest shrink towards zero.
- Ridge works well when there are many large parameters of roughly the same value, whereas lasso works well when only a few parameters are actually significant and the rest are close to zero.
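A small sketch of the first point, assuming the X, y and glmnet objects from the examples above are still in the workspace (the lambda value of 1 below is arbitrary, chosen only for illustration): fitting ridge and lasso at the same lambda and printing the coefficients typically shows exact zeros only in the lasso fit.

# Fit ridge (alpha = 0) and lasso (alpha = 1) at the same, arbitrary lambda
ridge_fit <- glmnet(X, y, alpha = 0, lambda = 1, standardize = TRUE)
lasso_fit <- glmnet(X, y, alpha = 1, lambda = 1, standardize = TRUE)
# Ridge shrinks all coefficients but keeps them non-zero;
# lasso drops some of them to exactly zero
print(coef(ridge_fit))
print(coef(lasso_fit))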
Elastic Net Regression
Now we move on to elastic net regression, which can be viewed as a convex combination of lasso and ridge regression. We could even use the glmnet package here (a brief sketch follows), but in this example we will see how to implement elastic net regression with the caret package.
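As noted above, glmnet can fit the elastic net directly: any alpha strictly between 0 (ridge) and 1 (lasso) mixes the two penalties. A minimal sketch, assuming the X, y and lambdas_to_try objects from the earlier examples and an arbitrary mixing value of alpha = 0.5:

# Elastic net with glmnet: alpha = 0.5 mixes the L1 and L2 penalties equally
enet_cv <- cv.glmnet(X, y, alpha = 0.5,
                     lambda = lambdas_to_try,
                     standardize = TRUE, nfolds = 10)
enet_model <- glmnet(X, y, alpha = 0.5,
                     lambda = enet_cv$lambda.min,
                     standardize = TRUE)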
Example
# Regularization
# Elastic Net Regression
# (reuses the centered response y and the predictor matrix X
#  built in the ridge/lasso examples above)
library(caret)
# Set training control
train_control <- trainControl(method = "repeatedcv",
number = 5,
repeats = 5,
search = "random",
verboseIter = TRUE)
# Train the model
elastic_net_model <- train(mpg ~ .,
data = cbind(y, X),
method = "glmnet",
preProcess = c("center", "scale"),
tuneLength = 25,
trControl = train_control)
# Check multiple R-squared
y_hat_enet <- predict(elastic_net_model, X)
rsq_enet <- cor(y, y_hat_enet)^2
print(y_hat_enet)
print(rsq_enet)
Output
> print(y_hat_enet)
Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive Hornet Sportabout Valiant
2.13185747 1.76214273 6.07598463 0.50410531 -3.15668592 0.08734383
Duster 360 Merc 240D Merc 230 Merc 280 Merc 280C Merc 450SE
-5.23690809 2.82725225 2.85570982 -0.19421572 -0.16329225 -4.37306992
Merc 450SL Merc 450SLC Cadillac Fleetwood Lincoln Continental Chrysler Imperial Fiat 128
-3.83132657 -3.88886320 -8.00151118 -8.29125966 -8.08243188 6.98344302
Honda Civic Toyota Corolla Toyota Corona Dodge Challenger AMC Javelin Camaro Z28
8.30013895 7.74742320 3.93737683 -3.13404917 -2.56900144 -5.17326892
Pontiac Firebird Fiat X1-9 Porsche 914-2 Lotus Europa Ford Pantera L Ferrari Dino
-4.02993835 7.36692700 5.87750517 6.69642869 -2.02711333 0.06597788
Maserati Bora Volvo 142E
-5.90030273 4.83362156
> print(rsq_enet)
[,1]
mpg 0.8485501
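As a brief follow-up (not shown in the original output), the alpha and lambda values picked by caret's random search can be read from the fitted object; bestTune is a standard component of the object returned by train():

# Inspect the alpha/lambda pair selected by the random search
print(elastic_net_model$bestTune)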