Curve Fitting in R

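In this article, we discuss how to fit curves to a data frame in the R programming language.

Curve fitting is one of the basic tasks of statistical analysis. It helps us identify trends in the data and predict unknown values based on a regression model or function.

Visualizing the Data Frame

To fit a curve to a data frame in R, we first visualize the data with a basic scatter plot. In R, a basic scatter plot can be created with the plot() function.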

Syntax

plot(df$x, df$y)

where:

df: the data frame to use.
x and y: the variables plotted on the two axes.

Example

# create sample data
sample_data <- data.frame(x=1:10,
                 y=c(25, 22, 13, 10, 5, 
                     9, 12, 16, 34, 44))
  
# create a basic scatterplot
plot(sample_data$x, sample_data$y)

Output

[Figure: scatter plot of sample_data]

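Creating Several Curves to Fit the Data

Next, we fit polynomial regression models of the desired degrees and draw them on top of the scatter plot to see which one fits the data better. We use the lm() function to create each model (a polynomial regression is still a linear model, because it is linear in its coefficients), and then the lines() function to draw each model's fitted curve over the scatter plot.

Syntax

lm(formula, data)

where:

formula: the model formula, which specifies the polynomial to fit.
data: the data frame the model is fitted to.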
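The examples in this article build their formulas with poly(x, degree, raw=TRUE). With raw=TRUE, poly() uses plain powers of x rather than orthogonal polynomials. As a small sketch (the names fit_poly and fit_manual are illustrative and not part of the article), the following checks that a degree-2 poly() formula gives the same fit as writing the powers out by hand with I():

```r
# sample data used throughout the article
sample_data <- data.frame(x = 1:10,
                          y = c(25, 22, 13, 10, 5,
                                9, 12, 16, 34, 44))

# degree-2 polynomial written with poly() and raw (non-orthogonal) powers
fit_poly <- lm(y ~ poly(x, 2, raw = TRUE), data = sample_data)

# the same model with the powers spelled out; I() protects ^ inside a formula
fit_manual <- lm(y ~ x + I(x^2), data = sample_data)

# the coefficient names differ, but the values should match
all.equal(unname(coef(fit_poly)), unname(coef(fit_manual)))
```

The poly() form is usually more convenient, since changing the degree only means changing one number.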

Example

# create sample data
sample_data <- data.frame(x=1:10,
                 y=c(25, 22, 13, 10, 5, 
                     9, 12, 16, 34, 44))
  
# fit polynomial regression models up to degree 5
linear_model1 <- lm(y~x, data=sample_data)
linear_model2 <- lm(y~poly(x,2,raw=TRUE), data=sample_data)
linear_model3 <- lm(y~poly(x,3,raw=TRUE), data=sample_data)
linear_model4 <- lm(y~poly(x,4,raw=TRUE), data=sample_data)
linear_model5 <- lm(y~poly(x,5,raw=TRUE), data=sample_data)
  
# create a basic scatterplot 
plot(sample_data$x, sample_data$y)
  
# define x-axis values
x_axis <- seq(1, 10, length.out=10)
  
# add curve of each model to plot
lines(x_axis, predict(linear_model1, data.frame(x=x_axis)), col='green')
lines(x_axis, predict(linear_model2, data.frame(x=x_axis)), col='red')
lines(x_axis, predict(linear_model3, data.frame(x=x_axis)), col='purple')
lines(x_axis, predict(linear_model4, data.frame(x=x_axis)), col='blue')
lines(x_axis, predict(linear_model5, data.frame(x=x_axis)), col='orange')

Output

[Figure: scatter plot of sample_data with the five fitted curves overlaid]
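The five nearly identical lm() and lines() calls can also be written as a loop. The following is a minimal sketch of that idea, not the article's own code: the names degrees, colours and models, and the added legend, are assumptions made here for illustration.

```r
# sample data used throughout the article
sample_data <- data.frame(x = 1:10,
                          y = c(25, 22, 13, 10, 5,
                                9, 12, 16, 34, 44))

# degrees to try, and one colour per degree (same colours as the example above)
degrees <- 1:5
colours <- c("green", "red", "purple", "blue", "orange")

# fit one polynomial model per degree and keep them in a list
models <- lapply(degrees, function(d) {
  lm(y ~ poly(x, d, raw = TRUE), data = sample_data)
})

# scatter plot of the raw data
plot(sample_data$x, sample_data$y)

# x values at which each fitted curve is evaluated
x_axis <- seq(1, 10, length.out = 10)

# draw the fitted curve of each model on top of the scatter plot
for (i in seq_along(models)) {
  lines(x_axis, predict(models[[i]], data.frame(x = x_axis)), col = colours[i])
}

# label which colour belongs to which degree
legend("topleft", legend = paste("degree", degrees), col = colours, lty = 1)
```

Keeping the models in a list makes it easy to add or drop a degree without touching the plotting code.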

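Best-Fit Curve by Adjusted R-squared

Because we cannot pick the better-fitting model from the plot alone, we use a summary statistic, the adjusted R-squared, to identify the best-fitting model. Adjusted R-squared is the proportion of the variance in y that the model explains, adjusted for the number of predictors in the model, so adding terms that do not improve the fit lowers it. The larger the adjusted R-squared, the better the model fits the data frame. To get the adjusted R-squared of a linear model, we call summary(), whose result contains the value in the component adj.r.squared.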
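For reference, the usual textbook definition of adjusted R-squared is given below. It is not stated in the article itself; n denotes the number of observations and p the number of predictors (excluding the intercept).

```latex
R^2_{\mathrm{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
```

As a worked example, the straight-line model on the sample data has n = 10, p = 1 and R-squared of about 0.1739, so its adjusted R-squared is 1 - (1 - 0.1739) × 9/8 ≈ 0.0707, which agrees with the value R reports for that model.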

Syntax

summary(linear_model)$adj.r.squared

where:

linear_model: the linear model whose summary is extracted.

Example

# create sample data
sample_data <- data.frame(x=1:10,
                 y=c(25, 22, 13, 10, 5, 
                     9, 12, 16, 34, 44))
  
# fit polynomial regression models up to degree 5
linear_model1 <- lm(y~x, data=sample_data)
linear_model2 <- lm(y~poly(x,2,raw=TRUE), data=sample_data)
linear_model3 <- lm(y~poly(x,3,raw=TRUE), data=sample_data)
linear_model4 <- lm(y~poly(x,4,raw=TRUE), data=sample_data)
linear_model5 <- lm(y~poly(x,5,raw=TRUE), data=sample_data)
  
# calculate adjusted R-squared of each model
summary(linear_model1)$adj.r.squared
summary(linear_model2)$adj.r.squared
summary(linear_model3)$adj.r.squared
summary(linear_model4)$adj.r.squared
summary(linear_model5)$adj.r.squared

Output

[1] 0.07066085
[1] 0.9406243
[1] 0.9527703
[1] 0.955868
[1] 0.9448878
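Rather than reading the five values off by eye, the comparison can be automated. The sketch below is one possible way to do it and is not taken from the article; the names degrees, adj_r2 and best_degree are assumptions introduced here.

```r
# sample data used throughout the article
sample_data <- data.frame(x = 1:10,
                          y = c(25, 22, 13, 10, 5,
                                9, 12, 16, 34, 44))

# polynomial degrees to compare
degrees <- 1:5

# fit a model of each degree and collect its adjusted R-squared
adj_r2 <- sapply(degrees, function(d) {
  fit <- lm(y ~ poly(x, d, raw = TRUE), data = sample_data)
  summary(fit)$adj.r.squared
})
names(adj_r2) <- paste0("degree_", degrees)

# show all values side by side
print(adj_r2)

# degree whose model has the largest adjusted R-squared
best_degree <- degrees[which.max(adj_r2)]
best_degree
```

which.max() returns the position of the first maximum, so if two degrees tied, the lower one would be chosen.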

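Visualizing the Best-Fit Curve with the Data Frame

From the summary above, the degree-4 polynomial model fits the data best, with an adjusted R-squared of 0.955868. We therefore plot the degree-4 model over the scatter plot; this is the best-fit curve for the data frame.

Example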

# create sample data
sample_data <- data.frame(x=1:10,
                          y=c(25, 22, 13, 10, 5,
                              9, 12, 16, 34, 44))
  
# Create best linear model
best_model <- lm(y~poly(x,4,raw=TRUE), data=sample_data)
  
# create a basic scatterplot 
plot(sample_data$x, sample_data$y)
  
# define x-axis values
x_axis <- seq(1, 10, length.out=10)
  
# plot best model
lines(x_axis, predict(best_model, data.frame(x=x_axis)), col='green')

Output

[Figure: scatter plot of sample_data with the degree-4 fitted curve]
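Once a model has been chosen, it can be used to predict y for x values that are not in the data, which is the purpose of curve fitting mentioned at the start of the article. The following is a minimal sketch; the x values in new_points are hypothetical and were picked only because they lie inside the observed range of 1 to 10.

```r
# sample data used throughout the article
sample_data <- data.frame(x = 1:10,
                          y = c(25, 22, 13, 10, 5,
                                9, 12, 16, 34, 44))

# degree-4 polynomial model, the best fit found above
best_model <- lm(y ~ poly(x, 4, raw = TRUE), data = sample_data)

# hypothetical new x values, all inside the observed range of x
new_points <- data.frame(x = c(2.5, 5.5, 8.5))

# predicted y for each new x
predicted_y <- predict(best_model, newdata = new_points)

# show inputs and predictions side by side
cbind(new_points, predicted_y)
```

Predictions far outside the observed range of x should be treated with caution, because high-degree polynomials can change rapidly beyond the data they were fitted to.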