How to plot quadratic regression in R? - r

The following code generates a qudaratic regression in R.
lm.out3 = lm(listOfDataFrames1$avgTime ~ listOfDataFrames1$betaexit + I(listOfDataFrames1$betaexit^2) + I(listOfDataFrames1$betaexit^3))
summary(lm.out3)
Call:
lm(formula = listOfDataFrames1$avgTime ~ listOfDataFrames1$betaexit +
I(listOfDataFrames1$betaexit^2) + I(listOfDataFrames1$betaexit^3))
Residuals:
Min 1Q Median 3Q Max
-14.168 -2.923 -1.435 2.459 28.429
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 199.41 11.13 17.913 < 2e-16 ***
listOfDataFrames1$betaexit -3982.03 449.49 -8.859 1.14e-12 ***
I(listOfDataFrames1$betaexit^2) 32630.86 5370.27 6.076 7.87e-08 ***
I(listOfDataFrames1$betaexit^3) -93042.90 19521.05 -4.766 1.15e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.254 on 63 degrees of freedom
Multiple R-squared: 0.9302, Adjusted R-squared: 0.9269
F-statistic: 279.8 on 3 and 63 DF, p-value: < 2.2e-16
But how to do I plot the curve on the graph am confused.
To get graph:
plot(listOfDataFrames1$avgTime~listOfDataFrames1$betaexit)
But curve?
Is there any to do it without manually copying the values?
Like mso suggested though it works.

This should work.
# not tested
lm.out3 = lm(avgTime ~ poly(betaexit,3,raw=TRUE),listofDataFrames3)
plot(avgTime~betaexit,listofDataDFrames3)
curve(predict(lm.out3,newdata=data.frame(betaexit=x)),add=T)
Since you didn't provide any data, here is a working example using the built-in mtcars dataset.
fit <- lm(mpg~poly(wt,3,raw=TRUE),mtcars)
plot(mpg~wt,mtcars)
curve(predict(fit,newdata=data.frame(wt=x)),add=T)
Some notes:
(1) It is a really bad idea to reference external data structures in the formula=... argument to lm(...). Instead, reference columns of a data frame referenced in the data=... argumennt, as above and as #mso points out.
(2) You can specify the formula as #mso suggests, or you can use the poly(...) function with raw=TRUE.
(3) The curve(...) function takes an expression as its first argument, This expression has to have a variable x, which will be populated automatically by values from the x-axis of the graph. So in this example, the expression is:
predict(fit,newdata=data.frame(wt=x))
which uses predict(...) on the model with a dataframe having wt (the predictor variable) given by x.

Try with ggplot:
library(ggplot)
ggplot(listOfDataFrames1, aes(x=betaexit, y=avgTime)) + geom_point()+stat_smooth(se=F)
Using mtcars data:
ggplot(mtcars, aes(x=wt, y=mpg)) + geom_point()+stat_smooth(se=F, method='lm', formula=y~poly(x,3))

Try:
with(listOfDataFrames1, plot(betaexit, avgTime))
with(listOfDataFrames1, lines(betaexit, 199-3982*betaexit+32630*betaexit^2-93042*betaexit^3))

Related

Creating a Line of Best Fit in R

Hi, I was wondering if anyone could help me on how to complete function in R as I am totally new to this computer programming. So apologies if this seems like a silly question to this audience.
So I am currently trying to add a line of best fit onto my scatter graph but I'm not quite understanging how to do this. I've tried many functions like "abline" and "lm" but I'm not sure if I am using the right ideas or whether I am putting incorrect numbers into the functions.
I have cleared the workspace and just left my graph and previous workings so that it looks neater.
thanks in advance for the help.
reproducible data example:
set.seed(2348907)
x <- rnorm(100)
y <- 2*(x+rnorm(100))
then this makes a linear model for the intercept and slope:
lmodel <- lm(y ~ x)
which now contains the intercept (Intercept term) and the slope (as the coefficient of the x variable in the model). Here is a summary of the model:
summary(lmodel)
Call:
lm(formula = y ~ x)
Residuals:
Min 1Q Median 3Q Max
-3.8412 -1.0767 -0.1808 1.2216 4.1540
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0454 0.1851 0.245 0.807
x 2.1087 0.1814 11.627 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.848 on 98 degrees of freedom
Multiple R-squared: 0.5797, Adjusted R-squared: 0.5754
F-statistic: 135.2 on 1 and 98 DF, p-value: < 2.2e-16
then make the plot using the coef() function to pull out the intercept and slope from the linear model:
plot(x,y) # plots the points
abline(a=coef(lmodel)[1], b=coef(lmodel)[2]) # plots the line, a=intercept, b=slope
personally, I prefer ggplot2 for such things, which would be like this:
library(ggplot2)
ggplot() + geom_point(aes(x,y)) +
geom_smooth(aes(x,y), method="lm", se=F)

Save the result from "summary(lm)" to use in PowerBi

Is it possible to save an summary(lm) object in a format usable in PowerBI?
Lets say the following:
data <- mpg
lm <- lm(hwy ~ displ, data = mpg)
summary(lm)
Output:
Call:
lm(formula = hwy ~ displ, data = mpg)
Residuals:
Min 1Q Median 3Q Max
-7.1039 -2.1646 -0.2242 2.0589 15.0105
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 35.6977 0.7204 49.55 <2e-16 ***
displ -3.5306 0.1945 -18.15 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.836 on 232 degrees of freedom
Multiple R-squared: 0.5868, Adjusted R-squared: 0.585
F-statistic: 329.5 on 1 and 232 DF, p-value: < 2.2e-16
I would lake to save this information as gglpot2 object, or picture in general, that I can display in Power BI. So that we can use this as a template for a fast regressions inside Power BI. This since Power BI only can display R code that results in a "plot" and not text.
I have tried:
textplot(capture.output(summary(lm)))
But I first got this error:
>install.packages('textplot')
Warning in install.packages :
package ‘textplot’ is not available (for R version 3.5.3)
And unfortunately Power BI doesn't supports textplot().
EDIT: Clarification I'm not looking to plot a regression line nor plane. I'm looking for a way to save the text output from "summary(lm)" as a plot object that I can display in Power BI.
Try something like this:
fit <- lm(hwy ~ displ, data = mpg)
txt = capture.output(print(summary(fit)))
plot(NULL,xlim=c(-1,1),ylim=c(-1,1),xaxt="n",yaxt="n",bty="n",xlab="",ylab="")
text(x=0,y=0,paste(txt,collapse="\n"))
You might need to look at using stringr::str_pad to make the text prettier.. but this should get you something that works.
This is how you put it on ggplot2:
ggplot() + xlim(c(-1,1)) + ylim(c(-1,1)) +
geom_text(aes(x=0,y=0,label=paste(txt,collapse="\n"))) +
theme_void()
This example should be useful for you. To apply here:
plot(hwy ~ displ, data=mpg)
abline(lm1)
For the ggplot2 solution, this works:
ggplot(mpg, aes(displ, hwy)) + geom_point() + geom_smooth(method='lm', formula=y~x)

Adding dummy variables in panel linear model (regression) in R

I have a categorical independent variable (with options of "yes" or "no") that I want to add to my panel linear model. According to the
answer here: After generating dummy variables?, the lm function automatically creates dummy variables for you for categorical variables.
Does this mean that creating dummy variables through i.e. dummy.data.frame is unnecessary, and I can just add in my variable in the plm function and it will automatically be treated like a dummy variable (even if the data is not numerical)? And is this the same for the plm function?
Also, I don't have much data to begin with. Would it hurt if I manually turned the data into numbers (i.e. "yes"=1, "no"=0) without creating a dummy variable?
It is unnecessary to create dummy variables for use with the lm() function. To illustrate, we'll run a regression model on the mtcars data set, using am (0 = automatic, 1 = manual transmission) as a factor variable.
summary(lm(mpg ~ wt + factor(am),data=mtcars))
...and the output:
> summary(lm(mpg ~ wt + factor(am),data=mtcars))
Call:
lm(formula = mpg ~ wt + factor(am), data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-4.5295 -2.3619 -0.1317 1.4025 6.8782
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.32155 3.05464 12.218 5.84e-13 ***
wt -5.35281 0.78824 -6.791 1.87e-07 ***
factor(am)1 -0.02362 1.54565 -0.015 0.988
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.098 on 29 degrees of freedom
Multiple R-squared: 0.7528, Adjusted R-squared: 0.7358
F-statistic: 44.17 on 2 and 29 DF, p-value: 1.579e-09

R - Regression Analysis for Logarthmic

I perform regression analysis and try to find the best fit model for the dataset diamonds.csv in ggplot2. I use price(response variable) vs carat and I perform linear regression, quadratic, and cubic regression. The line is not the best fit. I realize the logarithmic from excel has the best fitting line. However, I couldn't figure out how to code in R to find the logarithmic fitting line. Anyone can help?
Comparing Price vs Carat
model<-lm(price~carat, data = diamonds)
Model 2 uses the polynomial to compare
model2<-lm(price~carat + I(carat^2), data = diamonds)
use cubic in model3
model3 <- lm(price~carat + I(carat^2) + I(carat^3), data = diamonds)
How can I code the log in R to get same result as excel?
y = 0.4299ln(x) - 2.5495
R² = 0.8468
Thanks!
The result you report from excel y = 0.4299ln(x) - 2.5495 does not contain any polynomial or cubic terms. What are you trying to do? price is very skewed and as with say 'income' it is common practice to take the log from that. This also provides the R2 you are referring to, but very different coefficients for the intercept and carat parameter.
m1 <- lm(log(price) ~ carat, data = diamonds)
summary(m1)
Call:
lm(formula = log(price) ~ carat, data = diamonds)
Residuals:
Min 1Q Median 3Q Max
-6.2844 -0.2449 0.0335 0.2578 1.5642
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.215021 0.003348 1856 <2e-16 ***
carat 1.969757 0.003608 546 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3972 on 53938 degrees of freedom
Multiple R-squared: 0.8468, Adjusted R-squared: 0.8468
F-statistic: 2.981e+05 on 1 and 53938 DF, p-value: < 2.2e-16

R: Translate the results from lm() to an equation

I'm using R and I want to translate the results from lm() to an equation.
My model is:
Residuals:
Min 1Q Median 3Q Max
-0.048110 -0.023948 -0.000376 0.024511 0.044190
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.17691 0.00909 349.50 < 2e-16 ***
poly(QPB2_REF1, 2)1 0.64947 0.03015 21.54 2.66e-14 ***
poly(QPB2_REF1, 2)2 0.10824 0.03015 3.59 0.00209 **
B2DBSA_REF1DONSON -0.20959 0.01286 -16.30 3.17e-12 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.03015 on 18 degrees of freedom
Multiple R-squared: 0.9763, Adjusted R-squared: 0.9724
F-statistic: 247.6 on 3 and 18 DF, p-value: 8.098e-15
Do you have any idea?
I tried to have something like
f <- function(x) {3.17691 + 0.64947*x +0.10824*x^2 -0.20959*1 + 0.03015^2}
but when I tried to set a x, the f(x) value is incorrect.
Your output indicates that the model includes use of the poly function which be default orthogonalizes the polynomials (includes centering the x's and other things). In your formula there is no orthogonalization done and that is the likely difference. You can refit the model using raw=TRUE in the call to poly to get the raw coefficients that can be multiplied by $x$ and $x^2$.
You may also be interested in the Function function in the rms package which automates creating functions from fitted models.
Edit
Here is an example:
library(rms)
xx <- 1:25
yy <- 5 - 1.5*xx + 0.1*xx^2 + rnorm(25)
plot(xx,yy)
fit <- ols(yy ~ pol(xx,2))
mypred <- Function(fit)
curve(mypred, add=TRUE)
mypred( c(1,25, 3, 3.5))
You need to use the rms functions for fitting (ols and pol for this example instead of lm and poly).
If you want to calculate y-hat based on the model, you can just use predict!
Example:
set.seed(123)
my_dat <- data.frame(x=1:10, e=rnorm(10))
my_dat$y <- with(my_dat, x*2 + e)
my_lm <- lm(y~x, data=my_dat)
summary(my_lm)
Result:
Call:
lm(formula = y ~ x, data = my_dat)
Residuals:
Min 1Q Median 3Q Max
-1.1348 -0.5624 -0.1393 0.3854 1.6814
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.5255 0.6673 0.787 0.454
x 1.9180 0.1075 17.835 1e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.9768 on 8 degrees of freedom
Multiple R-squared: 0.9755, Adjusted R-squared: 0.9724
F-statistic: 318.1 on 1 and 8 DF, p-value: 1e-07
Now, instead of making a function like 0.5255 + x * 1.9180 manually, I just call predict for my_lm:
predict(my_lm, data.frame(x=11:20))
Same result as this (not counting minor errors from rounding the slope/intercept estimates):
0.5255 + (11:20) * 1.9180
If you are looking for actually visualizing or writing out a complex equation (e.g. something that has restricted cubic spline transformations), I recommend using the rms package, fitting your model, and using the latex function to see it in latex
my_lm <- ols(y~x, data=my_dat)
latex(my_lm)
Note you will need to render the latex code so as to see your equation. There are websites and, if you are using a Mac, Mac Tex software, that will render it for you.

Resources