How can I calculate the slope from a linear regression analysis? - r

In R, I am trying to overlay an abline onto a scatter plot produced from a linear regression. I want to plot
TrainRegress$fitted.values (the prices predicted by the lm model) on the x-axis and
TrainRegRpt$train.data.Price (the original prices) on the y-axis, and then draw the line of best fit through the plotted points.
Here is some of my code:
TrainRegress <- lm(PriceBH.df$Price ~ ., data=PriceBH.df, subset = train.rows)
TrainRegRpt <- data.frame(train.data$Price, TrainRegress$fitted.values, TrainRegress$residuals)
x <- as.vector(TrainRegRpt$TrainRegress.fitted.values) # on the x-axis
y <- as.vector(TrainRegRpt$train.data.Price) #on the y-axis
plot(TrainRegRpt$train.data.Price ~ TrainRegRpt$TrainRegress.fitted.values)
abline(x,y)
I also tried the following, but the scatter plot came out the same:
p <- as.vector(TrainRegRpt$train.data.Price)  # my y-axis in the scatter plot
fv <- as.vector(round(TrainRegRpt$TrainRegress.fitted.values, 2))  # my x-axis in the scatter plot
newdf <- data.frame(p, fv)
x <- as.vector(newdf$fv)
y <- as.vector(newdf$p)
plot(newdf$p ~ newdf$fv)
abline(x,y)
summary(TrainRegress)
Here are the coefficients obtained from the summary of TrainRegress:
             Estimate
(Intercept)   30.318
CRIM           0.245
CHAS           5.8368
RM             8.4846
I extracted the y-intercept as follows:
y.interceptval <- summary(TrainRegress)$coefficients[1]
I plan to call abline(y.interceptval, slope), but I don't know how to calculate the slope. How do I calculate the slope to pass to abline(y.interceptval, slope)?
I have 5 textbooks here that are no help and my professor refuses to help me and I really want this to be perfect!
Thank you!!!
plot(TrainRegRpt$train.data.Price ~ TrainRegRpt$TrainRegress.fitted.values)
abline(x,y)

It looks like you have already calculated your slopes. In a linear regression fitted with lm(), the coefficients are the slopes; in this case, 30.318 is your y-intercept and the remaining coefficients are the slopes.
This gives you a regression equation of:
Y = 30.318 + 0.245*(CRIM) + 5.8368*(CHAS) + 8.4846*(RM)
The numbers 0.245, 5.8368, and 8.4846 are the coefficients for each variable and they are also the individual slopes.
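For example, here is a minimal sketch of pulling the intercept and the individual slopes out of your fitted model (the names follow your summary output; the exact values will depend on your data):
# All coefficients: the first element is the intercept, the rest are the slopes
coefs <- coef(TrainRegress)
y.interceptval <- coefs[1]   # same value as summary(TrainRegress)$coefficients[1]
slopes <- coefs[-1]          # one slope per predictor, e.g. CRIM, CHAS, RM
slopes["RM"]                 # slope for a single predictor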
Also, one thing about your price vs. fitted-values plot: it looks like you reversed the arguments to abline() (i.e. instead of abline(x, y) it should be abline(y, x)).
Edit: You used abline(x, y), but your plotted data are
plot(TrainRegRpt$train.data.Price ~ TrainRegRpt$TrainRegress.fitted.values)
(train.data.Price vs. fitted values, not x vs. y).
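As a rough sketch of how a line of best fit could be added to that particular scatter plot (assuming TrainRegRpt has the columns used in your plot() call), you can regress the plotted y on the plotted x and hand the resulting intercept and slope to abline(), or add a 45-degree reference line with abline(0, 1):
# Price (y) against fitted values (x), as in your plot() call
plot(TrainRegRpt$train.data.Price ~ TrainRegRpt$TrainRegress.fitted.values,
     xlab = "Fitted values", ylab = "Price")
# Line of best fit through the plotted points: intercept and slope come
# from a simple regression of the plotted y on the plotted x
bestfit <- lm(train.data.Price ~ TrainRegress.fitted.values, data = TrainRegRpt)
abline(coef(bestfit)[1], coef(bestfit)[2])   # abline(intercept, slope)
# Alternatively, a 45-degree reference line (perfect prediction)
abline(0, 1, lty = 2)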

Related

How to convert the y-axis of a plot from log(y) to y

I'm an R newbie. I want to estimate a regression of log(CONSUMPTION) on INCOME and then make a plot of CONSUMPTION and INCOME.
I can run the following regression and plot the results.
results <- lm(I(log(CONSUMPTION)) ~ INCOME, data=dataset)
effect_plot(results, pred=INCOME)
If I do this, I get log(CONSUMPTION) on the vertical axis rather than CONSUMPTION.
How can I get a plot with CONSUMPTION on the vertical axis?
Another way to ask the question is how do I convert the y-axis of a plot from log(y) to y? While my question is for the function effect_plot(), I would be happy with any plot function.
Thanks for any help you can give me.
Thank you for the responses. I was able to figure out a workaround using Poisson regression:
results1 <- glm(CONSUMPTION ~ INCOME+WEALTH, family=poisson, data=Consumption )
effect_plot(results1,pred=INCOME,data=Consumption)
This allows me to identify the effect of one variable (INCOME) even when the regression has more than one explanatory variable (INCOME+WEALTH), and plots the estimated effect with CONSUMPTION on the vertical axis rather than ln(CONSUMPTION), with INCOME on the horizontal axis.
The associated estimates are virtually identical to what I would get from the log-linear regression:
results2 <- lm(I(log(CONSUMPTION)) ~ INCOME+WEALTH, data=Consumption )
I appreciate you for taking the time to help me with my problem.
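For reference, here is a self-contained sketch of this workaround; the Consumption data frame below is simulated purely for illustration, and effect_plot() comes from the jtools package:
library(jtools)
# Simulated stand-in for the Consumption data (illustration only)
set.seed(1)
Consumption <- data.frame(INCOME = runif(200, 10, 100),
                          WEALTH = runif(200, 0, 500))
Consumption$CONSUMPTION <- rpois(200, lambda = exp(3 + 0.01 * Consumption$INCOME +
                                                     0.001 * Consumption$WEALTH))
# Poisson GLM with a log link; effect_plot() shows the predicted effect
# on the response (CONSUMPTION) scale rather than the log scale
results1 <- glm(CONSUMPTION ~ INCOME + WEALTH, family = poisson, data = Consumption)
effect_plot(results1, pred = INCOME, data = Consumption)
# Log-linear comparison from the question; the estimates come out very close
results2 <- lm(I(log(CONSUMPTION)) ~ INCOME + WEALTH, data = Consumption)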

plot multiple ROC curves for logistic regression model in R

I have a logistic regression model (using R) as
fit6 <- glm(formula = survived ~ ascore + gini + failed, data=records, family = binomial)
summary(fit6)
I'm using the pROC package to draw ROC curves and figure out the AUC for 6 models, fit1 through fit6.
This is the approach I have used to plot one ROC curve.
prob6=predict(fit6,type=c("response"))
records$prob6 = prob6
g6 <- roc(survived~prob6, data=records)
plot(g6)
But is there a way I can combine the ROC curves for all 6 models in one plot and display the AUCs for all of them, and if possible the confidence intervals too?
You can use the add = TRUE argument to the plot function to plot multiple ROC curves.
Make up some fake data
library(pROC)
a=rbinom(100, 1, 0.25)
b=runif(100)
c=rnorm(100)
Get model fits
fit1=glm(a~b+c, family='binomial')
fit2=glm(a~c, family='binomial')
Predict on the same data you trained the model with (or hold some out to test on if you want)
preds=predict(fit1)
roc1=roc(a ~ preds)
preds2=predict(fit2)
roc2=roc(a ~ preds2)
Plot it up.
plot(roc1)
plot(roc2, add=TRUE, col='red')
This produces the different fits on the same plot. You can get the AUC of the ROC curve by roc1$auc, and can add it either using the text() function in base R plotting, or perhaps just toss it in the legend.
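For instance, here is a small sketch (continuing from the roc1 and roc2 objects above) that puts the AUC values in a legend; the labels and formatting are just one possible choice:
plot(roc1)
plot(roc2, add = TRUE, col = 'red')
legend("bottomright",
       legend = c(sprintf("fit1: AUC = %.3f", roc1$auc),
                  sprintf("fit2: AUC = %.3f", roc2$auc)),
       col = c("black", "red"), lwd = 2)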
I don't know how to quantify confidence intervals...or if that is even a thing you can do with ROC curves. Someone else will have to fill in the details on that one. Sorry. Hopefully the rest helped though.

ggplot - stat_smooth - linear model on log-transformed data - plotting on non-log scale

I did an lm on log-transformed data and plotted it with ggplot:
myplot <- myplot + stat_smooth(method="lm", formula=y~x)
I'm happy with that figure, but now I want to go back to my un-logged data and plot it.
My question is: how can I add my model to this second figure? My model is a linear regression on log-transformed data, but I'd now like to plot it on my non-log-transformed diagram.
Thanks in advance to those who can help me.
You could un-log the predicted values from the model output by back-transforming them, i.e. 10^(y) if the data were transformed with log10() (or exp(y) if the natural log was used).
This transforms the predicted values back to the original scale, as opposed to the log equivalent; you can then plot this back-transformed data.
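As a minimal sketch of that idea (with simulated data, reusing the CONSUMPTION and INCOME names from the question above; R's log() is the natural log, so exp() is the back-transform here):
library(ggplot2)
set.seed(1)
dat <- data.frame(INCOME = runif(100, 10, 100))
dat$CONSUMPTION <- exp(1 + 0.02 * dat$INCOME + rnorm(100, sd = 0.1))
fit <- lm(log(CONSUMPTION) ~ INCOME, data = dat)
dat$pred <- exp(predict(fit))   # back-transform fitted values to the original scale
ggplot(dat, aes(INCOME, CONSUMPTION)) +
  geom_point() +
  geom_line(aes(y = pred), colour = "blue")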

Plot and report X intercept from linear regression - R

I am using lm in R for linear regression. I would like to plot and report the x-intercept. I know that I could use algebra and solve for x by setting y = 0, but is there a way to have R report it to me? Also, how can I tell R to plot the x-intercept? Would this just entail extending the x-axis range to include it? Thanks.
# example r code
plot(y~x)
fit <- lm(y~x)
abline(fit)
If you want to plot the x-intercept, extend the plot as you said. You might need to extend it in both the x and y dimensions (use xlim = c(0, 100) and ylim = c(0, 100) or whatever), and you should note that R does not draw lines for the axes by default. I suppose you can add them in manually with abline(h = 0) and abline(v = 0) if you want.
To get the numerical value of the x-intercept, you'll have to do algebra.
> coef(fit)
(Intercept)           x
  0.8671534   0.4095524
Gives the y-intercept and the slope, and you can easily find the x-intercept from there.
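Here is a small sketch of that algebra on made-up data, together with one way to extend the plot so the x-intercept is visible (for y = a + b*x, the x-intercept is -a/b):
# Made-up data for illustration
set.seed(1)
x <- rnorm(50, mean = 5)
y <- 0.9 + 0.4 * x + rnorm(50)
fit <- lm(y ~ x)
a <- coef(fit)[1]            # y-intercept
b <- coef(fit)[2]            # slope
x.intercept <- -a / b        # solve a + b*x = 0
plot(y ~ x, xlim = range(c(x, x.intercept, 0)), ylim = range(c(y, 0)))
abline(fit)
abline(h = 0, v = 0, lty = 3)            # draw the axes so the intercepts are visible
points(x.intercept, 0, pch = 19, col = "red")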

add a logarithmic regression line to a scatterplot (comparison with Excel)

In Excel, it's pretty easy to fit a logarithmic trend line to a given set of data points: just click Add Trendline and then select "Logarithmic." Switching to R for more power, I am a bit lost as to which function one should use to generate this.
To generate the graph, I used ggplot2 with the following code.
ggplot(data, aes(horizon, success)) + geom_line() + geom_area(alpha=0.3)+
stat_smooth(method='loess')
But this code does local polynomial regression fitting, which is based on averaging out numerous small linear regressions. My question is whether there is a log trend line in R similar to the one used in Excel.
An alternative I am looking for is to get a log equation of the form y = c*ln(x) + b; is there a coef() function to get 'c' and 'b'?
Let my data be:
c(0.599885189,0.588404133,0.577784156,0.567164179,0.556257176,
0.545350172,0.535112897,0.52449292,0.51540375,0.507271336,0.499904325,
0.498851894,0.498851894,0.497321087,0.4964600,0.495885955,0.494068121,
0.492154612,0.490145427,0.486892461,0.482395714,0.477229238,0.471010333)
The above data are the y-points, while the x-points are simply the integers 1:length(y). In Excel I can simply plot this and add a logarithmic trend line, and the black line in the resulting chart is the logarithmic fit. In R, how would one do this with the above dataset?
I prefer to use base graphics instead of ggplot2:
#some data with a linear model
x <- 1:20
set.seed(1)
y <- 3*log(x)+5+rnorm(20)
#plot data
plot(y~x)
#fit log model
fit <- lm(y~log(x))
#look at result and statistics
summary(fit)
#extract coefficients only
coef(fit)
#plot fit with confidence band
matlines(x=seq(from=1,to=20,length.out=1000),
y=predict(fit,newdata=list(x=seq(from=1,to=20,length.out=1000)),
interval="confidence"))
#some data with a non-linear model
set.seed(1)
y <- log(0.1*x)+rnorm(20,sd=0.1)
#plot data
plot(y~x)
#fit log model
fit <- nls(y~log(a*x),start=list(a=0.2))
#look at result and statistics
summary(fit)
#plot fit
lines(seq(from=1,to=20,length.out=1000),
predict(fit,newdata=list(x=seq(from=1,to=20,length.out=1000))))
You can easily specify alternative smoothing methods (such as lm(), linear least-squares fitting) and an alternative formula:
library(ggplot2)
g0 <- ggplot(dat, aes(horizon, success)) + geom_line() + geom_area(alpha=0.3)
g0 + stat_smooth(method="lm",formula=y~log(x),fill="red")
The confidence bands are automatically included: I changed the color to make them visible since they're very narrow. You can use se=FALSE in stat_smooth to turn them off.
The other answer shows you how to get the coefficients:
coef(lm(success~log(horizon),data=dat))
I can imagine you might next want to add the equation to the graph: see Adding Regression Line Equation and R2 on graph
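For example, here is a rough sketch of annotating the fitted equation on the plot, assuming the same dat (with columns horizon and success) and the g0 object used above:
cf <- coef(lm(success ~ log(horizon), data = dat))
eq <- sprintf("y = %.3f + %.3f * ln(x)", cf[1], cf[2])
g0 + stat_smooth(method = "lm", formula = y ~ log(x), fill = "red") +
  annotate("text", x = Inf, y = Inf, hjust = 1.1, vjust = 1.5, label = eq)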
I'm pretty sure a simple + scale_y_log10() would get you what you wanted. ggplot2 stats are calculated after scale transformations, so the loess() fit would then be computed on the log-transformed data.
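A one-line sketch of that suggestion, reusing the g0 object defined in the previous answer:
g0 + scale_y_log10() + stat_smooth(method = "loess")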
I've just written a blog post here that describes how to match Excel's logarithmic curve fitting exactly. The nub of the approach centers around the lm() function:
# Set x and data.to.fit to the independent and dependent variables
data.to.fit <- c(0.5998,0.5884,0.5777,0.5671,0.5562,0.5453,0.5351,0.524,0.515,0.5072,0.4999,0.4988,0.4988,0.4973,0.49,0.4958,0.4940,0.4921,0.4901,0.4868,0.4823,0.4772,0.4710)
x <- c(seq(1, length(data.to.fit)))
data.set <- data.frame(x, data.to.fit)
# Perform a logarithmic fit to the data set
log.fit <- lm(data.to.fit~log(x), data=data.set)
# Print out the intercept, log(x) parameters, R-squared values, etc.
summary(log.fit)
# Plot the original data set
plot(data.set)
# Add the log.fit line with confidence intervals
matlines(predict(log.fit, data.frame(x=x), interval="confidence"))
Hope that helps.
