Plot and report X intercept from linear regression - R - r

I am using lm in R for linear regression. I would like to plot and report the x-intercept. I know that I could use algebra and solve for x by setting y = 0, but is there a way to have R report it to me? Also, how can I 'tell' R to plot the x-intercept? Would this just entail extending the x-axis range to include it? Thanks.
# example r code
plot(y~x)
fit <- lm(y~x)
abline(fit)

If you want to plot the x-intercept, extend the plot as you said. You might need to extend it in both the x and y dimensions (use xlim=c(0,100) and ylim=c(0,100) or whatever range covers the intercept), and you should note that R does not draw lines for the axes through the origin. I suppose you can add them manually with abline(h=0) and abline(v=0) if you want.
To get the numerical value of the x-intercept, you'll have to do algebra.
> coef(fit)
(Intercept)           x
  0.8671534   0.4095524
This gives the y-intercept and the slope. Setting y = 0 gives the x-intercept as -intercept/slope, here -0.8671534/0.4095524, or about -2.117.
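The algebra above can be sketched in a few lines. This is a minimal example on simulated data (the data, the seed and the name x_int are assumptions for illustration, not from the original post); it computes the x-intercept from coef(fit) and widens the axes so the point is visible.

```r
# Simulated data standing in for the asker's x and y (an assumption)
set.seed(1)
x <- 1:20
y <- 0.87 + 0.41 * x + rnorm(20, sd = 0.5)

fit <- lm(y ~ x)
b <- coef(fit)                      # b[1] = y-intercept, b[2] = slope

# Solve 0 = b[1] + b[2] * x for x
x_int <- unname(-b[1] / b[2])
x_int

# Widen both axes so the x-intercept and y = 0 are inside the plot region
plot(y ~ x, xlim = range(c(x, x_int, 0)), ylim = range(c(y, 0)))
abline(fit)                         # regression line
abline(h = 0, v = 0, lty = 3)       # reference lines through the origin
points(x_int, 0, pch = 19, col = "red")  # mark the x-intercept
```

The fitted line should cross the dotted horizontal line exactly at the red point.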

Related

How to convert the y-axis of a plot from log(y) to y

I'm an R newbie. I want to estimate a regression of log(CONSUMPTION) on INCOME and then make a plot of CONSUMPTION and INCOME.
I can run the following regression and plot the results.
results <- lm(I(log(CONSUMPTION)) ~ INCOME, data=dataset)
effect_plot(results, pred=INCOME)
If I do this, I get log(CONSUMPTION) on the vertical axis rather than CONSUMPTION.
How can I get a plot with CONSUMPTION on the vertical axis?
Another way to ask the question is how do I convert the y-axis of a plot from log(y) to y? While my question is for the function effect_plot(), I would be happy with any plot function.
Thanks for any help you can give me.
Thank you for the responses. I was able to figure out a workaround using Poisson regression:
results1 <- glm(CONSUMPTION ~ INCOME+WEALTH, family=poisson, data=Consumption )
effect_plot(results1,pred=INCOME,data=Consumption)
This allows me to identify the effect of one variable (INCOME) even when the regression has more than one explanatory variable (INCOME+WEALTH), and plots the estimated effect with CONSUMPTION on the vertical axis rather than ln(CONSUMPTION), with INCOME on the horizontal axis.
The associated estimates are virtually identical to what I would get from the log-linear regression:
results2 <- lm(I(log(CONSUMPTION)) ~ INCOME+WEALTH, data=Consumption )
I appreciate you for taking the time to help me with my problem.
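For the original log-linear model, another option is to back-transform the predictions with exp() and plot them on the original scale. The sketch below uses simulated data with the same column names (the values, the seed and the grid object are assumptions). Note that exp() of a predicted log value estimates the conditional median rather than the mean when the errors are normal on the log scale.

```r
# Simulated stand-in for the asker's data set (an assumption)
set.seed(1)
Consumption <- data.frame(INCOME = runif(100, 10, 100))
Consumption$CONSUMPTION <- exp(1 + 0.02 * Consumption$INCOME +
                               rnorm(100, sd = 0.2))

# Fit on the log scale
fit <- lm(log(CONSUMPTION) ~ INCOME, data = Consumption)

# Predict on a grid of INCOME values and back-transform to CONSUMPTION
grid <- data.frame(INCOME = seq(10, 100, length.out = 200))
grid$CONSUMPTION_hat <- exp(predict(fit, newdata = grid))

# Plot the raw data and the back-transformed curve on the original scale
plot(CONSUMPTION ~ INCOME, data = Consumption)
lines(grid$INCOME, grid$CONSUMPTION_hat, col = "blue", lwd = 2)
```

With more than one explanatory variable, the grid would also need a fixed value (for example the mean) for each of the other predictors.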

How can I calculate the slope from a linear regression analysis?

In R, I am trying to overlay an abline onto a plot, the result of linear regression. I want to create a scatter plot showing
TrainRegRpt$train.data.Price (original price) on the x-axis,
TrainRegress$fitted.values (the projected price that came from the lm model) on the y-axis and draw the line of best fit through the plotted points.
Here is some of my code:
TrainRegress <- lm(PriceBH.df$Price ~ ., data=PriceBH.df, subset = train.rows)
TrainRegRpt <- data.frame(train.data$Price, TrainRegress$fitted.values, TrainRegress$residuals)
x <- as.vector(TrainRegRpt$TrainRegress.fitted.values) # on the x-axis
y <- as.vector(TrainRegRpt$train.data.Price) #on the y-axis
plot(TrainRegRpt$train.data.Price ~ TrainRegRpt$TrainRegress.fitted.values)
abline(x,y)
The scatter plot came out the same:
p <- as.vector(TrainRegRpt$train.data.Price) # my y-axis in the scatter plot
fv <- as.vector(round(TrainRegRpt$TrainRegress.fitted.values,2)) # my x-axis in the scatter plot
newdf <- data.frame(p,fv)
x <- as.vector(newdf$fv)
y <- as.vector(newdf$p)
plot(newdf$p ~ newdf$fv)
abline(x,y)
summary(TrainRegress)
Coefficients obtained from the summary of TrainRegress:
              Estimate
(Intercept)   30.318
CRIM           0.245
CHAS           5.8368
RM             8.4846
I extracted the y-intercept as follows:
y.interceptval <-summary(TrainRegress)$coefficients[1]
I will use y.interceptval in the abline(y.interceptval,***?slope***) but I need to know how to calculate the slope. How do I calculate the slope to pass to abline(y.interceptval, slope)?
I have 5 textbooks here that are no help and my professor refuses to help me and I really want this to be perfect!
Thank you!!!
It looks like you already calculated your slope. The slopes from a linear regression analysis using lm() are the coefficients. So, in this case, 30.318 is your Y-intercept.
This gives you a regression equation of:
Y = 30.318 + 0.245*(CRIM) + 5.8368*(CHAS) + 8.4846*(RM)
The numbers 0.245, 5.8368, and 8.4846 are the coefficients for each variable and they are also the individual slopes.
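Also, one thing about your price vs. fitted values plot: abline() does not take two data vectors. Its first two arguments are an intercept and a slope, as in abline(a, b), or you can pass it a fitted model, so abline(x,y) will not draw the line of best fit. With several predictors there is no single slope for Price, so for the line through the plotted points, fit a simple regression to the two plotted variables and pass that to abline(), e.g. abline(lm(y ~ x)).
Edit: You used abline(x,y), but your plotted data are
plot(TrainRegRpt$train.data.Price ~ TrainRegRpt$TrainRegress.fitted.values)
(train.data.Price vs. fitted values, so the line has to be fitted to those two columns).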
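To make the abline() point concrete, here is a minimal sketch on simulated data (the data frame d, its coefficients and the names obs, fv and line_fit are assumptions; only the column names CRIM, CHAS, RM and Price echo the question). It plots observed against fitted values and draws the line of best fit through those points. For a least-squares model with an intercept, that line has intercept 0 and slope 1, so abline(0, 1) would draw the same line.

```r
# Simulated stand-in for the housing data (values are made up)
set.seed(1)
n <- 100
d <- data.frame(CRIM = rexp(n),
                CHAS = rbinom(n, 1, 0.2),
                RM   = rnorm(n, 6, 0.7))
d$Price <- 5 - 0.2 * d$CRIM + 3 * d$CHAS + 4 * d$RM + rnorm(n, sd = 2)

# Multiple regression: one intercept and one slope per predictor
fit <- lm(Price ~ ., data = d)
coef(fit)

# Observed vs. fitted values, as in the question's scatter plot
obs <- d$Price
fv  <- fitted(fit)
plot(obs ~ fv, xlab = "Fitted price", ylab = "Observed price")

# Line of best fit through the plotted points
line_fit <- lm(obs ~ fv)
abline(line_fit)     # abline() accepts a fitted lm object directly
coef(line_fit)       # intercept close to 0, slope close to 1
```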

R: Fit curve to points: what linear/non-linear model to use?

I have data that should follow a power-law distribution.
x = distance
y = %
I want to create a model and add the fitted line to my plot.
My aim is to recreate something like this:
As the author uses R-squared, I assume they applied a linear model, since R^2 is not suitable for non-linear models. http://blog.minitab.com/blog/adventures-in-statistics-2/why-is-there-no-r-squared-for-nonlinear-regression
However, I can't find out how to "curve" my line to the points, i.e. how to add the formula y ~ a*x^(-b) to my model.
Instead of a curved line I got back the straight line from a simple linear regression.
My questions are:
Do I correctly assume that the model y ~ a*x^(-b) used by the author is linear?
What type of model should I use to recreate my example: lm, glm, nls, etc.?
I generated the dummy data, including the applied power law formula from the plot above:
set.seed(42)
scatt<-runif(10)
x<-seq(1, 1000, 100)
b = 1.8411
a = 133093
y = a*x^(-b) + scatt # add some variability in my dependent variable
plot(y ~ x)
and tried to create a glm model.
# formula for non-linear model
m<-m.glm<-glm(y ~ x^2, data = dat) #
# add predicted line to plot
lines(x,predict(m),col="red",lty=2,lwd=3)
This is my first time fitting a model, so I am really confused and I don't know where to start. Thank you for any suggestions or directions, I really appreciate it.
I personally think this question is a duplicate of "`nls` fails to estimate parameters of my model", but it would be cold-blooded of me to close it (as the OP put a bounty on it). Anyway, a bounty question cannot be closed.
So the best I can think of is to post a community wiki answer (I don't want to get this bounty).
As you want to fit a model of the form y ~ a*x^(-b), it often helps to take the log of both sides and fit a linear model log(y) ~ log(x), since log(y) = log(a) - b*log(x).
fit <- lm(log(y) ~ log(x))
As you already know how to use curve to plot a regression curve and are happy with it, I will now show how to make the plot.
Some people call this log-log regression. Here are some other links I have for this kind of regression:
How to predict a new value using simple linear regression log(y)=b0+b1*log(x)
How to plot confidence bands for my weighted log-log linear regression?
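dat <- data.frame(x, y)
m <- lm(log(y) ~ log(x), data=dat)
# log(y) = log(a) - b*log(x), so recover a and b from the coefficients
a <- exp(coef(m)[1])   # a = exp(intercept)
b <- -coef(m)[2]       # b = -slope (no exp here)
plot(y ~ x, type="p", lty=3)
# back-transform the fitted values to the original scale
lines(x, exp(predict(m)), col="blue", lty=2, lwd=3)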
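As a self-contained check of the log-log approach, the sketch below simulates power-law data with multiplicative noise and recovers a and b from the coefficients. The noise model, the seed and the names a_true, b_true, a_hat, b_hat and x_grid are assumptions for illustration; the question's own data used additive noise, for which the log-log fit is only an approximation.

```r
set.seed(42)
x <- seq(1, 1000, by = 100)
a_true <- 133093
b_true <- 1.8411

# Multiplicative noise keeps y positive and matches the log-log model
y <- a_true * x^(-b_true) * exp(rnorm(length(x), sd = 0.05))

# log(y) = log(a) - b * log(x) is linear in log(x)
m <- lm(log(y) ~ log(x))
a_hat <- unname(exp(coef(m)[1]))   # a = exp(intercept)
b_hat <- unname(-coef(m)[2])       # b = -slope
c(a = a_hat, b = b_hat)

# Draw the fitted curve on the original scale over a fine grid
plot(y ~ x)
x_grid <- seq(min(x), max(x), length.out = 200)
lines(x_grid, a_hat * x_grid^(-b_hat), col = "blue", lwd = 2)
```

Evaluating the curve on a fine grid rather than at the ten observed x values is what makes the line look curved instead of piecewise straight.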

add a logarithmic regression line to a scatterplot (comparison with Excel)

In Excel, it's pretty easy to fit a logarithmic trend line to a given data set. Just click "Add Trendline" and then select "Logarithmic." Switching to R for more power, I am a bit lost as to which function to use to generate this.
To generate the graph, I used ggplot2 with the following code.
ggplot(data, aes(horizon, success)) + geom_line() + geom_area(alpha=0.3)+
stat_smooth(method='loess')
But the code does local polynomial regression fitting which is based on averaging out numerous small linear regressions. My question is whether there is a log trend line in R similar to the one used in Excel.
An alternative I am looking for is to get a log equation of the form y = (c*ln(x))+b; is there a coef() function to get 'c' and 'b'?
Let my data be:
c(0.599885189,0.588404133,0.577784156,0.567164179,0.556257176,
0.545350172,0.535112897,0.52449292,0.51540375,0.507271336,0.499904325,
0.498851894,0.498851894,0.497321087,0.4964600,0.495885955,0.494068121,
0.492154612,0.490145427,0.486892461,0.482395714,0.477229238,0.471010333)
The above data are the y-points, while the x-points are simply the integers 1:length(y). In Excel, I can simply plot this and add a logarithmic trend line, and the result would look like this:
With black being the log. In R, how would one do this with the above dataset?
I prefer to use base graphics instead of ggplot2:
#some data with a linear model
x <- 1:20
set.seed(1)
y <- 3*log(x)+5+rnorm(20)
#plot data
plot(y~x)
#fit log model
fit <- lm(y~log(x))
#look at result and statistics
summary(fit)
#extract coefficients only
coef(fit)
#plot fit with confidence band
matlines(x=seq(from=1,to=20,length.out=1000),
y=predict(fit,newdata=list(x=seq(from=1,to=20,length.out=1000)),
interval="confidence"))
#some data with a non-linear model
set.seed(1)
y <- log(0.1*x)+rnorm(20,sd=0.1)
#plot data
plot(y~x)
#fit log model
fit <- nls(y~log(a*x),start=list(a=0.2))
#look at result and statistics
summary(fit)
#plot fit
lines(seq(from=1,to=20,length.out=1000),
predict(fit,newdata=list(x=seq(from=1,to=20,length.out=1000))))
You can easily specify alternative smoothing methods (such as lm(), linear least-squares fitting) and an alternative formula:
library(ggplot2)
g0 <- ggplot(dat, aes(horizon, success)) + geom_line() + geom_area(alpha=0.3)
g0 + stat_smooth(method="lm",formula=y~log(x),fill="red")
The confidence bands are automatically included: I changed the color to make them visible since they're very narrow. You can use se=FALSE in stat_smooth to turn them off.
The other answer shows you how to get the coefficients:
coef(lm(success~log(horizon),data=dat))
I can imagine you might next want to add the equation to the graph: see Adding Regression Line Equation and R2 on graph
I'm pretty sure a simple +scale_y_log10() would get you what you wanted. ggplot2 stats are calculated after scale transformations, so the loess() would then be calculated on the log-transformed data. Note that this transforms y rather than x, so it is not the same fit as y = c*ln(x) + b.
I've just written a blog post here that describes how to match Excel's logarithmic curve fitting exactly. The nub of the approach centers around the lm() function:
# Set x and data.to.fit to the independent and dependent variables
data.to.fit <- c(0.5998,0.5884,0.5777,0.5671,0.5562,0.5453,0.5351,0.524,0.515,0.5072,0.4999,0.4988,0.4988,0.4973,0.49,0.4958,0.4940,0.4921,0.4901,0.4868,0.4823,0.4772,0.4710)
x <- c(seq(1, length(data.to.fit)))
data.set <- data.frame(x, data.to.fit)
# Perform a logarithmic fit to the data set
log.fit <- lm(data.to.fit~log(x), data=data.set)
# Print out the intercept, log(x) parameters, R-squared values, etc.
summary(log.fit)
# Plot the original data set
plot(data.set)
# Add the log.fit line with confidence intervals
matlines(predict(log.fit, data.frame(x=x), interval="confidence"))
Hope that helps.
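Putting the pieces together for the unrounded data from the question, a minimal base-graphics sketch might look like this. The names b_est and c_est are made up for illustration; they correspond to 'b' and 'c' in y = c*ln(x) + b.

```r
# y-values copied from the question; x is simply 1, 2, ..., length(y)
success <- c(0.599885189, 0.588404133, 0.577784156, 0.567164179, 0.556257176,
             0.545350172, 0.535112897, 0.52449292, 0.51540375, 0.507271336,
             0.499904325, 0.498851894, 0.498851894, 0.497321087, 0.4964600,
             0.495885955, 0.494068121, 0.492154612, 0.490145427, 0.486892461,
             0.482395714, 0.477229238, 0.471010333)
horizon <- seq_along(success)

# Logarithmic trend: success = b + c * log(horizon)
fit <- lm(success ~ log(horizon))
b_est <- unname(coef(fit)[1])   # intercept, 'b' in the question
c_est <- unname(coef(fit)[2])   # coefficient on log(x), 'c' in the question

# Plot the data and overlay the fitted logarithmic curve
plot(horizon, success)
curve(b_est + c_est * log(x), from = 1, to = length(success),
      add = TRUE, col = "red", lwd = 2)
```

summary(fit)$r.squared gives the R-squared value that Excel shows next to its trend line.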

fitting a distribution graphically

I am running some tests to try to determine what distribution my data follows. From the look of the density of my data, I thought it looked a bit like a logistic distribution. I then used the package MASS to estimate the parameters of the distribution. However, when I graph them together, the logistic is better than the normal but still not very good. Is there a way to find which distribution would fit better? Thank you for the help!
library(quantmod)
getSymbols("^NDX",src="yahoo", from='1997-6-01', to='2012-6-01')
daily<- allReturns(NDX) [,c('daily')]
dailySerieTemporel<-ts(data=daily)
x<-na.omit(dailySerieTemporel)
library(MASS)
(xFit<-fitdistr(x,"logistic"))
# location scale
# 0.0005210570 0.0106366354
# (0.0002941922) (0.0001444678)
xFitEst<-coef(xFit)
plot(density(x))
set.seed(125)
lines(density(rlogis(length(x), xFitEst['location'], xFitEst['scale'])), col=3)
lines(density(rnorm(length(x), mean(x), sd(x))), col=2)
This is elementary R: plot() creates a new plotting canvas by default, and you should use a command such as lines() to add to an existing plot.
This works for your example:
plot(density(x))
lines(density(rlogis(length(x), location = 0.0005210570,
scale = 0.0106366354)), col="blue")
as it adds the estimated logistic fit in blue to your existing plot.
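On the question of which distribution fits better, one possible approach (a sketch, not the only way) is to fit several candidates with fitdistr() and compare them by AIC, where lower is better. The example below uses simulated data as a stand-in for the return series, since the original was downloaded from Yahoo; the candidate list and the names fits and aic are assumptions.

```r
library(MASS)  # provides fitdistr(); ships with standard R installations

# Simulated stand-in for the return series (an assumption)
set.seed(125)
x <- rlogis(5000, location = 0, scale = 1)

# Fit each candidate distribution by maximum likelihood
candidates <- c("normal", "logistic", "cauchy")
fits <- lapply(candidates, function(d) fitdistr(x, d))
names(fits) <- candidates

# Compare by AIC; the smallest value indicates the preferred candidate
aic <- sapply(fits, AIC)
print(sort(aic))
```

AIC only ranks the candidates you supply, so it is still worth overlaying the best one on plot(density(x)) to judge the fit visually.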
