I am trying to find a way to get a scatterplot in R of actual values vs. regressed values. Example:
fit = lm(y ~ a + x + z)
I get the result y = 2*a + 3*x - 7*z + 4.
Now how do I make a scatterplot of y against 2*a + 3*x - 7*z + 4, and add a trend line?
(And, by the way, I tried the plot() function. It didn't seem to have what I need.)
Look at plot(fit), or the help for lm, which you can access using ?lm.
From your question, it sounds like you want to plot your actual values against the fitted values. There is a plot method for lm which does this out of the box.
You could always build it yourself, say in ggplot2, by accessing the fitted values. Check out your object using str(fit) to see all of the data captured during the regression.
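If you'd rather build it by hand, here is a minimal base-R sketch. The data is simulated to stand in for the asker's variables, which aren't shown in the question:

```r
# Simulated stand-ins for the asker's a, x, z and y
set.seed(42)
a <- rnorm(50); x <- rnorm(50); z <- rnorm(50)
y <- 2 * a + 3 * x - 7 * z + 4 + rnorm(50)

fit <- lm(y ~ a + x + z)

# Actual values against fitted values, plus a trend line
plot(fitted(fit), y, xlab = "Fitted values", ylab = "Actual values")
abline(lm(y ~ fitted(fit)), col = "red")
```

fitted(fit) gives exactly the "2*a + 3*x - 7*z + 4" values the question asks about, evaluated at the estimated coefficients.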
Is there any simple command to show the geom_smooth equation of a non-linear relationship? Something as simple as "show.equation". The equation has to be somewhere; I just want to call the equation used by default.
ggplot(dataset, aes(x=variablex, y=variabley)) +
geom_point()+
geom_smooth()+
theme_bw()
If you look at the documentation for geom_smooth and stat_smooth, you can see that it uses stats::loess for small data sets (fewer than 1,000 observations) and mgcv::gam otherwise:
For method = NULL the smoothing method is chosen based on the size of the largest group (across all panels). stats::loess() is used for less than 1,000 observations; otherwise mgcv::gam() is used with formula = y ~ s(x, bs = "cs") and method = "REML". Somewhat anecdotally, loess gives a better appearance, but is O(N^2) in memory, so does not work for larger datasets.
So if you want to use the model implied by the geom_smooth fit, you could just call the underlying method (e.g. stats::loess(variabley ~ variablex, data = dataset)) and then use the predict method to calculate values for new data.
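For example, with made-up data standing in for the question's dataset (the variable names mirror the aes() call above):

```r
# Made-up data standing in for 'dataset'
set.seed(1)
dataset <- data.frame(variablex = 1:100)
dataset$variabley <- sin(dataset$variablex / 10) + rnorm(100, sd = 0.2)

# The same model geom_smooth() fits for fewer than 1,000 observations
fit <- stats::loess(variabley ~ variablex, data = dataset)

# Smoothed values at new x positions
newx <- data.frame(variablex = seq(1, 100, by = 0.5))
smoothed <- predict(fit, newdata = newx)
```

Note that geom_smooth also uses a few non-default arguments (e.g. its span), so this reproduces the default loess fit, not necessarily pixel-for-pixel what the plot shows.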
I'm trying to understand polynomial fitting in R. From my research on the internet, there appear to be two methods. Assuming I want to fit a cubic curve ax^3 + bx^2 + cx + d to some dataset, I can either use:
lm(dataset, formula = y ~ poly(x, 3))
or
lm(dataset, formula = y ~ x + I(x^2) + I(x^3))
However, when I try them in R, I end up with two different curves with completely different intercepts and coefficients. Is there anything about polynomials I'm not getting right here?
This comes down to what the different functions do. poly generates orthogonal polynomials. Compare the values of poly(dataset$x, 3) to I(dataset$x^3). Your coefficients will be different because the values being passed into the linear model (the columns produced by poly versus those produced by I) are different.
As 42 pointed out, your predicted values will be essentially the same. If a is your first linear model and b is your second, b$fitted.values - a$fitted.values should be very close to 0 at all points.
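To see this concretely, here is a small sketch with simulated data (the names a and b follow the answer above):

```r
# Simulated data; any numeric x and y would do
set.seed(1)
dataset <- data.frame(x = runif(30, 0, 5))
dataset$y <- 1 + 2 * dataset$x - 0.5 * dataset$x^2 + rnorm(30, sd = 0.1)

a <- lm(y ~ poly(x, 3), data = dataset)           # orthogonal polynomials
b <- lm(y ~ x + I(x^2) + I(x^3), data = dataset)  # raw polynomials

# The coefficients differ, but the fitted curves are identical
all.equal(as.numeric(fitted(a)), as.numeric(fitted(b)))
```

If you want poly but with coefficients on the original scale, poly(x, 3, raw = TRUE) reproduces the raw-polynomial fit of the second form.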
I get it now. There is a difference between R's computation of raw polynomials vs. orthogonal polynomials. Thanks, everyone, for the help.
I am a chemical engineer and very new to R. I am attempting to build a tool in R (and eventually a shiny app) for analysis of phase boundaries. Using a simulation I get output that shows two curves which can be well represented by a 4th order polynomial. The data is as follows:
https://i.stack.imgur.com/8Oa0C.jpg
The procedure I have to follow uses the difference between the two curves to produce a second curve. In order to compare the curves, the data has to increase as a function of pressure in set increments, for example 0.2. As can be seen, the data from the simulation is not incremental, and there is no way to compare the curves based on the raw output.
To resolve this, in excel I carried out the following steps on each curve:
I plotted the data with pressure on the x axis and temperature on the y axis
Found the line of best fit using a 4th order polynomial
Used the equation of the curve to calculate the temperature at set increments of pressure
From this, I was able to compare the curves mathematically and produce the required output.
Does anyone have any suggestions on how to carry this out in R, or is there a more statistical or simplified approach that I have missed (extracting Bézier curve points, etc.)?
As a bit of further detail, I have taken the data and merged it using tidyr so that the graphs (4 in total) are displayed in just three columns: the graph title, temperature and pressure. I did this after following a course on ggplot2 on DataCamp, but I'm not sure if this format is suitable when carrying out regression etc. The head of my dataset can be seen here:
https://i.stack.imgur.com/WeaPz.jpg
I am very new to R, so apologies if this is a stupid question and I am using the wrong terms.
Though I agree with @Jaap's comment, polynomial regression is very easy in R. I'll give you the first lines:
x <- c(0.26,3.33,5.25,6.54,7.38,8.1,8.73,9.3,9.81,10.28,10.69,11.08,11.43,11.75,12.05,12.33)
y <- c(16.33,24.6,31.98,38.38,43.3,48.18,53.08,57.99,62.92,67.86,72.81,77.77,82.75,87.75,92.77,97.81)
fit <- lm(y ~ x + I(x^2) + I(x^3) + I(x^4))
Now your polynomial coefficients are in fit$coef (naming the result fit avoids shadowing the lm() function itself); you can extract them and easily plot the fitted line, e.g.:
coefs <- fit$coef
plot(x, y)
lines(x, coefs[1] + coefs[2] * x + coefs[3] * x^2 + coefs[4] * x^3 + coefs[5] * x^4)
The fitted values are also available directly via fit$fitted.values. Build the same polynomial for the second curve and compare the coefficients, not just the "lines".
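Rather than typing the polynomial out by hand, predict() evaluates the fit at any pressures you choose, which directly gives the set-increment comparison the question asks for (the 0.2 increment below is the question's own example):

```r
# Same data as in the answer above
x <- c(0.26,3.33,5.25,6.54,7.38,8.1,8.73,9.3,9.81,10.28,10.69,11.08,11.43,11.75,12.05,12.33)
y <- c(16.33,24.6,31.98,38.38,43.3,48.18,53.08,57.99,62.92,67.86,72.81,77.77,82.75,87.75,92.77,97.81)
fit <- lm(y ~ x + I(x^2) + I(x^3) + I(x^4))

# Temperatures at fixed pressure increments of 0.2,
# staying within the observed pressure range
grid <- data.frame(x = seq(0.4, 12.2, by = 0.2))
grid$temp <- predict(fit, newdata = grid)

# Repeat for the second curve on the same grid, then subtract
# the two temp columns to get the difference curve
```

Keep the grid inside the range of the data: a 4th-order polynomial can behave wildly when extrapolated.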
In Excel, it's pretty easy to fit a logarithmic trend line to a given set of data: just click "Add Trendline" and select "Logarithmic." Switching to R for more power, I am a bit lost as to which function one should use to generate this.
To generate the graph, I used ggplot2 with the following code.
ggplot(data, aes(horizon, success)) + geom_line() + geom_area(alpha=0.3)+
stat_smooth(method='loess')
But this code does local polynomial regression fitting, which is based on averaging out numerous small linear regressions. My question is whether there is a log trend line in R similar to the one used in Excel.
An alternative I am looking for is to get a log equation of the form y = c*ln(x) + b; is there a coef() function to get c and b?
Let my data be:
c(0.599885189,0.588404133,0.577784156,0.567164179,0.556257176,
0.545350172,0.535112897,0.52449292,0.51540375,0.507271336,0.499904325,
0.498851894,0.498851894,0.497321087,0.4964600,0.495885955,0.494068121,
0.492154612,0.490145427,0.486892461,0.482395714,0.477229238,0.471010333)
The above data are the y-points, while the x-points are simply the integers 1:length(y) in increments of 1. In Excel, I can simply plot this and add a logarithmic trend line, and the result would look like this:
With black being the log fit. In R, how would one do this with the above dataset?
I prefer to use base graphics instead of ggplot2:
#some data with a linear model
x <- 1:20
set.seed(1)
y <- 3*log(x)+5+rnorm(20)
#plot data
plot(y~x)
#fit log model
fit <- lm(y~log(x))
#look at result and statistics
summary(fit)
#extract coefficients only
coef(fit)
#plot fit with confidence band
matlines(x=seq(from=1,to=20,length.out=1000),
y=predict(fit,newdata=list(x=seq(from=1,to=20,length.out=1000)),
interval="confidence"))
#some data with a non-linear model
set.seed(1)
y <- log(0.1*x)+rnorm(20,sd=0.1)
#plot data
plot(y~x)
#fit log model
fit <- nls(y~log(a*x),start=list(a=0.2))
#look at result and statistics
summary(fit)
#plot fit
lines(seq(from=1,to=20,length.out=1000),
predict(fit,newdata=list(x=seq(from=1,to=20,length.out=1000))))
You can easily specify alternative smoothing methods (such as lm(), linear least-squares fitting) and an alternative formula:
library(ggplot2)
g0 <- ggplot(dat, aes(horizon, success)) + geom_line() + geom_area(alpha=0.3)
g0 + stat_smooth(method="lm",formula=y~log(x),fill="red")
The confidence bands are automatically included: I changed the color to make them visible since they're very narrow. You can use se=FALSE in stat_smooth to turn them off.
The other answer shows you how to get the coefficients:
coef(lm(success~log(horizon),data=dat))
I can imagine you might next want to add the equation to the graph: see Adding Regression Line Equation and R2 on graph
I'm pretty sure a simple + scale_y_log10() would get you what you want. ggplot2 stats are calculated after transformations, so the loess() would then be computed on the log-transformed data.
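A minimal sketch of that idea (the dat object with horizon/success columns is assumed from the question; here it is simulated):

```r
library(ggplot2)

# Made-up stand-in for the question's dat
set.seed(1)
dat <- data.frame(horizon = 1:50)
dat$success <- exp(0.05 * dat$horizon) * (1 + rnorm(50, sd = 0.05))

# The loess smooth is computed on the log10-transformed y values
p <- ggplot(dat, aes(horizon, success)) +
  geom_line() +
  stat_smooth(method = "loess") +
  scale_y_log10()
```

Note this changes the y axis to a log scale as well; if you want a log-shaped trend line on a linear axis, use the stat_smooth(method = "lm", formula = y ~ log(x)) approach from the other answer.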
I've just written a blog post here that describes how to match Excel's logarithmic curve fitting exactly. The nub of the approach centers around the lm() function:
# Set x and data.to.fit to the independent and dependent variables
data.to.fit <- c(0.5998,0.5884,0.5777,0.5671,0.5562,0.5453,0.5351,0.524,0.515,0.5072,0.4999,0.4988,0.4988,0.4973,0.49,0.4958,0.4940,0.4921,0.4901,0.4868,0.4823,0.4772,0.4710)
x <- c(seq(1, length(data.to.fit)))
data.set <- data.frame(x, data.to.fit)
# Perform a logarithmic fit to the data set
log.fit <- lm(data.to.fit~log(x), data=data.set)
# Print out the intercept, log(x) parameters, R-squared values, etc.
summary(log.fit)
# Plot the original data set
plot(data.set)
# Add the log.fit line with confidence intervals
matlines(predict(log.fit, data.frame(x=x), interval="confidence"))
Hope that helps.
I'm trying to use R to do some modelling. I've started with the BodyWeight dataset (from the nlme package), since I've seen some examples online, just to understand and get used to the commands.
I've come to my final model, with estimates, and I was wondering how to plot these estimates, but I haven't seen anything online.
Is there a way to plot the values of the estimates with a line, and dots for the values of each observation?
Where can I find information about how to do this? Do I have to extract the values myself, or is it possible to just plot the estimates of the model?
I'm only starting with R. Any help is welcome.
Thank you
There is no function that just plots the output of a model, since there are usually many different possible ways of plotting the output.
Take a look at the predict function for whatever model type you are using (for example, linear regressions using lm have a predict.lm function).
Then choose a plotting system (you will likely want different panels for different levels of diet, so use either ggplot2 or lattice). Then see if you can describe more clearly in words how you want the plot to look. Then update your question if you get stuck.
Now we've identified which dataset you are using, here's a possible plot:
#Run your model
library(nlme)  #provides lme() and the BodyWeight dataset
model <- lme(weight ~ Time + Diet, BodyWeight, random = ~ 1 | Rat)
summary(model)
#Predict the values
#predict.lme is a pain because you have to specify which rat
#you are interested in, but we don't want that
#manually predicting things instead
times <- seq.int(0, 65, 0.1)
mcf <- model$coefficients$fixed
predicted <-
mcf["(Intercept)"] +
rep.int(mcf["Time"] * times, nlevels(BodyWeight$Diet)) +
rep(c(0, mcf["Diet2"], mcf["Diet3"]), each = length(times))
prediction_data <- data.frame(
weight = predicted,
Time = rep.int(times, nlevels(BodyWeight$Diet)),
Diet = rep(levels(BodyWeight$Diet), each = length(times))
)
#Draw the plot (using ggplot2)
library(ggplot2)
(p <- ggplot(BodyWeight, aes(Time, weight, colour = Diet)) +
geom_point() +
geom_line(data = prediction_data)
)