I'm plotting linear regressions (lm) in facets, and for each one I want to extract and display the equation y=ax+b together with R². How can I do it?
lmfit <- geom_smooth(method="lm", se = T)
p <- qplot(x, y, data=Tab) + facet_grid(id ~., scales = "free") + lmfit
Within ggplot, there is no direct way to do this. You need to compute the regressions separately for each id and then extract the equation and R^2 from each of those. Put those extracted versions in a dataframe (along with id) and use geom_text to display them.
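A minimal sketch of that approach follows. The original data is not shown, so Tab here is simulated toy data with assumed columns x, y and id, and the label format and placement are only one possible choice.

```r
library(ggplot2)

# Toy data standing in for the asker's Tab (columns x, y, id are assumed).
set.seed(1)
Tab <- data.frame(
  id = rep(c("a", "b", "c"), each = 20),
  x  = rep(1:20, times = 3)
)
Tab$y <- c(2, -1, 0.5)[as.integer(factor(Tab$id))] * Tab$x + rnorm(nrow(Tab))

# Fit one lm per id and build a label "y = ax + b, R^2 = ..." for each.
labels <- do.call(rbind, lapply(split(Tab, Tab$id), function(d) {
  fit <- lm(y ~ x, data = d)
  data.frame(
    id    = d$id[1],
    label = sprintf("y = %.2fx %+.2f, R^2 = %.3f",
                    coef(fit)[["x"]], coef(fit)[["(Intercept)"]],
                    summary(fit)$r.squared)
  )
}))

# Facet by id and write each label into the top-left corner of its panel.
p <- ggplot(Tab, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  facet_grid(id ~ ., scales = "free") +
  geom_text(data = labels, aes(x = -Inf, y = Inf, label = label),
            hjust = -0.1, vjust = 1.5, inherit.aes = FALSE)
```

Calling print(p) draws the plot. Because labels carries the id column, geom_text places each equation in the matching facet.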
I have a model which has been created like this
cube_model <- lm(y ~ x + I(x^2) + I(x^3), data = d.r.data)
I have been using ggplot functions like geom_point to plot the data points and geom_smooth to plot the regression line. Now I am trying to plot fitted vs. observed values. How would I do that? I am fairly unfamiliar with R, so I'm not sure what to use here.
--
EDIT
I ended up doing this
predicted <- predict(cube_model)
ggplot() + geom_point(aes(x, y)) + geom_line(aes(x, predicted))
Is this the correct approach?
What you need to do is use the predict function to generate the fitted values. You can then add them back to your data.
d.r.data$fit <- predict(cube_model)
If you want to plot the predicted values vs the actual values, you can use something like the following.
library(ggplot2)
ggplot(d.r.data) +
  geom_point(aes(x = fit, y = y))
There are several references that come close, but my lines() is producing multiple arcs instead of just one nonlinear curve. It looks like a hammock with a bunch of unwanted lines. How do you generate a simple nonlinear line? Dataset available as Auto.csv at http://www-bcf.usc.edu/~gareth/ISL/data.html.
library(ISLR)
data(Auto)
lm.fit1=lm(mpg~horsepower,data=Auto) #linear
lm.fit2=lm(mpg~horsepower+I(horsepower^2),data=Auto) #add polynomial
plot(Auto$horsepower,Auto$mpg,col=8,pch=1)
abline(lm.fit1,col=2) #linear fit
lines(Auto$horsepower,predict(lm.fit2),col=4) #attempt at nonlinear
lines plots the data in whatever order it happens to be in. As a result, if you don't sort by the x-value first, you'll get a mess of lines going back and forth as the x-value jumps back and forth from one row to the next. Try this, for example:
plot(c(1,3,2,0), c(1,9,4,0), type="l", lwd=7)
lines(0:3, c(0,1,4,9), col='red', lwd=4)
To get a nice curve, sort by horsepower first:
curve.dat = data.frame(x=Auto$horsepower, y=predict(lm.fit2))
curve.dat = curve.dat[order(curve.dat$x),]
lines(curve.dat, col=4)
Whereas, if you don't sort by horsepower, the line zigzags back and forth across the plot, which is the hammock-like tangle described in the question.
You should use poly for your polynomial fit. You can then use curve with predict:
lm.fit2 = lm(mpg ~ poly(horsepower, 2, raw = TRUE), data = Auto) #fit polynomial
#curve passes values to x, see help("curve")
curve(predict(lm.fit2, newdata = data.frame(horsepower = x)), add = TRUE, col = 4)
This also works with nls fits.
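As a rough illustration of the nls case, here is a sketch on simulated data. The exponential model, the data frame d and the start values are assumptions made for this example, not part of the original question.

```r
# Simulated data for illustration only (not from the original post).
set.seed(42)
d <- data.frame(x = seq(1, 10, length.out = 50))
d$y <- 5 * exp(-0.3 * d$x) + rnorm(50, sd = 0.05)

# Fit an exponential decay model; the start values are rough guesses.
nls.fit <- nls(y ~ a * exp(b * x), data = d, start = list(a = 4, b = -0.2))

plot(d$x, d$y, col = 8, pch = 1)
# curve() evaluates the expression on an increasing grid of x values,
# so no manual sorting is needed.
curve(predict(nls.fit, newdata = data.frame(x = x)), add = TRUE, col = 4)
```

The column name in newdata has to match the predictor name used in the model formula (x here, horsepower in the Auto example).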
An alternative, if you don't want to worry about sorting the data frame first, is to use ggplot2. Its geom_smooth() function lets you choose the formula and the type of fit you want to draw:
library(ISLR)
library(ggplot2)
data(Auto)
# horsepower goes on the x axis and mpg on the y axis, matching mpg ~ horsepower
ggplot(Auto, aes(horsepower, mpg)) +
  geom_point() +
  geom_smooth(method="lm", formula = y~x, se=FALSE)+
  geom_smooth(method="lm", formula = y~x+I(x^2), se=FALSE, colour="red")
I have a data-set which has 3 columns: date, amount, and a factor/cluster. For example:
date;amount;cluster_id
02.10.10;-13,86;3
04.10.10;-66,28;3
06.10.10;-14,99;3
25.10.10;-20,96;3
30.10.10;-408,99;3
31.01.11;-29,5;2
07.02.11;-652,85;3
19.09.11;-277,48;3
30.09.11;-6,18;3
03.10.11;-242,47;3
04.11.11;-299,77;3
20.02.12;-367,85;3
03.10.12;-4,99;4
13.09.13;-6,59;4
14.10.13;-1043,46;3
24.10.13;-373,99;3
24.10.13;-1321,91;3
18.12.13;-24,45;4
03.02.14;-66,87;3
30.08.14;-7,6;2
28.10.14;-115;3
13.12.14;-8,99;3
15.12.14;-352,44;3
19.12.14;115;3
08.07.15;-59;2
The following code:
library(MASS)  # provides rlm()
ggplot(data, aes(x=date, y=amount, colour=factor(cluster_id))) +
  stat_smooth(method = "rlm", formula = y ~ x)
simply fits an rlm per group/factor.
How can I combine the separate regression models into one big (added) model, so that I can plot a single "combined" model in an easy way, e.g. without looping over all the rlm models manually?
I have a data set with some points in it and want to fit a line to it. I tried the loess function, but unfortunately I get very strange results. I expect a line that follows the points more closely and spans the whole plot. How can I achieve that?
How to reproduce it:
Download the dataset from https://www.dropbox.com/s/ud32tbptyvjsnp4/data.R?dl=1 (only two kb) and use this code:
load(url('https://www.dropbox.com/s/ud32tbptyvjsnp4/data.R?dl=1'))
lw1 = loess(y ~ x,data=data)
plot(y ~ x, data=data,pch=19,cex=0.1)
lines(data$y,lw1$fitted,col="blue",lwd=3)
Any help is greatly appreciated. Thanks!
You've plotted fitted values against y instead of against x. Also, you will need to order the x values before plotting a line. Try this:
lw1 <- loess(y ~ x,data=data)
plot(y ~ x, data=data,pch=19,cex=0.1)
j <- order(data$x)
lines(data$x[j],lw1$fitted[j],col="red",lwd=3)
Unfortunately the data are no longer available, but an easier way to fit a non-parametric line (locally weighted scatterplot smoothing, or simply LOESS) is to use the following code:
scatter.smooth(y ~ x, span = 2/3, degree = 2)
Note that you can adjust the span and degree parameters to control how smooth the curve is.
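Since the original data cannot be downloaded any more, the sketch below uses the built-in cars data set as a stand-in (an assumption for illustration) to show how span changes the result. The particular span values are arbitrary.

```r
# Built-in 'cars' data used as a stand-in for the unavailable data set.
x <- cars$speed
y <- cars$dist

# Scatterplot with a fairly flexible local quadratic fit (red line).
scatter.smooth(x, y, span = 0.5, degree = 2, lpars = list(col = "red"))

# loess.smooth() returns the smoothed x/y values without plotting;
# a larger span gives a smoother, stiffer curve (blue line).
smooth <- loess.smooth(x, y, span = 1, degree = 1)
lines(smooth, col = "blue")
```

Both functions evaluate the fit on an increasing grid of x values, so the curve is drawn correctly without ordering the data first.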
It may be too late, but you also have options with ggplot2 (and dplyr). First, if you only want to plot a loess line over the points, you can try:
library(ggplot2)
load(url("https://www.dropbox.com/s/ud32tbptyvjsnp4/data.R?dl=1"))
ggplot(data, aes(x, y)) +
geom_point() +
geom_smooth(method = "loess", se = FALSE)
Another way is to call predict() on a loess fit. For instance, here I use dplyr functions to add the predictions as a new column called "loess":
library(dplyr)
data %>%
mutate(loess = predict(loess(y ~ x, data = data))) %>%
ggplot(aes(x, y)) +
geom_point(color = "grey50") +
geom_line(aes(y = loess))
Update: Added a line of code to load the example data provided.
Update 2: Corrected the geom_smooth() function name, per @phi's comment.
The qqmath function makes great caterpillar plots of random effects using the output of lmer from the lme4 package. That is, qqmath is great at plotting the intercepts from a hierarchical model with their errors around the point estimate. An example of the lmer and qqmath functions is below, using the built-in Dyestuff data from the lme4 package. The code produces the hierarchical model and a nice plot using the qqmath function.
library("lme4")
library("lattice") # provides the qqmath() generic
data(package = "lme4")
# Dyestuff
# a balanced one-way classification of Yield
# from samples produced from six Batches
summary(Dyestuff)
# Batch is an example of a random effect
# Fit 1-way random effects linear model
fit1 <- lmer(Yield ~ 1 + (1|Batch), Dyestuff)
summary(fit1)
coef(fit1) #intercept for each level in Batch
# qqplot of the random effects with their variances
# (newer lme4 versions use condVar; postVar is deprecated)
qqmath(ranef(fit1, condVar = TRUE), strip = FALSE)$Batch
The last line of code produces a really nice plot of each intercept with the error around each estimate. But formatting the qqmath function seems to be very difficult, and I've been struggling to format the plot. I've come up with a few questions that I cannot answer, and that I think others could also benefit from if they are using the lmer/qqmath combination:
1. Is there a way to take the qqmath function above and add a few options, such as making certain points empty vs. filled-in, or using different colors for different points? For example, can you make the points for A, B, and C of the Batch variable filled, but the rest of the points empty?
2. Is it possible to add axis labels for each point (maybe along the top or right y axis, for example)?
3. My data has closer to 45 intercepts, so is it possible to add spacing between the labels so they do not run into each other?
MAINLY, I am interested in distinguishing/labeling points on the graph, which seems to be cumbersome/impossible in the qqmath function.
So far, adding any additional option to the qqmath function produces errors that I would not get with a standard plot, so I'm at a loss.
Also, if you feel there is a better package/function for plotting intercepts from lmer output, I'd love to hear it! (for example, can you do points 1-3 using dotplot?)
EDIT: I'm also open to an alternative dotplot if it can be reasonably formatted. I just like the look of a qqmath plot, so I'm starting with a question about that.
One possibility is to use the ggplot2 library to draw a similar graph, which then lets you adjust the appearance of your plot.
First, the ranef object is saved as randoms. Then the variances of the intercepts are saved in the object qq.
randoms<-ranef(fit1, postVar = TRUE)
qq <- attr(ranef(fit1, postVar = TRUE)[[1]], "postVar")
The object rand.interc contains just the random intercepts with their level names.
rand.interc<-randoms$Batch
All objects are put in one data frame. For the error intervals, sd.interc is calculated as 2 times the square root of the variance.
df<-data.frame(Intercepts=randoms$Batch[,1],
sd.interc=2*sqrt(qq[,,1:length(qq)]),
lev.names=rownames(rand.interc))
If you need the intercepts to be ordered by value in the plot, then lev.names should be reordered. This line can be skipped if the intercepts should be ordered by level name.
df$lev.names<-factor(df$lev.names,levels=df$lev.names[order(df$Intercepts)])
This code produces the plot. The points will differ in shape according to the factor levels.
library(ggplot2)
p <- ggplot(df,aes(lev.names,Intercepts,shape=lev.names))
#Added horizontal line at y=0, error bars to points and points with size two
p <- p + geom_hline(yintercept=0) +
  geom_errorbar(aes(ymin=Intercepts-sd.interc, ymax=Intercepts+sd.interc), width=0, color="black") +
  geom_point(size=2)
#Removed the shape legend and with scale_shape_manual point shapes set to 1 and 16
p <- p + guides(shape="none") + scale_shape_manual(values=c(1,1,1,16,16,16))
#Changed appearance of plot (black and white theme) and x and y axis labels
p <- p + theme_bw() + xlab("Levels") + ylab("")
#Final adjustments of plot
p <- p + theme(axis.text.x=element_text(size=rel(1.2)),
axis.title.x=element_text(size=rel(1.3)),
axis.text.y=element_text(size=rel(1.2)),
panel.grid.minor=element_blank(),
panel.grid.major.x=element_blank())
#To put levels on y axis you just need to use coord_flip()
p <- p+ coord_flip()
print(p)
Didzis' answer is great! Just to wrap it up a little bit, I put it into its own function that behaves a lot like qqmath.ranef.mer() and dotplot.ranef.mer(). In addition to Didzis' answer, it also handles models with multiple correlated random effects (like qqmath() and dotplot() do). Comparison to qqmath():
require(lme4) ## for lmer(), sleepstudy
require(lattice) ## for dotplot()
fit <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
ggCaterpillar(ranef(fit, condVar=TRUE)) ## using ggplot2
qqmath(ranef(fit, condVar=TRUE)) ## for comparison
Comparison to dotplot():
ggCaterpillar(ranef(fit, condVar=TRUE), QQ=FALSE)
dotplot(ranef(fit, condVar=TRUE))
Sometimes, it might be useful to have different scales for the random effects - something which dotplot() enforces. When I tried to relax this, I had to change the facetting (see this answer).
ggCaterpillar(ranef(fit, condVar=TRUE), QQ=FALSE, likeDotplot=FALSE)
## re = object of class ranef.mer
ggCaterpillar <- function(re, QQ=TRUE, likeDotplot=TRUE) {
    require(ggplot2)
    f <- function(x) {
        pv   <- attr(x, "postVar")
        cols <- 1:(dim(pv)[1])
        se   <- unlist(lapply(cols, function(i) sqrt(pv[i, i, ])))
        ord  <- unlist(lapply(x, order)) + rep((0:(ncol(x) - 1)) * nrow(x), each=nrow(x))
        pDf  <- data.frame(y=unlist(x)[ord],
                           ci=1.96*se[ord],
                           nQQ=rep(qnorm(ppoints(nrow(x))), ncol(x)),
                           ID=factor(rep(rownames(x), ncol(x))[ord], levels=rownames(x)[ord]),
                           ind=gl(ncol(x), nrow(x), labels=names(x)))

        if(QQ) {  ## normal QQ-plot
            p <- ggplot(pDf, aes(nQQ, y))
            p <- p + facet_wrap(~ ind, scales="free")
            p <- p + xlab("Standard normal quantiles") + ylab("Random effect quantiles")
        } else {  ## caterpillar dotplot
            p <- ggplot(pDf, aes(ID, y)) + coord_flip()
            if(likeDotplot) {  ## imitate dotplot() -> same scales for random effects
                p <- p + facet_wrap(~ ind)
            } else {           ## different scales for random effects
                p <- p + facet_grid(ind ~ ., scales="free_y")
            }
            p <- p + xlab("Levels") + ylab("Random effects")
        }

        p <- p + theme(legend.position="none")
        p <- p + geom_hline(yintercept=0)
        p <- p + geom_errorbar(aes(ymin=y-ci, ymax=y+ci), width=0, colour="black")
        p <- p + geom_point(aes(size=1.2), colour="blue")
        return(p)
    }

    lapply(re, f)
}
Another way to do this is to extract simulated values from the distribution of each of the random effects and plot those. Using the merTools package, it is possible to easily get the simulations from a lmer or glmer object, and to plot them.
library(lme4); library(merTools) ## for lmer(), sleepstudy
fit <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
randoms <- REsim(fit, n.sims = 500)
randoms is now a data frame that looks like:
head(randoms)
  groupFctr groupID        term       mean     median       sd
1   Subject     308 (Intercept)   3.083375   2.214805 14.79050
2   Subject     309 (Intercept) -39.382557 -38.607697 12.68987
3   Subject     310 (Intercept) -37.314979 -38.107747 12.53729
4   Subject     330 (Intercept)  22.234687  21.048882 11.51082
5   Subject     331 (Intercept)  21.418040  21.122913 13.17926
6   Subject     332 (Intercept)  11.371621  12.238580 12.65172
It provides the name of the grouping factor, the level of the factor we are obtaining an estimate for, the term in the model, and the mean, median, and standard deviation of the simulated values. We can use this to generate a caterpillar plot similar to those above:
plotREsim(randoms)
One nice feature of the resulting plot is that the values whose confidence interval does not overlap zero are highlighted in black. You can modify the width of the interval with the level parameter of plotREsim, making the confidence intervals wider or narrower based on your needs.
Yet another way to obtain the desired plot is through the plot_model() command in the sjPlot package. The advantage is that the command returns a ggplot object, so there are many options for adjusting the figure as you wish. I kept the example simple because there are many options to customize the visualisation; just check ?plot_model for all of them.
library(lme4)
library(sjPlot)
#?plot_model
data(Dyestuff, package = "lme4")
summary(Dyestuff)
fit1 <- lmer(Yield ~ 1 + (1|Batch), Dyestuff)
summary(fit1)
plot_model(fit1, type="re",
           vline.color="#A9A9A9", dot.size=1.5,
           show.values=TRUE, value.offset=.2)