I want to plot the overall trend for longitudinal data. I am using the sleepstudy data from the lme4 package to demonstrate my problem.
library("lme4")
library("ggplot2")
p1 <- ggplot(data = sleepstudy, aes(x = Days, y = Reaction, group = Subject))
p1 + geom_line() + geom_point(aes(col = Subject), size = 2)
When I plot the longitudinal trajectory of each individual, I get one line per subject.
Here I am interested in the overall trend across all subjects. For example, the plot shows a general upward trend. In general, this trend could take any form (linear, quadratic, etc.). Is there any way to plot this overall trend?
I tried this, but I got a separate fitted line for each subject instead of the overall trend:
p1 + geom_point() + geom_smooth(method = "lm")
Can anyone help me figure this out?
Thank you
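I'm not sure I understand correctly, but the key is not to set group = Subject in the global aes(), so that geom_smooth() fits all subjects together: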
library("lme4")
library("ggplot2")
ggplot(data = sleepstudy, aes(x = Days, y = Reaction))+
geom_point(aes(colour = Subject), alpha = .3)+
geom_smooth()+
theme(legend.position = "none")
As the message shows, geom_smooth() uses loess by default:
> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
If you want a linear model instead, set the method argument of geom_smooth():
library("lme4")
library("ggplot2")
ggplot(data = sleepstudy, aes(x = Days, y = Reaction))+
geom_point(aes(colour = Subject), alpha = .3)+
geom_smooth(method = "lm")+
labs(title = "Linear Model (LM)")+
theme(legend.position = "none")
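If you also want to keep the individual trajectories, a sketch (not from the original answer) is to leave group = Subject in the global aes() and override it in the smoothing layer with aes(group = 1), so that geom_smooth() pools all subjects into a single fit. The quadratic formula y ~ poly(x, 2) is only an illustrative choice for a non-linear trend; swap in whatever formula you need.

```r
library(lme4)     # provides the sleepstudy data
library(ggplot2)

p <- ggplot(sleepstudy, aes(x = Days, y = Reaction, group = Subject)) +
  # One faint line per subject (grouping inherited from the global aes())
  geom_line(alpha = 0.3) +
  # group = 1 overrides the global grouping, so one curve is fitted to all subjects
  geom_smooth(aes(group = 1), method = "lm", formula = y ~ poly(x, 2)) +
  labs(title = "Individual trajectories with an overall quadratic trend")
print(p)
```

Using method = "loess" (and dropping the formula) gives a non-parametric overall trend instead.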
I am trying to plot a geom_smooth() fit that uses a Gamma error distribution.
library(ggplot2)
data <- data.frame(x = 1:100, y = (1:100 + runif(100, min = 0, max = 50))^2)
p <- ggplot(data, aes(x, y)) +
geom_point() +
geom_smooth(method = 'glm', method.args = list(family = Gamma(link = "log")))
I also want to reverse the y-axis with scale_y_reverse(), but this makes the Gamma fit fail, because the Gamma family can't be applied to negative values. How can I reverse the y-axis for this plot?
p + scale_y_reverse()
Warning message:
Computation failed in `stat_smooth()`:
non-positive values not allowed for the 'Gamma' family
I'm not sure whether there is a built-in way to get the predicted values out of geom_smooth() so that scale_y_reverse() works.
Here's the more conventional way of visualizing a regression model: fit it, predict from it, and plot the predictions.
library(broom)
library(ggplot2)
model <- glm(y ~ x, data = data, family = Gamma(link = "log"))
new <- augment(model, se_fit = TRUE) # .fitted and .se.fit are on the link (log) scale
ggplot(new, aes(x, y)) +
geom_point() +
geom_line(aes(y = exp(.fitted))) + # back-transform to the response scale
geom_line(aes(y = exp(.fitted + .se.fit)), linetype = "dashed") +
geom_line(aes(y = exp(.fitted - .se.fit)), linetype = "dashed") +
scale_y_reverse()
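Another option, not part of the original answer and assuming a ggplot2 version that provides coord_trans(), is to reverse the axis at the coordinate stage instead. Scale transformations such as scale_y_reverse() are applied before the statistic is computed, so the GLM sees negated y values; coordinate transformations are applied afterwards, so geom_smooth() can still fit the Gamma model on the original positive data.

```r
library(ggplot2)

# Same kind of data as in the question (strictly positive y)
set.seed(1)
data <- data.frame(x = 1:100)
data$y <- (data$x + runif(100, min = 0, max = 50))^2

p <- ggplot(data, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "glm", formula = y ~ x,
              method.args = list(family = Gamma(link = "log"))) +
  # Reverse y after the stat has been computed, unlike scale_y_reverse(),
  # which negates y before the GLM is fitted
  coord_trans(y = "reverse")
print(p)
```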
For descriptive plots in RStudio, I would like to add a regression curve to my spaghetti plot. To create the spaghetti plot I used:
library(lattice)
GCIP <- data_head$GCIP
time_since_on <- data_head$time_since_on
Patient <- data_head$Patient
Eye <-data_head$Eye
xyplot(GCIP~time_since_on, groups = Patient, type='b', data=data_head)
Then I wanted to fit a polynomial curve, so I used this code:
plot(time_since_on, GCIP)
lines(lowess(GCIP ~ time_since_on))
What I want is a curve like this lowess curve, but drawn over the spaghetti plot (with the longitudinal data for each subject).
I've tried to use this code:
library(ggplot2)
library(reshape2)
GCIP <- data_head$GCIP
time_since_on <- data_head$time_since_on
Patient.ID <- data_head$Patient.ID
Eye <-data_head$Eye
Visit <-data_head$Visit
Patient<-data_head$Patient
ggplot(data = reprex, aes(x,y)) +
geom_point(alpha=1, size=2) +
aes(colour=Patient.ID) +
geom_text(aes(label=label), size=2, colour='white') +
geom_path(aes(group=Patient.ID))
ggplot(data= reprex, aes(x = time_since_on, y = GCIP)) +
geom_point(size = 2, alpha= 1, aes(color = Patient.ID)) + #colour points by group
geom_path(aes(group = Patient.ID)) + #spaghetti plot
stat_smooth(method = "lm", formula = y ~ x, aes(group = Patient.ID, colour = group)) + #line of best fit by group
ylab("GCIP (volume)") + xlab("time_since_on (months)") +
theme_bw()
But I don't get anything from this.
Could anyone help me, please?
Million Thanks.
Lili
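Since the original spaghetti plot was drawn with lattice, one way to overlay a single pooled curve is a custom panel function: draw the grouped lines with panel.xyplot(), then fit one loess curve to all points with panel.loess(). The sketch below uses a small simulated data_head (hypothetical values, reusing the column names from the question) because the real data are not shown.

```r
library(lattice)

# Hypothetical stand-in for data_head: 5 patients, 8 visits each
set.seed(2)
visits <- c(0, 2, 4, 6, 9, 12, 18, 24)
data_head <- data.frame(
  Patient = factor(rep(paste0("P", 1:5), each = length(visits))),
  time_since_on = rep(visits, times = 5)
)
data_head$GCIP <- rep(rnorm(5, mean = 70, sd = 5), each = length(visits)) -
  0.4 * data_head$time_since_on + rnorm(nrow(data_head), sd = 1.5)

p <- xyplot(GCIP ~ time_since_on, groups = Patient, data = data_head, type = "b",
            panel = function(x, y, ...) {
              # Spaghetti: with groups present, panel.xyplot() calls panel.superpose()
              panel.xyplot(x, y, ...)
              # One loess curve fitted to all patients pooled together
              panel.loess(x, y, col = "black", lwd = 2)
            })
print(p)
```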
I have data that looks like this:
height <- c(1,2,3,4,2,4,6,8)
weight <- c(12,13,14,15,22,23,24,25)
type <- c("Wheat","Wheat","Wheat","Wheat","Rice","Rice","Rice","Rice")
set <- c(1,1,1,1,2,2,2,2)
dat <- data.frame(set,type,height,weight)
I fit an lmer model in R with set as a random effect:
library(lme4)
mod <- lmer(weight ~ height + type + (1|set), data = dat)
Now I want to plot the model estimates as a regression line, with weight on the x-axis and height on the y-axis, faceted by type (facet_grid(~ type)).
I use predict() as follows:
dat$pred <- predict(mod, type = "response")
I want to produce a ggplot that looks like this:
ggplot(dat,aes(x = weight, y = height)) +
geom_point() + geom_smooth(method="lm", fill=NA) + facet_grid(~ type, scales = "free")
However, predict() returns only a single vector (one predicted weight per row). How do I plot that to get the same result as above? Or do I have to store two different sets of predictions and then plug them into x and y of ggplot?
I can adapt your plot to show raw vs. predicted values like this:
ggplot(dat,aes(y = height)) +
geom_point(aes(x = weight)) +
geom_line(aes(x = pred)) +
facet_grid(~ type, scales = "free")
In your example plot though you have weight, the outcome variable in your model, on the x-axis, which is confusing. Normally you would have the outcome/predicted variable on the y-axis, so I would have plotted your model predictions like:
ggplot(dat,aes(x = height)) +
geom_point(aes(y = weight)) +
geom_line(aes(y = pred)) +
facet_grid(~ type, scales = "free")
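predict() on an lmer fit returns one value per row, and by default it includes each set's estimated random intercept. Passing re.form = NA instead gives population-level predictions from the fixed effects only. The sketch below (not part of the original answer) draws both on the height-on-x layout suggested above. With only two sets, each confined to a single type, the random-intercept variance will likely be estimated at or near zero, in which case the two lines coincide.

```r
library(lme4)
library(ggplot2)

dat <- data.frame(
  set    = c(1, 1, 1, 1, 2, 2, 2, 2),
  type   = c("Wheat", "Wheat", "Wheat", "Wheat", "Rice", "Rice", "Rice", "Rice"),
  height = c(1, 2, 3, 4, 2, 4, 6, 8),
  weight = c(12, 13, 14, 15, 22, 23, 24, 25)
)
mod <- lmer(weight ~ height + type + (1 | set), data = dat)

# Conditional predictions: fixed effects plus each set's estimated random intercept
dat$pred_cond <- predict(mod)
# Population-level predictions: random effects set to zero (fixed effects only)
dat$pred_pop <- predict(mod, re.form = NA)

p <- ggplot(dat, aes(x = height)) +
  geom_point(aes(y = weight)) +
  geom_line(aes(y = pred_cond)) +
  geom_line(aes(y = pred_pop), linetype = "dashed") +
  facet_grid(~ type, scales = "free")
print(p)
```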
I have a quadratic regression model, and I would like to add the model's fitted regression line to a scatter plot, preferably with ggplot2. I can draw the scatter plot, but when I use stat_smooth() to specify the formula, I get the following warnings and the fitted line is not drawn:
Warning messages:
1: 'newdata' had 80 rows but variables found have 24 rows
2: Computation failed in stat_smooth():
arguments imply differing number of rows: 80, 24
My code is below. Can someone please tell me what I should do differently to get the fitted regression line on the scatter plot with ggplot?
Code:
library(gamair)
library(ggplot2)
data(hubble)
names(hubble)[names(hubble) == "y"] <- c("velocity")
names(hubble)[names(hubble) == "x"] <- c("distance")
hubble$distance.sqr <- hubble$distance^2
model2.formula <- hubble$velocity ~ hubble$distance +
hubble$distance.sqr - 1
model2.hbl <- lm(model2.formula, data = hubble)
summary(model2.hbl)
model2.sp <- ggplot(hubble, aes(x = distance, y = velocity)) +
geom_point() + labs(title = "Scatter Plot between Distance & Velocity",
x = "Distance", y = "Velocity")
model2.sp + stat_smooth(method = "lm", formula = hubble$velocity ~
hubble$distance + hubble$distance.sqr - 1)
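The issue is how you specify the formula. Inside stat_smooth(), the formula has to be written in terms of the aesthetics x and y, not the original columns: hubble$velocity and hubble$distance have 24 rows, while stat_smooth() predicts on an 80-point grid, which is what the warnings are about. For the squared term you can use I(x^2) (or poly(x, 2, raw = TRUE)). For example, to reproduce model2.hbl (no intercept):
ggplot(hubble, aes(x = distance, y = velocity)) +
geom_point() +
stat_smooth(method = "lm",
formula = y ~ x + I(x^2) - 1) + # x and y refer to the distance and velocity aesthetics
labs(x = "Distance", y = "Velocity")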
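To check that the corrected stat_smooth() call draws the same curve as a model fitted with lm(), one can refit the model with bare column names and a data argument (instead of hubble$...) and compare its predictions with the curve ggplot computes. A sketch with simulated data standing in for hubble (the numbers are hypothetical):

```r
library(ggplot2)

# Hypothetical stand-in for the renamed hubble data (24 galaxies)
set.seed(42)
hub <- data.frame(distance = runif(24, min = 1, max = 20))
hub$velocity <- 60 * hub$distance + 0.5 * hub$distance^2 + rnorm(24, sd = 50)

# Fit with column names and data =, not hubble$... references
fit <- lm(velocity ~ distance + I(distance^2) - 1, data = hub)

p <- ggplot(hub, aes(x = distance, y = velocity)) +
  geom_point() +
  # Same model, written in terms of the x and y aesthetics
  stat_smooth(method = "lm", formula = y ~ x + I(x^2) - 1)
print(p)
```

Comparing predict(fit, newdata = data.frame(distance = layer_data(p, 2)$x)) with layer_data(p, 2)$y should show the two curves agree.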
Here is a minimal working example based on the mpg dataset:
library(ggplot2)
ggplot(mpg, aes(x = hwy, y = displ)) +
geom_point(shape = 1) +
geom_smooth(method = lm, se = FALSE)
I am trying to plot the model predictions from a binary-choice glm against the empirical probability, using data from the Titanic. To show differences across class and sex I am using faceting, but there are two things I can't quite figure out. First, I'd like to restrict the loess curve to lie between 0 and 1, but if I add ylim(c(0,1)) to the end of the plot, the ribbon around the loess curve gets cut off wherever one side of it is outside the bounds. Second, in each facet I'd like to draw a line with slope 1 (i.e. along y = x) from the minimum x-value (the glm predicted probability) to the maximum x-value, so as to show the glm predicted probability.
#info on this data http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3info.txt
load(url('http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3.sav'))
titanic <- titanic3[ ,-c(3,8:14)]; rm(titanic3)
titanic <- na.omit(titanic) #probably missing completely at random
titanic$age <- as.numeric(titanic$age)
titanic$sibsp <- as.integer(titanic$sibsp)
titanic$survived <- as.integer(titanic$survived)
training.df <- titanic[sample(nrow(titanic), nrow(titanic) / 2), ]
validation.df <- titanic[!(row.names(titanic) %in% row.names(training.df)), ]
glm.fit <- glm(survived ~ sex + sibsp + age + I(age^2) + factor(pclass) + sibsp:sex,
family = binomial(link = "probit"), data = training.df)
glm.predict <- predict(glm.fit, newdata = validation.df, se.fit = TRUE, type = "response")
plot.data <- data.frame(mean = glm.predict$fit, response = validation.df$survived,
class = validation.df$pclass, sex = validation.df$sex)
library(ggplot2)
ggplot(data = plot.data, aes(x = as.numeric(mean), y = as.integer(response))) + geom_point() +
stat_smooth(method = "loess", formula = y ~ x) +
facet_wrap( ~ class + sex, scales = "free") + ylim(c(0,1)) +
xlab("Predicted Probability of Survival") + ylab("Empirical Survival Rate")
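The answer to your first question is to use coord_cartesian(ylim = c(0,1)) instead of ylim(0,1); this is a moderately frequently asked question. ylim() sets the scale limits, so any values outside them (including the edges of the confidence ribbon) are set to NA, whereas coord_cartesian() only zooms the view and leaves the data untouched.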
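A small sketch of the difference, using simulated predicted probabilities and 0/1 outcomes as a stand-in for plot.data (hypothetical values, no faceting): with coord_cartesian() the smoother's output, ribbon included, is identical to the plot without limits, whereas ylim() censors any ribbon values that fall outside [0, 1].

```r
library(ggplot2)

# Hypothetical calibration data: predicted probabilities and 0/1 outcomes
set.seed(3)
plot.data <- data.frame(mean = runif(300))
plot.data$response <- rbinom(300, size = 1, prob = plot.data$mean)

p_base <- ggplot(plot.data, aes(x = mean, y = response)) +
  geom_point() +
  stat_smooth(method = "loess", formula = y ~ x)

# Scale limits: values outside [0, 1] (including ribbon edges) become NA
p_scale <- p_base + ylim(0, 1)
# Coordinate limits: the data are untouched; the view is only zoomed
p_coord <- p_base + coord_cartesian(ylim = c(0, 1))
print(p_coord)
```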
For your second question, there may be a way to do it within ggplot but it was easier for me to summarize the data externally:
g0 <- ggplot(data = plot.data, aes(x = mean, y = response)) + geom_point() +
stat_smooth(method = "loess") +
facet_wrap( ~ class + sex, scales = "free") +
coord_cartesian(ylim=c(0,1))+
labs(x="Predicted Probability of Survival",
y="Empirical Survival Rate")
(I shortened your code slightly by eliminating some default values and using labs.)
library(plyr) # for ddply() and summarise()
ss <- ddply(plot.data, c("class","sex"), summarise, minx = min(mean), maxx = max(mean))
g0 + geom_segment(data=ss,aes(x=minx,y=minx,xend=maxx,yend=maxx),
colour="red",alpha=0.5)
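plyr is retired, so if you'd rather not load it, the same per-facet summary can be built with base R's aggregate() and merge(). A sketch with a simulated stand-in for plot.data (hypothetical values):

```r
library(ggplot2)

# Hypothetical stand-in for plot.data (predicted probability, outcome, class, sex)
set.seed(4)
n <- 600
plot.data <- data.frame(
  mean  = runif(n),
  class = sample(1:3, n, replace = TRUE),
  sex   = sample(c("female", "male"), n, replace = TRUE)
)
plot.data$response <- rbinom(n, size = 1, prob = plot.data$mean)

# Per-facet range of the predicted probability, using base R instead of plyr::ddply()
ss <- merge(
  setNames(aggregate(mean ~ class + sex, data = plot.data, FUN = min), c("class", "sex", "minx")),
  setNames(aggregate(mean ~ class + sex, data = plot.data, FUN = max), c("class", "sex", "maxx")),
  by = c("class", "sex")
)

g <- ggplot(plot.data, aes(x = mean, y = response)) +
  geom_point() +
  stat_smooth(method = "loess", formula = y ~ x) +
  facet_wrap(~ class + sex, scales = "free") +
  coord_cartesian(ylim = c(0, 1)) +
  # y = x reference line, drawn only over each facet's range of predictions
  geom_segment(data = ss, aes(x = minx, y = minx, xend = maxx, yend = maxx),
               colour = "red", alpha = 0.5)
print(g)
```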