I am trying to fit a quadratic curve over my spaghetti plot. At first I did it with ggplot only, like this:
library(ggplot2)
library(reshape2)
GCIP <- data_head$GCIP
Patient.ID <- data_head$Patient.ID
Eye <-data_head$Eye
Visit <-data_head$Visit
Patient<-data_head$Patient
data_head$time_since_on <- as.numeric(as.character(data_head$time_since_on))
ggplot(data = data_head, aes(x= time_since_on, y=GCIP)) +
geom_point(alpha=1, size=2) +
aes(colour=Patient.ID) +
geom_path(aes(group = Patient.ID)) # unquoted: a quoted 'Patient.ID' is a constant, giving one path through all points
ggplot(data= data_head, aes(x = time_since_on, y = GCIP)) +
geom_point(size = 2, alpha= 1, aes(color = Patient.ID)) + #colour points by group
geom_path(aes(group = Patient.ID)) + #spaghetti plot
stat_smooth(method = "lm", formula = y ~ poly(x,2)) + #one quadratic fit pooled over all patients (no grouping)
ylab("GCIP (volume)") + xlab("time_since_on (months)") +
theme_bw()
The problem is that I am not sure this code takes into account that each line contains repeated timepoints from a single patient, so the fitted curve should account for that as well.
Could you please tell me if this is correct?
Here you can see the graph I get
I am not sure, and maybe it is better to fit a mixed-effects (lme) model, but in that case I don't know how to include the quadratic term in the model.
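A minimal sketch of how the quadratic term can go into a mixed model, assuming the lme4 package (which the question also loads) and a small simulated data set standing in for data_head. The column names mirror the question, but the values are made up. I() keeps ^ as arithmetic inside the formula, and (1 | Patient.ID) adds a random intercept per patient.

```r
library(lme4)

# Hypothetical stand-in for data_head: 10 patients, 5 visits each
set.seed(123)
sim <- data.frame(
  Patient.ID = factor(rep(sprintf("P%02d", 1:10), each = 5)),
  time_since_on = rep(c(0, 6, 12, 24, 36), times = 10)
)
# Patient-specific offsets (random intercepts) plus a quadratic time trend
patient_offset <- rnorm(10, sd = 5)[as.integer(sim$Patient.ID)]
sim$GCIP <- 70 - 0.8 * sim$time_since_on + 0.012 * sim$time_since_on^2 +
  patient_offset + rnorm(nrow(sim), sd = 1.5)

# Quadratic fixed effect of time plus a random intercept for each patient
quad_mm <- lmer(GCIP ~ time_since_on + I(time_since_on^2) + (1 | Patient.ID),
                data = sim)
summary(quad_mm)
fixef(quad_mm)  # intercept, linear and quadratic coefficients
```

If patients also differ in their rate of change, (1 + time_since_on | Patient.ID) would add a random slope as well; whether the data support that is a modelling decision.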
I also did this:
data_head <- read.csv("/Users/adrianaroca-fernandez/Desktop/Analysis/Long_100418_2/N=lit.csv", sep=";", dec=",")
library(ggplot2)
library(reshape2)
library(lme4)
library(lsmeans)
GCIP <- data_head$GCIP
Patient.ID <- data_head$Patient.ID
Eye <-data_head$Eye
Visit <-data_head$Visit
Patient<-data_head$Patient
data_head$time_since_on <- as.numeric(as.character(data_head$time_since_on))
time_since_on <-data_head$time_since_on
time_since_on2 <- time_since_on^2
quadratic.model <-lm(GCIP ~ time_since_on + time_since_on2)
summary(quadratic.model)
time_since_onvalues <- seq(0, 250, 0.1)
predictedGCIP <- predict(quadratic.model,list(time_since_on=time_since_onvalues, time_since_on2=time_since_onvalues^2))
plot(time_since_on, GCIP, pch=16, xlab = "time_since_on (months)", ylab = "GCIP", cex.lab = 1.3, col = "blue")
lines(time_since_onvalues, predictedGCIP, col = "darkgreen", lwd = 3)
The problem is that I still cannot add (1|Patient.ID) as a random effect. I also lose my spaghetti plot in this case and get only the points. Here is the result:
Which approach do you think is better, and how should I code this?
Thanks.
lili
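To keep the spaghetti plot and overlay a mixed-model curve, one option is to predict from the fixed effects only on a fine time grid and add that as an extra ggplot2 layer. This is a sketch on simulated data (hypothetical values, same column names as data_head), assuming lme4 and a ggplot2 version recent enough to accept linewidth (3.4 or later).

```r
library(ggplot2)
library(lme4)

# Hypothetical simulated data in the shape of data_head
set.seed(123)
sim <- data.frame(
  Patient.ID = factor(rep(sprintf("P%02d", 1:10), each = 5)),
  time_since_on = rep(c(0, 6, 12, 24, 36), times = 10)
)
patient_offset <- rnorm(10, sd = 5)[as.integer(sim$Patient.ID)]
sim$GCIP <- 70 - 0.8 * sim$time_since_on + 0.012 * sim$time_since_on^2 +
  patient_offset + rnorm(nrow(sim), sd = 1.5)

quad_mm <- lmer(GCIP ~ time_since_on + I(time_since_on^2) + (1 | Patient.ID),
                data = sim)

# Population-level curve: re.form = NA drops the random intercepts
grid <- data.frame(time_since_on = seq(min(sim$time_since_on),
                                       max(sim$time_since_on), length.out = 100))
grid$GCIP <- predict(quad_mm, newdata = grid, re.form = NA)

p <- ggplot(sim, aes(x = time_since_on, y = GCIP)) +
  geom_point(aes(colour = Patient.ID), size = 2) +
  geom_line(aes(group = Patient.ID, colour = Patient.ID)) +  # spaghetti lines
  geom_line(data = grid, colour = "black", linewidth = 1.2) + # fixed-effects curve
  labs(x = "time_since_on (months)", y = "GCIP (volume)") +
  theme_bw()
p
```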
How do I plot a log linear model in R?
Currently I am doing this, but I am not sure it is the right or most efficient way:
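library(ggplot2)
library(ggthemes) # for theme_economist()
data(food) # food data set; assumed to come from an already-loaded package such as PoEdata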
model1 <- lm(food_exp~log(income), data = food)
temp_var <- predict(model1, interval="confidence")
new_df <- cbind(food, temp_var)
head(new_df)
ggplot(new_df, aes(x = income, y = food_exp))+
geom_point() +
geom_smooth(aes(y=lwr), color = "red", linetype = "dashed")+
geom_smooth(aes(y=upr), color = "red", linetype = "dashed")+
geom_smooth(aes(y = fit), color = "blue")+
theme_economist()
You can use geom_smooth and put your formula in directly. It should give the same curve as your fit (which you can check by plotting that as well):
ggplot(new_df, aes(x = income, y = food_exp))+
geom_point() +
geom_point(aes(y=fit), color="red") + #your original fit
geom_smooth(method=lm, formula=y~log(x)) #ggplot fit
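One way to check that claim is to pull the computed smooth out of the plot with layer_data() and compare it with predict() on the lm fit. The sketch below uses simulated income and food_exp values (not the real food data set), so only the agreement between the two carries over, not the numbers.

```r
library(ggplot2)

# Hypothetical simulated data with the same column names as the food data set
set.seed(454)
food <- data.frame(income = rlnorm(100, meanlog = 3, sdlog = 0.5))
food$food_exp <- 10 + 8 * log(food$income) + rnorm(100, sd = 2)

model1 <- lm(food_exp ~ log(income), data = food)

p <- ggplot(food, aes(x = income, y = food_exp)) +
  geom_point() +
  geom_smooth(method = lm, formula = y ~ log(x))

# layer_data(p, 2) holds the computed smooth: an x grid and fitted y values
smooth_df <- layer_data(p, 2)
manual_fit <- predict(model1, newdata = data.frame(income = smooth_df$x))
all.equal(unname(manual_fit), smooth_df$y)  # TRUE if both fits agree
```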
If you don't care about extracting the parameters and just want the plot, you can plot directly in ggplot2.
Some fake data for plotting:
library(tidyverse)
set.seed(454)
income <- VGAM::rpareto(n = 100, scale = 20, shape = 2)*1000
food_exp <- rnorm(100, income*.3+.1, 3)
food <- data.frame(income, food_exp)
Now within ggplot2, use the geom_smooth function and specify that you want a linear model. Additionally, you can directly transform the income in the aes argument:
ggplot(food, aes(x = log(income), y = food_exp))+
geom_point()+
geom_smooth(method = "lm")+
theme_bw()+
labs(
title = "Log Linear Model Food Expense as a Function of Log(income)",
x = "Log(Income)",
y = "Food Expenses"
)
This works for confidence intervals, but to add prediction intervals you'll need to do what you did earlier: fit the model and generate the prediction intervals yourself.
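A sketch of that step, assuming simulated data in place of the food data set: predict() with interval = "prediction" on a grid of incomes, drawn with geom_ribbon() and geom_line() rather than smoothing the stored bounds again.

```r
library(ggplot2)

# Hypothetical simulated data with the same column names as the food data set
set.seed(454)
food <- data.frame(income = rlnorm(100, meanlog = 3, sdlog = 0.5))
food$food_exp <- 10 + 8 * log(food$income) + rnorm(100, sd = 2)

model1 <- lm(food_exp ~ log(income), data = food)

# 95% prediction interval on an evenly spaced grid of incomes
grid <- data.frame(income = seq(min(food$income), max(food$income), length.out = 200))
pred <- predict(model1, newdata = grid, interval = "prediction", level = 0.95)
grid <- cbind(grid, pred)  # adds columns fit, lwr, upr

p <- ggplot(food, aes(x = income, y = food_exp)) +
  geom_ribbon(data = grid, aes(x = income, ymin = lwr, ymax = upr),
              inherit.aes = FALSE, fill = "grey80") +
  geom_point() +
  geom_line(data = grid, aes(y = fit), colour = "blue") +
  theme_bw()
p
```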
Related to this question (Plotting a regression line through the origin), I want to force a geom_smooth(method="loess") call through the origin (0). For geom_smooth(method="lm"), this is possible by specifying the formula in the call, i.e. geom_smooth(method=lm, formula=y~x-1). What would be the equivalent for geom_smooth(method="loess")?
This is an odd thing to want to do. A loess regression is a locally adaptive fit, so you cannot constrain it to pass through the origin unless you include in your regression a heavily weighted point (or tight cluster of points) at the origin. This is a bit artificial at best.
If you can expand on what you are trying to achieve and what your data represent, there may be a better option. In the meantime, you could do what you are asking like this.
First, let's set up a simple example:
library(ggplot2)
set.seed(1)
df <- data.frame(x = 0:10, y = rnorm(11, 0:10) + 5)
p <- ggplot(df, aes(x, y)) +
geom_point() +
coord_cartesian(xlim = c(0, 10), ylim = c(0, 20)) +
theme_bw()
Our standard geom_smooth call would look like this:
p + geom_smooth(formula = y ~ x, method = "loess")
And to force it through the origin we can do:
p + geom_smooth(data = rbind(df, data.frame(x = 0, y = 0)),
formula = y ~ x,
aes(weight = c(rep(1, nrow(df)), 100)),
method = "loess")
Created on 2020-12-13 by the reprex package (v0.3.0)
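The same weighting trick can be checked outside ggplot2 with stats::loess(), which is what geom_smooth(method = "loess") fits. This sketch reuses the example data above and compares the predicted value at x = 0 with and without the heavily weighted origin point; the weighted fit is pulled towards zero, but generally not exactly to it.

```r
# Same example data as above
set.seed(1)
df <- data.frame(x = 0:10, y = rnorm(11, 0:10) + 5)

# Plain loess fit, then the same fit with an extra origin point weighted 100x
fit_plain <- loess(y ~ x, data = df)
df_aug <- rbind(df, data.frame(x = 0, y = 0))
fit_origin <- loess(y ~ x, data = df_aug, weights = c(rep(1, nrow(df)), 100))

# Predicted values at x = 0 for both fits
predict(fit_plain, newdata = data.frame(x = 0))
predict(fit_origin, newdata = data.frame(x = 0))
```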
I am trying to compare, on one plot, four relationships that share a similar x-axis.
I can plot the regression line, but I have no idea how to show the equation and/or combine all four plots into one.
Here is the basic foundation of my code (sorry if it is basic or clumsy; I am just beginning):
library(ggplot2)
library(cowplot)
p1 <- ggplot(NganokeData, aes(x=Depth,y=LCU1)) + geom_point() +
labs(x ='Depths (cm)', y ='Density (Hu)', title = 'Density Regression of Lake Nganoke Core 1') +
ylim(1,2)
p2 <- ggplot(NganokeData, aes(x=Depth,y=LCU2)) + geom_point() +
labs(x ='Depths (cm)', y ='Density (Hu)', title = 'Density Regression of Lake Nganoke Core 2') +
ylim(1,2)
p3 <- ggplot(NganokeData, aes(x=Depth,y=LCU3)) + geom_point() +
labs(x ='Depths (cm)', y ='Density (Hu)', title = 'Density Regression of Lake Nganoke Core 3') +
ylim(1,2)
p4 <- ggplot(NganokeData, aes(x=Depth,y=LCU4)) + geom_point() +
labs(x ='Depths (cm)', y ='Density (Hu)', title = 'Density Regression of Lake Nganoke Core 4') +
ylim(1,2)
p3 + stat_smooth(method = "lm", formula = y ~ poly(x, 3), size = 1) #Adds polynomial regression
It looks like your variable of interest (the core: LCU1, LCU2, LCU3, LCU4) is stored in the column names. You can use gather from the tidyr package to reshape the data frame into long format:
library(tidyr)
long_data <- gather(NganokeData, key = "core", value = "density",
LCU1, LCU2, LCU3, LCU4)
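In current tidyr, gather() is superseded by pivot_longer(), which performs the same reshaping. A sketch with a hypothetical stand-in for NganokeData (random densities, same column names):

```r
library(tidyr)

# Hypothetical stand-in for NganokeData: one Depth column plus four core columns
set.seed(7)
NganokeData <- data.frame(
  Depth = seq(0, 95, by = 5),
  LCU1 = runif(20, 1, 2), LCU2 = runif(20, 1, 2),
  LCU3 = runif(20, 1, 2), LCU4 = runif(20, 1, 2)
)

# One row per (Depth, core) pair: core names go to "core", values to "density"
long_data <- pivot_longer(NganokeData, cols = c(LCU1, LCU2, LCU3, LCU4),
                          names_to = "core", values_to = "density")
head(long_data)
```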
And then use facet_grid from the ggplot2 package to divide your plot into the four facets you are looking for.
p <- ggplot(long_data, aes(x=Depth,y=density)) +
geom_point() +
labs(x ='Depths (cm)', y ='Density (Hu)',
title = 'Density Regression of Lake Nganoke Cores') +
ylim(1,2) +
facet_grid(rows = vars(core)) #can also use cols instead
p + stat_smooth(method = "lm", formula = y ~ poly(x, 3), size = 1)
Your code is fine, but as a beginner I highly recommend taking a few minutes to learn the tidyr package. ggplot2 is built on the concepts of tidy data, and you'll find it much easier to make your visualizations if you reshape the data frame into the format you need before plotting it.
https://tidyr.tidyverse.org/index.html
EDIT:
To add an annotation showing the regression equation, I adapted code from this blog post by Jodie Burchell:
http://t-redactyl.io/blog/2016/05/creating-plots-in-r-using-ggplot2-part-11-linear-regression-plots.html
First, though, you cannot get a displayable regression equation from the poly function as it is used in your formula. By default poly() builds orthogonal polynomials, which avoid collinearity, but the drawback is that the coefficients no longer form an easily interpretable equation with x, x squared and x cubed as regressors.
So we'll have to change the lm fit formula to
y ~ poly(x, 3, raw = TRUE)
which will fit the raw polynomials and give you the regression equation you are looking for.
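A quick way to see this trade-off: both parameterizations give identical fitted values, but only the raw one has coefficients that read directly as an equation in x, x^2 and x^3. The data are simulated, for illustration only.

```r
# Hypothetical data following a cubic trend
set.seed(10)
d <- data.frame(x = seq(0, 50, length.out = 40))
d$y <- 1.2 + 0.03 * d$x - 0.001 * d$x^2 + 1e-5 * d$x^3 + rnorm(40, sd = 0.05)

m_orth <- lm(y ~ poly(x, 3), data = d)              # orthogonal basis (default)
m_raw  <- lm(y ~ poly(x, 3, raw = TRUE), data = d)  # plain x, x^2, x^3

all.equal(fitted(m_orth), fitted(m_raw))  # same curve
coef(m_orth)  # not interpretable as an equation in x
coef(m_raw)   # intercept and coefficients of x, x^2, x^3
```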
You will have to adjust the x and y position values to choose where the annotation goes on the graph, since I don't have your data, but here's the custom function you need:
equation <- function(x) {
# x: an lm fit with raw cubic terms, e.g. lm(y ~ poly(x, 3, raw = TRUE))
lm_coef <- list(a = round(coef(x)[1], digits = 2),
b = round(coef(x)[2], digits = 2),
c = round(coef(x)[3], digits = 2),
d = round(coef(x)[4], digits = 2),
r2 = round(summary(x)$r.squared, digits = 2))
# Build a plotmath expression and return it as a string for annotate(parse = TRUE)
lm_eq <- substitute(italic(y) == a + b %.% italic(x) + c %.% italic(x)^2 + d %.% italic(x)^3*","~~italic(R)^2~"="~r2,lm_coef)
as.character(as.expression(lm_eq))
}
Then just add the annotation to your plot, adjust the x and y parameters as needed, and you'll be set:
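fit <- lm(LCU3 ~ poly(Depth, 3, raw = TRUE), data = NganokeData) # the fit to annotate; core 3 as an example
p3 +
stat_smooth(method = "lm", formula = y ~ poly(x, 3, raw = TRUE), size = 1) +
annotate("text", x = 1, y = 10, label = equation(fit), parse = TRUE) # adjust x and y to your data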
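Because annotate() places the same text in every facet, a per-core equation on the faceted plot needs one label per facet instead. A sketch, assuming long-format data with Depth, core and density columns (simulated here): fit one raw cubic per core, build the plotmath strings in a small data frame that includes the faceting column, and draw them with geom_text(parse = TRUE). It uses signif() rather than round() so that small higher-order coefficients don't display as 0; the label positions x = 50, y = 1.95 are placeholders to adjust.

```r
library(ggplot2)

# Hypothetical long-format data standing in for long_data
set.seed(42)
long_data <- data.frame(
  Depth = rep(seq(0, 100, by = 5), times = 4),
  core  = rep(c("LCU1", "LCU2", "LCU3", "LCU4"), each = 21)
)
long_data$density <- 1.5 + 0.003 * long_data$Depth -
  2e-5 * long_data$Depth^2 + rnorm(nrow(long_data), sd = 0.03)

# One raw cubic fit per core, turned into a plotmath string
eq_labels <- do.call(rbind, lapply(split(long_data, long_data$core), function(d) {
  fit <- lm(density ~ poly(Depth, 3, raw = TRUE), data = d)
  cf  <- signif(coef(fit), 3)
  r2  <- round(summary(fit)$r.squared, 2)
  data.frame(
    core  = d$core[1],  # faceting column, so each label lands in its own panel
    label = sprintf(paste0("italic(y) == %g + %g %%.%% italic(x) + %g %%.%% italic(x)^2",
                           " + %g %%.%% italic(x)^3 * ',' ~~ italic(R)^2 ~ '=' ~ %g"),
                    cf[1], cf[2], cf[3], cf[4], r2),
    x = 50, y = 1.95  # placeholder positions
  )
}))

p <- ggplot(long_data, aes(x = Depth, y = density)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ poly(x, 3, raw = TRUE)) +
  geom_text(data = eq_labels, aes(x = x, y = y, label = label),
            parse = TRUE, size = 3, inherit.aes = FALSE) +
  facet_grid(rows = vars(core))
p
```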
For descriptive plots in RStudio, I would like to fit a regression curve over my spaghetti plot. To create the spaghetti plot I used:
library(lattice)
GCIP <- data_head$GCIP
time_since_on <- data_head$time_since_on
Patient <- data_head$Patient
Eye <-data_head$Eye
xyplot(GCIP~time_since_on, groups = Patient, type='b', data=data_head)
and I got this plot:
Then I wanted to fit a polynomial curve, so I used this code:
plot(time_since_on, GCIP)
lines(lowess(GCIP ~ time_since_on))
This is what I got:
What I want is a curve like the one in the second image, but drawn over the spaghetti plot (with the longitudinal data for each subject).
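In lattice, one option is a custom panel function: panel.xyplot() draws the grouped lines as before, and panel.loess() adds a single smooth through all points. This is a sketch on hypothetical data with the same columns as data_head; panel.loess() fits a robust loess curve by default, which is similar to, but not identical with, lowess().

```r
library(lattice)

# Hypothetical data with the same columns as data_head
set.seed(2)
data_head <- data.frame(
  Patient = rep(1:8, each = 6),
  time_since_on = rep(c(0, 3, 6, 12, 18, 24), times = 8)
)
data_head$GCIP <- 70 - 0.5 * data_head$time_since_on +
  rep(rnorm(8, sd = 4), each = 6) + rnorm(nrow(data_head))

# Custom panel: grouped spaghetti lines first, then one pooled smooth on top
p <- xyplot(GCIP ~ time_since_on, groups = Patient, type = "b", data = data_head,
            panel = function(x, y, ...) {
              panel.xyplot(x, y, ...)                    # one line per patient
              panel.loess(x, y, col = "black", lwd = 3)  # single curve over all points
            })
p
```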
I've tried to use this code:
library(ggplot2)
library(reshape2)
GCIP <- data_head$GCIP
time_since_on <- data_head$time_since_on
Patient.ID <- data_head$Patient.ID
Eye <-data_head$Eye
Visit <-data_head$Visit
Patient<-data_head$Patient
ggplot(data = data_head, aes(x = time_since_on, y = GCIP)) +
geom_point(alpha=1, size=2) +
aes(colour=Patient.ID) +
geom_path(aes(group=Patient.ID))
ggplot(data= data_head, aes(x = time_since_on, y = GCIP)) +
geom_point(size = 2, alpha= 1, aes(color = Patient.ID)) + #colour points by group
geom_path(aes(group = Patient.ID)) + #spaghetti plot
stat_smooth(method = "lm", formula = y ~ x, aes(group = Patient.ID, colour = Patient.ID)) + #line of best fit by group
ylab("GCIP (volume)") + xlab("time_since_on (months)") +
theme_bw()
But I don't get anything from this.
Could anyone help me, please?
Here is an example I found online:
Million Thanks.
Lili
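A sketch of a ggplot2 equivalent, on hypothetical data with the data_head column names: mapping colour = Patient.ID in ggplot() also splits every layer into one group per patient, so the smoother needs aes(group = 1) to fit a single curve through all points. method = "loess" here is a stand-in for the lowess() curve above; method = "lm" with formula = y ~ poly(x, 2) would give a quadratic instead.

```r
library(ggplot2)

# Hypothetical data with the same column names as data_head
set.seed(3)
data_head <- data.frame(
  Patient.ID = factor(rep(paste0("ID", 1:8), each = 6)),
  time_since_on = rep(c(0, 3, 6, 12, 18, 24), times = 8)
)
data_head$GCIP <- 70 - 0.6 * data_head$time_since_on +
  0.01 * data_head$time_since_on^2 +
  rep(rnorm(8, sd = 4), each = 6) + rnorm(nrow(data_head))

p <- ggplot(data_head, aes(x = time_since_on, y = GCIP, colour = Patient.ID)) +
  geom_point(size = 2) +
  geom_line(aes(group = Patient.ID)) +           # spaghetti: one line per patient
  geom_smooth(aes(group = 1), colour = "black",  # group = 1 overrides the per-patient grouping
              method = "loess", formula = y ~ x, se = TRUE) +
  labs(x = "time_since_on (months)", y = "GCIP (volume)") +
  theme_bw()
p
```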
I will ask my question with a case study and then make it more general.
Let's first import some libraries and create some data:
require(visreg)
require(ggplot2)
y = c(rnorm(40,10,1), rnorm(20,11,1), rnorm(5,12,1))
x=c(rep(1,40), rep(2,20), rep(3,5))
dt=data.frame(x=x, y=y)
Then I run a linear regression of y on x and graph the data and the model with ggplot2:
m1 = lm(y~x, data=dt)
ggplot(dt, aes(x,y)) + geom_point() + geom_smooth(formula = y~x, method="lm", data=dt)
Now I would like to treat my x variable as a nominal variable (a factor), so I slightly change my data and run the following model:
y = c(rnorm(40,10,1), rnorm(20,11,1), rnorm(5,12,1))
x=factor(c(rep(1,40), rep(2,20), rep(3,5))) # this line has changed!
dt=data.frame(x=x, y=y)
m2 = lm(y~x, data=dt)
How can I plot this model m2 with ggplot2? And more generally, how can I tell ggplot to use the object m2 directly to create a representation of the model?
What I am aiming for is the kind of plot that the visreg package produces:
visreg(m2)
So, is there a visreg-like solution for ggplot? Something like:
ggplot(..,aes(..)) + super_geom_smooth(model = m2)
This is not much different from @rnso's idea, but geom_jitter() adds more flavour, and I also changed the colour of the median bar. Hope this helps!
ggplot(data = m2$model, aes(x = x, y = y)) +
geom_boxplot(fill = "gray90") +
geom_jitter() +
theme_bw() +
stat_summary(geom = "crossbar", width = 0.65, fatten = 0, color = "blue",
fun.data = function(x){return(c(y=median(x), ymin=median(x), ymax=median(x)))})
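Closer to what visreg draws (model estimates rather than raw medians), one can compute the fitted mean and confidence interval for each level of x with predict() and layer them over the jittered data. A sketch that regenerates the example data with an arbitrary seed; it does not reproduce visreg's exact styling.

```r
library(ggplot2)

# Example data from the question (arbitrary seed)
set.seed(5)
y <- c(rnorm(40, 10, 1), rnorm(20, 11, 1), rnorm(5, 12, 1))
x <- factor(c(rep(1, 40), rep(2, 20), rep(3, 5)))
dt <- data.frame(x = x, y = y)
m2 <- lm(y ~ x, data = dt)

# Model estimate and 95% confidence interval for each factor level
levs <- data.frame(x = factor(levels(dt$x), levels = levels(dt$x)))
est <- cbind(levs, predict(m2, newdata = levs, interval = "confidence"))

p <- ggplot(dt, aes(x = x, y = y)) +
  geom_jitter(width = 0.15, alpha = 0.5) +
  geom_pointrange(data = est, aes(y = fit, ymin = lwr, ymax = upr), colour = "blue") +
  theme_bw()
p
```

With a single factor, the fitted values are just the group means; the benefit of going through predict() is that the same pattern extends to models with more terms.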
The following, using a boxplot, is very similar to your desired graph:
ggplot(dt, aes(x,y))+ geom_boxplot(aes(group=x), alpha=0.5)+ geom_jitter()
Just FYI, visreg can now output a gg object:
visreg(m2, gg=TRUE)