Plotting geom_smooth/geom_spline regression line through the origin

Related to this question (Plotting a regression line through the origin), I want to force a geom_smooth(method="loess") fit through the origin (0, 0). For geom_smooth(method="lm"), this is possible by specifying the formula in the call, i.e. geom_smooth(method=lm, formula=y~x-1). What would be the equivalent for geom_smooth(method="loess")?

This is an odd thing to want to do. A loess regression is a locally adaptive fit, so you cannot constrain it to pass through the origin unless you include in your regression a heavily weighted point (or tight cluster of points) at the origin. This is a bit artificial at best.
If you were able to expand on what you are trying to achieve and what your data represents, there may be a better option, but in the meantime, you could achieve what you are asking like this.
First, let's set up a simple example:
library(ggplot2)
set.seed(1)
df <- data.frame(x = 0:10, y = rnorm(11, 0:10) + 5)
p <- ggplot(df, aes(x, y)) +
geom_point() +
coord_cartesian(xlim = c(0, 10), ylim = c(0, 20)) +
theme_bw()
Our standard geom_smooth call would look like this:
p + geom_smooth(formula = y ~ x, method = "loess")
And to force it through the origin we can do:
p + geom_smooth(data = rbind(df, data.frame(x = 0, y = 0)),
formula = y ~ x,
aes(weight = c(rep(1, nrow(df)), 100)),
method = "loess")
Created on 2020-12-13 by the reprex package (v0.3.0)
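To see how strongly the weighted point pulls the curve, the same fit can be reproduced with stats::loess directly, outside ggplot2. This is only a sketch: it assumes geom_smooth passes the weight aesthetic to loess() as its weights argument and otherwise uses the default settings. The names df_forced, fit_plain and fit_forced are made up for illustration.

```r
set.seed(1)
df <- data.frame(x = 0:10, y = rnorm(11, 0:10) + 5)

# Append an artificial observation at the origin and give it weight 100
df_forced <- rbind(df, data.frame(x = 0, y = 0))
w <- c(rep(1, nrow(df)), 100)

fit_plain  <- loess(y ~ x, data = df)                       # ordinary loess
fit_forced <- loess(y ~ x, data = df_forced, weights = w)   # loess pulled towards (0, 0)

# Compare the two fitted values at x = 0
origin <- data.frame(x = 0)
pred_plain  <- unname(predict(fit_plain,  newdata = origin))
pred_forced <- unname(predict(fit_forced, newdata = origin))
c(plain = pred_plain, forced = pred_forced)
```

The forced fit should land close to 0 at x = 0 but not exactly on it. A larger weight tightens the constraint, at the cost of distorting the curve near the origin even more.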

Related

annotate r squared to ggplot by using facet_wrap

I just joined the community and am looking forward to getting some help with the data analysis for my master's thesis.
At the moment I have the following problem:
I plotted 42 varieties with ggplot by using facet_wrap:
`ggplot(sumfvvar,aes(x=TemperaturCmean,y=Fv.Fm,col=treatment))+
geom_point(shape=1,size=1)+
geom_smooth(method=lm)+
scale_color_brewer(palette = "Set1")+
facet_wrap(.~Variety)`
That works very well, but I would like to annotate the r squared values for the regression lines. I have two treatments and 42 varieties, therefore 84 regression lines.
Is there any way to calculate all the r squared values and integrate them into the ggplot? I already found the function
ggplotRegression <- function (fit) {
require(ggplot2)
ggplot(fit$model, aes_string(x = names(fit$model)[2], y = names(fit$model)[1])) +
geom_point() +
stat_smooth(method = "lm") +
labs(title = paste("Adj R2 = ",signif(summary(fit)$adj.r.squared, 5),
"Intercept =",signif(fit$coef[[1]],5 ),
" Slope =",signif(fit$coef[[2]], 5),
" P =",signif(summary(fit)$coef[2,4], 5)))
}
but that works for just one variety and one treatment. Could a loop over the lm() function be an option?

Here is an example with the ggpmisc package:
library(ggpmisc)
set.seed(4321)
x <- 1:100
y <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)
my.data <- data.frame(x = x,
y = y,
group = c("A", "B"))
formula <- y ~ poly(x, 1, raw = TRUE)
ggplot(my.data, aes(x, y)) +
facet_wrap(~ group) +
geom_point() +
geom_smooth(method = "lm", formula = formula) +
stat_poly_eq(formula = formula, parse = TRUE,
mapping = aes(label = after_stat(rr.label)))

You can't apply different labels to different facets unless you add another r^2 column to your data. One way is to use geom_text, but you need to calculate the stats you need first. Below I show an example with iris; for your case, just swap Species for Variety, and so on.
library(tidyverse)
# simulate data for 2 treatments
# d2 is just shifted up from d1
d1 <- data.frame(iris,Treatment="A")
d2 <- data.frame(iris,Treatment="B") %>%
mutate(Sepal.Length=Sepal.Length+rnorm(nrow(iris),1,0.5))
# combine datasets
DF <- rbind(d1,d2) %>% rename(Variety = Species)
# plot like you did
# note I use "free" scales, if scales very different between Species
# your facet plots will be squished
g <- ggplot(DF,aes(x=Sepal.Width,y=Sepal.Length,col=Treatment))+
geom_point(shape=1,size=1)+
geom_smooth(method=lm)+
scale_color_brewer(palette = "Set1")+
facet_wrap(.~Variety,scales="free")
# rsq function
RSQ = function(y,x){signif(summary(lm(y ~ x))$adj.r.squared, 3)}
#calculate rsq for variety + treatment
STATS <- DF %>%
group_by(Variety,Treatment) %>%
summarise(Rsq=RSQ(Sepal.Length,Sepal.Width)) %>%
# make a label
# one other option is to use stringr::str_wrap in geom_text
mutate(Label=paste("Treat",Treatment,", Rsq=",Rsq))
# set vertical position of rsq
VJUST = ifelse(STATS$Treatment=="A",1.5,3)
# finally the plot function
g + geom_text(data=STATS,aes(x=-Inf,y=+Inf,label=Label),
hjust = -0.1, vjust = VJUST,size=3)
For the last geom_text() call, I kept the two labels in each panel from overlapping by giving each Treatment a different vjust (the VJUST vector). You might need to adjust those values depending on your plot.
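On the question of whether a loop over lm() is an option: it is, and it needs nothing beyond base R. The following is a minimal sketch on the same iris stand-in data. The names groups and rsq_table are hypothetical, and it reports the plain R squared rather than the adjusted value used above.

```r
# Stand-in data: iris duplicated as two treatments, Species renamed to Variety
set.seed(1)
d1 <- data.frame(iris, Treatment = "A")
d2 <- data.frame(iris, Treatment = "B")
d2$Sepal.Length <- d2$Sepal.Length + rnorm(nrow(d2), 1, 0.5)
DF <- rbind(d1, d2)
names(DF)[names(DF) == "Species"] <- "Variety"

# One data frame per Variety x Treatment combination
groups <- split(DF, list(DF$Variety, DF$Treatment), drop = TRUE)

# Fit one lm() per group and collect the R squared values in a table
rsq_table <- do.call(rbind, lapply(groups, function(g) {
  fit <- lm(Sepal.Length ~ Sepal.Width, data = g)
  data.frame(Variety   = g$Variety[1],
             Treatment = g$Treatment[1],
             Rsq       = signif(summary(fit)$r.squared, 3))
}))
rsq_table
```

A table like this can be passed to geom_text(data = ...) in the same way as STATS, because it carries the faceting variable (Variety) and the colour variable (Treatment).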

Transfer calculated value from stat_smooth to other geom like linerange

I have a question about ggplot2.
I want to connect each data point to the OLS fit with a vertical line, as in the code listed below.
Can I transfer ..y.., the value calculated by stat_smooth, to geom_linerange directly?
I tried stat_smooth(..., geom = "linerange", mapping = aes(ymin = pmin(myy, ..y..), ymax = pmax(myy, ..y..))), but it does not give the result I want.
library(ggplot2)
df <- data.frame(myx = 1:10,
myy = c(1:10) * 5 + 2 * rnorm(10, 0, 1))
lm.fit <- lm("myy~myx", data = df)
pred <- predict(lm.fit)
ggplot(df, aes(myx, myy)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
geom_linerange(mapping = aes(ymin = pmin(myy, pred),
ymax = pmax(myy, pred)))

stat_smooth evaluates the values at n evenly spaced points, with n = 80 by default. These points may not coincide with the original x values in your data frame.
Since you are calculating the predicted values anyway, it would probably be more straightforward to add them back to your data frame and use that as the data source for all geom layers, for example:
df$pred <- pred
ggplot(df, aes(myx, myy)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
geom_linerange(aes(ymin = myy, ymax = pred))
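The point about the evaluation grid can be checked by inspecting what the smooth layer actually computed. This is a small sketch using ggplot2's layer_data(); the seed and the object names p and smooth_data are illustrative only.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(myx = 1:10,
                 myy = (1:10) * 5 + 2 * rnorm(10, 0, 1))

p <- ggplot(df, aes(myx, myy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Layer 2 is the smooth; layer_data() returns the values stat_smooth computed
smooth_data <- layer_data(p, 2)

nrow(smooth_data)                # number of points the fit was evaluated at
all(smooth_data$x %in% df$myx)   # do they all coincide with the original x values?
```

Because the grid is evenly spaced over the x range, most of its points fall between the observed x values, so there is usually no fitted value to pair with each data point.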

quadratic fit curve in Spaghetti plot. Lme?

I am trying to fit a quadratic curve over my spaghetti plot. In the beginning I did it only with ggplot like this:
library(ggplot2)
library(reshape2)
GCIP <- data_head$GCIP
Patient.ID <- data_head$Patient.ID
Eye <-data_head$Eye
Visit <-data_head$Visit
Patient<-data_head$Patient
data_head$time_since_on <- as.numeric(as.character(data_head$time_since_on))
ggplot(data = data_head, aes(x= time_since_on, y=GCIP)) +
geom_point(alpha=1, size=2) +
aes(colour=Patient.ID) +
geom_path(aes(group='Patient.ID'))
ggplot(data= data_head, aes(x = time_since_on, y = GCIP)) +
geom_point(size = 2, alpha= 1, aes(color = Patient.ID)) + #colour points by group
geom_path(aes(group = Patient.ID)) + #spaghetti plot
stat_smooth(method = "lm", formula = y ~ poly(x,2)) + #line of best fit by group
ylab("GCIP (volume)") + xlab("time_since_on (months)") +
theme_bw()
The problem is that I am not sure this code takes into account that each line contains different timepoints of 1 patient, so the line fitted should take that also into account.
Could you please tell me if this is correct?
Here you can see the graph I get
I am not sure, and maybe it is better to fit an lme model instead (but in that case I don't know how to introduce the quadratic term in the model).
I also did this:
data_head <- read.csv("/Users/adrianaroca-fernandez/Desktop/Analysis/Long_100418_2/N=lit.csv", sep=";", dec=",")
library(ggplot2)
library(reshape2)
library(lme4)
library(lsmeans)
GCIP <- data_head$GCIP
Patient.ID <- data_head$Patient.ID
Eye <-data_head$Eye
Visit <-data_head$Visit
Patient<-data_head$Patient
data_head$time_since_on <- as.numeric(as.character(data_head$time_since_on))
time_since_on <-data_head$time_since_on
time_since_on2 <- time_since_on^2
quadratic.model <-lm(GCIP ~ time_since_on + time_since_on2)
summary(quadratic.model)
time_since_onvalues <- seq(0, 250, 0.1)
predictedGCIP <- predict(quadratic.model,list(time_since_on=time_since_onvalues, time_since_on2=time_since_onvalues^2))
plot(time_since_on, GCIP, pch=16, xlab = "time_since_on (months)", ylab = "GCIP", cex.lab = 1.3, col = "blue")
lines(time_since_onvalues, predictedGCIP, col = "darkgreen", lwd = 3)
The problem is that I am still unable to introduce (1|Patient.ID) as a mixed effect. And I lose my spaghetti plot in this case, having just the dots. Here the result:
What do you think is better or how should I code this?
Thanks.
lili
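One possible way to combine the two ideas is a mixed model with a quadratic time term and a random intercept per patient, with its population-level curve drawn over the spaghetti plot. The sketch below uses simulated data, because the original CSV is not available. The data frame sim, its coefficients and the time range 0 to 8 are all made up for illustration; only the column names follow the question.

```r
library(lme4)
library(ggplot2)

# Simulated stand-in for data_head: 20 patients, 9 visits each (hypothetical values)
set.seed(42)
n_pat <- 20
sim <- expand.grid(Patient.ID = factor(seq_len(n_pat)), time_since_on = 0:8)
pat_offset <- rnorm(n_pat, 0, 3)   # patient-specific shift of the whole curve
sim$GCIP <- 80 - 1.5 * sim$time_since_on + 0.1 * sim$time_since_on^2 +
  pat_offset[as.integer(sim$Patient.ID)] + rnorm(nrow(sim), 0, 1)

# Quadratic fixed effect of time, random intercept for each patient
fit <- lmer(GCIP ~ time_since_on + I(time_since_on^2) + (1 | Patient.ID),
            data = sim)

# Population-level curve: re.form = NA leaves the random effects out of the prediction
grid <- data.frame(time_since_on = seq(0, 8, by = 0.1))
grid$GCIP <- predict(fit, newdata = grid, re.form = NA)

# Spaghetti plot with the fitted quadratic curve on top
spaghetti <- ggplot(sim, aes(time_since_on, GCIP)) +
  geom_line(aes(group = Patient.ID), alpha = 0.4) +
  geom_line(data = grid, colour = "darkgreen")
spaghetti
```

Unlike stat_smooth(method = "lm"), which treats every observation as independent, this model accounts for repeated measurements from the same patient. Whether a random intercept alone is enough, or a random slope is also needed, depends on the data.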

How to plot two distribution curves in a faceted way in R / ggplot2?

I have two probability distribution curves, a Gamma and a standardized Normal, that I need to compare:
library(ggplot2)
pgammaX <- function(x) pgamma(x, shape = 64.57849, scale = 0.08854802)
f <- ggplot(data.frame(x=c(-4, 9)), aes(x)) + stat_function(fun=pgammaX)
f + stat_function(fun = pnorm)
The output is like this
However I need to have the two curves separated by means of the faceting mechanism provided by ggplot2, sharing the Y axis, in a way like shown below:
I know how to do the faceting if the plotted curves come from data (i.e., from a data.frame), but I don't understand how to do it in a case like this, where the curves are generated on the fly by functions. Do you have any ideas on this?

You can generate the data ahead of time, similar to what stat_function does, something like:
x <- seq(-4,9,0.1)
dat <- data.frame(p = c(pnorm(x), pgammaX(x)), g = rep(c(0,1), each = length(x)), x = rep(x, 2) )
ggplot(dat)+geom_line(aes(x,p, group = g)) + facet_grid(~g)

The issue with facet_wrap is that a given stat_function layer is drawn in every panel, and here there is no faceting variable in the data to tell the panels apart.
I would instead plot them separately and use grid.arrange to combine them.
f1 <- ggplot(data.frame(x=c(-4, 9)), aes(x)) + stat_function(fun = pgammaX) + ggtitle("Gamma") + theme(plot.title = element_text(hjust = 0.5))
f2 <- ggplot(data.frame(x=c(-4, 9)), aes(x)) + stat_function(fun = pnorm) + ggtitle("Norm") + theme(plot.title = element_text(hjust = 0.5))
library(gridExtra)
grid.arrange(f1, f2, ncol=2)
Otherwise create the data frame with y values from both pgammaX and pnorm and categorize them under a faceting variable.

Finally I got the answer. First, I need to have two data sets and attach each function to each data set, as follows:
library(ggplot2)
pgammaX <- function(x) pgamma(x, shape = 64.57849, scale = 0.08854802)
a <- data.frame(x=c(3,9), category="Gamma")
b <- data.frame(x=c(-4,4), category="Normal")
f <- ggplot(a, aes(x)) + stat_function(fun=pgammaX) + stat_function(data = b, mapping = aes(x), fun = pnorm)
Then, using facet_wrap(), I separate into two graphics according to the category assigned to each data set, and establishing a free_x scale.
f + facet_wrap("category", scales = "free_x")
The result is shown below:

Adding a general abline in log-log ggplot2

I am trying to add a line to separate part of data in ggplot2. Following this thread:
Adding linear model abline to log-log plot in ggplot
I tried
d = data.frame(x = 100*rlnorm(100), y = 100*rlnorm(100))
ggplot(d, aes(x, y)) + geom_point() +
geom_abline(intercept = 100, slope = -1, col='red') +
scale_y_log10() + scale_x_log10()
but it did not plot the line. Note that the base plot approach draws the line correctly:
plot(d$x, d$y, log='xy')
abline(a = 100, b=-1, col='red', untf=TRUE)

This may not be the most elegant solution, but I usually define a separate data frame for predictions when I'm adding them to plots. I know that it's quicker in a lot of ways to add the model specification as part of the plot, but I really like the flexibility of having this as a separate object. Here's what I've got in mind in this case:
d = data.frame(x = 100*rlnorm(100), y = 100*rlnorm(100))
p = ggplot(d, aes(x,y)) + geom_point() + scale_x_log10() + scale_y_log10()
pred.func = function(x){
100 - x
}
new.dat = data.frame(x = seq(from = 5, to = 90))
new.dat$pred = pred.func(new.dat$x)
p + geom_line(aes(x = x, y = pred), data = new.dat, col = "red")
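A likely reason the geom_abline() call shows nothing is that, with scale_x_log10() and scale_y_log10(), the intercept and slope are interpreted on the transformed (log10) axes, so intercept = 100 asks for a line far outside the plotted range. A curve such as y = 100 - x is also not straight on log-log axes, which is why it is drawn from a data frame above. A power law y = a * x^b, on the other hand, is a straight line there, with intercept log10(a) and slope b. The sketch below illustrates that case; a and b are arbitrary values chosen for illustration and are not taken from the question.

```r
library(ggplot2)

set.seed(1)
d <- data.frame(x = 100 * rlnorm(100), y = 100 * rlnorm(100))

# Hypothetical separating line y = a * x^b
a <- 1e4
b <- -1

# On log10 axes the line is log10(y) = log10(a) + b * log10(x)
p_loglog <- ggplot(d, aes(x, y)) +
  geom_point() +
  geom_abline(intercept = log10(a), slope = b, col = "red") +
  scale_x_log10() +
  scale_y_log10()
p_loglog

# Numerical check that the power law is linear in log-log space
check <- data.frame(x = c(1, 10, 100, 1000))
check$y <- a * check$x^b
slope_check     <- diff(log10(check$y)) / diff(log10(check$x))
intercept_check <- log10(check$y[1])   # value at x = 1, i.e. log10(x) = 0
```

For anything that is not a power law, the data frame approach above remains the more general option.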