I am trying to do an exponential regression in ggplot2. First, my script:
g <- ggplot(data, aes(x=datax, y=datay), color="black") +
geom_point(shape=1) + stat_smooth(method = 'nls', formula = y~a*exp(b*x), aes(colour = 'Exponential'), se = FALSE)
g <- g + theme_classic()
g <- g + theme(panel.grid.major=element_blank())
g <- g + theme(panel.grid.minor=element_blank())
g <- g + theme(axis.line.x=element_line(color="black"),
axis.line.y=element_line(color="black"),
panel.border=element_blank(),
panel.background=element_blank())
g <- g + labs(x="\ndatax",y="datay\n")
g <- g + theme(axis.text.y=element_text(size=14))
g <- g + theme(axis.text.x=element_text(size=14))
g <- g + theme(axis.title.y=element_text(size=18,vjust=1))
g <- g + theme(axis.title.x=element_text(size=18,vjust=1))
g
This is the image that I got
As an R beginner, I put the script together from my own code and snippets from the internet. I always get the following error:
"In (function (formula, data = parent.frame(), start, control = nls.control(), : No starting values specified for some parameters.
Initializing ‘a’, ‘b’ to '1.'. Consider specifying 'start' or using a selfStart model"
I have not found a better way to draw the exponential fit yet.
In addition, I would like to change the colour of the fitted curve to black, remove the legend, and show the R² and p-value in the plot (and maybe the confidence intervals as well).
It's not easy to answer without a reproducible example and with so many questions at once.
Are you sure the message you reported is an error and not a warning? On my own laptop, with the 'iris' dataset, I got a warning...
However, as you can read on the ?nls help page, you should supply initial values through the "start" argument to help the estimation converge. If you don't, nls() falls back on dummy default values (in your case, a and b are set to 1).
You could try something like this:
g <- ggplot(data, aes(x=datax, y=datay), color="black") +
geom_point(shape=1) + stat_smooth(method = 'nls',
method.args = list(start = c(a=1, b=1)),
formula = y~a*exp(b*x), colour = 'black', se = FALSE)
In your code you mapped the colour to the string 'Exponential' inside aes(), which is what creates the legend. Setting colour = 'black' outside aes() should give a black line and no legend (I tried it with the base R dataset 'iris' and it worked).
Notice that I passed the start values as an element of the list given to 'method.args': this is a new feature in ggplot2 v2.0.0.
Hope this helps
Edit:
Just for completeness, here is the code I ran on my laptop with a built-in dataset (note that an exponential fit makes no sense for this dataset, but the code runs without a warning):
library(ggplot2)
data('iris')
g1 <- ggplot(data=iris, aes(x=Sepal.Length, y=Sepal.Width)) +
geom_point(color='green') +geom_smooth(method = 'nls',
method.args = list(start=c(a=1, b=1)), se = FALSE,
formula = y~a*exp(b*x), colour='black')
g1
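If a = 1, b = 1 turns out to be too far from the solution for nls() to converge, one common way to get better starting values is a log-linear fit: taking logs of y = a*exp(b*x) gives log(y) = log(a) + b*x, which lm() can estimate directly. The following is only a sketch under assumptions: the data frame `demo` is simulated stand-in data (the asker's data is not available), `start_vals` is a name made up for illustration, and the approach requires all y values to be positive.

```r
library(ggplot2)

# Simulated stand-in data (assumption: y > 0 and roughly exponential in x)
set.seed(1)
demo <- data.frame(datax = seq(0, 5, length.out = 40))
demo$datay <- 2 * exp(0.6 * demo$datax) * exp(rnorm(40, sd = 0.1))

# Log-linear fit: log(y) = log(a) + b * x
lin <- lm(log(datay) ~ datax, data = demo)

# Back-transform the intercept to obtain a; the slope is b
start_vals <- list(a = exp(coef(lin)[[1]]), b = coef(lin)[[2]])

# Pass the data-driven starting values to nls() via method.args
p <- ggplot(demo, aes(x = datax, y = datay)) +
  geom_point(shape = 1) +
  stat_smooth(method = "nls", formula = y ~ a * exp(b * x),
              method.args = list(start = start_vals),
              se = FALSE, colour = "black")
p
```

The log-linear estimates are not the same as the nls() estimates (the two fits weight the errors differently), but they are usually close enough to serve as starting values.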
I'm using ggplot2 and ggpmisc to generate a trend line.
library(ggplot2)
library(ggpmisc)
library(scales)
x <- c(5,2,6,8,9,1,3,6,8,2)
y <- c(4,7,2,5,7,9,5,2,1,3)
df <- data.frame(x,y)
g <- ggplot(df,aes(x,y))
g <- g + geom_point(colour = "black")
g <- g + stat_smooth(method = lm, formula = y ~ x, se = FALSE)
g <- g + stat_poly_eq(formula = y ~ x,
aes(label = paste(stat(eq.label),stat(rr.label),stat(adj.rr.label),stat(p.value.label),sep = "~~~")),
label.x = "right",label.y = "bottom",parse = TRUE)
gg <- g + coord_trans(y = "identity")
gg <- g + coord_trans(y = "log")
gg <- g + scale_y_log10(breaks=10^(0:3),
labels=trans_format("log10",math_format(10^.x)))
For g, the label is y = 6.26 - 0.351x
For gg, it is y = 0.782 - 0.0419x
When I use exponential notation, the value of eq.label changes. I think the eq.label should not change because I am only changing the axis scale. Can you tell me why?
The main thing to understand, in my opinion, is this information from the page linked below:
The difference between transforming the scales and transforming the
coordinate system is that scale transformation occurs BEFORE
statistics, and coordinate transformation afterwards. Coordinate
transformation also changes the shape of geoms:
When you transform the scale, the transformation happens BEFORE the statistics, so the least-squares fit (minimising the sum of squared errors) is performed on the transformed data. This is appropriate if the relationship is linear in the log of the variable.
This changes if you transform the coordinates, because there the transformation is applied AFTER the statistics, i.e. the least-squares fit is performed on the untransformed data.
See here: https://ggplot2.tidyverse.org/reference/coord_trans.html
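To see the effect outside of ggplot2, here is a minimal sketch that fits the same straight line twice with lm(), once on the raw y and once on log10(y), using the x and y vectors from the question. The object names `fit_raw` and `fit_log` are made up for illustration. The two sets of coefficients should correspond to the two labels reported in the question for g and for the scale_y_log10() version.

```r
# Data from the question
x <- c(5, 2, 6, 8, 9, 1, 3, 6, 8, 2)
y <- c(4, 7, 2, 5, 7, 9, 5, 2, 1, 3)
df <- data.frame(x, y)

# What the stat sees with an untransformed scale (also with coord_trans)
fit_raw <- lm(y ~ x, data = df)

# What the stat sees after scale_y_log10(): y has already been log10-transformed
fit_log <- lm(log10(y) ~ x, data = df)

coef(fit_raw)
coef(fit_log)
```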
I've already tried many of the suggestions found here, but I simply can't figure it out.
Is it possible to extract the equation (y = a*exp(-b*x)) from a line fitted with stat_smooth?
This is a data example:
df <- data.frame(Time = c(0.5,1,2,4,8,16,24), Concentration = c(1,0.5,0.2,0.05,0.02,0.01,0.001))
Plot <- ggplot(df, aes(x=Time, y=Concentration))+
geom_point(size=2) +
stat_smooth(method = nls, formula = y ~ a*exp(-b *x),
se = FALSE,
method.args = list(start = c(a=10, b=0.01)))+
theme_classic(base_size = 15) +
labs(x=expression(Time (h)),
y=expression(C[t]/C[0]))
I tried to use "stat_regline_equation", but it does not work when I add the exponential function.
To extract data from a ggplot object you can use ggplot_build().
The values computed by stat_smooth() are in ggplot_build(Plot)$data[[2]] (the second layer of the plot).
You can assign them to an object: build <- ggplot_build(Plot)$data[[2]]
Both pieces of code below give the same result:
Plot <- ggplot(df, aes(x=Time, y=Concentration)) + geom_point(size=2) +
stat_smooth(method = nls, formula = y ~ a*exp(-b *x), se = FALSE,
method.args = list(start = c(a=10, b=0.01)))
and
Plot <- ggplot(df, aes(x=Time, y=Concentration)) + geom_point(size=2) +
geom_line(data=build,aes(x=x,y=y),color="blue")
I don't think it's possible. (1) I poked around in the guts of the object generated by ggplot_build(Plot) and didn't find anything likely (that doesn't prove it isn't there but...) (2) If you poke around in the source code of the ggpubr::stat_regline_equation() function you can see that rather than poke around in the stored information from the smooth it has to call a package function that re-fits the linear model so it can extract the coefficients and construct the equation.
You probably just have to re-fit the model yourself:
nls_fit <- nls(formula = Concentration ~ a*exp(-b *Time),
start = c(a=10, b=0.01), data = df)
coef(nls_fit)
(You might find the format returned by broom::tidy(nls_fit) convenient.)
For this particular model you can also get the coefficients via
cc <- coef(glm(Concentration ~ Time, data = df, family = gaussian(link= "log")))
c(exp(cc[1]), -cc[2])
You could in principle write your own stat_ function mirroring stat_regline_equation that would encapsulate this functionality, but it would be a lot more work/wouldn't be worth it unless you were doing this operation very routinely or wanted to make it easy for others to do ...
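Once the model has been re-fitted, the coefficients can be formatted into a label and placed on the plot by hand. The following is a sketch of that idea rather than a general solution: it assumes the data and start values from the question, and `eq_label` is a name introduced here for illustration.

```r
library(ggplot2)

# Data from the question
df <- data.frame(Time = c(0.5, 1, 2, 4, 8, 16, 24),
                 Concentration = c(1, 0.5, 0.2, 0.05, 0.02, 0.01, 0.001))

# Re-fit the same model that stat_smooth() fits internally
nls_fit <- nls(Concentration ~ a * exp(-b * Time),
               start = c(a = 10, b = 0.01), data = df)
cf <- coef(nls_fit)

# Build a plain-text equation from the estimated coefficients
eq_label <- sprintf("y = %.3g * exp(-%.3g * x)", cf[["a"]], cf[["b"]])

# Draw the fit and put the equation in the top-right corner
p <- ggplot(df, aes(x = Time, y = Concentration)) +
  geom_point(size = 2) +
  stat_smooth(method = "nls", formula = y ~ a * exp(-b * x), se = FALSE,
              method.args = list(start = c(a = 10, b = 0.01))) +
  annotate("text", x = Inf, y = Inf, label = eq_label,
           hjust = 1.1, vjust = 1.5)
p
```

Using x = Inf and y = Inf with hjust/vjust pins the label to the corner of the panel regardless of the data range.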
I know there have been a number of entries with regards to adding R^2 values to plots, but I am having trouble following the codes. I am graphing a scatter plot with three categories. I have added a linear regression line for each one. I would now like to add r^2 values for each but I can't figure out how to do this.
My code:
veg <- read.csv("latandwtall2.csv", header=TRUE)
library("ggplot2")
a <- ggplot(veg, aes(x=avglat, y=wtfi, color=genus)) + geom_point(shape=19, size=4)
b <- a + scale_colour_hue(l=50) + stat_smooth(method = "lm", formula = y ~ x, size = 1, se = FALSE)
c <- b + labs(x="Latitude", y="Weight (g)")
d <- c + theme_bw()
e <- d + theme(panel.grid.minor=element_blank(), panel.grid.major=element_blank())
#changes size of text
f <- e + theme(
axis.title.x = element_text(color="black", vjust=-0.35, size=15, face="bold"),
axis.title.y = element_text(color="black" , vjust=0.35, size=15, face="bold")
)
g <- f + theme(legend.key=element_rect(fill='white'))
g
Any help with how to add R^2 values would be greatly appreciated. Thanks!
If you build a data frame with the r-squared values, you might be able to (mostly) automate the positioning of the annotation text by including it as a call to geom_text.
Here's a toy example. The rsq data frame is used in geom_text to place the r-squared labels. In this case, I've set it up to put the labels just after the highest x-value and the predict function gets the y-value. It's probably too much work for a single plot, but if you're doing this a lot, you can turn it into a function so that you don't have to repeat the set-up code every time, and maybe add some fancier logic to make label placement more flexible:
library(ggplot2)
library(reshape2) # For melt function
# Fake data
set.seed(12)
x = runif(100, 0, 10)
dat = data.frame(x, y1 = 2*x + 3 + rnorm(100, 0, 5),
y2 = 4*x + 20 + rnorm(100, 0, 10))
dat.m = melt(dat, id.var="x")
# linear models
my1 = lm(y1 ~ x, data=dat)
my2 = lm(y2 ~ x, data=dat)
# Data frame for adding r-squared values to plot
rsq = data.frame(model=c("y1","y2"),
r2=c(summary(my1)$adj.r.squared,
summary(my2)$adj.r.squared),
x=max(dat$x),
y=c(predict(my1, newdata=data.frame(x=max(dat$x))),
predict(my2, newdata=data.frame(x=max(dat$x)))))
ggplot() +
geom_point(data=dat.m, aes(x, value, colour=variable)) +
geom_smooth(data=dat.m, aes(x, value, colour=variable),
method="lm", se=FALSE) +
geom_text(data=rsq, aes(label=paste("r^2 == ", round(r2,2)),
x=1.05*x, y=y, colour=model, hjust=0.5),
size=4.5, parse=TRUE)
I can't really reproduce what you're doing, but you need to use annotate().
Something that could work (putting the R2 at the 10th point) would be:
R2 = 0.4
i = 10
text = paste("R-squared = ", R2, sep="")
# annotate() does not see the columns of veg, so index them explicitly; the font is set via 'family'
g = g + annotate("text", x=veg$avglat[i], y=veg$wtfi[i], label=text, family="Calibri", colour="red", vjust = -2, hjust = 1)
Use vjust and hjust to adjust the position of the text relative to the point (change i to pick another point), and fill the variable R2 with your computed R-squared. You can choose any point you like or enter the x,y coordinates manually; it's up to you. Does that help?
PS. I put in extra parameters (font, colours) so that you have the flexibility to change them.
Build the model separately, get the R^2 from there, and add it to the plot. I'll give you some dummy code, but it would be of better quality if you had given us a sample data frame.
r2 = summary(lm(wtfi ~ avglat, data=veg))$r.squared
#to piggyback on Romain's code...
i=10
g = g + annotate("text", x=veg$avglat[i], y=veg$wtfi[i], label=round(r2,2), family="Calibri", colour="red", vjust = -2, hjust = 1)
The way I wrote it here you don't need to hard-code the R^2 value in.
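Since the question has three categories with one regression line each, a single R^2 from the pooled model is probably not what is wanted. Below is a minimal sketch of computing one R^2 per genus and labelling each in its own colour. The data frame `veg_demo` is simulated stand-in data (the original CSV is not available), and `r2_by_genus` and `labels_df` are names made up for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for the 'veg' data (the real CSV is not available)
set.seed(1)
veg_demo <- data.frame(
  genus  = rep(c("A", "B", "C"), each = 20),
  avglat = runif(60, 30, 50)
)
veg_demo$wtfi <- 2 + 0.1 * veg_demo$avglat + rnorm(60)

# One R^2 per genus, each from its own linear model
r2_by_genus <- sapply(split(veg_demo, veg_demo$genus), function(d) {
  summary(lm(wtfi ~ avglat, data = d))$r.squared
})

# Label positions: right edge of the data, stacked downward from the top
y_rng <- range(veg_demo$wtfi)
labels_df <- data.frame(
  genus = names(r2_by_genus),
  label = paste0("R^2 = ", round(r2_by_genus, 2)),
  x = max(veg_demo$avglat),
  y = y_rng[2] - (seq_along(r2_by_genus) - 1) * 0.08 * diff(y_rng)
)

# geom_text() inherits colour = genus, so each label matches its line
p <- ggplot(veg_demo, aes(x = avglat, y = wtfi, colour = genus)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  geom_text(data = labels_df, aes(x = x, y = y, label = label),
            hjust = 1, show.legend = FALSE)
p
```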
I will ask my question with a case study and then make it more general.
Let's first import some libraries and create some data:
require(visreg)
require(ggplot2)
y = c(rnorm(40,10,1), rnorm(20,11,1), rnorm(5,12,1))
x=c(rep(1,40), rep(2,20), rep(3,5))
dt=data.frame(x=x, y=y)
and run a linear regression of y on x and graph the data and the model with ggplot2
m1 = lm(y~x, data=dt)
ggplot(dt, aes(x,y)) + geom_point() + geom_smooth(formula = y~x, method="lm", data=dt)
Now I would like to treat my x variable as a nominal variable. So I slightly change my data and run the following model.
y = c(rnorm(40,10,1), rnorm(20,11,1), rnorm(5,12,1))
x=factor(c(rep(1,40), rep(2,20), rep(3,5))) # this line has changed!
dt=data.frame(x=x, y=y)
m2 = lm(y~x, data=dt)
How can I plot this model m2 with ggplot2? And more globally how can I directly tell ggplot to consider the object m2 in order to create representation of the model?
What I aim to do is the kind of things that can be done using the visreg package
visreg(m2)
So, is there any visreg-like solution for ggplot? something like
ggplot(..,aes(..)) + super_geom_smooth(model = m2)
This is not much different from @rnso's idea. geom_jitter() adds more flavour. I also changed the colour of the median bar. Hope this helps you!
ggplot(data = m2$model, aes(x = x, y = y)) +
geom_boxplot(fill = "gray90") +
geom_jitter() +
theme_bw() +
stat_summary(geom = "crossbar", width = 0.65, fatten = 0, color = "blue",
fun.data = function(x){return(c(y=median(x), ymin=median(x), ymax=median(x)))})
The following boxplot is very similar to your desired graph:
ggplot(dt, aes(x,y))+ geom_boxplot(aes(group=x), alpha=0.5)+ geom_jitter()
Just FYI, visreg can now output a gg object:
visreg(m2, gg=TRUE)
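For the more general question of drawing a fitted model object with ggplot2, one approach that does not depend on visreg is to compute predictions yourself with predict() and plot them as a separate layer. The following is only a sketch under assumptions: it regenerates data like that in the question (with a seed added for reproducibility), and `newdat` and `pred` are names introduced here for illustration.

```r
library(ggplot2)

# Data as in the question, with x as a factor (seed added for reproducibility)
set.seed(1)
y <- c(rnorm(40, 10, 1), rnorm(20, 11, 1), rnorm(5, 12, 1))
x <- factor(c(rep(1, 40), rep(2, 20), rep(3, 5)))
dt <- data.frame(x = x, y = y)
m2 <- lm(y ~ x, data = dt)

# One row per factor level, then fitted means with 95% confidence limits
newdat <- data.frame(x = factor(levels(dt$x), levels = levels(dt$x)))
pred <- cbind(newdat, predict(m2, newdata = newdat, interval = "confidence"))

# Raw data as jittered points, model estimates as point ranges on top
p <- ggplot(dt, aes(x = x, y = y)) +
  geom_jitter(width = 0.1, colour = "grey50") +
  geom_pointrange(data = pred, aes(y = fit, ymin = lwr, ymax = upr),
                  colour = "blue")
p
```

The same pattern works for other model classes as long as their predict() method can return interval bounds or standard errors.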
I have a bivariate data set:
set.seed(45)
require(mvtnorm)
sigma <- matrix(c(3,2,2,3), ncol=2)
df <- as.data.frame(rmvnorm(100, sigma=sigma))
names(df) <- c("u", "v")
Setting up v as the dependent variable, with ggplot I can easily show the "usual" least-squares regression of v on u:
require(ggplot2)
qplot(u, v, data=df) + geom_smooth(aes(u, v), method="lm", se=FALSE)
... but I'd also like to show the least-squares regression of u on v (at the same time).
This is how I naively tried to do it, by passing a different aes to geom_smooth:
last_plot() + geom_smooth(aes(v, u), method="lm", color="red", se=FALSE)
Of course, that doesn't quite work. The second geom_smooth shows the inverse of the proper line (I think). I'm expecting it to have a steeper slope than the first line.
Moreover, the confidence intervals are wrongly shaped. I don't particularly care about those, but I do think they might be a clue.
Am I asking for something that can't easily be done with ggplot2?
EDIT: Here is a bit more, showing the lines I expect:
# (1) Least-squares regression of v on u
mod <- lm(v ~ u, data=df)
v_intercept <- coef(mod)[1]
v_slope <- coef(mod)[2]
last_plot() + geom_abline(
intercept = v_intercept,
slope = v_slope,
color = "blue",
linetype = 2
)
# (2) Least-squares regression of u on v
mod2 <- lm(u ~ v, data=df)
u_intercept <- coef(mod2)[1]
u_slope <- coef(mod2)[2]
# NOTE: we have to solve for the v-intercept and invert the slope
# because we're still in the original (u, v) coordinate frame
last_plot() + geom_abline(
intercept = - u_intercept / u_slope,
slope = 1 / u_slope,
color = "red",
linetype = 2
)
ggplot(df) +
geom_smooth(aes(u,v), method='lm') +
geom_smooth(aes(v,u), method='lm', colour="red")
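Note that swapping the aesthetics as above also swaps the axes that the second line is drawn against. One way to draw the u-on-v fit in the original (u, v) frame is to fit lm(u ~ v) yourself, predict u over the range of v, and plot those predictions with u on the x-axis. This is a sketch under assumptions: the data are simulated with base R as a stand-in for the rmvnorm() data in the question, and `line_df` is a name made up for illustration.

```r
library(ggplot2)

# Stand-in bivariate data (assumption: positively correlated u and v)
set.seed(45)
df <- data.frame(u = rnorm(100))
df$v <- 0.7 * df$u + rnorm(100, sd = 0.7)

# Regression of u on v, fitted outside ggplot2
mod2 <- lm(u ~ v, data = df)

# Predicted u at the two ends of the observed v range
line_df <- data.frame(v = range(df$v))
line_df$u <- predict(mod2, newdata = line_df)

p <- ggplot(df, aes(x = u, y = v)) +
  geom_point() +
  # usual regression of v on u
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "blue") +
  # regression of u on v, drawn in the same (u, v) coordinates
  geom_line(data = line_df, colour = "red")
p
```

In this frame the red line has slope 1 / coef(mod2)["v"], which is at least as steep as the blue line because the product of the two regression slopes equals the squared correlation.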