I am trying to use ggplot2 to draw a fitted nonlinear curve on top of a scatter plot of the individual values.
I get an error that I do not understand; it looks as if I were binding two vectors of different lengths.
3: Computation failed in `stat_smooth()`:
arguments imply differing number of rows: 80, 6
The complete code is written below:
library(ggplot2)
tabella <- data.frame("Tempo" = c(0, 15, 30, 60, 90, 120), "Visc" = c(500, 9125, 11250, 10875, 11325, 10375))
attach(tabella)
Visc.mod <- nls((Visc ~ 500 + (k1*Tempo/(k2+Tempo))), start=list(k1=100, k2=100), trace=TRUE)
cor(Visc,predict(Visc.mod))
predict(Visc.mod)
summary(Visc.mod)
ggplot(tabella, aes(x=Tempo, y=Visc)) +
geom_point() +
stat_smooth(method = "nls",
method.args = list(formula = "Visc ~ 500 + (k1*Tempo/(k2+Tempo))",
start = list(k1=100, k2=100)), data = tabella, se = FALSE)
I really do not understand where the mistake could be.
Thank you in advance for every reply!
I got it to run without errors by moving the formula argument out of method.args. The fit doesn't look particularly good, though.
library(ggplot2)
tabella <- data.frame("Tempo" = c(0, 15, 30, 60, 90, 120), "Visc" = c(500, 9125, 11250, 10875, 11325, 10375))
ggplot(tabella, aes(x=Tempo, y=Visc)) +
geom_point() +
stat_smooth(method = "nls", formula = y ~ 500 + (k1 * x / (k2 + x)),
method.args = list(start = list(k1=100, k2=100)), data = tabella, se = FALSE)
Created on 2021-04-14 by the reprex package (v1.0.0)
One issue with your code is that formula is an argument of stat_smooth() itself, which hands it on to nls(), and it has to be a formula object rather than a character string.
Secondly, ggplot2 passes the aesthetics y and x to nls(), not Visc and Tempo.
ggplot(tabella, aes(x = Tempo, y = Visc)) +
geom_point()+
geom_smooth(
method = "nls",
formula = y ~ 500 + (k1 * x / (k2 + x)),
method.args = list(start = c(k1 = 100, k2 = 100)),
se=FALSE)
I was typing my answer when @teunbrand beat me to it. I am posting it anyway, since it uses geom_smooth instead of stat_smooth.
Same result: not a good fit.
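To judge the fit itself, it may help to run the same model outside ggplot2. The sketch below assumes, as a simplification, that the smoothing layer renames the mapped columns to x and y before calling the fitting function. The data frame name xy and the starting values (plateau near 11000, half-saturation at a few time units, read off the data by eye) are illustrative assumptions, not taken from the posts above.

```r
tabella <- data.frame(Tempo = c(0, 15, 30, 60, 90, 120),
                      Visc  = c(500, 9125, 11250, 10875, 11325, 10375))

# Simplified stand-in for what the smoothing layer sees: columns named x and y.
xy <- data.frame(x = tabella$Tempo, y = tabella$Visc)

# Assumed starting values, chosen by eye from the data (not from the question).
start_guess <- list(k1 = 11000, k2 = 5)

# Formula object written in terms of x and y, as geom_smooth()/stat_smooth() expect.
fit_xy <- nls(y ~ 500 + (k1 * x / (k2 + x)), data = xy, start = start_guess)

# The same model written with the original column names.
fit_named <- nls(Visc ~ 500 + (k1 * Tempo / (k2 + Tempo)),
                 data = tabella, start = start_guess)

coef(fit_xy)
round(residuals(fit_xy))  # inspect how far the curve is from each point
```

If both calls return the same coefficients, the x/y formula is equivalent to the named one, and any remaining misfit comes from the model form rather than from ggplot2.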
I'm trying to visualize a dataset that uses a binomial response variable (proportions). I'm using a gam to examine the trend, but having difficulty getting it to plot with ggplot. How do I get the smooth added to the plot?
Example:
set.seed(42)
df <- data.frame(y1 = sample.int(100),
y2 = sample.int(100),
x = runif(100, 0, 100))
ggplot(data = df,
aes(y = y1/(y1+y2), x = x)) +
geom_point(shape = 1) +
geom_smooth(method = "gam",
method.args = list(family = binomial),
formula = cbind(y1, y2) ~ s(x))
Warning message:
Computation failed in `stat_smooth()`
Caused by error in `cbind()`:
! object 'y1' not found
The formula in geom_smooth has to be in terms of x and y, representing the variables on your x and y axes, so you can't pass in y1 and y2.
The way round this is that rather than attempting to use the cbind type left-hand side of your gam, you can expand the counts into 1s and 0s so that there is only a single y variable. Although this makes for a little extra pre-processing, it allows you to draw your points just as easily using stat = 'summary' inside geom_point and makes your geom_smooth very straightforward:
library(tidyverse)
set.seed(42)
df <- data.frame(y1 = sample.int(100),
y2 = sample.int(100),
x = runif(100, 0, 100))
df %>%
rowwise() %>%
summarize(y = rep(c(1, 0), times = c(y1, y2)), x = x) %>%
ggplot(aes(x, y)) +
geom_point(stat = 'summary', fun = mean, shape = 1) +
geom_smooth(method = "gam",
method.args = list(family = binomial),
formula = y ~ s(x)) +
theme_classic()
Created on 2023-01-20 with reprex v2.0.2
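As a rough check that expanding the counts into 1s and 0s does not change the model, the two formulations can be compared directly. This sketch swaps in glm() for gam() to stay within base R, and uses a base R expansion in place of the rowwise()/summarize() step; the object names expanded, fit_counts and fit_expanded are made up for illustration.

```r
set.seed(42)
df <- data.frame(y1 = sample.int(100),
                 y2 = sample.int(100),
                 x  = runif(100, 0, 100))

# Expand each row into y1 ones followed by y2 zeros, repeating x to match.
expanded <- data.frame(
  y = unlist(Map(function(s, f) rep(c(1, 0), times = c(s, f)), df$y1, df$y2)),
  x = rep(df$x, times = df$y1 + df$y2)
)

# Two-column (successes, failures) response versus a single 0/1 response.
fit_counts   <- glm(cbind(y1, y2) ~ x, family = binomial, data = df)
fit_expanded <- glm(y ~ x, family = binomial, data = expanded)

rbind(counts = coef(fit_counts), expanded = coef(fit_expanded))
```

The two rows of coefficients should agree up to numerical tolerance, which is why the single y column is enough for geom_smooth.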
The data set (x.test, y.test) follows an exponential curve. I'm trying to fit a custom nonlinear function; the code is below. The points plot just fine, but I'm unable to get the fit line to work. Any suggestions?
x.test <- runif(50,2,8)
y.test <- 0.5^(x.test)
df <- data.frame(x.test, y.test)
library(ggpmisc)
my.formula <- y ~ lambda/ (1 + aii*x)
ggplot(data = df, aes(x=x.test,y=y.test)) +
geom_point(shape=21, fill="white", color="red", size=3) +
stat_smooth(method="nls",formula = y.test ~ lambda/ (1 + aii*x.test), method.args=list(start=c(lambda=1000,aii=-816.39)),se=F,color="red") +
geom_smooth(method="lm", formula = my.formula , col = "red") + stat_poly_eq(formula = my.formula, aes(label = stringr::str_wrap(paste(..eq.label.., ..rr.label.., sep = "~~~"))), parse = TRUE, size = 2.5, col = "red") + stat_function(fun=function (x.test){
y.test ~ lambda/ (1 + aii*x.test)}, color = "blue")
A few things:
you need to use y and x as the variable names in the formula argument to geom_smooth, regardless of what the names are in your data set
you need better starting values (see below)
there's a GLM trick you can use to fit this model; doesn't always work (can be numerically unstable), but it doesn't need starting values and will work more often than nls()
I don't think lm() and stat_poly_eq() are going to work as expected (or maybe at all) with a nonlinear formula ...
simulate data
(same as your code but using set.seed() - probably not important here but good practice)
set.seed(101)
x.test <- runif(50,2,8)
y.test <- 0.5^(x.test)
df <- data.frame(x.test, y.test)
attempt nls fit with your starting values
It's usually a good idea to troubleshoot by fitting any smoothing terms outside of ggplot2, so you have fewer layers to dig through to find the problems:
nls(y.test ~ lambda/(1+ aii*x.test),
start = list(lambda=1000,aii=-816.39),
data = df)
Error in nls(y.test ~ lambda/(1 + aii * x.test), start = list(lambda = 1000, :
singular gradient
OK, still doesn't work. Let's use glm() to get better starting values: we use an inverse-link GLM:
1/y = b0 + b1*x
y = 1/(b0 + b1*x)
= (1/b0)/(1 + (b1/b0)*x)
So:
g1 <- glm(y.test ~ x.test, family = gaussian(link = "inverse"))
s0 <- with(as.list(coef(g1)), list(lambda = 1/`(Intercept)`, aii = x.test/`(Intercept)`))
This gives lambda = -0.09, aii = -0.638 (with a little bit more work we could probably also figure out how to eyeball these by looking at the starting point and scale of the curve).
ggplot(data = df, aes(x=x.test,y=y.test)) +
geom_point(shape=21, fill="white", color="red", size=3) +
stat_smooth(method="nls",
formula = y ~ lambda/ (1 + aii*x),
method.args=list(start=s0),
se=FALSE,color="red") +
stat_smooth(method = "glm",
formula = y ~ x,
method.args = list(family = gaussian(link = "inverse")),
color = "blue", linetype = 2)
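The reparameterisation used for the starting values can be checked numerically. The sketch below refits the inverse-link GLM on the simulated data and confirms that lambda/(1 + aii*x), with lambda = 1/b0 and aii = b1/b0, reproduces the GLM's fitted values; b0, b1 and curve_nls_form are names introduced here for illustration.

```r
set.seed(101)
x.test <- runif(50, 2, 8)
y.test <- 0.5^(x.test)

# Inverse-link Gaussian GLM: 1/E[y] = b0 + b1 * x
g1 <- glm(y.test ~ x.test, family = gaussian(link = "inverse"))
b0 <- unname(coef(g1)["(Intercept)"])
b1 <- unname(coef(g1)["x.test"])

# Convert to the parameters of y = lambda / (1 + aii * x)
s0 <- list(lambda = 1 / b0, aii = b1 / b0)

# Both parameterisations describe the same curve.
curve_nls_form <- s0$lambda / (1 + s0$aii * x.test)
all.equal(unname(fitted(g1)), curve_nls_form)
```

Since the two forms are algebraically identical, the red nls curve and the dashed blue glm curve in the plot would be expected to overlap closely when nls converges from s0.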
I'm working with the Wage dataset in the ISLR library. My objective is to perform a spline regression with knots at 3 locations (see code below). I can do this regression. That part is fine.
My issue concerns the visualization of the regression curve. Using base R functions, I seem to get the correct curve. But I can't seem to get quite the right curve using the tidyverse. This is what is expected, and what I get with the base functions:
This is what ggplot spits out
It's noticeably different. R gives me the following message when running the ggplot functions:
`geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
What does this mean and how do I fix it?
library(tidyverse)
library(ISLR)
library(splines)  # provides bs()
attach(Wage)
agelims <- range(age)
age.grid <- seq(from = agelims[1], to = agelims[2])
fit <- lm(wage ~ bs(age, knots = c(25, 40, 60), degree = 3), data = Wage) #Default is 3
plot(age, wage, col = 'grey', xlab = 'Age', ylab = 'Wages')
points(age.grid, predict(fit, newdata = list(age = age.grid)), col = 'darkgreen', lwd = 2, type = "l")
abline(v = c(25, 40, 60), lty = 2, col = 'darkgreen')
ggplot(data = Wage) +
geom_point(mapping = aes(x = age, y = wage), color = 'grey') +
geom_smooth(mapping = aes(x = age, y = fit$fitted.values), color = 'red')
I also tried
ggplot() +
geom_point(data = Wage, mapping = aes(x = age, y = wage), color = 'grey') +
geom_smooth(mapping = aes(x = age.grid, y = predict(fit, newdata = list(age = age.grid))), color = 'red')
but that looks very similar to the 2nd picture.
Thanks for any help!
splines::bs() and s(., bs = "cs") from mgcv do very different things; the latter is a penalized regression spline, which is the default geom_smooth() picks for data sets of 1,000 or more points. I would try (untested!)
geom_smooth(method="lm",
formula= y ~ splines::bs(x, knots = c(25, 40, 60), degree = 3))
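A minimal sketch of why this should work, using simulated data as a stand-in for Wage (the numbers are hypothetical, not the ISLR data): the same spline regression written in terms of x and y gives the same fitted values as the named-variable fit. It also shows an alternative that avoids re-smoothing altogether, which is to predict on a grid and draw the result with geom_line. The ggplot2 part is guarded so the rest runs without it.

```r
library(splines)

# Simulated stand-in for the Wage data (hypothetical values).
set.seed(1)
sim <- data.frame(age = sample(18:80, 300, replace = TRUE))
sim$wage <- 60 + 2.5 * sim$age - 0.028 * sim$age^2 + rnorm(300, sd = 15)

# Fit with the original variable names, as in the question.
fit <- lm(wage ~ bs(age, knots = c(25, 40, 60), degree = 3), data = sim)

# The same model in terms of x and y, which is what geom_smooth() would fit.
xy <- data.frame(x = sim$age, y = sim$wage)
fit_xy <- lm(y ~ splines::bs(x, knots = c(25, 40, 60), degree = 3), data = xy)
all.equal(unname(fitted(fit)), unname(fitted(fit_xy)))

# Alternative: predictions on a grid, drawn as a plain line (no second smoother).
age.grid <- seq(min(sim$age), max(sim$age))
curve_df <- data.frame(age = age.grid,
                       wage = predict(fit, newdata = data.frame(age = age.grid)))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(sim, aes(x = age, y = wage)) +
    geom_point(colour = "grey") +
    geom_line(data = curve_df, colour = "red")
  # print(p) to draw the plot
}
```

Passing fitted values as the y aesthetic of geom_smooth, as in the question, makes ggplot2 smooth them a second time with its default method, which would explain the different curve.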
I'd love some help with this. I'm trying to put an exponential decay curve onto some vehicle data I have. I've been searching through Stack Overflow and none of the answers have been helpful.
This is my current code that's not working. It's based off the ggplot2 documentation and it's still not working.
plot <- ggplot(data = rawData, aes(x = Mileage, y = Cost, color = Car)) + geom_point() + stat_smooth(method = 'nls', formula = y ~ a*exp(b *-x), se = FALSE, start = list(a=1,b=1))
plot
It plots my data but doesn't show a curve.
I can't embed photos for some reason so here it is
The current warning messages I receive are:
1: In (function (formula, data = parent.frame(), start, control = nls.control(), :
  No starting values specified for some parameters.
  Initializing ‘a’, ‘b’ to '1.'. Consider specifying 'start' or using a selfStart model
2: Computation failed in stat_smooth():
  singular gradient matrix at initial parameter estimates
I tried these other options too, to no avail.
ggplot(mtcars, aes(x = Mileage, y = Cost)) + geom_point() +
stat_smooth(method = "nls", formula = y ~ a * exp(x * b), se = FALSE,
method.args = list(start = list(a = 1, b = 1)))
Which resulted in an error message of:
Computation failed in stat_smooth(): Missing value or an infinity
produced when evaluating the model
And I tried this too
ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point() +
stat_smooth(method = "nls", formula = y ~ a * exp(x * -b), se = FALSE,
method.args = list(start = list(a = 1, b = 1),
lower = c(0),
algorithm = "port"))
Which resulted in an error message of:
Computation failed in stat_smooth(): singular gradient matrix at
initial parameter estimates
UPDATE
If I divide all my values by 100,000, all of a sudden the trendline works, albeit without confidence intervals. I have no idea why this works, and it is not an acceptable answer for me, since all my axis values are now off by a factor of 100,000.
rawData %>% mutate(Mileage = Mileage / 100000,
Cost = Cost / 100000) %>%
ggplot(aes(x = Mileage, y = Cost, color = Car)) +
geom_point() + stat_smooth(method = "nls", formula = y ~ a * exp(x * -b), se = FALSE)
Here is my data - https://docs.google.com/spreadsheets/d/1SKhkqHK-qFGG8IST67iUhMIIdvA_k6htVid7lAwCb3A/edit?usp=sharing
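One likely reading of the rescaling effect is that with b = 1 and mileages in the tens of thousands, exp(-x) underflows to zero, so the gradient at the starting point is numerically singular. A common way around this, sketched here on hypothetical data (the spreadsheet values are not reproduced), is to get starting values on the original scale from a log-linear fit, since log(Cost) = log(a) - b * Mileage is linear in Mileage.

```r
set.seed(123)
# Hypothetical data on the original scale (not the asker's spreadsheet).
Mileage <- seq(5000, 200000, length.out = 40)
Cost    <- 30000 * exp(-1e-5 * Mileage) * exp(rnorm(40, sd = 0.05))
rawData <- data.frame(Mileage, Cost)

# exp(-1 * Mileage) is exactly zero at these magnitudes, hence the singular gradient.
all(exp(-1 * rawData$Mileage) == 0)

# Rough starting values from the linearised model.
lin   <- lm(log(Cost) ~ Mileage, data = rawData)
start <- list(a = exp(unname(coef(lin)[1])), b = -unname(coef(lin)[2]))

# Nonlinear fit on the original scale, started from those values.
fit <- nls(Cost ~ a * exp(-b * Mileage), data = rawData, start = start)
coef(fit)
```

Under these assumptions, the same starting values could be handed to the plot layer with formula = y ~ a * exp(-b * x) and method.args = list(start = start), leaving the axes in their original units.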
I have a simple dataset and I am trying to use the power trend to best fit the data. The sample data is very small and is as follows:
structure(list(Discharge = c(250, 300, 500, 700, 900), Downstream = c(0.3,
0.3, 0.3, 0.3, 0.3), Age = c(1.32026239202165, 1.08595138888889,
0.638899189814815, 0.455364583333333, 0.355935185185185)), .Names = c("Discharge",
"Downstream", "Age"), row.names = c(NA, 5L), class = "data.frame")
Data looks as follows:
> new
Discharge Downstream Age
1 250 0.3 1.3202624
2 300 0.3 1.0859514
3 500 0.3 0.6388992
4 700 0.3 0.4553646
5 900 0.3 0.3559352
I tried to plot the above data using ggplot2
ggplot(new)+geom_point(aes(x=Discharge,y=Age))
I could add the linear line using geom_smooth(method="lm"), but I am not sure what code I need to show the power line.
The output is as follows:
How can I add a power regression line as done in Excel? The Excel figure is shown below:
While mnel's answer is correct for a nonlinear least squares fit, note that Excel isn't actually doing anything nearly that sophisticated. It's really just log-transforming the response and predictor variables, and doing an ordinary (linear) least squares fit. To reproduce this in R, you would do:
lm(log(Age) ~ log(Discharge), data=df)
Call:
lm(formula = log(Age) ~ log(Discharge), data = df)
Coefficients:
(Intercept) log(Discharge)
5.927 -1.024
As a check, the coefficient for log(Discharge) is identical to that from Excel while exp(5.927) ~ 375.05.
While I'm not sure how to use this as a trendline in ggplot2, you can do it in base graphics thusly:
m <- lm(log(Age) ~ log(Discharge), data=df)
newdf <- data.frame(Discharge=seq(min(df$Discharge), max(df$Discharge), length.out=100))
plot(Age ~ Discharge, data=df)
lines(newdf$Discharge, exp(predict(m, newdf)))
text(600, .8, substitute(b0*x^b1, list(b0=exp(coef(m)[1]), b1=coef(m)[2])))
text(600, .75, substitute(plain("R-square: ") * r2, list(r2=summary(m)$r.squared)))
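For a ggplot2 version of the same Excel-style trendline, one option is to predict from the log-log fit on a grid, back-transform, and draw the result with geom_line. This is a sketch rather than a tested recipe from the answer; the data frame is rebuilt from the values in the question, and the plotting part is guarded in case ggplot2 is not installed.

```r
df <- data.frame(
  Discharge = c(250, 300, 500, 700, 900),
  Age = c(1.32026239202165, 1.08595138888889, 0.638899189814815,
          0.455364583333333, 0.355935185185185)
)

# Ordinary least squares on the log-log scale, as Excel's power trendline does.
m <- lm(log(Age) ~ log(Discharge), data = df)

# Predict on a grid and back-transform to the original scale.
newdf <- data.frame(Discharge = seq(min(df$Discharge), max(df$Discharge),
                                    length.out = 100))
newdf$Age <- exp(predict(m, newdf))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = Discharge, y = Age)) +
    geom_point() +
    geom_line(data = newdf)  # inherits the x and y mappings
  # print(p) to draw the plot
}
```

Because newdf carries both Discharge and Age columns, geom_line can reuse the plot's aesthetics without a second aes() call.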
Use nls (nonlinear least squares) as your smoother
eg
ggplot(DD,aes(x = Discharge,y = Age)) +
geom_point() +
stat_smooth(method = 'nls', formula = 'y~a*x^b', start = list(a = 1,b=1),se=FALSE)
Noting Doug Bates's comments on R-squared values and non-linear models, you could use the ideas in "Adding Regression Line Equation and R2 on graph" to append the regression line equation.
# note that you have to give it sensible starting values
# and I haven't worked out why the values passed to geom_smooth work!
power_eqn = function(df, start = list(a = 300, b = -1)){
m = nls(Age ~ a*Discharge^b, start = start, data = df);
eq <- substitute(italic(y) == a ~italic(x)^b,
list(a = format(coef(m)[1], digits = 2),
b = format(coef(m)[2], digits = 2)))
as.character(as.expression(eq));
}
ggplot(DD,aes(x = Discharge,y = Age)) +
geom_point() +
stat_smooth(method = 'nls', formula = 'y~a*x^b', start = list(a = 1,b=1),se=FALSE) +
geom_text(x = 600, y = 1, label = power_eqn(DD), parse = TRUE)
2018 Update:
The "start" argument now seems to be deprecated. It is not in the stat_smooth documentation either.
If you want to choose starting values, you need to use the "method.args" option now.
See changes below:
ggplot(DD,aes(x = Discharge,y = Age)) +
geom_point() +
stat_smooth(method = 'nls', formula = 'y~a*x^b', method.args = list(start= c(a = 1,b=1)),se=FALSE) + geom_text(x = 600, y = 1, label = power_eqn(DD), parse = TRUE)