I'd love some help with this. I'm trying to fit an exponential decay curve to some vehicle data I have. I've searched Stack Overflow, but none of the answers I found have helped.
Here is my current code. It's based on the ggplot2 documentation, but it still doesn't work:
plot <- ggplot(data = rawData, aes(x = Mileage, y = Cost, color = Car)) +
geom_point() +
stat_smooth(method = 'nls', formula = y ~ a*exp(b *-x), se = FALSE, start = list(a=1,b=1))
plot
It plots my data but doesn't show a curve.
The current warning messages I receive are:
1: In (function (formula, data = parent.frame(), start, control = nls.control(), : No starting values specified for some parameters. Initializing ‘a’, ‘b’ to '1.'. Consider specifying 'start' or using a selfStart model
2: Computation failed in stat_smooth(): singular gradient matrix at initial parameter estimates
I tried these other options too, to no avail.
ggplot(rawData, aes(x = Mileage, y = Cost)) + geom_point() +
stat_smooth(method = "nls", formula = y ~ a * exp(x * b), se = FALSE,
method.args = list(start = list(a = 1, b = 1)))
Which resulted in this error message:
Computation failed in stat_smooth(): Missing value or an infinity produced when evaluating the model
And I tried this too
ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point() +
stat_smooth(method = "nls", formula = y ~ a * exp(x * -b), se = FALSE,
method.args = list(start = list(a = 1, b = 1),
lower = c(0),
algorithm = "port"))
Which resulted in this error message:
Computation failed in stat_smooth(): singular gradient matrix at initial parameter estimates
UPDATE
If I divide all my values by 100,000, all of a sudden the trendline works, albeit without confidence intervals. I have no idea why this works, and it isn't an acceptable solution, since all my axis values are now scaled down by a factor of 100,000.
rawData %>% mutate(Mileage = Mileage / 100000,
Cost = Cost / 100000) %>%
ggplot(aes(x = Mileage, y = Cost, color = Car)) +
geom_point() + stat_smooth(method = "nls", formula = y ~ a * exp(x * -b), se = FALSE)
Here is my data - https://docs.google.com/spreadsheets/d/1SKhkqHK-qFGG8IST67iUhMIIdvA_k6htVid7lAwCb3A/edit?usp=sharing
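A likely reason the rescaling helps (our reading, not confirmed in the thread): with the default start b = 1 and mileages in the hundreds of thousands, exp(-b * Mileage) underflows to 0, so the gradient with respect to b is zero and nls() reports a singular gradient. Dividing by 100,000 puts x on a scale where b = 1 is a workable start. An alternative that keeps the original units is to take starting values from a log-linear fit, since log(Cost) = log(a) - b * Mileage. The sketch below fits outside ggplot2 on simulated stand-in data; the column names Mileage and Cost come from the question, but the values and the data frame name veh are made up.

```r
# With b = 1 and x on the order of 1e5 the model underflows to exactly 0
exp(-1 * 150000)

# Simulated stand-in for the vehicle data (hypothetical values)
set.seed(1)
veh <- data.frame(Mileage = runif(40, 0, 200000))
veh$Cost <- 30000 * exp(-1.5e-5 * veh$Mileage) * exp(rnorm(40, sd = 0.05))

# Starting values from the log-linear fit log(Cost) = log(a) - b * Mileage
lin <- lm(log(Cost) ~ Mileage, data = veh)
start_vals <- list(a = exp(unname(coef(lin)[1])), b = -unname(coef(lin)[2]))

# nls() in the original units, started near the solution
fit <- nls(Cost ~ a * exp(-b * Mileage), data = veh, start = start_vals)
coef(fit)
```

If this converges on the real data, the same start_vals could be passed to ggplot2 via stat_smooth(method = "nls", formula = y ~ a * exp(-b * x), method.args = list(start = start_vals), se = FALSE). With color = Car every group receives the same start values, which may need adjusting per car.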
Related
The data set (x.test, y.test) follows an exponential curve. I'm trying to fit a custom nonlinear function to it; the code is below. The points plot just fine, but I can't get the fitted line to work. Any suggestions?
x.test <- runif(50,2,8)
y.test <- 0.5^(x.test)
df <- data.frame(x.test, y.test)
library(ggpmisc)
my.formula <- y ~ lambda/ (1 + aii*x)
ggplot(data = df, aes(x=x.test,y=y.test)) +
geom_point(shape=21, fill="white", color="red", size=3) +
stat_smooth(method="nls",formula = y.test ~ lambda/ (1 + aii*x.test), method.args=list(start=c(lambda=1000,aii=-816.39)),se=F,color="red") +
geom_smooth(method="lm", formula = my.formula , col = "red") + stat_poly_eq(formula = my.formula, aes(label = stringr::str_wrap(paste(..eq.label.., ..rr.label.., sep = "~~~"))), parse = TRUE, size = 2.5, col = "red") + stat_function(fun=function (x.test){
y.test ~ lambda/ (1 + aii*x.test)}, color = "blue")
A few things:
you need to use y and x as the variable names in the formula argument to geom_smooth, regardless of what the names are in your data set
you need better starting values (see below)
there's a GLM trick you can use to fit this model; it doesn't always work (it can be numerically unstable), but it doesn't need starting values and will work more often than nls()
I don't think lm() and stat_poly_eq() are going to work as expected (or maybe at all) with a nonlinear formula ...
simulate data
(same as your code but using set.seed() - probably not important here but good practice)
set.seed(101)
x.test <- runif(50,2,8)
y.test <- 0.5^(x.test)
df <- data.frame(x.test, y.test)
attempt nls fit with your starting values
It's usually a good idea to troubleshoot by fitting any smoothing terms outside of ggplot2, so you have fewer layers to dig through to find the problems:
nls(y.test ~ lambda/(1+ aii*x.test),
start = list(lambda=1000,aii=-816.39),
data = df)
Error in nls(y.test ~ lambda/(1 + aii * x.test), start = list(lambda = 1000, :
singular gradient
OK, still doesn't work. Let's use glm() to get better starting values: we use an inverse-link GLM:
1/y = b0 + b1*x
y = 1/(b0 + b1*x)
= (1/b0)/(1 + (b1/b0)*x)
So:
g1 <- glm(y.test ~ x.test, family = gaussian(link = "inverse"), data = df)
s0 <- with(as.list(coef(g1)), list(lambda = 1/`(Intercept)`, aii = x.test/`(Intercept)`))
This gives lambda = -0.09, aii = -0.638 (with a little bit more work we could probably also figure out how to eyeball these by looking at the starting point and scale of the curve).
ggplot(data = df, aes(x=x.test,y=y.test)) +
geom_point(shape=21, fill="white", color="red", size=3) +
stat_smooth(method="nls",
formula = y ~ lambda/ (1 + aii*x),
method.args=list(start=s0),
se=FALSE,color="red") +
stat_smooth(method = "glm",
formula = y ~ x,
method.args = list(family = gaussian(link = "inverse")),
color = "blue", linetype = 2)
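Following the advice to troubleshoot outside ggplot2, the GLM estimates can be handed straight to nls() to confirm they fix the singular-gradient error. Because a Gaussian GLM minimises the same residual sum of squares as nls(), the nls() fit should start at, or very near, its optimum. This sketch assumes the GLM converges on this data, as it did in the answer.

```r
set.seed(101)
x.test <- runif(50, 2, 8)
y.test <- 0.5^(x.test)
df <- data.frame(x.test, y.test)

# Inverse-link GLM: 1/y = b0 + b1*x, so lambda = 1/b0 and aii = b1/b0
g1 <- glm(y.test ~ x.test, family = gaussian(link = "inverse"), data = df)
s0 <- with(as.list(coef(g1)),
           list(lambda = 1/`(Intercept)`, aii = x.test/`(Intercept)`))

# nls() started from the GLM estimates; both minimise the same sum of squares
fit <- nls(y.test ~ lambda / (1 + aii * x.test), data = df, start = s0)
coef(fit)
```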
I am attempting to display a linear model for low x values and a non-linear model for higher x values. To do this, I will use DNase as an example:
library(ggplot2)
# Assigning DNase as a new data frame:
data_1 <- DNase
#Creating a column that can distinguish low and high range values:
data_1$range <- ifelse(data_1$conc <5, "low", "high")
#Attempting to plot separate lines for low and high range values, and also facet_wrap by run:
ggplot(data_1, aes(x = conc, y = density, colour = range)) +
geom_point(size = 0.5) + stat_smooth(method = "nls",
method.args = list(formula = y ~ a*exp(b*x),
start = list(a = 0.8, b = 0.1)),
data = data_1,
se = FALSE) +
stat_smooth(method = 'lm', formula = 'y~0+x') +
facet_wrap(~Run)
However, as you can see, it seems to plot both the linear model and the non-linear model for both, and I can't quite figure out where to put information that would tell it to only plot one for each. Also, if possible, can I extend these models out to the full range of values on the x axis?
You can provide specific data to each layer. In this case, subset data_1 by range so that each stat_smooth() call only receives the relevant rows, and give the whole data frame to geom_point():
ggplot(NULL, aes(x = conc, y = density, colour = range)) +
geom_point(data = data_1, size = 0.5) +
stat_smooth(data = subset(data_1, range == "high"),
method = "nls",
formula = y ~ a*exp(b*x),
method.args = list(start = list(a = 0.8, b = 0.1)),
se = FALSE) +
stat_smooth(data = subset(data_1, range == "low"), method = 'lm', formula = y ~ 0 + x) +
facet_wrap(~Run)
If you want to fit both models to all the data, fit them yourself, add the predictions to a data frame, and plot those directly.
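For the second part of the question (extending the lines across the full x range), stat_smooth() has a fullrange = TRUE argument that expands each smoother to the range of the plot. Fitting the models yourself gives more control; below is a minimal base-R sketch. fit_run() is a hypothetical helper, and the log-linear starting values for nls() are an assumption that replaces the question's fixed a = 0.8, b = 0.1.

```r
# DNase ships with base R (datasets package)
data_1 <- DNase
data_1$range <- ifelse(data_1$conc < 5, "low", "high")

# Common grid over the full conc range, so both lines span the whole x axis
grid <- data.frame(conc = seq(min(data_1$conc), max(data_1$conc), length.out = 100))

# Hypothetical helper: fit the linear model to the low range and the
# exponential model to the high range of one Run, then predict both on the grid
fit_run <- function(d) {
  low  <- subset(d, range == "low")
  high <- subset(d, range == "high")
  lin  <- lm(density ~ 0 + conc, data = low)
  # Starting values from log(density) = log(a) + b * conc
  ll   <- coef(lm(log(density) ~ conc, data = high))
  expo <- nls(density ~ a * exp(b * conc), data = high,
              start = list(a = exp(ll[[1]]), b = ll[[2]]))
  rbind(
    data.frame(Run = d$Run[1], conc = grid$conc,
               density = predict(lin, newdata = grid), model = "linear"),
    data.frame(Run = d$Run[1], conc = grid$conc,
               density = predict(expo, newdata = grid), model = "exponential")
  )
}

# One block of predictions per Run, stacked into a single data frame
preds <- do.call(rbind, lapply(split(data_1, data_1$Run), fit_run))
head(preds)
```

The predictions can then be drawn with geom_line(data = preds, aes(conc, density, colour = model)) on top of geom_point(data = data_1, ...) and facet_wrap(~Run).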
I am trying to use ggplot2 to fit a nonlinear function to my data and overlay it on a scatter plot of the individual values.
I get an error I am not able to understand; it looks as if two vectors of different lengths were being bound together:
3: Computation failed in `stat_smooth()`:
arguments imply differing number of rows: 80, 6
The complete code is written below:
library(ggplot2)
tabella <- data.frame("Tempo" = c(0, 15, 30, 60, 90, 120), "Visc" = c(500, 9125, 11250, 10875, 11325, 10375))
attach(tabella)
Visc.mod <- nls((Visc ~ 500 + (k1*Tempo/(k2+Tempo))), start=list(k1=100, k2=100), trace=TRUE)
cor(Visc,predict(Visc.mod))
predict(Visc.mod)
summary(Visc.mod)
ggplot(tabella, aes(x=Tempo, y=Visc)) +
geom_point() +
stat_smooth(method = "nls",
method.args = list(formula = "Visc ~ 500 + (k1*Tempo/(k2+Tempo))",
start = list(k1=100, k2=100)), data = tabella, se = FALSE)
I really don't understand where the mistake could be.
Thanks in advance for any replies!
I got it to run without errors by moving the formula out of method.args and writing it in terms of x and y. The fit doesn't look particularly good, though.
library(ggplot2)
tabella <- data.frame("Tempo" = c(0, 15, 30, 60, 90, 120), "Visc" = c(500, 9125, 11250, 10875, 11325, 10375))
ggplot(tabella, aes(x=Tempo, y=Visc)) +
geom_point() +
stat_smooth(method = "nls", formula = y ~ 500 + (k1 * x / (k2 + x)),
method.args = list(start = list(k1=100, k2=100)), data = tabella, se = FALSE)
Created on 2021-04-14 by the reprex package (v1.0.0)
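One issue with your code is that formula is an argument of nls(), so you need to pass it a formula object, not a character string.
Secondly, ggplot2 passes the data to nls() as y and x, not as Visc and Tempo. Because your formula refers to Tempo, predicting on ggplot2's grid of 80 x values most likely ignores that grid and returns the 6 fitted values instead, which is where "differing number of rows: 80, 6" comes from.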
ggplot(tabella, aes(x = Tempo, y = Visc)) +
geom_point()+
geom_smooth(
method = "nls",
formula = y ~ 500 + (k1 * x / (k2 + x)),
method.args = list(start = c(k1 = 100, k2 = 100)),
se=FALSE)
I was typing my answer when @teunbrand posted theirs. I'm posting it anyway, since it uses geom_smooth() instead of stat_smooth().
Same result: not a good fit.
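To see why the curve looks poor, it can help to fit the model outside ggplot2 and inspect the residuals. The starting values below are a rough, data-driven guess (our assumption, not from the thread): k1 near the plateau height above the fixed baseline of 500, and a small k2.

```r
tabella <- data.frame(Tempo = c(0, 15, 30, 60, 90, 120),
                      Visc  = c(500, 9125, 11250, 10875, 11325, 10375))

# Rough starting values (assumption): k1 ~ plateau above 500, k2 ~ small
start_vals <- list(k1 = max(tabella$Visc) - 500, k2 = 5)

fit <- nls(Visc ~ 500 + k1 * Tempo / (k2 + Tempo),
           data = tabella, start = start_vals)
coef(fit)

# Residuals show where the hyperbola misses the observed points
round(residuals(fit))
```

The fixed 500 term forces the curve through (0, 500), and with positive k1 and k2 the model only increases, so it cannot follow the drop from 11325 at Tempo = 90 to 10375 at Tempo = 120. That compromise is a likely reason the overlaid curve looks off.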
I am running the following code:
ggplot(data= data_nickel_t, aes( x=index(data_nickel_t), y= log(ni_demand) )) +
scale_x_yearqtr(format = "%Y-%q", n = 14) +
geom_point() + stat_summary(fun.data=mean_cl_normal) +
geom_smooth(method='lm', aes(colour = "linear fit"), se= FALSE) +
geom_smooth(method='lm', formula = y ~ x + poly(x, 2), size = 1, aes(colour = "quadratic"), se= FALSE) +
geom_smooth(method='lm', formula = y ~ x + poly(x, 3), size = 1, aes(colour = "polynomial"), se= FALSE ) +
ggtitle("Global Refined Nickel Demand") +
xlab("Time") +
ylab("Thousand Metric Tons")
The code above produces a graph with three fitted lines, but I get the following warning messages:
1: In predict.lm(model, newdata = data.frame(x = xseq), se.fit = se,
prediction from a rank-deficient fit may be misleading;
2: In predict.lm(model, newdata = data.frame(x = xseq), se.fit = se, :
prediction from a rank-deficient fit may be misleading;
3: Removed 94 rows containing missing values (geom_pointrange).
My first impression was collinearity between the time-trend terms in the poly() calls; I might estimate the model numerically to check this further. As for the missing values, this link explains the usual reasons for missing rows, but the solutions suggested there don't work in my case: I still get the same warning. I have 94 observations, and there are no zeros in my data, so the log transformation shouldn't be dropping any values. I'm still fairly new to time series in R. Any idea how I can fix the missing-value warning?
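The collinearity hunch can be checked outside ggplot2. One plausible cause (an assumption, since the data isn't shown): poly(x, 2) already contains a linear term, so y ~ x + poly(x, 2) includes x twice and lm() drops a column, which is what "rank-deficient fit" means; y ~ poly(x, 2) avoids it. Separately, stat_summary(fun.data = mean_cl_normal) needs at least two values per x to estimate an interval; with one observation per quarter the limits are NA, which would plausibly explain exactly 94 removed rows. The sketch uses simulated data with 94 points.

```r
set.seed(42)
# Simulated stand-in for the 94 quarterly observations (hypothetical values)
x <- 1:94
y <- 5 + 0.02 * x + rnorm(94, sd = 0.1)

# poly(x, 2) already contains a linear term, so adding x duplicates it
redundant <- lm(y ~ x + poly(x, 2))
coef(redundant)   # one coefficient is NA: the fit is rank-deficient
redundant$rank    # 3 estimable coefficients out of 4

# Without the extra x the design matrix has full rank
clean <- lm(y ~ poly(x, 2))
coef(clean)

# A single value has no spread, so a normal-theory interval around it is NA
sd(4.2)
```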
I would like to use ggplot2 to replicate partial-effect plots (with partial residuals), as produced by the effects package. To do this I need to retrieve some information from the effect object.
This is the plot I want to replicate with ggplot.
library(effects)
mod <- lm(log(prestige) ~ income:type + education, data=Prestige)
eff = effect("education", mod, partial.residuals=T)
plot(eff)
From the eff object I can retrieve the partial residuals as eff$residuals, but they are not sufficient to replicate the plot. I think what I need is both the residuals AND the marginal predicted effect, but I was not able to retrieve the latter from my eff object.
Without it, I only have the residual scores, which cannot be plotted against the marginal-effect line.
Any hint on how to retrieve this information?
You have almost all the information available. This would take some more time to generalize, but here's some code that produces a figure approximately like the one from the effects package. Notice that the smoother is off, but I didn't dig into why.
The code should be self-explanatory; the only thing I copied from the package is the closest() function.
library(effects)
mod <- lm(log(prestige) ~ income:type + education, data=Prestige)
eff = effect("education", mod, partial.residuals=TRUE)
library(ggplot2)
library(gridExtra)
closest <- function(x, x0) apply(outer(x, x0, FUN=function(x, x0) abs(x - x0)), 1, which.min)
x.fit <- unlist(eff$x.all)
trans <- I
x <- data.frame(lower = eff$lower, upper = eff$upper, fit = eff$fit, education = eff$x$education)
xy <- data.frame(x = x.fit, y = x$fit[closest(trans(x.fit), x$education)] + eff$residuals)
g <- ggplot(x, aes(x = education, y = fit)) +
theme_bw() +
geom_line(size = 1) +
geom_point(data = xy, aes(x = x, y = y), shape = 1, col = "blue", size = 2) +
geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.5) +
geom_smooth(data = xy, aes(x = trans(x), y = y),
method = "loess", span = 2/3, linetype = "dashed", se = FALSE)
grid.arrange(plot(eff), g, ncol = 2)
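The same idea (points are the fitted term contribution plus the residuals) can be reproduced without the effects package for a term that enters the model additively. This sketch uses base R's residuals(type = "partial") and predict(type = "terms"); mtcars stands in for Prestige so no extra packages are needed, and the model is only an analogue of the question's. Both quantities are centred, so the line sits at a different vertical offset than in plot(eff).

```r
# mtcars (base R) stands in for Prestige; wt plays the role of education
mod <- lm(log(mpg) ~ hp:factor(cyl) + wt, data = mtcars)

# Centred contribution of wt: the straight line the points scatter around
term <- predict(mod, type = "terms")[, "wt"]
# Partial residuals for wt: that contribution plus the model residuals
pres <- residuals(mod, type = "partial")[, "wt"]

plot_df <- data.frame(wt = mtcars$wt, partial = pres, effect = term)

# Sanity check: partial residual minus term contribution is the ordinary residual
all.equal(unname(plot_df$partial - plot_df$effect), unname(residuals(mod)))
```

To draw it with ggplot2: ggplot(plot_df, aes(wt, partial)) + geom_point(shape = 1) + geom_line(aes(y = effect)) + geom_smooth(method = "loess", se = FALSE, linetype = "dashed").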