Having a problem with ggplot2 for polynomial regression - r

#linear regression
fit1 <- lm(temp ~ usage ,data= electemp)
#polynomial regression
fit2 <- lm(temp ~ poly(electemp$usage,degree), data = electemp)
ggplot(data=electemp, aes(x=temp,y=usage))+geom_point()+
stat_smooth(method="lm",col="red") #linear regression
ggplot(electemp, aes(usage, temp) ) +
geom_point() +
stat_smooth(method = lm, formula=temp~ poly(electemp$usage, 3, raw=TRUE))
I am using the same ggplot call for my polynomial regression, but I get "Error: Aesthetics must be either length 1 or the same as the data (55): x".

You need to use x and y in the formula you pass to geom_smooth, not the variable names in your data frame.
Here's an example using some dummy data (though the structure and names are the same, so it should work on your own data):
library(ggplot2)
fit1 <- lm(temp ~ usage ,data= electemp)
fit2 <- lm(temp ~ poly(usage, 3), data = electemp)
ggplot(electemp, aes(usage, temp)) +
geom_point() +
stat_smooth(method = "lm", col = "red")
ggplot(electemp, aes(usage, temp) ) +
geom_point() +
stat_smooth(method = lm, formula= y ~ poly(x, 3))
Data
set.seed(1)
electemp <- data.frame(usage = 1:60,
temp = 20 + .2 * 1:60 - 0.02*(1:60)^2 +
0.0005 * (1:60)^3 + rnorm(60, 0, 5))
Created on 2020-11-24 by the reprex package (v0.3.0)
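To see why the formula has to be written in terms of x and y: by the time the smoother runs, ggplot2 has already copied the mapped columns into a data frame whose columns are named x and y, and the formula is evaluated against that. The sketch below imitates this step in base R (the name layer_data_like is made up for illustration, it is not a ggplot2 object) and checks that both ways of writing the model give the same fitted values.

```r
set.seed(1)
electemp <- data.frame(usage = 1:60,
                       temp = 20 + .2 * 1:60 - 0.02 * (1:60)^2 +
                         0.0005 * (1:60)^3 + rnorm(60, 0, 5))

# Model written with the original column names
fit_named <- lm(temp ~ poly(usage, 3), data = electemp)

# Roughly what the smoother sees: the aesthetics renamed to x and y
# (hypothetical name, for illustration only)
layer_data_like <- data.frame(x = electemp$usage, y = electemp$temp)
fit_xy <- lm(y ~ poly(x, 3), data = layer_data_like)

# Same model, so the fitted values agree
all.equal(unname(fitted(fit_named)), unname(fitted(fit_xy)))
```

A formula such as temp ~ poly(electemp$usage, 3) bypasses that renamed data frame and reaches back into the global electemp, which is where the length mismatch can come from.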

Related

How to fit non-linear function to data in ggplot2 using maximum likelihood model in R?

The data set (x.test, y.test) is an exponential fit. I'm trying to fit a custom non-linear function and attached is the code. The regular points plot just fine but I'm unable to get the fit line to work. Any suggestions?
x.test <- runif(50,2,8)
y.test <- 0.5^(x.test)
df <- data.frame(x.test, y.test)
library(ggpmisc)
my.formula <- y ~ lambda/ (1 + aii*x)
ggplot(data = df, aes(x=x.test,y=y.test)) +
geom_point(shape=21, fill="white", color="red", size=3) +
stat_smooth(method="nls",formula = y.test ~ lambda/ (1 + aii*x.test), method.args=list(start=c(lambda=1000,aii=-816.39)),se=F,color="red") +
geom_smooth(method="lm", formula = my.formula , col = "red") + stat_poly_eq(formula = my.formula, aes(label = stringr::str_wrap(paste(..eq.label.., ..rr.label.., sep = "~~~"))), parse = TRUE, size = 2.5, col = "red") + stat_function(fun=function (x.test){
y.test ~ lambda/ (1 + aii*x.test)}, color = "blue")
A few things:
you need to use y and x as the variable names in the formula argument to geom_smooth, regardless of what the names are in your data set
you need better starting values (see below)
there's a GLM trick you can use to fit this model; doesn't always work (can be numerically unstable), but it doesn't need starting values and will work more often than nls()
I don't think lm() and stat_poly_eq() are going to work as expected (or maybe at all) with a nonlinear formula ...
simulate data
(same as your code but using set.seed() - probably not important here but good practice)
set.seed(101)
x.test <- runif(50,2,8)
y.test <- 0.5^(x.test)
df <- data.frame(x.test, y.test)
attempt nls fit with your starting values
It's usually a good idea to troubleshoot by fitting any smoothing terms outside of ggplot2, so you have fewer layers to dig through to find the problems:
nls(y.test ~ lambda/(1+ aii*x.test),
start = list(lambda=1000,aii=-816.39),
data = df)
Error in nls(y.test ~ lambda/(1 + aii * x.test), start = list(lambda = 1000, :
singular gradient
OK, still doesn't work. Let's use glm() to get better starting values: we use an inverse-link GLM:
1/y = b0 + b1*x
y = 1/(b0 + b1*x)
= (1/b0)/(1 + (b1/b0)*x)
So:
g1 <- glm(y.test ~ x.test, family = gaussian(link = "inverse"))
s0 <- with(as.list(coef(g1)), list(lambda = 1/`(Intercept)`, aii = x.test/`(Intercept)`))
This gives lambda = -0.09, aii = -0.638 (with a little bit more work we could probably also figure out how to eyeball these by looking at the starting point and scale of the curve).
ggplot(data = df, aes(x=x.test,y=y.test)) +
geom_point(shape=21, fill="white", color="red", size=3) +
stat_smooth(method="nls",
formula = y ~ lambda/ (1 + aii*x),
method.args=list(start=s0),
se=FALSE,color="red") +
stat_smooth(method = "glm",
formula = y ~ x,
method.args = list(family = gaussian(link = "inverse")),
color = "blue", linetype = 2)
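The reparameterisation used for the starting values can be checked outside ggplot2. This is a minimal sketch, assuming the same simulated data as above; the names b0, b1, glm_curve, nls_curve and n1 are introduced here only for illustration. It confirms that lambda = 1/b0 and aii = b1/b0 describe the same curve as the inverse-link GLM, and then starts nls() from those values.

```r
set.seed(101)
x.test <- runif(50, 2, 8)
y.test <- 0.5^(x.test)
df <- data.frame(x.test, y.test)

# Inverse-link Gaussian GLM: 1/E[y] = b0 + b1 * x
g1 <- glm(y.test ~ x.test, family = gaussian(link = "inverse"), data = df)
b0 <- unname(coef(g1)["(Intercept)"])
b1 <- unname(coef(g1)["x.test"])

# Reparameterise: lambda = 1/b0, aii = b1/b0
s0 <- list(lambda = 1 / b0, aii = b1 / b0)

# Both parameterisations describe the same curve
glm_curve <- 1 / (b0 + b1 * df$x.test)
nls_curve <- s0$lambda / (1 + s0$aii * df$x.test)
all.equal(glm_curve, nls_curve)

# nls() started from the GLM-derived values
n1 <- nls(y.test ~ lambda / (1 + aii * x.test), start = s0, data = df)
coef(n1)
```

Because both fits minimise the same sum of squares over the same family of curves, nls() should have very little work left to do from this starting point.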

Cannot plot p-value on simple logistic regression

I am trying to plot a simple logistic regression in R.
I am following this tutorial to conduct the logistic regression and calculate a p-value (https://mgimond.github.io/Stats-in-R/Logistic.html). I am trying to use ggplot2 and ggpmisc to plot the regression. I have been following this guide (http://cran.nexr.com/web/packages/ggpmisc/vignettes/user-guide-1.html#stat_fit_glance) and trying to use stat_fit_glance to display a p-value.
require(cowplot)
require(ggplot2)
library(ggpmisc)
library(rms)
dataset=read.table('input.txt', header=TRUE)
model <- glm(variable ~ ancestry, data=dataset, family=binomial)
summary(model)
#plot logistic regression curve
plot <- ggplot(dataset, aes(x=ancestry, y=variable)) +
geom_point(alpha=.5, color=dataset$colorsite) +
stat_smooth(method="glm", se=FALSE, method.args = list(family=binomial)) + stat_fit_glance(method = "glm", method.args = list(formula = formula), geom = "text", aes(label = paste("P-value = ", signif(..p.value.., digits = 4), sep = "")))
ggsave("output.pdf")
The output however comes out as
> source("C:/Users/Deven/Desktop/logistic/script.R")
Saving 7 x 7 in image
`geom_smooth()` using formula 'y ~ x'
Warning message:
Computation failed in `stat_fit_glance()`:
object of type 'closure' is not subsettable
I have also tried stat_cor from ggpubr, but that seems to generate different p-values and R^2 values from the ones I calculated.
UPDATE BASED ON COMMENTS:
+ stat_poly_eq(formula = y ~ x, method="glm", aes(x = ancestry, y = variable, label = paste(..p.value.label..,sep = "~~~~")),parse = TRUE) fails due to
1: Computation failed in `stat_poly_eq()`:
Method 'glm' not yet implemented.
If I remove method, it defaults to a linear regression (and gives p-values that do not correspond to a logistic regression).
SECOND UPDATE
model <- glm(variable ~ ancestry, data=dataset, family=binomial)
summary(model)
#plot logistic regression curve
plot <- ggplot(dataset, aes(x=ancestry, y=variable)) +
geom_point(alpha=.5, color=dataset$colorsite) +
stat_smooth(method="glm", se=FALSE, method.args = list(family=binomial)) + stat_fit_tidy(method = "glm",method.args = list(family=binomial,formula=y~x), mapping = aes(label = sprintf("Coef = %.3g\np-value = %.3g",after_stat(x_estimate),after_stat(x_p.value))))
ggsave("variable.pdf")
yields the following error:
Saving 7 x 7 in image
`geom_smooth()` using formula 'y ~ x'
Warning message:
Computation failed in `stat_fit_tidy()`:
no applicable method for 'tidy' applied to an object of class "c('glm', 'lm')"
YET ANOTHER UPDATE
library(ggplot2)
library(ggpmisc)
da =read.table('data.txt', header=TRUE)
model = glm(variable ~ ancestry,family=binomial,data=da)
summary(model)
ggplot(da,aes(x = ancestry,y = variable)) + geom_point() +
stat_smooth(method="glm",se=FALSE,method.args = list(family=binomial)) +
stat_fit_tidy(method = "glm",method.args = list(family=binomial,formula=y~x),
mapping = aes(label = sprintf("Coef = %.3g\np-value = %.3g",
after_stat(x_estimate),after_stat(x_p.value))))
ggsave("test.pdf")
works in theory, but the p-value it gives me is very different from the p value that I calculated manually (which corresponds to the one from lrm(variable ~ ancestry, dataset))...
Not sure at all what is going on here...
There is a table on the ggpmisc help page that specifies which functions can be applied to each type of model.
You have a glm, so glance() from broom will not give you a p-value. Using an example:
library(ggplot2)
library(ggpmisc)
da = MASS::Pima.tr
da$label = as.numeric(da$type=="Yes")
model = glm(label ~ bmi,family=binomial,data=da)
summary(model)
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.11156    0.92806  -4.430 9.41e-06 ***
bmi          0.10482    0.02738   3.829 0.000129 ***
You can see that glance() will not give you a p-value:
broom::glance(model)
# A tibble: 1 x 8
  null.deviance df.null logLik   AIC   BIC deviance df.residual  nobs
          <dbl>   <int>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1          256.     199  -120.  244.  251.     240.         198   200
You need to use tidy() and, as @JonSpring mentioned in the comments, provide the formula, so something like this:
ggplot(da,aes(x = bmi,y = label)) + geom_point() +
stat_smooth(method="glm",se=FALSE,method.args = list(family=binomial)) +
stat_fit_tidy(method = "glm",method.args = list(family=binomial,formula=y~x),
mapping = aes(label = sprintf("Coef = %.3g\np-value = %.3g",
after_stat(x_estimate),after_stat(x_p.value))))
Thank you for all the help, but unfortunately nothing automated worked, so I came up with this instead
require(cowplot)
require(ggplot2)
library(ggpmisc)
library(rms)
dataset=read.table('data.txt', header=TRUE)
model <- glm(variable ~ ancestry, data=dataset, family=binomial)
summary(model)
M1 <- glm(variable ~ ancestry, dataset, family = binomial)
M1
M1$null.deviance
M1$deviance
modelChi <- M1$null.deviance - M1$deviance
pseudo.R2 <- modelChi / M1$null.deviance
pseudo.R2
test <-lrm(variable ~ ancestry, dataset)
Chidf <- M1$df.null - M1$df.residual
chisq.prob <- 1 - pchisq(modelChi, Chidf)
chisq.prob
#plot logistic regression curve
all_variable <- ggplot(dataset, aes(x=ancestry, y=variable)) +
geom_point(alpha=.5, color=dataset$colorsite) +
stat_smooth(method="glm", se=FALSE, method.args = list(family=binomial)) + annotate("text", x=-Inf, y=Inf, hjust = 0, vjust = 2.5, label=paste("p-value: ",signif(chisq.prob, digits = 3),"\nR2: ",signif(pseudo.R2, digits = 3),sep="") )+
ggtitle("Title not relevant to Stack Overflow")
ggsave("variable.pdf")
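One likely reason for the mismatch between the automated and the manual p-values: the p-value in the coefficient table (which is what tidy() reports for a glm) is a Wald z-test on the slope, while the deviance-based calculation above, like the model likelihood-ratio test printed by lrm(), is a chi-square test on the drop in deviance. With a single predictor they test the same hypothesis, but they are computed differently and will generally not be equal. A small sketch using the Pima.tr example from the answer above (the names p_wald, p_lr, model_chi and chi_df are chosen here for illustration):

```r
da <- MASS::Pima.tr
da$label <- as.numeric(da$type == "Yes")
model <- glm(label ~ bmi, family = binomial, data = da)

# Wald z-test p-value for the slope, as shown in the summary() table
p_wald <- coef(summary(model))["bmi", "Pr(>|z|)"]

# Likelihood-ratio test: drop in deviance against the intercept-only model
model_chi <- model$null.deviance - model$deviance
chi_df <- model$df.null - model$df.residual
p_lr <- pchisq(model_chi, chi_df, lower.tail = FALSE)

c(wald = p_wald, likelihood_ratio = p_lr)
```

Either value can be placed on the plot with annotate(); the point is only to be explicit about which test is being reported.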

Plotting multiple lm() models in one plot

I have fitted 6 lm() models and 1 gam() model on the same dataset.
Now I want to plot them all in one plot on top of each other. Can I do this without defining the models again in ggplot?
My case is this
I have
model1 <- lm(y~1, data = data) %>% coef()
model2 <- lm(y~x, data = data) %>% coef()
model3 <- lm(y~abs(x), data = data) %>% coef()
...
model7 <- gam(y~s(x), data = data) %>% coef()
Can I feed the stored coefficients of my models to ggplot?
ggplot(data, mapping = aes(x = x, y = y)) +
geom_point() +
geom_abline(model1) +
geom_abline(model2) +
....
Or is the only way to plot the model prediction lines to manually fill in the parameters, like this:
ggplot(data, mapping = aes(x = x, y = y)) +
geom_point() +
geom_abline(intercept = model1[1]) +
geom_abline(slope = model2[2], intercept = model2[1]) +
geom_abline(slope = model3[2], intercept = model3[1]) +
...
Example code
set.seed(123)
x <- rnorm(50)
y <- rweibull(50,1)
d <- as.data.frame(cbind(x,y))
model1 <- coef(lm(y~1, data = d))
model2 <- coef(lm(y~x, data = d))
model3 <- coef(lm(y~abs(x), data = d))
Including the SE for each line/model and a legend would be welcome as well.
In order for this to work, you really need to save the whole model. So if we assume you have the entire model
# set.seed(101) used for sample data
model1 <- lm(y~1, data = d)
model2 <- lm(y~x, data = d)
model3 <- lm(y~abs(x), data = d)
We can write a helper function to predict new values from these models over the given range of x values. Here's such a function:
newvalsforx <- function(x) {
xrng <- seq(min(x), max(x), length.out=100)
function(m) data.frame(x=xrng, y=predict(m, data.frame(x=xrng)))
}
pred <- newvalsforx(d$x)
This pred() will make predictions from the models over the observed range of x. We can then pass these as new data to geom_line() layers that we add to the plot. For example:
ggplot(d, aes(x,y)) +
geom_point() +
geom_line(data=pred(model1), color="red") +
geom_line(data=pred(model2), color="blue") +
geom_line(data=pred(model3), color="green")
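The question also asked for standard errors and a legend. One way to get both, sketched here under the assumption that all models are lm() fits, is to call predict() with se.fit = TRUE, stack the results into one data frame with a model column, and map colour and fill to that column. The helper pred_with_se() and the data frame bands are hypothetical names, and the band uses a normal approximation (fit plus or minus 1.96 standard errors) rather than exact t quantiles.

```r
set.seed(123)
x <- rnorm(50)
y <- rweibull(50, 1)
d <- data.frame(x, y)

models <- list(constant = lm(y ~ 1, data = d),
               linear   = lm(y ~ x, data = d),
               abs_x    = lm(y ~ abs(x), data = d))

xrng <- seq(min(d$x), max(d$x), length.out = 100)

# Predictions plus an approximate 95% band (fit +/- 1.96 * SE) for one model
pred_with_se <- function(m, name) {
  p <- predict(m, newdata = data.frame(x = xrng), se.fit = TRUE)
  data.frame(model = name, x = xrng, y = p$fit,
             ymin = p$fit - 1.96 * p$se.fit,
             ymax = p$fit + 1.96 * p$se.fit)
}
bands <- do.call(rbind, Map(pred_with_se, models, names(models)))

# Plot only if ggplot2 is available; mapping colour/fill to `model`
# is what produces the legend
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(d, aes(x, y)) +
    geom_point() +
    geom_ribbon(data = bands, aes(ymin = ymin, ymax = ymax, fill = model),
                alpha = 0.2) +
    geom_line(data = bands, aes(colour = model))
  print(p)
}
```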

How to plot 3 models in one Figure in R?

I'm new with R and I have fit 3 models for my data as follows:
Model 1: y = a(x) + b
lm1 = lm(data$CBI ~ data$dNDVI)
Model 2: y = a(x)^2 + b(x) + c
lm2 <- lm(CBI ~ dNDVI + I(dNDVI^2), data=data)
Model 3: y = x(a|x| + b)–1
lm3 = nls(CBI ~ dNDVI*(a*abs(dNDVI) + b) - 1, start = c(a = 1.5, b = 2.7), data = data)
Now I would like to plot all three models in one figure in R, but I could not find a way to do it. Can you please help me? I have tried with the first two models as follows, and it works, but I don't know how to add Model 3 to it:
ggplot(data = data, aes(x = dNDVI, y = CBI)) +
geom_point() +
geom_smooth(method = lm, formula = y ~ x, size = 1, se = FALSE) +
geom_smooth(method = lm, formula = y ~ x + I(x^2), size = 1, se = FALSE ) +
theme_bw()
I would also like to add a legend that shows 3 different colours or line types for the 3 models. Can you please show me how to add that to the figure?
Using iris as a dummy data set to represent the three models:
library(ggplot2)
library(tidyr) # provides gather() and re-exports %>%
# new data frame over which to predict the fitted values for each model
new.dat <- data.frame(Sepal.Length=seq(min(iris$Sepal.Length),
max(iris$Sepal.Length), length.out=50))
m1 <- lm(Petal.Length ~ Sepal.Length, iris)
m2 <- lm(Petal.Length ~ Sepal.Length + I(Sepal.Length^2), data=iris)
m3 <- nls(Petal.Length ~ Sepal.Length*(a*abs(Sepal.Length) + b) - 1,
start = c(a = 1.5, b = 2.7), data = iris)
new.dat$m1.fitted <- predict(m1, new.dat)
new.dat$m2.fitted <- predict(m2, new.dat)
new.dat$m3.fitted <- predict(m3, new.dat)
new.dat <- new.dat %>% gather(var, val, m1.fitted:m3.fitted) #stacked format of fitted data of three models (to automatically generate the legend in ggplot)
ggplot(new.dat, aes(Sepal.Length, val, colour=var)) +
geom_line()

Stack coefficient plots in R

I'm running a set of models with the same independent variables but different dependent variables, and would like to create a set of coefficient plots in one figure in which each model gets its own panel. The following code gives the idea, but it puts all of the models into a single plot rather than 3 separate panels side by side in one figure:
require("coefplot")
set.seed(123)
dat <- data.frame(x = rnorm(100), z = rnorm(100), y1 = rnorm(100), y2 = rnorm(100), y3 = rnorm(100))
mod1 <- lm(y1 ~ x + z, data = dat)
mod2 <- lm(y2 ~ x + z, data = dat)
mod3 <- lm(y3 ~ x + z, data = dat)
multiplot(mod1,mod2, mod3)
Which generates this plot:
Any thoughts on how to get them to panel next to each other in one figure? Thanks!
I haven't used the coefplot package before, but you can create a coefficient plot directly in ggplot2.
library(ggplot2)
set.seed(123)
dat <- data.frame(x = rnorm(100), z = rnorm(100), y1 = rnorm(100), y2 = rnorm(100), y3 = rnorm(100))
mod1 <- lm(y1 ~ x + z, data = dat)
mod2 <- lm(y2 ~ x + z, data = dat)
mod3 <- lm(y3 ~ x + z, data = dat)
## Create data frame of model coefficients and standard errors
# Function to extract what we need
ce = function(model.obj) {
extract = summary(get(model.obj))$coefficients[ ,1:2]
return(data.frame(extract, vars=row.names(extract), model=model.obj))
}
# Run function on the three models and bind into single data frame
coefs = do.call(rbind, sapply(paste0("mod",1:3), ce, simplify=FALSE))
names(coefs)[2] = "se"
# Faceted coefficient plot
ggplot(coefs, aes(vars, Estimate)) +
geom_hline(yintercept=0, lty=2, lwd=1, colour="grey50") +
geom_errorbar(aes(ymin=Estimate - se, ymax=Estimate + se, colour=vars),
lwd=1, width=0) +
geom_point(size=3, aes(colour=vars)) +
facet_grid(. ~ model) +
coord_flip() +
guides(colour="none") +
labs(x="Coefficient", y="Value") +
theme_grey(base_size=15)
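The ce() helper above looks each model up by name with get(), which depends on the models living in the calling environment under those exact names. A variant that takes a named list of fitted models avoids that lookup. This is only a sketch (ce_list() and models are names made up here), and it produces the same columns (vars, Estimate, se, model), so the plotting code above should work on its output unchanged.

```r
set.seed(123)
dat <- data.frame(x = rnorm(100), z = rnorm(100),
                  y1 = rnorm(100), y2 = rnorm(100), y3 = rnorm(100))
models <- list(mod1 = lm(y1 ~ x + z, data = dat),
               mod2 = lm(y2 ~ x + z, data = dat),
               mod3 = lm(y3 ~ x + z, data = dat))

# Extract estimates and standard errors from one fitted model
ce_list <- function(model, name) {
  est <- summary(model)$coefficients[, c("Estimate", "Std. Error"), drop = FALSE]
  data.frame(vars = rownames(est), Estimate = est[, "Estimate"],
             se = est[, "Std. Error"], model = name, row.names = NULL)
}

# One row per coefficient per model
coefs <- do.call(rbind, Map(ce_list, models, names(models)))
rownames(coefs) <- NULL
coefs
```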
