Warning: Rank deficient and missing values - r

I am running the following code:
ggplot(data= data_nickel_t, aes( x=index(data_nickel_t), y= log(ni_demand) )) +
scale_x_yearqtr(format = "%Y-%q", n = 14) +
geom_point() + stat_summary(fun.data=mean_cl_normal) +
geom_smooth(method='lm', aes(colour = "linear fit"), se= FALSE) +
geom_smooth(method='lm', formula = y ~ x + poly(x, 2), size = 1, aes(colour = "quadratic"), se= FALSE) +
geom_smooth(method='lm', formula = y ~ x + poly(x, 3), size = 1, aes(colour = "polynomial"), se= FALSE ) +
ggtitle("Global Refined Nickel Demand") +
xlab("Time") +
ylab("Thousand Metric Tons")
The code above produces a graph with three fitted lines, but I get the following warning messages:
1: In predict.lm(model, newdata = data.frame(x = xseq), se.fit = se,
prediction from a rank-deficient fit may be misleading;
2: In predict.lm(model, newdata = data.frame(x = xseq), se.fit = se, :
prediction from a rank-deficient fit may be misleading;
3: Removed 94 rows containing missing values (geom_pointrange).
My first impression was collinearity between the time-trend terms in the poly() function; I might estimate the model numerically to check this further. As for the missing-value issue, e.g. this link explains the reasons for missing k rows, but the solutions suggested there do not work in my case: I still get the same warning. I have 94 observations. I also don't have zeros in my data, so there is no reason for the log transformation to drop values. I am still fairly new to using R with time series. Any idea how I might fix the missing-value warning?
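The collinearity suspicion can be checked outside ggplot2. The sketch below uses simulated data (x and y are hypothetical stand-ins for the time index and log(ni_demand)) and shows that a formula of the form y ~ x + poly(x, 2) is rank-deficient, because poly(x, 2) already contains a linear term. It also shows one plausible source of the third warning, assuming each time point has a single observation: a normal-theory interval, as computed by mean_cl_normal, needs a standard deviation, which is undefined for one value.

```r
# Simulated stand-in for the quarterly series (names are hypothetical)
set.seed(1)
x <- 1:94
y <- 5 + 0.01 * x + rnorm(94, sd = 0.1)

# x is already the linear term, and poly(x, 2) contains a linear term too,
# so one column of the design matrix is redundant.
fit_bad <- lm(y ~ x + poly(x, 2))
coef(fit_bad)        # one coefficient is NA
fit_bad$rank         # 3, although the formula asks for 4 coefficients

# Dropping the extra x gives a full-rank quadratic fit
fit_ok <- lm(y ~ poly(x, 2))
coef(fit_ok)

# A normal-theory interval needs a standard deviation,
# which is undefined for a single observation
sd(y[1])             # NA
```

If this is what is happening, using formula = y ~ poly(x, 2) and y ~ poly(x, 3) in geom_smooth() should remove the first two warnings, and dropping the stat_summary() layer should remove the third.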

Related

Binomial logit model with glmer doesn't yield a good fit to sigmoidal data

I am trying to fit a model to my data, which has a dependent variable that can be 0 or 1.
I tried to fit a binomial glmer to the data, but the fit is pretty bad as you can see below. This puzzles me because this is quite a sigmoid so I thought I would get a great fit with that kind of model? Am I using the wrong model?
(color is my data, black is the fit)
Here is the code I used in R:
library(lme4)
library(ggplot2)
exdata <- read.csv("https://raw.githubusercontent.com/FlorianLeprevost/dummydata/main/exdata.csv")
model=glmer(VD~ as.factor(VI2)*VI1 + (1|ID),exdata,
family=binomial(link = "logit"),
control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun=2e5)))
summary(model)
exdata$fit=predict(model, type = "response")
ggplot(exdata,aes(VI1, VD, color=as.factor(VI2),
group=as.factor(VI2))) +
stat_summary(geom="line", size=0.8) +
stat_summary(aes(y=(fit)),geom="line", size=0.8, color="black") +
theme_bw()
And I tried without the random effect to see if it would change but no...
ggplot(exdata, aes(x=VI1, y=VD, color=as.factor(VI2),
group=as.factor(VI2))) +
stat_summary(fun.data=mean_se, geom="line", size=1)+
stat_smooth(method="glm", se=FALSE, method.args = list(family=binomial), color='black')
Here is the data: https://github.com/FlorianLeprevost/dummydata/blob/main/exdata.csv
tl;dr I don't think these data are as sigmoidal as you think. In particular, a logistic regression estimates a sigmoid curve that ranges from 0 to 1, whereas yours levels out (sort of) at 0.9. In much more detail:
slightly streamlined data import/model fitting
library(lme4)
library(ggplot2)
library(dplyr)
exdata <- (read.csv("https://raw.githubusercontent.com/FlorianLeprevost/dummydata/main/exdata.csv")
|> mutate(across(VI2, factor))
)
model <- glmer(VD~ VI2*VI1 + (1|ID),
exdata,
family=binomial(link = "logit"))
compute data summary and predictions
This can also be done with stat_summary(), but I like the finer control of doing it myself. In particular, I like to get Clopper-Pearson CIs on the proportions (could also do this with prop.test() to get score-test CIs). I'm also computing predictions across a wider VI1-range than the data (see why below).
ddsum <- (exdata
|> group_by(VI1, VI2)
|> summarise(
num = n(),
pos = sum(VD),
VD = mean(VD),
lwr = binom.test(pos, num)$conf.int[1],
upr = binom.test(pos, num)$conf.int[2],
.groups = "drop")
)
pframe <- expand.grid(
VI1 = seq(-50, 50, length = 101),
VI2 = unique(exdata$VI2))
pframe$VD <- predict(model, newdata = pframe, re.form = ~0, type = "response")
plot
gg0 <- ggplot(ddsum,aes(x=VI1, y=VD, color=VI2)) +
geom_pointrange(position = position_dodge(width = 0.3),
aes(ymin = lwr, ymax = upr, size = num), alpha = 0.5) +
scale_size_area(max_size = 4) +
theme_bw()
gg1 <- gg0 + geom_line(data = pframe)
ggsave(gg1, file = "gglogist1.png")
Conclusion: the sharp increase from x=0 to x=15 combined with the saturation below 1.0 makes it hard to fit with a logistic curve.
We could try a quadratic-logistic fit:
model2 <- update(model, . ~ poly(VI1,2)*VI2 + (1|ID))
pframe$VD2 <- predict(model2, newdata = pframe, re.form = ~0, type = "response")
gg2 <- gg1 + geom_line(data=pframe, aes(y=VD2), linetype = 2)
ggsave(gg2, file = "gglogist2.png")
This fits better (it couldn't fit worse), but might not make sense for your application.
In principle we could fit a logistic that saturated at a value <1, but it's a bit tricky ...
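As a rough illustration of what a logistic curve with a ceiling below 1 could look like, here is a minimal maximum-likelihood sketch on simulated data. It is not fitted to exdata.csv, it ignores the (1|ID) random effect, and the parameterisation (an asymptote multiplied by an ordinary logistic curve) is just one assumed form among several possible ones.

```r
# Sketch: logistic curve with an upper asymptote below 1, fitted by maximum likelihood.
# All data here are simulated; nothing comes from exdata.csv.
set.seed(101)
n <- 2000
x <- runif(n, -50, 50)
p_true <- 0.9 * plogis(0.2 * x)          # saturates at 0.9, not 1
y <- rbinom(n, size = 1, prob = p_true)

# Negative log-likelihood; par = (logit of asymptote, intercept, slope).
# Working with the logit of the asymptote keeps it inside (0, 1).
nll <- function(par) {
  p <- plogis(par[1]) * plogis(par[2] + par[3] * x)
  -sum(dbinom(y, size = 1, prob = p, log = TRUE))
}

fit <- optim(c(qlogis(0.8), 0, 0.1), nll, control = list(maxit = 5000))
asymptote_hat <- plogis(fit$par[1])      # estimated ceiling of the curve

# Ordinary logistic regression for comparison (asymptote fixed at 1)
fit_glm <- glm(y ~ x, family = binomial)
nll_glm <- -as.numeric(logLik(fit_glm))
c(saturating = fit$value, logistic = nll_glm)
```

The tricky part mentioned above shows up as soon as random effects or interactions are needed: this hand-written likelihood has no equivalent of glmer's (1|ID) term.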

How to fit non-linear function to data in ggplot2 using maximum likelihood model in R?

The data set (x.test, y.test) is an exponential fit. I'm trying to fit a custom non-linear function and attached is the code. The regular points plot just fine but I'm unable to get the fit line to work. Any suggestions?
x.test <- runif(50,2,8)
y.test <- 0.5^(x.test)
df <- data.frame(x.test, y.test)
library(ggpmisc)
my.formula <- y ~ lambda/ (1 + aii*x)
ggplot(data = df, aes(x=x.test,y=y.test)) +
geom_point(shape=21, fill="white", color="red", size=3) +
stat_smooth(method="nls",formula = y.test ~ lambda/ (1 + aii*x.test), method.args=list(start=c(lambda=1000,aii=-816.39)),se=F,color="red") +
geom_smooth(method="lm", formula = my.formula , col = "red") + stat_poly_eq(formula = my.formula, aes(label = stringr::str_wrap(paste(..eq.label.., ..rr.label.., sep = "~~~"))), parse = TRUE, size = 2.5, col = "red") + stat_function(fun=function (x.test){
y.test ~ lambda/ (1 + aii*x.test)}, color = "blue")
A few things:
you need to use y and x as the variable names in the formula argument to geom_smooth, regardless of what the names are in your data set
you need better starting values (see below)
there's a GLM trick you can use to fit this model; doesn't always work (can be numerically unstable), but it doesn't need starting values and will work more often than nls()
I don't think lm() and stat_poly_eq() are going to work as expected (or maybe at all) with a nonlinear formula ...
simulate data
(same as your code but using set.seed() - probably not important here but good practice)
set.seed(101)
x.test <- runif(50,2,8)
y.test <- 0.5^(x.test)
df <- data.frame(x.test, y.test)
attempt nls fit with your starting values
It's usually a good idea to troubleshoot by fitting any smoothing terms outside of ggplot2, so you have fewer layers to dig through to find the problems:
nls(y.test ~ lambda/(1+ aii*x.test),
start = list(lambda=1000,aii=-816.39),
data = df)
Error in nls(y.test ~ lambda/(1 + aii * x.test), start = list(lambda = 1000, :
singular gradient
OK, still doesn't work. Let's use glm() to get better starting values: we use an inverse-link GLM:
1/y = b0 + b1*x
y = 1/(b0 + b1*x)
= (1/b0)/(1 + (b1/b0)*x)
So:
g1 <- glm(y.test ~ x.test, family = gaussian(link = "inverse"))
s0 <- with(as.list(coef(g1)), list(lambda = 1/`(Intercept)`, aii = x.test/`(Intercept)`))
This gives lambda = -0.09, aii = -0.638 (with a little bit more work we could probably also figure out how to eyeball these by looking at the starting point and scale of the curve).
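The algebra behind the conversion can be checked numerically. In this small sketch b0 and b1 are arbitrary illustrative values, not the fitted coefficients:

```r
# Check that 1/(b0 + b1*x) equals lambda/(1 + aii*x)
# when lambda = 1/b0 and aii = b1/b0 (b0, b1 are arbitrary here)
b0 <- 2
b1 <- 0.5
x <- seq(2, 8, by = 0.5)

lambda <- 1 / b0
aii <- b1 / b0

y_glm <- 1 / (b0 + b1 * x)          # inverse-link GLM form
y_nls <- lambda / (1 + aii * x)     # form used in the nls() formula
all.equal(y_glm, y_nls)
```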
ggplot(data = df, aes(x=x.test,y=y.test)) +
geom_point(shape=21, fill="white", color="red", size=3) +
stat_smooth(method="nls",
formula = y ~ lambda/ (1 + aii*x),
method.args=list(start=s0),
se=FALSE,color="red") +
stat_smooth(method = "glm",
formula = y ~ x,
method.args = list(family = gaussian(link = "inverse")),
color = "blue", linetype = 2)

Cannot plot p-value on simple logistic regression

I am trying to plot a simple logistic regression in R.
I am following this tutorial to conduct the logistic regression and calculate a P-value (https://mgimond.github.io/Stats-in-R/Logistic.html). I am trying to use ggplot2 and ggpmisc to plot the regression. I have been trying to use this guide (http://cran.nexr.com/web/packages/ggpmisc/vignettes/user-guide-1.html#stat_fit_glance) to stat_fit_glance to display a p-value
require(cowplot)
require(ggplot2)
library(ggpmisc)
library(rms)
dataset=read.table('input.txt', header=TRUE)
model <- glm(variable ~ ancestry, data=dataset, family=binomial)
summary(model)
#plot logistic regression curve
plot <- ggplot(dataset, aes(x=ancestry, y=variable)) +
geom_point(alpha=.5, color=dataset$colorsite) +
stat_smooth(method="glm", se=FALSE, method.args = list(family=binomial)) + stat_fit_glance(method = "glm", method.args = list(formula = formula), geom = "text", aes(label = paste("P-value = ", signif(..p.value.., digits = 4), sep = "")))
ggsave("output.pdf")
The output however comes out as
> source("C:/Users/Deven/Desktop/logistic/script.R")
Saving 7 x 7 in image
`geom_smooth()` using formula 'y ~ x'
Warning message:
Computation failed in `stat_fit_glance()`:
object of type 'closure' is not subsettable
I have also tried stat_cor from ggpubr, but that seems to generate different p-values and R^2 values from what I have calculated.
UPDATE BASED ON COMMENTS:
+ stat_poly_eq(formula = y ~ x, method="glm", aes(x = ancestry, y = variable, label = paste(..p.value.label..,sep = "~~~~")),parse = TRUE) fails due to
1: Computation failed in `stat_poly_eq()`:
Method 'glm' not yet implemented.
If I remove method it defaults to a linear regression (and gives p-values that do not correspond to a logistic regression).
SECOND UPDATE
model <- glm(variable ~ ancestry, data=dataset, family=binomial)
summary(model)
#plot logistic regression curve
plot <- ggplot(dataset, aes(x=ancestry, y=variable)) +
geom_point(alpha=.5, color=dataset$colorsite) +
stat_smooth(method="glm", se=FALSE, method.args = list(family=binomial)) + stat_fit_tidy(method = "glm",method.args = list(family=binomial,formula=y~x), mapping = aes(label = sprintf("Coef = %.3g\np-value = %.3g",after_stat(x_estimate),after_stat(x_p.value))))
ggsave("variable.pdf")
yields the following error:
Saving 7 x 7 in image
`geom_smooth()` using formula 'y ~ x'
Warning message:
Computation failed in `stat_fit_tidy()`:
no applicable method for 'tidy' applied to an object of class "c('glm', 'lm')"
YET ANOTHER UPDATE
library(ggplot2)
library(ggpmisc)
da =read.table('data.txt', header=TRUE)
model = glm(variable ~ ancestry,family=binomial,data=da)
summary(model)
ggplot(da,aes(x = ancestry,y = variable)) + geom_point() +
stat_smooth(method="glm",se=FALSE,method.args = list(family=binomial)) +
stat_fit_tidy(method = "glm",method.args = list(family=binomial,formula=y~x),
mapping = aes(label = sprintf("Coef = %.3g\np-value = %.3g",
after_stat(x_estimate),after_stat(x_p.value))))
ggsave("test.pdf")
works in theory, but the p-value it gives me is very different from the p value that I calculated manually (which corresponds to the one from lrm(variable ~ ancestry, dataset))...
Not sure at all what is going on here...
There is a table on the ggpmisc help page that specifies what can be applied to each type of model.
You have a glm, so glance() from broom will not give you a p-value. Using an example:
library(ggplot2)
library(ggpmisc)
da = MASS::Pima.tr
da$label = as.numeric(da$type=="Yes")
model = glm(label ~ bmi,family=binomial,data=da)
summary(model)
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.11156    0.92806  -4.430 9.41e-06 ***
bmi          0.10482    0.02738   3.829 0.000129 ***
You can see glance will not give you a p-value :
broom::glance(model)
# A tibble: 1 x 8
  null.deviance df.null logLik   AIC   BIC deviance df.residual  nobs
          <dbl>   <int>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1          256.     199  -120.  244.  251.     240.         198   200
You need to use tidy() and, as @JonSpring mentioned in the comments, provide the formula, so something like this:
ggplot(da,aes(x = bmi,y = label)) + geom_point() +
stat_smooth(method="glm",se=FALSE,method.args = list(family=binomial)) +
stat_fit_tidy(method = "glm",method.args = list(family=binomial,formula=y~x),
mapping = aes(label = sprintf("Coef = %.3g\np-value = %.3g",
after_stat(x_estimate),after_stat(x_p.value))))
Thank you for all the help, but unfortunately nothing automated worked, so I came up with this instead
require(cowplot)
require(ggplot2)
library(ggpmisc)
library(rms)
dataset=read.table('data.txt', header=TRUE)
model <- glm(variable ~ ancestry, data=dataset, family=binomial)
summary(model)
M1 <- glm(variable ~ ancestry, dataset, family = binomial)
M1
M1$null.deviance
M1$deviance
modelChi <- M1$null.deviance - M1$deviance
pseudo.R2 <- modelChi / M1$null.deviance
pseudo.R2
test <-lrm(variable ~ ancestry, dataset)
Chidf <- M1$df.null - M1$df.residual
chisq.prob <- 1 - pchisq(modelChi, Chidf)
chisq.prob
#plot logistic regression curve
all_variable <- ggplot(dataset, aes(x=ancestry, y=variable)) +
geom_point(alpha=.5, color=dataset$colorsite) +
stat_smooth(method="glm", se=FALSE, method.args = list(family=binomial)) + annotate("text", x=-Inf, y=Inf, hjust = 0, vjust = 2.5, label=paste("p-value: ",signif(chisq.prob, digits = 3),"\nR2: ",signif(pseudo.R2, digits = 3),sep="") )+
ggtitle("Title not relevant to Stack Overflow")
ggsave("variable.pdf")
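One likely reason the automated and manual p-values disagree is that they come from different tests. The coefficient table from summary() (which tidy() is built on) reports a Wald z-test, whereas the difference in deviances gives a likelihood-ratio chi-square test. The sketch below computes both on the MASS::Pima.tr example used earlier; it is an illustration, not a claim about which test suits your data.

```r
# Wald vs. likelihood-ratio p-values for the same logistic regression
da <- MASS::Pima.tr
da$label <- as.numeric(da$type == "Yes")
model <- glm(label ~ bmi, family = binomial, data = da)

# Wald test: the p-value shown in summary(model) for the bmi coefficient
wald_p <- coef(summary(model))["bmi", "Pr(>|z|)"]

# Likelihood-ratio test: drop in deviance relative to the null model
lr_stat <- model$null.deviance - model$deviance
lr_df <- model$df.null - model$df.residual
lr_p <- pchisq(lr_stat, lr_df, lower.tail = FALSE)

c(wald = wald_p, likelihood_ratio = lr_p)
```

The likelihood-ratio value can also be obtained with anova(model, test = "Chisq").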

Exponential Decay in ggplot2

I'd love some help with this. I'm trying to put an exponential decay curve onto some vehicle data I have. I've been searching through Stack Overflow and none of the answers have been helpful.
This is my current code that's not working. It's based off the ggplot2 documentation and it's still not working.
plot <- ggplot(data = rawData, aes(x = Mileage, y = Cost, color = Car)) + geom_point() + stat_smooth(method = 'nls', formula = y ~ a*exp(b *-x), se = FALSE, start = list(a=1,b=1))
plot
It plots my data but doesn't show a curve.
I can't embed photos for some reason so here it is
The current warning messages I receive are:
1: In (function (formula, data = parent.frame(), start, control = nls.control(), :
No starting values specified for some parameters.
Initializing ‘a’, ‘b’ to '1.'. Consider specifying 'start' or using a selfStart model
2: Computation failed in stat_smooth(): singular gradient matrix at initial parameter estimates
I tried these other options too, to no avail.
ggplot(mtcars, aes(x = Mileage, y = Cost)) + geom_point() +
stat_smooth(method = "nls", formula = y ~ a * exp(x * b), se = FALSE,
method.args = list(start = list(a = 1, b = 1)))
Which resulted in an error message of:
Computation failed in stat_smooth(): Missing value or an infinity
produced when evaluating the model
And I tried this too
ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point() +
stat_smooth(method = "nls", formula = y ~ a * exp(x * -b), se = FALSE,
method.args = list(start = list(a = 1, b = 1),
lower = c(0),
algorithm = "port"))
Which resulted in an error message of:
Computation failed in stat_smooth(): singular gradient matrix at
initial parameter estimates
UPDATE
If I divide all my values by 100,000, all of a sudden the trendline works, albeit without confidence intervals. I have no idea why this works, and it doesn't give me an acceptable answer, since all my axis values are now off by a factor of 100,000.
rawData %>% mutate(Mileage = Mileage / 100000,
Cost = Cost / 100000) %>%
ggplot(aes(x = Mileage, y = Cost, color = Car)) +
geom_point() + stat_smooth(method = "nls", formula = y ~ a * exp(x * -b), se = FALSE)
Here is my data - https://docs.google.com/spreadsheets/d/1SKhkqHK-qFGG8IST67iUhMIIdvA_k6htVid7lAwCb3A/edit?usp=sharing
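The rescaling effect hints at a starting-value problem. If Mileage is in the tens of thousands (an assumption, since the data are not shown here), then with b = 1 the term exp(-b * x) underflows to zero and the gradient is singular. One common workaround is to get starting values from a log-linear fit. The sketch below uses simulated data; the cars data frame and its parameter values are hypothetical.

```r
# Simulated stand-in for the vehicle data (all values are made up)
set.seed(42)
mileage <- runif(100, 5000, 200000)
cost <- 30000 * exp(-1e-5 * mileage) * exp(rnorm(100, sd = 0.05))
cars <- data.frame(Mileage = mileage, Cost = cost)

# With b = 1 and Mileage in the tens of thousands, exp(-b * x) underflows
exp(-1 * 50000)   # 0

# log(Cost) = log(a) - b * Mileage is linear, so lm() gives rough estimates
lin <- lm(log(Cost) ~ Mileage, data = cars)
start_vals <- list(a = exp(coef(lin)[[1]]), b = -coef(lin)[[2]])

# Nonlinear fit on the original scale, starting from those estimates
fit <- nls(Cost ~ a * exp(-b * Mileage), data = cars, start = start_vals)
coef(fit)
```

In ggplot2 the same starting values would go into stat_smooth(method = "nls", formula = y ~ a * exp(-b * x), se = FALSE, method.args = list(start = start_vals)). Note that start has to be inside method.args, and se = FALSE is needed because standard errors are not available for nls fits in stat_smooth().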

ggplot GLM fitted curve without interaction

I want to add the fitted function from a GLM to a ggplot. By default, it automatically creates the plot with an interaction. I am wondering if I can plot the fitted function from the model without the interaction. For example,
dta <- read.csv("http://www.ats.ucla.edu/stat/data/poisson_sim.csv")
dta <- within(dta, {
prog <- factor(prog, levels=1:3, labels=c("General", "Academic", "Vocational"))
id <- factor(id)
})
plt <- ggplot(dta, aes(math, num_awards, col = prog)) +
geom_point(size = 2) +
geom_smooth(method = "glm", se = F,
method.args = list(family = "poisson"))
print(plt)
gives the plot with interaction,
However, I want the plot from the model,
`num_awards` = β0 + β1*`math` + β2*`prog` + error
I tried to do it this way,
mod <- glm(num_awards ~ math + prog, data = dta, family = "poisson")
fun.gen <- function(awd) exp(mod$coef[1] + mod$coef[2] * awd)
fun.acd <- function(awd) exp(mod$coef[1] + mod$coef[2] * awd + mod$coef[3])
fun.voc <- function(awd) exp(mod$coef[1] + mod$coef[2] * awd + mod$coef[4])
ggplot(dta, aes(math, num_awards, col = prog)) +
geom_point() +
stat_function(fun = fun.gen, col = "red") +
stat_function(fun = fun.acd, col = "green") +
stat_function(fun = fun.voc, col = "blue") +
geom_smooth(method = "glm", se = F,
method.args = list(family = "poisson"), linetype = "dashed")
The output plot is
Is there any simple way in ggplot to do this efficiently?
Ben's idea of plotting predicted values of the response for specific model terms inspired me to improve the type = "y.pc" option of the sjp.glm function. A new update is on GitHub, with version number 1.9.4-3.
Now you can plot predicted values for specific terms, one which is used along the x-axis, and a second one used as grouping factor:
sjp.glm(mod, type = "y.pc", vars = c("math", "prog"))
which gives you the following plot:
The vars argument is needed in case your model has more than two terms, to specify the term for the x-axis-range and the term for the grouping.
You can also facet the groups:
sjp.glm(mod, type = "y.pc", vars = c("math", "prog"), show.ci = T, facet.grid = T)
There's no way that I know of to trick geom_smooth() into doing this, but you can do a little better than you've done. You still have to fit the model yourself and add the lines, but you can use the predict() method to generate the predictions and load them into a data frame with the same structure as the original data ...
mod <- glm(num_awards ~ math + prog, data = dta, family = "poisson")
## generate prediction frame
pframe <- with(dta,
expand.grid(math=seq(min(math),max(math),length=51),
prog=levels(prog)))
## add predicted values (on response scale) to prediction frame
pframe$num_awards <- predict(mod,newdata=pframe,type="response")
ggplot(dta, aes(math, num_awards, col = prog)) +
geom_point() +
geom_smooth(method = "glm", se = FALSE,
method.args = list(family = "poisson"), linetype = "dashed")+
geom_line(data=pframe) ## use prediction data here
## (inherits aesthetics etc. from main ggplot call)
(the only difference here is that the way I've done it the predictions span the full horizontal range for all groups, as if you had specified fullrange=TRUE in geom_smooth()).
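To see that the prediction-frame approach reproduces the hand-written exp(coef ...) functions from the question, here is a small check on simulated data. The data frame dta_sim and its coefficients are made up for illustration, since the original CSV may not be reachable.

```r
# Simulated Poisson data with the same structure as the example (values are made up)
set.seed(7)
n <- 200
prog_levels <- c("General", "Academic", "Vocational")
dta_sim <- data.frame(
  math = round(runif(n, 30, 75)),
  prog = factor(sample(prog_levels, n, replace = TRUE), levels = prog_levels)
)
dta_sim$num_awards <- rpois(n, exp(-5 + 0.07 * dta_sim$math +
                                     0.8 * (dta_sim$prog == "Academic")))

# Additive model: common slope for math, separate intercepts per prog
mod_sim <- glm(num_awards ~ math + prog, data = dta_sim, family = "poisson")

# Prediction frame and predictions on the response scale
pframe_sim <- expand.grid(math = seq(30, 75, length.out = 51), prog = prog_levels)
pred <- unname(predict(mod_sim, newdata = pframe_sim, type = "response"))

# The same curves written out by hand from the coefficients
b <- coef(mod_sim)
by_hand <- exp(b[["(Intercept)"]] + b[["math"]] * pframe_sim$math +
                 b[["progAcademic"]] * (pframe_sim$prog == "Academic") +
                 b[["progVocational"]] * (pframe_sim$prog == "Vocational"))
all.equal(pred, by_hand)
```

Using predict() avoids indexing coefficients by position, so the code keeps working if terms are added to or reordered in the model.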
In principle it seems as though the sjPlot package should be able to handle this sort of thing, but it looks like the relevant bit of code for doing this plot type is hard-coded to assume a binomial GLM ... oh well.
I'm not sure, but you wrote "without interaction" - maybe you are looking for effect plots? (If not, excuse me that I'm assuming something completely wrong...)
You can, for instance, use the effects package for this.
dta <- read.csv("http://www.ats.ucla.edu/stat/data/poisson_sim.csv")
dta <- within(dta, {
prog <- factor(prog, levels=1:3, labels=c("General", "Academic", "Vocational"))
id <- factor(id)
})
mod <- glm(num_awards ~ math + prog, data = dta, family = "poisson")
library(effects)
plot(allEffects(mod))
Another option would be the sjPlot package, as Ben suggested - however, the current version on CRAN only supports logistic regression models properly for effect plots. But in the current development version on GitHub I added support for various model families and link functions, so if you like, you can download that snapshot. The sjPlot package uses ggplot instead of lattice (which is used by the effects package, I think):
sjp.glm(mod, type = "eff", show.ci = T)
Or in non-faceted way:
sjp.glm(mod, type = "eff", facet.grid = F, show.ci = T)

Resources