Specifying that a model is logit-transformed to plot back-transformed trends - r

I have fitted an lme model in R with a logit-transformed response. I have not been able to find a direct command that does the logit transformation, so I have done it manually.
logitr <- log(r/(1-r))
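Base R does ship the logit and its inverse: qlogis() is the logit and plogis() is the inverse logit, both in the stats package. The sketch below uses a made-up vector r of proportions and also shows why the parentheses matter: r/1-r parses as (r/1) - r, which is exactly 0, so its log is -Inf.

```r
# Hypothetical proportions strictly between 0 and 1
r <- c(0.1, 0.25, 0.5, 0.8)

# Logit by hand; the denominator needs its own parentheses
logitr <- log(r / (1 - r))

# stats::qlogis() computes the logit, stats::plogis() its inverse
logitr_q <- qlogis(r)
back <- plogis(logitr)

# Without the parentheses, r/1 - r is exactly 0, so the log is -Inf
wrong <- log(r/1 - r)
print(rbind(logitr, logitr_q, back, wrong))
```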
I then use this as the response in my lme model, with an interaction between two factors and a numerical variable.
library(nlme)
model <- lme(logitr ~ factor1*factor2*numeric, random = ~1|random)
Now, R obviously does not know that this model is logit-transformed. How can I specify this to R?
I have tried, without luck:
update(model, tran="logit")
I want to specify that the model is logit-transformed because I want to plot the back-transformed results using the emmip function from the emmeans package, showing the trends of the interaction between my variables.
Normally (if I only had factors) I would just use:
update_refgrid_model<-update(ref_grid(model, tran="logit"))
But this approach does not work when I want to use emmip to plot the trends of the interaction between a numerical variable and factors. If I specify:
emmip(update_refgrid_model, factor1~numeric|factor2, cov.reduce = range, type = "response")
then I do not get any trends plotted, only the estimate at the average level of the numerical variable.
So, how can I specify the logit transformation and plot the back-transformed trends of an lme model with factors interacting with numerical variables?

You don't update the model object; you update the reference grid:
rg = update(ref_grid(model, cov.reduce = range), tran = "logit")
emmip(rg, factor1~numeric|factor2, type = "response")
It is possible to update a model with other things, just not the transformation; that is handled by the update method for emmGrid objects.
Update
Here's an example showing how it works
require(emmeans)
## Loading required package: emmeans
foo = transform(fiber, p = (strength - 25)/25)
foo.lm = lm(log(p/(1-p)) ~ machine*diameter, data = foo)
emm = emmeans(foo.lm, ~diameter|machine,
tran = "logit", at = list(diameter = 15:32))
## Warning in ref_grid(object, ...): There are unevaluated constants in the response formula
## Auto-detection of the response transformation may be incorrect
emmip(emm, machine ~ diameter)
emmip(emm, machine ~ diameter, type = "r")
Created on 2020-06-02 by the reprex package (v0.3.0)
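emmeans does all of this for you; the following is only a base-R sketch, on simulated data with made-up names (f, x, p), of what a reference grid plus type = "response" amounts to. Predictions are made at a sequence of covariate values for every factor level (the role of cov.reduce = range or at = list(...)), computed on the logit scale, and then passed through the inverse logit plogis().

```r
# Hypothetical data: proportion p depends on a numeric covariate x,
# with a different slope for each level of factor f
set.seed(42)
n <- 120
dat <- data.frame(
  f = factor(rep(c("A", "B", "C"), each = n / 3)),
  x = runif(n, 0, 10)
)
eta <- with(dat, -2 + c(A = 0.3, B = 0.1, C = 0.5)[as.character(f)] * x)
dat$p <- plogis(eta + rnorm(n, sd = 0.3))

# Model fitted on the logit scale, as in the question
fit <- lm(qlogis(p) ~ f * x, data = dat)

# Reference grid: every factor level crossed with a range of x values
grid <- expand.grid(f = levels(dat$f),
                    x = seq(min(dat$x), max(dat$x), length.out = 20))
grid$logit <- predict(fit, newdata = grid)  # linear-predictor (logit) scale
grid$prob <- plogis(grid$logit)             # back-transformed to proportions

# One back-transformed trend line per factor level
plot(prob ~ x, data = grid, type = "n", ylim = c(0, 1),
     xlab = "x", ylab = "back-transformed p")
for (lev in levels(grid$f)) {
  sub <- grid[grid$f == lev, ]
  lines(sub$x, sub$prob, col = match(lev, levels(grid$f)))
}
legend("topleft", legend = levels(grid$f),
       col = seq_along(levels(grid$f)), lty = 1)
```

For an lme fit, predict(model, newdata = grid, level = 0) would give the corresponding population-level (fixed-effects only) predictions on the logit scale.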

Related

GAM residuals missing in plot

I am applying a GAM model to my data: cell abundance over time.
The model works just fine (I am aware of a pattern in my residuals, but that is a separate issue not relevant here).
It just fails to display the partial residuals in the final plot, although I set residuals = TRUE. Here is my output:
https://i.stack.imgur.com/C1MlY.png
I also used the mgcv package.
Previously this code worked as I wanted, but on different data. Any ideas on why it is not working are welcome!
library(mgcv) # needed for gam.check() and summary.gam() below
GAM_EA <- mgcv::gam(EUB_FISH ~ s(Day, by = Heatwave), data = HnH, method = "REML")
gam.check(GAM_EA) # Checking the model
mgcv::anova.gam(GAM_EA) # Retrieving the statistical results. See ?anova.gam
summary.gam(GAM_EA)
plot(GAM_EA, shift = coef(GAM_EA)[1], residuals = TRUE)
See the by.resid argument in ?plot.gam. The way partial residuals are used in plot.gam would be meaningless for factor by terms unless you were to subset the partial residuals and plot only the residuals for observations in the specific level of the by factor.
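A minimal sketch of turning that argument on, using simulated stand-in data (the real HnH data are not shown). It also adds Heatwave as a parametric main effect, since factor-by smooths are centred and mgcv's documentation on by variables (?gam.models) recommends including the factor itself in the model; how exactly the residuals are computed for by terms may differ between mgcv versions, so check ?plot.gam for yours.

```r
library(mgcv)

# Simulated stand-in for the question's data: two Heatwave groups,
# each with its own smooth trend over Day
set.seed(1)
dat <- data.frame(
  Day = rep(seq(0, 30, length.out = 100), 2),
  Heatwave = factor(rep(c("control", "heatwave"), each = 100))
)
dat$EUB_FISH <- with(dat, ifelse(Heatwave == "heatwave",
                                 5 + sin(Day / 5), 5 + cos(Day / 5))) +
  rnorm(200, sd = 0.3)

# Factor-by smooths are centred, so the group means enter as a parametric term
fit <- gam(EUB_FISH ~ Heatwave + s(Day, by = Heatwave),
           data = dat, method = "REML")

# by.resid = TRUE asks plot.gam to add partial residuals to the by-variable smooths
pdf(NULL)  # draw to a null device; remove this line to plot on screen
plot(fit, pages = 1, residuals = TRUE, by.resid = TRUE, shift = coef(fit)[1])
invisible(dev.off())
```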

Plotting Kaplan-Meier Survival Plots in R

I'm trying to plot a Kaplan-Meier survival plot in R, but I'm having some trouble.
I'm quite new to R, so forgive my terrible code.
library(survival)
data_time = c(0.19,0.75,0.27,0.26,0.22,0.91,0.21,0.091,0.19,0.37,0.093,0.92,0.046,0.93,042)
data_event = c(1,1,1,1,0,0,1,1,0,0,0,1,1,1,0)
surv_object = Surv(time = data_time, event = data_event)
survfit(surv_object)
This of course gives me an error: "The survfit function requires a formula as its first argument".
I've split the data into two vectors: the first holds the lifetimes, and the second records whether that specific data point was censored, with 0 meaning not censored and 1 meaning censored.
I thought the Surv function was supposed to produce the formula required for the survfit function, with the default being the Kaplan-Meier.
The survfit function, as the name suggests, fits survival curves, optionally separately for groups defined by some variables. The "formula" it expects has the survival object on the left-hand side and those variables on the right, i.e. Surv(...) ~ x1 + ... + xn.
However, it is definitely possible to make a Kaplan-Meier survival plot without any predictors: fitting the model on a constant (i.e. 1) does the trick. I then like to use the ggsurvplot function from the survminer package.
install.packages("survminer")
library(survminer)
library(survival)
data_time = c(0.19,0.75,0.27,0.26,0.22,0.91,0.21,0.091,0.19,0.37,0.093,0.92,0.046,0.93,0.42)
data_event = c(1,1,1,1,0,0,1,1,0,0,0,1,1,1,0)
surv_object = Surv(time = data_time, event = data_event)
# Regress on a constant
fit <- survfit(surv_object ~ 1)
# Plot the fit
ggsurvplot(fit, data.frame(time=data_time, event=data_event), conf.int=FALSE)
Of course, the plot will be a lot more interesting if you're fitting some strata.
Note: I assume you missed a period in the last event time, and fixed it.
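The sketch below stays within the survival package (no survminer needed). It checks the intercept-only fit against the product-limit formula computed by hand and then shows a stratified fit on the lung data shipped with survival. One caution: Surv() reads event = 1 as the event (e.g. death) and 0 as censored, which is the opposite of the coding described in the question; if 1 really means censored in your data, pass event = 1 - data_event instead.

```r
library(survival)

data_time  <- c(0.19, 0.75, 0.27, 0.26, 0.22, 0.91, 0.21, 0.091,
                0.19, 0.37, 0.093, 0.92, 0.046, 0.93, 0.42)
data_event <- c(1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0)  # 1 = event, 0 = censored

# Intercept-only Kaplan-Meier fit
fit <- survfit(Surv(data_time, data_event) ~ 1)

# Product-limit estimate by hand: S(t) = product over event times t_i <= t of (1 - d_i / n_i)
event_times <- sort(unique(data_time[data_event == 1]))
n_risk  <- sapply(event_times, function(t) sum(data_time >= t))
n_event <- sapply(event_times, function(t) sum(data_time == t & data_event == 1))
km_by_hand <- cumprod(1 - n_event / n_risk)

# survfit's estimate at the same event times
km_survfit <- summary(fit, times = event_times)$surv

# Base-graphics plot of the single curve
plot(fit, conf.int = FALSE, xlab = "Time", ylab = "Survival probability")

# Stratified example: one curve per level of sex in the lung data
fit_sex <- survfit(Surv(time, status) ~ sex, data = lung)
plot(fit_sex, col = 1:2, xlab = "Days", ylab = "Survival probability")
legend("topright", legend = c("sex = 1", "sex = 2"), col = 1:2, lty = 1)
```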

Plotting random slopes from glmer model using sjPlot

In the past, I used the sjp.glmer function from the sjPlot package to visualize the different slopes from a generalized linear mixed-effects model. However, with the new version of the package, I can't figure out how to plot the individual slopes, as in the figure for the probabilities of fixed effects by (random) group level, located here
Here is the code that, I think, should allow for the production of the figure. I just can't seem to get it in the new version of sjPlot.
library(lme4)
library(sjPlot)
data(efc)
# create binary response
efc$hi_qol = 0
efc$hi_qol[efc$quol_5 > mean(efc$quol_5, na.rm = TRUE)] = 1
# prepare group variable
efc$grp = as.factor(efc$e15relat)
# data frame for 2nd fitted model
mydf <- na.omit(data.frame(hi_qol = as.factor(efc$hi_qol),
sex = as.factor(efc$c161sex),
c12hour = as.numeric(efc$c12hour),
neg_c_7 = as.numeric(efc$neg_c_7),
grp = efc$grp))
# fit 2nd model
fit2 <- glmer(hi_qol ~ sex + c12hour + neg_c_7 + (1|grp),
data = mydf,
family = binomial("logit"))
I have tried to graph the model using the following code.
plot_model(fit2,type="re")
plot_model(fit2,type="prob")
plot_model(fit2,type="eff")
I think that I may be missing a flag, but after reading through the documentation, I can't find out what that flag may be.
Looks like this might do what you want:
(pp <- plot_model(fit2,type="pred",
terms=c("c12hour","grp"),pred.type="re"))
type="pred": plot predicted values
terms=c("c12hour", "grp"): include c12hour (as the x-axis variable) and grp in the predictions
pred.type="re": random effects
I haven't been able to get confidence-interval ribbons yet (tried ci.lvl=0.9, but no luck ...)
pp+facet_wrap(~group) comes closer to the plot shown in the linked blog post (each random-effects level gets its own facet ...)
Ben already posted the correct answer. sjPlot uses the ggeffects package for marginal effects plots, so an alternative would be to use ggeffects directly:
library(ggeffects)
library(magrittr) # provides %>%
ggpredict(fit2, terms = c("c12hour", "grp"), type = "re") %>% plot()
There's a new vignette describing how to get marginal effects for mixed models / random effects. However, confidence intervals are currently not available for this plot-type.
The type = "ri.prob" option in the linked blog post did not adjust for covariates; that's why I first removed that option and later re-implemented it (correctly) in ggeffects / sjPlot. The confidence intervals shown in the linked blog post are not correct either. Once I figure out how to obtain CIs or prediction intervals, I'll add this option as well.
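sjPlot and ggeffects wrap this computation, so the following is only a sketch of what the group-level curves amount to, on simulated data with made-up effect sizes rather than the efc data. It uses lme4's predict() with re.form = NULL (include the random effects) and type = "response" to get one predicted-probability curve per group.

```r
library(lme4)

# Simulated stand-in: binary outcome, one numeric predictor,
# and a grouping factor with random intercepts
set.seed(123)
n_grp <- 6
dat <- data.frame(
  grp = factor(rep(seq_len(n_grp), each = 80)),
  c12hour = runif(n_grp * 80, 0, 100)
)
grp_eff <- rnorm(n_grp, sd = 0.8)
dat$hi_qol <- rbinom(nrow(dat), 1,
                     plogis(-0.5 + 0.01 * dat$c12hour + grp_eff[dat$grp]))

fit <- glmer(hi_qol ~ c12hour + (1 | grp), data = dat,
             family = binomial("logit"))

# Prediction grid: a range of c12hour values for every group
nd <- expand.grid(c12hour = seq(0, 100, by = 10), grp = levels(dat$grp))

# re.form = NULL includes the random effects (group-specific curves);
# re.form = NA would give the population-level curve instead
nd$prob <- predict(fit, newdata = nd, re.form = NULL, type = "response")

# One curve per group level (columns of the matrix are the groups)
matplot(unique(nd$c12hour), matrix(nd$prob, ncol = n_grp),
        type = "l", lty = 1, xlab = "c12hour", ylab = "Predicted probability")
```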

Can I do a multivariate regression with the segmented package in R?

I have FINALLY figured out how to use the segmented package with a univariate analysis, giving results comparable to what I was expecting. Ultimately, though, I have to do a piecewise GLM regression as a multivariate analysis. The model has some variables that need to be segmented and some that do not, as well as categorical variables. Is this possible with the segmented package?
If so, how?
Do I have to keep iteratively developing models, adding one variable to segmented at a time, like this?
library(segmented)
piecewise <- glm(y ~ x1 + x2, family = quasipoisson(link = "log"), data = data)
piecewise_seg <- segmented(piecewise, seg.Z = ~ x1, psi = 3)
piecewise_seg2 <- segmented(piecewise_seg, seg.Z = ~ x2, psi = 400)
Or can I do this in one go? If so, how can I set the different psi parameters for each different variable?
Wait, I think I found it towards the end of the package documentation.
2 segmented variables: starting values requested via a named list
o1<-update(o,seg.Z=~x+z,psi=list(x=c(30,60),z=.3))
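Applied to the question's setup, that named-list form lets you segment several variables in one call while leaving others, including factors, as ordinary linear terms. This is a sketch on simulated data: the breakpoints at 3 and 400, the factor g and all effect sizes are made up, with the question's psi values reused as starting values. Note that the argument is seg.Z with a capital Z, and every segmented variable must already appear in the glm formula.

```r
library(segmented)

# Simulated data: breakpoints at x1 = 3 and x2 = 400, plus an unsegmented factor g
set.seed(10)
n <- 500
dat <- data.frame(x1 = runif(n, 0, 6), x2 = runif(n, 0, 800),
                  g = factor(sample(c("a", "b"), n, replace = TRUE)))
eta <- 1 + 0.1 * dat$x1 + 0.4 * pmax(dat$x1 - 3, 0) +
  0.001 * dat$x2 - 0.002 * pmax(dat$x2 - 400, 0) + 0.3 * (dat$g == "b")
dat$y <- rpois(n, exp(eta))

# Both segmented covariates (and the factor) go into the starting glm
base_fit <- glm(y ~ x1 + x2 + g, family = quasipoisson(link = "log"), data = dat)

# Segment x1 and x2 in one call; psi is a named list of starting values
seg_fit <- segmented(base_fit, seg.Z = ~ x1 + x2, psi = list(x1 = 3, x2 = 400))

seg_fit$psi  # estimated breakpoints with standard errors
```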

How can I get the probability density function from a regression random forest?

I am using random-forest for a regression problem to predict the label values of Test-Y for a given set of Test-X (new values of features). The model has been trained over a given Train-X (features) and Train-Y (labels). "randomForest" of R serves me very well in predicting the numerical values of Test-Y. But this is not all I want.
Instead of only a number, I want to use random-forest to produce a probability density function. I searched for a solution for several days, and here is what I found so far:
"randomForest" doesn't produce probabilities for regression, only for classification (via "predict" and setting type = "prob").
Using "quantregForest" provides a nice way to make and visualize prediction intervals. But still not the probability density function!
Any other thought on this?
Please see the predict.all parameter of the predict.randomForest function.
library("ggplot2")
library("randomForest")
data(mpg)
rf = randomForest(cty ~ displ + cyl + trans, data = mpg)
# Predict the first car in the dataset
pred = predict(rf, newdata = mpg[1, ], predict.all = TRUE)
hist(pred$individual)
The histogram of the 500 "elementary" predictions (one per tree; 500 is the default ntree) looks like this:
You can also use quantregForest with a very fine grid of quantiles, convert them into a cumulative distribution function (cdf) with the R function ecdf, and convert this cdf into a density estimate with a kernel density estimator.
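A base-R sketch of that last step. The quantiles here are a stand-in drawn from a known normal distribution (mean 20, sd 3) so the result can be checked; in practice they would be the predicted quantiles for one test case from quantregForest (its predict method takes the requested quantiles as an argument whose name has changed between versions, so check ?predict.quantregForest). Kernel-smoothing the quantile values with density() is one practical way to get a smooth derivative of their ecdf, and the same density() call works on a row of pred$individual from predict.all = TRUE.

```r
# Stand-in for a fine grid of predicted quantiles for one test case
probs <- seq(0.001, 0.999, by = 0.001)
q <- qnorm(probs, mean = 20, sd = 3)

# Empirical CDF built from the quantile values
cdf <- ecdf(q)

# Kernel density estimate of the same values (a smoothed version of that CDF's derivative)
dens <- density(q)

mode_est <- dens$x[which.max(dens$y)]
plot(dens, main = "Density from a quantile grid")
```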
