Obtain the probability equation represented by plotmo plots - R

I want to obtain the equations of the probability functions represented by plotmo (R), that is, the equations of the model when varying one or two predictors while holding the other predictors constant at their mean values. I want an easy way to obtain the mathematical equation, because I have to build many models with different variables.
If my model is like this:
glm(formula = pres_aus ~ pH_sp + Annual_prec + I(pH_sp^2) + I(Annual_prec^2), family = binomial(link = "logit"), data = puntos_calibrado)
how can I do it?

No data example provided, so no testing done, but couldn't you just skip the construction of a symbolic expression and do something along the lines of:
model.matrix(delete.response(terms(mdl.fit)), data = dat) %*% coef(mdl.fit)
# where mdl.fit is returned from glm() and dat holds the predictor columns
In a sense this is the R matrix representation of the formula: sum(beta_i * X_i). If you want to hold a particular column at its mean, just pull that data frame apart and replace the relevant part before the calculation. So with the first predictor column held at its mean:
dat2 <- dat
dat2[[1]] <- mean(dat[[1]]) # first predictor column held at its mean
model.matrix(delete.response(terms(mdl.fit)), data = dat2) %*% coef(mdl.fit)
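Note that these matrix products give the linear predictor (the log-odds); for a binomial model with a logit link, wrapping the result in plogis() turns it into a probability. As a minimal sketch for the model in the question (untested, assuming the puntos_calibrado data frame is available), the fitted curve can be written out as an explicit probability function of pH_sp with Annual_prec held at its mean:
fit <- glm(pres_aus ~ pH_sp + Annual_prec + I(pH_sp^2) + I(Annual_prec^2),
           family = binomial(link = "logit"), data = puntos_calibrado)
b <- coef(fit) # order: (Intercept), pH_sp, Annual_prec, I(pH_sp^2), I(Annual_prec^2)
m <- mean(puntos_calibrado$Annual_prec) # hold Annual_prec at its mean
# probability as a function of pH_sp alone:
p_of_pH <- function(x) plogis(b[1] + b[2]*x + b[3]*m + b[4]*x^2 + b[5]*m^2)
curve(p_of_pH, from = min(puntos_calibrado$pH_sp), to = max(puntos_calibrado$pH_sp),
      xlab = "pH_sp", ylab = "predicted probability")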

Related

How to plot multi-level meta-analysis by study (in contrast to the subgroup)?

I am doing a multi-level meta-analysis. Many studies have several subgroups. When I make a forest plot, the studies are presented as subgroups, of which there are 60; however, I would like to plot by study, which would give 25 studies and be more appropriate. Does anyone have an idea how to do this forest plot?
I did it this way:
full.model <- rma.mv(yi = yi,
                     V = vi,
                     slab = Author,
                     data = df,
                     random = ~ 1 | Author/Study,
                     test = "t",
                     method = "REML")
forest(full.model)
It is not clear to me if you want to aggregate to the Author level or to the Study level. If there are multiple rows of data for particular studies, then the model isn't really complete and you would want to add another random intercept for the level of the estimates within studies. Essentially, the lowest random effect should have as many values for nlvls in the output as there are estimates (k).
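For example, a sketch of adding such an estimate-level random intercept (the esid column name is illustrative; the point is one unique id per row, so that the lowest level has as many levels as there are estimates):
df$esid <- 1:nrow(df) # one id per estimate/row
full.model <- rma.mv(yi = yi, V = vi, slab = Author, data = df,
                     random = ~ 1 | Author/Study/esid,
                     test = "t", method = "REML")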
Let's first tackle the case where we have a multilevel structure with two levels, studies and multiple estimates within studies (for some technical reasons, some might call this a three-level model, but let's not get into this). I will use a fully reproducible example for illustration purposes, using the dat.konstantopoulos2011 dataset, where we have districts and schools within districts. We fit a multilevel model of the type as you have with:
library(metafor)
dat <- dat.konstantopoulos2011
res <- rma.mv(yi, vi, random = ~ 1 | district/school, data=dat)
res
We can aggregate the estimates to the district level using the aggregate() function, specifying the marginal var-cov matrix of the estimates from the model to account for their non-independence. (Note that this makes use of aggregate.escalc(), which only works with escalc objects, so if your dataset is not one, you need to convert it first; see help(aggregate.escalc) for details.)
agg <- aggregate(dat, cluster=dat$district, V=vcov(res, type="obs"))
agg
You will find that if you then fit an equal-effects model to these aggregated estimates, the results are identical to what you obtained from the multilevel model (we use an equal-effects model since the heterogeneity accounted for by the multilevel model is already encapsulated in vcov(res, type="obs")):
rma(yi, vi, method="EE", data=agg)
So, we can now use these aggregated values in a forest plot:
with(agg, forest(yi, vi, slab=district))
My guess based on your description is that you actually have an additional level that you should include in the model and that you want to aggregate to the intermediate level. This is a tad more complicated, since aggregate() isn't meant for that. Just for illustration purposes, say we use year as another (higher) level and I will mess a bit with the data so that all three variance components are non-zero (again, just for illustration purposes):
dat$yi[dat$year == 1976] <- dat$yi[dat$year == 1976] + 0.8
res <- rma.mv(yi, vi, random = ~ 1 | year/district/school, data=dat)
res
Now instead of aggregate(), we can accomplish the same thing by using a multivariate model, including the intermediate level as a factor and using again vcov(res, type="obs") as the var-cov matrix:
agg <- rma.mv(yi, V=vcov(res, type="obs"), mods = ~ 0 + factor(district), data=dat)
agg
The coefficients of this model are now the aggregated values, and the var-cov matrix of the model coefficients is the var-cov matrix of these aggregated values:
coef(agg)
vcov(agg)
They are not all independent (since we haven't aggregated to the highest level), so if we want to check that we can obtain the same results as from the multilevel model, we must account for this dependency:
rma.mv(coef(agg), V=vcov(agg), method="EE")
Again, exactly the same results. So now we use these coefficients and the diagonal from vcov(agg) as their sampling variances in the forest plot:
forest(coef(agg), diag(vcov(agg)), slab=names(coef(agg)))
The forest plot cannot indicate the dependency that still remains in these values, so if one were to meta-analyze these aggregated values using only diag(vcov(agg)) as their sampling variances, the results would not be identical to what you get from the full multilevel model. But there isn't really a way around that and the plot is just a visualization of the aggregated estimates and the CIs shown are correct.
You need to specify your own grouping in a new column of data and use this as the new random effect:
df$study_group <- c(1,1,1,2,2,3,4,5,5,5) # example
full.model <- rma.mv(yi = yi,
                     V = vi,
                     slab = Author,
                     data = df,
                     random = ~ 1 | study_group,
                     test = "t",
                     method = "REML")
forest(full.model)

How to specify random effects names in a newdata data.frame used in predict() function? - lme4

I have a problem using the predict() function in lme4.
More precisely, I am not clear on how to declare the names of the random effects in the newdata data frame that I feed to the predict() function in order to get predictions.
I will try and describe my problem in detail.
Data
The data I am working with are longitudinal. I have 119 observations, with several (6-7) measurements for each, representing the size of proteins that aggregate over time and grow bigger (let's call it LDL).
Model
The model used to describe this process is a Richards curve (generalized logistic function), which can be written as

LDL(t) = 15 + (alpha - 15) / (1 + exp((gamma - t) / delta))
Now, I fit a separate curve to the group of measurements of each observation, with the following fixed effects, random effects, and variables:
alpha_fix - a fixed effect for alpha
alpha|Obs - a random effect for alpha, which varies among observations
gamma_fix - a fixed effect for gamma
gamma|Obs - a random effect for gamma, which varies among observations
delta_f - a fixed effect
Time - a continuous variable, time in hours
LDL - response variable, continuous, representing size of proteins at time point t.
Predictions
Once I fit the model, I want to use it to predict the value of LDL at a specific time point, for each observation. To do this, I need to use the predict() function and supply a data frame as newdata. Reading through the documentation, it says the following:
If any random effects are included in re.form (see below), newdata must contain columns corresponding to all of the grouping variables and random effects used in the original model, even if not all are used in prediction; however, they can be safely set to NA in this case
Now, the way I understand this, I need a newdata data frame that in my case contains the columns "Time", "Obs", "alpha_f", "gamma_f", and "delta_f", as well as two columns for the random effects of alpha and gamma, respectively. However, I am not sure how these two random-effect columns should be named for the predict() function to understand them. I tried "alpha|Obs" and "gamma|Obs", as well as "Obs$alpha" and "Obs$gamma", but all of them throw the error
Error in FUN(X[[i]], ...) : random effects specified in re.form that were not present in original model
I was wondering whether anyone has any idea what the correct way to do this is.
For completeness, the code used to fit the model is provided below:
library(lme4)

ModelFunction = function(alpha, gamma, delta, Time) {
  15 + (alpha - 15) / (1 + exp((gamma - Time) / delta))
}
ModelGradient = deriv(body(ModelFunction)[[2]],
                      namevec = c("alpha", "gamma", "delta"),
                      function.arg = ModelFunction)
starting_conditions = c(alpha = 5000, gamma = 1.5, delta = 0.2) # based on visual observation
fit = nlmer(LDL ~ ModelGradient(alpha, gamma, delta, Time) ~ (gamma | Obs) + (alpha | Obs),
            start = starting_conditions,
            control = nlmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 100000)),
            data = ldlData)
I would really appreciate it if someone could give me some advice.
Thanks for your time!

How does one extract hat values and Cook's Distance from an `nlsLM` model object in R?

I'm using the nlsLM function to fit a nonlinear regression. How does one extract the hat values and Cook's Distance from an nlsLM model object?
With objects created using the nls or nlreg functions, I know how to extract the hat values and the Cook's Distance of the observations, but I can't figure out how to get them using nlsLM.
Can anyone help me out on this? Thanks!
So, it's not Cook's Distance or based on hat values, but you can use the function nlsJack in the nlstools package to jackknife your nls model: it removes every point, one by one, and refits the model each time to see, roughly speaking, how much the model coefficients change with or without a given observation in there.
Reproducible example:
xs = rep(1:10, times = 10)
ys = 3 + 2*exp(-0.5*xs)
for (i in 1:100) {
  xs[i] = rnorm(1, xs[i], 2) # add noise to the predictor
}
df1 = data.frame(xs, ys)
nls1 = nls(ys ~ a + b*exp(d*xs), data = df1, start = c(a = 3, b = 2, d = -0.5))
require(nlstools)
plot(nlsJack(nls1))
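As an aside, nlsLM() from the minpack.lm package returns an object that inherits from class "nls", so the same diagnostic should work on the original model directly; a quick sketch (untested):
library(minpack.lm)
nls2 <- nlsLM(ys ~ a + b*exp(d*xs), data = df1, start = c(a = 3, b = 2, d = -0.5))
plot(nlsJack(nls2)) # nlstools accepts the fit because it is of class "nls"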
The plot shows the percentage change in each model coefficient as each individual observation is removed, and it marks influential points above a certain threshold as "influential" in the resulting plot. The documentation for nlsJack describes how this threshold is determined:
An observation is empirically defined as influential for one parameter if the difference between the estimate of this parameter with and without the observation exceeds twice the standard error of the estimate divided by sqrt(n). This empirical method assumes a small curvature of the nonlinear model.
My impression so far is that this is a fairly liberal criterion: it tends to mark a lot of points as influential.
nlstools is a pretty useful package overall for diagnosing nls model fits though.

Calculating the standard error of parameters in nlme

I am running a non-linear mixed model in nlme, and I am having trouble calculating the standard errors of the three parameters. We have our final model here:
shortG.nlme9 <- update(shortG.nlme6,
                       fixed = Asym + xmid + scal ~ Treatment * Breed + Environment,
                       start = c(shortFix6[1:16], rep(0, 2),
                                 shortFix6[17:32], rep(0, 2),
                                 shortFix6[33:48], rep(0, 2)),
                       control = nlmeControl(pnlsTol = 0.02, msVerbose = TRUE))
When we plug the model into the summary() statement, we can get the standard errors of each of the treatments, breeds, treatment*breed interactions, and environments. However, we are making growth curves for specific combinations (treatment1/breed1, treatment2/breed1, treatment3/breed1, etc.), so we need to combine the effects of treatment, breed, and environment for the parameter values, and correctly combine their standard errors to get the SE of the full parameter. To do this, is there either a way to get R to compute the full SE on its own, or an easy way to have R give us a covariance matrix so we can calculate the values by hand? When we simply plug in the summary(shortG.nlme9) statement, we are automatically given a correlation matrix, so is there something we could write to get a covariance matrix instead?
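For what it's worth, a sketch of the by-hand calculation being described: vcov() on an nlme fit returns the variance-covariance matrix of the fixed effects, and the standard error of any linear combination of the coefficients is sqrt(c' V c). The contrast vector below is purely illustrative; which entries get a 1 depends on how your Treatment/Breed/Environment factors are coded:
library(nlme) # assuming shortG.nlme9 is the fitted model from above
V <- vcov(shortG.nlme9)  # variance-covariance matrix of the fixed effects
b <- fixef(shortG.nlme9) # fixed-effect estimates
cvec <- rep(0, length(b))
cvec[c(1, 2)] <- 1 # illustrative: the coefficients that sum to the combination of interest
est <- sum(cvec * b)                     # combined estimate
se <- sqrt(drop(t(cvec) %*% V %*% cvec)) # its standard error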

Separate B-spline between knots and compare goodness of fit on each piece

I am working on some B-spline regression (first degree, single knot, not very complicated) and would like to compare the parameter estimates for the portion before and the portion after the internal knot.
Right now I have something like this:
fit <- lm(y ~ bs(x, degree = 1, knots = 20), data = bar)
The fit then has the intercept estimate and two sets of parameter estimates.
I am interested in comparing those two parameter sets against one another. Does anyone know how to split the bs model object or extract those two sub-models? Or, failing that, how to convert an F value to a p value in R so I can do those tests manually?
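On the last point, converting an F value to a p value in R is just the upper tail of the F distribution via pf(); a minimal sketch with illustrative values:
Fval <- 4.9    # illustrative F statistic
df_num <- 1    # numerator degrees of freedom
df_den <- 97   # denominator degrees of freedom
pf(Fval, df_num, df_den, lower.tail = FALSE) # the corresponding p value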
