View slope of computed line from avPlots function - r

I'm using the avPlots function in R. The function places a fitted line on the graph, and I'm wondering if there is a way to view the equation of that line. I know I could computationally reproduce it using the lm function, but I'm curious if there is a way to view the "back-end" code being used to produce the graph.
Thanks!
Below is some code. The function takes a linear model followed by the variables you want to create added-variable plots for (each adjusted for the other regressors).
avPlots(mlm1,terms=~pctUrban+log(ppgdp))

I am not very familiar with added-variable plots and I'm not entirely sure exactly what you are looking for, but I had an idea that might be helpful.
Say you have an example using a linear model fitted with lm such as this (the Duncan data are also from the car package):
res <- avPlots(lm(prestige~income+education+type, data=Duncan))
This dataset includes the prestige and other characteristics of 45 U.S. occupations in 1950.
The returned object res contains the data points for each of the four plots generated (see below). The avPlot function uses lsfit (least squares fit) for the fitted line, so the same fit can be reproduced from the returned data for each term (e.g., for typeprof):
fit <- lsfit(res$typeprof[,1], res$typeprof[,2])
You could then get your slope from the coefficients (16.7):
fit$coefficients
Intercept X
4.178364e-16 1.665751e+01
As mentioned, this gives the same slopes as the corresponding coefficients from the lm model:
Call:
lm(formula = prestige ~ income + education + type, data = Duncan)

Coefficients:
(Intercept)       income    education     typeprof       typewc
    -0.1850       0.5975       0.3453      16.6575     -14.6611
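If you want the slope from every panel at once, here is a small sketch along the same lines, using the res object returned above (lsfit labels the slope coefficient "X"):
slopes <- sapply(res, function(p) unname(lsfit(p[, 1], p[, 2])$coefficients["X"]))
slopes
# one slope per added-variable plot, named by term (income, education, typeprof, typewc)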

Related

Syntax error when fitting a Bayesian logistic regression

I am attempting to model binary species traits, where presence is represented by 1 and absence by 0, as a function of some sampling variables. To accomplish this, I have constructed a brms model and added a phylogenetic structure to it. Here is the model I used:
model <- brms::brm(male_head | trials(1 + 0) ~
                     PC1 + PC2 + PC3 +
                     (1 | gr(phylo, cov = covariance_matrix)),
                   data = data,
                   family = binomial(),
                   prior = prior,
                   data2 = list(covariance_matrix = covariance_matrix))
Each line of my df represents one observation with a binary outcome.
Initially, I was unsure about which arguments to use in the trials() function. Since my species are non-repeated and some have the traits I'm modeling while others do not, I thought that trials(1 + 0) might be appropriate. I recall seeing a vignette that suggested this, but I can't find it now. Is this syntax correct?
Furthermore, for a reason I don't understand, the model is producing one estimate value for each row of my predictors. As my df has 362 rows, the model summary displays a lengthy list of 362 estimate values. I would prefer to have one estimate value for each sampling variable instead. Although I have managed to achieve this by making the treatment effect a random effect (i.e., (1|PC1) + (1|PC2) + (1|PC3)), I don't think this is the appropriate approach. I also tried bernoulli(), but that did not work either. Do you have any suggestions for how I can address this issue?
EDIT:
For some reason the values of my sampling variables/principal components were being read as factors. The second part of this question was solved.
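On the first part (the trials() syntax): a minimal sketch, assuming the same variable names as above, of the two standard ways to write a binary outcome in brms -- bernoulli() with no trials() term, or binomial() with trials(1), since each row is a single yes/no observation:
# Option 1: Bernoulli family; no trials() term is needed for a 0/1 outcome
model <- brms::brm(male_head ~ PC1 + PC2 + PC3 +
                     (1 | gr(phylo, cov = covariance_matrix)),
                   data = data,
                   family = bernoulli(),
                   prior = prior,
                   data2 = list(covariance_matrix = covariance_matrix))
# Option 2: binomial family with one trial per row:
# male_head | trials(1) ~ PC1 + PC2 + PC3 + (1 | gr(phylo, cov = covariance_matrix))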

How to plot a multi-level meta-analysis by study (rather than by subgroup)?

I am doing a multi-level meta-analysis. Many studies have several subgroups, so when I make a forest plot the rows are the subgroups, and there are 60 of them. I would instead like to plot by study, which would give 25 rows and be more appropriate. Does anyone have an idea how to do this forest plot?
I did it this way:
full.model <- rma.mv(yi = yi,
                     V = vi,
                     slab = Author,
                     data = df,
                     random = ~ 1 | Author/Study,
                     test = "t",
                     method = "REML")
forest(full.model)
It is not clear to me if you want to aggregate to the Author level or to the Study level. If there are multiple rows of data for particular studies, then the model isn't really complete and you would want to add another random intercept for the level of the estimates within studies. Essentially, the lowest random effect should have as many values for nlvls in the output as there are estimates (k).
Let's first tackle the case where we have a multilevel structure with two levels, studies and multiple estimates within studies (for some technical reasons, some might call this a three-level model, but let's not get into this). I will use a fully reproducible example for illustration purposes, using the dat.konstantopoulos2011 dataset, where we have districts and schools within districts. We fit a multilevel model of the type as you have with:
library(metafor)
dat <- dat.konstantopoulos2011
res <- rma.mv(yi, vi, random = ~ 1 | district/school, data=dat)
res
We can aggregate the estimates to the district level using the aggregate() function, specifying the marginal var-cov matrix of the estimates from the model to account for their non-independence (note that this makes use of aggregate.escalc() which only works with escalc objects, so if it is not, you need to convert the dataset to one - see help(aggregate.escalc) for details):
agg <- aggregate(dat, cluster=dat$district, V=vcov(res, type="obs"))
agg
You will find that if you then fit an equal-effects model to these estimates based on the aggregated data that the results are identical to what you obtained from the multilevel model (we use an equal-effects model since the heterogeneity accounted for by the multilevel model is already encapsulated in vcov(res, type="obs")):
rma(yi, vi, method="EE", data=agg)
So, we can now use these aggregated values in a forest plot:
with(agg, forest(yi, vi, slab=district))
My guess based on your description is that you actually have an additional level that you should include in the model and that you want to aggregate to the intermediate level. This is a tad more complicated, since aggregate() isn't meant for that. Just for illustration purposes, say we use year as another (higher) level and I will mess a bit with the data so that all three variance components are non-zero (again, just for illustration purposes):
dat$yi[dat$year == 1976] <- dat$yi[dat$year == 1976] + 0.8
res <- rma.mv(yi, vi, random = ~ 1 | year/district/school, data=dat)
res
Now instead of aggregate(), we can accomplish the same thing by using a multivariate model, including the intermediate level as a factor and using again vcov(res, type="obs") as the var-cov matrix:
agg <- rma.mv(yi, V=vcov(res, type="obs"), mods = ~ 0 + factor(district), data=dat)
agg
Now the model coefficients of this model are the aggregated values and the var-cov matrix of the model coefficients is the var-cov matrix of these aggregated values:
coef(agg)
vcov(agg)
They are not all independent (since we haven't aggregated to the highest level), so if we want to check that we can obtain the same results as from the multilevel model, we must account for this dependency:
rma.mv(coef(agg), V=vcov(agg), method="EE")
Again, exactly the same results. So now we use these coefficients and the diagonal from vcov(agg) as their sampling variances in the forest plot:
forest(coef(agg), diag(vcov(agg)), slab=names(coef(agg)))
The forest plot cannot indicate the dependency that still remains in these values, so if one were to meta-analyze these aggregated values using only diag(vcov(agg)) as their sampling variances, the results would not be identical to what you get from the full multilevel model. But there isn't really a way around that and the plot is just a visualization of the aggregated estimates and the CIs shown are correct.
You need to specify your own grouping in a new column of data and use this as the new random effect:
df$study_group <- c(1,1,1,2,2,3,4,5,5,5)  # example grouping
full.model <- rma.mv(yi = yi,
                     V = vi,
                     slab = Author,
                     data = df,
                     random = ~ 1 | study_group,
                     test = "t",
                     method = "REML")
forest(full.model)

Obtaining predicted (i.e. expected) values from the orm function (Ordinal Regression Model) from rms package in R

I've run a simple model using orm (i.e. reg <- orm(formula = y ~ x)) and I'm having trouble understanding how to get predicted values for Y. I've never worked with models that use multiple intercepts. I want to know for each and every value of Y in my dataset what the predicted value from the model would be. I tried predict(reg, type="mean") and this produced values that are close to the predicted values from an OLS regression, but I'm not sure if this is what I want. I really just want something analogous to OLS where you can obtain the E(Y) given a set of predictors. If possible, please provide code I can run to do this with a very short explanation.
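A minimal sketch of one common rms approach, assuming a fit like the reg above (Mean() builds a function that turns the model's linear predictor into E(Y | X), which is the ordinal analogue of an OLS fitted value):
library(rms)
reg <- orm(y ~ x)                     # the fit from the question
M   <- Mean(reg)                      # function mapping linear predictor -> E(Y | X)
ey  <- M(predict(reg, type = "lp"))   # expected Y for every observation in the data
# predict(reg, type = "mean"), as tried above, should return these same values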

Plot the Profile Deviance for a GLM fit in R

I would like to be able to plot the profile deviance for a parameter estimate fitted using the function glm() in R. The profile deviance is the deviance evaluated over a range of values of the parameter in question, with all other parameters re-estimated at each value. I need to plot the deviance for several values around the fitted parameter, to check the assumption of a quadratic deviance function.
My model is predicting reconviction of offenders. The formula is of the form:
reconv ~ [other variables] + sex, where reconv is a binary yes/no factor, and sex is binary male/female factor. I'd like to plot the profile deviance of the parameter estimated for sex=female (sex=male is the reference level).
The glm() function estimated the parameter as -0.22, with std error 0.12.
[I'm asking this question because there was no answer I could find, but I worked it out, and wanted to post a solution to be of use to others. Of course, additional help is welcome. :-)]
See the profileModel package by Ioannis Kosmidis. He had a paper in the R Journal (R News it would appear) illustrating the package:
Ioannis Kosmidis. The profileModel R package: Profiling objectives for models with linear predictors. R News, 8(2):12-18, October 2008.
The PDF is here (entire newsletter).
See ?profile.glm (and example("profile.glm")) in the MASS package -- I think it will do everything you want (this is not loaded by default, but it is mentioned in ?profile, which might have been the first place you looked ...) (Note that the profiles are generally plotted on a signed-square-root scale, so that a truly quadratic profile will appear as a straight line.)
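For example, a minimal sketch with the glm fit from the question (called model here; profile() profiles every coefficient, including the one for sex):
library(MASS)          # provides profile.glm and its plot method
pr <- profile(model)   # profile each coefficient of the fitted glm
plot(pr)               # signed square-root scale: a quadratic profile appears as a straight line
confint(pr)            # profile-likelihood confidence intervals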
The way I found to do this involves using the offset() function (as detailed in Pawitan, Y. (2001) 'In All Likelihood' p172).
The answers given by @BenBolker and @GavinSimpson are better than this, in that they reference packages which will do everything this does and a lot more.
I'm posting this because it's another way around the problem, and also because plotting things "manually" is sometimes nice for learning. It taught me a lot.
sexi <- as.numeric(data.frame$sex) - 1  # recode the factor as 0/1 numeric
beta <- numeric(60)      # vector to store the betas
deviance <- numeric(60)  # vector to store the deviances
for (i in 1:60) {
  beta[i] <- 0.08 - (0.01 * i)
  # A grid of values either side of the fitted MLE (in this case -0.22)
  mod <- update(model,
                . ~ . - sex                       # drop the fitted variable
                      + offset(I(sexi * beta[i])) # replace it with an offset term
  )
  deviance[i] <- mod$deviance  # store the i'th deviance
}
best <- which.min(deviance)
# Index of the smallest deviance; beta[best] should be (close to) the fitted value
deviance0 <- deviance - deviance[best]
# Rescale the deviance so its minimum is zero
betahat <- beta[best]    # best beta; should match the fitted value
stderror <- 0.12187      # std error of sex, found in summary(model)
quadratic <- ((beta - betahat)^2) * (1 / (stderror^2))
# Quadratic reference function to check the quadratic assumption against
x11()  # open a new graphics device
plot(beta, deviance0, type = "l", xlab = "Beta(sex)", ylim = c(0, 4))
lines(beta, quadratic, lty = 2, col = 3)  # add the quadratic reference curve
abline(3.84, 0, lty = 3)                  # add a horizontal line at deviance = 3.84
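One small extension of the code above: the same grid can be used to read off an approximate 95% profile-likelihood interval, which is what the reference line at 3.84 (i.e. qchisq(0.95, 1)) corresponds to:
ci <- range(beta[deviance0 <= qchisq(0.95, df = 1)])  # beta values below the cutoff
ci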

How do I plot predictions from new data fit with gee, lme, glmer, and gamm4 in R?

I have fit my discrete count data using a variety of functions for comparison. I fit a GEE model using geepack, a linear mixed effect model on the log(count) using lme (nlme), a GLMM using glmer (lme4), and a GAMM using gamm4 (gamm4) in R.
I am interested in comparing these models and would like to plot the expected (predicted) values for a new set of data (predictor variables). My goal is to compare the predicted effects for each model under particular conditions (x variables). Of particular interest is the comparison between marginal (GEE) and conditional estimates.
I think my main problem might be getting the new data in the correct form with the correct labels and attributes and such. I am still very much an R novice and struggle with this stuff (no course on this at my university unfortunately).
I currently have fitted models
gee1 lme1 lmer1 gamm1
and can extract their fixed effect coefficients and standard errors without a problem. I also don't have a problem converting them from the log scale or estimating confidence intervals accounting for the random effects.
I also have my new dataframe newdat which has 365 observations of 23 variables (average environmental data for each day of the year).
I am stuck on how to predict new count estimates from this. I played around with the model.matrix function but couldn't get it to work. For example, I tried:
mm <- model.matrix(terms(glmm1), newdat)
# Error in model.frame.default(object, data, xlev = xlev) :
#   object is not a matrix
newdat$pcount <- mm %*% fixef(glmm1)
Any suggestions or good references would be greatly appreciated. Can anyone help with the error above?
Getting predictions for lme() and lmer() is documented on http://glmm.wikidot.com/faq
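For example, here is a minimal sketch along the lines of that FAQ, using the fits named above; the predictor names in the model-matrix formula are placeholders for whatever fixed effects were actually used:
# Population-level (fixed-effects only) predictions on the new data
newdat$pred_glmer <- predict(lmer1, newdata = newdat, re.form = NA,
                             type = "response")                  # lme4 fit
newdat$pred_lme   <- predict(lme1, newdata = newdat, level = 0)  # nlme fit
# Manual version via the fixed-effects model matrix (link scale):
mm  <- model.matrix(~ temp + doy, data = newdat)  # placeholder fixed-effects formula
eta <- drop(mm %*% fixef(lmer1))                  # linear predictor
newdat$pred_manual <- exp(eta)                    # back-transform if the link is log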
