Using predict in metafor when each author has multiple rows in the data

I'm running a meta-analysis where I'm interested in the effect of X on the effect of age on habitat use (raw mean values and variances) using the metafor package.
An example of one of my models is:
mod6 <- rma.mv(
  yi = Used_value,
  V = Used_variance,
  slab = Citation,
  mods = ~ Age + poly(Slope, degree = 2),
  random = ~ 1 | Region,
  data = vel.focal,
  method = "ML"
)
My justification for not using Citation as a random effect is that using only Region accounts for more of the heterogeneity than when random = list( ~ 1 | Citation/ID, ~ 1 | Region) or when Citation/ID is used by itself.
What I need for output is the prediction for each age by region, but predict() on the model (and the associated forest plot) returns a prediction for each row, as it assumes each row in the data is a unique study. In my case it is not, since my input values are separated by age and season.
predict(mod6)
pred se ci.lb ci.ub pi.lb pi.ub
Riehle and Griffith 1993.1 9.3437 2.3588 4.7205 13.9668 0.2362 18.4511
Riehle and Griffith 1993.2 9.3437 2.3588 4.7205 13.9668 0.2362 18.4511
Riehle and Griffith 1993.3 9.3437 2.3588 4.7205 13.9668 0.2362 18.4511
Spina 2000.1 8.7706 2.7386 3.4030 14.1382 -0.7364 18.2776
Spina 2000.2 8.5407 2.7339 3.1824 13.8991 -0.9611 18.0426
Spina 2000.3 8.5584 2.7406 3.1868 13.9299 -0.9509 18.0676
Vondracek and Longanecker 1993.1 12.6116 2.5138 7.6847 17.5385 3.3462 21.8769
Vondracek and Longanecker 1993.2 12.6116 2.5138 7.6847 17.5385 3.3462 21.8769
Vondracek and Longanecker 1993.3 12.3817 2.5327 7.4176 17.3458 3.0965 21.6669
Vondracek and Longanecker 1993.4 12.3817 2.5327 7.4176 17.3458 3.0965 21.6669
Does anybody know a way to modify the arguments inside predict() to control how the predictions are output, or to tell it that there are multiple rows per slab?

You need to use the newmods argument to specify the values for Age for which you want predicted values. You will have to plug in something for the linear and quadratic terms for the Slope variable as well (e.g., holding Slope constant at its mean and hence the quadratic term will just be the mean squared). Region is not a fixed effect, so it is not relevant if you want to compute predicted values based on the fixed effects. If you want to compute BLUPs for those random effects, you can do so with ranef(). One can then combine the predictions based on the fixed effects with the BLUPs. That would be the general idea, but implementing this will require a bit of programming.
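A minimal sketch of that idea, using the object names from the question. The ages below are placeholders, and it assumes the Slope terms enter as raw polynomials (poly(Slope, degree = 2, raw = TRUE)); with orthogonal polynomials the newmods columns would have to match the transformed basis instead:
# fixed-effect predictions at chosen ages, holding Slope at its mean
ages <- c(1, 2, 3)                                 # placeholder ages
slope_mean <- mean(vel.focal$Slope, na.rm = TRUE)
newmods <- cbind(ages, slope_mean, slope_mean^2)   # columns must match the order of the model matrix
pred_fixed <- predict(mod6, newmods = newmods)
# BLUPs for the Region random intercepts (if ranef() returns a single data
# frame rather than a named list, drop the $Region; the BLUP column is
# typically named intrcpt -- check colnames())
blups <- ranef(mod6)$Region
# region-specific predictions: fixed-effect prediction plus each region's BLUP
region_pred <- outer(pred_fixed$pred, blups$intrcpt, "+")
dimnames(region_pred) <- list(paste("Age", ages), rownames(blups))
region_pred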

Related

Unmarked colext function: detection probability = 1

I'm building a single-species dynamic occupancy model of pika den occupancy across 4 years with the R package "unmarked", using an unmarkedMultFrame and the colext() function. However, I want to specify that detection probability (p) = 1 (perfect detection). I want to do this because the detection probability for known dens is near-perfect (many other studies make this assumption too). In theory this is just a simpler model and kind of defeats the purpose of an occupancy model, but I'm using it because I need to estimate colonization and extinction probabilities.
Another specification is that all dens we are monitoring were initially occupied the first year we have data for them (so occupancy = 1 for all dens in the first year, and I am monitoring extinction and re-colonization rates after the first year).
Does anyone know how to specify in Unmarked that p = 1 when using the colext function? This function estimates detection probability, but I'm not sure what it is basing it off of, so I don't know how to either eliminate it from the function entirely or force it to be 1. Here is an example of my data and code:
dets1 <- as.matrix(dets1)  # detections (179 dens, each sampled once per year for 4 years; lots of NAs)
year <- factor(rep(c(2018, 2019, 2020, 2021), 179))  # the 4 years we surveyed
UMFdets <- unmarkedMultFrame(y = dets1, numPrimary = 4)
m4 <- colext(psiformula = ~ 1,          # first-year occupancy
             gammaformula = ~ year,     # colonization
             epsilonformula = ~ year,   # extinction
             pformula = ~ 1,            # detection
             data = UMFdets)
*simply removing "pformula" doesn't work.
Any ideas or knowledge about this would be much appreciated! Thank you!

I am working on an ordered logit model. I tried to solve for the estimates and proportional odds using R. Is it correct?

Question: Have a look at data set Two.csv. It contains a potentially dependent binary variable Y, and two potentially independent variables {X1, X2} for each unit of measurement.
(a) Read data set Two.csv into R and have a look at the format of the dependent variable. Discuss three models which might be appropriate in this data situation. Discuss which aspects speak in favor of each model, and which aspects against.
(b) Suppose variable Y measures financial ratings A: y = 1, B: y = 2, and C: y = 3, that is, the creditworthiness A: high, B: intermediate, C: low for unit of measurement firm i. Model Y by means of an ordered logit model as a function of {X1, X2} and estimate your model by means of a built-in command.
(c) Explain the proportional odds assumption and test whether the assumption is critical in the context of the data set at hand.
## a) Read data set Two.csv into R and have a look at the format of the dependent variable.
O <- read.table("C:/Users/DELL/Downloads/ExamQEIII2021/Two.csv", header = TRUE, sep = ";")
str(O)
dim(O)
View(O)
## b)
library(oglmx)
ologit <- oglmx(y ~ x1 + x2, data = O, link = "logit", constantMEAN = FALSE,
                constantSD = FALSE, delta = 0, threshparam = NULL)
results.ologis <- ologit.reg(y ~ x1 + x2, data = O)
summary(results.ologis)
## x1  1.46251
## x2 -0.45391
margins.oglmx(results.ologis, ascontinuous = FALSE)  # built-in command for the average marginal effects (logit)
## c) Explain the proportional odds assumption and test whether the assumption is critical
##    in the context of the data set at hand.
# ordinal logit WITH proportional odds (PO)
library(VGAM)
a <- vglm(y ~ x1 + x2, family = cumulative(parallel = TRUE), data = O)
summary(a)
# ordinal logit WITHOUT proportional odds (a assumes PO, c does not)
c <- vglm(y ~ x1 + x2, family = cumulative(parallel = FALSE), data = O)
summary(c)
pchisq(deviance(a) - deviance(c), df.residual(a) - df.residual(c), lower.tail = FALSE)
## 0.4936413 -> no significant difference in the variance left unexplained; we cannot
## confirm that the PO assumption is critical.
# Or, equivalently, via a likelihood-ratio test:
# smaller (restricted) model
LLa <- logLik(a)
# larger (unrestricted) model
LLc <- logLik(c)
# the residual degrees of freedom of each model give the df for the test
df.residual(c)
df.residual(a)
LL <- 2 * (LLc - LLa)   # LR statistic: 2*LLc - 2*LLa
1 - pchisq(LL, df.residual(a) - df.residual(c))
## 0.4936413 (same p-value as above)
## Conclusion: the likelihoods do not differ significantly, so relaxing the PO assumption does not improve the fit.
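As a cross-check, the same comparison can be run in a single call with VGAM's lrtest() (assuming it is available for vglm objects in your VGAM version); it should reproduce the p-value computed manually above:
# likelihood-ratio test of the non-PO model (c) against the PO model (a)
lrtest(c, a)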

Grouping Variables in Multilevel Linear Models

I am trying to learn hierarchical models in R and I have generated some sample data for myself. I am having trouble with the correct syntax for coding a multilevel regression problem.
I generated some data for salaries in a business school. I made the salaries depend linearly on the number of years of employment and the total number of publications by the faculty member. The faculty are in various departments, and I made the base salary (intercept) and the yearly hike (slope) different for each department. This way, the intercept (base salary) and the slope w.r.t. experience in years depend on the nested level (department), while the slope w.r.t. the other explanatory variable (publications) does not depend on the nested level. What would be the correct syntax to model this in R?
Here's my data:
library(lme4)
Data <- data.frame(Sl_No = 1:40,
                   Dept = as.factor(sample(c("Mark", "IT", "Fin"), 40, replace = TRUE)),
                   Years = round(runif(40, 1, 10)))
pubs <- round(Data$Years * runif(40, 1, 3))
Data$Pubs <- pubs
lookup_table <- data.frame(Dept = c("Mark", "IT", "Fin", "Strat", "Ops"),
                           base = c(100000, 140000, 150000, 150000, 120000),
                           slope = c(6000, 5000, 3000, 2000, 4000))
Data <- merge(Data, lookup_table, by = "Dept")
Data$Salary <- Data$base + Data$slope * Data$Years + Data$Pubs * 10000 + rnorm(nrow(Data)) * 10000
Data$base <- NULL
Data$slope <- NULL
I have tried the following:
1)
multilevel_model<-lmer(Salary~1|Dept+Pubs+Years|Dept, data = Data)
Error in model.matrix.default(eval(substitute(~foo, list(foo = x[[2]]))), :
model frame and formula mismatch in model.matrix()
2)
multilevel_model<-lmer(`Salary`~ Dept + `Pubs`+`Years`|Dept , data = Data)
boundary (singular) fit: see ?isSingular
I want to see the estimates of the salary intercept and yearly hike by Dept and the estimate of the effect of publication as a standalone (pooled). Right now I am not getting the code to work at all.
I know the base salary and the yearly hike by dept and the effect of a publication (since I generated it).
Dept base Slope
Fin 150000 3000
Mark 100000 6000
Ops 120000 4000
IT 140000 5000
Strat 150000 2000
Every publication increases the salary by 10,000.
ANSWER:
Thanks to @Ben's answer here, I think the correct model is
multilevel_model<-lmer(Salary~(1|Dept)+ Pubs +(0+Years|Dept), data = Data)
This gives me the following fixed effects by running
summary(multilevel_model)
Fixed effects:
Estimate Std. Error t value
(Intercept) 131667.4 10461.0 12.59
Pubs 10235.0 550.8 18.58
Correlation of Fixed Effects:
Pubs -0.081
The Department level coefficients are as follows:
coef(multilevel_model)
$Dept
Years (Intercept) Pubs
Fin 3072.5133 148757.6 10235.02
IT 5156.6774 136710.7 10235.02
Mark 5435.8301 102858.3 10235.02
Ops 3433.1433 118287.1 10235.02
Strat 963.9366 151723.1 10235.02
These are pretty good estimates of the original values. Now I need to learn to assess "how good" they are. :)
(1)
multilevel_model<-lmer(`Total Salary`~ 1|Dept +
`Publications`+`Years of Exp`|Dept , data = sample_data)
I can't immediately diagnose why this gives a syntax error, but parentheses are generally recommended around random-effect terms, because the | operator has lower precedence than + in formulas and the terms otherwise get grouped in unintended ways. Thus the right-hand-side (RHS) formula
~ (1|Dept) + (`Publications`+`Years of Exp`|Dept)
might work, except that it would be problematic because both terms contain the same intercept term: if you wanted to do this you'd probably need
~ (1|Dept) + (0+`Publications`+`Years of Exp`|Dept)
(2)
~ Dept + `Publications`+`Years of Exp`|Dept
It doesn't really make any sense to put the same variable (Dept) on both the left- and right-hand sides of the bar.
You should probably use
~ pubs + years_exp + (1 + years_exp|Dept)
Since in principle the effect of publication could vary across departments, the maximal model would be
~ pubs + years_exp + (1 + pubs + years_exp|Dept)
It rarely makes sense to include a random effect without its corresponding fixed effect.
Note that you may get singular fits even if you have the right model; see the ?isSingular man page.
If the observations listed above represent your whole data set, it's very likely too small to fit the maximal model successfully. A rule of thumb is that you need 10-20 observations per parameter estimated, and the maximal model has (intercept + 2 fixed-effect parameters + (3*4)/2 = 6 random-effect parameters) = 9 parameters. (Since the data are simulated, you can easily simulate a bigger data set ...)
I'd recommend renaming variables in your data frame so you don't have to fuss with backtick-protecting variable names with spaces in them ...
The GLMM FAQ has more on model specification
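Translating that recommendation to the simulated Data above (a sketch only; with just three sampled departments, the random-slope models may well give singular fits, as noted):
library(lme4)
# intercept and the Years slope vary by department; Pubs gets a single pooled slope
m_suggested <- lmer(Salary ~ Pubs + Years + (1 + Years | Dept), data = Data)
summary(m_suggested)
coef(m_suggested)$Dept  # department-level intercepts and Years slopes
# the maximal model, letting the Pubs effect vary by department as well:
# m_max <- lmer(Salary ~ Pubs + Years + (1 + Pubs + Years | Dept), data = Data)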

Adjusting range of predicted values in ggeffects

I am using ggpredict to plot the marginal effects of temperature (a continuous variable) from a zero-inflated GLMM:
pr1 = ggpredict(mod, "temp", type = "re.zi")
The function is working properly, but only returns predicted values for 7 random temperatures. Does anyone know how to increase the number of x values, to 25 for example?
Thanks,
Andrew
According to the section Marginal Effects at Specific Values in the ggpredict documentation (https://rdrr.io/github/strengejacke/ggeffects/man/ggpredict.html), the syntax for ggpredict looks something like this:
pr1 = ggpredict(mod, terms = "temp", type = "re.zi")
You can take a random sample of any size by inserting sample=n in square brackets after the name of the temp variable (making sure there is a space between temp and the sample option), e.g. terms = "temp [sample=25]", which will sample 25 values at random from all possible values of the variable temp:
pr1 = ggpredict(mod, terms = "temp [sample=25]", type = "re.zi")
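The result can then be inspected or plotted directly, since ggeffects provides a plot() method for the returned object:
pr1        # predicted values at the 25 sampled temperatures
plot(pr1)  # marginal-effects plot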

Lambda Issue, or cross validation

I am doing double cross-validation with LASSO from the glmnet package. However, when I plot the results I get lambda values from 0 to 150000, which is unrealistic in my case. I am not sure what I am doing wrong; can someone point me in the right direction? Thanks in advance!
calcium = read.csv("calciumgood.csv", header=TRUE)
dim(calcium)
n = dim(calcium)[1]
calcium = na.omit(calcium)
names(calcium)
library(glmnet) # use LASSO model from package glmnet
lambdalist = exp((-1200:1200)/100) # defines models to consider
fulldata.in = calcium
x.in = model.matrix(CAMMOL~. - CAMLEVEL - AGE,data=fulldata.in)
y.in = fulldata.in[,2]
k.in = 10
n.in = dim(fulldata.in)[1]
groups.in = c(rep(1:k.in,floor(n.in/k.in)),1:(n.in%%k.in))
set.seed(8)
cvgroups.in = sample(groups.in,n.in) #orders randomly, with seed (8)
#LASSO cross-validation
cvLASSOglm.in = cv.glmnet(x.in, y.in, lambda=lambdalist, alpha = 1, nfolds=k.in, foldid=cvgroups.in)
plot(cvLASSOglm.in$lambda,cvLASSOglm.in$cvm,type="l",lwd=2,col="red",xlab="lambda",ylab="CV(10)")
whichlowestcvLASSO.in = order(cvLASSOglm.in$cvm)[1]; min(cvLASSOglm.in$cvm)
bestlambdaLASSO = (cvLASSOglm.in$lambda)[whichlowestcvLASSO.in]; bestlambdaLASSO
abline(v=bestlambdaLASSO)
bestlambdaLASSO # this is the lambda for the best LASSO model
LASSOfit.in = glmnet(x.in, y.in, alpha = 1,lambda=lambdalist) # fit the model across possible lambda
LASSObestcoef = coef(LASSOfit.in, s = bestlambdaLASSO); LASSObestcoef # coefficients for the best model fit
I found the dataset you are referring to:
Calcium, inorganic phosphorus and alkaline phosphatase levels in elderly patients.
Basically the data are "dirty", and that is a possible reason why the algorithm does not converge properly. E.g. there are 771-year-old patients, and besides 1 and 2 for male and female there is a 22 in the sex encoding, etc.
In your case you only removed NAs.
You also need to check the column types that data.frame imported. E.g. instead of factors they could be imported as integers (SEX, Lab and Age group), which will affect the model.
I think you need to:
1) cleanse the data;
2) if that does not work, share the *.csv file.
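A minimal cleaning sketch along those lines, assuming columns named AGE and SEX with a 1/2 coding (the names and cutoffs are guesses; adjust them to the actual file):
calcium <- read.csv("calciumgood.csv", header = TRUE)
str(calcium)  # check how each column was imported (integer vs. factor, etc.)
# drop rows with implausible ages or sex codes other than 1/2,
# then convert the categorical code to a factor (column names are assumed)
calcium <- subset(calcium, AGE <= 110 & SEX %in% c(1, 2))
calcium$SEX <- factor(calcium$SEX, levels = c(1, 2), labels = c("male", "female"))
calcium <- na.omit(calcium)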

Resources