Confidence intervals for proportions from svypredmeans() in R

I am trying to obtain 95% CIs for proportions that are actually predictive marginal means, as computed with the survey package for R. I'm including a reproducible example that makes no sense content-wise, but hopefully illustrates my purpose well:
library(survey)
library(magrittr) # needed for the %>% pipe used below
data(api)
dstrat <- svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)
# 1.- marginal means for groups according to a variable
svyglm(I(sch.wide=="Yes") ~ awards+comp.imp, design=dstrat, family=quasipoisson()) %>% svypredmeans(., ~yr.rnd)
# 2.- the confidence intervals I'd like to compute by svyciprop(), in an obviously wrong way
svyby(I(sch.wide=="Yes") ~ awards+comp.imp, ~yr.rnd, design=dstrat, svyciprop, vartype="ci", method="xlogit")
What I can't figure out is how to pass the right arguments to svyciprop(), or whether this is even possible. The function svyciprop() takes a single formula, which does not seem compatible with the way predictive marginal means are computed, nor with the output of svypredmeans().
Thanks beforehand for any help!
EDIT Apologies, the code was right but contained a typo. However, there's a follow-up question: the estimates for the predictive marginal means in step 1 do not match the corresponding estimates in step 2. Can someone shed light on why these estimates differ?

Your code does not work for a very simple reason: R is case-sensitive. Change sch.wide=="yes" to sch.wide=="Yes" and it should work.
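As for the original goal of getting CIs for the predictive marginal means themselves: svypredmeans() returns a svystat-style object carrying design-based SEs, so confint() should give Wald-type 95% intervals directly (this is an assumption based on how other svystat objects behave; worth verifying). A minimal sketch reusing the objects from the question:
m <- svyglm(I(sch.wide == "Yes") ~ awards + comp.imp, design = dstrat, family = quasipoisson())
pm <- svypredmeans(m, ~yr.rnd)
confint(pm)  # normal-approximation CIs from the reported SEs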

Related

SE for logistic regression predictions

I have been tasked with calculating the SE for logistic regression point estimates (where all my predictor variables are factors). I typically use ggpredict to compute my predictions, which provides CIs. However, we are comparing our results to estimates from program MARK, and we find readers grasp our plots better with SEs than with 95% CIs.
Based on reading the package notes, it appears I can simply calculate (conf.high - predicted)/1.96. Am I correct? Or am I missing something, and that is not the correct way to calculate the SE for the predicted estimates? If I am wrong, any ideas on how I can do this, or do I need to just use CIs?
Thank you very much for your help.
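For what it's worth, the arithmetic described in the question can be sketched with made-up numbers (note: for a logistic model, ggpredict() builds the CI on the link scale and back-transforms it, so on the response scale the interval is not exactly symmetric and this only recovers an approximate SE):
predicted <- 0.62  # hypothetical point estimate
conf.high <- 0.71  # hypothetical upper 95% limit
se_approx <- (conf.high - predicted) / qnorm(0.975)  # qnorm(0.975) is approx. 1.96
se_approx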

R language, how to use bootstraps to generate maximum likelihood and AICc?

Sorry for a quite stupid question. I am doing multiple comparisons of morphological traits through correlations of bootstrapped data. I'm curious whether such multiple comparisons affect my level of inference, as well as about the effect of potential multicollinearity in my data. Perhaps a reasonable option would be to use my bootstraps to generate maximum likelihoods and then AICc values to compare all of my parameters, to see which come out as most important. The problem is that although the approach is more or less clear to me, I don't know how to implement it in R. Can anybody be so kind as to shed some light on this for me?
So far, here is an example (using R, but not my data):
library(boot)
data(iris)
head(iris)
# The function
pearson <- function(data, indices) {
  dt <- data[indices, ]
  c(
    cor(dt[, 1], dt[, 2], method = 'p'),
    median(dt[, 1]),
    median(dt[, 2])
  )
}
# One example: iris$Sepal.Length ~ iris$Sepal.Width
# I calculate the correlation coefficient (r) with 1000 replications
set.seed(12345)
dat <- iris[,c(1,2)]
dat <- na.omit(dat)
results <- boot(dat, statistic=pearson, R=1000)
# 95% CIs
boot.ci(results, type="bca")
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates
CALL :
boot.ci(boot.out = results, type = "bca")
Intervals :
Level BCa
95% (-0.2490, 0.0423 )
Calculations and Intervals on Original Scale
plot(results)
I have several more pairs of comparisons.
More of a Cross Validated question.
Multicollinearity shouldn't be a problem if you're just assessing the relationship between two variables (in your case correlation). Multicollinearity only becomes an issue when you fit a model, e.g. multiple regression, with several highly correlated predictors.
Multiple comparisons are always a problem, though, because they inflate your type-I error rate. The way to address that is to apply a multiple-comparison correction, e.g. Bonferroni-Holm or the less conservative FDR; see the sketch below. That can have its downsides, though, especially if you have many predictors and few observations: it may lower your power so much that you won't be able to find any effect, no matter how big it is.
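Base R's p.adjust() covers both corrections mentioned above; a minimal sketch with hypothetical p-values:
pvals <- c(0.001, 0.012, 0.034, 0.21, 0.48)  # made-up raw p-values from several tests
p.adjust(pvals, method = "holm")  # Bonferroni-Holm
p.adjust(pvals, method = "BH")    # Benjamini-Hochberg (FDR)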
In a high-dimensional setting like this, your best bet may be some sort of regularized regression method. With regularization, you put all predictors into the model at once, much as in multiple regression; the trick is that you constrain the model so that all of the regression slopes are pulled towards zero, and only the ones with big effects "survive". The machine-learning versions of regularized regression are called ridge, LASSO, and elastic net, and they can be fitted using the glmnet package. There are also Bayesian equivalents in so-called shrinkage priors, such as the horseshoe (see e.g. https://avehtari.github.io/modelselection/regularizedhorseshoe_slides.pdf). You can fit Bayesian regularized regression using the brms package.
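A minimal glmnet sketch (using iris purely so it runs; your morphometric data would take its place):
library(glmnet)
x <- as.matrix(iris[, 2:4])          # predictors must be a numeric matrix
y <- iris$Sepal.Length               # response
cvfit <- cv.glmnet(x, y, alpha = 1)  # alpha = 1 gives the LASSO; lambda chosen by cross-validation
coef(cvfit, s = "lambda.min")        # slopes not shrunk to zero are the "survivors"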

Plotting backtransformed data with LS means plot

I have used the lsmeans package in R to get the average estimate over all observations for my treatment factor (averaging across the levels of a block factor that was included as a systematic effect because it only had 3 levels). I used a sqrt transformation for my response variable.
Thus I have used the following commands in R.
First, defining the model (fitted here with lm(); mydata stands in for the unnamed data frame):
model <- lm(sqrt(response) ~ treatment + block, data = mydata)
Then applying lsmeans:
model_lsmeans <- lsmeans(model, ~ treatment)
Then plotting this:
plot(model_lsmeans, ylab = "treatment", xlab = "response (with 95% CI)")
This gives a very nice graph with estimates and 95% confidence intervals for the different treatments.
The problem is just that this graph is for the transformed response.
How do I get this same plot with the backtransformed response (so the squared response)?
I have tried to create a new data frame and extract the lsmean, lower.CL, and upper.CL:
a <- summary(model_lsmeans)
New_dataframe <- as.data.frame(a[c("treatment", "lsmean", "lower.CL", "upper.CL")])
And then square them:
New_dataframe$lsmean <- New_dataframe$lsmean^2
New_dataframe$lower.CL <- New_dataframe$lower.CL^2
New_dataframe$upper.CL <- New_dataframe$upper.CL^2
New_dataframe
This gives me the estimates and CI boundaries squared that I need.
The problem is that I cannot make the same kind of graph for these estimates and CIs as the one I made with lsmeans above.
How can I do this? The reason that I ask is that I want to have graphs that are all of a similar style for my article. Since I very much like this LSmeans plot, and it is very convenient for me to use on the non-transformed response variables, I would like to have all my graphs in this style.
Thank you very much for your help! Hope everything is clear!
Kind regards
Ditlev
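One way to reproduce a similar dot-and-interval plot from the squared columns you already built is ggplot2's geom_pointrange() (a sketch assuming ggplot2 >= 3.3, which supports horizontal point ranges; recent lsmeans/emmeans versions may also back-transform for you via summary(..., type = "response")):
library(ggplot2)
ggplot(New_dataframe, aes(x = lsmean, y = treatment)) +
  geom_pointrange(aes(xmin = lower.CL, xmax = upper.CL)) +
  labs(x = "response (with 95% CI)", y = "treatment")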

Meta-analysis in R with adjusted ORs

I would like to calculate a summary odds ratio value for two or more papers where the only information I have is the individual odds ratios with their 95% confidence intervals. Is this possible? I have been poking around in the meta package, and only figured out how to do it with crude counts.
Thanks so much!
It is quite simple.
You just need the natural logarithm of the odds ratio (logOR) and its standard error (and corresponding variance). These can be back-calculated from the 95% confidence interval under a normal approximation: SE = (log(upper) - log(lower)) / (2 * 1.96), and variance = SE^2. Finally, pool the logORs with their variances.
For instance, after you have built a data frame (eg called mydata) with logOR and variance for each study, you can easily proceed with a random effect meta-analysis with the metafor package in R as follows:
library(metafor)
res <- rma(logOR, variance, data = mydata, method = "DL")  # DerSimonian-Laird random effects
forest(res)
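For completeness, the back-calculation step might look like this (made-up numbers; the column names are just for illustration):
mydata <- data.frame(OR = c(1.52, 2.10), lower = c(1.10, 1.31), upper = c(2.10, 3.37))
mydata$logOR <- log(mydata$OR)
mydata$SE <- (log(mydata$upper) - log(mydata$lower)) / (2 * qnorm(0.975))
mydata$variance <- mydata$SE^2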
In the future, you may consider posting similar questions on Cross Validated.

How to get 95% CIs using ezANOVA()

This is a programming question for people who like to use the ez package in R. I am accustomed to using linear mixed-effects models with lmer(). Among the useful outputs of lmer(), I get a coefficient value for each of my experimental factors, and using pvals.fnc() I can easily get 95% confidence intervals (CIs) to report together with the model coefficients.
I have recently started using ezANOVA, and I would like to know: Is there a mainstream way to get the same output? That is, I'd like to get a value for the coefficient of an experimental factor and a CI to go along with it. Here is sample code to make this concrete:
library(languageR) #necessary to use pvals.fnc()
library(lme4) #necessary for lmer()
library(ez) #necessary for ezANOVA
data(ANT) #load sample data
If I were using lmer, I would estimate my model and then get 95% CIs for the coefficients:
model_lmer = lmer( formula = rt ~ cue*flank + (1|subnum), data = ANT)
pvals.fnc(model_lmer, withMCMC=T)$fixed
So, for example, I know that the estimate of the interaction between cue and flank (when cue has the level "center" and flank has the level "congruent") is -3.9511 and the 95% CI is [-12.997, 5.535].
Now say that I want to run an anova by-subjects and by-items using ezANOVA, and I want to get 95% CIs for the by-subject estimates. This is my model:
model.f1 = ezANOVA(data = ANT, dv = rt, wid = subnum, within = .(cue, flank), return_aov = T)
But in the output, I don't see the model estimates when I do:
model.f1$ANOVA
And I don't know how to calculate the 95% CIs corresponding to those estimates. I think I should be able to use ezBoot() but I tried and I'm not sure how to implement it.
Any suggestions? Thanks for your help!
This answer was provided by the author of the "ez" package in another forum. I'm copying it here in case someone else finds it useful:
"One somewhat hacky way to get CIs for effects is to use ezStats () to get the means
and FLSD, compute the difference between the means to get the effect,
and divide the FLSD by sqrt(2) to get the CI"
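A rough sketch of that recipe (assuming, as in recent versions of ez, that ezStats() returns a data frame with Mean and FLSD columns):
stats.f1 = ezStats(data = ANT, dv = rt, wid = subnum, within = .(cue, flank))
stats.f1$CI_half <- stats.f1$FLSD / sqrt(2)  # approximate 95% CI half-width per the author's suggestion
stats.f1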
