I'm working on a dataset with several different types of proteins as columns. It looks roughly like the simplified example below; the original dataset contains over 100 types of proteins. I wanted to see whether the concentration of a protein differs by treatment while taking the random effect (= id) into consideration. I managed to run multiple repeated-measures ANOVAs at once, but I would also like to do pairwise comparisons for all proteins based on the treatment. The first thing that came to my mind was the emmeans package, but I had trouble coding this.
#install packages
library(tidyverse)
library(emmeans)
#Create a data set
set.seed(1)
id <- rep(c("1","2","3","4","5","6"),3)
Treatment <- c(rep(c("A"), 6), rep(c("B"), 6),rep(c("C"), 6))
Protein1 <- c(rnorm(3, 1, 0.4), rnorm(3, 3, 0.5), rnorm(3, 6, 0.8), rnorm(3, 1.1, 0.4), rnorm(3, 0.8, 0.2), rnorm(3, 1, 0.6))
Protein2 <- c(rnorm(3, 1, 0.4), rnorm(3, 3, 0.5), rnorm(3, 6, 0.8), rnorm(3, 1.1, 0.4), rnorm(3, 0.8, 0.2), rnorm(3, 1, 0.6))
Protein3 <- c(rnorm(3, 1, 0.4), rnorm(3, 3, 0.5), rnorm(3, 6, 0.8), rnorm(3, 1.1, 0.4), rnorm(3, 0.8, 0.2), rnorm(3, 1, 0.6))
DF <- data.frame(id, Treatment, Protein1, Protein2, Protein3) %>%
mutate(id = factor(id),
Treatment = factor(Treatment, levels = c("A","B","C")))
#First, I tried to run multiple ANOVAs by using lapply
responseList <- names(DF)[c(3:5)]
modelList <- lapply(responseList, function(resp) {
mF <- formula(paste(resp, " ~ Treatment + Error(id/Treatment)"))
aov(mF, data = DF)
})
lapply(modelList, summary)
#Pairwise comparison using emmeans. This did not work
wt_emm <- emmeans(modelList, "Treatment")
> wt_emm <- emmeans(modelList, "Treatment")
Error in ref_grid(object, ...) : Can't handle an object of class “list”
Use help("models", package = "emmeans") for information on supported models.
So I tried a different approach
anova2 <- aov(cbind(Protein1,Protein2,Protein3)~ Treatment +Error(id/Treatment), data = DF)
summary(anova2)
#Pairwise comparison using emmeans.
#I got only one result for the whole dataset, instead of separate results for the different proteins.
wt_emm2 <- emmeans(anova2, "Treatment")
pairs(wt_emm2)
> pairs(wt_emm2)
contrast estimate SE df t.ratio p.value
A - B -1.704 1.05 10 -1.630 0.2782
A - C 0.865 1.05 10 0.827 0.6955
B - C 2.569 1.05 10 2.458 0.0793
I don't understand why R still gives me only one result even though I used cbind(Protein1, Protein2, Protein3) in the aov model. This is what I was hoping to get instead:
> Protein1
contrast
A - B
A - C
B - C
> Protein2
contrast
A - B
A - C
B - C
> Protein3
contrast
A - B
A - C
B - C
How do I code this or should I try a different package/function?
I don't have trouble running one protein at a time. However, since I have over 100 proteins to run, it would be really time-consuming to code them one by one.
Any suggestion is appreciated. Thank you!
Here
#Pairwise comparison using emmeans. This did not work
wt_emm <- emmeans(modelList, "Treatment")
you need to lapply over the list like you did with lapply(modelList, summary)
modelList <- lapply(responseList, function(resp) {
mF <- formula(paste(resp, " ~ Treatment + Error(id/Treatment)"))
aov(mF, data = DF)
})
But when you do this, there is an error:
lapply(modelList, function(x) pairs(emmeans(x, "Treatment")))
Note: re-fitting model with sum-to-zero contrasts
Error in terms(formula, "Error", data = data) : object 'mF' not found
attr(modelList[[1]], 'call')$formula
# mF
Note that mF was the name of the formula object, so it seems emmeans needs the original formula for some reason. You can add the formula to the call:
modelList <- lapply(responseList, function(resp) {
mF <- formula(paste(resp, " ~ Treatment + Error(id/Treatment)"))
av <- aov(mF, data = DF)
attr(av, 'call')$formula <- mF
av
})
lapply(modelList, function(x) pairs(emmeans(x, "Treatment")))
# [[1]]
# contrast estimate SE df t.ratio p.value
# A - B -1.89 1.26 10 -1.501 0.3311
# A - C 1.08 1.26 10 0.854 0.6795
# B - C 2.97 1.26 10 2.356 0.0934
#
# P value adjustment: tukey method for comparing a family of 3 estimates
#
# [[2]]
# contrast estimate SE df t.ratio p.value
# A - B -1.44 1.12 10 -1.282 0.4361
# A - C 1.29 1.12 10 1.148 0.5082
# B - C 2.73 1.12 10 2.430 0.0829
#
# P value adjustment: tukey method for comparing a family of 3 estimates
#
# [[3]]
# contrast estimate SE df t.ratio p.value
# A - B -1.58 1.15 10 -1.374 0.3897
# A - C 1.27 1.15 10 1.106 0.5321
# B - C 2.85 1.15 10 2.480 0.0765
#
# P value adjustment: tukey method for comparing a family of 3 estimates
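A small addition on my part (not in the original answer): if you name the list elements by protein, the printed results are labelled the way you were hoping for.
modelList <- setNames(modelList, responseList)
lapply(modelList, function(x) pairs(emmeans(x, "Treatment")))
# printed as $Protein1, $Protein2, $Protein3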
Alternatively, make a loop over the column names:
responseList <- names(DF)[c(3:5)]
for(n in responseList) {
anova2 <- aov(get(n) ~ Treatment +Error(id/Treatment), data = DF)
summary(anova2)
wt_emm2 <- emmeans(anova2, "Treatment")
print(pairs(wt_emm2))
}
This returns
Note: re-fitting model with sum-to-zero contrasts
Note: Use 'contrast(regrid(object), ...)' to obtain contrasts of back-transformed estimates
contrast estimate SE df t.ratio p.value
A - B -1.41 1.26 10 -1.122 0.5229
A - C 1.31 1.26 10 1.039 0.5705
B - C 2.72 1.26 10 2.161 0.1269
Note: contrasts are still on the get scale
P value adjustment: tukey method for comparing a family of 3 estimates
Note: re-fitting model with sum-to-zero contrasts
Note: Use 'contrast(regrid(object), ...)' to obtain contrasts of back-transformed estimates
contrast estimate SE df t.ratio p.value
A - B -2.16 1.37 10 -1.577 0.2991
A - C 1.19 1.37 10 0.867 0.6720
B - C 3.35 1.37 10 2.444 0.0810
Note: contrasts are still on the get scale
P value adjustment: tukey method for comparing a family of 3 estimates
Note: re-fitting model with sum-to-zero contrasts
Note: Use 'contrast(regrid(object), ...)' to obtain contrasts of back-transformed estimates
contrast estimate SE df t.ratio p.value
A - B -1.87 1.19 10 -1.578 0.2988
A - C 1.28 1.19 10 1.077 0.5485
B - C 3.15 1.19 10 2.655 0.0575
Note: contrasts are still on the get scale
P value adjustment: tukey method for comparing a family of 3 estimates
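As an aside (my suggestion, not part of the original answer): the "contrasts are still on the get scale" notes come from using get(n) in the formula. Building the formula from the actual column name avoids them; a sketch:
for(n in responseList) {
  f <- reformulate("Treatment + Error(id/Treatment)", response = n)
  anova2 <- aov(f, data = DF)
  attr(anova2, 'call')$formula <- f  # as above, so emmeans can re-fit the model
  print(pairs(emmeans(anova2, "Treatment")))
}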
If you want to have the output as a list:
responseList <- names(DF)[c(3:5)]
output <- list()
for(n in responseList) {
anova2 <- aov(get(n) ~ Treatment +Error(id/Treatment), data = DF)
summary(anova2)
wt_emm2 <- emmeans(anova2, "Treatment")
output[[n]] <- pairs(wt_emm2)
}
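With 100+ proteins you will probably want one flat table rather than a long list; a hedged sketch, assuming the output list from above:
# Collapse the list of emmeans contrasts into a single data frame
all_pairs <- dplyr::bind_rows(lapply(output, as.data.frame), .id = "Protein")
head(all_pairs)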
I'm trying to use emmeans to test "contrasts of contrasts" with custom orthogonal contrasts applied to a zero-inflated negative binomial model. The study design has 4 groups (study_group: grp1, grp2, grp3, grp4), each of which is assessed at 3 timepoints (time: Time1, Time2, Time3).
With the code below, I am able to get very close to, but not exactly, what I want. The contrasts that emerge are expressed in terms of ratios such as grp1/grp2, grp1/grp3,..., grp3/grp4 ("lower over higher"; see output following code).
What would be immensely helpful to me is a way to flip these ratios to grp2/grp1, grp3/grp1,..., grp4/grp3 ("higher over lower"). I've tried sticking reverse=TRUE in various spots, but to no effect.
Short of re-leveling the study_group factor, is there any way to do this in emmeans?
Thanks!
library(glmmTMB)
library(emmeans)
set.seed(3456)
# Building grid for study design: 4 groups of 3 sites,
# each with 20 participants observed 3 times
site <- rep(1:12, each=60)
pid <- 1000*site+10*(rep(rep(1:20,each=3),12))
study_group <- c(rep("grp1",180), rep("grp2",180), rep("grp3",180), rep("grp4",180))
grp_num <- c(rep(0,180), rep(1,180), rep(2,180), rep(3,180))
time <- c(rep(c("Time1", "Time2", "Time3"),240))
time_num <- c(rep(c(0:2),240))
# Site-level random effects (intercepts)
site_eff_count = rep(rnorm(12, mean = 0, sd = 0.5), each = 60)
site_eff_zeros = rep(rnorm(12, mean = 0, sd = 0.5), each = 60)
# Simulating a neg binomial outcome
y_count <- rnbinom(n = 720, mu=exp(3.25 + grp_num*0.15 + time_num*-0.20 + grp_num*time_num*0.15 + site_eff_count), size=0.8)
# Simulating some extra zeros
log_odds = (-1.75 + grp_num*0.2 + time_num*-0.40 + grp_num*time_num*0.50 + site_eff_zeros)
prob_1 = plogis(log_odds)
prob_0 = 1 - prob_1
y_zeros <- rbinom(n = 720, size = 1, prob = prob_0)
# Building dataset with ZINB-ish outcome
data_ZINB <- data.frame(site, pid, study_group, time, y_count, y_zeros)
data_ZINB$y_obs <- ifelse(y_zeros==1, y_count, 0)
# Estimating ZINB GLMM in glmmTMB
mod_ZINB <- glmmTMB(y_obs ~ 1
+ study_group + time + study_group*time
+ (1|site),
family=nbinom2,
zi = ~ .,
data=data_ZINB)
#summary(mod_ZINB)
# Getting model-estimated "cell" means for conditional (non-zero) sub-model
# in response (not linear predictor) scale
count_means <- emmeans(mod_ZINB,
pairwise ~ time | study_group,
component="cond",
type="response",
adjust="none")
# count_means
# Defining custom contrast function for orthogonal time contrasts
# contr1 = Time 2 - Time 1
# contr2 = Time 3 - Times 1 and 2
compare_arms.emmc <- function(levels) {
k <- length(levels)
contr1 <- c(-1,1,0)
contr2 <- c(-1,-1,2)
coef <- data.frame()
coef <- as.data.frame(lapply(seq_len(k - 1), function(i) {
if(i==1) contr1 else contr2
}))
names(coef) <- c("T1vT2", "T1T2vT3")
attr(coef, "adjust") = "none"
coef
}
# Estimating pairwise between-group "contrasts of contrasts"
# i.e., testing if time contrasts differ across groups
compare_arms_contrast <- contrast(count_means[[1]],
interaction = c("compare_arms", "pairwise"),
by = NULL)
compare_arms_contrast
Applying the emmeans::contrast function as above yields this:
time_compare_arms study_group_pairwise ratio SE df null t.ratio p.value
T1vT2 grp1 / grp2 1.091 0.368 693 1 0.259 0.7957
T1T2vT3 grp1 / grp2 0.623 0.371 693 1 -0.794 0.4276
T1vT2 grp1 / grp3 1.190 0.399 693 1 0.520 0.6034
T1T2vT3 grp1 / grp3 0.384 0.241 693 1 -1.523 0.1283
T1vT2 grp1 / grp4 0.664 0.245 693 1 -1.108 0.2681
.
.
.
T1T2vT3 grp3 / grp4 0.676 0.556 693 1 -0.475 0.6346
Tests are performed on the log scale
The answer, provided by Russ Lenth in the comments and in the emmeans documentation for the contrast function, is to replace pairwise with revpairwise in the contrast function call.
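For completeness, a minimal sketch of that fix, using the objects defined above:
compare_arms_contrast_rev <- contrast(count_means[[1]],
                                      interaction = c("compare_arms", "revpairwise"),
                                      by = NULL)
compare_arms_contrast_rev  # ratios are now grp2/grp1, grp3/grp1, ..., grp4/grp3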
I would like to use nls to fit a global parameter and group-specific parameters. The closest I have found to a minimum reproducible example is below (found here: https://stat.ethz.ch/pipermail/r-help/2015-September/432020.html)
#Generate some data
d <- transform(data.frame(x=seq(0,1,len=17),
group=rep(c("A","B","B","C"),len=17)), y =
round(1/(1.4+x^ifelse(group=="A", 2.3, ifelse(group=="B",3.1, 3.5))),2))
#Fit to model using nls
nls(y~1/(b+x^p[group]), data=d, start=list(b=1, p=rep(3,length(levels(d$group)))))
This gives me an error:
Error in numericDeriv(form[[3L]], names(ind), env, central = nDcentral) :
Missing value or an infinity produced when evaluating the model
I have not been able to figure out if the error is coming from bad guesses for the starting values, or the way this code is dealing with group-specific parameters. It seems the line with p=rep(3,length(levels(d$group))) is for generating c(3,3,3), but switching this part of the code does not remove the problem (same error obtained as above):
#Fit to model using nls
nls(y~1/(b+x^p[group]), data=d, start=list(b=1, p=c(3, 3, 3)))
Switching to nlsLM gives a different error, which leads me to believe I am having an issue with the group-specific parameters:
#Generate some data
library(minpack.lm)
d <- transform(data.frame(x=seq(0,1,len=17),
group=rep(c("A","B","B","C"),len=17)), y =
round(1/(1.4+x^ifelse(group=="A", 2.3, ifelse(group=="B",3.1, 3.5))),2))
#Fit to model using nlsLM
nlsLM(y~1/(b+x^p[group]), data=d, start=list(b=1, p=c(3,3,3)))
Error:
Error in dimnames(x) <- dn :
length of 'dimnames' [2] not equal to array extent
Any ideas?
I think you can do this much more easily with nlme::gnls:
fit2 <- nlme::gnls(y~1/(b+x^p),
params = list(p~group-1, b~1),
data=d,
start = list(b=1, p = rep(3,3)))
Results:
Generalized nonlinear least squares fit
Model: y ~ 1/(b + x^p)
Data: d
Log-likelihood: 62.05887
Coefficients:
p.groupA p.groupB p.groupC b
2.262383 2.895903 3.475324 1.407561
Degrees of freedom: 17 total; 13 residual
Residual standard error: 0.007188101
The params argument allows you to specify fixed-effect submodels for each nonlinear parameter. Using p ~ group - 1 parameterizes the model with a separate estimate for each group, rather than fitting a baseline (intercept) value for the first group and differences from that baseline for the other groups. (In R's formula language, -1 or +0 means "fit the model without an intercept/set the intercept to 0", which in this case corresponds to fitting all three groups separately.)
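To make the contrast concrete, here is a hedged sketch of the default (treatment-contrast) parameterization, which reports a baseline for group A plus differences for B and C instead of one estimate per group:
fit2b <- nlme::gnls(y ~ 1/(b + x^p),
                    params = list(p ~ group, b ~ 1),
                    data = d,
                    start = list(b = 1, p = c(3, 0, 0)))
coef(fit2b)  # p.(Intercept), p.groupB, p.groupC, b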
I'm quite surprised that gnls and nls don't give identical results (although both give reasonable results); would like to dig in further ...
Parameter estimates (code below):
term nls gnls
1 b 1.41 1.40
2 pA 2.28 2.28
3 pB 3.19 3.14
4 pC 3.60 3.51
par(las = 1, bty = "l")
plot(y~x, data = d, col = d$group, pch = 16)
xvec <- seq(0, 1, length = 21)
f <- function(x) factor(x, levels = c("A","B","C"))
## fit1 is nls() fit
ll <- function(g, c = 1) {
lines(xvec, predict(fit1, newdata = data.frame(group=f(g), x = xvec)), col = c)
}
Map(ll, LETTERS[1:3], 1:3)
d2 <- expand.grid(x = xvec, group = f(c("A","B","C")))
pp <- predict(fit2, newdata = d2)
ll2 <- function(g, c = 1) {
lines(xvec, pp[d2$group == g], lty = 2, col = c)
}
Map(ll2, LETTERS[1:3], 1:3)
legend("bottomleft", lty = 1:2, col = 1, legend = c("nls", "gnls"))
library(tidyverse)
library(broom)
library(broom.mixed)
(purrr::map_dfr(list(nls=fit1, gnls=fit2), tidy, .id = "pkg")
%>% select(pkg, term, estimate)
%>% group_by(pkg)
## force common parameter names
%>% mutate(across(term, ~ c("b", paste0("p", LETTERS[1:3]))))
%>% pivot_wider(names_from = pkg, values_from = estimate)
)
I was able to get this to work by switching the class of group from character to factor. Note the addition of factor() when generating the dataset.
> d <- transform(data.frame(
+ x=seq(0,1,len=17),
+ group=rep(factor(c("A","B","B","C")),len=17)),
+ y=round(1/(1.4+x^ifelse(group=="A", 2.3, ifelse(group=="B",3.1, 3.5))),2)
+ )
> str(d)
'data.frame': 17 obs. of 3 variables:
$ x : num 0 0.0625 0.125 0.1875 0.25 ...
$ group: Factor w/ 3 levels "A","B","C": 1 2 2 3 1 2 2 3 1 2 ...
$ y : num 0.71 0.71 0.71 0.71 0.69 0.7 0.69 0.69 0.62 0.64 ...
> nls(y~1/(b+x^p[group]), data=d, start=list(b=1, p=c(3,3,3)))
Nonlinear regression model
model: y ~ 1/(b + x^p[group])
data: d
b p1 p2 p3
1.406 2.276 3.186 3.601
residual sum-of-squares: 9.537e-05
Number of iterations to convergence: 5
Achieved convergence tolerance: 4.536e-06
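The p1/p2/p3 labels follow the factor-level order (p1 = A, p2 = B, p3 = C); a small sketch of my own to attach more readable names:
fit_nls <- nls(y ~ 1/(b + x^p[group]), data = d, start = list(b = 1, p = c(3, 3, 3)))
setNames(coef(fit_nls), c("b", paste0("p", levels(d$group))))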
I would like to run a regression based on a correlation matrix rather than raw data. I have looked at this post, but can't make sense of it. How do I do this in R?
Here is some code:
#Correlation matrix.
MyMatrix <- matrix(
c(1.0, 0.1, 0.5, 0.4,
0.1, 1.0, 0.9, 0.3,
0.5, 0.9, 1.0, 0.3,
0.4, 0.3, 0.3, 1.0),
nrow=4,
ncol=4)
df <- as.data.frame(MyMatrix)
colnames(df)[colnames(df)=="V1"] <- "a"
colnames(df)[colnames(df)=="V2"] <- "b"
colnames(df)[colnames(df)=="V3"] <- "c"
colnames(df)[colnames(df)=="V4"] <- "d"
#Assume means and standard deviations as follows:
MEAN.a <- 4.00
MEAN.b <- 3.90
MEAN.c <- 4.10
MEAN.d <- 5.00
SD.a <- 1.01
SD.b <- 0.95
SD.c <- 0.99
SD.d <- 2.20
#Run model [UNSURE ABOUT THIS PART]
library(lavaan)
m1 <- 'd ~ a + b + c'
fit <- sem(m1, ????)
summary(fit, standardize=TRUE)
This should do it. First you can convert your correlation matrix to a covariance matrix
MyMatrix <- matrix(
c(1.0, 0.1, 0.5, 0.4,
0.1, 1.0, 0.9, 0.3,
0.5, 0.9, 1.0, 0.3,
0.4, 0.3, 0.3, 1.0),
nrow=4,
ncol=4)
rownames(MyMatrix) <- colnames(MyMatrix) <- c("a", "b","c","d")
#Assume means and standard deviations as follows:
MEAN.a <- 4.00
MEAN.b <- 3.90
MEAN.c <- 4.10
MEAN.d <- 5.00
SD.a <- 1.01
SD.b <- 0.95
SD.c <- 0.99
SD.d <- 2.20
s <- c(SD.a, SD.b, SD.c, SD.d)
m <- c(MEAN.a, MEAN.b, MEAN.c, MEAN.d)
cov.mat <- diag(s) %*% MyMatrix %*% diag(s)
rownames(cov.mat) <- colnames(cov.mat) <- rownames(MyMatrix)
names(m) <- rownames(MyMatrix)
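As a side note, lavaan ships a cor2cov() helper that should do the same scaling; a quick sketch, worth double-checking against your lavaan version:
cov.mat2 <- lavaan::cor2cov(MyMatrix, sds = s)
all.equal(cov.mat, cov.mat2, check.attributes = FALSE)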
Then, you can use lavaan to estimate the model along the lines of the post you mentioned in your question. Note, you need to supply a number of observations to get the sample estimate. I used 100 for the example, but you may want to change it if that doesn't make sense.
library(lavaan)
m1 <- 'd ~ a + b + c'
fit <- sem(m1,
           sample.cov = cov.mat,
           sample.nobs = 100,
           sample.mean = m,
           meanstructure = TRUE)
summary(fit, standardize=TRUE)
# lavaan 0.6-6 ended normally after 44 iterations
#
# Estimator ML
# Optimization method NLMINB
# Number of free parameters 5
#
# Number of observations 100
#
# Model Test User Model:
#
# Test statistic 0.000
# Degrees of freedom 0
#
# Parameter Estimates:
#
# Standard errors Standard
# Information Expected
# Information saturated (h1) model Structured
#
# Regressions:
# Estimate Std.Err z-value P(>|z|) Std.lv Std.all
# d ~
# a 6.317 0.095 66.531 0.000 6.317 2.900
# b 12.737 0.201 63.509 0.000 12.737 5.500
# c -13.556 0.221 -61.307 0.000 -13.556 -6.100
#
# Intercepts:
# Estimate Std.Err z-value P(>|z|) Std.lv Std.all
# .d -14.363 0.282 -50.850 0.000 -14.363 -6.562
#
# Variances:
# Estimate Std.Err z-value P(>|z|) Std.lv Std.all
# .d 0.096 0.014 7.071 0.000 0.096 0.020
#
#
I am using the lrm function from the rms package to get:
> model_1 <- lrm(dependent_variable ~ var1+ var2 + var3, data = merged_dataset, na.action="na.delete")
> print(model_1)
Logistic Regression Model
lrm(dependent_variable ~ var1+ var2 + var3, data = merged_dataset, na.action="na.delete")
Model Likelihood Discrimination Rank Discrim.
Ratio Test Indexes Indexes
Obs 6046 LR chi2 21.97 R2 0.005 C 0.531
0 3151 d.f. 11 g 0.138 Dxy 0.062
1 2895 Pr(> chi2) 0.0246 gr 1.148 gamma 0.062
max |deriv| 1e-13 gp 0.034 tau-a 0.031
Brier 0.249
Coef S.E. Wald Z Pr(>|Z|)
Intercept -0.0752 0.0348 -2.16 0.0305
var1 10.6916 2.1476 0.32 0.7474
var2 -0.1595 0.4125 -0.39 0.6990
var3 -0.0563 0.0266 -2.12 0.0341
My question is: are these coefficients odds ratios or not? If not, how can I get the odds ratios?
Hi there, here is an approach. Note that it helps if you include some sample data for us to work with.
Generating some fake data...
fake_data <- matrix(rnorm(300), ncol = 3)
y_start <- 1/(1+exp(-(fake_data %*% c(1, .3, 2))))
y <- rbinom(100, size = 1, prob = y_start)
dat <- data.frame(y, fake_data)
Now we fit the model:
library(rms)
fit <- lrm(y ~ ., data = dat)
The model coefficients will be in the form of log-odds (still on the log scale)
# Log-odds
coef(fit)
Intercept X1 X2 X3
0.03419513 0.92890297 0.48097414 1.86036897
If you want to move to odds, you need to exponentiate to transform from the log scale.
# Odds
exp(coef(fit))
Intercept X1 X2 X3
1.034787 2.531730 1.617649 6.426107
So in this example, your odds of achieving Y increase by a factor of about 2.5 for a one-unit increase in X1.
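If you also want Wald confidence intervals on the odds-ratio scale, a quick sketch (assuming confint.default() can pull coef()/vcov() from the lrm fit; otherwise compute them manually from the standard errors):
exp(cbind(OR = coef(fit), confint.default(fit)))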
The following code works out quite well (based on my previous question). But I have to change the variance estimator (ols, hc0, hc1, hc2, hc3) every time before I run the code. I would like to solve this problem with a loop.
Here is a brief description of the code. For each sample size (n = 25, 50, 100, 250, 500, 1000), 1000 regression models are generated, and each one is estimated by OLS. I then calculate t-statistics based on the estimated coefficients of x3 across the 1000 samples. The null hypothesis is H0: beta3 = beta3h0, i.e. the estimated coefficient of x3 equals the 'real' value, which I defined as 1. In the last step, I check how often the null hypothesis is rejected (significance level = 0.05). My final goal is code that spits out the percentage rejection rate of the null hypothesis for each sample size and variance estimator. Thus, the result should be a matrix, whereas right now I get a vector. I would be pleased if anyone could help me with that. Here is my code:
library(car)
sample_size = c("n=25"=25, "n=50"=50, "n=100"=100, "n=250"=250, "n=500"=500, "n=1000"=1000)
B <- 1000
beta0 <- 1
beta1 <- 1
beta2 <- 1
beta3 <- 1
alpha <- 0.05
simulation <- function(n, beta3h0){
t.test.values <- rep(NA, B)
#simulation of size
for(rep in 1:B){
#data generation
d1 <- runif(n, 0, 1)
d2 <- rnorm(n, 0, 1)
d3 <- rchisq(n, 1, ncp=0)
x1 <- (1 + d1)
x2 <- (3*d1 + 0.6*d2)
x3 <- (2*d1 + 0.6*d3)
# homoskedastic error term: exi <- rchisq(n, 4, ncp = 0)
exi <- sqrt(x3 + 1.6)*rchisq(n, 4, ncp = 0)
y <- beta0 + beta1*x1 + beta2*x2 + beta3*x3 + exi
mydata <- data.frame(y, x1, x2, x3)
#ols estimation
lmobj <- lm(y ~ x1 + x2 + x3, mydata)
#extraction
betaestim <- coef(lmobj)[4]
betavar <- vcov(lmobj)[4,4]
#robust variance estimators: hc0, hc1, hc2, hc3
betavar0 <- hccm(lmobj, type="hc0")[4,4]
betavar1 <- hccm(lmobj, type="hc1")[4,4]
betavar2 <- hccm(lmobj, type="hc2")[4,4]
betavar3 <- hccm(lmobj, type="hc3")[4,4]
#t statistic
t.test.values[rep] <- (betaestim - beta3h0)/sqrt(betavar)
}
mean(abs(t.test.values) > qt(p=c(1-alpha/2), df=n-4))
}
sapply(sample_size, simulation, beta3h0 = 1)
You don't need a double nested loop. Just make sure you get a matrix inside your loop. Update your current simulation with the following:
## set up a matrix
## replacing `t.test.values <- rep(NA, B)`
t.test.values <- matrix(nrow = 5, ncol = B) ## 5 estimators
## update / fill a column
## replacing `t.test.values[rep] <- (betaestim - beta3h0)/sqrt(betavar)`
t.test.values[, rep] <- abs(betaestim - beta3h0) / sqrt(c(betavar, betavar0, betavar1, betavar2, betavar3))
## row means
## replacing `mean(abs(t.test.values) > qt(p=c(1-alpha/2), df=n-4))`
rowMeans(t.test.values > qt(1-alpha/2, n-4))
Now, simulation returns a vector of length 5: for each sample size, the Monte Carlo rejection rate is returned for all 5 variance estimators. Then, when you call sapply, you get a matrix result:
sapply(sample_size, simulation, beta3h0 = 1)
# n=25 n=50 n=100 n=250 n=500 n=1000
#[1,] 0.132 0.237 0.382 0.696 0.917 0.996
#[2,] 0.198 0.241 0.315 0.574 0.873 0.994
#[3,] 0.157 0.220 0.299 0.569 0.871 0.994
#[4,] 0.119 0.173 0.248 0.545 0.859 0.994
#[5,] 0.065 0.122 0.197 0.510 0.848 0.993
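Purely cosmetic, but labelling the rows (in the order they were filled inside the loop) makes it clear which estimator each rejection rate belongs to:
result <- sapply(sample_size, simulation, beta3h0 = 1)
rownames(result) <- c("ols", "hc0", "hc1", "hc2", "hc3")
result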