I using an lm() like function called robu() from library robumeta within my own function foo.
However, I'm manipulating the formula argument such that when it is missing the default formula would be: formula(dint~1) or else any formula that user defines.
It works fine, however, in the output of foo the printed formula call always is: Model: missing(f) if formula(dint ~ 1) regardless of what formula is inputted in the foo.
Can I correct this part of output so that it only shows the exact formula used? (see below examples)
dat <- data.frame(dint = 1:9, SD = 1:9*.1,
time = c(1,1,2,3,4,3,2,4,1),
study.name = rep(c("bob", "jim", "jon"), 3))
library(robumeta)
# MY FUNCTION:
foo <- function(f, data){
robu(formula = if(missing(f)) formula(dint~1) else formula(f), data = data, studynum = study.name, var = SD^2)
}
# EXAMPLES OF USE:
foo(data = dat) ## HERE I expect: `Model: dint ~ 1`
foo(dint~as.factor(time), data = dat) ## HERE I expect: `Model: dint ~ time`
One option is to update the 'ml' object
foo <- function(f, data){
fmla <- if(missing(f)) {
formula(dint ~ 1)
} else {
formula(f)
}
model <- robu(formula = fmla, data = data, studynum = study.name, var = SD^2)
model$ml <- fmla
model
}
-checking
foo(data = dat)
RVE: Correlated Effects Model with Small-Sample Corrections
Model: dint ~ 1
Number of studies = 3
Number of outcomes = 9 (min = 3 , mean = 3 , median = 3 , max = 3 )
Rho = 0.8
I.sq = 96.83379
Tau.sq = 9.985899
Estimate StdErr t-value dfs P(|t|>) 95% CI.L 95% CI.U Sig
1 X.Intercept. 4.99 0.577 8.65 2 0.0131 2.51 7.48 **
---
Signif. codes: < .01 *** < .05 ** < .10 *
---
Note: If df < 4, do not trust the results
foo(dint~ as.factor(time), data = dat)
RVE: Correlated Effects Model with Small-Sample Corrections
Model: dint ~ as.factor(time)
Number of studies = 3
Number of outcomes = 9 (min = 3 , mean = 3 , median = 3 , max = 3 )
Rho = 0.8
I.sq = 97.24601
Tau.sq = 11.60119
Estimate StdErr t-value dfs P(|t|>) 95% CI.L 95% CI.U Sig
1 X.Intercept. 3.98 2.50 1.588 2.00 0.253 -6.80 14.8
2 as.factor.time.2 1.04 4.41 0.236 1.47 0.842 -26.27 28.3
3 as.factor.time.3 1.01 1.64 0.620 1.47 0.617 -9.10 11.1
4 as.factor.time.4 2.52 2.50 1.007 2.00 0.420 -8.26 13.3
---
Signif. codes: < .01 *** < .05 ** < .10 *
Related
I'm trying to use emmeans to test "contrasts of contrasts" with custom orthogonal contrasts applied to a zero-inflated negative binomial model. The study design has 4 groups (study_group: grp1, grp2, grp3, grp4), each of which is assessed at 3 timepoints (time: Time1, Time2, Time3).
With the code below, I am able to get very close to, but not exactly, what I want. The contrasts that emerge are expressed in terms of ratios such as grp1/grp2, grp1/grp3,..., grp3/grp4 ("lower over higher"; see output following code).
What would be immensely helpful to me to have a way to flip these ratios to be grp2/grp1, grp3/grp1,..., grp4/grp3 ("higher over lower"). I've tried sticking reverse=TRUE in various spots, but to no effect.
Short of re-leveling the study_group factor, is there anyway to do this in emmeans?
Thanks!
library(glmmTMB)
library(emmeans)
set.seed(3456)
# Building grid for study design: 4 groups of 3 sites,
# each with 20 participants observed 3 times
site <- rep(1:12, each=60)
pid <- 1000*site+10*(rep(rep(1:20,each=3),12))
study_group <- c(rep("grp1",180), rep("grp2",180), rep("grp3",180), rep("grp4",180))
grp_num <- c(rep(0,180), rep(1,180), rep(2,180), rep(3,180))
time <- c(rep(c("Time1", "Time2", "Time3"),240))
time_num <- c(rep(c(0:2),240))
# Site-level random effects (intercepts)
site_eff_count = rep(rnorm(12, mean = 0, sd = 0.5), each = 60)
site_eff_zeros = rep(rnorm(12, mean = 0, sd = 0.5), each = 60)
# Simulating a neg binomial outcome
y_count <- rnbinom(n = 720, mu=exp(3.25 + grp_num*0.15 + time_num*-0.20 + grp_num*time_num*0.15 + site_eff_count), size=0.8)
# Simulating some extra zeros
log_odds = (-1.75 + grp_num*0.2 + time_num*-0.40 + grp_num*time_num*0.50 + site_eff_zeros)
prob_1 = plogis(log_odds)
prob_0 = 1 - prob_1
y_zeros <- rbinom(n = 720, size = 1, prob = prob_0)
# Building datasest with ZINB-ish outcome
data_ZINB <- data.frame(site, pid, study_group, time, y_count, y_zeros)
data_ZINB$y_obs <- ifelse(y_zeros==1, y_count, 0)
# Estimating ZINB GLMM in glmmTMB
mod_ZINB <- glmmTMB(y_obs ~ 1
+ study_group + time + study_group*time
+ (1|site),
family=nbinom2,
zi = ~ .,
data=data_ZINB)
#summary(mod_ZINB)
# Getting model-estimated "cell" means for conditional (non-zero) sub-model
# in response (not linear predictor) scale
count_means <- emmeans(mod_ZINB,
pairwise ~ time | study_group,
component="cond",
type="response",
adjust="none")
# count_means
# Defining custom contrast function for orthogonal time contrasts
# contr1 = Time 2 - Time 1
# contr2 = Time 3 - Times 1 and 2
compare_arms.emmc <- function(levels) {
k <- length(levels)
contr1 <- c(-1,1,0)
contr2 <- c(-1,-1,2)
coef <- data.frame()
coef <- as.data.frame(lapply(seq_len(k - 1), function(i) {
if(i==1) contr1 else contr2
}))
names(coef) <- c("T1vT2", "T1T2vT3")
attr(coef, "adjust") = "none"
coef
}
# Estimating pairwise between-group "contrasts of contrasts"
# i.e., testing if time contrasts differ across groups
compare_arms_contrast <- contrast(count_means[[1]],
interaction = c("compare_arms", "pairwise"),
by = NULL)
compare_arms_contrast
applying theemmeans::contrast function as above yields this:
time_compare_arms study_group_pairwise ratio SE df null t.ratio p.value
T1vT2 grp1 / grp2 1.091 0.368 693 1 0.259 0.7957
T1T2vT3 grp1 / grp2 0.623 0.371 693 1 -0.794 0.4276
T1vT2 grp1 / grp3 1.190 0.399 693 1 0.520 0.6034
T1T2vT3 grp1 / grp3 0.384 0.241 693 1 -1.523 0.1283
T1vT2 grp1 / grp4 0.664 0.245 693 1 -1.108 0.2681
.
.
.
T1T2vT3 grp3 / grp4 0.676 0.556 693 1 -0.475 0.6346
Tests are performed on the log scale
The answer, provided by Russ Lenth in the comments and in the emmeans documentation for the contrast function, is to replace pairwise with revpairwise in the contrast function call.
I understand how to extract chains from a Stan model but I was wondering if there was any quick way to extract the values displayed on the default Stan output table.
Here is some toy data
# simulate linear model
a <- 3 # intercept
b <- 2 # slope
# we can have both the predictor and the noise vary
x <- rnorm(28, 0, 1)
eps <- rnorm(28, 0, 2)
y <- a + b*x + eps
Which when we analyse
mod <- lm(y ~ x, df)
We can extract coefficients from
mod$coefficients
# (Intercept) x
# 3.355967 2.151597
I wondered if there is any way to do the equivalent with a Stan output table
# Step 1: Make List
data_reg <- list(N = 28, x = x, y = y)
# Step 2: Create Model String
write("
data {
int<lower=0> N;
vector[N] x;
vector[N] y;
}
parameters {
real alpha;
real beta;
real<lower=0> sigma;
}
model {
vector[N] mu;
sigma ~ cauchy(0, 2);
beta ~ normal(0,10);
alpha ~ normal(0,100);
for ( i in 1:N ) {
mu[i] = alpha + beta * x[i];
}
y ~ normal(mu, sigma);
}
", file = "temp.stan")
# Step 3: Generate MCMC Chains
fit1 <- stan(file = "temp.stan",
data = data_reg,
chains = 2,
warmup = 1000,
iter = 2000,
cores = 2,
refresh = 1000)
Now, when we call the model
fit1
# Output
# mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
# alpha 3.33 0.01 0.40 2.57 3.06 3.33 3.59 4.13 1229 1
# beta 2.14 0.01 0.40 1.37 1.89 2.14 2.40 2.98 1470 1
# sigma 1.92 0.01 0.27 1.45 1.71 1.90 2.09 2.51 1211 1
# lp__ -31.92 0.05 1.30 -35.27 -32.50 -31.63 -30.96 -30.43 769 1
Is there any way to index and extract elements from the Output table displayed above?
If you only want means, then the get_posterior_mean function will work. Otherwise, you assign the result of print(fit1) or summary(print1) to an object, you can extract stuff from that object, but it is probably better to just do as.matrix(fit1) or as.data.frame(fit1) and calculate whatever you want yourself on the resulting columns.
I'm trying to use robumeta and I keep running into an error.
I'm using 113 observations on 8 variables:
EM <- read.csv(file="SchoolMotivationRisk.csv", header=TRUE,sep=",")
The eight variables are: studynum yi var.effect.size sei aget1 aget2 permale sexmix.
Doing str(EM) yields 'data.frame': 113 obs. of 8 variables.
The problem is when I go to fit:
res<-robu(formula = yi ~ 1, var.effect.size = var.effect.size, studynum = studynum, modelweights = "CORR", rho= 0.8, small=TRUE, data=EM)
I get the following error:
Error in data.frame(effect.size = mf[, 1], stats::model.matrix(formula, : arguments imply differing number of rows: 113, 0
Is there a way around this?
Also, the output from dput(EM) is at https://pastebin.com/vmMwy1u4
The parameter is var.eff.size and not var.effect.size
library(robumeta)
robu(formula = yi ~ 1, var.eff.size = var.effect.size,
studynum = studynum, modelweights = "CORR", rho= 0.8, small=TRUE, data=EM)
#RVE: Correlated Effects Model with Small-Sample Corrections
#Model: yi ~ 1
#Number of studies = 17
#Number of outcomes = 113 (min = 2 , mean = 6.65 , median = 7 , max = 12 )
#Rho = 0.8
#I.sq = 57.54005
#Tau.sq = 0.004609755
# Estimate StdErr t-value dfs P(|t|>) 95% CI.L 95% CI.U Sig
#1 X.Intercept. 0.113 0.0184 6.12 14 0.0000263 0.0733 0.152 ***
#---
#Signif. codes: < .01 *** < .05 ** < .10 *
#---
#Note: If df < 4, do not trust the results
I'm trying to reproduce the 95% CI that Stata produces when you run a model with clustered standard errors. For example:
regress api00 acs_k3 acs_46 full enroll, cluster(dnum)
Regression with robust standard errors Number of obs = 395
F( 4, 36) = 31.18
Prob > F = 0.0000
R-squared = 0.3849
Number of clusters (dnum) = 37 Root MSE = 112.20
------------------------------------------------------------------------------
| Robust
api00 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
acs_k3 | 6.954381 6.901117 1.008 0.320 -7.041734 20.9505
acs_46 | 5.966015 2.531075 2.357 0.024 .8327565 11.09927
full | 4.668221 .7034641 6.636 0.000 3.24153 6.094913
enroll | -.1059909 .0429478 -2.468 0.018 -.1930931 -.0188888
_cons | -5.200407 121.7856 -0.043 0.966 -252.193 241.7922
------------------------------------------------------------------------------
I am able to reproduce the coefficients and the standard errors:
library(readstata13)
library(texreg)
library(sandwich)
library(lmtest)
clustered.se <- function(model_result, data, cluster) {
model_variables <-
intersect(colnames(data), c(colnames(model_result$model), cluster))
model_rows <- rownames(model_result$model)
data <- data[model_rows, model_variables]
cl <- data[[cluster]]
M <- length(unique(cl))
N <- nrow(data)
K <- model_result$rank
dfc <- (M / (M - 1)) * ((N - 1) / (N - K))
uj <-
apply(estfun(model_result), 2, function(x)
tapply(x, cl, sum))
vcovCL <- dfc * sandwich(model_result, meat = crossprod(uj) / N)
standard.errors <- coeftest(model_result, vcov. = vcovCL)[, 2]
p.values <- coeftest(model_result, vcov. = vcovCL)[, 4]
clustered.se <-
list(vcovCL = vcovCL,
standard.errors = standard.errors,
p.values = p.values)
return(clustered.se)
}
elemapi2 <- read.dta13(file = 'elemapi2.dta')
lm1 <-
lm(formula = api00 ~ acs_k3 + acs_46 + full + enroll,
data = elemapi2)
clustered_se <-
clustered.se(model_result = lm1,
data = elemapi2,
cluster = "dnum")
htmlreg(
lm1,
override.se = clustered_se$standard.errors,
override.p = clustered_se$p.value,
star.symbol = "\\*",
digits = 7
)
=============================
Model 1
-----------------------------
(Intercept) -5.2004067
(121.7855938)
acs_k3 6.9543811
(6.9011174)
acs_46 5.9660147 *
(2.5310751)
full 4.6682211 ***
(0.7034641)
enroll -0.1059909 *
(0.0429478)
-----------------------------
R^2 0.3848830
Adj. R^2 0.3785741
Num. obs. 395
RMSE 112.1983218
=============================
*** p < 0.001, ** p < 0.01, * p < 0.05
Alas, I cannot reproduce the 95% confidence Interval:
screenreg(
lm1,
override.se = clustered_se$standard.errors,
override.p = clustered_se$p.value,
digits = 7,
ci.force = TRUE
)
========================================
Model 1
----------------------------------------
(Intercept) -5.2004067
[-243.8957845; 233.4949710]
acs_k3 6.9543811
[ -6.5715605; 20.4803228]
acs_46 5.9660147 *
[ 1.0051987; 10.9268307]
full 4.6682211 *
[ 3.2894567; 6.0469855]
enroll -0.1059909 *
[ -0.1901670; -0.0218148]
----------------------------------------
R^2 0.3848830
Adj. R^2 0.3785741
Num. obs. 395
RMSE 112.1983218
========================================
* 0 outside the confidence interval
If I do it 'by hand', I get the same thing than with texreg:
level <- 0.95
a <- 1-(1 - level)/2
coeff <- lm1$coefficients
se <- clustered_se$standard.errors
lb <- coeff - qnorm(a)*se
ub <- coeff + qnorm(a)*se
> lb
(Intercept) acs_k3 acs_46 full enroll
-243.895784 -6.571560 1.005199 3.289457 -0.190167
> ub
(Intercept) acs_k3 acs_46 full enroll
233.49497100 20.48032276 10.92683074 6.04698550 -0.02181481
What is Stata doing and how can I reproduce it in R?
PS: This is a follow up question.
PS2: The Stata data is available here.
It looks like Stata is using confidence intervals based on t(36) rather than Z (i.e. Normal errors).
Taking the values from the Stata output
coef=6.954381; rse= 6.901117 ; lwr= -7.041734; upr= 20.9505
(upr-coef)/rse
## [1] 2.028095
(lwr-coef)/rse
## [1] -2.028094
Computing/cross-checking the tail values for t(36):
pt(2.028094,36)
## [1] 0.975
qt(0.975,36)
## [1] 2.028094
I don't know how you pass confidence intervals to texreg. Since you haven't given a reproducible example (I don't have elemapi2.dta) I can't say exactly how you would get the df, but it looks like you would want tdf <- length(unique(elemapi2$dnum))-1
level <- 0.95
a <- 1- (1 - level)/2
bounds <- coef(lm1) + c(-1,1)*clustered_se*qt(a,tdf)
Indeed Stata is using the t distribution rather than the normal distribution. There is now a really easy solution to getting confidence intervals that match Stata into texreg using lm_robust from the estimatr package, which you can install from CRAN install.packages(estimatr).
> library(estimatr)
> lmro <- lm_robust(mpg ~ hp, data = mtcars, clusters = cyl, se_type = "stata")
> screenreg(lmro)
===========================
Model 1
---------------------------
(Intercept) 30.10 *
[13.48; 46.72]
hp -0.07
[-0.15; 0.01]
---------------------------
R^2 0.60
Adj. R^2 0.59
Num. obs. 32
RMSE 3.86
===========================
* 0 outside the confidence interval
I'm trying to solve a two-component decay model in R using the nls function, but running into errors. The equation is:
Where t is time, Ctot is C1+C2, and p1 and p2 are known proportions of Ctot.
my data (dd) is:
> head(dd,n=15)
t Ctot
1 0.00 6.62
2 0.33 6.45
3 0.50 6.38
4 0.67 6.44
5 0.83 6.38
6 1.00 6.39
7 1.17 6.35
8 1.33 6.33
9 1.50 6.33
10 1.67 6.28
11 1.83 6.17
12 2.00 6.11
13 2.17 6.07
14 2.33 5.89
15 2.50 5.86
Using nls I have tried:
p1 <- 0.3
p2 <- 0.7
z <- nls(Ctot~(p1*C1*(exp(-k1*t)))+(p2*C2*(exp(-k2*t))), data=dd, start=list(C1=6, C2=0.1, k1=0.01, k2=0.01))
However I am getting:
z <- nls(Ctot~(p1*C1*(exp(-k1*t)))+(p2*C2*(exp(-k2*t))), data=dd, start=list(C1=6, C2=0.1, k1=0.01, k2=0.01))
Error in numericDeriv(form[[3L]], names(ind), env) :
Missing value or an infinity produced when evaluating the model
I would be grateful if anyone has suggestions!
The data seems fairly limited and clearly incomplete since it only the head. If we make up some data for testing methods ... and leave out the confusing p1 and p2:
t=seq(0, 20, by=.3)
Ctot = 3 * exp( -1 * t) + 4 * exp(-5*t)
# following hte example on gnm::gnm's help page:
saved.fits <- list(); library(gnm)
for (i in 1:10) {
saved.fits[[i]] <- suppressWarnings( gnm(Ctot ~ Exp(1 + t, inst = 1) +
Exp(1 + t, inst = 2),
verbose=FALSE))}
plot(Ctot~t)
lines(saved.fits[[3]]$fitted~t)
lines(saved.fits[[3]]$fitted~t,col="red")
I wasn't familiar with the gnm package and so ended up reading the first few sections and then the worked 2 component data fitting example in its vignette: https://cran.r-project.org/web/packages/gnm/vignettes/gnmOverview.pdf . Most of the fits will be as expected, but some will find a local maximum in likelihood that is not a global max:
> saved.fits[[1]]$coefficients
(Intercept) Exp(. + t, inst = 1).(Intercept)
1.479909e-12 1.098612e+00
Exp(1 + ., inst = 1).t Exp(. + t, inst = 2).(Intercept)
-1.000000e+00 1.386294e+00
Exp(1 + ., inst = 2).t
-5.000000e+00
attr(,"eliminated")
[1] 0
> exp( saved.fits[[1]]$coefficients[4] )
Exp(. + t, inst = 2).(Intercept)
4
> exp( saved.fits[[1]]$coefficients[2] )
Exp(. + t, inst = 1).(Intercept)
3
With the data shown in the question it does not seem to work well but if you are open to other parametric models then this 3 parameter model seems reasonable.
fm <- nls(Ctot ~ 1 / (a + b * t^c), dd, st = list(a = 1, b = 1, c = 1))
plot(dd)
lines(fitted(fm) ~ t, dd, col = "red")