Mantel-Haenszel stratified estimates in survival data in R?

I've got survival data with an outcome, an exposure, and a variable I'd like to stratify on, but I can't find a function that computes a Mantel-Haenszel rate ratio. For example, with the lung dataset from survival I'd like to look at the outcome status by sex, stratified by age. I've set up age brackets with
library(tidyverse)
library(survival)
lung2 <- lung %>%
  mutate(agecat = as.factor(case_when(age < 50 ~ 0,
                                      age < 70 ~ 1,
                                      age >= 70 ~ 2)))
epi.2by2 from epiR gets me close with
library(epiR)
epi.2by2(table(as.factor(lung2$status),
               as.factor(lung2$sex),
               lung2$agecat),
         method = "cohort.count")
#> Outcome + Outcome - Total Inc risk * Odds
#> Exposed + 26 37 63 41.3 0.703
#> Exposed - 112 53 165 67.9 2.113
#> Total 138 90 228 60.5 1.533
#>
#>
#> Point estimates and 95% CIs:
#> -------------------------------------------------------------------
#> Inc risk ratio (crude) 0.61 (0.44, 0.83)
#> Inc risk ratio (M-H) 0.61 (0.45, 0.84)
#> Inc risk ratio (crude:M-H) 0.99
#> Odds ratio (crude) 0.33 (0.18, 0.61)
#> Odds ratio (M-H) 0.34 (0.19, 0.63)
#> Odds ratio (crude:M-H) 0.97
#> Attrib risk in the exposed (crude) * -26.61 (-40.70, -12.52)
#> Attrib risk in the exposed (M-H) * -25.88 (-42.58, -9.19)
#> Attrib risk (crude:M-H) 1.03
#> -------------------------------------------------------------------
#> M-H test of homogeneity of PRs: chi2(2) = 0.191 Pr>chi2 = 0.909
#> M-H test of homogeneity of ORs: chi2(2) = 0.394 Pr>chi2 = 0.821
#> Test that M-H adjusted OR = 1: chi2(1) = 12.299 Pr>chi2 = <0.001
#> Wald confidence limits
#> M-H: Mantel-Haenszel; CI: confidence interval
#> * Outcomes per 100 population units
But it doesn't take time-to-event data (rates) into consideration. It has a method = "cohort.time" option, but I can't seem to get it to work. Ultimately I'd like output similar to Stata's stmh sex, by(agecat), which gives a rate ratio estimate with lower and upper 95% confidence limits for each stratum, plus an overall Mantel-Haenszel estimate with a chi-square statistic and p-value. mhor from epiDisplay gives output close to what I'm looking for
library(epiDisplay)
mhor(lung2$status,
     lung2$sex,
     lung2$agecat,
     design = "cohort",
     graph = FALSE)
#>
#> Stratified analysis by Var3
#> OR lower lim. upper lim. P value
#> Var3 0 0.476 0.0532 3.786 0.653417
#> Var3 1 0.363 0.1642 0.788 0.006387
#> Var3 2 0.243 0.0424 1.222 0.060217
#> M-H combined 0.344 0.1880 0.631 0.000453
#>
#> M-H Chi2(1) = 12.3 , P value = 0
#> Homogeneity test, chi-squared 2 d.f. = 0.38 , P value = 0.828
But it only gives odds ratios, not rate ratios.
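In the meantime, the person-time Mantel-Haenszel rate ratio can be computed by hand. Below is a sketch: the formula is the standard stratified M-H rate ratio, but the code itself is my own, is not checked against Stata's stmh, and assumes sex == 1 is the exposed group and status == 2 marks a death, as in ?lung.

# Hand-rolled sketch: events and person-time per sex within each age stratum
strata_tab <- lung2 %>%
  group_by(agecat, sex) %>%
  summarise(events = sum(status == 2),   # deaths
            ptime  = sum(time),          # person-time at risk
            .groups = "drop") %>%
  pivot_wider(names_from = sex, values_from = c(events, ptime))

# M-H rate ratio: sum(a_i * T0_i / T_i) / sum(c_i * T1_i / T_i)
with(strata_tab,
     sum(events_1 * ptime_2 / (ptime_1 + ptime_2)) /
     sum(events_2 * ptime_1 / (ptime_1 + ptime_2)))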

Related

lrtest() not working: posthoc testing for multinomial with vglm() not working with lrtest()

I am running a multinomial analysis with vglm(). It all works, but then I try to follow the instructions from the following website (https://rcompanion.org/handbook/H_08.html) to do a pairwise test, because emmeans cannot handle pairwise comparisons for vglm models. The lrtest() part gives me the following error:
Error in lrtest.default(model) :
'list' object cannot be coerced to type 'double'
I cannot figure out what is wrong. I even copied and pasted the exact code the website uses (see below) and get the same error with their own code and dataset. Any ideas?
Their code is the only suggestion for pairwise testing with vglm() that I have found anywhere on the web.
Here is their code along with the expected output and extra details from the website (it is simpler than mine but gives the same error anyway).
Input = ("
County Sex Result Count
Bloom Female Pass 9
Bloom Female Fail 5
Bloom Male Pass 7
Bloom Male Fail 17
Cobblestone Female Pass 11
Cobblestone Female Fail 4
Cobblestone Male Pass 9
Cobblestone Male Fail 21
Dougal Female Pass 9
Dougal Female Fail 7
Dougal Male Pass 19
Dougal Male Fail 9
Heimlich Female Pass 15
Heimlich Female Fail 8
Heimlich Male Pass 14
Heimlich Male Fail 17
")
Data = read.table(textConnection(Input),header=TRUE)
### Order factors otherwise R will alphabetize them
Data$County = factor(Data$County,
levels=unique(Data$County))
Data$Sex = factor(Data$Sex,
levels=unique(Data$Sex))
Data$Result = factor(Data$Result,
levels=unique(Data$Result))
### Check the data frame
library(psych)
headTail(Data)
str(Data)
summary(Data)
### Remove unnecessary objects
rm(Input)
Multinomial regression
library(VGAM)
model = vglm(Result ~ Sex + County + Sex:County,
family=multinomial(refLevel=1),
weights = Count,
data = Data)
summary(model)
library(car)
Anova(model,
type="II",
test="Chisq")```
Analysis of Deviance Table (Type II tests)
Response: Result
Df Chisq Pr(>Chisq)
Sex 1 6.7132 0.00957 **
County 3 4.1947 0.24120
Sex:County 3 7.1376 0.06764 .
library(rcompanion)
nagelkerke(model)
$Pseudo.R.squared.for.model.vs.null
Pseudo.R.squared
McFadden 0.0797857
Cox and Snell (ML) 0.7136520
Nagelkerke (Cragg and Uhler) 0.7136520
$Likelihood.ratio.test
Df.diff LogLik.diff Chisq p.value
7 -10.004 20.009 0.0055508
library(lmtest)
lrtest(model)
Likelihood ratio test
Model 1: Result ~ Sex + County + Sex:County
Model 2: Result ~ 1
#Df LogLik Df Chisq Pr(>Chisq)
1 8 -115.39
2 15 -125.39 7 20.009 0.005551 **
Post-hoc analysis
At the time of writing, the lsmeans package cannot be used with vglm models.
One option for post-hoc analysis would be to conduct analyses on reduced models, including only two levels of a factor. For example, if the County x Sex interaction term had been significant, the following code could be used to create a reduced dataset with only Bloom–Female and Bloom–Male, and analyze these data with vglm.
Data.b = Data[Data$County=="Bloom" &
(Data$Sex=="Female"| Data$Sex=="Male") , ]
Data.b$County = factor(Data.b$County)
Data.b$Sex = factor(Data.b$Sex)
summary(Data.b)
County Sex Result Count
Bloom:4 Female:2 Pass:2 Min. : 5.0
Male :2 Fail:2 1st Qu.: 6.5
Median : 8.0
Mean : 9.5
3rd Qu.:11.0
Max. :17.0
library(VGAM)
model.b = vglm(Result ~ Sex,
family=multinomial(refLevel=1),
weights = Count,
data = Data.b)
lrtest(model.b)
Likelihood ratio test
#Df LogLik Df Chisq Pr(>Chisq)
1 2 -23.612
2 3 -25.864 1 4.5041 0.03381 *
Summary table of results
Comparison p-value
Bloom–Female - Bloom–Male 0.034
Cobblestone–Female - Cobblestone–Male 0.0052
Dougal–Female - Dougal–Male 0.44
Heimlich–Female - Heimlich–Male 0.14
p.value = c(0.034, 0.0052, 0.44, 0.14)
p.adj = p.adjust(p.value,
method = "fdr")
p.adj = signif(p.adj,
2)
p.adj
[1] 0.068 0.021 0.440 0.190
Comparison p-value p.adj
Bloom–Female - Bloom–Male 0.034 0.068
Cobblestone–Female - Cobblestone–Male 0.0052 0.021
Dougal–Female - Dougal–Male 0.44 0.44
Heimlich–Female - Heimlich–Male 0.14 0.19
It looks to me like qdrg() can be used. As I commented, you can't use the lazy interface; you have to supply all the needed pieces explicitly:
> library(emmeans)
> RG = qdrg(formula(model), Data, coef(model), vcov(model), link = "log")
> RG
'emmGrid' object with variables:
Sex = Female, Male
County = Bloom, Cobblestone, Dougal, Heimlich
Transformation: “log”
> emmeans(RG, consec ~ Sex | County)
$emmeans
County = Bloom:
Sex emmean SE df asymp.LCL asymp.UCL
Female -0.588 0.558 Inf -1.68100 0.5054
Male 0.887 0.449 Inf 0.00711 1.7675
County = Cobblestone:
Sex emmean SE df asymp.LCL asymp.UCL
Female -1.012 0.584 Inf -2.15597 0.1328
Male 0.847 0.398 Inf 0.06643 1.6282
County = Dougal:
Sex emmean SE df asymp.LCL asymp.UCL
Female -0.251 0.504 Inf -1.23904 0.7364
Male -0.747 0.405 Inf -1.54032 0.0459
County = Heimlich:
Sex emmean SE df asymp.LCL asymp.UCL
Female -0.629 0.438 Inf -1.48668 0.2295
Male 0.194 0.361 Inf -0.51320 0.9015
Results are given on the log (not the response) scale.
Confidence level used: 0.95
$contrasts
County = Bloom:
contrast estimate SE df z.ratio p.value
Male - Female 1.475 0.716 Inf 2.060 0.0394
County = Cobblestone:
contrast estimate SE df z.ratio p.value
Male - Female 1.859 0.707 Inf 2.630 0.0085
County = Dougal:
contrast estimate SE df z.ratio p.value
Male - Female -0.496 0.646 Inf -0.767 0.4429
County = Heimlich:
contrast estimate SE df z.ratio p.value
Male - Female 0.823 0.567 Inf 1.450 0.1470
Results are given on the log (not the response) scale.
If I understand this model correctly, the response is the log of the ratio of the 2nd multinomial response to the 1st. So what we see above are estimated differences of logs, and estimated differences of those differences. If run with type = "response", you would get estimated ratios, and ratios of those ratios.
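For example, a sketch reusing the RG object built above (output not shown here):

# back-transformed results: ratios instead of log differences,
# and ratios of those ratios for the Male - Female contrasts
emmeans(RG, consec ~ Sex | County, type = "response")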
Probably something changed in either the VGAM package or the lmtest package since that was written.
But the following will work for a likelihood ratio test for vglm models:
VGAM::lrtest(model)
VGAM::lrtest(model, model2)

Missing standard errors and confidence for mixed model and ggeffects

I'm trying to use ggeffects::ggpredict to make some effects plots for my model. I find that the standard errors and confidence limits are missing for many of the results. I can reproduce the problem with some simulated data; it seems to happen specifically for observations where the standard error puts the predicted probability close to 0 or 1.
I tried to get predictions on the link scale to diagnose whether it's a problem with the translation from link to response, but I don't believe that is supported by the package.
Any ideas how to address this? Many thanks.
library(tidyverse)
library(lme4)
library(ggeffects)
# number of simulated observations
n <- 1000
# simulated data with a numerical predictor x, factor predictor f, response y
# the simulated effects of x and f are somewhat weak compared to the noise, so expect high standard errors
df <- tibble(
  x = seq(-0.1, 0.1, length.out = n),
  g = floor(runif(n) * 3),
  f = letters[1 + g] %>% as.factor(),
  y = pracma::sigmoid(x + (runif(n) - 0.5) + 0.1 * (g - mean(g))),
  z = if_else(y > 0.5, "high", "low") %>% as.factor()
)
# glmer model
model <- glmer(z ~ x + (1 | f), data = df, family = binomial)
print(summary(model))
#> Generalized linear mixed model fit by maximum likelihood (Laplace
#> Approximation) [glmerMod]
#> Family: binomial ( logit )
#> Formula: z ~ x + (1 | f)
#> Data: df
#>
#> AIC BIC logLik deviance df.resid
#> 1373.0 1387.8 -683.5 1367.0 997
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -1.3858 -0.9928 0.7317 0.9534 1.3600
#>
#> Random effects:
#> Groups Name Variance Std.Dev.
#> f (Intercept) 0.0337 0.1836
#> Number of obs: 1000, groups: f, 3
#>
#> Fixed effects:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 0.02737 0.12380 0.221 0.825
#> x -4.48012 1.12066 -3.998 6.39e-05 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Correlation of Fixed Effects:
#> (Intr)
#> x -0.001
# missing standard errors
ggpredict(model, c("x", "f")) %>% print()
#> Data were 'prettified'. Consider using `terms="x [all]"` to get smooth plots.
#> # Predicted probabilities of z
#>
#> # f = a
#>
#> x | Predicted | 95% CI
#> --------------------------------
#> -0.10 | 0.62 | [0.54, 0.69]
#> 0.00 | 0.51 |
#> 0.10 | 0.40 |
#>
#> # f = b
#>
#> x | Predicted | 95% CI
#> --------------------------------
#> -0.10 | 0.62 | [0.56, 0.67]
#> 0.00 | 0.51 |
#> 0.10 | 0.40 |
#>
#> # f = c
#>
#> x | Predicted | 95% CI
#> --------------------------------
#> -0.10 | 0.62 | [0.54, 0.69]
#> 0.00 | 0.51 |
#> 0.10 | 0.40 |
ggpredict(model, c("x", "f")) %>% as_tibble() %>% print(n = 20)
#> Data were 'prettified'. Consider using `terms="x [all]"` to get smooth plots.
#> # A tibble: 9 x 6
#> x predicted std.error conf.low conf.high group
#> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 -0.1 0.617 0.167 0.537 0.691 a
#> 2 -0.1 0.617 0.124 0.558 0.672 b
#> 3 -0.1 0.617 0.167 0.537 0.691 c
#> 4 0 0.507 NA NA NA a
#> 5 0 0.507 NA NA NA b
#> 6 0 0.507 NA NA NA c
#> 7 0.1 0.396 NA NA NA a
#> 8 0.1 0.396 NA NA NA b
#> 9 0.1 0.396 NA NA NA c
Created on 2022-04-12 by the reprex package (v2.0.1)
I think this may be due to the singular model fit.
I dug down into the guts of the code as far as here, where there appears to be a mismatch between the dimensions of the covariance matrix of the predictions (3x3) and the number of predicted values (15).
I further suspect that the problem may happen here:
rows_to_keep <- as.numeric(rownames(unique(model_matrix_data[
intersect(colnames(model_matrix_data), terms)])))
Perhaps the function is getting confused because the conditional modes/BLUPs for every group are the same (which will only be true, generically, when the random effects variance is zero) ... ?
This seems worth opening an issue on the ggeffects issues list ?
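To check the singular-fit hypothesis on a given fit, the standard lme4 accessors are enough (nothing here is specific to ggeffects; this is just a diagnostic sketch):

lme4::isSingular(model, tol = 1e-4)   # TRUE if some variance component is (near) zero
lme4::VarCorr(model)                  # estimated random-effect variances
lme4::ranef(model)                    # conditional modes/BLUPs; identical rows fit the guess above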

Printing name of outcome above coxph output and exponentiating coefficients (R version 4.1.2 (2021-11-01) -- "Bird Hippie")

I ran the code below, which fits a Cox regression for each of several outcome types (stroke, cancer, respiratory) stored in separate columns. purrr seems to do this quite well. But I would also like to
1. print the name of each outcome type above the corresponding regression model, and
2. print the coefficients as hazard ratios with 95% CIs.
I know this is quite a big ask, but it is important since my real dataset has almost 20 outcome types. Any help would be much appreciated!
library(survival)
library(purrr)
mydata <- read.table(header=T,
text="age Sex survival stroke cancer respiratory
51 2 1.419178082 2 1 1
60 1 5 1 2 2
49 2 1.082191781 2 2 2
83 1 0.038356164 1 1 2
68 2 0.77260274 2 1 2
44 2 2.336986301 1 2 1
76 1 1.271232877 1 2 2")
outcomes <- names(mydata[4:6])
purrr::map(outcomes, ~coxph(as.formula(paste("Surv(survival,", .x, ") ~ Sex + age")),
mydata))
I'm not quite sure if this is what you are looking for, but if you run the following code:
result <- purrr::map(outcomes, function(x) {
  f <- as.formula(paste("Surv(survival,", x, ") ~ Sex + age"))
  model <- coxph(f, mydata)
  model$call$formula <- f   # store the formula so the printed call shows the outcome
  s <- summary(model)
  # print "HR : x (95% CI lo - hi)" for each coefficient under the outcome name
  cat(x, ':\n',
      paste0(apply(s$coefficients, 1, function(x) {
        paste0("HR : ", round(exp(x[1]), 2),
               ' (95% CI ', round(exp(x[1] - 1.96 * x[3]), 2),
               ' - ', round(exp(x[1] + 1.96 * x[3]), 2), ')')
      }), collapse = '\n'),
      '\n\n', sep = '')
  invisible(model)
})
It will print out:
#> stroke:
#> HR : 650273590159.06 (95% CI 0 - Inf)
#> HR : 1.36 (95% CI 0.75 - 2.49)
#>
#> cancer:
#> HR : 1121.58 (95% CI 0 - 770170911.09)
#> HR : 1.33 (95% CI 0.78 - 2.28)
#>
#> respiratory:
#> HR : 24.1 (95% CI 0.31 - 1884.85)
#> HR : 1.2 (95% CI 0.99 - 1.45)
And your list of models will be stored with the correct call above them:
result
#> [[1]]
#> Call:
#> coxph(formula = Surv(survival, stroke) ~ Sex + age, data = mydata)
#>
#> coef exp(coef) se(coef) z p
#> Sex 2.720e+01 6.503e+11 2.111e+04 0.001 0.999
#> age 3.105e-01 1.364e+00 3.066e-01 1.013 0.311
#>
#> Likelihood ratio test=6.52 on 2 df, p=0.03834
#> n= 7, number of events= 3
#>
#> [[2]]
#> Call:
#> coxph(formula = Surv(survival, cancer) ~ Sex + age, data = mydata)
#>
#> coef exp(coef) se(coef) z p
#> Sex 7.0225 1121.5843 6.8570 1.024 0.306
#> age 0.2870 1.3325 0.2739 1.048 0.295
#>
#> Likelihood ratio test=2.58 on 2 df, p=0.2753
#> n= 7, number of events= 4
#>
#> [[3]]
#> Call:
#> coxph(formula = Surv(survival, respiratory) ~ Sex + age, data = mydata)
#>
#> coef exp(coef) se(coef) z p
#> Sex 3.18232 24.10259 2.22413 1.431 0.1525
#> age 0.18078 1.19815 0.09772 1.850 0.0643
#>
#> Likelihood ratio test=5.78 on 2 df, p=0.05552
#> n= 7, number of events= 5
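As an alternative sketch (my own; it assumes you are happy to pull in broom), tidy() on a coxph fit can return exponentiated coefficients with confidence intervals directly, which avoids formatting the HRs by hand:

library(broom)

hr_tables <- purrr::map(outcomes, function(x) {
  f <- as.formula(paste("Surv(survival,", x, ") ~ Sex + age"))
  fit <- coxph(f, data = mydata)
  tidy(fit, exponentiate = TRUE, conf.int = TRUE)   # estimate = HR, plus conf.low / conf.high
})
names(hr_tables) <- outcomes
hr_tables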

Calculating the standard errors of autoregressive terms and sigma in censored regression model

I have fitted a censored regression model in R using the carx package.
The data I'm using have 130 observations:
head(SAdata)
# A tibble: 6 x 10
CC lcl Dum ci CPI LI FDI FPI OI Inflows
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.617 2.00 0 2.00 4.94 23.3 -0.207 -0.0320 -0.924 -1.16
2 0.410 2.00 0 2.00 2.92 20.5 -0.246 -0.128 0.0749 -0.299
3 0.196 2.00 0 2.00 4.26 17.2 -0.177 -0.0763 -0.482 -0.736
4 0.0518 2.00 0 2.00 5.90 15.5 0.00493 -0.313 -1.59 -1.90
5 -0.255 2.00 0 2.00 3.29 14.8 -0.0187 -0.200 -1.38 -1.59
6 -0.392 2.00 0 2.00 4.43 14 -0.00292 -0.0860 -0.00519 -0.0941
> dim(SAdata)
[1] 130 10
library(carx)
SAmodel <- carx(y=SAdata$CC, x=SAdata[,c("CPI","LI","FDI","FPI", "OI")], lcl=1.996866, p=2, CI.compute = FALSE, CI.level = 0.95)
summary(SAmodel)
Call:
carx.default(y = SAdata$CC, x = SAdata[, c("CPI", "LI", "FDI", "FPI", "OI")], lcl = 1.996866, p = 2, CI.compute = FALSE, CI.level = 0.95)
Coefficients:
Estimate
CPI 0.0804
LI -0.0088
FDI -0.0395
FPI -0.0166
OI 0.0488
AR1 1.6324
AR2 -0.7097
sigma 0.4286
AIC:
[1] -72.92045
But what I get back are only the point estimates, without confidence intervals. The confidence intervals in this package are computed by bootstrapping, which takes a lot of time: when I set CI.compute = TRUE with 1000 replications (from what I've read, that is the minimum recommended number), it ran for a whole day and didn't finish. For this reason, I tried to calculate the standard errors of every coefficient manually, but I couldn't find a way to get the standard errors for the autoregressive terms and sigma.
Here's what I tried
#Calculating the residuals of the fitted model
SAres = residuals(SAmodel,type="raw")
# Find the sum of the squared residuals
rss <- sum(SAres^2)
# And use that to find the estimate of sigma^2, commonly called S
S <- sqrt(rss / (length(SAres) - length(SAmodel$coefficients)))
# Make the X matrix: a column of 1s for the intercept and one column per predictor
X <- cbind(rep(1, nrow(SAdata)), SAdata$CPI, SAdata$LI, SAdata$FDI, SAdata$FPI,
           SAdata$OI)
# Multiply matrices using %*%, transpose them with t(),
# and invert them with solve(); and directly apply the formula above with:
std.errors <- S * sqrt(diag(solve(t(X) %*% X)))
std.errors
[1] 0.001232184 0.037669933 0.010483153 0.068843648 0.040779940 0.063888636
I don't know how to include the autoregressive terms (AR1, AR2) and sigma, because their calculations are based on the response variable. How can I get standard errors for these parameters?
I need the standard errors so that I can later compute confidence intervals for each coefficient.
Any help is deeply appreciated
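One rough idea, offered only as an approximation and not as the carx answer: fit a plain AR(2) to the model residuals with arima() and read off its asymptotic standard errors for the AR terms and its innovation variance. This ignores the censoring and the uncertainty in the regression coefficients, so it is at best a sanity check against a (short) bootstrap rather than a substitute:

# Approximation only: AR(2) fitted to the raw residuals of the carx model
ar_fit <- arima(SAres, order = c(2, 0, 0), include.mean = FALSE)
ar_fit$coef                      # ar1, ar2 point estimates
sqrt(diag(ar_fit$var.coef))      # approximate standard errors for ar1, ar2
ar_fit$sigma2                    # innovation variance (sigma^2), not sigma itself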

estimate a repeated measures random effects model with a nested structure using `plm()`

Is it possible to estimate a repeated measures random effects model with a nested structure using plm() from the plm package?
I know it is possible with lmer() from the lme4 package. However, lmer() relies on a likelihood framework, and I am curious whether it can be done with plm().
Here's my minimal working example, inspired by this question. First some required packages and data,
# install.packages(c("plm", "lme4", "texreg", "mlmRev"), dependencies = TRUE)
data(egsingle, package = "mlmRev")
The dataset egsingle is an unbalanced panel of 1721 school children, grouped in 60 schools, observed at five time points. For details see ?mlmRev::egsingle.
Some light data management
dta <- egsingle
dta$Female <- with(dta, ifelse(female == 'Female', 1, 0))
Also, a snippet of the relevant data
dta[118:127,c('schoolid','childid','math','year','size','Female')]
#> schoolid childid math year size Female
#> 118 2040 289970511 -1.830 -1.5 502 1
#> 119 2040 289970511 -1.185 -0.5 502 1
#> 120 2040 289970511 0.852 0.5 502 1
#> 121 2040 289970511 0.573 1.5 502 1
#> 122 2040 289970511 1.736 2.5 502 1
#> 123 2040 292772811 -3.144 -1.5 502 0
#> 124 2040 292772811 -2.097 -0.5 502 0
#> 125 2040 292772811 -0.316 0.5 502 0
#> 126 2040 293550291 -2.097 -1.5 502 0
#> 127 2040 293550291 -1.314 -0.5 502 0
Now, relying heavily on Robert Long's answer, this is how I estimate a repeated measures random effects model with a nested structure using lmer() from the lme4 package,
dta$year <- as.factor(dta$year)
require(lme4)
Model.1 <- lmer(math ~ Female + size + year + (1 | schoolid /childid), dta)
# summary(Model.1)
I looked in the man page for plm() and it has an indexing argument, index, but it seems to take only an individual index and a time index, i.e. index = c("childid", "year"). Ignoring schoolid, the model would look like this:
dta$year <- as.numeric(dta$year)
library(plm)
Model.2 <- plm(math~Female+size+year, dta, index = c("childid", "year"), model="random")
# summary(Model.2)
To sum up the question
How can I specify (if it is even possible) a repeated measures random effects model with a nested structure, like Model.1, using plm() from the plm package?
Below are the actual estimation results from the two models:
# require(texreg)
names(Model.2$coefficients) <- names(coefficients(Model.1)$schoolid) #ugly!
texreg::screenreg(list(Model.1, Model.2), digits = 3) # pretty!
#> ==============================================================
#> Model 1 Model 2
#> --------------------------------------------------------------
#> (Intercept) -2.693 *** -2.671 ***
#> (0.152) (0.085)
#> Female 0.008 -0.025
#> (0.042) (0.046)
#> size -0.000 -0.000 ***
#> (0.000) (0.000)
#> year-1.5 0.866 *** 0.878 ***
#> (0.059) (0.059)
#> year-0.5 1.870 *** 1.882 ***
#> (0.058) (0.059)
#> year0.5 2.562 *** 2.575 ***
#> (0.059) (0.059)
#> year1.5 3.133 *** 3.149 ***
#> (0.059) (0.060)
#> year2.5 3.939 *** 3.956 ***
#> (0.060) (0.060)
#> --------------------------------------------------------------
#> AIC 16590.715
#> BIC 16666.461
#> Log Likelihood -8284.357
#> Num. obs. 7230 7230
#> Num. groups: childid:schoolid 1721
#> Num. groups: schoolid 60
#> Var: childid:schoolid (Intercept) 0.672
#> Var: schoolid (Intercept) 0.180
#> Var: Residual 0.334
#> R^2 0.004
#> Adj. R^2 0.003
#> ==============================================================
#> *** p < 0.001, ** p < 0.01, * p < 0.05
Based on Helix123's comment, I wrote the following plm() specification for a repeated measures random effects model with a nested structure, using Wallace and Hussain's (1969) method, i.e. random.method = "walhus", to estimate the variance components:
p_dta <- pdata.frame(dta, index = c("childid", "year", "schoolid"))
Model.3 <- plm(math ~ Female + size + year, data = p_dta, model = "random",
effect = "nested", random.method = "walhus")
The results, shown as Model 3 below, are as close to identical to the estimates in Model.1 as I could expect; only the intercept differs slightly (see output below).
I wrote the above based on the Baltagi, Song and Jung (2001) example provided in ?plm. In that example the variance components are estimated first with Swamy and Arora (1972), i.e. random.method = "swar", and then with Wallace and Hussain (1969); only the Nerlove (1971) transformation fails to converge on their data. With the egsingle dataset, however, Wallace and Hussain (1969) was the only method that converged.
Any authoritative references on this would be appreciated. I'll keep working at it.
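For reference, a small bookkeeping sketch for trying the documented random.method options on these data and seeing which ones converge (the method names are the ones listed in ?plm; the loop itself is just tryCatch housekeeping):

methods <- c("swar", "walhus", "amemiya", "nerlove")
fits <- lapply(methods, function(m) {
  tryCatch(
    plm(math ~ Female + size + year, data = p_dta, model = "random",
        effect = "nested", random.method = m),
    error = function(e) conditionMessage(e)   # keep the error message if the method fails
  )
})
names(fits) <- methods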
names(Model.3$coefficients) <- names(coefficients(Model.1)$schoolid)
texreg::screenreg(list(Model.1, Model.3), digits = 3,
custom.model.names = c('Model 1', 'Model 3'))
#> ==============================================================
#> Model 1 Model 3
#> --------------------------------------------------------------
#> (Intercept) -2.693 *** -2.697 ***
#> (0.152) (0.152)
#> Female 0.008 0.008
#> (0.042) (0.042)
#> size -0.000 -0.000
#> (0.000) (0.000)
#> year-1.5 0.866 *** 0.866 ***
#> (0.059) (0.059)
#> year-0.5 1.870 *** 1.870 ***
#> (0.058) (0.058)
#> year0.5 2.562 *** 2.562 ***
#> (0.059) (0.059)
#> year1.5 3.133 *** 3.133 ***
#> (0.059) (0.059)
#> year2.5 3.939 *** 3.939 ***
#> (0.060) (0.060)
#> --------------------------------------------------------------
#> AIC 16590.715
#> BIC 16666.461
#> Log Likelihood -8284.357
#> Num. obs. 7230 7230
#> Num. groups: childid:schoolid 1721
#> Num. groups: schoolid 60
#> Var: childid:schoolid (Intercept) 0.672
#> Var: schoolid (Intercept) 0.180
#> Var: Residual 0.334
#> R^2 0.000
#> Adj. R^2 -0.001
#> ==============================================================
#> *** p < 0.001, ** p < 0.01, * p < 0.05
