Convert part of a statistical function's output into a data.frame


I was wondering if there might be a way to turn the following part of the OUTPUT of the res and res2 objects into a data.frame?
Note: answer below works with res but not res2.
A functional answer is appreciated, as the data below is just a toy example.
library(metafor)
dat <- dat.konstantopoulos2011
res <- rma.mv(yi, vi, random = ~ 1 | district/school, data=dat)
#== OUTPUT (CAN WE TURN ONLY BELOW PART INTO A data.frame?):
#Variance Components:
# estim sqrt nlvls fixed factor
#sigma^2.1 0.0651 0.2551 11 no district
#sigma^2.2 0.0327 0.1809 56 no district/school
#Test for Heterogeneity:
#Q(df = 55) = 578.8640, p-val < .0001
# AND
res2 <- rma.mv(yi, vi, random = ~ factor(school) | district, data=dat)
#== OUTPUT (CAN WE TURN ONLY BELOW PART INTO A data.frame?):
#Variance Components:
#outer factor: district (nlvls = 11)
#inner factor: factor(school) (nlvls = 11)
# estim sqrt fixed
#tau^2 0.0978 0.3127 no
#rho 0.6653 no
#Test for Heterogeneity:
#Q(df = 55) = 578.8640, p-val < .0001

If there is no default/standard way to extract the data then you can manipulate the output using capture.output.
return_data <- function(res) {
tmp <- capture.output(res)
#data start from second line after "Variance Components:"
start <- which(tmp == "Variance Components:") + 2
index <- which(tmp == "")
#Data ends before the empty line after "Variance Components:"
end <- index[which.max(index > start)] - 1
data <- read.table(text = paste0(tmp[start:end], collapse = '\n'), header = T)
heterogeneity_index <- which(tmp == "Test for Heterogeneity:") + 1
list(data = data, heterogeneity = tmp[heterogeneity_index])
}
res <- rma.mv(yi, vi, random = ~ 1 | district/school, data=dat)
return_data(res)
#$data
# estim sqrt nlvls fixed factor
#sigma^2.1 0.0651 0.2551 11 no district
#sigma^2.2 0.0327 0.1809 56 no district/school
#$heterogeneity
#[1] "Q(df = 55) = 578.8640, p-val < .0001"
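
The same parsing steps can be tried without fitting a model. The sketch below is only an illustration: the `printed` vector is a hand-typed stand-in for what capture.output(res) returns (the spacing is an assumption), and the names `printed`, `blank` and `vc` are not from the original answer.

```r
# Hand-typed stand-in for capture.output(res); exact spacing is assumed
printed <- c(
  "",
  "Variance Components:",
  "",
  "            estim    sqrt  nlvls  fixed           factor",
  "sigma^2.1  0.0651  0.2551     11     no         district",
  "sigma^2.2  0.0327  0.1809     56     no  district/school",
  "",
  "Test for Heterogeneity:",
  "Q(df = 55) = 578.8640, p-val < .0001",
  ""
)

# The table header sits two lines below the section title
start <- which(printed == "Variance Components:") + 2
# The table ends just before the first blank line after the header
blank <- which(printed == "")
end <- blank[which.max(blank > start)] - 1

# The header has one field fewer than the data rows, so read.table
# uses the first column as row names
vc <- read.table(text = printed[start:end], header = TRUE)
vc
```

This layout-based approach depends on the printed format, which is why it breaks for res2: there the "outer factor:" and "inner factor:" lines shift where the table starts.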

Would this suit your purposes? The 'Test for Heterogeneity' doesn't really fit in the data frame, so I added it as a separate column and it gets duplicated as a result. I'm not sure how else you could do it.
library(tidyverse)
#install.packages("metafor")
library(metafor)
#> Loading required package: Matrix
#>
#> Attaching package: 'Matrix'
#> The following objects are masked from 'package:tidyr':
#>
#> expand, pack, unpack
#>
#> Loading the 'metafor' package (version 3.0-2). For an
#> introduction to the package please type: help(metafor)
dat <- dat.konstantopoulos2011
res <- rma.mv(yi, vi, random = ~ 1 | district/school, data=dat)
res
#>
#> Multivariate Meta-Analysis Model (k = 56; method: REML)
#>
#> Variance Components:
#>
#> estim sqrt nlvls fixed factor
#> sigma^2.1 0.0651 0.2551 11 no district
#> sigma^2.2 0.0327 0.1809 56 no district/school
#>
#> Test for Heterogeneity:
#> Q(df = 55) = 578.8640, p-val < .0001
#>
#> Model Results:
#>
#> estimate se zval pval ci.lb ci.ub
#> 0.1847 0.0846 2.1845 0.0289 0.0190 0.3504 *
#>
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
vc <- cbind(estim = res$sigma2,
sqrt = res$sigma,
nlvls = res$s.nlevels,
fixed = ifelse(res$vc.fix$sigma2, "yes", "no"),
factor = res$s.names,
R = ifelse(res$Rfix, "yes", "no"),
Test_for_heterogeneity = paste0("Q(df = ", res$k - res$p, ") = ", metafor:::.fcf(res$QE, res$digits[["test"]]), ", p-val ", metafor:::.pval(res$QEp,
res$digits[["pval"]], showeq = TRUE, sep = " "))
)
rownames(vc) <- c("sigma^2.1", "sigma^2.2")
result <- as.data.frame(vc)
result
#> estim nlvls fixed factor R Test_for_heterogeneity
#> sigma^2.1 "0.0650619442753117" "11" "no" "district" "no" "Q(df = 55) = 578.8640, p-val < .0001"
#> sigma^2.2 "0.0327365170279351" "56" "no" "district/school" "no" "Q(df = 55) = 578.8640, p-val < .0001"
Created on 2021-10-06 by the reprex package (v2.0.1)

Related

How can the effect size of a PERMANOVA be calculated?

I use the "vegan" package to perform a PERMANOVA (adonis2()), and I also want to calculate the effect size (ω²). For this, I tried to use omega_squared() from the "effectsize" package, but it failed. I think it does not understand the output table, specifically the part with the mean squares. Is it possible to fix this, or do I have to calculate it manually?
library(vegan)
#> Loading required package: permute
#> Loading required package: lattice
#> This is vegan 2.6-4
library(effectsize)
data(dune)
data(dune.env)
ado <- adonis2(dune ~ Management, data = dune.env, permutations = 100)
ado
#> Permutation test for adonis under reduced model
#> Terms added sequentially (first to last)
#> Permutation: free
#> Number of permutations: 100
#>
#> adonis2(formula = dune ~ Management, data = dune.env, permutations = 100)
#> Df SumOfSqs R2 F Pr(>F)
#> Management 3 1.4686 0.34161 2.7672 0.009901 **
#> Residual 16 2.8304 0.65839
#> Total 19 4.2990 1.00000
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
w2 <- omega_squared(ado)
#> Error in `[[<-.data.frame`(`*tmp*`, "Mean_Square", value = numeric(0)): replacement has 0 rows, data has 3
interpret_omega_squared(w2)
#> Error in interpret(es, rules): object 'w2' not found
Created on 2022-11-15 with reprex v2.0.2
EDIT
I tried to do it manually:
library(vegan, quietly = T, warn.conflicts = F)
#> This is vegan 2.6-4
library(effectsize)
library(dplyr, quietly = T, warn.conflicts = F)
library(tibble)
library(purrr)
data(dune)
data(dune.env)
ado <- adonis2(dune ~ Management, data = dune.env, permutations = 100)
w2 <- omega_squared(ado) # Does not work
#> Error in `[[<-.data.frame`(`*tmp*`, "Mean_Square", value = numeric(0)): replacement has 0 rows, data has 3
interpret_omega_squared(w2) # Does not work
#> Error in interpret(es, rules): object 'w2' not found
ado_tidy <- tibble( # manually create Adonis test result table
parameter = c("Management", "Residual", "Total"),
df = ado %>% pull("Df"), # Degree of freedom
ss = ado %>% pull("SumOfSqs"), # sum of squares
meansqs = ss / df, # mean squares
p_r2 = ado %>% pull("R2"), # partial R²
f = ado %>% pull("F"), # F value
p = ado %>% pull("Pr(>F)") # p value
)
ado_tidy
#> # A tibble: 3 x 7
#> parameter df ss meansqs p_r2 f p
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Management 3 1.47 0.490 0.342 2.77 0.00990
#> 2 Residual 16 2.83 0.177 0.658 NA NA
#> 3 Total 19 4.30 0.226 1 NA NA
# Formula:
# W2 = (DFm * (F - 1)) / ((DFm * (F - 1)) + (DFm + 1))
W2 <- abs(
(ado_tidy %>% pull(df) %>% chuck(3) * (ado_tidy %>% pull(f) %>% chuck(1) - 1)) /
((ado_tidy %>% pull(df) %>% chuck(3) * (ado_tidy %>% pull(f) %>% chuck(1) - 1) +
ado_tidy %>% pull(df) %>% chuck(3) + 1)
)
)
W2
#> [1] 0.6267099
interpret_omega_squared(W2, rules = "field2013")
#> [1] "large"
#> (Rules: field2013)
Created on 2022-11-15 with reprex v2.0.2
Hopefully, the equation is correct...

Here is the MicEco::adonis_OmegaSq function edited so that it works both with the current vegan::adonis2 and deprecated vegan::adonis:
#' Calculate (partial) Omega-squared (effect-size calculation) for PERMANOVA and add it to the input object
#'
#' @param adonisOutput An adonis object
#' @param partial Should partial omega-squared be calculated (sample size adjusted). Default TRUE
#' @return Original adonis object with the (partial) Omega-squared values added
#' @import vegan
#' @export
adonis_OmegaSq <- function(adonisOutput, partial = TRUE){
if(!(is(adonisOutput, "adonis") || is(adonisOutput, "anova.cca")))
stop("Input should be an adonis object")
if (is(adonisOutput, "anova.cca")) {
aov_tab <- adonisOutput
aov_tab$MeanSqs <- aov_tab$SumOfSqs / aov_tab$Df
aov_tab$MeanSqs[length(aov_tab$Df)] <- NA
} else {
aov_tab <- adonisOutput$aov.tab
}
heading <- attr(aov_tab, "heading")
MS_res <- aov_tab[pmatch("Residual", rownames(aov_tab)), "MeanSqs"]
SS_tot <- aov_tab[rownames(aov_tab) == "Total", "SumsOfSqs"]
N <- aov_tab[rownames(aov_tab) == "Total", "Df"] + 1
if(partial){
omega <- apply(aov_tab, 1, function(x) (x["Df"]*(x["MeanSqs"]-MS_res))/(x["Df"]*x["MeanSqs"]+(N-x["Df"])*MS_res))
aov_tab$parOmegaSq <- c(omega[1:(length(omega)-2)], NA, NA)
} else {
omega <- apply(aov_tab, 1, function(x) (x["SumsOfSqs"]-x["Df"]*MS_res)/(SS_tot+MS_res))
aov_tab$OmegaSq <- c(omega[1:(length(omega)-2)], NA, NA)
}
if (is(adonisOutput, "adonis"))
cn_order <- c("Df", "SumsOfSqs", "MeanSqs", "F.Model", "R2",
if (partial) "parOmegaSq" else "OmegaSq", "Pr(>F)")
else
cn_order <- c("Df", "SumOfSqs", "F", if (partial) "parOmegaSq" else "OmegaSq",
"Pr(>F)")
aov_tab <- aov_tab[, cn_order]
attr(aov_tab, "names") <- cn_order
attr(aov_tab, "heading") <- heading
if (is(adonisOutput, "adonis"))
adonisOutput$aov.tab <- aov_tab
else
adonisOutput <- aov_tab
return(adonisOutput)
}
source() this function and it should work. In my test it gave the same results for both adonis2 and adonis.
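
As a quick check on the partial formula used in the function above, the following base-R sketch plugs in the rounded values printed in the adonis2 table from the question. The variable names are made up for illustration, and the rounding means the result is only approximate.

```r
# Rounded values copied from the adonis2 table in the question
df_m   <- 3        # Df of the Management term
ss_m   <- 1.4686   # SumOfSqs of the Management term
df_res <- 16       # Df of the Residual row
ss_res <- 2.8304   # SumOfSqs of the Residual row
n      <- 19 + 1   # Total Df + 1

ms_m   <- ss_m / df_m       # mean square of the term
ms_res <- ss_res / df_res   # residual mean square
f_val  <- ms_m / ms_res     # pseudo-F

# Partial omega-squared, written with mean squares as in the function above
omega_p_ms <- df_m * (ms_m - ms_res) / (df_m * ms_m + (n - df_m) * ms_res)

# The same quantity written in terms of F (divide through by ms_res)
omega_p_f <- df_m * (f_val - 1) / (df_m * (f_val - 1) + n)

c(from_mean_squares = omega_p_ms, from_F = omega_p_f)   # both about 0.21
```

Both forms use the degrees of freedom of the term itself (3 here). The manual attempt in the question pulls row 3 of the df column, which is the Total row (19), and that is why it arrives at a much larger value.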

paste formula in function (c constant?)

I'm writing a function that pastes together a formula and then returns a feols result. However, I get a c at the beginning of the formula. How can I solve this?
library(dplyr)
library(fixest)
data(base_did)
base_did = base_did %>% mutate(D = 5*rnorm(1080),
x2 = 10*rnorm(1080),
rand_wei = abs(rnorm(1080)))
f <- function(data, arg=NULL){
arg = enexpr(arg)
if (length(arg) == 0) {
formula = "D ~ 1"
}
else {
formula = paste(arg, collapse = " + ")
formula = paste("D ~ ", formula, sep = "")
}
formula = paste(formula, " | id + period", sep = "")
denom.lm <- feols(as.formula(formula), data = data,
weights = abs(data$rand_wei))
return(denom.lm)
}
f(base_did, arg = c(x1,x2))
#Error in feols(as.formula(formula), data = data, weights = abs(data$rand_wei)) :
# Evaluation of the right-hand-side of the formula raises an error:
# In NULL: Evaluation of .Primitive("c") returns an object of length 1
#while the data set has 1080 rows.
If I return(formula) at the end, I get [1] "D ~ c + x1 + x2 | id + period".
But I need only D ~ x1 + x2 | id + period.

Perhaps one option to make your function work would be to pass the arguments via ... so that c is not needed, which prevents the c from being added to your formula. To make this work you also have to switch to enexprs inside your function.
Note: I slightly adjusted your function for the reprex to return just the formula.
library(dplyr, warn = FALSE)
library(fixest)
data(base_did)
base_did = base_did %>% mutate(D = 5*rnorm(1080),
x2 = 10*rnorm(1080),
rand_wei = abs(rnorm(1080)))
f <- function(data, ...){
arg = enexprs(...)
if (length(arg) == 0) {
formula = "D ~ 1"
}
else {
formula = paste(arg, collapse = " + ")
formula = paste("D ~ ", formula, sep = "")
}
formula = paste(formula, " | id + period", sep = "")
as.formula(formula)
}
f(base_did, x1, x2)
#> D ~ x1 + x2 | id + period
#> <environment: 0x7fe8f3567618>
f(base_did)
#> D ~ 1 | id + period
#> <environment: 0x7fe8f366f848>
UPDATE There is probably a better approach but after some research a possible option would be:
Note: When passing multiple arguments via c enexpr will return a call object which behaves like a list and where the first element contains the function name, i.e. c. That's why you get the c added to your formula.
f <- function(data, arg = NULL) {
arg <- enexpr(arg)
if (length(arg) == 0) {
formula <- "D ~ 1"
} else {
if (length(arg) > 1) arg <- vapply(as.list(arg[-1]), rlang::as_string, FUN.VALUE = character(1))
formula <- paste(arg, collapse = " + ")
formula <- paste("D ~ ", formula, sep = "")
}
formula <- paste(formula, " | id + period", sep = "")
as.formula(formula)
}
f(base_did, c(x1, x2))
#> D ~ x1 + x2 | id + period
#> <environment: 0x7fa763431388>
f(base_did, x1)
#> D ~ x1 | id + period
#> <environment: 0x7fa763538c40>
f(base_did)
#> D ~ 1 | id + period
#> <environment: 0x7fa765e22028>
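
The point in the note above, that an unevaluated c(x1, x2) is a call whose first element is the function name, can also be seen with base R alone. This is only a sketch: it uses substitute() in place of enexpr(), and the helper name `parts_of` is made up for illustration.

```r
# Capture the argument unevaluated and list its parts as strings
parts_of <- function(arg) {
  expr <- substitute(arg)
  if (is.call(expr)) {
    # A call behaves like a list: function name first, then the arguments
    vapply(as.list(expr), deparse, character(1))
  } else {
    # A bare symbol such as x1 has no function name in front
    deparse(expr)
  }
}

parts_of(c(x1, x2))
#> [1] "c"  "x1" "x2"

# Dropping the first element removes the stray "c" before pasting
paste(parts_of(c(x1, x2))[-1], collapse = " + ")
#> [1] "x1 + x2"
```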

You can tremendously simplify your function if you use fixest's built-in formula manipulation tools, in particular the dot-square-bracket operator:
library(fixest)
data(base_did)
n = 1080
base = within(base_did, {
D = 5 * rnorm(n)
x2 = 10 *rnorm(n)
rand_wei = abs(rnorm(n))
})
f = function(data, ctrl = "1"){
feols(D ~ .[ctrl] | id + period,
data = data, weights = ~abs(rand_wei))
}
est1 = f(base)
est2 = f(base, ~x1) # with a formula
est3 = f(base, c("x1", "x2")) # with a character vector
etable(est1, est2, est3)
#> est1 est2 est3
#> Dependent Var.: D D D
#>
#> x1 0.0816 (0.0619) 0.0791 (0.0618)
#> x2 -0.0157 (0.0186)
#> Fixed-Effects: -------- --------------- ----------------
#> id Yes Yes Yes
#> period Yes Yes Yes
#> _______________ ________ _______________ ________________
#> S.E. type -- by: id by: id
#> Observations 1,080 1,080 1,080
#> R2 0.12810 0.13005 0.13094
#> Within R2 -- 0.00224 0.00326
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
EDIT: note that the formula thing (est2) only works with version >= 0.11.0.

Problem using function with "as.formula" in a loop in R

I've created a function that returns an ANOVA table, and it uses formula to create the formula for the oneway.test function.
A simplified version of the function is:
anova_table <- function(df, dv, group){
dv_t <- deparse(substitute(dv))
group_t <- deparse(substitute(group))
anova <- oneway.test(formula = formula(paste(dv_t, "~", group_t)),
data = df,
var.equal = F)
return(anova)
}
It works fine when I use it outside a loop:
data("mpg")
mpg <- mpg %>% mutate_if(is.character, as.factor)
anova_table(mpg, displ, drv)
However, I'd like it to work also inside a loop.
When I try the following code, I get this error message:
"Error in model.frame.default(formula = formula(paste(dv_t, "~", group_t)), :
object is not a matrix"
I'm not sure what I'm doing wrong.
vars_sel <- mpg %>% select(where(is.numeric)) %>% names()
vars_sel <- dput(vars_sel)
vars_sel <- syms(vars_sel)
for(i in vars_sel){
var <- sym(i)
print(anova_table(mpg, var, drv))
}
Any help would be much appreciated!

Because of how your function works, the var in your loop is being taken literally, so the function is looking for a column called var in mpg which doesn't exist. You can get round this by building and evaluating a call to your function in the loop:
for(i in vars_sel){
a <- eval(as.call(list(anova_table, df = mpg, dv = i, group = quote(drv))))
print(a)
}
#>
#> One-way analysis of means (not assuming equal variances)
#>
#> data: displ and drv
#> F = 143.9, num df = 2.000, denom df = 67.605, p-value < 2.2e-16
#>
#>
#> One-way analysis of means (not assuming equal variances)
#>
#> data: year and drv
#> F = 0.59072, num df = 2.000, denom df = 67.876, p-value = 0.5567
#>
#>
#> One-way analysis of means (not assuming equal variances)
#>
#> data: cyl and drv
#> F = 129.2, num df = 2.000, denom df = 82.862, p-value < 2.2e-16
#>
#>
#> One-way analysis of means (not assuming equal variances)
#>
#> data: cty and drv
#> F = 89.54, num df = 2.000, denom df = 78.879, p-value < 2.2e-16
#>
#>
#> One-way analysis of means (not assuming equal variances)
#>
#> data: hwy and drv
#> F = 127.14, num df = 2.000, denom df = 71.032, p-value < 2.2e-16
Created on 2022-09-25 with reprex v2.0.2
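
The behaviour described in this answer can be reproduced with a tiny base-R function. The sketch below is an illustration only; `name_of` is a made-up stand-in for the deparse(substitute()) step inside anova_table.

```r
# Stand-in for the first line of anova_table(): capture the argument's name
name_of <- function(dv) deparse(substitute(dv))

var <- as.name("displ")

# Called directly, the function sees the literal argument name
name_of(var)
#> [1] "var"

# Building the call first places the symbol itself into the call,
# so substitute() now finds displ
eval(as.call(list(name_of, dv = var)))
#> [1] "displ"
```

An alternative design, if the function can be changed, is to accept column names as strings and build the formula with reformulate(), for example reformulate("drv", response = "displ"), which avoids capturing argument names altogether.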

Extracting P-value column from output Anova (car package)

I am using the 'car' package function Anova for some statistical testing.
It gives the following output:
Y = cbind(curdata$V1, curdata$V2, curdata$V3)
mymdl = lm(Y ~ curdata$V4 + curdata$V5)
myanova = Anova(mymdl)
Type II MANOVA Tests: Pillai test statistic
Df test stat approx F num Df den Df Pr(>F)
curdata$V4 1 0.27941 2.9728 3 23 0.05280 .
curdata$V5 1 0.33570 3.8743 3 23 0.02228 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I would like to extract the values in the 'Pr(>F)' column, so I can place these p-values in another matrix for later correction of multiple comparisons.
I have tried using unlist, but it still does not provide the p-values found in the column.
Any help with this would be greatly appreciated.

If we have multiple response variables, it is a Manova. We could capture the output and use regex
as.numeric(sub(".*\\s*(\\d+\\.[0-9e-]+)\\s*[*.]*", "\\1", capture.output(out)[4:5]))
#[1] 8.836e-06 2.200e-16
data
mymdl <- lm(cbind(Sepal.Length, Sepal.Width) ~ Petal.Width +
Petal.Length, data = iris)
out <- Anova(mymdl)

Maybe not the most practical way, but you can split the captured output into columns using separate() from tidyr:
library(car)
library(dplyr)
library(tidyr)
#Code
v1 <- data.frame(capture.output(myanova))
v1 <- v1[3:5,,drop=F]
names(v1)<-'v1'
v2 <- separate(v1,v1,c(paste0('v',1:21)),sep = '\\s')
v2 <- v2[-1,]
Output:
as.numeric(v2$v21)
[1] 8.836e-06 2.200e-16
Warning: you will need to adjust 1:21 if the captured output contains more columns.

TLDR:
# define helper:
get_summary_for_print <- car:::print.Anova.mlm
body(get_summary_for_print) <- local({tmp <- body(get_summary_for_print);tmp[-(length(tmp)-(0:1))]})
#use it:
get_summary_for_print(Anova(mymdl))$`Pr(>F)`
Unfortunately there is no designated way. But you can look at the source of car:::print.Anova.mlm (by typing this in the R console) to learn how it gets the values you want:
function (x, ...)
{
if ((!is.null(x$singular)) && x$singular)
stop("singular error SSP matrix; multivariate tests unavailable\ntry summary(object, multivariate=FALSE)")
test <- x$test
repeated <- x$repeated
ntests <- length(x$terms)
tests <- matrix(NA, ntests, 4)
if (!repeated)
SSPE.qr <- qr(x$SSPE)
for (term in 1:ntests) {
eigs <- Re(eigen(qr.coef(if (repeated) qr(x$SSPE[[term]]) else SSPE.qr,
x$SSP[[term]]), symmetric = FALSE)$values)
tests[term, 1:4] <- switch(test, Pillai = Pillai(eigs,
x$df[term], x$error.df), Wilks = Wilks(eigs, x$df[term],
x$error.df), `Hotelling-Lawley` = HL(eigs, x$df[term],
x$error.df), Roy = Roy(eigs, x$df[term], x$error.df))
}
ok <- tests[, 2] >= 0 & tests[, 3] > 0 & tests[, 4] > 0
ok <- !is.na(ok) & ok
tests <- cbind(x$df, tests, pf(tests[ok, 2], tests[ok, 3],
tests[ok, 4], lower.tail = FALSE))
rownames(tests) <- x$terms
colnames(tests) <- c("Df", "test stat", "approx F", "num Df",
"den Df", "Pr(>F)")
tests <- structure(as.data.frame(tests), heading = paste("\nType ",
x$type, if (repeated)
" Repeated Measures", " MANOVA Tests: ", test, " test statistic",
sep = ""), class = c("anova", "data.frame"))
print(tests, ...)
invisible(x)
}
<bytecode: 0x56032ea80990>
<environment: namespace:car>
In this case, there are quite a few lines of code involved in computing the p-values. However, we can easily create a modified version of the print function that returns the table (tests) instead of only printing it (print(tests, ...)) and returning the original object (invisible(x)):
get_summary_for_print <- car:::print.Anova.mlm # copy the original print function (including its environment)
body(get_summary_for_print) <- # replace the code of our copy
local({ # to avoid pollution of environment by tmp
tmp <- body(get_summary_for_print) # to avoid code duplication
tmp[-(length(tmp)-(0:1))] # remove the last two code lines of the function
})
And use it for example like this:
library(car)
#> Loading required package: carData
res <- Anova(lm(cbind(Sepal.Width, Sepal.Length, Petal.Width) ~ Species + Petal.Length, iris))
res
#>
#> Type II MANOVA Tests: Pillai test statistic
#> Df test stat approx F num Df den Df Pr(>F)
#> Species 2 0.70215 26.149 6 290 < 2.2e-16 ***
#> Petal.Length 1 0.63487 83.461 3 144 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
str(get_summary_for_print(res))
#> Classes 'anova' and 'data.frame': 2 obs. of 6 variables:
#> $ Df : num 2 1
#> $ test stat: num 0.702 0.635
#> $ approx F : num 26.1 83.5
#> $ num Df : num 6 3
#> $ den Df : num 290 144
#> $ Pr(>F) : num 7.96e-25 2.41e-31
#> - attr(*, "heading")= chr "\nType II MANOVA Tests: Pillai test statistic"
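
The body-trimming trick can be tried on a toy function without car installed. In this sketch `print_tbl` and `get_tbl` are made-up names; the toy function only mimics the shape of a print method that builds a table, prints it and returns its input invisibly.

```r
# Toy print-style function: builds a table, prints it, returns the input
print_tbl <- function(x, ...) {
  tbl <- data.frame(value = x, squared = x^2)
  print(tbl, ...)
  invisible(x)
}

# Copy it, then drop the last two expressions of the body
get_tbl <- print_tbl
b <- body(get_tbl)
body(get_tbl) <- b[-(length(b) - (0:1))]

# The assignment is now the last expression, so the table is the
# (invisible) return value
res_tbl <- get_tbl(1:3)
str(res_tbl)
```

Because this relies on the position of expressions inside another function's body, it can break silently if the original function changes in a later package version.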

Why is lm_robust() HC3 standard error smaller than coeftest() HC0 standard error?

I am using lm_robust from the package 'estimatr' for a fixed-effects model with HC3 robust standard errors. I had to switch from vcovHC() because my data sample was just too large to be handled by it.
I use the following line for the regression:
lm_robust(log(SPREAD) ~ PERIOD, data = dat, fixed_effects = ~ STOCKS + TIME, se_type = "HC3")
The code runs fine, and the coefficients are the same as those from a fixed-effects model fitted with package plm. Since I cannot use coeftest to estimate HC3 standard errors with the plm output because the data sample is too large, I compared the HC3 estimator of lm_robust with the HC1 of coeftest(model, vcov = vcovHC(model, type = "HC1")).
As a result, the HC3 standard error of lm_robust is much smaller than the HC1 standard error from coeftest.
Does anybody have an explanation, since HC3 should be more restrictive than HC1? I appreciate any recommendations and solutions.
EDIT: model used for coeftest:
plm(log(SPREAD) ~ PERIOD, data = dat, index = c("STOCKS", "TIME"), effect = "twoways", model = "within")

It appears that the vcovHC() method for plm automatically estimates cluster-robust standard errors, while lm_robust() does not unless clusters are specified. Therefore, the HC1 estimate of the standard error for plm will appear inflated compared to lm_robust (or lm, for that matter).
Using some toy data:
library(sandwich)
library(tidyverse)
library(plm)
library(estimatr)
library(lmtest)
set.seed(1981)
x <- sin(1:1000)
y <- 1 + x + rnorm(1000)
f <- as.character(sort(rep(sample(1:100), 10)))
t <- as.character(rep(sort(sample(1:10)), 100))
dat <- tibble(y = y, x = x, f = f, t = t)
lm_fit <- lm(y ~ x + f + t, data = dat)
plm_fit <- plm(y ~ x, index = c("f", "t"), model = "within", effect = "twoways", data = dat)
rb_fit <- lm_robust(y ~ x, fixed_effects = ~ f + t, data = dat, se_type = "HC1", return_vcov = TRUE)
sqrt(vcovHC(lm_fit, type = "HC1")[2, 2])
#> [1] 0.04752337
sqrt(vcovHC(plm_fit, type = "HC1"))
#> x
#> x 0.05036414
#> attr(,"cluster")
#> [1] "group"
sqrt(rb_fit$vcov)
#> x
#> x 0.04752337
rb_fit <- lm_robust(y ~ x, fixed_effects = ~ f + t, data = dat, se_type = "HC3", return_vcov = TRUE)
sqrt(vcovHC(lm_fit, type = "HC3")[2, 2])
#> [1] 0.05041177
sqrt(vcovHC(plm_fit, type = "HC3"))
#> x
#> x 0.05042142
#> attr(,"cluster")
#> [1] "group"
sqrt(rb_fit$vcov)
#> x
#> x 0.05041177
There do not appear to be equivalent cluster-robust standard error types in the two packages. However, the SEs get closer when specifying cluster-robust SEs in lm_robust():
rb_fit <- lm_robust(y ~ x, fixed_effects = ~ f + t, clusters = f, data = dat, se_type = "CR0")
summary(rb_fit)
#>
#> Call:
#> lm_robust(formula = y ~ x, data = dat, clusters = f, fixed_effects = ~f +
#> t, se_type = "CR0")
#>
#> Standard error type: CR0
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
#> x 0.925 0.05034 18.38 1.133e-33 0.8251 1.025 99
#>
#> Multiple R-squared: 0.3664 , Adjusted R-squared: 0.2888
#> Multiple R-squared (proj. model): 0.3101 , Adjusted R-squared (proj. model): 0.2256
#> F-statistic (proj. model): 337.7 on 1 and 99 DF, p-value: < 2.2e-16
coeftest(plm_fit, vcov. = vcovHC(plm_fit, type = "HC1"))
#>
#> t test of coefficients:
#>
#> Estimate Std. Error t value Pr(>|t|)
#> x 0.925009 0.050364 18.366 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Created on 2020-04-16 by the reprex package (v0.3.0)
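
To see why clustering changes the standard error, the two sandwich estimators can be written out by hand in base R. This is a minimal sketch on simulated data using the textbook HC0 and CR0 formulas, without small-sample corrections; it is not a reproduction of what plm or estimatr do internally, and all names in it are made up for illustration.

```r
set.seed(1)
n_groups  <- 20
per_group <- 10
g <- rep(seq_len(n_groups), each = per_group)   # cluster id
x <- rnorm(n_groups * per_group)
# A shared group-level shock makes errors correlated within clusters
y <- 1 + x + rnorm(n_groups)[g] + rnorm(n_groups * per_group)

fit   <- lm(y ~ x)
X     <- model.matrix(fit)
e     <- residuals(fit)
bread <- solve(crossprod(X))        # (X'X)^-1

# HC0: every observation contributes its own squared score
meat_hc0 <- crossprod(X * e)
se_hc0   <- sqrt(diag(bread %*% meat_hc0 %*% bread))

# CR0: scores are summed within each cluster before the outer product
scores   <- rowsum(X * e, g)
meat_cr0 <- crossprod(scores)
se_cr0   <- sqrt(diag(bread %*% meat_cr0 %*% bread))

rbind(HC0 = se_hc0, CR0 = se_cr0)
```

The only difference between the two is the "meat" of the sandwich, so comparing an HC-type estimate from one package with a clustered estimate from another mixes two different estimators rather than two variants of the same one.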
