I want to extract the standard errors from the output of the tsls command in the sem R package.
Using some generic code as an example:
fit = tsls(Y ~ X, ~Z)
summary(fit)
The summary function outputs several things besides the regression estimates (e.g., model formulas, summary of the residuals).
I want an equivalent to fit$coef that outputs standard errors. But that doesn't seem to be an option. All the code used to do the equivalent for glm and lm output doesn't seem to work here. Is there any way to hack the output?
Sometimes it takes a little bit of digging to find where these values are coming from. The best place to look, if you don't get any clues from str(fit), would be to look at what summary.tsls is doing.
With some help from getAnywhere("summary.tsls"), we see:
getAnywhere("summary.tsls")
# A single object matching ‘summary.tsls’ was found
# It was found in the following places
# registered S3 method for summary from namespace sem
# namespace:sem
# with value
#
# function (object, digits = getOption("digits"), ...)
# {
# ###
# ### \\\SNIP///
# ###
# std.errors <- sqrt(diag(object$V))
# ###
# ### \\\SNIP///
# ###
# }
# <bytecode: 0x503c530>
# <environment: namespace:sem>
So, to get the value you are looking for, you need to calculate it yourself with:
sqrt(diag(fit$V))
A reproducible example:
library(sem)
fit <- tsls(Q ~ P + D, ~ D + F + A, data=Kmenta)
summary(fit)
#
# 2SLS Estimates
#
# Model Formula: Q ~ P + D
#
# Instruments: ~D + F + A
#
# Residuals:
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# -3.4300 -1.2430 -0.1895 0.0000 1.5760 2.4920
#
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 94.63330387 7.92083831 11.94738 1.0762e-09 ***
# P -0.24355654 0.09648429 -2.52431 0.021832 *
# D 0.31399179 0.04694366 6.68869 3.8109e-06 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 1.9663207 on 17 degrees of freedom
sqrt(diag(fit$V))
# (Intercept) P D
# 7.92083831 0.09648429 0.04694366
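The same digging steps can be tried on a model class that ships with base R. The following is only a sketch on a built-in lm fit (the mtcars data and the object names are illustrative assumptions, not part of the question); it shows that the standard errors printed by summary() are the square roots of the diagonal of the coefficient covariance matrix.

```r
# Illustration on a built-in lm fit; the object names here are hypothetical.
fit_lm <- lm(mpg ~ wt + hp, data = mtcars)

# 1. See which components the fitted object stores.
names(fit_lm)

# 2. Retrieve the summary method to read how it computes what it prints.
summary_method <- getS3method("summary", "lm")

# 3. Standard errors are the square roots of the diagonal of the
#    coefficient covariance matrix.
se_manual <- sqrt(diag(vcov(fit_lm)))

# For lm, summary() stores the same numbers in its coefficient table.
se_summary <- coef(summary(fit_lm))[, "Std. Error"]
all.equal(se_manual, se_summary)
```

For a tsls fit the covariance matrix is the V component, as shown above, so sqrt(diag(fit$V)) plays the role of se_manual.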
For Y = % of population with income below poverty level and X = per capita income of population, I have constructed a box-cox plot and found that the lambda = 0.02020:
bc <- boxcox(lm(Percent_below_poverty_level ~ Per_capita_income, data=tidy.CDI), plotit=T)
bc$x[which.max(bc$y)] # gives lambda
Now I want to fit a simple linear regression using the transformed data, so I've entered this code
transform <- lm((Percent_below_poverty_level**0.02020) ~ (Per_capita_income**0.02020))
transform
But all I get is the error message
'Error in terms.formula(formula, data = data) : invalid power in formula'. What is my mistake?
You could use bcPower() from the car package.
## make sure you do install.packages("car") if you haven't already
library(car)
data(Prestige)
p <- powerTransform(prestige ~ income + education + type ,
data=Prestige,
family="bcPower")
summary(p)
# bcPower Transformation to Normality
# Est Power Rounded Pwr Wald Lwr Bnd Wald Upr Bnd
# Y1 1.3052 1 0.9408 1.6696
#
# Likelihood ratio test that transformation parameter is equal to 0
# (log transformation)
# LRT df pval
# LR test, lambda = (0) 41.67724 1 1.0765e-10
#
# Likelihood ratio test that no transformation is needed
# LRT df pval
# LR test, lambda = (1) 2.623915 1 0.10526
mod <- lm(bcPower(prestige, 1.3052) ~ income + education + type, data=Prestige)
summary(mod)
#
# Call:
# lm(formula = bcPower(prestige, 1.3052) ~ income + education +
# type, data = Prestige)
#
# Residuals:
# Min 1Q Median 3Q Max
# -44.843 -13.102 0.287 15.073 62.889
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -3.736e+01 1.639e+01 -2.279 0.0250 *
# income 3.363e-03 6.928e-04 4.854 4.87e-06 ***
# education 1.205e+01 2.009e+00 5.999 3.78e-08 ***
# typeprof 2.027e+01 1.213e+01 1.672 0.0979 .
# typewc -1.078e+01 7.884e+00 -1.368 0.1746
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 22.25 on 93 degrees of freedom
# (4 observations deleted due to missingness)
# Multiple R-squared: 0.8492, Adjusted R-squared: 0.8427
# F-statistic: 131 on 4 and 93 DF, p-value: < 2.2e-16
Powers (more often represented by ^ than ** in R, FWIW) have a special meaning inside formulas [they represent interactions among variables rather than mathematical operations]. So if you did want to power-transform both sides of your equation you would use the I() or "as-is" operator:
I(Percent_below_poverty_level^0.02020) ~ I(Per_capita_income^0.02020)
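To see the difference without fitting anything, one can inspect how R parses a formula. This is a small sketch using placeholder variable names (y, a, b are assumptions and need not exist, since terms() only parses the formula).

```r
# Inside a formula, ^ expands terms into main effects and interactions.
expanded <- attr(terms(y ~ (a + b)^2), "term.labels")
expanded   # "a" "b" "a:b"

# Wrapping the expression in I() keeps it as ordinary arithmetic.
protected <- attr(terms(y ~ I(a^0.5)), "term.labels")
protected  # "I(a^0.5)"

# A fractional power without I() is rejected when the formula is parsed.
res <- tryCatch(terms(y ~ a^0.5), error = function(e) e)
inherits(res, "error")
```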
However, I think you should do what @DaveArmstrong suggested anyway:
it's only the response variable that gets transformed;
the Box-Cox transformation is actually (y^lambda-1)/lambda (although the shift and scale might not matter for your results).
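For reference, the formula can be written as a small helper. This is a sketch only: box_cox is a hypothetical name, the lambda = 0 branch uses the usual log limit, and mtcars stands in for the data in the question. car::bcPower() from the answer above does this job if you prefer a packaged version.

```r
# Box-Cox transform of a positive vector; lambda = 0 is the log limit.
box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

box_cox(4, 0.5)   # (sqrt(4) - 1) / 0.5 = 2

# Apply it to the response, which is what boxcox(lm(...)) profiles over,
# then fit as usual (built-in data used as a stand-in).
lambda <- 0.0202
fit_bc <- lm(box_cox(mpg, lambda) ~ wt, data = mtcars)
coef(fit_bc)
```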
I am using the nls() function in R to perform a nonlinear fit. I have specified my independent variable as follows:
t <- seq(1,7)
and my dependent variable as follows:
P <- c(0.0246, 0.2735, 0.5697, 0.6715, 0.8655, 0.9614, 1)
I then have tried:
m <- nls(P ~ 1 / (c + q*exp(-b*t))^(1/v)),
but every time I get:
"Error in c + q * exp(-b * t) : non-numeric argument to binary
operator"
Every one of my variables is numeric. Any ideas?
Thanks!
You have more than one problem in your script. The main issue is that you should never use names which are used by R: t is the matrix transpose, c is a common method to create vectors, and q is the quit instruction. nls() will not try to fit them, as they are already defined. I recommend using more meaningful and less dangerous variables such as Coef1, Coef2, …
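The error message can be reproduced in isolation. The sketch below assumes a fresh session in which no variables named t, c, or q have been defined, so the names resolve to the base R functions.

```r
# In a fresh session these names all refer to functions, not numbers.
are_functions <- sapply(list(t = t, c = c, q = q), is.function)
are_functions

# Adding a number to a function gives the error from the question.
msg <- tryCatch(c + 1, error = function(e) conditionMessage(e))
msg  # "non-numeric argument to binary operator"
```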
The second problem is that you are trying to fit a model with 4 parameters to a dataset with only 7 data points. This may yield singularities and other problems.
For the sake of the argument, I have reduced your model to three variables, and changed some names:
Time <- seq(1,7)
Prob <- c(0.0246, 0.2735, 0.5697, 0.6715, 0.8655, 0.9614, 1)
plot(Time, Prob)
And now we perform the nls() fit:
Fit <- nls(Prob ~ 1 / (Coef1 + Coef2 * exp(-Coef3 * Time)))
X <- data.frame(Time = seq(0, 7, length.out = 100))
Y <- predict(object = Fit, newdata = X)
lines(X$Time, Y)
And a summary of the results:
summary(Fit)
# Formula: Prob ~ 1/(Coef1 + Coef2 * exp(-Coef3 * Time))
#
# Parameters:
# Estimate Std. Error t value Pr(>|t|)
# Coef1 1.00778 0.06113 16.487 7.92e-05 ***
# Coef2 23.43349 14.42378 1.625 0.1796
# Coef3 1.04899 0.21892 4.792 0.0087 **
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.06644 on 4 degrees of freedom
#
# Number of iterations to convergence: 12
# Achieved convergence tolerance: 3.04e-06
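When start is omitted, nls() warns and initialises every parameter to 1. A variant that spells out those same starting values and passes the data as a data frame might look like this (growth and fit_start are hypothetical names; the numbers are the ones used above).

```r
growth <- data.frame(
  Time = 1:7,
  Prob = c(0.0246, 0.2735, 0.5697, 0.6715, 0.8655, 0.9614, 1)
)

# Passing start explicitly avoids the warning and documents the initial guess.
fit_start <- nls(
  Prob ~ 1 / (Coef1 + Coef2 * exp(-Coef3 * Time)),
  data  = growth,
  start = list(Coef1 = 1, Coef2 = 1, Coef3 = 1)
)
coef(fit_start)
```

For harder problems, starting values closer to the expected solution usually matter more than they do here.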
I know it is not exactly what you wanted, but I hope it helps.
I want to see whether the fixed effect Group2 in my model is significant. The model is:
Response ~ Group1 + Group2 + Gender + Age + BMI + (1 | Subject)
To check the significance I create a null model not containing the effect Group2:
Resp.null = lmer(Response~Group1+Gender+Age+BMI+(1|Subject),
data=mydata,REML=FALSE)
and the full model containing the effect Group2:
Resp.model = lmer(Response~Group1+Group2+Gender+Age+BMI+(1|Subject),
data=mydata,REML=FALSE)
Then I use anova() to compare the two, but I get an error:
anova(Resp.null, Resp.model)
## Error in anova.merMod(Resp.null, Resp.model) :
## models were not all fitted to the same size of dataset
I think that the problem is that Group1 contains NaN, but I thought that linear mixed models were robust to missing data.
How can I solve this problem and compare the two models?
Do I have to delete the rows corresponding to NaN and fit Resp.null without these rows?
The data can be downloaded here.
Please note that you should replace "<undefined>" with NA like this:
mydata = read.csv("mydata.csv")
mydata[mydata == "<undefined>"] <- NA
To avoid the "models were not all fitted to the same size of dataset" error in anova, you must fit both models on the exact same subset of data.
There are two simple ways to do this. This reproducible example uses lm and update, but the same approach should work for lmer objects (note that a merMod object stores its model frame in object@frame rather than object$model):
# 1st approach
# define a convenience wrapper
update_nested <- function(object, formula., ..., evaluate = TRUE){
update(object = object, formula. = formula., data = object$model, ..., evaluate = evaluate)
}
# prepare data with NAs
data(mtcars)
for(i in 1:ncol(mtcars)) mtcars[i,i] <- NA
xa <- lm(mpg~cyl+disp, mtcars)
xb <- update_nested(xa, .~.-cyl)
anova(xa, xb)
## Analysis of Variance Table
##
## Model 1: mpg ~ cyl + disp
## Model 2: mpg ~ disp
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 26 256.91
## 2 27 301.32 -1 -44.411 4.4945 0.04371 *
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 2nd approach
xc <- update(xa, .~.-cyl, data=na.omit(mtcars[ , all.vars(formula(xa))]))
anova(xa, xc)
## Analysis of Variance Table
##
## Model 1: mpg ~ cyl + disp
## Model 2: mpg ~ disp
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 26 256.91
## 2 27 301.32 -1 -44.411 4.4945 0.04371 *
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
If, however, you're only interested in testing a single variable (e.g. Group2), then perhaps Anova() or linearHypothesis() in car would work as well for this use case.
See also:
How to update `lm` or `glm` model on same subset of data?
R error which says "Models were not all fitted to the same size of dataset"
Fit Resp.model first, then use Resp.model@frame as data argument.
Resp.null = lmer(Response~Group1+Gender+Age+BMI+(1|Subject),
data=Resp.model@frame,REML=FALSE)
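The reason the row counts differ can be checked with base R alone. The sketch below uses lm on a modified copy of mtcars (an assumption made so that it runs without lme4); the idea of refitting the smaller model on the rows the larger model actually used is the same as in the answers above.

```r
# Built-in data with a few values blanked out in one predictor.
dat <- mtcars
dat$cyl[c(3, 10, 17)] <- NA

full <- lm(mpg ~ cyl + disp, data = dat)   # drops the 3 incomplete rows
null <- lm(mpg ~ disp, data = dat)         # keeps all 32 rows
c(full = nobs(full), null = nobs(null))    # 29 vs 32

# Refit the smaller model on the rows the larger model actually used.
null_same <- lm(mpg ~ disp, data = model.frame(full))
nobs(null_same) == nobs(full)
anova(null_same, full)
```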
I would like to get pairwise comparisons of adjusted means using lsmeans(), while supplying a robust coefficient-covariance matrix (e.g. vcovHC). Usually functions on regression models provide a vcov argument, but I can't seem to find any such argument in the lsmeans package.
Consider this dummy example, originally from the car package:
require(car)
require(lmtest)
require(sandwich)
require(lsmeans)
mod.moore.2 <- lm(conformity ~ fcategory + partner.status, data=Moore)
coeftest(mod.moore.2)
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.197778 1.372669 7.4292 4.111e-09 ***
## fcategorymedium -1.176000 1.902026 -0.6183 0.539805
## fcategoryhigh -0.080889 1.809187 -0.0447 0.964555
## partner.statushigh 4.606667 1.556460 2.9597 0.005098 **
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
coeftest(mod.moore.2, vcov.=vcovHAC)
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.197778 0.980425 10.4014 4.565e-13 ***
## fcategorymedium -1.176000 1.574682 -0.7468 0.459435
## fcategoryhigh -0.080889 2.146102 -0.0377 0.970117
## partner.statushigh 4.606667 1.437955 3.2036 0.002626 **
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
lsmeans(mod.moore.2, list(pairwise ~ fcategory), adjust="none")[[2]]
## contrast estimate SE df t.ratio p.value
## low - medium 1.17600000 1.902026 41 0.618 0.5398
## low - high 0.08088889 1.809187 41 0.045 0.9646
## medium - high -1.09511111 1.844549 41 -0.594 0.5560
##
## Results are averaged over the levels of: partner.status
As you can see, lsmeans() estimates p-values using the default variance-covariance matrix.
How can I obtain pairwise contrasts using the vcovHAC variance estimate?
It turns out that there is a wonderful and seamless interface between the lsmeans and multcomp packages (see ?lsm), whereby lsmeans provides support for glht().
require(multcomp)
x <- glht(mod.moore.2, lsm(pairwise ~ fcategory), vcov=vcovHAC)
## Note: df set to 41
summary(x, test=adjusted("none"))
##
## Simultaneous Tests for General Linear Hypotheses
##
## Fit: lm(formula = conformity ~ fcategory + partner.status, data = Moore)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## low - medium == 0 1.17600 1.57468 0.747 0.459
## low - high == 0 0.08089 2.14610 0.038 0.970
## medium - high == 0 -1.09511 1.86197 -0.588 0.560
## (Adjusted p values reported -- none method)
This is at least one way to achieve this. I'm still hoping someone knows of an approach using lsmeans only...
Another way to approach this is to hack into the lsmeans object, and manually replace the variance-covariance matrix prior to summary-ing the object.
mod.lsm <- lsmeans(mod.moore.2, ~ fcategory)
mod.lsm@V <- vcovHAC(mod.moore.2) ##replace default vcov with custom vcov
pairs(mod.lsm, adjust = "none")
## contrast estimate SE df t.ratio p.value
## low - medium 1.17600000 1.574682 41 0.747 0.4594
## low - high 0.08088889 2.146102 41 0.038 0.9701
## medium - high -1.09511111 1.861969 41 -0.588 0.5597
##
## Results are averaged over the levels of: partner.status
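Swapping the covariance matrix changes only the standard errors, because the standard error of a contrast k is sqrt(k' V k). A base R sketch of that calculation follows; the warpbreaks data and the object names are illustrative assumptions, and vcov(fit) is used as V only so that the example runs without extra packages. A robust matrix of the same dimensions, such as the one returned by sandwich::vcovHAC(fit), would be plugged in the same way.

```r
fit <- lm(breaks ~ tension + wool, data = warpbreaks)

# Contrast vector for "medium minus high" tension, in coefficient order.
k <- setNames(numeric(length(coef(fit))), names(coef(fit)))
k[c("tensionM", "tensionH")] <- c(1, -1)

# Any coefficient covariance matrix can be used here; this is the default one.
V <- vcov(fit)

estimate <- sum(k * coef(fit))
se <- sqrt(drop(t(k) %*% V %*% k))
c(estimate = estimate, se = se)
```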
I'm not sure if this was possible using the 'lsmeans' package but it is using the updated emmeans package.
Moore <- within(carData::Moore, {
partner.status <- factor(partner.status, c("low", "high"))
fcategory <- factor(fcategory, c("low", "medium", "high"))
})
mod.moore.2 <- lm(conformity ~ fcategory + partner.status, data=Moore)
lmtest::coeftest(mod.moore.2, vcov.= sandwich::vcovHAC)
#>
#> t test of coefficients:
#>
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 10.197778 0.980425 10.4014 4.565e-13 ***
#> fcategorymedium -1.176000 1.574682 -0.7468 0.459435
#> fcategoryhigh -0.080889 2.146102 -0.0377 0.970117
#> partner.statushigh 4.606667 1.437955 3.2036 0.002626 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
emmeans::emmeans(
mod.moore.2, trt.vs.ctrl ~ fcategory,
vcov = sandwich::vcovHAC(mod.moore.2),
adjust = "none")$contrasts
#> contrast estimate SE df t.ratio p.value
#> medium - low -1.1760 1.57 41 -0.747 0.4594
#> high - low -0.0809 2.15 41 -0.038 0.9701
#>
#> Results are averaged over the levels of: partner.status
Created on 2021-07-08 by the reprex package (v0.3.0)
Note, you can't just write the following
emmeans::emmeans(
mod.moore.2, trt.vs.ctrl ~ fcategory,
vcov = sandwich::vcovHAC,
adjust = "none")$contrasts
due to conflict with the sandwich::vcovHAC command which also has an adjust option. (I had incorrectly thought this was a bug).
OR
use update to inject a custom vcov matrix into your emmeans/emmGrid object.
Example:
# create an emmeans object from your fitted model
emmob <- emmeans(thismod, ~ predictor)
# generate a robust vcov matrix using a function
# from the sandwich or clubSandwich package
vcovR <- vcovHC(thismod, type="HC3")
# turn the resulting object into a (square) matrix
vcovRm <- matrix(vcovR, ncol=ncol(vcovR))
# update the V slot of the emmeans/emmGrid object
emmob <- update(emmob, V=vcovRm)
I'm new to R and am working on an assignment: creating an R package that mimics an ANOVA table. I have written all the functions the assignment requires, and they calculate the correct values, but I can't make the output display like the ANOVA table that R's built-in anova() function produces. This is my summary.oneway function:
summary.oneway <- function(object, ...){
#model <- oneway(object)
fval <- object$FValue
TAB <- list(t(object$AOV), "Mean Sq."= rbind(object$MSB, object$MSW),
"F Value" = fval, p.value = object$p.value)
res <- list(call=object$call, onewayAnova = TAB)
class(res) <- "summary.oneway"
res
}
This is the output:
Analysis of Variance:
oneway.formula(formula = coag ~ diet, data = coagdata)
[[1]]
Sum of Squares Deg. of Freedom
diet 228 3
Residual 112 20
$`Mean Sq.`
1
[1,] 76.0
[2,] 5.6
$`F Value`
1
13.57143
$p.value
1
4.658471e-05
Actual ANOVA output:
Analysis of Variance Table
Response: coag
Df Sum Sq Mean Sq F value Pr(>F)
diet 3 228 76.0 13.571 4.658e-05 ***
Residuals 20 112 5.6
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
How can I achieve this format? Where and what am I missing?
Thank you so much for your help.
Kuni
The anova output is produced by the print method print.anova. You may want to take a look at methods(print), and specifically at stats:::print.anova.
You will most likely want to create your own print function:
print.oneway <- function(x, ...) {
  # build the ANOVA table from x and print it here
  invisible(x)
}
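One way to get the familiar layout without writing the formatting by hand is to assemble a data frame with the standard column names and give it class "anova", so that R's own print method for anova tables does the work. The helper below is a sketch only: as_anova_table is a hypothetical name, and the numbers are the ones shown in the question rather than values read from an actual oneway object.

```r
# Hypothetical helper: build a one-way ANOVA table with class "anova".
as_anova_table <- function(term, df, ss, response = NULL) {
  ms   <- ss / df                                   # mean squares
  fval <- c(ms[1] / ms[2], NA)                      # F statistic, none for residuals
  pval <- c(pf(fval[1], df[1], df[2], lower.tail = FALSE), NA)
  tab <- data.frame(
    Df = df, "Sum Sq" = ss, "Mean Sq" = ms,
    "F value" = fval, "Pr(>F)" = pval,
    row.names = c(term, "Residuals"), check.names = FALSE
  )
  structure(tab,
            heading = c("Analysis of Variance Table\n",
                        if (!is.null(response)) paste0("Response: ", response)),
            class = c("anova", "data.frame"))
}

# Numbers taken from the output in the question.
tab <- as_anova_table("diet", df = c(3, 20), ss = c(228, 112), response = "coag")
print(tab)
```

Inside your own print method you would call the helper with the degrees of freedom and sums of squares stored in your object, then print the result and return the object invisibly.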