Contrast of a contrast with emmeans (second differences) - R

I am using emmeans to conduct a contrast of a contrast (i.e., testing for an interaction effect through 1st/2nd differences).
It involves 3 steps:
1. estimate means using emmeans()
2. estimate whether there is a difference in means (1st difference) using pairs()
3. estimate whether there is a difference in the difference (2nd difference) using ????
While I can execute steps 1 and 2 (see the reprex below with fictitious data), I'm stuck on step 3. Any tips?
(The contrast of a contrast shown in the emmeans vignette is for alternative functional forms, which is somewhat different from what I want to test.)
suppressPackageStartupMessages({
  library(emmeans)
})
# create example data set: one row per alternative per choice set (2 respondents shown)
cedata.1 <- data.frame(
  id     = c(1,1,1,1,1,1,2,2,2,2,2,2),
  QES    = c(1,1,2,2,3,3,1,1,2,2,3,3), # choice set
  Alt    = c(1,2,1,2,1,2,1,2,1,2,1,2), # alternative 1 or 2 in choice set
  Choice = c(0,1,1,0,1,0,0,1,0,1,0,1), # dependent variable: chosen (1) or not (0)
  LOC    = c(0,0,1,1,0,1,0,1,1,0,0,1), # independent variable per choice set, binary categorical
  SIZE   = c(1,1,1,0,0,1,0,0,1,1,0,1), # independent variable per choice set, binary categorical
  gender = c(1,1,1,1,1,1,0,0,0,0,0,0)  # independent variable per individual, binary categorical
)
# estimate model
glm.model <- glm(Choice ~ LOC*SIZE, data=cedata.1, family = binomial(link = "logit"))
# estimate means (i.e., values used to calc 1st diff).
comp1.loc.size <- emmeans(glm.model, ~ LOC * SIZE)
# calculate 1st diff (and p value)
pairs(comp1.loc.size, simple = "SIZE") # gives result I want
#> LOC = 0:
#> contrast estimate SE df z.ratio p.value
#> 0 - 1 -1.39 1.73 Inf -0.800 0.4235
#>
#> LOC = 1:
#> contrast estimate SE df z.ratio p.value
#> 0 - 1 0.00 1.73 Inf 0.000 1.0000
#>
#> Results are given on the log odds ratio (not the response) scale.
# calculate 2nd diff (and p value)
# ** the following gives the relevant values for doing the 2nd diff comparison (i.e., -1.39 and 0.00)...but how to make the statistical comparison?
pairs(comp1.loc.size, simple = "SIZE")
#> (same output as in step 2 above)

Call pairs() a second time on the result of the first pairs() call; by = NULL removes the grouping by LOC, so the two first differences are compared against each other:
pairs(pairs(comp1.loc.size, simple = "SIZE"), by = NULL)
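Not part of the original answer, but emmeans can also produce this second difference directly as an interaction contrast, which may read more naturally:
# interaction contrast: pairwise differences of the pairwise differences,
# i.e. the second difference, in a single call
contrast(comp1.loc.size, interaction = "pairwise")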

Another solution, using regrid() so that the second difference is computed on the response (probability) scale rather than the log-odds scale:
# estimate means (i.e., values used to calc 1st diff).
comp1.loc.size <- emmeans(glm.model, ~ LOC | SIZE)
# second difference:
pairs(pairs(emmeans::regrid(comp1.loc.size)), by = NULL)
PS: This solution is almost a copy of the solution here: Testing contrast of contrast (first/second difference) in outcome

Related

SLR of transformed data in R

For Y = % of population with income below poverty level and X = per capita income of the population, I constructed a Box-Cox plot and found lambda = 0.02020:
bc <- boxcox(lm(Percent_below_poverty_level ~ Per_capita_income, data=tidy.CDI), plotit=T)
bc$x[which.max(bc$y)] # gives lambda
Now I want to fit a simple linear regression using the transformed data, so I've entered this code
transform <- lm((Percent_below_poverty_level**0.02020) ~ (Per_capita_income**0.02020))
transform
But all I get is the error message
'Error in terms.formula(formula, data = data) : invalid power in formula'. What is my mistake?
You could use bcPower() from the car package.
## make sure you do install.packages("car") if you haven't already
library(car)
data(Prestige)
p <- powerTransform(prestige ~ income + education + type,
                    data = Prestige,
                    family = "bcPower")
summary(p)
# bcPower Transformation to Normality
# Est Power Rounded Pwr Wald Lwr Bnd Wald Upr Bnd
# Y1 1.3052 1 0.9408 1.6696
#
# Likelihood ratio test that transformation parameter is equal to 0
# (log transformation)
# LRT df pval
# LR test, lambda = (0) 41.67724 1 1.0765e-10
#
# Likelihood ratio test that no transformation is needed
# LRT df pval
# LR test, lambda = (1) 2.623915 1 0.10526
mod <- lm(bcPower(prestige, 1.3052) ~ income + education + type, data=Prestige)
summary(mod)
#
# Call:
# lm(formula = bcPower(prestige, 1.3052) ~ income + education +
# type, data = Prestige)
#
# Residuals:
# Min 1Q Median 3Q Max
# -44.843 -13.102 0.287 15.073 62.889
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -3.736e+01 1.639e+01 -2.279 0.0250 *
# income 3.363e-03 6.928e-04 4.854 4.87e-06 ***
# education 1.205e+01 2.009e+00 5.999 3.78e-08 ***
# typeprof 2.027e+01 1.213e+01 1.672 0.0979 .
# typewc -1.078e+01 7.884e+00 -1.368 0.1746
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 22.25 on 93 degrees of freedom
# (4 observations deleted due to missingness)
# Multiple R-squared: 0.8492, Adjusted R-squared: 0.8427
# F-statistic: 131 on 4 and 93 DF, p-value: < 2.2e-16
Powers (more often represented by ^ than ** in R, FWIW) have a special meaning inside formulas [they represent interactions among variables rather than mathematical operations]. So if you did want to power-transform both sides of your equation you would use the I() or "as-is" operator:
I(Percent_below_poverty_level^0.02020) ~ I(Per_capita_income^0.02020)
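Applied to the call from the question, that becomes (a sketch, assuming your tidy.CDI data frame; note the original call also omitted the data argument, so R would look for the variables in the global environment):
transform <- lm(I(Percent_below_poverty_level^0.02020) ~ I(Per_capita_income^0.02020),
                data = tidy.CDI)
transform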
However, I think you should do what @DaveArmstrong suggested anyway:
- it's only the response (left-hand side) variable that gets transformed
- the Box-Cox transformation is actually (y^lambda - 1)/lambda (although the shift and scale might not matter for your results)
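Putting those two points together with the lambda you estimated, a minimal sketch (again assuming your tidy.CDI data; car::bcPower() computes (y^lambda - 1)/lambda):
library(car)
lambda <- 0.0202  # from bc$x[which.max(bc$y)]
mod <- lm(bcPower(Percent_below_poverty_level, lambda) ~ Per_capita_income,
          data = tidy.CDI)
summary(mod)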

How can I calculate the standard error of the poisson.test in R?

I have such a dataset,
ID  Freq.x  Freq.y
 1       1       8
 2       5       3
...
I calculated the ratio between the two rate parameters of Freq.x and Freq.y using R's poisson.test function, but I also want the standard error of that estimate. How can I do that?
You don't have any reproducible data in your question, so let's make some:
set.seed(69)
x <- rpois(100, lambda = 7)
y <- rpois(100, lambda = 8)
You can get the standard error of the estimated rate for each of these two variables like this (the variance of a Poisson rate estimate is lambda / n, so the SE is the square root of mean / n):
se_x <- sqrt(mean(x) / length(x))
se_y <- sqrt(mean(y) / length(y))
se_x
#> [1] 0.2638181
se_y
#> [1] 0.2840775
and you can compare the two to determine if the underlying rate is significantly different like this:
poisson.test(c(sum(x), sum(y)))
#>
#> Comparison of Poisson rates
#>
#> data: c(sum(x), sum(y)) time base: 1
#> count1 = 696, expected count1 = 751.5, p-value = 0.004533
#> alternative hypothesis: true rate ratio is not equal to 1
#> 95 percent confidence interval:
#> 0.7781748 0.9556714
#> sample estimates:
#> rate ratio
#> 0.8624535
It's not clear what you mean by the standard error of the poisson.test though.
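If what you want is a standard error for the rate ratio itself, a common approximation (my suggestion, not something poisson.test() reports) is the delta method on the log scale:
log_rr <- log(sum(x) / sum(y))              # log of the rate ratio
se_log_rr <- sqrt(1 / sum(x) + 1 / sum(y))  # delta-method SE of the log ratio
exp(log_rr + c(-1, 1) * 1.96 * se_log_rr)   # approximate 95% CI for the ratio
The resulting interval is close to the exact one reported by poisson.test() above.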

How to convert fitdistrplus::fitdist summary into tidy format?

I have the following code:
x <- c(
0.367141764080875, 0.250037975705769, 0.167204185003365, 0.299794433447383,
0.366885973041269, 0.300453205296379, 0.333686861081341, 0.33301168850398,
0.400142004893329, 0.399433677388411, 0.366077304765104, 0.166402979455671,
0.466624230750293, 0.433499934139897, 0.300017278751768, 0.333673696762895,
0.29973685692478
)
fn <- fitdistrplus::fitdist(x,"norm")
summary(fn)
#> Fitting of the distribution ' norm ' by maximum likelihood
#> Parameters :
#> estimate Std. Error
#> mean 0.32846024 0.01918923
#> sd 0.07911922 0.01355908
#> Loglikelihood: 19.00364 AIC: -34.00727 BIC: -32.34084
#> Correlation matrix:
#> mean sd
#> mean 1 0
#> sd 0 1
Basically, it takes a vector and tries to fit a distribution to it using the fitdistrplus package. I looked at the broom package, but it doesn't have a function that covers this.
When you call broom::tidy(fn) you receive an error:
Error: No tidy method for objects of class fitdist
This is because tidy() from broom only supports a fixed set of classes; see methods(tidy) for the complete list (this is how S3 method dispatch works in R).
So the function doesn't work for a fitdist object, but it does work for a fitdistr object from MASS (which is better known).
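You can check this yourself:
library(broom)
methods(tidy)  # lists every class tidy() supports; tidy.fitdistr (for MASS::fitdistr)
               # should appear in the list, while tidy.fitdist does not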
We can then assign to fn that class, and then use broom:
class(fn) <- c("fitdist", "fitdistr")
# notice that I've kept the original class and added the other;
# you shouldn't overwrite classes, i.e. don't do this: class(fn) <- "fitdistr"
broom::tidy(fn)
# # A tibble: 2 x 3
# term estimate std.error
# <chr> <dbl> <dbl>
# 1 mean 0.328 0.0192
# 2 sd 0.0791 0.0136
Note that you only see the parameters this way. If you wish to see more and organize everything as "tidy", you should tell us more about your expected output.
broom::tidy() only gets you this far. If you want more, I'd start by defining my own method function for class fitdist, using the tidy.fitdistr method as a reference and adapting it.
Here is an example of how I'd adapt the original broom::tidy() code, using an S3 method for the class fitdist.
Define your own method (similar to how you define your own function):
# necessary libraries
library(dplyr)
library(broom)
# method definition:
tidy.fitdist <- function(x, ...) { # note the .fitdist suffix: S3 dispatch on the class
  # you decide what you want to keep from summary(fn);
  # inspect str(fn) to see which components you can harvest
  e1 <- tibble(
    term = names(x$estimate),
    estimate = unname(x$estimate),
    std.error = unname(x$sd)
  )
  e2 <- tibble(
    term = c("loglik", "aic", "bic"),
    value = c(unname(x$loglik), unname(x$aic), unname(x$bic))
  )
  e3 <- x$cor # I prefer this to: as_tibble(x$cor)
  list(e1, e2, e3) # you can name each element for a nicer result,
                   # e.g.: list(params = e1, scores = e2, corrMatr = e3)
}
This is how you can call this new method now:
tidy(fn) # to be clear, this calls your tidy.fitdist(fn) under the hood
# [[1]]
# # A tibble: 2 x 3
# term estimate std.error
# <chr> <dbl> <dbl>
# 1 mean 0.328 0.0192
# 2 sd 0.0791 0.0136
#
# [[2]]
# # A tibble: 3 x 2
# term value
# <chr> <dbl>
# 1 loglik 19.0
# 2 aic -34.0
# 3 bic -32.3
#
# [[3]]
# mean sd
# mean 1 0
# sd 0 1
Notice that the class is:
class(fn)
# [1] "fitdist"
So now you don't actually need to assign the fitdistr (from MASS) class as before (assuming a fresh fn whose class was not reassigned above).
Not sure exactly what you need, but you can try:
tidy_fn <- rbind(fn$estimate, fn$sd)
https://stats.stackexchange.com/questions/23539/use-fitdist-parameters-in-variables

Stargazer pulls apart variables when observations dropped

I use stargazer to create a table for multiple models. They are actually the same model, but the first is based on all observations, while the others each drop different observations. All variables are named the same, so what surprises me is that when I export the table to LaTeX, two lines, one for a dummy variable and another for an interaction term, are duplicated.
What is really strange is that I cannot replicate the results, but I will post a minimal working example nonetheless. Perhaps you can help me based on my description alone.
This is the code for my MWE:
library(tibble)
library(stargazer)
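# note: no seed is set, so the coefficient values shown below will differ from run to run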
df <- as_tibble(data.frame(
  first  = rnorm(100, 50),
  second = rnorm(100, 30),
  third  = rnorm(100, 100),
  fourth = c(rep(0, 50), rep(1, 50))
))
model.1 <- lm(first ~ second + third + fourth + third*fourth, data = df)
model.2 <- lm(first ~ second + third + fourth + third*fourth, data = df[!rownames(df) %in% "99",])
stargazer(model.1, model.2)
I will now post the LaTeX output that includes the error I am trying to fix (with this snippet it seems to work just fine).
What I would like to have, of course, is the code as produced by this snippet (I feel very stupid for not being able to reproduce the problem):
You could take a look at the names of your models' coefficients using coefficients(). Make sure they are identical, i.e. identical(names(coefficients(model.1)), names(coefficients(model.2))). Then use stargazer's keep argument to make sure you get the coefficients you want.
Here, with the example above, keeping selected variables:
coefficients(model.1)
#> (Intercept) second third fourth third:fourth
#> 57.27352606 0.02674072 -0.08236250 20.23596216 -0.20288137
coefficients(model.2)
#> (Intercept) second third fourth third:fourth
#> 57.06149556 0.03305134 -0.08214812 20.85087288 -0.20885718
identical(names(coefficients(model.1)), names(coefficients(model.2)))
#> [1] TRUE
I'm using type = "text" to make it more friendly for SO, but I guess it's the same with LaTeX:
stargazer(model.1, model.2, type = "text", keep=c("third","third:fourth"))
#>
#> =========================================================
#> Dependent variable:
#> -------------------------------------
#> first
#> (1) (2)
#> ---------------------------------------------------------
#> third -0.082 -0.082
#> (0.166) (0.167)
#>
#> third:fourth -0.203 -0.209
#> (0.222) (0.223)
#>
#> ---------------------------------------------------------
#> Observations 100 99
#> R2 0.043 0.044
#> Adjusted R2 0.002 0.004
#> Residual Std. Error 1.044 (df = 95) 1.047 (df = 94)
#> F Statistic 1.056 (df = 4; 95) 1.089 (df = 4; 94)
#> =========================================================
#> Note: *p<0.1; **p<0.05; ***p<0.01
but it might be hard to rule out that it's a local issue if we cannot find a way to reproduce your issue.

How to extract the confidence limits of LSMEANS?

I am using the oranges data provided with lsmeans.
library(lsmeans)
oranges.rg1 <- lm(sales1 ~ price1 + price2 + day + store, data = oranges)
days.lsm <- lsmeans(oranges.rg1, "day")
days_contr.lsm <- contrast(days.lsm, "trt.vs.ctrl", ref = c(5,6))
The confidence intervals can be visualized with plot(contrast(days.lsm, "trt.vs.ctrl", ref = c(5,6))), but they are not shown in days_contr.lsm:
> days_contr.lsm
contrast estimate SE df t.ratio p.value
1 - avg(5,6) -7.8538769 2.194243 23 -3.579 0.0058
2 - avg(5,6) -6.9234858 2.127341 23 -3.255 0.0125
3 - avg(5,6) 0.2462789 2.155529 23 0.114 0.9979
4 - avg(5,6) -4.6760034 2.110761 23 -2.215 0.1184
How can I extract the confidence intervals to a data.frame?
> days_contr.lsm
contrast estimate SE df t.ratio p.value lower.CL upper.CL
1 - avg(5,6) -7.8538769 2.194243 23 -3.579 0.0058 ? ?
2 - avg(5,6) -6.9234858 2.127341 23 -3.255 0.0125 ? ?
3 - avg(5,6) 0.2462789 2.155529 23 0.114 0.9979 ? ?
4 - avg(5,6) -4.6760034 2.110761 23 -2.215 0.1184 ? ?
confint(contrast(days.lsm, "trt.vs.ctrl", ref = c(5,6))) worked fine
At risk of beating a dead horse, I feel that the main point of the question is getting the confidence intervals, given that what is seen in days_contr.lsm is only the t ratios and P values.
This happened because the default method for summarizing contrast() results is to show tests and not CIs, whereas the default method for summarizing emmeans() results is to show CIs and not tests. The infer argument of summary.emmGrid() controls what you see. Thus, you can get both CIs and tests using
summary(days_contr.lsm, infer = c(TRUE, TRUE))
and this would fill in the question marks in the OP. The summary() result, by the way, is of class c("summary_emm", "data.frame"); it is a data.frame with a special print method that often shows some additional annotations.
There are additional emmGrid methods confint() and test() that run summary() with infer = c(TRUE, FALSE) and infer = c(FALSE, TRUE) respectively (though both have additional capabilities). The as.data.frame() method is just as.data.frame(summary(...)). For details, see the help page for emmeans::summary.emmGrid.
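So, to get the confidence limits into a data.frame, a minimal sketch using the methods just described:
# CIs only:
confint(days_contr.lsm)
# CIs and tests together, coerced to a plain data.frame:
df <- as.data.frame(summary(days_contr.lsm, infer = c(TRUE, TRUE)))
df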
