remove nonEst row(s) in emmeans result - r

I have an unbalanced design, so when I apply emmeans to my model at specific levels, a nested factor level that is absent there (but present at other levels) is marked as nonEst in the output table. How do I change my code so that the table below shows only the three estimable rows?
emmeans(model, specs = ~ Rot/Crop | Herb, at = list(Rot = "3", Herb="conv"))
Herb = conv:
Rot Crop emmean SE df lower.CL upper.CL
3 alfalfa nonEst NA NA NA NA
3 corn 3.50 0.283 270 2.94 4.06
3 oat 3.44 0.283 270 2.88 3.99
3 soybean 2.65 0.253 270 2.15 3.15
Confidence level used: 0.95

One option is to tidy the result with broom and then remove the NA rows with na.omit:
library(emmeans)
library(broom)
library(dplyr)
emmeans(model, specs = ~ Rot/Crop | Herb, at = list(Rot = "3", Herb="conv")) %>%
tidy %>%
na.omit
Or with as.data.frame/subset
subset(as.data.frame( emmeans(model, specs = ~ Rot/Crop | Herb,
at = list(Rot = "3", Herb="conv"))), !is.na(emmean))
Using a reproducible example
warp.lm <- lm(breaks ~ wool * tension, data = head(warpbreaks, 30))
emmeans (warp.lm, ~ wool | tension)
#tension = L:
# wool emmean SE df lower.CL upper.CL
# A 44.6 4.24 26 35.85 53.3
# B 23.3 7.34 26 8.26 38.4
#tension = M:
# wool emmean SE df lower.CL upper.CL
# A 24.0 4.24 26 15.29 32.7
# B nonEst NA NA NA NA
#tension = H:
# wool emmean SE df lower.CL upper.CL
# A 24.6 4.24 26 15.85 33.3
# B nonEst NA NA NA NA
emmeans (warp.lm, ~ wool | tension) %>%
tidy %>%
na.omit
# A tibble: 4 x 7
# wool tension estimate std.error df statistic p.value
# <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 A L 44.6 4.24 26 10.5 7.29e-11
#2 B L 23.3 7.34 26 3.18 3.78e- 3
#3 A M 24.0 4.24 26 5.67 5.84e- 6
#4 A H 24.6 4.24 26 5.80 4.15e- 6
Or in base R, coerce it to data.frame and then subset the non-NA rows
subset(as.data.frame(emmeans (warp.lm, ~ wool | tension)), !is.na(emmean))
# wool tension emmean SE df lower.CL upper.CL
#1 A L 44.55556 4.235135 26 35.850110 53.26100
#2 B L 23.33333 7.335470 26 8.255059 38.41161
#3 A M 24.00000 4.235135 26 15.294554 32.70545
#5 A H 24.55556 4.235135 26 15.850110 33.26100
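Both approaches above come down to dropping rows whose estimate is NA. As a minimal sketch of that step on its own, the block below uses a hand-built data frame as a stand-in for as.data.frame(emmeans(...)) (values copied from the printed output above); drop_nonest is a hypothetical helper name, not an emmeans function.

```r
# Stand-in for as.data.frame(emmeans(warp.lm, ~ wool | tension));
# built by hand here so the example runs without fitting a model.
emm_df <- data.frame(
  wool    = c("A", "B", "A", "B", "A", "B"),
  tension = c("L", "L", "M", "M", "H", "H"),
  emmean  = c(44.6, 23.3, 24.0, NA, 24.6, NA),
  SE      = c(4.24, 7.34, 4.24, NA, 4.24, NA)
)

# Hypothetical helper: keep only rows with a non-missing estimate.
drop_nonest <- function(df, est_col = "emmean") {
  df[!is.na(df[[est_col]]), , drop = FALSE]
}

estimable <- drop_nonest(emm_df)
estimable
```

Unlike na.omit, which drops a row if any column is NA, this filters on the estimate column only, which may be preferable if other columns can legitimately contain missing values.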


Convert a list from data frame (emmGrid class)

I would like to convert a list to a data frame (the list was originally shown in a screenshot).
I used do.call(rbind.data.frame, contrast), but I got: Error in xi[[j]] : this S4 class is not subsettable. I can still read the elements separately. Does anyone know what is going on?
I got this list when running the ART ANOVA test with the ARTool package.
Update
This is my original code to fit the model.
Organism_df_posthoc <- bird_metrics_long_new %>%
rbind(plant_metrics_long_new) %>%
mutate(Type = factor(Type, levels = c("Forest", "Jungle rubber", "Rubber", "Oil palm"))) %>%
mutate(Category = factor(Category)) %>%
group_by(Category) %>%
mutate_at(c("PD"), ~(scale(.) %>% as.vector())) %>%
ungroup() %>%
nest_by(n1) %>%
mutate(fit = list(art.con(art(PD ~ Category + Type + Category:Type, data = data),
"Category:Type",adjust = "tukey", interaction = T)))
The output of fit is the list I already showed.
Use rbind instead of rbind.data.frame. emmeans provides an rbind method for 'emmGrid' objects, and plain rbind dispatches to the correct method based on the class of the list elements:
do.call(rbind, contrast)
-output
wool tension emmean SE df lower.CL upper.CL
A L 44.6 3.65 48 33.6 55.5
A M 24.0 3.65 48 13.0 35.0
A H 24.6 3.65 48 13.6 35.5
B L 28.2 3.65 48 17.2 39.2
B M 28.8 3.65 48 17.8 39.8
B H 18.8 3.65 48 7.8 29.8
A L 44.6 3.65 48 33.6 55.5
A M 24.0 3.65 48 13.0 35.0
A H 24.6 3.65 48 13.6 35.5
B L 28.2 3.65 48 17.2 39.2
B M 28.8 3.65 48 17.8 39.8
B H 18.8 3.65 48 7.8 29.8
Confidence level used: 0.95
Conf-level adjustment: bonferroni method for 12 estimates
The reason is that emmeans registers rbind methods for its classes when it is loaded:
> methods('rbind')
[1] rbind.data.frame rbind.data.table* rbind.emm_list* rbind.emmGrid* rbind.grouped_df* rbind.zoo*
The structure of the example created below matches the structure the OP showed.
Calling rbind.data.frame directly does not work, because the elements are already of class emmGrid rather than data frames.
data
library(multcomp)
library(emmeans)
warp.lm <- lm(breaks ~ wool*tension, data = warpbreaks)
warp.emmGrid <- emmeans(warp.lm, ~ tension | wool)
contrast <- list(warp.emmGrid, warp.emmGrid)
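The error message in the question can be reproduced without emmeans. The sketch below defines a toy S4 class (ToyGrid is a made-up name for illustration) and shows that indexing it with [[ fails the same way, which is what rbind.data.frame runs into when it treats an S4 object like a data frame.

```r
library(methods)

# Toy S4 class standing in for an S4 object such as an emmGrid;
# "ToyGrid" is hypothetical and exists only for this illustration.
setClass("ToyGrid", representation(grid = "data.frame"))
obj <- new("ToyGrid", grid = data.frame(x = 1:2))

# S4 objects have no default [[ method, so column-style indexing fails.
msg <- tryCatch(obj[[1]], error = function(e) conditionMessage(e))
msg

# The data is still reachable through the slot.
obj@grid
```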
If the OP used 'ARTool' and the tables have different columns, the above solution may not work, because rbind requires all objects to have the same column names. In that case, we can convert each element to a tibble by looping over the list with map_dfr (from purrr), which also binds the rows:
library(ARTool)
library(purrr)
library(tibble)
map_dfr(contrast, as_tibble)
-output
# A tibble: 42 × 8
contrast estimate SE df t.ratio p.value Moisture_pairwise Fertilizer_pairwise
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <fct> <fct>
1 m1 - m2 -23.1 4.12 8.00 -5.61 0.00226 NA NA
2 m1 - m3 -33.8 4.12 8.00 -8.20 0.000169 NA NA
3 m1 - m4 -15.2 4.12 8.00 -3.68 0.0256 NA NA
4 m2 - m3 -10.7 4.12 8 -2.59 0.118 NA NA
5 m2 - m4 7.92 4.12 8 1.92 0.291 NA NA
6 m3 - m4 18.6 4.12 8 4.51 0.00849 NA NA
7 NA 6.83 10.9 24 0.625 0.538 m1 - m2 f1 - f2
8 NA 15.3 10.9 24 1.40 0.174 m1 - m3 f1 - f2
9 NA -5.83 10.9 24 -0.533 0.599 m1 - m4 f1 - f2
10 NA 8.50 10.9 24 0.777 0.445 m2 - m3 f1 - f2
# … with 32 more rows
data
data(Higgins1990Table5, package = "ARTool")
m <- art(DryMatter ~ Moisture*Fertilizer + (1|Tray), data=Higgins1990Table5)
a1 <- art.con(m, ~ Moisture)
a2 <- art.con(m, "Moisture:Fertilizer", interaction = TRUE)
contrast <- list(a1, a2)
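The NA entries in the bound output above arise because the two tables do not share all columns. A minimal base R sketch of that fill-and-bind step follows, using small hand-made data frames in place of the art.con results; bind_fill is a hypothetical helper written for this illustration, and the values are copied from the first rows of the output above.

```r
# Hand-made stand-ins for two contrast tables with different columns.
a <- data.frame(contrast = c("m1 - m2", "m1 - m3"),
                estimate = c(-23.1, -33.8))
b <- data.frame(Moisture_pairwise = "m1 - m2",
                Fertilizer_pairwise = "f1 - f2",
                estimate = 6.83)

# Hypothetical helper: pad each data frame with NA columns so that all
# share the same column set, then stack them with rbind.
bind_fill <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(d) {
    for (nm in setdiff(all_cols, names(d))) d[[nm]] <- NA
    d[all_cols]
  })
  do.call(rbind, padded)
}

combined <- bind_fill(list(a, b))
combined
```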

Looping linear regression output in a data frame in r

I have the dataset below and want to fit a linear regression for each country and Area, then cbind the predicted values to the dataset.
The final data frame should have three more columns: the fitted value and the lower and upper limits.
I have done it for one country and one area, but I want to do it for every country and area and put the predicted, upper-limit and lower-limit values back into the dataset with cbind:
data <- data.frame(country = c("US","US","US","US","US","US","US","US","US","US","UK","UK","UK","UK","UK"),
Area = c("G","G","G","G","G","I","I","I","I","I","A","A","A","A","A"),
week = c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5),amount = c(12,23,34,32,12,12,34,45,65,45,45,34,23,43,43))
data_1 <- data[(data$country=="US" & data$Area=="G"),]
model <- lm(amount ~ week, data = data_1)
pre <- predict(model,newdata = data_1,interval = "prediction",level = 0.95)
pre
How can I loop this over the other combinations of country and Area?
...and a Base R solution:
data <- data.frame(country = c("US","US","US","US","US","US","US","US","US","US","UK","UK","UK","UK","UK"),
Area = c("G","G","G","G","G","I","I","I","I","I","A","A","A","A","A"),
week = c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5),amount = c(12,23,34,32,12,12,34,45,65,45,45,34,23,43,43))
splitVar <- paste0(data$country,"-",data$Area)
dfList <- split(data,splitVar)
result <- do.call(rbind,lapply(dfList,function(x){
model <- lm(amount ~ week, data = x)
cbind(x,predict(model,newdata = x,interval = "prediction",level = 0.95))
}))
result
...the results:
country Area week amount fit lwr upr
UK-A.11 UK A 1 45 36.6 -6.0463638 79.24636
UK-A.12 UK A 2 34 37.1 -1.3409128 75.54091
UK-A.13 UK A 3 23 37.6 0.6671656 74.53283
UK-A.14 UK A 4 43 38.1 -0.3409128 76.54091
UK-A.15 UK A 5 43 38.6 -4.0463638 81.24636
US-G.1 US G 1 12 20.8 -27.6791493 69.27915
US-G.2 US G 2 23 21.7 -21.9985147 65.39851
US-G.3 US G 3 34 22.6 -19.3841749 64.58417
US-G.4 US G 4 32 23.5 -20.1985147 67.19851
US-G.5 US G 5 12 24.4 -24.0791493 72.87915
US-I.6 US I 1 12 20.8 -33.8985900 75.49859
US-I.7 US I 2 34 30.5 -18.8046427 79.80464
US-I.8 US I 3 45 40.2 -7.1703685 87.57037
US-I.9 US I 4 65 49.9 0.5953573 99.20464
US-I.10 US I 5 45 59.6 4.9014100 114.29859
We can also use augment() from the broom package to get the fitted values and prediction limits:
library(dplyr)
library(tidyr)
library(purrr)
library(broom)
data %>%
group_by(country, Area) %>%
nest() %>%
mutate(models = map(data, ~ lm(amount ~ week, data = .)),
aug = map(models, ~ augment(.x, interval = "prediction"))) %>%
unnest(aug) %>%
select(country, Area, amount, week, .fitted, .lower, .upper)
# A tibble: 15 x 7
# Groups: country, Area [3]
country Area amount week .fitted .lower .upper
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 US G 12 1 20.8 -27.7 69.3
2 US G 23 2 21.7 -22.0 65.4
3 US G 34 3 22.6 -19.4 64.6
4 US G 32 4 23.5 -20.2 67.2
5 US G 12 5 24.4 -24.1 72.9
6 US I 12 1 20.8 -33.9 75.5
7 US I 34 2 30.5 -18.8 79.8
8 US I 45 3 40.2 -7.17 87.6
9 US I 65 4 49.9 0.595 99.2
10 US I 45 5 59.6 4.90 114.
11 UK A 45 1 36.6 -6.05 79.2
12 UK A 34 2 37.1 -1.34 75.5
13 UK A 23 3 37.6 0.667 74.5
14 UK A 43 4 38.1 -0.341 76.5
15 UK A 43 5 38.6 -4.05 81.2
Here is a tidyverse way to do this for every combination of country and Area.
library(tidyverse)
data %>%
group_by(country, Area) %>%
nest() %>%
mutate(model = map(data, ~ lm(amount ~ week, data = .x)),
result = map2(model, data, ~data.frame(predict(.x, newdata = .y,
interval = "prediction",level = 0.95)))) %>%
ungroup %>%
select(-model) %>%
unnest(c(data, result))
# country Area week amount fit lwr upr
# <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 US G 1 12 20.8 -27.7 69.3
# 2 US G 2 23 21.7 -22.0 65.4
# 3 US G 3 34 22.6 -19.4 64.6
# 4 US G 4 32 23.5 -20.2 67.2
# 5 US G 5 12 24.4 -24.1 72.9
# 6 US I 1 12 20.8 -33.9 75.5
# 7 US I 2 34 30.5 -18.8 79.8
# 8 US I 3 45 40.2 -7.17 87.6
# 9 US I 4 65 49.9 0.595 99.2
#10 US I 5 45 59.6 4.90 114.
#11 UK A 1 45 36.6 -6.05 79.2
#12 UK A 2 34 37.1 -1.34 75.5
#13 UK A 3 23 37.6 0.667 74.5
#14 UK A 4 43 38.1 -0.341 76.5
#15 UK A 5 43 38.6 -4.05 81.2
And one more:
library(tidyverse)
data %>%
mutate(CountryArea=paste0(country,Area) %>% factor %>% fct_inorder) %>%
split(.$CountryArea) %>%
map(~lm(amount~week, data=.)) %>%
map(predict, interval = "prediction",level = 0.95) %>%
reduce(rbind) %>%
cbind(data, .)
country Area week amount fit lwr upr
1 US G 1 12 20.8 -27.6791493 69.27915
2 US G 2 23 21.7 -21.9985147 65.39851
3 US G 3 34 22.6 -19.3841749 64.58417
4 US G 4 32 23.5 -20.1985147 67.19851
5 US G 5 12 24.4 -24.0791493 72.87915
6 US I 1 12 20.8 -33.8985900 75.49859
7 US I 2 34 30.5 -18.8046427 79.80464
8 US I 3 45 40.2 -7.1703685 87.57037
9 US I 4 65 49.9 0.5953573 99.20464
10 US I 5 45 59.6 4.9014100 114.29859
11 UK A 1 45 36.6 -6.0463638 79.24636
12 UK A 2 34 37.1 -1.3409128 75.54091
13 UK A 3 23 37.6 0.6671656 74.53283
14 UK A 4 43 38.1 -0.3409128 76.54091
15 UK A 5 43 38.6 -4.0463638 81.24636

why error: arrange() failed at implicit mutate() step

The following code was executed:
tb <- tibble(
year <- rep(2001:2020,10)
)
tb %<>% arrange(year) %>%
mutate(
id <- rep(1:10,20),
r1 <- rnorm(200,0,1),
r2 <- rnorm(200,1,1),
r3 <- rnorm(200,2,1)
)
Then the error message popped up:
Error: arrange() failed at implicit mutate() step.
x Could not create a temporary column for ..1.
ℹ ..1 is year.
Can anyone shed light on what the reason is?
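Try this. It looks like an assignment issue: inside tibble() and mutate(), columns are named with =, not <-. With <-, the column name is taken from the whole expression instead of being year, so arrange(year) does not find a column of that name. Replace <- with = and %<>% with %>% (assign the result back to tb if you want to keep it). Here is a possible solution: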
#Data
tb <- tibble(
year = rep(2001:2020,10)
)
#Code
tb %>% arrange(year) %>%
mutate(
id = rep(1:10,20),
r1 = rnorm(200,0,1),
r2 = rnorm(200,1,1),
r3 = rnorm(200,2,1)
)
Output:
# A tibble: 200 x 5
year id r1 r2 r3
<int> <int> <dbl> <dbl> <dbl>
1 2001 1 1.10 1.62 2.92
2 2001 2 0.144 1.18 1.08
3 2001 3 -0.118 2.32 3.15
4 2001 4 -0.912 0.701 1.36
5 2001 5 -1.44 -0.648 1.11
6 2001 6 -0.797 1.95 -0.333
7 2001 7 1.25 -0.113 1.85
8 2001 8 0.772 1.62 2.32
9 2001 9 -0.220 1.51 1.29
10 2001 10 -0.425 1.37 3.24
# ... with 190 more rows
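The naming problem can be seen in isolation. The sketch below uses base R's data.frame() rather than tibble(), as an assumption-labelled analogue: the exact generated name differs between the two, but in both cases <- fails to produce a column called year.

```r
# With <-, the argument is unnamed, so the column name is derived from
# the expression text instead of being "year".
bad  <- data.frame(year <- rep(2001:2003, 2))

# With =, the argument name becomes the column name.
good <- data.frame(year = rep(2001:2003, 2))

names(bad)
names(good)
"year" %in% names(bad)
```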

regression by group and retain all the columns in R

I am doing a linear regression by group and want to extract the residuals of the regression:
library(dplyr)
library(broom)
set.seed(124)
dat <- data.frame(ID = sample(111:503, 18576, replace = T),
ID2 = sample(11:50, 18576, replace = T),
ID3 = sample(1:14, 18576, replace = T),
yearRef = sample(1998:2014, 18576, replace = T),
value = rnorm(18576))
resid <- dat %>% dplyr::group_by(ID3) %>%
do(augment(lm(value ~ yearRef, data=.))) %>% ungroup()
How do I retain ID and ID2 as well in resid? At the moment, only ID3 is retained in the final data frame.
Use group_split, then loop through each group with map_dfr, binding ID and ID2 to the augment output with bind_cols:
library(dplyr)
library(purrr)
library(broom)
dat %>% group_split(ID3) %>%
map_dfr(~bind_cols(select(.x,ID,ID2), augment(lm(value~yearRef, data=.x))), .id = "ID3")
# A tibble: 18,576 x 12
ID3 ID ID2 value yearRef .fitted .se.fit .resid .hat .sigma .cooksd
<chr> <int> <int> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 196 16 -0.385 2009 -0.0406 0.0308 -0.344 1.00e-3 0.973 6.27e-5
2 1 372 47 -0.793 2012 -0.0676 0.0414 -0.726 1.81e-3 0.973 5.05e-4
3 1 470 15 -0.496 2011 -0.0586 0.0374 -0.438 1.48e-3 0.973 1.50e-4
4 1 242 40 -1.13 2010 -0.0496 0.0338 -1.08 1.21e-3 0.973 7.54e-4
5 1 471 34 1.28 2006 -0.0135 0.0262 1.29 7.26e-4 0.972 6.39e-4
6 1 434 35 -1.09 1998 0.0586 0.0496 -1.15 2.61e-3 0.973 1.82e-3
7 1 467 45 -0.0663 2011 -0.0586 0.0374 -0.00769 1.48e-3 0.973 4.64e-8
8 1 334 27 -1.37 2003 0.0135 0.0305 -1.38 9.86e-4 0.972 9.92e-4
9 1 186 25 -0.0195 2003 0.0135 0.0305 -0.0331 9.86e-4 0.973 5.71e-7
10 1 114 34 1.09 2014 -0.0857 0.0500 1.18 2.64e-3 0.973 1.94e-3
# ... with 18,566 more rows, and 1 more variable: .std.resid <dbl>
Taking the "many models" approach, you can nest the data on ID3 and use purrr::map to create a list-column of the broom::augment data frames. The data list-column has all the original columns aside from ID3; map into that and select just the ones you want. Here I'm assuming you want to keep any column that starts with "ID", but you can change this. Then unnest both the data and the augment data frames.
library(dplyr)
library(tidyr)
dat %>%
group_by(ID3) %>%
nest() %>%
mutate(aug = purrr::map(data, ~broom::augment(lm(value ~ yearRef, data = .))),
data = purrr::map(data, select, starts_with("ID"))) %>%
unnest(c(data, aug))
#> # A tibble: 18,576 x 12
#> # Groups: ID3 [14]
#> ID3 ID ID2 value yearRef .fitted .se.fit .resid .hat .sigma
#> <int> <int> <int> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 11 431 15 0.619 2002 0.0326 0.0346 0.586 1.21e-3 0.995
#> 2 11 500 21 -0.432 2000 0.0299 0.0424 -0.462 1.82e-3 0.995
#> 3 11 392 28 -0.246 1998 0.0273 0.0515 -0.273 2.67e-3 0.995
#> 4 11 292 40 -0.425 1998 0.0273 0.0515 -0.452 2.67e-3 0.995
#> 5 11 175 36 -0.258 1999 0.0286 0.0468 -0.287 2.22e-3 0.995
#> 6 11 419 23 3.13 2005 0.0365 0.0273 3.09 7.54e-4 0.992
#> 7 11 329 17 -0.0414 2007 0.0391 0.0274 -0.0806 7.57e-4 0.995
#> 8 11 284 23 -0.450 2006 0.0378 0.0268 -0.488 7.25e-4 0.995
#> 9 11 136 28 -0.129 2006 0.0378 0.0268 -0.167 7.25e-4 0.995
#> 10 11 118 17 -1.55 2013 0.0470 0.0470 -1.60 2.24e-3 0.995
#> # … with 18,566 more rows, and 2 more variables: .cooksd <dbl>,
#> # .std.resid <dbl>
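For comparison, here is a base R sketch of the same idea that needs no packages: split the data by ID3, fit the model within each piece, and attach the fitted values and residuals to the full rows so that every original column is kept. It uses a smaller simulated data set than the question (200 rows, 4 groups) purely to keep the example quick; the column names .fitted and .resid are chosen to mirror broom's output.

```r
set.seed(124)
dat_small <- data.frame(ID      = sample(111:503, 200, replace = TRUE),
                        ID2     = sample(11:50, 200, replace = TRUE),
                        ID3     = sample(1:4, 200, replace = TRUE),
                        yearRef = sample(1998:2014, 200, replace = TRUE),
                        value   = rnorm(200))

# Fit one lm per ID3 group and add its fitted values and residuals
# as new columns on that group's rows.
pieces <- lapply(split(dat_small, dat_small$ID3), function(d) {
  fit <- lm(value ~ yearRef, data = d)
  d$.fitted <- fitted(fit)
  d$.resid  <- residuals(fit)
  d
})

resid_all <- do.call(rbind, pieces)
head(resid_all)
```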

Box-Cox back transformation with emmeans not working properly

I am doing a linear regression with data that needs transformation. I am using a Box-Cox power transformation, followed by back-transformation so that I can report results on the original scale. I have been trying to do this with the emmeans package, following the steps described in its vignette. However, the summary results for the estimated means are not at all similar to those for the untransformed data. In fact, the output is not back-transformed at all.
Here is a reproducible example using the examples from the emmeans package:
require(emmeans)
# Fit a model using an oddball transformation:
bctran <- make.tran("boxcox", 0.368)
warp.bc <- with(bctran,
lm(linkfun(breaks) ~ wool * tension, data = warpbreaks))
# Obtain back-transformed LS means:
emmeans(warp.bc, ~ tension | wool, type = "response")
# Fit a model without transformation:
warp <- lm(breaks ~ wool * tension, data = warpbreaks)
# Obtain LS means:
emmeans(warp, ~ tension | wool)
which returns:
> emmeans(warp.bc, ~ tension | wool, type = "response")
wool = A:
tension emmean SE df lower.CL upper.CL
L 8.07 0.419 48 7.23 8.92
M 5.91 0.419 48 5.07 6.75
H 5.94 0.419 48 5.10 6.79
wool = B:
tension emmean SE df lower.CL upper.CL
L 6.45 0.419 48 5.61 7.29
M 6.53 0.419 48 5.69 7.37
H 5.22 0.419 48 4.38 6.07
Confidence level used: 0.95
> emmeans(warp, ~ tension | wool)
wool = A:
tension emmean SE df lower.CL upper.CL
L 44.6 3.65 48 37.2 51.9
M 24.0 3.65 48 16.7 31.3
H 24.6 3.65 48 17.2 31.9
wool = B:
tension emmean SE df lower.CL upper.CL
L 28.2 3.65 48 20.9 35.6
M 28.8 3.65 48 21.4 36.1
H 18.8 3.65 48 11.4 26.1
Confidence level used: 0.95
when in fact the estimated mean for tension:L should be 42.37, as calculated using the formula:
> origin + (1 + param * pmax(eta))^(1/param)
> 0 + (1 + 0.368 * pmax(8.07))^(1/0.368)
[1] 42.37179
Is there something I am missing or not understanding properly?
Hmmmm. I reproduced this problem. I'm not sure what's wrong, but so far I can tell that bctran itself is in order:
> emm = as.data.frame(emmeans(warp.bc, ~tension|wool))
> emm
tension wool emmean SE df lower.CL upper.CL
1 L A 8.074761 0.4192815 48 7.231739 8.917783
2 M A 5.911710 0.4192815 48 5.068688 6.754732
3 H A 5.942335 0.4192815 48 5.099313 6.785357
4 L B 6.449869 0.4192815 48 5.606847 7.292891
5 M B 6.531085 0.4192815 48 5.688063 7.374107
6 H B 5.224939 0.4192815 48 4.381917 6.067961
> bctran$linkinv(emm$emmean)
[1] 42.42263 23.10060 23.32407 27.22827 27.88877 18.43951
So these back-transformed EMMs are in order. I'll trace the code and see why the results aren't back-transformed.
Update
I found a logic error, introduced in a revision a few months ago, whereby a transformation given as a character string (e.g., "log") works fine, but one given as a list (e.g., your bctran) is ignored.
I fixed that error in the next version to be pushed to the GitHub site (version >= 1.3.3.0999902), and the fix will be in the next CRAN update (version > 1.3.3).
> emmeans(warp.bc, ~ tension | wool)
wool = A:
tension emmean SE df lower.CL upper.CL
L 8.07 0.419 48 7.23 8.92
M 5.91 0.419 48 5.07 6.75
H 5.94 0.419 48 5.10 6.79
wool = B:
tension emmean SE df lower.CL upper.CL
L 6.45 0.419 48 5.61 7.29
M 6.53 0.419 48 5.69 7.37
H 5.22 0.419 48 4.38 6.07
Results are given on the Box-Cox (lambda = 0.368) (not the response) scale.
Confidence level used: 0.95
> emmeans(warp.bc, ~ tension | wool, type = "response")
wool = A:
tension response SE df lower.CL upper.CL
L 42.4 4.48 48 34.0 52.0
M 23.1 3.05 48 17.5 29.8
H 23.3 3.07 48 17.7 30.0
wool = B:
tension response SE df lower.CL upper.CL
L 27.2 3.38 48 20.9 34.6
M 27.9 3.44 48 21.5 35.3
H 18.4 2.65 48 13.6 24.3
Confidence level used: 0.95
Intervals are back-transformed from the Box-Cox (lambda = 0.368) scale
Notice that even without back-transforming, there is an annotation of that fact. The fact that no annotation at all is present in your results was a tip-off.
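The manual check used in this thread can be written as a pair of small functions. This is a sketch assuming the standard Box-Cox form with origin 0, that is, eta = (y^lambda - 1) / lambda; boxcox_fwd and boxcox_inv are hypothetical names, not emmeans functions. The eta values are the link-scale EMMs printed above.

```r
lambda <- 0.368

# Forward and inverse Box-Cox transformations (origin assumed to be 0).
boxcox_fwd <- function(y, lambda) (y^lambda - 1) / lambda
boxcox_inv <- function(eta, lambda) (1 + lambda * eta)^(1 / lambda)

# Link-scale estimated marginal means copied from the answer above.
eta <- c(8.074761, 5.911710, 5.942335, 6.449869, 6.531085, 5.224939)

back <- boxcox_inv(eta, lambda)
round(back, 2)

# Round trip: transforming the back-transformed values recovers eta.
all.equal(boxcox_fwd(back, lambda), eta)
```

The values in back should agree with the bctran$linkinv(emm$emmean) results quoted above, up to rounding.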
