Convert a list of emmGrid objects to a data frame - r

I would like to convert a list to a data frame.
I used do.call(rbind.data.frame, contrast), but I got this error: Error in xi[[j]] : this S4 class is not subsettable. I can still read the list elements separately. Does anyone know what is going on?
I got this list when running the ART ANOVA test with the ARTool package.
Update
This is my original code to fit the model.
Organism_df_posthoc <- bird_metrics_long_new %>%
rbind(plant_metrics_long_new) %>%
mutate(Type = factor(Type, levels = c("Forest", "Jungle rubber", "Rubber", "Oil palm"))) %>%
mutate(Category = factor(Category)) %>%
group_by(Category) %>%
mutate_at(c("PD"), ~(scale(.) %>% as.vector())) %>%
ungroup() %>%
nest_by(n1) %>%
mutate(fit = list(art.con(art(PD ~ Category + Type + Category:Type, data = data),
"Category:Type",adjust = "tukey", interaction = T)))
The output of fit is the list described above.

With rbind instead of rbind.data.frame, R dispatches on the class of the objects. There is a specific rbind method for 'emmGrid' objects, so calling plain rbind picks the correct method automatically.
do.call(rbind, contrast)
-output
wool tension emmean SE df lower.CL upper.CL
A L 44.6 3.65 48 33.6 55.5
A M 24.0 3.65 48 13.0 35.0
A H 24.6 3.65 48 13.6 35.5
B L 28.2 3.65 48 17.2 39.2
B M 28.8 3.65 48 17.8 39.8
B H 18.8 3.65 48 7.8 29.8
A L 44.6 3.65 48 33.6 55.5
A M 24.0 3.65 48 13.0 35.0
A H 24.6 3.65 48 13.6 35.5
B L 28.2 3.65 48 17.2 39.2
B M 28.8 3.65 48 17.8 39.8
B H 18.8 3.65 48 7.8 29.8
Confidence level used: 0.95
Conf-level adjustment: bonferroni method for 12 estimates
The reason is that a specific rbind method becomes available when we load emmeans
> methods('rbind')
[1] rbind.data.frame rbind.data.table* rbind.emm_list* rbind.emmGrid* rbind.grouped_df* rbind.zoo*
The structure of the example created here matches the structure the OP showed.
Calling rbind.data.frame directly does not work, because the objects are of class emmGrid rather than data.frame.
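To illustrate why plain rbind finds the right method, here is a minimal base R sketch. The class name toyGrid and its helpers are hypothetical stand-ins invented for this example; they are not part of emmeans, and the real emmGrid class is more involved.

```r
# Hypothetical toy class standing in for 'emmGrid' (illustration only)
new_toy <- function(df) structure(list(grid = df), class = "toyGrid")

# Class-specific rbind method: combine the underlying grids
rbind.toyGrid <- function(..., deparse.level = 1) {
  grids <- lapply(list(...), function(obj) obj$grid)
  new_toy(do.call(rbind, grids))
}

# Conversion method, so the combined object can become a data frame
as.data.frame.toyGrid <- function(x, ...) x$grid

g <- new_toy(data.frame(wool = c("A", "B"), emmean = c(44.6, 28.2)))
lst <- list(g, g)

# Plain rbind dispatches on the class of its arguments
combined <- do.call(rbind, lst)
class(combined)
nrow(as.data.frame(combined))
```

Calling do.call(rbind.data.frame, lst) instead would bypass this dispatch and treat each object as if it were a data frame.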
data
library(multcomp)
library(emmeans)
warp.lm <- lm(breaks ~ wool*tension, data = warpbreaks)
warp.emmGrid <- emmeans(warp.lm, ~ tension | wool)
contrast <- list(warp.emmGrid, warp.emmGrid)
If the OP used 'ARTool' and the columns differ between list elements, the above solution may not work, because rbind requires all objects to have the same column names. In that case, we could convert each element to a tibble by looping over the list with map_dfr (from purrr), which binds the rows and fills the missing columns with NA
library(ARTool)
library(purrr)
library(tibble)
map_dfr(contrast, as_tibble)
-output
# A tibble: 42 × 8
contrast estimate SE df t.ratio p.value Moisture_pairwise Fertilizer_pairwise
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <fct> <fct>
1 m1 - m2 -23.1 4.12 8.00 -5.61 0.00226 NA NA
2 m1 - m3 -33.8 4.12 8.00 -8.20 0.000169 NA NA
3 m1 - m4 -15.2 4.12 8.00 -3.68 0.0256 NA NA
4 m2 - m3 -10.7 4.12 8 -2.59 0.118 NA NA
5 m2 - m4 7.92 4.12 8 1.92 0.291 NA NA
6 m3 - m4 18.6 4.12 8 4.51 0.00849 NA NA
7 NA 6.83 10.9 24 0.625 0.538 m1 - m2 f1 - f2
8 NA 15.3 10.9 24 1.40 0.174 m1 - m3 f1 - f2
9 NA -5.83 10.9 24 -0.533 0.599 m1 - m4 f1 - f2
10 NA 8.50 10.9 24 0.777 0.445 m2 - m3 f1 - f2
# … with 32 more rows
data
data(Higgins1990Table5, package = "ARTool")
m <- art(DryMatter ~ Moisture*Fertilizer + (1|Tray), data=Higgins1990Table5)
a1 <- art.con(m, ~ Moisture)
a2 <- art.con(m, "Moisture:Fertilizer", interaction = TRUE)
contrast <- list(a1, a2)
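The NA padding visible in the tibble output can also be sketched in base R. The helper bind_fill below is a hypothetical name written for this example, and the two small data frames only mimic the shape of the art.con results.

```r
# Row-bind data frames whose columns differ, padding absent columns with NA
bind_fill <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(d) {
    for (col in setdiff(all_cols, names(d))) d[[col]] <- NA
    d[all_cols]  # same column order in every piece
  })
  do.call(rbind, padded)
}

# Toy stand-ins for two contrast tables with different columns
d1 <- data.frame(contrast = c("m1 - m2", "m1 - m3"), estimate = c(-23.1, -33.8))
d2 <- data.frame(estimate = 6.83, Moisture_pairwise = "m1 - m2")

out <- bind_fill(list(d1, d2))
out
```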

Related

remove nonEst row(s) in emmeans result

I have an unbalanced design, so when I apply emmeans to my model at specific levels, the absent nested factor level (which is present at other levels) is marked as nonEst in my output table. How do I change my code so that the table below shows only the three estimable rows?
emmeans(model, specs = ~ Rot/Crop | Herb, at = list(Rot = "3", Herb="conv"))
Herb = conv:
Rot Crop emmean SE df lower.CL upper.CL
3 alfalfa nonEst NA NA NA NA
3 corn 3.50 0.283 270 2.94 4.06
3 oat 3.44 0.283 270 2.88 3.99
3 soybean 2.65 0.253 270 2.15 3.15
Confidence level used: 0.95
An option is to tidy it with broom and then remove the NA rows with na.omit
library(emmeans)
library(broom)
library(dplyr)
emmeans(model, specs = ~ Rot/Crop | Herb, at = list(Rot = "3", Herb="conv")) %>%
tidy %>%
na.omit
Or with as.data.frame/subset
subset(as.data.frame( emmeans(model, specs = ~ Rot/Crop | Herb,
at = list(Rot = "3", Herb="conv"))), !is.na(emmean))
Using a reproducible example
warp.lm <- lm(breaks ~ wool * tension, data = head(warpbreaks, 30))
emmeans (warp.lm, ~ wool | tension)
#tension = L:
# wool emmean SE df lower.CL upper.CL
# A 44.6 4.24 26 35.85 53.3
# B 23.3 7.34 26 8.26 38.4
#tension = M:
# wool emmean SE df lower.CL upper.CL
# A 24.0 4.24 26 15.29 32.7
# B nonEst NA NA NA NA
#tension = H:
# wool emmean SE df lower.CL upper.CL
# A 24.6 4.24 26 15.85 33.3
# B nonEst NA NA NA NA
emmeans (warp.lm, ~ wool | tension) %>%
tidy %>%
na.omit
# A tibble: 4 x 7
# wool tension estimate std.error df statistic p.value
# <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 A L 44.6 4.24 26 10.5 7.29e-11
#2 B L 23.3 7.34 26 3.18 3.78e- 3
#3 A M 24.0 4.24 26 5.67 5.84e- 6
#4 A H 24.6 4.24 26 5.80 4.15e- 6
Or in base R, coerce it to data.frame and then subset the non-NA rows
subset(as.data.frame(emmeans (warp.lm, ~ wool | tension)), !is.na(emmean))
# wool tension emmean SE df lower.CL upper.CL
#1 A L 44.55556 4.235135 26 35.850110 53.26100
#2 B L 23.33333 7.335470 26 8.255059 38.41161
#3 A M 24.00000 4.235135 26 15.294554 32.70545
#5 A H 24.55556 4.235135 26 15.850110 33.26100

Time series forecasting by lm() using lapply

I am trying to forecast a time series using lm(), and my data looks like this:
Customer_key date sales
A35 2018-05-13 31
A35 2018-05-20 20
A35 2018-05-27 43
A35 2018-06-03 31
BH22 2018-05-13 60
BH22 2018-05-20 67
BH22 2018-05-27 78
BH22 2018-06-03 55
I converted my df to a list with:
df <- dcast(df, date ~ customer_key,value.var = c("sales"))
df <- subset(df, select = -c(dt))
demandWithKey <- as.list(df)
I am trying to write a function that can be applied across all customers:
my_fun <- function(x) {
fit <- lm(ds_load ~ date, data=df) ## After changing to list ds_load and date column names
## are no longer available for formula
fit_b <- forecast(fit$fitted.values, h=20) ## forecast using lm()
return(data.frame(c(fit$fitted.values, fit_b[["mean"]])))
}
fcast <- lapply(df, my_fun)
I know the above function doesn't work, but basically I want to get both the fitted values and the forecasted values for grouped data.
I've tried other methods, such as tslm() (after converting to time series data), but no luck. I can get lm() to work on just one customer, though. Also, many questions/posts cover only fitting the model, but I would like to forecast at the same time.
lm() fits a regression model, but here you have a time series, so to forecast it you should use a time series model (ARMA, ARCH, GARCH, ...).
In R, you can use the auto.arima() function from the "forecast" package.
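As a rough sketch of that idea using only base R, stats::arima() with a fixed AR(1) order is swapped in here for the automatic order selection that auto.arima() performs. The simulated series and the chosen order are assumptions for illustration, not taken from the question.

```r
# Simulated weekly sales for one customer (toy data, not from the question)
set.seed(42)
sales <- ts(50 + arima.sim(model = list(ar = 0.5), n = 60))

# Fit an AR(1) model; auto.arima() would choose the order automatically
fit <- arima(sales, order = c(1, 0, 0), method = "ML")

# In-sample fitted values and a 20-step-ahead forecast
fitted_vals <- sales - residuals(fit)
fcast <- predict(fit, n.ahead = 20)$pred

length(fitted_vals)
length(fcast)
```

To run this per customer, the same steps could be wrapped in a function and applied to each group, for example with split() and lapply().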
I don't know exactly what you're up to, but you could make this less complicated.
Using by avoids the need to reshape your data: it splits your data (here by customer ID) and applies a function to each subset, i.e. it's a combination of split and lapply; see ?by.
Since you want to compare fitted and forecasted values in your result, you probably need predict rather than $fitted.values; otherwise the vectors won't have the same length. Because your independent variable is a date at weekly intervals, you can use seq.Date with the first date as the starting value; the sequence length is the number of actual values (nrow for each customer) plus the h= argument of the forecast.
For demonstration purposes, I add the fitted values as the first column in the following.
library(forecast)  ## provides forecast()
res <- by(dat, dat$cus_key, function(x) {
  H <- 20  ## forecast horizon 'h', defined once and reused below
  fit <- lm(sales ~ date, x)
  fitted <- fit$fitted.values
  pred <- predict(fit, newdata=data.frame(
    date=seq(x$date[1], length.out= nrow(x) + H, by="week")))
  fcst <- c(fitted, forecast(fitted, h=H)$mean)
  fit.na <- `length<-`(unname(fitted), length(pred))  ## for demonstration
  return(cbind(fit.na, pred, fcst))
})
Result
res
# dat$cus_key: A28
# fit.na pred fcst
# 1 41.4 41.4 41.4
# 2 47.4 47.4 47.4
# 3 53.4 53.4 53.4
# 4 59.4 59.4 59.4
# 5 65.4 65.4 65.4
# 6 NA 71.4 71.4
# 7 NA 77.4 77.4
# 8 NA 83.4 83.4
# 9 NA 89.4 89.4
# 10 NA 95.4 95.4
# 11 NA 101.4 101.4
# 12 NA 107.4 107.4
# 13 NA 113.4 113.4
# 14 NA 119.4 119.4
# 15 NA 125.4 125.4
# 16 NA 131.4 131.4
# 17 NA 137.4 137.4
# 18 NA 143.4 143.4
# 19 NA 149.4 149.4
# 20 NA 155.4 155.4
# 21 NA 161.4 161.4
# 22 NA 167.4 167.4
# 23 NA 173.4 173.4
# 24 NA 179.4 179.4
# 25 NA 185.4 185.4
# ----------------------------------------------------------------
# dat$cus_key: B16
# fit.na pred fcst
# 1 49.0 49.0 49.0
# 2 47.7 47.7 47.7
# 3 46.4 46.4 46.4
# 4 45.1 45.1 45.1
# 5 43.8 43.8 43.8
# 6 NA 42.5 42.5
# 7 NA 41.2 41.2
# 8 NA 39.9 39.9
# 9 NA 38.6 38.6
# 10 NA 37.3 37.3
# 11 NA 36.0 36.0
# 12 NA 34.7 34.7
# 13 NA 33.4 33.4
# 14 NA 32.1 32.1
# 15 NA 30.8 30.8
# 16 NA 29.5 29.5
# 17 NA 28.2 28.2
# 18 NA 26.9 26.9
# 19 NA 25.6 25.6
# 20 NA 24.3 24.3
# 21 NA 23.0 23.0
# 22 NA 21.7 21.7
# 23 NA 20.4 20.4
# 24 NA 19.1 19.1
# 25 NA 17.8 17.8
# ----------------------------------------------------------------
# dat$cus_key: C12
# fit.na pred fcst
# 1 56.4 56.4 56.4
# 2 53.2 53.2 53.2
# 3 50.0 50.0 50.0
# 4 46.8 46.8 46.8
# 5 43.6 43.6 43.6
# 6 NA 40.4 40.4
# 7 NA 37.2 37.2
# 8 NA 34.0 34.0
# 9 NA 30.8 30.8
# 10 NA 27.6 27.6
# 11 NA 24.4 24.4
# 12 NA 21.2 21.2
# 13 NA 18.0 18.0
# 14 NA 14.8 14.8
# 15 NA 11.6 11.6
# 16 NA 8.4 8.4
# 17 NA 5.2 5.2
# 18 NA 2.0 2.0
# 19 NA -1.2 -1.2
# 20 NA -4.4 -4.4
# 21 NA -7.6 -7.6
# 22 NA -10.8 -10.8
# 23 NA -14.0 -14.0
# 24 NA -17.2 -17.2
# 25 NA -20.4 -20.4
As you can see, prediction and forecast yield the same values, since both methods are based on the same single explanatory variable date in this case.
Toy data:
set.seed(42)
dat <- transform(expand.grid(cus_key=paste0(LETTERS[1:3], sample(12:43, 3)),
date=seq.Date(as.Date("2018-05-13"), length.out=5, by="week")),
sales=sample(20:80, 15, replace=TRUE))

how to create a data.frame with nested column structure

I wish to create a data.frame with two columns, where each column itself contains multiple columns. (I need it as input for plsr in the pls package.)
It should look like the oliveoil data.
> oliveoil
chemical.Acidity chemical.Peroxide chemical.K232 chemical.K270 chemical.DK sensory.yellow sensory.green
G1 0.7300 12.7000 1.9000 0.1390 0.0030 21.4 73.4
G2 0.1900 12.3000 1.6780 0.1160 -0.0040 23.4 66.3
G3 0.2600 10.3000 1.6290 0.1160 -0.0050 32.7 53.5
G4 0.6700 13.7000 1.7010 0.1680 -0.0020 30.2 58.3
G5 0.5200 11.2000 1.5390 0.1190 -0.0010 51.8 32.5
I1 0.2600 18.7000 2.1170 0.1420 0.0010 40.7 42.9
I2 0.2400 15.3000 1.8910 0.1160 0.0000 53.8 30.4
I3 0.3000 18.5000 1.9080 0.1250 0.0010 26.4 66.5
I4 0.3500 15.6000 1.8240 0.1040 0.0000 65.7 12.1
I5 0.1900 19.4000 2.2220 0.1580 -0.0030 45.0 31.9
S1 0.1500 10.5000 1.5220 0.1160 -0.0040 70.9 12.2
S2 0.1600 8.1400 1.5270 0.1063 -0.0020 73.5 9.7
S3 0.2700 12.5000 1.5550 0.0930 -0.0020 68.1 12.0
S4 0.1600 11.0000 1.5730 0.0940 -0.0030 67.6 13.9
S5 0.2400 10.8000 1.3310 0.0850 -0.0030 71.4 10.6
S6 0.3000 11.4000 1.4150 0.0930 -0.0040 71.4 10.0
sensory.brown sensory.glossy sensory.transp sensory.syrup
G1 10.1 79.7 75.2 50.3
G2 9.8 77.8 68.7 51.7
G3 8.7 82.3 83.2 45.4
G4 12.2 81.1 77.1 47.8
G5 8.0 72.4 65.3 46.5
I1 20.1 67.7 63.5 52.2
I2 11.5 77.8 77.3 45.2
I3 14.2 78.7 74.6 51.8
I4 10.3 81.6 79.6 48.3
I5 28.4 75.7 72.9 52.8
S1 10.8 87.7 88.1 44.5
S2 8.3 89.9 89.7 42.3
S3 10.8 78.4 75.1 46.4
S4 11.9 84.6 83.8 48.5
S5 10.8 88.1 88.5 46.7
S6 11.4 89.5 88.5 47.2
And it is a data.frame with 2 columns:
> is.data.frame(oliveoil)
[1] TRUE
> dim(oliveoil)
[1] 16 2
I tried the following code:
x = data.frame(a = c(1,2,3), b = c(1,3,4))
y = data.frame(c = c(3,4,5), d = c(5,4,2))
d = data.frame(x = x, y = y)
it returns:
> d
x.a x.b y.c y.d
1 1 1 3 5
2 2 3 4 4
3 3 4 5 2
but I cannot call x with d$x
> d$x
NULL
what I expect is:
> d$x
a b
1 1 1
2 2 3
3 3 4
I was hoping for an argument to the data.frame function that makes this work, something like:
d = data.frame(x = x, y = y, merge.columns = F)
But I cannot find any such argument in the docs.
The pls::plsr() function does not require data to be set up exactly like oliveoil. plsr() allows the response term to be a matrix, and oliveoil has a particular way of storing matrices, but you can supply any matrix to plsr().
For example, this fits a model without error:
library(pls)
n <- 100
y <- matrix(rnorm(n), nrow = 10)
x <- matrix(rnorm(n), nrow = 10)
plsr(y ~ x)
# Partial least squares regression , fitted with the kernel algorithm.
# Call:
# plsr(formula = y ~ x)
Also note that the yarn dataset, which is used in the pls docs as well, just stores regular matrices in a data frame rather than using the I() approach of oliveoil.
For a bit more explanation:
The sub-components of oliveoil are not actually of class data.frame.
If you run str(oliveoil), you'll see the sensory and chemical objects in oliveoil are cast as AsIs objects. They're not technically data frame-classed objects, and in fact they were probably matrices with named rows and columns to begin with.
str(oliveoil)
'data.frame': 16 obs. of 2 variables:
$ chemical: 'AsIs' num [1:16, 1:5] 0.73 0.19 0.26 0.67 0.52 0.26 0.24 0.3 0.35 0.19 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr "G1" "G2" "G3" "G4" ...
.. ..$ : chr "Acidity" "Peroxide" "K232" "K270" ...
$ sensory : 'AsIs' num [1:16, 1:6] 21.4 23.4 32.7 30.2 51.8 40.7 53.8 26.4 65.7 45 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr "G1" "G2" "G3" "G4" ...
.. ..$ : chr "yellow" "green" "brown" "glossy" ...
The AsIs class means they were stored in oliveoil using the I() function (its help page, ?I, is titled "Inhibit Interpretation/Conversion of Objects"). I() protects an object from being converted into something else during an operation, like storage into a data frame.
You can reproduce this with a simple example (although note that if you try and store two data frames in a data frame with I() you'll get an error):
n <- 100
matrix_a <- matrix(rnorm(n), nrow = 10)
matrix_b <- matrix(rnorm(n), nrow = 10)
df <- data.frame(a = I(matrix_a), b = I(matrix_b))
str(df)
'data.frame': 10 obs. of 2 variables:
$ a: 'AsIs' num [1:10, 1:10] -0.817 -0.233 -1.987 0.523 -1.596 ...
$ b: 'AsIs' num [1:10, 1:10] 1.9189 -0.7043 0.0624 0.0152 -0.5409 ...
And df now contains matrix_a as $a and matrix_b as $b:
df$a
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] -0.8167554 -0.61629222 0.3673423 1.30882012 0.97618868 -0.53124825
[2,] -0.2329451 0.08556506 -0.5839086 0.86298000 1.20452166 0.09825958
[3,] -1.9873738 -0.93537922 0.1057309 0.63585036 -1.09604531 1.33080572
[4,] 0.5227912 1.89505993 1.1184905 1.20683770 -0.02431886 -1.15878634
# ...
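You could also store matrix_a and matrix_b as plain matrices without I(). Note that data.frame() itself splits an unprotected matrix into one column per matrix column, so assign the matrices after the data frame has been created:
# also works
df2 <- data.frame(foo = letters[1:10])
df2$a <- matrix_a
df2$b <- matrix_b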
TL;DR - plsr() takes any matrix, but if you want your data stored in a data frame, create a matrix and save it into a data frame, with or without I().
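Applied to the x and y from the question, a minimal sketch could look like this. It assumes both data frames are all-numeric, so that as.matrix() loses no information.

```r
x <- data.frame(a = c(1, 2, 3), b = c(1, 3, 4))
y <- data.frame(c = c(3, 4, 5), d = c(5, 4, 2))

# Convert each block to a matrix and protect it with I(),
# so data.frame() keeps it as a single matrix column
d <- data.frame(x = I(as.matrix(x)), y = I(as.matrix(y)))

dim(d)          # 3 rows, 2 (matrix) columns
colnames(d$x)   # "a" "b"
d$x
```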

How to create several new group-based variables most efficiently?

Let's use the following example:
set.seed(2409)
N=5
T=10
id<- rep(LETTERS[1:N],each=T)
time<-rep(1:T, times=N)
var1<-runif(N*T,0,100)
var2<-runif(N*T,0,100)
var3<-runif(N*T,0,100)
var4<-runif(N*T,0,100)
var5<-runif(N*T,0,100)
df<-data.frame(id,time,var1,var2,var3,var4,var5); rm(N,T,id,time,var1,var2,var3,var4,var5)
I now want to apply a function to several of these variables (not all of them!) and create new variables accordingly.
I already have suitable code for creating log variables:
cols <- c("var1",
"var3",
"var5")
log <- log(df[cols])
colnames(log) <- paste(colnames(log), "log", sep = "_")
df <- cbind(df,log); rm(log, cols)
This gives me my additional log variables. But now I also want to create lagged and z-transformed variables. These functions operate within the individual IDs. So I wrote the following code, which works but is extremely long and inefficient for my real dataset, where I apply each function to 38 variables:
library(Hmisc)
library(dplyr)
df<-df %>%
group_by(id) %>%
mutate(var1_1=Lag(var1, shift=1),
var3_1=Lag(var3, shift=1),
var5_1=Lag(var5, shift=1),
var1_2=Lag(var1, shift=2),
var3_2=Lag(var3, shift=2),
var5_2=Lag(var5, shift=2),
var1_z=scale(var1),
var3_z=scale(var3),
var5_z=scale(var5)
)
I am sure there is a way to make this more efficient. Ideally, I could define the original variables once, apply different functions to them, and create the new variables as a result.
Thank you very much!
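You can use mutate_at with funs. This will apply the three functions in funs to each of the three variables in vars, creating 9 new columns. (Note that funs() is deprecated as of dplyr 0.8.0; in newer versions, pass a named list of functions instead.)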
library(dplyr)
df %>%
group_by(id) %>%
mutate_at(vars(var1, var3, var5),
funs(lag1 = lag(.), lag2 = lag(., 2), scale))
# # A tibble: 50 x 16
# # Groups: id [5]
# id time var1 var2 var3 var4 var5 var1_lag1 var3_lag1 var5_lag1
# <fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 A 1 38.8 25.7 29.2 91.1 35.3 NA NA NA
# 2 A 2 87.1 22.3 8.27 31.5 93.7 38.8 29.2 35.3
# 3 A 3 61.7 38.8 0.887 63.0 50.4 87.1 8.27 93.7
# 4 A 4 0.692 60.1 71.5 74.0 41.6 61.7 0.887 50.4
# 5 A 5 60.1 13.3 90.4 80.6 47.5 0.692 71.5 41.6
# 6 A 6 46.4 3.67 36.7 86.9 67.5 60.1 90.4 47.5
# 7 A 7 80.4 72.1 82.2 25.5 70.3 46.4 36.7 67.5
# 8 A 8 48.8 25.7 93.4 19.8 81.2 80.4 82.2 70.3
# 9 A 9 48.2 31.5 82.1 47.2 49.2 48.8 93.4 81.2
# 10 A 10 21.8 32.6 76.5 19.7 41.1 48.2 82.1 49.2
# # ... with 40 more rows, and 6 more variables: var1_lag2 <dbl>, var3_lag2 <dbl>,
# # var5_lag2 <dbl>, var1_scale <dbl>, var3_scale <dbl>, var5_scale <dbl>
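In dplyr 1.0.0 and later, the same idea is usually written with across(). The following is a sketch under that assumption; the suffixes lag1, lag2 and z are names chosen here, and the data is a cut-down version of the question's example.

```r
library(dplyr)

set.seed(2409)
df <- data.frame(id   = rep(LETTERS[1:5], each = 10),
                 time = rep(1:10, times = 5),
                 var1 = runif(50, 0, 100),
                 var3 = runif(50, 0, 100),
                 var5 = runif(50, 0, 100))

res <- df %>%
  group_by(id) %>%
  # each function in the named list is applied to each selected column;
  # new columns are named <column>_<function name>
  mutate(across(c(var1, var3, var5),
                list(lag1 = ~ lag(.x, 1),
                     lag2 = ~ lag(.x, 2),
                     z    = ~ as.vector(scale(.x))))) %>%
  ungroup()

names(res)
```

Wrapping scale() in as.vector() keeps the new columns as plain numeric vectors rather than one-column matrices.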
Here is an option with data.table
library(data.table)
nm1 <- c('var1', 'var3', 'var5')
# shift() returns both lags of var1 first, then var3, then var5, so name them in that order
nm2 <- paste0(rep(nm1, each = 2), c('_lag1', '_lag2'))
nm3 <- paste0(nm1, '_scale')
setDT(df)[, c(nm2, nm3) := c(shift(.SD, n = 1:2), lapply(.SD,
function(x) as.vector(scale(x)))), by = id, .SDcols = nm1]

Nesting several groups of columns inside a data frame

The concept of nesting several columns into a single list-column is very powerful. However, I am not sure whether it is possible at all to nest more than one set of columns into several list-columns within the same pipeline using the nest function in {tidyr}. For instance, assume I have the following data frame:
df <- as.data.frame(replicate(6, runif(10) * 100))
colnames(df) <- c(
paste0("a", 1:2), # a1, a2
paste0("b", 1:4) # b1, b2, b3, b4
)
df
a1 a2 b1 b2 b3 b4
1 20.807348 69.339482 91.837151 99.76813 3.394350 33.780049
2 64.667733 20.676381 80.523369 38.42774 85.635208 60.111491
3 55.352501 55.699571 4.812923 38.65333 98.869203 80.345576
4 45.194094 16.511696 83.834651 51.48698 7.191081 16.697210
5 66.401642 89.041055 26.965636 67.90061 90.622428 59.552935
6 35.750100 55.997766 49.768556 68.45900 67.523080 58.993232
7 21.392823 5.335281 56.348328 35.68331 51.029617 66.290035
8 8.851236 19.486580 14.199370 22.49754 14.617592 18.236406
9 70.475652 6.229997 43.169364 12.63378 21.415589 2.163004
10 47.837613 37.641530 38.001288 71.15896 71.000568 2.135611
I would like to nest the "a" columns into a list-column AND nest the "b" columns into a second list-column because I would like to perform different computations on them.
Nesting the "a" columns works:
library(tidyr)
nest(df, a1, a2, .key = "a")
b1 b2 b3 b4 a
1 91.837151 99.76813 3.394350 33.780049 20.80735, 69.33948
2 80.523369 38.42774 85.635208 60.111491 64.66773, 20.67638
3 4.812923 38.65333 98.869203 80.345576 55.35250, 55.69957
4 83.834651 51.48698 7.191081 16.697210 45.19409, 16.51170
5 26.965636 67.90061 90.622428 59.552935 66.40164, 89.04105
6 49.768556 68.45900 67.523080 58.993232 35.75010, 55.99777
7 56.348328 35.68331 51.029617 66.290035 21.392823, 5.335281
8 14.199370 22.49754 14.617592 18.236406 8.851236, 19.486580
9 43.169364 12.63378 21.415589 2.163004 70.475652, 6.229997
10 38.001288 71.15896 71.000568 2.135611 47.83761, 37.64153
But it is impossible to nest the "b" columns AFTER the "a" columns have been nested:
nest(df, a1, a2, .key = "a") %>%
nest(b1, b2, b3, b4, .key = "b")
Error in grouped_df_impl(data, unname(vars), drop) :
Column `a` can't be used as a grouping variable because it's a list
which makes sense given the error message.
My work-around is to:
1. nest the "a" columns
2. perform the required computations on the "a" list-column
3. unnest the "a" list-column
4. nest the "b" columns
5. perform the required computations on the "b" list-column
6. unnest the "b" list-column
Is there a more straightforward way to achieve this? Your help is much appreciated.
We can use map to do this (the code below uses the nest() interface of tidyr versions before 1.0, with .key)
library(tidyverse)
out <- list('a', 'b') %>%
map(~ df %>%
select(matches(.x)) %>%
nest(names(.), .key = !! rlang::sym(.x))) %>%
bind_cols
out
# A tibble: 1 x 2
# a b
# <list> <list>
#1 <data.frame [10 × 2]> <data.frame [10 × 4]>
out %>%
unnest
# A tibble: 10 x 6
# a1 a2 b1 b2 b3 b4
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 20.8 69.3 91.8 99.8 3.39 33.8
# 2 64.7 20.7 80.5 38.4 85.6 60.1
# 3 55.4 55.7 4.81 38.7 98.9 80.3
# 4 45.2 16.5 83.8 51.5 7.19 16.7
# 5 66.4 89.0 27.0 67.9 90.6 59.6
# 6 35.8 56.0 49.8 68.5 67.5 59.0
# 7 21.4 5.34 56.3 35.7 51.0 66.3
# 8 8.85 19.5 14.2 22.5 14.6 18.2
# 9 70.5 6.23 43.2 12.6 21.4 2.16
#10 47.8 37.6 38.0 71.2 71.0 2.14
We could then do the separate computations on the 'a' and 'b' list-columns
out %>%
mutate(a = map(a, `*`, 4)) %>%
unnest
# A tibble: 10 x 6
# a1 a2 b1 b2 b3 b4
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 83.2 277. 91.8 99.8 3.39 33.8
# 2 259. 82.7 80.5 38.4 85.6 60.1
# 3 221. 223. 4.81 38.7 98.9 80.3
# 4 181. 66.0 83.8 51.5 7.19 16.7
# 5 266. 356. 27.0 67.9 90.6 59.6
# 6 143. 224. 49.8 68.5 67.5 59.0
# 7 85.6 21.3 56.3 35.7 51.0 66.3
# 8 35.4 77.9 14.2 22.5 14.6 18.2
# 9 282. 24.9 43.2 12.6 21.4 2.16
#10 191. 151. 38.0 71.2 71.0 2.14
Having said that, it is also possible to select columns of interest with mutate_at instead of doing nest/unnest
df %>%
mutate_at(vars(matches('^a\\d+')), funs(.*4))
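With tidyr 1.0.0 or later, nest() accepts several named column sets in one call, so both list-columns can be created in a single step. This is a sketch assuming that newer interface; the data is regenerated here with an arbitrary seed.

```r
library(tidyr)

set.seed(1)
df <- as.data.frame(replicate(6, runif(10) * 100))
colnames(df) <- c(paste0("a", 1:2), paste0("b", 1:4))

# Each argument name becomes a list-column holding the selected columns
nested <- nest(df, a = c(a1, a2), b = c(b1, b2, b3, b4))
dim(nested)

# Unnest both list-columns together to get the original shape back
flat <- unnest(nested, c(a, b))
dim(flat)
```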
