Purrr's modify_in function

I'm trying to use purrr's modify_in to modify elements of a list. An example of the list:
tib_list <- map(1:3, ~ tibble(col_one = runif(5),
col_two = runif(5), col_three = runif(5)))
Let's say I want to change elements 2 and 3 of the list so that they no longer contain col_one. I imagined doing this:
modify_in(tib_list, 2:length(tib_list), ~ select(.x, -col_one))
But this yields an error. I then thought of doing something like this, but it ends up duplicating the list:
map(1:3, ~ modify_in(tib_list, .x, ~ select(.x, -col_one)))

I think you wanted modify_at(), which lets you specify either element names or positions. modify_in() takes a single location, which it interprets like purrr::pluck(): a vector of positions is a path into nested levels, not a set of elements.
library(tidyverse)
tib_list <- map(1:3, ~ tibble(col_one = runif(5), col_two = runif(5), col_three = runif(5)))
modify_at(tib_list, c(2,3), ~ select(.x, -col_one))
#> [[1]]
#> # A tibble: 5 x 3
#> col_one col_two col_three
#> <dbl> <dbl> <dbl>
#> 1 0.190 0.599 0.824
#> 2 0.214 0.172 0.106
#> 3 0.236 0.666 0.584
#> 4 0.373 0.903 0.252
#> 5 0.875 0.196 0.643
#>
#> [[2]]
#> # A tibble: 5 x 2
#> col_two col_three
#> <dbl> <dbl>
#> 1 0.513 0.113
#> 2 0.893 0.377
#> 3 0.275 0.675
#> 4 0.529 0.612
#> 5 0.745 0.405
#>
#> [[3]]
#> # A tibble: 5 x 2
#> col_two col_three
#> <dbl> <dbl>
#> 1 0.470 0.789
#> 2 0.181 0.289
#> 3 0.680 0.213
#> 4 0.772 0.114
#> 5 0.314 0.895
Created on 2021-08-27 by the reprex package (v0.3.0)
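For comparison, here is a minimal base-R sketch of what modify_at() does with positions: apply a function to the chosen elements and leave the others untouched. It uses plain data frames in a list called df_list and a hypothetical helper drop_col_one() in place of dplyr::select(), so it runs without the tidyverse.

```r
# Three small data frames standing in for the tibbles in tib_list
set.seed(1)
df_list <- lapply(1:3, function(i) {
  data.frame(col_one = runif(5), col_two = runif(5), col_three = runif(5))
})

# Hypothetical helper: return the data frame without col_one
drop_col_one <- function(d) d[setdiff(names(d), "col_one")]

# Modify only elements 2 and 3, like modify_at(tib_list, c(2, 3), ...)
idx <- c(2, 3)
df_list[idx] <- lapply(df_list[idx], drop_col_one)

lapply(df_list, names)
```

Element 1 keeps all three columns, while elements 2 and 3 keep only col_two and col_three.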
modify_in() works with a single position, but a vector such as c(2, 3) is read as a path: the third element of the second element, i.e. tib_list[[2]][[3]], which is the col_three column (a numeric vector) of the second tibble. select() is then applied to that numeric vector, which is why we see the error below.
# works
modify_in(tib_list, 2, ~ select(.x, -col_one))
#> [[1]]
#> # A tibble: 5 x 3
#> col_one col_two col_three
#> <dbl> <dbl> <dbl>
#> 1 0.109 0.697 0.0343
#> 2 0.304 0.645 0.851
#> 3 0.530 0.786 0.600
#> 4 0.708 0.0324 0.605
#> 5 0.898 0.232 0.567
#>
#> [[2]]
#> # A tibble: 5 x 2
#> col_two col_three
#> <dbl> <dbl>
#> 1 0.766 0.157
#> 2 0.0569 0.0422
#> 3 0.943 0.0850
#> 4 0.947 0.0806
#> 5 0.761 0.297
#>
#> [[3]]
#> # A tibble: 5 x 3
#> col_one col_two col_three
#> <dbl> <dbl> <dbl>
#> 1 0.878 0.864 0.540
#> 2 0.168 0.745 0.120
#> 3 0.943 0.338 0.535
#> 4 0.353 0.478 0.204
#> 5 0.267 0.669 0.478
# doesn't work
modify_in(tib_list, c(2,3), ~ select(.x, -col_one))
#> Error in UseMethod("select"): no applicable method for 'select' applied to an object of class "c('double', 'numeric')"
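Base R's [[ follows the same path convention when given a vector index (recursive indexing): x[[c(2, 3)]] means x[[2]][[3]]. A minimal sketch with a plain nested list (hypothetical values) shows why c(2, 3) lands on a single vector rather than on two list elements:

```r
# A plain list of lists standing in for tib_list (hypothetical values)
nested <- list(
  list(col_one = 1:2, col_two = 3:4, col_three = 5:6),
  list(col_one = 7:8, col_two = 9:10, col_three = 11:12),
  list(col_one = 13:14, col_two = 15:16, col_three = 17:18)
)

# A vector index is a path: element 2, then its element 3
nested[[c(2, 3)]]                                 # 11:12, the col_three of element 2
identical(nested[[c(2, 3)]], nested[[2]][[3]])    # TRUE

# Taking elements 2 and 3 as a set needs single-bracket indexing instead
length(nested[c(2, 3)])                           # 2
```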

I have never used modify_in(), but you could use imap() and branch on the element's index:
library(purrr)
library(dplyr)
tib_list %>%
imap(~ if (.y > 1) { select(.x, -col_one) } else { .x })
to get
[[1]]
# A tibble: 5 x 3
col_one col_two col_three
<dbl> <dbl> <dbl>
1 0.710 0.189 0.644
2 0.217 0.946 0.955
3 0.590 0.770 0.0180
4 0.135 0.101 0.888
5 0.640 0.645 0.346
[[2]]
# A tibble: 5 x 2
col_two col_three
<dbl> <dbl>
1 0.267 0.926
2 0.456 0.0902
3 0.659 0.707
4 0.421 0.0451
5 0.801 0.220
[[3]]
# A tibble: 5 x 2
col_two col_three
<dbl> <dbl>
1 0.437 0.649
2 0.256 0.466
3 0.331 0.594
4 0.586 0.558
5 0.625 0.444
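The same index-based branching can be written in base R with Map(), which passes each element together with its position, much as imap() does for an unnamed list. A minimal sketch, assuming plain data frames (the list name df_list is illustrative) in place of the tibbles:

```r
set.seed(1)
df_list <- lapply(1:3, function(i) {
  data.frame(col_one = runif(5), col_two = runif(5), col_three = runif(5))
})

# Keep the first element as-is; drop col_one from every later element
result <- Map(function(d, i) {
  if (i > 1) d[setdiff(names(d), "col_one")] else d
}, df_list, seq_along(df_list))

lapply(result, ncol)
```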

We can use modify_if() with a logical vector as the predicate:
modify_if(tib_list, .f = ~ .x %>% select(-col_one),
          .p = seq_along(tib_list) != 1)
Output:
[[1]]
# A tibble: 5 x 3
col_one col_two col_three
<dbl> <dbl> <dbl>
1 0.819 0.666 0.384
2 0.183 0.549 0.0211
3 0.374 0.240 0.252
4 0.359 0.913 0.792
5 0.515 0.402 0.217
[[2]]
# A tibble: 5 x 2
col_two col_three
<dbl> <dbl>
1 0.696 0.0269
2 0.433 0.147
3 0.235 0.743
4 0.589 0.748
5 0.635 0.851
[[3]]
# A tibble: 5 x 2
col_two col_three
<dbl> <dbl>
1 0.707 0.976
2 0.0966 0.130
3 0.574 0.572
4 0.854 0.680
5 0.819 0.582

Related

Split a data frame with recurring column names

I have imported an Excel sheet into R that is a compilation of several data frames with identical column names. To illustrate, it looks like this:
df <- tibble( empty = c(runif(3), NA, NA, NA, NA),
A = c(runif(3), NA, NA, NA, NA),
B = c(runif(3), NA, NA, NA, NA),
C = c(runif(3), NA, NA, NA, NA),
empty = c(runif(6), NA),
A = c(runif(6), NA),
B = c(runif(6), NA),
C = c(runif(6), NA),
empty = c(runif(5), NA, NA),
A = c(runif(5), NA, NA),
B = c(runif(5), NA, NA),
C = c(runif(5), NA, NA),
.name_repair = "minimal")
How can I transform this data frame into this result:
> df1
# A tibble: 3 x 4
empty A B C
<dbl> <dbl> <dbl> <dbl>
1 0.200 0.0665 0.723 0.487
2 0.576 0.990 0.969 0.289
3 0.727 0.192 0.780 0.243
> df2
# A tibble: 6 x 4
empty A B C
<dbl> <dbl> <dbl> <dbl>
1 0.556 0.698 0.796 0.357
2 0.308 0.542 0.867 0.103
3 0.643 0.792 0.385 0.882
4 0.675 0.504 0.489 0.0515
5 0.426 0.775 0.410 0.748
6 0.343 0.752 0.185 0.542
> df3
# A tibble: 5 x 4
empty A B C
<dbl> <dbl> <dbl> <dbl>
1 0.229 0.0508 0.0880 0.486
2 0.146 0.295 0.562 0.731
3 0.292 0.804 0.133 0.0480
4 0.0404 0.399 0.366 0.152
5 0.226 0.702 0.476 0.416
In the real data, the column named empty actually has no name, but I don't know how to reproduce that in this example.
I ask because I have several other sheets, each with a different number of similar columns (D, E, etc.).
I found a nice post here:
split data frame with recurring column names
Although that post looks the same, it is quite different.
Thanks!
This puts the results in a list which should be more convenient than sequentially named data frames.
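first_col = "empty"
# every column named first_col starts a new block: 1 1 1 1 2 2 2 2 3 3 3 3
name_groups = cumsum(names(df) == first_col)
# split.default() splits the columns of df into one data frame per block
result = split.default(df, name_groups)
# omit rows that have only missing values (the \(x) shorthand needs R >= 4.1)
result = lapply(result, \(x) x[rowSums(is.na(x)) < ncol(x), ])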
result
# $`1`
# # A tibble: 3 × 4
# empty A B C
# <dbl> <dbl> <dbl> <dbl>
# 1 0.590 0.602 0.527 0.900
# 2 0.0450 0.713 0.936 0.911
# 3 0.567 0.781 0.349 0.686
#
# $`2`
# # A tibble: 6 × 4
# empty A B C
# <dbl> <dbl> <dbl> <dbl>
# 1 0.480 0.543 0.744 0.0684
# 2 0.0423 0.799 0.927 0.537
# 3 0.962 0.0745 0.851 0.0639
# 4 0.615 0.546 0.390 0.0985
# 5 0.258 0.857 0.139 0.172
# 6 0.944 0.375 0.356 0.715
#
# $`3`
# # A tibble: 5 × 4
# empty A B C
# <dbl> <dbl> <dbl> <dbl>
# 1 0.790 0.572 0.600 0.701
# 2 0.732 0.610 0.0395 0.283
# 3 0.130 0.168 0.120 0.0682
# 4 0.112 0.682 0.586 0.640
# 5 0.211 0.267 0.0189 0.606
If you really want df1, df2, ... in your global environment, add these lines:
names(result) = paste0("df", names(result))
list2env(result, envir = .GlobalEnv)
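A self-contained base-R sketch of the same idea on a small hypothetical data frame: check.names = FALSE keeps the duplicated column names, standing in for the real Excel import. It uses function(x) rather than the \(x) shorthand, so it also runs on R versions before 4.1.

```r
# Hypothetical sheet: two side-by-side blocks, each starting with an "empty" column
df <- data.frame(
  empty = c(1, 2, NA), A = c(3, 4, NA),
  empty = c(5, 6, 7),  A = c(8, 9, 10),
  check.names = FALSE  # keep the duplicated column names
)

# Every "empty" column starts a new block: 1 1 2 2
groups <- cumsum(names(df) == "empty")

# split.default() splits a data frame by columns, one data frame per block
blocks <- split.default(df, groups)

# Drop rows that are entirely NA (padding below the shorter blocks)
blocks <- lapply(blocks, function(x) x[rowSums(is.na(x)) < ncol(x), ])

blocks
```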
When every block has the same number of columns (here 4), we could do something like this in base R:
df1 <- df[,1:4]
df2 <- df[,5:8]
df3 <- df[,9:12]
> df1
# A tibble: 7 x 4
empty A B C
<dbl> <dbl> <dbl> <dbl>
1 0.120 0.448 0.0453 0.315
2 0.337 0.296 0.757 0.448
3 0.533 0.574 0.681 0.324
4 NA NA NA NA
5 NA NA NA NA
6 NA NA NA NA
7 NA NA NA NA
> df2
# A tibble: 7 x 4
empty A B C
<dbl> <dbl> <dbl> <dbl>
1 0.420 0.306 0.472 0.107
2 0.639 0.666 0.349 0.768
3 0.469 0.311 0.100 0.744
4 0.00122 0.586 0.437 0.796
5 0.122 0.00989 0.289 0.408
6 0.570 0.253 0.877 0.197
7 NA NA NA NA
> df3
# A tibble: 7 x 4
empty A B C
<dbl> <dbl> <dbl> <dbl>
1 0.812 0.0464 0.473 0.638
2 0.340 0.482 0.269 0.164
3 0.0323 0.952 0.842 0.282
4 0.511 0.263 0.934 0.183
5 0.0711 0.483 0.763 0.639
6 NA NA NA NA
7 NA NA NA NA
To drop the all-NA padding rows, subset the rows as well (the row counts are hard-coded here):
df1 <- df[1:3, 1:4]
df2 <- df[1:6, 5:8]
df3 <- df[1:5, 9:12]
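Rather than writing one line per block, the column positions can be turned into block numbers, assuming every block has the same width. A minimal base-R sketch on a hypothetical four-column data frame, with width standing in for the 4 columns per block in the question:

```r
# Hypothetical data frame made of blocks of equal width
width <- 2   # columns per block (4 in the question)
df <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)

# Column positions 1..4 map to blocks 1 1 2 2
groups <- ceiling(seq_along(df) / width)
blocks <- split.default(df, groups)

lapply(blocks, names)
```

The padding rows could then be dropped with the rowSums(is.na(x)) filter from the first answer instead of hard-coded row counts.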
Another possible solution, based on tidyverse:
library(tidyverse)
stack(df) %>%
filter(!is.na(values)) %>%
group_by(aux = cumsum(ind == "empty" & lag(as.character(ind), default = "") != "empty")) %>%
group_split() %>%
map(~ pivot_wider(.x %>% select(-aux), names_from = "ind",
values_from = "values", values_fn = list) %>% unnest(everything()))
#> [[1]]
#> # A tibble: 3 × 4
#> empty A B C
#> <dbl> <dbl> <dbl> <dbl>
#> 1 0.865 0.0634 0.127 0.136
#> 2 0.343 0.431 0.943 0.985
#> 3 0.482 0.635 0.150 0.263
#>
#> [[2]]
#> # A tibble: 6 × 4
#> empty A B C
#> <dbl> <dbl> <dbl> <dbl>
#> 1 0.0656 0.514 0.834 0.662
#> 2 0.977 0.657 0.878 0.427
#> 3 0.670 0.641 0.910 0.175
#> 4 0.402 0.0494 0.433 0.0241
#> 5 0.211 0.388 0.971 0.273
#> 6 0.681 0.355 0.749 0.0536
#>
#> [[3]]
#> # A tibble: 5 × 4
#> empty A B C
#> <dbl> <dbl> <dbl> <dbl>
#> 1 0.440 0.856 0.00734 0.0474
#> 2 0.0347 0.328 0.471 0.845
#> 3 0.106 0.393 0.303 0.811
#> 4 0.385 0.184 0.540 0.180
#> 5 0.564 0.579 0.414 0.0110

map_dfr outputting a row rather than a column

This is similar to "purrr::map_dfr binds by columns, not row as expected", but the solutions there aren't working for me. I have a data frame like this:
beta_df <- structure(list(intercept = c(-2.75747056032685, -2.90831892599742,
-2.92478082251453, -2.99701559041538, -2.88885796048347, -3.09564193631675
), B1 = c(0.0898235360814854, 0.0291839369781567, 0.0881023522236231,
0.231703026085554, 0.0441573699433149, 0.258219673780526), B2 = c(-0.222367437619057,
0.770536384299238, 0.199648657850609, 0.0529038155448773, 0.00310458335580774,
0.132604387458483), B3 = c(1.26339268033385, 1.29883641278223,
0.949504940387809, 1.26904511447941, 0.863882674439083, 0.823907268679309
), B4 = c(2.13662994525526, 1.02340744740827, 0.959079691725652,
1.60672779812489, 1.19095838867883, -0.0693120654049908)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
#> # A tibble: 6 × 5
#> intercept B1 B2 B3 B4
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 -2.76 0.0898 -0.222 1.26 2.14
#> 2 -2.91 0.0292 0.771 1.30 1.02
#> 3 -2.92 0.0881 0.200 0.950 0.959
#> 4 -3.00 0.232 0.0529 1.27 1.61
#> 5 -2.89 0.0442 0.00310 0.864 1.19
#> 6 -3.10 0.258 0.133 0.824 -0.0693
I'd like to turn this into a tibble with columns for the mean and the 0.025 and 0.975 quantiles. For the quantile function, this works:
beta_df %>%
map_dfr(quantile,0.025)
#> # A tibble: 5 × 1
#> `2.5%`
#> <dbl>
#> 1 -3.08
#> 2 0.0311
#> 3 -0.194
#> 4 0.829
#> 5 0.0592
And this gets me both quantiles:
bind_cols(beta_df %>%
map_dfr(quantile, 0.025),
beta_df %>%
map_dfr(quantile, 0.975))
#> # A tibble: 5 × 2
#> `2.5%` `97.5%`
#> <dbl> <dbl>
#> 1 -3.08 -2.77
#> 2 0.0311 0.255
#> 3 -0.194 0.699
#> 4 0.829 1.30
#> 5 0.0592 2.07
But for the mean,
beta_df %>%
map_dfr(mean)
#> # A tibble: 1 × 5
#> intercept B1 B2 B3 B4
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 -2.93 0.124 0.156 1.08 1.14
gives me one wide row rather than a column. How can I turn the mean of each column of the original data frame into a row of a single-column data frame labelled mean?
This happens because quantile() returns a named vector, whereas mean() returns a single unnamed value. map_dfr() turns each named vector into its own row (the names become the column names), but the unnamed means are combined into a single row with one column per input column.
Let's create a custom function that returns the mean as a named vector:
myMean <- function(x) {setNames(mean(x), nm = 'theMean')}
Applying it with map_dfr(), we get:
library(dplyr)
beta_df %>%
purrr::map_dfr(myMean)
# A tibble: 5 x 1
theMean
<dbl>
1 -2.93
2 0.124
3 0.156
4 1.08
5 1.14
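If the end goal is one row per coefficient with the mean and both quantiles as columns, a base-R alternative (not the map_dfr() approach above) is to compute each statistic with colMeans() and sapply() and combine them column-wise. A minimal sketch, with a hypothetical two-column stand-in for beta_df:

```r
# Hypothetical small stand-in for beta_df (any all-numeric data frame works)
beta <- data.frame(intercept = c(-2.8, -2.9, -3.0), B1 = c(0.1, 0.0, 0.2))

# One row per original column, one column per statistic
summary_df <- data.frame(
  term  = names(beta),
  mean  = colMeans(beta),
  q2.5  = sapply(beta, quantile, probs = 0.025),
  q97.5 = sapply(beta, quantile, probs = 0.975),
  row.names = NULL
)

summary_df
```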

How do I combine many tibbles with simple code?

I have tibbles pop_1910, ... pop_2000, each with the structure shown below. I want to combine them into one tibble. I know I can do that with bind_rows(), e.g. pop_1910 %>% bind_rows(pop_1920) %>% bind_rows(pop_1930), but it is a little tedious. Is there a more efficient way to combine many data frames?
> pop_2000
# A tibble: 3,143 x 3
fips year pop
<chr> <dbl> <dbl>
1 01001 2000 33364
2 01003 2000 112162
3 01005 2000 23042
4 01007 2000 15432
5 01009 2000 40165
6 01011 2000 9142
7 01013 2000 16798
8 01015 2000 90175
9 01017 2000 29086
10 01019 2000 19470
If you have them inside a list, you can use reduce() to bind all in one move.
library(tidyverse)
my_df_list <- map(1:4, ~tibble(x = rnorm(5), y = rnorm(5)))
my_df_list
#> [[1]]
#> # A tibble: 5 x 2
#> x y
#> <dbl> <dbl>
#> 1 1.99 1.19
#> 2 0.273 0.208
#> 3 1.12 1.18
#> 4 0.00855 -0.593
#> 5 0.502 -0.926
#>
#> [[2]]
#> # A tibble: 5 x 2
#> x y
#> <dbl> <dbl>
#> 1 0.570 -0.709
#> 2 0.599 -0.408
#> 3 -0.687 1.38
#> 4 0.375 1.53
#> 5 0.0394 1.90
#>
#> [[3]]
#> # A tibble: 5 x 2
#> x y
#> <dbl> <dbl>
#> 1 -0.576 1.64
#> 2 0.147 -0.0384
#> 3 0.904 0.164
#> 4 -1.16 -1.02
#> 5 -0.678 1.32
#>
#> [[4]]
#> # A tibble: 5 x 2
#> x y
#> <dbl> <dbl>
#> 1 -0.849 -0.445
#> 2 -0.786 -0.991
#> 3 1.17 -1.00
#> 4 0.222 1.65
#> 5 -0.656 -0.808
reduce(my_df_list, bind_rows)
#> # A tibble: 20 x 2
#> x y
#> <dbl> <dbl>
#> 1 1.99 1.19
#> 2 0.273 0.208
#> 3 1.12 1.18
#> 4 0.00855 -0.593
#> 5 0.502 -0.926
#> 6 0.570 -0.709
#> 7 0.599 -0.408
#> 8 -0.687 1.38
#> 9 0.375 1.53
#> 10 0.0394 1.90
#> 11 -0.576 1.64
#> 12 0.147 -0.0384
#> 13 0.904 0.164
#> 14 -1.16 -1.02
#> 15 -0.678 1.32
#> 16 -0.849 -0.445
#> 17 -0.786 -0.991
#> 18 1.17 -1.00
#> 19 0.222 1.65
#> 20 -0.656 -0.808
Created on 2021-06-07 by the reprex package (v2.0.0)
You can also simply use map_dfr(), where my_list is your list of data frames:
purrr::map_dfr(my_list, ~.x)
This gives you a single data frame bound by rows.
Or, in base R:
do.call(rbind, my_list)
Even easier is piping your list to dplyr::bind_rows(), e.g.
library(dplyr)
my_list %>% bind_rows()
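All of these answers assume the tibbles are already in a list. If they are separate objects named pop_1910 to pop_2000, a minimal base-R sketch is to fetch them by name with mget() and stack them with do.call(rbind, ...). The three small data frames below are hypothetical stand-ins for the real tibbles:

```r
# Hypothetical stand-ins for pop_1910, pop_1920 and pop_1930
pop_1910 <- data.frame(fips = c("01001", "01003"), year = 1910, pop = c(100, 200))
pop_1920 <- data.frame(fips = c("01001", "01003"), year = 1920, pop = c(110, 210))
pop_1930 <- data.frame(fips = c("01001", "01003"), year = 1930, pop = c(120, 220))

# Build the object names and fetch the objects into a named list
pop_names <- paste0("pop_", seq(1910, 1930, by = 10))
pop_list <- mget(pop_names, envir = environment())

# Stack all data frames by rows in one call
pop_all <- do.call(rbind, unname(pop_list))

pop_all
```

With the real objects, seq(1910, 2000, by = 10) would generate all ten names, and the resulting list can equally be passed to bind_rows().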

Can I ignore a missing variable when summing inside a function?

This is a simplified version of my real data frame. I have a function, calc(), which creates a new variable called total; for simplicity, it adds up three variables: a, b and c. When I pass it a data frame that lacks one of these variables (say c, so it only has a and b), the function fails. Is there a simple way to sum whichever of these variables are present, regardless of whether some are missing?
calc <- function(x) {x %>% mutate(total = a + b + c)}
data.2 has two columns, a and b, with many rows of values, but when I run it through the function, it cannot find c and so does not calculate anything:
new.df <- calc(data.2)
Many thanks.
If you want a row-wise sum or mean, rowSums() and rowMeans() have an na.rm argument that you can use to ignore NA values. Selecting the columns with any_of() also skips any column that is absent, such as c:
library(dplyr)
calc <- function(x) {x %>% mutate(total = rowSums(select(., any_of(c("a", "b", "c"))), na.rm = TRUE))}
More generally, if you cannot find a function that gives you an out-of-the-box solution, you can replace NA values with 0 and then perform the operation you want. This handles NA values only; all three columns must still exist for a + b + c to work:
calc <- function(x) {
x %>%
mutate(across(a:c, ~ tidyr::replace_na(.x, 0)),
total = a + b + c)
}
You can use rowwise() and c_across() with any_of() (or any other tidyselect function) from dplyr (>= 1.0.0).
library(dplyr)
df <- data.frame(a = rnorm(10), b = rnorm(10))
dfc <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))
calc <- function(x) {
x %>%
rowwise() %>%
mutate(total = sum(c_across(any_of(c("a", "b", "c"))))) %>%
ungroup()
}
calc(df)
#> # A tibble: 10 x 3
#> a b total
#> <dbl> <dbl> <dbl>
#> 1 -0.884 0.851 -0.0339
#> 2 -1.56 -0.464 -2.02
#> 3 -0.884 0.815 -0.0689
#> 4 -1.46 -0.259 -1.71
#> 5 0.211 -0.528 -0.317
#> 6 1.85 0.190 2.04
#> 7 -1.31 -0.921 -2.23
#> 8 0.450 0.394 0.845
#> 9 -1.14 0.428 -0.714
#> 10 -1.11 0.417 -0.698
calc(dfc)
#> # A tibble: 10 x 4
#> a b c total
#> <dbl> <dbl> <dbl> <dbl>
#> 1 -0.0868 0.632 1.81 2.36
#> 2 0.568 -0.523 0.240 0.286
#> 3 -0.0325 0.377 -0.437 -0.0921
#> 4 0.660 0.456 1.28 2.39
#> 5 -0.123 1.75 -1.03 0.599
#> 6 0.641 1.39 0.902 2.93
#> 7 0.266 0.520 0.904 1.69
#> 8 -1.53 0.319 0.439 -0.776
#> 9 0.942 0.468 -1.69 -0.277
#> 10 0.254 -0.600 -0.196 -0.542
If you want to generalize beyond those three variables, you can use any tidyselect helper, e.g. everything():
df <- data.frame(a = rnorm(10), b = rnorm(10))
dfc <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))
calc <- function(x) {
x %>%
rowwise() %>%
mutate(total = sum(c_across(everything()))) %>%
ungroup()
}
calc(df)
#> # A tibble: 10 x 3
#> a b total
#> <dbl> <dbl> <dbl>
#> 1 0.775 1.17 1.95
#> 2 -1.05 1.21 0.155
#> 3 2.07 -0.264 1.81
#> 4 1.11 0.793 1.90
#> 5 -0.700 -0.216 -0.916
#> 6 -1.04 -1.03 -2.07
#> 7 -0.525 1.60 1.07
#> 8 0.354 0.828 1.18
#> 9 0.126 0.110 0.236
#> 10 -0.0954 -0.603 -0.698
calc(dfc)
#> # A tibble: 10 x 4
#> a b c total
#> <dbl> <dbl> <dbl> <dbl>
#> 1 -0.616 0.767 0.0462 0.196
#> 2 -0.370 -0.538 -0.186 -1.09
#> 3 0.337 1.11 -0.700 0.751
#> 4 -0.993 -0.531 -0.984 -2.51
#> 5 0.0538 1.50 -0.0808 1.47
#> 6 -0.907 -1.54 -0.734 -3.18
#> 7 -1.65 -0.242 1.43 -0.455
#> 8 -0.166 0.447 -0.281 -0.000524
#> 9 0.0637 -0.0185 0.754 0.800
#> 10 1.81 -1.09 -2.15 -1.42
Created on 2020-09-10 by the reprex package (v0.3.0)
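Outside dplyr, the same idea (sum only the columns that are present) can be sketched in base R with intersect() and rowSums(). calc_base() and its vars argument are hypothetical names used for illustration:

```r
# Hypothetical helper: sum whichever of `vars` exist in `x`
calc_base <- function(x, vars = c("a", "b", "c")) {
  present <- intersect(vars, names(x))         # columns that actually exist
  x$total <- rowSums(x[present], na.rm = TRUE)  # na.rm also skips NA values
  x
}

df  <- data.frame(a = 1:3, b = 4:6)
dfc <- data.frame(a = 1:3, b = 4:6, c = 7:9)

calc_base(df)$total   # 5 7 9
calc_base(dfc)$total  # 12 15 18
```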

Calculate all possible interactions in model_matrix

I'm simulating data with a varying number of variables. As part of this, I need to calculate a model matrix with all possible interactions. See the following reprex for an example. I can get all two-way interactions with the formula ~ .*., but this particular dataset has 3 variables (ndim <- 3). I can get all two- and three-way interactions with the formula ~ .^3. The issue is that there may be 4 or more variables, so I would like to generalize this. I have tried specifying the formula as ~ .^ndim, but this throws an error.
Is there a way to define the power in the formula with a variable?
library(tidyverse)
library(mvtnorm)
library(modelr)
ndim <- 3
data <- rmvnorm(100, mean = rep(0, ndim)) %>%
as_tibble(.name_repair = ~ paste0("dim_", seq_len(ndim)))
model_matrix(data, ~ .*.)
#> # A tibble: 100 x 7
#> `(Intercept)` dim_1 dim_2 dim_3 `dim_1:dim_2` `dim_1:dim_3`
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 -0.775 0.214 0.111 -0.166 -0.0857
#> 2 1 1.25 -0.0636 1.40 -0.0794 1.75
#> 3 1 1.07 -0.361 0.976 -0.384 1.04
#> 4 1 2.08 0.381 0.593 0.793 1.24
#> 5 1 -0.197 0.382 -0.257 -0.0753 0.0506
#> 6 1 0.266 -1.82 0.00411 -0.485 0.00109
#> 7 1 3.09 2.57 -0.612 7.96 -1.89
#> 8 1 2.03 0.247 0.112 0.501 0.226
#> 9 1 -0.397 0.204 1.55 -0.0810 -0.614
#> 10 1 0.597 0.335 0.533 0.200 0.319
#> # … with 90 more rows, and 1 more variable: `dim_2:dim_3` <dbl>
model_matrix(data, ~ .^3)
#> # A tibble: 100 x 8
#> `(Intercept)` dim_1 dim_2 dim_3 `dim_1:dim_2` `dim_1:dim_3`
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 -0.775 0.214 0.111 -0.166 -0.0857
#> 2 1 1.25 -0.0636 1.40 -0.0794 1.75
#> 3 1 1.07 -0.361 0.976 -0.384 1.04
#> 4 1 2.08 0.381 0.593 0.793 1.24
#> 5 1 -0.197 0.382 -0.257 -0.0753 0.0506
#> 6 1 0.266 -1.82 0.00411 -0.485 0.00109
#> 7 1 3.09 2.57 -0.612 7.96 -1.89
#> 8 1 2.03 0.247 0.112 0.501 0.226
#> 9 1 -0.397 0.204 1.55 -0.0810 -0.614
#> 10 1 0.597 0.335 0.533 0.200 0.319
#> # … with 90 more rows, and 2 more variables: `dim_2:dim_3` <dbl>,
#> # `dim_1:dim_2:dim_3` <dbl>
model_matrix(data, ~.^ndim)
#> Error in terms.formula(object, data = data): invalid power in formula
Created on 2019-02-15 by the reprex package (v0.2.1)
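The right-hand side of a formula is not evaluated, so in ~ .^ndim terms.formula() sees the symbol ndim rather than its value and rejects it as a power. You can instead build the formula from a string with paste0() and as.formula() and pass it to model_matrix():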
model_matrix(data, as.formula(paste0("~ .^", ndim)))
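The same string-to-formula approach works with base R's model.matrix(). A minimal sketch, using data simulated with rnorm() instead of mvtnorm::rmvnorm(), which checks that ~ .^ndim expands to all 2^ndim columns (intercept, main effects and every interaction):

```r
set.seed(42)
ndim <- 3

# Simulated data with columns dim_1 ... dim_ndim
dat <- as.data.frame(matrix(rnorm(100 * ndim), ncol = ndim))
names(dat) <- paste0("dim_", seq_len(ndim))

# Build "~ .^3" as a string, then convert it to a formula
f <- as.formula(paste0("~ .^", ndim))
mm <- model.matrix(f, data = dat)

colnames(mm)
```

Changing ndim to 4 would give 16 columns, including the four-way interaction, without editing the formula.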
