I have a nested list of lists with data tables at the bottom level. My goal is to map a function over a single table in each group to transform the data for plotting. I'm currently trying to use purrr::map_depth and purrr::map_at in conjunction to do this. The reason I need map_at or map_if is that the plotting function I'm using takes different arguments depending on the table.
Example below:
library(data.table)
library(purrr)
example = list(
  group1 = list(
    all = data.table(
      x = 1:10,
      y = 10:1),
    not_all = data.table(
      x2 = 11:20,
      y2 = 20:11)
  ),
  group2 = list(
    all = data.table(
      x = 1:10,
      y = 10:1),
    not_all = data.table(
      x2 = 11:20,
      y2 = 20:11)
  ),
  group3 = list(
    all = data.table(
      x = 1:10,
      y = 10:1),
    not_all = data.table(
      x2 = 11:20,
      y2 = 20:11)
  ),
  group4 = list(
    all = data.table(
      x = 1:10,
      y = 10:1),
    not_all = data.table(
      x2 = 11:20,
      y2 = 20:11)
  )
)
I'm planning to use highcharter::data_to_boxplot as the mapping function.
So far I've been unable to extract a single table to map to, and I don't have a solid grasp of the purrr syntax yet.
map_depth(example, 2, map_at(., "all", data_to_boxplot, variable = x))
# Error: character indexing requires a named object
map_depth(example, 2, ~map_at(., "all", data_to_boxplot, variable = x))
# this prints out the entire list
# would like to try something like this too but can't figure out the piping correctly
map_depth(example, 2) %>%
map_at(., "all", data_to_boxplot, variable = x)
# Error in as_mapper(.f, ...) : argument ".f" is missing, with no default
Any help would be greatly appreciated!
I think you need a nested map/map_at construct:
library(data.table)
library(highcharter)
library(purrr)
example %>%
  map(~ .x %>%
        map_at("all", data_to_boxplot, variable = x))
This returns
$group1
$group1$all
# A tibble: 1 x 4
name data id type
<lgl> <list> <lgl> <chr>
1 NA <list [1]> NA boxplot
$group1$not_all
x2 y2
1: 11 20
2: 12 19
3: 13 18
4: 14 17
5: 15 16
6: 16 15
7: 17 14
8: 18 13
9: 19 12
10: 20 11
$group2
$group2$all
# A tibble: 1 x 4
name data id type
<lgl> <list> <lgl> <chr>
1 NA <list [1]> NA boxplot
$group3
$group3$all
# A tibble: 1 x 4
name data id type
<lgl> <list> <lgl> <chr>
1 NA <list [1]> NA boxplot
$group4
$group4$all
# A tibble: 1 x 4
name data id type
<lgl> <list> <lgl> <chr>
1 NA <list [1]> NA boxplot
How does this work?
example is a list of lists. map applies a function to each element of this list. These elements are also lists.
The function passed to map is itself another map call, map_at, which is applied only to the element named "all" (here at "level 2").
This is equivalent to
example %>%
  map_depth(1,
            ~ .x %>%
              map_at("all", data_to_boxplot, variable = x))
Take a look at ?map_depth:
map_depth(.x, .depth, .f, ..., .ragged = FALSE)
with .depth defined as "Level of .x to map on. Use a negative value to count up from the lowest level of the list."
map_depth(x, 0, fun) is equivalent to fun(x).
map_depth(x, 1, fun) is equivalent to x <- map(x, fun).
map_depth(x, 2, fun) is equivalent to x <- map(x, ~ map(., fun)).
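If you want to convince yourself of these equivalences on the example list, here is a quick check (the output is what I'd expect; any function works in place of length):
library(purrr)

f <- length  # any placeholder function works here

identical(map_depth(example, 1, f), map(example, f))
#> [1] TRUE
identical(map_depth(example, 2, f), map(example, ~ map(.x, f)))
#> [1] TRUE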
Edit: Simply use rbind from base!
I have a list of tibbles with the same column names and orders, but possibly incompatible column types. I would like to vertically concatenate the tables into one, à la tibble::add_row(), automatically converting types to the greatest common denominator where necessary (in the same way that, e.g., c(1, 2, "a") returns c("1", "2", "a")). I don't know the types of the columns in advance.
For example,
> X = tibble(a = 1:3, b = c("a", "b", "c"))
# A tibble: 3 × 2
a b
<int> <chr>
1 1 a
2 2 b
3 3 c
> Y = tibble(a = "Any", b = 1)
# A tibble: 1 × 2
a b
<chr> <dbl>
1 Any 1
Desired output:
# A tibble: 4 × 2
a b
<chr> <chr>
1 1 a
2 2 b
3 3 c
4 Any 1
Is there a way to do this generically? I’m trying to write code for a package that is agnostic about data frames and tibbles (i.e., it doesn’t convert into one or the other).
Ideally, type promotion should reflect the behaviour of c(...) (NULL < raw < logical < integer < double < complex < character < list < expression) — except for factors, where I’d like to preserve the factor label (whatever its type), not the underlying index.
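For concreteness, this is the kind of promotion I mean (a small illustration with base R's c() and typeof()):
typeof(c(TRUE, 1L))      # "integer"   -- logical < integer
typeof(c(1L, 2.5))       # "double"    -- integer < double
typeof(c(2.5, "a"))      # "character" -- double < character
typeof(c("a", list(1)))  # "list"      -- character < list

# and for factors I want the label, not the underlying code:
as.character(factor("b", levels = c("a", "b")))  # "b", not "2"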
I think rbind(X, Y) already achieves what you want. Here is another idea. Assuming that X and Y have the same column names and orders, you could use map2() from purrr to apply c() over the corresponding columns from X and Y.
purrr::map2_dfc(X, Y, c)
# # A tibble: 4 × 2
# a b
# <chr> <chr>
# 1 1 a
# 2 2 b
# 3 3 c
# 4 Any 1
If X and Y do not have the same column names and orders, you could intersect their names and follow the same way:
cols <- intersect(names(X), names(Y))
purrr::map2_dfc(X[cols], Y[cols], c)
Utilising the overly liberal behaviour of base R by doing do.call(rbind, list(X, Y)) would get you some of the way there, but it comes with downsides, such as the fact that the order in which you combine things matters (consider the output of as.character(TRUE) vs as.character(as.integer(TRUE))).
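To spell that out, a tiny illustration of why the coercion path matters:
as.character(TRUE)              # "TRUE"  -- logical coerced straight to character
as.character(as.integer(TRUE))  # "1"     -- logical coerced to integer first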
A better approach would probably be to look at all of your data frames to work out what final column types you need to cast to, and cast your columns to these types separately before combining the data frames. Here's a function that will do this:
library(tidyverse)
coerce_bind_rows <- function(...) {
  casts <- list(
    raw = NULL,
    logical = as.logical,
    integer = as.integer,
    numeric = as.numeric,
    double = as.double,
    character = as.character,
    list = as.list
  )
  dfs <- list(...)
  # format() classed columns (e.g. factors, dates) so their labels, not their codes, are kept
  dfs_fmt_objs <- map(dfs, mutate, across(where(is.object), format))
  # for each column, find the "highest" type used across all data frames
  targets <-
    dfs_fmt_objs |>
    map(partial(map_chr, ... = , typeof)) |>
    pmap(c) |>
    map(factor, levels = names(casts), ordered = TRUE) |>
    map(compose(as.character, max))
  # cast every column of every data frame to its target type, then bind
  dfs_casted <-
    dfs_fmt_objs |>
    map(function(.data, .types = targets) {
      for (.col in names(.types)) {
        .fn <- casts[[.types[[.col]]]]
        .data[[.col]] <- .fn(.data[[.col]])
      }
      .data
    })
  bind_rows(dfs_casted)
}
[Edited to format classed objects to handle factors as specified in update to the question]
Testing on your examples above:
X <- tibble(a = 1:3, b = c("a", "b", "c"))
Y <- tibble(a = "Any", b = 1)
coerce_bind_rows(X, Y)
#> # A tibble: 4 x 2
#> a b
#> <chr> <chr>
#> 1 1 a
#> 2 2 b
#> 3 3 c
#> 4 Any 1
Testing on some data frames with a broader range of types:
W <- tibble(a = FALSE, b = raw(1L))
Z <- tibble(a = list(4), b = "d")
coerce_bind_rows(W, X, Y, Z)
#> # A tibble: 6 x 2
#> a b
#> <list> <chr>
#> 1 <lgl [1]> 00
#> 2 <int [1]> a
#> 3 <int [1]> b
#> 4 <int [1]> c
#> 5 <chr [1]> 1
#> 6 <dbl [1]> d
By the way, data frame columns have to be vectors (which includes atomic vectors and lists), so you can't have a data frame with columns that are NULLs or expressions. But this approach should work for everything from raw up to list-type vectors.
I have a tibble (to_na below) holding the explicit "id" values and column names of the cells I need to convert to NAs. Is there any way I can create the NAs without making my df a long dataset? I considered using the new rows_update() function, but I'm not sure it is right here because I only want certain columns to be NA.
library(dplyr)
to_na <- tribble(~x, ~col,
                 1,  "z",
                 3,  "y")

df <- tibble(x = c(1, 2, 3),
             y = c(1, 1, 1),
             z = c(2, 2, 2))
# desired output:
#> # A tibble: 3 x 3
#> x y z
#> <dbl> <dbl> <dbl>
#> 1 1 1 NA
#> 2 2 1 2
#> 3 3 NA 2
Created on 2020-07-03 by the reprex package (v0.3.0)
This definitely isn't the most elegant solution, but it gets the output you want.
library(dplyr)
library(purrr)
to_na <- tribble(~x, ~col,
                 1,  "z",
                 3,  "y")

df <- tibble(x = c(1, 2, 3),
             y = c(1, 1, 1),
             z = c(2, 2, 2))
map2(to_na$x, to_na$col,  # Pass through these two objects in parallel
     function(xval_to_missing, col) df %>%  # Two objects above matched by position here.
       mutate_at(col,  # mutate_at the specified cols
                 ~ if_else(x == xval_to_missing, NA_real_, .)  # if x == xval_to_missing, make NA, else keep as is.
       ) %>%
       select(x, col)  # keep x and the modified column.
) %>%  # end of map2
  reduce(left_join, by = "x") %>%  # merge within the above list, by x.
  relocate(x, y, z)  # Keep your ordering
Output:
# A tibble: 3 x 3
x y z
<dbl> <dbl> <dbl>
1 1 1 NA
2 2 1 2
3 3 NA 2
We can use row/column (matrix) indexing to assign the values to NA in base R:
df <- as.data.frame(df)
df[cbind(to_na$x, match(to_na$col, names(df)))] <- NA
df
# x y z
#1 1 1 NA
#2 2 1 2
#3 3 NA 2
If we want to use rows_update
library(dplyr)
library(tidyr)
library(purrr)
lst1 <- to_na %>%
  mutate(new = NA_real_) %>%
  split(seq_len(nrow(.))) %>%
  map(~ .x %>%
        pivot_wider(names_from = col, values_from = new))

for(i in seq_along(lst1)) df <- rows_update(df, lst1[[i]])
df
# A tibble: 3 x 3
# x y z
# <dbl> <dbl> <dbl>
#1 1 1 NA
#2 2 1 2
#3 3 NA 2
I have a function which returns a tibble. It runs OK, but I want to vectorize it.
library(tidyverse)
tibTest <- tibble(argX = 1:4, argY = 7:4)
square_it <- function(xx, yy) {
  if (xx >= 4) {
    tibble(x = NA, y = NA)
  } else if (xx == 3) {
    tibble(x = as.integer(), y = as.integer())
  } else if (xx == 2) {
    tibble(x = xx^2 - 1, y = yy^2 - 1)
  } else {
    tibble(x = xx^2, y = yy^2)
  }
}
It runs OK in a mutate when I call it with map2, giving me the result I wanted:
tibTest %>%
  mutate(sq = map2(argX, argY, square_it)) %>%
  unnest()
## A tibble: 3 x 4
# argX argY x y
# <int> <int> <dbl> <dbl>
# 1 1 7 1 49
# 2 2 6 3 35
# 3 4 4 NA NA
My first attempt to vectorize it failed, and I can see why - I can't return a vector of tibbles.
square_it2 <- function(xx, yy) {
  case_when(
    x >= 4 ~ tibble(x = NA, y = NA),
    x == 3 ~ tibble(x = as.integer(), y = as.integer()),
    x == 2 ~ tibble(x = xx^2 - 1, y = yy^2 - 1),
    TRUE ~ tibble(x = xx^2, y = yy^2)
  )
}
# square_it2(4, 2) # FAILS
My next attempt runs OK on a simple input. I can return a list of tibbles, and that's what I want for the unnest
square_it3 <- function(xx, yy) {
  case_when(
    xx >= 4 ~ list(tibble(x = NA, y = NA)),
    xx == 3 ~ list(tibble(x = as.integer(), y = as.integer())),
    xx == 2 ~ list(tibble(x = xx^2 - 1, y = yy^2 - 1)),
    TRUE ~ list(tibble(x = xx^2, y = yy^2))
  )
}
square_it3(4, 2)
# [[1]]
# # A tibble: 1 x 2
# x y
# <lgl> <lgl>
# 1 NA NA
But when I call it in a mutate, it doesn't give me the result I had with square_it. I can sort of see what's wrong: in the xx == 2 clause, xx acts as an atomic value of 2, but in building the tibble, xx is a length-4 vector.
tibTest %>%
  mutate(sq = square_it3(argX, argY)) %>%
  unnest()
# # A tibble: 9 x 4
# argX argY x y
# <int> <int> <dbl> <dbl>
# 1 1 7 1 49
# 2 1 7 4 36
# 3 1 7 9 25
# 4 1 7 16 16
# 5 2 6 0 48
# 6 2 6 3 35
# 7 2 6 8 24
# 8 2 6 15 15
# 9 4 4 NA NA
How do I get the same result as I did with square_it, but from a vectorized function using case_when ?
We define row_case_when, which has a formula interface similar to case_when, except that it takes .data as its first argument, acts by row, and expects the value of each leg to be a data frame. It returns a data.frame/tibble. Wrapping in a list, rowwise, and unnest are not needed.
case_when2 <- function (.data, ...) {
  fs <- dplyr:::compact_null(rlang:::list2(...))
  n <- length(fs)
  if (n == 0) {
    abort("No cases provided")
  }
  query <- vector("list", n)
  value <- vector("list", n)
  default_env <- rlang:::caller_env()
  quos_pairs <- purrr::map2(fs, seq_along(fs), dplyr:::validate_formula,
                            rlang:::default_env, rlang:::current_env())
  for (i in seq_len(n)) {
    pair <- quos_pairs[[i]]
    query[[i]] <- rlang::eval_tidy(pair$lhs, data = .data, env = default_env)
    value[[i]] <- rlang::eval_tidy(pair$rhs, data = .data, env = default_env)
    if (!is.logical(query[[i]])) {
      abort_case_when_logical(pair$lhs, i, query[[i]])
    }
    if (query[[i]]) return(value[[i]])
  }
}
row_case_when <- function(.data, ...) {
  .data %>%
    group_by(.group = 1:n(), !!!.data) %>%
    do(case_when2(., ...)) %>%
    mutate %>%
    ungroup %>%
    select(-.group)
}
Test run
It is used like this:
library(dplyr)
tibTest <- tibble(argX = 1:4, argY = 7:4) # test data from question
tibTest %>%
  row_case_when(argX >= 4 ~ tibble(x = NA, y = NA),
                argX == 3 ~ tibble(x = as.integer(), y = as.integer()),
                argX == 2 ~ tibble(x = argX^2 - 1, y = argY^2 - 1),
                TRUE ~ tibble(x = argX^2, y = argY^2))
giving:
# A tibble: 3 x 4
argX argY x y
<int> <int> <dbl> <dbl>
1 1 7 1 49
2 2 6 3 35
3 4 4 NA NA
mutate_cond and mutate_when
These are not quite the same as row_case_when, since they do not run through the conditions and take the first true one; however, with mutually exclusive conditions they can be used for certain aspects of this problem. They do not handle changing the number of rows in the result, but we can use dplyr::filter to remove rows for a particular condition.
mutate_cond, defined in "dplyr mutate/replace several columns on a subset of rows", is like mutate except that the second argument is a condition and the subsequent arguments are applied only to rows for which that condition is TRUE.
mutate_when, defined in the same question, is similar to case_when except that it applies to rows, the replacement values are provided in lists, and the arguments alternate between conditions and lists. Also, all legs are always run, applying the replacement values to the rows satisfying their conditions (as opposed to, for each row, performing the replacement on just the first true leg). To get an effect similar to row_case_when, make sure the conditions are mutually exclusive. A sketch of both helpers is shown below.
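Neither helper is defined in this answer; for a self-contained run, here is a minimal sketch of both along the lines of the definitions in the linked question (treat these as approximations, not verbatim copies of those answers):
library(dplyr)

# mutate_cond: mutate only the rows for which `condition` is TRUE
mutate_cond <- function(.data, condition, ..., envir = parent.frame()) {
  condition <- eval(substitute(condition), .data, envir)
  .data[condition, ] <- .data[condition, ] %>% mutate(...)
  .data
}

# mutate_when: arguments alternate condition / list of replacements;
# each leg is applied to the rows satisfying its condition
mutate_when <- function(data, ...) {
  dots <- eval(substitute(alist(...)))
  for (i in seq(1, length(dots), by = 2)) {
    condition <- eval(dots[[i]], envir = data)
    mutations <- eval(dots[[i + 1]], envir = data[condition, , drop = FALSE])
    data[condition, names(mutations)] <- mutations
  }
  data
}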
# mutate_cond example
tibTest %>%
  filter(argX != 3) %>%
  mutate(x = NA_integer_, y = NA_integer_) %>%
  mutate_cond(argX == 2, x = argX^2 - 1L, y = argY^2 - 1L) %>%
  mutate_cond(argX < 2, x = argX^2, y = argY^2)

# mutate_when example
tibTest %>%
  filter(argX != 3) %>%
  mutate_when(TRUE, list(x = NA_integer_, y = NA_integer_),
              argX == 2, list(x = argX^2 - 1L, y = argY^2 - 1L),
              argX < 2, list(x = argX^2, y = argY^2))
You need to ensure you are creating a 1-row tibble with each call of the function, then vectorize that.
This works whether you have rowwise groups or not.
You can do this with switch wrapped in a map2:
Here's a reprex:
library(tidyverse)
tibTest <- tibble(argX = 1:4, argY = 7:4)
square_it <- function(xx, yy) {
  map2(xx, yy, function(x, y) {
    switch(which(c(x >= 4,
                   x == 3,
                   x == 2,
                   x < 4 & x != 3 & x != 2)),
           tibble(x = NA, y = NA),
           tibble(x = as.integer(), y = as.integer()),
           tibble(x = x^2 - 1, y = y^2 - 1),
           tibble(x = x^2, y = y^2))
  })
}
tibTest %>% mutate(sq = square_it(argX, argY)) %>% unnest(cols = sq)
#> # A tibble: 3 x 4
#> argX argY x y
#> <int> <int> <dbl> <dbl>
#> 1 1 7 1 49
#> 2 2 6 3 35
#> 3 4 4 NA NA
Created on 2020-05-16 by the reprex package (v0.3.0)
Often I need to spread multiple value columns, as in this question. But I do it often enough that I'd like to be able to write a function that does this.
For example, given the data:
set.seed(42)
dat <- data_frame(id = rep(1:2, each = 2),
                  grp = rep(letters[1:2], times = 2),
                  avg = rnorm(4),
                  sd = runif(4))
> dat
# A tibble: 4 x 4
id grp avg sd
<int> <chr> <dbl> <dbl>
1 1 a 1.3709584 0.6569923
2 1 b -0.5646982 0.7050648
3 2 a 0.3631284 0.4577418
4 2 b 0.6328626 0.7191123
I'd like to create a function that returns something like:
# A tibble: 2 x 5
id a_avg b_avg a_sd b_sd
<int> <dbl> <dbl> <dbl> <dbl>
1 1 1.3709584 -0.5646982 0.6569923 0.7050648
2 2 0.3631284 0.6328626 0.4577418 0.7191123
How can I do that?
We'll return to the answer provided in the question linked to, but for the moment let's start with a more naive approach.
One idea would be to spread each value column individually, and then join the results, i.e.
library(dplyr)
library(tidyr)
library(tibble)
dat_avg <- dat %>%
  select(-sd) %>%
  spread(key = grp, value = avg) %>%
  rename(a_avg = a,
         b_avg = b)

dat_sd <- dat %>%
  select(-avg) %>%
  spread(key = grp, value = sd) %>%
  rename(a_sd = a,
         b_sd = b)
> full_join(dat_avg,
            dat_sd,
            by = 'id')
# A tibble: 2 x 5
id a_avg b_avg a_sd b_sd
<int> <dbl> <dbl> <dbl> <dbl>
1 1 1.3709584 -0.5646982 0.6569923 0.7050648
2 2 0.3631284 0.6328626 0.4577418 0.7191123
(I used a full_join just in case we run into situations where not all combinations of the join columns appear in all of them.)
Let's start with a function that works like spread but allows you to pass the key and value columns as characters:
spread_chr <- function(data, key_col, value_cols, fill = NA,
                       convert = FALSE, drop = TRUE, sep = NULL) {
  n_val <- length(value_cols)
  result <- vector(mode = "list", length = n_val)
  id_cols <- setdiff(names(data), c(key_col, value_cols))
  for (i in seq_along(result)) {
    result[[i]] <- spread(data = data[, c(id_cols, key_col, value_cols[i]), drop = FALSE],
                          key = !!key_col,
                          value = !!value_cols[i],
                          fill = fill,
                          convert = convert,
                          drop = drop,
                          sep = paste0(sep, value_cols[i], sep))
  }
  result %>%
    purrr::reduce(.f = full_join, by = id_cols)
}
> dat %>%
    spread_chr(key_col = "grp",
               value_cols = c("avg", "sd"),
               sep = "_")
# A tibble: 2 x 5
id grp_avg_a grp_avg_b grp_sd_a grp_sd_b
<int> <dbl> <dbl> <dbl> <dbl>
1 1 1.3709584 -0.5646982 0.6569923 0.7050648
2 2 0.3631284 0.6328626 0.4577418 0.7191123
The key ideas here are to unquote the arguments key_col and value_cols[i] using the !! operator, and using the sep argument in spread to control the resulting value column names.
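To see what that sep argument does in isolation, here is a small standalone sketch (the column names are what I'd expect from spread's documented <key_name><sep><key_value> naming):
library(dplyr)
library(tidyr)

dat %>%
  select(-sd) %>%
  spread(key = grp, value = avg, sep = "_avg_")
#> expected columns: id, grp_avg_a, grp_avg_b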
If we wanted to convert this function to accept unquoted arguments for the key and value columns, we could modify it like so:
spread_nq <- function(data, key_col, ..., fill = NA,
                      convert = FALSE, drop = TRUE, sep = NULL) {
  val_quos <- rlang::quos(...)
  key_quo <- rlang::enquo(key_col)
  value_cols <- unname(tidyselect::vars_select(names(data), !!!val_quos))
  key_col <- unname(tidyselect::vars_select(names(data), !!key_quo))
  n_val <- length(value_cols)
  result <- vector(mode = "list", length = n_val)
  id_cols <- setdiff(names(data), c(key_col, value_cols))
  for (i in seq_along(result)) {
    result[[i]] <- spread(data = data[, c(id_cols, key_col, value_cols[i]), drop = FALSE],
                          key = !!key_col,
                          value = !!value_cols[i],
                          fill = fill,
                          convert = convert,
                          drop = drop,
                          sep = paste0(sep, value_cols[i], sep))
  }
  result %>%
    purrr::reduce(.f = full_join, by = id_cols)
}
> dat %>%
    spread_nq(key_col = grp, avg, sd, sep = "_")
# A tibble: 2 x 5
id grp_avg_a grp_avg_b grp_sd_a grp_sd_b
<int> <dbl> <dbl> <dbl> <dbl>
1 1 1.3709584 -0.5646982 0.6569923 0.7050648
2 2 0.3631284 0.6328626 0.4577418 0.7191123
The change here is that we capture the unquoted arguments with rlang::quos and rlang::enquo and then simply convert them back to characters using tidyselect::vars_select.
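As a quick sketch of just that conversion step in isolation (the helper name to_chr is mine, purely for illustration):
library(rlang)
library(tidyselect)

# capture bare column names and return them as a character vector
to_chr <- function(data, ...) {
  unname(vars_select(names(data), !!!quos(...)))
}

to_chr(dat, avg, sd)
#> expected: "avg" "sd"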
Returning to the solution in the linked question that uses a sequence of gather, unite and spread, we can use what we've learned to make a function like this:
spread_nt <- function(data, key_col, ..., fill = NA,
                      convert = TRUE, drop = TRUE, sep = "_") {
  key_quo <- rlang::enquo(key_col)
  val_quos <- rlang::quos(...)
  value_cols <- unname(tidyselect::vars_select(names(data), !!!val_quos))
  key_col <- unname(tidyselect::vars_select(names(data), !!key_quo))
  data %>%
    gather(key = ..var.., value = ..val.., !!!val_quos) %>%
    unite(col = ..grp.., c(key_col, "..var.."), sep = sep) %>%
    spread(key = ..grp.., value = ..val.., fill = fill,
           convert = convert, drop = drop, sep = NULL)
}
> dat %>%
    spread_nt(key_col = grp, avg, sd, sep = "_")
# A tibble: 2 x 5
id a_avg a_sd b_avg b_sd
* <int> <dbl> <dbl> <dbl> <dbl>
1 1 1.3709584 0.6569923 -0.5646982 0.7050648
2 2 0.3631284 0.4577418 0.6328626 0.7191123
This relies on the same techniques from rlang from the last example. We're using some unusual names like ..var.. for our intermediate variables in order to reduce the chances of name collisions with existing columns in our data frame.
Also, we're using the sep argument in unite to control the resulting column names, so in this case when we spread we force sep = NULL.
Spreading operations can also be done by unnesting a properly reformatted table; here's an alternative using the tidyverse:
# helper function that returns a horizontal one-row named tibble wrapped in a list
lhframe <- function(x, nms) list(setNames(as_tibble(t(x)), nms))

dat %>% group_by(id) %>%
  summarize(avg = lhframe(avg, grp),
            sd = lhframe(sd, grp)) %>%
  unnest(.sep = "_")
# # A tibble: 2 x 5
# id avg_a avg_b sd_a sd_b
# <int> <dbl> <dbl> <dbl> <dbl>
# 1 1 -1.7631631 0.4600974 0.7595443 0.5664884
# 2 2 -0.6399949 0.4554501 0.8496897 0.1894739
Unfortunately the following doesn't work:
dat %>% group_by(id) %>%
  summarize_at(vars(avg, sd), lhframe, grp) %>%
  unnest(.sep = "_")
Since tidyr version 1.0.0
tidyr::pivot_wider(data = dat, id_cols = id, names_from = grp, values_from = avg:sd)
# # A tibble: 2 x 5
# id avg_a avg_b sd_a sd_b
# <int> <dbl> <dbl> <dbl> <dbl>
# 1 1 1.37 -0.565 0.657 0.705
# 2 2 0.363 0.633 0.458 0.719
I have a feeling this is a pretty stupid issue, but I haven't been able to find the solution either.
I have a tibble where each row is a sample and the first column is a character variable containing the sample ID and all subsequent columns are variables with numeric variables.
For example:
id <- c("a", "b", "c", "d", "e")
x1 <- rep(1,5)
x2 <- seq(1,5,1)
x3 <- rep(2,5)
x4 <- seq(0.1, 0.5, 0.1)
tb <- tibble(id, x1, x2, x3, x4)
I want to subset this to include only the columns with a sum greater than 5, and the id column. With the old dataframe structure, I know the following worked:
df <- as.data.frame(tb)
df2 <- cbind(df$id, df[, colSums(df[, 2:5]) > 5])
colnames(df2)[1] <- "id"
However, when I try to subset this way with a tibble, I get the error message:
Error: Length of logical index vector must be 1 or 5, got: 4
Does anyone know how to accomplish this task without converting to the old data frame format? Preferably without creating an intermediate tibble with the id variable missing, because separating my ids from my data is just asking for trouble down the road.
Thanks!
# install.packages(c("tidyverse"), dependencies = TRUE)
library(tibble)
df <- tibble(id = letters[1:5], x1 = 1, x2 = 1:5, x3 = 2, x4 = seq(.1, .5, len = 5))
### two additional examples of how to generate the Tibble data
### exploiting that its arguments are evaluated lazily and sequentially
# df <- tibble(id = letters[1:5], x1 = 1, x2 = 1:5, x3 = x1 + 1, x4 = x2/10)
# df <- tibble(x2 = 1:5, id = letters[x2], x3 = 2, x1 = x3-1, x4 = x2/10) %>%
# select(id, num_range("x", 1:4))
Base R solution, cf. HubertL's comment above:
### HubertL's base solution
df[c(TRUE,colSums(df[2:5])>5)]
#> # A tibble: 5 x 3
#> id x2 x3
#> <chr> <int> <dbl>
#> 1 a 1 2
#> 2 b 2 2
#> 3 c 3 2
#> 4 d 4 2
#> 5 e 5 2
dplyr solution, cf. David Klotz's comment:
### Klotz's dplyr solution
library(dplyr)
df %>% select_if(function(x) is.character(x) || sum(x) > 5)
#> # A tibble: 5 x 3
#> id x2 x3
#> <chr> <int> <dbl>
#> 1 a 1 2
#> 2 b 2 2
#> 3 c 3 2
#> 4 d 4 2
#> 5 e 5 2
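For completeness: select_if() is superseded in dplyr 1.0.0 and later, and the same idea can be written with select(where(...)) (a sketch, assuming a recent dplyr):
### modern dplyr variant of Klotz's solution
library(dplyr)
df %>% select(where(~ is.character(.x) || sum(.x) > 5))
#> expected: the same three columns (id, x2, x3)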