I want to be able to pass a function an undefined number of arguments via ... but also to be able to pass it a vector. Here is a silly example:
library(tidyverse)
df <- data.frame(gear = as.character(unique(mtcars$gear)),
id = 1:3)
myfun <- function(...) {
ids_lst <- lst(...)
df2 <- bind_rows(map(ids_lst, function(x)
mtcars %>%
filter(gear == x) %>%
select(mpg)), .id = "gear") %>%
left_join(df)
df2
}
#these all work:
myfun(3)
myfun(3, 4)
myfun(3, 4, 5)
Passing it a vector doesn't work though:
myvector <- unique(mtcars$gear)
myfun(myvector)
The problem is because of the way the function collects the arguments and how it returns them:
myfun_lst <- function(...) {
ids_lst <- lst(...)
ids_lst
}
myfun_lst(3, 4, 5)
# $`3`
# [1] 3
# $`4`
# [1] 4
# $`5`
# [1] 5
myfun_lst(myvector)
# $myvector
# [1] 4 3 5
I thought a fix would be to test if the input is a vector, something like:
myfun_final <- function(...) {
if(is.vector(...) & !is.list(...)) {
ids_lst <- as.list(...)
names(ids_lst) <- (...)
} else {
ids_lst <- lst(...)
}
df2 <- bind_rows(map(ids_lst, function(x)
mtcars %>%
filter(gear == x) %>%
select(mpg)), .id = "gear") %>%
left_join(df)
df2
}
Now, passing the function a vector works but collecting the arguments doesn't:
myfun_final(3, 4, 5)
myfun_final(myvector)
What is a good way to solve this?
Thanks
Of course you can change your function so that it will work with both, regular arguments myfun(3, 4, 5) and a vector myfun(myvector), as shown in the answer above.
Another option is that you make use of argument splicing by unquoting with the bang bang bang operator !!!. This operator is only supported in certain {rlang} and {tidyverse} functions. In your example you evaluate the dots ... inside purrr::map which supports argument splicing. Therefore there might not be the need to rewrite your function:
library(tidyverse)
# your original function:
myfun <- function(...) {
ids_lst <- lst(...)
df2 <- bind_rows(map(ids_lst, function(x)
mtcars %>%
filter(gear == x) %>%
select(mpg)), .id = "gear") %>%
left_join(df)
df2
}
myvector <- unique(mtcars$gear)
myfun(!!! myvector) # works
#> Joining, by = "gear"
#> gear mpg id
#> 1 4 21.0 1
#> 2 4 21.0 1
#> 3 4 22.8 1
#> 4 4 24.4 1
#> 5 4 22.8 1
#> 6 4 19.2 1
#> 7 4 17.8 1
#> 8 4 32.4 1
#> 9 4 30.4 1
#> 10 4 33.9 1
#> ...
myfun(3, 4, 5) # works
#> Joining, by = "gear"
#> gear mpg id
#> 1 3 21.4 2
#> 2 3 18.7 2
#> 3 3 18.1 2
#> 4 3 14.3 2
#> 5 3 16.4 2
#> 6 3 17.3 2
#> 7 3 15.2 2
#> 8 3 10.4 2
#> 9 3 10.4 2
#> 10 3 14.7 2
#> ...
Created on 2021-12-30 by the reprex package (v0.3.0)
You can read more about unquoting with the bang bang bang operator here.
Finally, you should think about the users of your function. If you are the only user then choose whatever suits you. In case there are other users you should think about how they expect the function to work. Probably users don't expect a function to work with several arguments and at the same time, alternatively, by providing those arguments in a vector. In the tidyverse argument splicing with !!! is a well established concept. In base R we would usually use do.call("myfun", as.list(myvector)) to achieve something similar.
To add another option:
The purrr package has a family of lift functions which can be used to alter the kind of arguments a function takes. The most prominent is lift_dl which transforms a function that takes dots as argument to a function that takes a list or vector as argument. This could also be used to solve the problem without the need to rewrite the function:
lift_dl(myfun)(myvector)
#> Joining, by = "gear"
#> gear mpg id
#> 1 4 21.0 1
#> 2 4 21.0 1
#> 3 4 22.8 1
#> 4 4 24.4 1
#> 5 4 22.8 1
#> 6 4 19.2 1
#> 7 4 17.8 1
#> 8 4 32.4 1
#> 9 4 30.4 1
#> 10 4 33.9 1
#> ...
Created on 2022-01-01 by the reprex package (v0.3.0)
How about testing if ... is of length 1 and if the only argument passed through is a vector? If not so, then consider ... a list of scalers and capture them with lst(...).
myfun_final <- function(...) {
if (...length() == 1L && is.vector(..1))
ids_lst <- `names<-`(..1, ..1)
else
ids_lst <- lst(...)
df2 <- bind_rows(map(ids_lst, function(x)
mtcars %>%
filter(gear == x) %>%
select(mpg)), .id = "gear") %>%
left_join(df)
df2
}
Test
> myfun_final(3)
Joining, by = "gear"
gear mpg id
1 3 21.4 2
2 3 18.7 2
3 3 18.1 2
4 3 14.3 2
5 3 16.4 2
6 3 17.3 2
7 3 15.2 2
8 3 10.4 2
9 3 10.4 2
10 3 14.7 2
11 3 21.5 2
12 3 15.5 2
13 3 15.2 2
14 3 13.3 2
15 3 19.2 2
> myfun_final(3,4,5)
Joining, by = "gear"
gear mpg id
1 3 21.4 2
2 3 18.7 2
3 3 18.1 2
4 3 14.3 2
5 3 16.4 2
6 3 17.3 2
7 3 15.2 2
8 3 10.4 2
9 3 10.4 2
10 3 14.7 2
11 3 21.5 2
12 3 15.5 2
13 3 15.2 2
14 3 13.3 2
15 3 19.2 2
16 4 21.0 1
17 4 21.0 1
18 4 22.8 1
19 4 24.4 1
20 4 22.8 1
21 4 19.2 1
22 4 17.8 1
23 4 32.4 1
24 4 30.4 1
25 4 33.9 1
26 4 27.3 1
27 4 21.4 1
28 5 26.0 3
29 5 30.4 3
30 5 15.8 3
31 5 19.7 3
32 5 15.0 3
> myfun_final(c(3,4,5))
Joining, by = "gear"
gear mpg id
1 3 21.4 2
2 3 18.7 2
3 3 18.1 2
4 3 14.3 2
5 3 16.4 2
6 3 17.3 2
7 3 15.2 2
8 3 10.4 2
9 3 10.4 2
10 3 14.7 2
11 3 21.5 2
12 3 15.5 2
13 3 15.2 2
14 3 13.3 2
15 3 19.2 2
16 4 21.0 1
17 4 21.0 1
18 4 22.8 1
19 4 24.4 1
20 4 22.8 1
21 4 19.2 1
22 4 17.8 1
23 4 32.4 1
24 4 30.4 1
25 4 33.9 1
26 4 27.3 1
27 4 21.4 1
28 5 26.0 3
29 5 30.4 3
30 5 15.8 3
31 5 19.7 3
32 5 15.0 3
Related
I'm looking to sequentially read in data and the transform it in two disparate scripts then combine the results into a list of dataframes:
library(tidyverse)
dat_list <- list(as_tibble(mtcars),as_tibble(mtcars),as_tibble(mtcars))
test_func <- function(x) {
dat <- x
gear_avg <- dat %>%
group_by(gear) %>%
summarize(value=mean(mpg))
carb_avg <- dat %>%
group_by(carb) %>%
summarize(value=mean(mpg))
df_list <- list(as_tibble(gear_avg),as_tibble(carb_avg))
return(df_list)
}
test_dat <- map_dfr(dat_list, test_func)
desired_output <-
list(
test_dat %>% filter(!is.na(gear)) %>% select(-carb),
test_dat %>% filter(!is.na(carb)) %>% select(-gear)
)
This is what I would expect to work but it just outputs a single dataframe.
Try using purrr::transpose:
map(transpose(test_dat), bind_rows)
From the purrr cheatsheet here is a little visual aid to understand what that function does:
Also, test_func does not return anything. So, in your reprex you should add the following as the last line: return(df_list)
Output
[[1]]
# A tibble: 9 x 2
gear value
<dbl> <dbl>
1 3 16.1
2 4 24.5
3 5 21.4
4 3 16.1
5 4 24.5
6 5 21.4
7 3 16.1
8 4 24.5
9 5 21.4
[[2]]
# A tibble: 18 x 2
carb value
<dbl> <dbl>
1 1 25.3
2 2 22.4
3 3 16.3
4 4 15.8
5 6 19.7
6 8 15
7 1 25.3
8 2 22.4
9 3 16.3
10 4 15.8
11 6 19.7
12 8 15
13 1 25.3
14 2 22.4
15 3 16.3
16 4 15.8
17 6 19.7
18 8 15
I'm trying to create a user created function to set the upper and lower values from 2 columns in data frame, but it doesn't work.
I'm using the mtcars data. I want to create a data frame from the function using the variables gear and carb. The function puts in the gear, carb, and mtcars as arguments. I want to the function to create a data frame in an object called mtcars_post_function_dataset that has the variables gear and carb such that gear has the smaller of the 2 variable values and carb has the larger of the 2 variable values. In addition, I want it to create 2 variables called gear_pre_storage and carb_pre_storage which has the original values of gear and carb from the original mtcars data frame.
When I tried to make this function, called function_set_upper_and_lower_range_values, and I try to run it with the data of interest, I get the follow output (which is not the desired result):
mtcars_post_function_dataset <-
function_set_upper_and_lower_range_values(gear, carb, mtcars)
> Error in as.name(lower_value_post) : object 'lower_value_post' not found
I'm not sure what to do. Please advise. Thanks ahead of time for any help.
Here is the code I used for the function:
# creates function_set_upper_and_lower_range_values
# ---- NOTE: creates function
function_set_upper_and_lower_range_values <-
# ---- NOTE: enter in 2 variables and dataset, function will set data so that lower_value >= upper_value for a given pair of values in 2 columns
# ---- NOTE: important for confidence interval / credible interval tests
# ---- NOTE: lower_value == lower value in range being evaluated
# ---- NOTE: upper_value == lower value in range being evaluated
# ---- NOTE: object to put function return should be data frame object
# ---- NOTE: function_range_check_specific == function name
function(lower_value, upper_value, dataset_name)
{
# ---- NOTE: turns function inputs into strings
lower_value_colmn <- deparse(substitute(lower_value))
upper_value_colmn <- deparse(substitute(upper_value))
nm1 <- deparse(substitute(dataset_name))
# ---- NOTE: # turns dataset_name into data frame
set_upper_and_lower_range_values_construction_funct_object_A <-
data.frame(
dataset_name
)
# ---- NOTE: adds column used for merging, and then turns data into data frame
set_upper_and_lower_range_values_construction_funct_object_A <- tibble::rowid_to_column(set_upper_and_lower_range_values_construction_funct_object_A, "merging_column")
# ---- NOTE: turns data into data frame
set_upper_and_lower_range_values_construction_funct_object_A <- data.frame(set_upper_and_lower_range_values_construction_funct_object_A)
# ---- NOTE: selects variables of interest
set_upper_and_lower_range_values_construction_funct_object_B <-
set_upper_and_lower_range_values_construction_funct_object_A %>%
select(
lower_value_colmn,
upper_value_colmn,
merging_column
)
# ---- NOTE: turns object into data frame
set_upper_and_lower_range_values_construction_funct_object_B <-
data.frame(set_upper_and_lower_range_values_construction_funct_object_B)
# ---- NOTE: # transforms values of interest into numeric form
set_upper_and_lower_range_values_construction_funct_object_B$lower_value_pre <- as.numeric(as.character(set_upper_and_lower_range_values_construction_funct_object_B[[lower_value_colmn]]))
set_upper_and_lower_range_values_construction_funct_object_B$upper_value_pre <- as.numeric(as.character(set_upper_and_lower_range_values_construction_funct_object_B[[upper_value_colmn]]))
# ---- NOTE: # sets data so lower_value_pre <= upper_value_pre
# ---- NOTE: ## creates storage variables
set_upper_and_lower_range_values_construction_funct_object_B$upper_value_pre_storage <- set_upper_and_lower_range_values_construction_funct_object_B$upper_value_pre
set_upper_and_lower_range_values_construction_funct_object_B$lower_value_pre_storage <- set_upper_and_lower_range_values_construction_funct_object_B$lower_value_pre
# ---- NOTE: ## creates upper_value_colmn
set_upper_and_lower_range_values_construction_funct_object_B$upper_value_post <-
ifelse(((set_upper_and_lower_range_values_construction_funct_object_B$upper_value_pre_storage) > (set_upper_and_lower_range_values_construction_funct_object_B$lower_value_pre_storage)), set_upper_and_lower_range_values_construction_funct_object_B$upper_value_pre_storage,
ifelse(((set_upper_and_lower_range_values_construction_funct_object_B$upper_value_pre_storage) < (set_upper_and_lower_range_values_construction_funct_object_B$lower_value_pre_storage)), set_upper_and_lower_range_values_construction_funct_object_B$lower_value_pre_storage,
ifelse(((set_upper_and_lower_range_values_construction_funct_object_B$upper_value_pre_storage) == (set_upper_and_lower_range_values_construction_funct_object_B$lower_value_pre_storage)), set_upper_and_lower_range_values_construction_funct_object_B$upper_value_pre_storage,
NA
)))
# ---- NOTE: ## creates lower_value_post
set_upper_and_lower_range_values_construction_funct_object_B$lower_value_post <-
ifelse(((set_upper_and_lower_range_values_construction_funct_object_B$upper_value_pre_storage) > (set_upper_and_lower_range_values_construction_funct_object_B$lower_value_pre_storage)), set_upper_and_lower_range_values_construction_funct_object_B$lower_value_pre_storage,
ifelse(((set_upper_and_lower_range_values_construction_funct_object_B$upper_value_pre_storage) < (set_upper_and_lower_range_values_construction_funct_object_B$lower_value_pre_storage)), set_upper_and_lower_range_values_construction_funct_object_B$upper_value_pre_storage,
ifelse(((set_upper_and_lower_range_values_construction_funct_object_B$upper_value_pre_storage) == (set_upper_and_lower_range_values_construction_funct_object_B$lower_value_pre_storage)), set_upper_and_lower_range_values_construction_funct_object_B$lower_value_pre_storage,
NA
)))
# ---- NOTE: creates set_upper_and_lower_range_values_construction_funct_object_C to be used to combine variables of interest to main dataset
set_upper_and_lower_range_values_construction_funct_object_C <- data.frame(set_upper_and_lower_range_values_construction_funct_object_A)
# ---- NOTE: drops specific columns
set_upper_and_lower_range_values_construction_funct_object_C <-
dplyr::select(set_upper_and_lower_range_values_construction_funct_object_C, -c(lower_value_colmn, upper_value_colmn))
# ---- NOTE: selects specific data from set_upper_and_lower_range_values_construction_funct_object_B for combining dataset
set_upper_and_lower_range_values_construction_funct_object_D <-
merge(set_upper_and_lower_range_values_construction_funct_object_C,
set_upper_and_lower_range_values_construction_funct_object_B,
by.x = "merging_column",
by.y = "merging_column",
all.x = TRUE,
all.y = FALSE,
no.dups = TRUE)
# ---- NOTE: turns object into data frame
set_upper_and_lower_range_values_construction_funct_object_D <- data.frame(set_upper_and_lower_range_values_construction_funct_object_D)
# ---- NOTE: changes colnames
names(set_upper_and_lower_range_values_construction_funct_object_D)[as.name(lower_value_post)] <- lower_value_colmn
names(set_upper_and_lower_range_values_construction_funct_object_D)[as.name(upper_value_post)] <- upper_value_colmn
names(set_upper_and_lower_range_values_construction_funct_object_D)[as.name(lower_value_pre_storage)] <- paste(lower_value_colmn, "pre_storage", sep = "_")
names(set_upper_and_lower_range_values_construction_funct_object_D)[as.name(upper_value_pre_storage)] <- paste(upper_value_colmn, "pre_storage", sep = "_")
# ---- NOTE: returns proper variable
return(set_upper_and_lower_range_values_construction_funct_object_D)
}
In the function, the lines at the end are the problematic i.e. as.name is trying to convert an object that doesn't exist in the global env i.e lower_value_post, upper_value_post are column names in that set_upper_and_lower_range_values_construction_funct_object_D data.frame object.
names(set_upper_and_lower_range_values_construction_funct_object_D)[as.name(lower_value_post)] <- lower_value_colmn
names(set_upper_and_lower_range_values_construction_funct_object_D)[as.name(upper_value_post)] <- upper_value_colmn
names(set_upper_and_lower_range_values_construction_funct_object_D)[as.name(lower_value_pre_storage)] <- paste(lower_value_colmn, "pre_storage", sep = "_")
names(set_upper_and_lower_range_values_construction_funct_object_D)[as.name(upper_value_pre_storage)] <- paste(upper_value_colmn, "pre_storage", sep = "_")
As the OP wanted to rename those specific columns, in base R an option would to be match the column name with that vector of column names to get the position index of column.
i1 <- match(c("lower_value_post", "upper_value_post", "lower_value_pre_storage", "upper_value_pre_storage"), names(set_upper_and_lower_range_values_construction_funct_object_D))
and use that index to extract those column names and assign (<-) with new vector of column names in that order
names(set_upper_and_lower_range_values_construction_funct_object_D)[i1] <-
c(lower_value_colmn, upper_value_colmn, paste0(c(lower_value_colmn, upper_value_colmn), "_pre_storage"))
-full function with changes made
function_set_upper_and_lower_range_values <-
# ---- NOTE: enter in 2 variables and dataset, function will set data so that lower_value >= upper_value for a given pair of values in 2 columns
# ---- NOTE: important for confidence interval / credible interval tests
# ---- NOTE: lower_value == lower value in range being evaluated
# ---- NOTE: upper_value == lower value in range being evaluated
# ---- NOTE: object to put function return should be data frame object
# ---- NOTE: function_range_check_specific == function name
function(lower_value, upper_value, dataset_name)
{
# ---- NOTE: turns function inputs into strings
lower_value_colmn <- deparse(substitute(lower_value))
upper_value_colmn <- deparse(substitute(upper_value))
nm1 <- deparse(substitute(dataset_name))
# ---- NOTE: # turns dataset_name into data frame
set_upper_and_lower_range_values_construction_funct_object_A <-
data.frame(
dataset_name
)
# ---- NOTE: adds column used for merging, and then turns data into data frame
set_upper_and_lower_range_values_construction_funct_object_A <- tibble::rowid_to_column(set_upper_and_lower_range_values_construction_funct_object_A, "merging_column")
# ---- NOTE: turns data into data frame
set_upper_and_lower_range_values_construction_funct_object_A <- data.frame(set_upper_and_lower_range_values_construction_funct_object_A)
# ---- NOTE: selects variables of interest
set_upper_and_lower_range_values_construction_funct_object_B <-
set_upper_and_lower_range_values_construction_funct_object_A %>%
select(
lower_value_colmn,
upper_value_colmn,
merging_column
)
# ---- NOTE: turns object into data frame
set_upper_and_lower_range_values_construction_funct_object_B <-
data.frame(set_upper_and_lower_range_values_construction_funct_object_B)
# ---- NOTE: # transforms values of interest into numeric form
set_upper_and_lower_range_values_construction_funct_object_B$lower_value_pre <- as.numeric(as.character(set_upper_and_lower_range_values_construction_funct_object_B[[lower_value_colmn]]))
set_upper_and_lower_range_values_construction_funct_object_B$upper_value_pre <- as.numeric(as.character(set_upper_and_lower_range_values_construction_funct_object_B[[upper_value_colmn]]))
# ---- NOTE: # sets data so lower_value_pre <= upper_value_pre
# ---- NOTE: ## creates storage variables
set_upper_and_lower_range_values_construction_funct_object_B$upper_value_pre_storage <- set_upper_and_lower_range_values_construction_funct_object_B$upper_value_pre
set_upper_and_lower_range_values_construction_funct_object_B$lower_value_pre_storage <- set_upper_and_lower_range_values_construction_funct_object_B$lower_value_pre
# ---- NOTE: ## creates upper_value_colmn
set_upper_and_lower_range_values_construction_funct_object_B$upper_value_post <-
ifelse(((set_upper_and_lower_range_values_construction_funct_object_B$upper_value_pre_storage) > (set_upper_and_lower_range_values_construction_funct_object_B$lower_value_pre_storage)), set_upper_and_lower_range_values_construction_funct_object_B$upper_value_pre_storage,
ifelse(((set_upper_and_lower_range_values_construction_funct_object_B$upper_value_pre_storage) < (set_upper_and_lower_range_values_construction_funct_object_B$lower_value_pre_storage)), set_upper_and_lower_range_values_construction_funct_object_B$lower_value_pre_storage,
ifelse(((set_upper_and_lower_range_values_construction_funct_object_B$upper_value_pre_storage) == (set_upper_and_lower_range_values_construction_funct_object_B$lower_value_pre_storage)), set_upper_and_lower_range_values_construction_funct_object_B$upper_value_pre_storage,
NA
)))
# ---- NOTE: ## creates lower_value_post
set_upper_and_lower_range_values_construction_funct_object_B$lower_value_post <-
ifelse(((set_upper_and_lower_range_values_construction_funct_object_B$upper_value_pre_storage) > (set_upper_and_lower_range_values_construction_funct_object_B$lower_value_pre_storage)), set_upper_and_lower_range_values_construction_funct_object_B$lower_value_pre_storage,
ifelse(((set_upper_and_lower_range_values_construction_funct_object_B$upper_value_pre_storage) < (set_upper_and_lower_range_values_construction_funct_object_B$lower_value_pre_storage)), set_upper_and_lower_range_values_construction_funct_object_B$upper_value_pre_storage,
ifelse(((set_upper_and_lower_range_values_construction_funct_object_B$upper_value_pre_storage) == (set_upper_and_lower_range_values_construction_funct_object_B$lower_value_pre_storage)), set_upper_and_lower_range_values_construction_funct_object_B$lower_value_pre_storage,
NA
)))
# ---- NOTE: creates set_upper_and_lower_range_values_construction_funct_object_C to be used to combine variables of interest to main dataset
set_upper_and_lower_range_values_construction_funct_object_C <- data.frame(set_upper_and_lower_range_values_construction_funct_object_A)
# ---- NOTE: drops specific columns
set_upper_and_lower_range_values_construction_funct_object_C <-
dplyr::select(set_upper_and_lower_range_values_construction_funct_object_C, -c(lower_value_colmn, upper_value_colmn))
# ---- NOTE: selects specific data from set_upper_and_lower_range_values_construction_funct_object_B for combining dataset
set_upper_and_lower_range_values_construction_funct_object_D <-
merge(set_upper_and_lower_range_values_construction_funct_object_C,
set_upper_and_lower_range_values_construction_funct_object_B,
by.x = "merging_column",
by.y = "merging_column",
all.x = TRUE,
all.y = FALSE,
no.dups = TRUE)
# ---- NOTE: turns object into data frame
set_upper_and_lower_range_values_construction_funct_object_D <- data.frame(set_upper_and_lower_range_values_construction_funct_object_D)
# ---- NOTE: changes colnames
i1 <- match(c("lower_value_post", "upper_value_post", "lower_value_pre_storage", "upper_value_pre_storage"), names(set_upper_and_lower_range_values_construction_funct_object_D))
names(set_upper_and_lower_range_values_construction_funct_object_D)[i1] <-
c(lower_value_colmn, upper_value_colmn, paste0(c(lower_value_colmn, upper_value_colmn), "_pre_storage"))
# ---- NOTE: returns proper variable
return(set_upper_and_lower_range_values_construction_funct_object_D)
}
-testing
function_set_upper_and_lower_range_values(gear, carb, mtcars)
merging_column mpg cyl disp hp drat wt qsec vs am gear carb lower_value_pre upper_value_pre carb_pre_storage gear_pre_storage carb
1 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 4 4 4 4 4
2 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 4 4 4 4 4
3 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 4 1 1 4 4
4 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 3 1 1 3 3
5 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 3 2 2 3 3
6 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 3 1 1 3 3
7 7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 3 4 4 3 4
8 8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 4 2 2 4 4
9 9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 4 2 2 4 4
10 10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 4 4 4 4 4
11 11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 4 4 4 4 4
12 12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 3 3 3 3 3
13 13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 3 3 3 3 3
14 14 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 3 3 3 3 3
15 15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 3 4 4 3 4
16 16 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 3 4 4 3 4
17 17 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 3 4 4 3 4
18 18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 4 1 1 4 4
19 19 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 4 2 2 4 4
20 20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 4 1 1 4 4
21 21 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 3 1 1 3 3
22 22 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 3 2 2 3 3
23 23 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 3 2 2 3 3
24 24 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 3 4 4 3 4
25 25 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 3 2 2 3 3
26 26 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 4 1 1 4 4
27 27 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 5 2 2 5 5
28 28 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 5 2 2 5 5
29 29 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 5 4 4 5 5
30 30 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 5 6 6 5 6
31 31 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 5 8 8 5 8
32 32 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 4 2 2 4 4
gear
1 4
2 4
3 1
4 1
5 2
6 1
7 3
8 2
9 2
10 4
11 4
12 3
13 3
14 3
15 3
16 3
17 3
18 1
19 2
20 1
21 1
22 2
23 2
24 3
25 2
26 1
27 2
28 2
29 4
30 5
31 5
32 2
>
I'm working with the mtcars dataset. I'm trying to create a new dataset which contains only the columns that are non-integer. To do this, I've tried several things, mainly trying to use lapply like so:
> newdata <- lapply(mtcars, Negate(is.integer))
But this doesn't return what I need. In the exercise, it was told that which() can help us do this, so I ask - how?
Thanks
You could use Filter to select columns which are not integer.
Filter(Negate(is.integer), mtcars)
#Or explicitly mentioning non-integers
#Filter(function(x) !is.integer(x), mtcars)
In dplyr, we can use select_if
library(dplyr)
mtcars %>% select_if(Negate(is.integer))
#mtcars %>% select_if(~!is.integer(.))
You can do:
iris[sapply(iris, Negate(is.factor))]
Sepal.Length Sepal.Width Petal.Length Petal.Width
1 5.1 3.5 1.4 0.2
2 4.9 3.0 1.4 0.2
3 4.7 3.2 1.3 0.2
4 4.6 3.1 1.5 0.2
5 5.0 3.6 1.4 0.2
6 5.4 3.9 1.7 0.4
7 4.6 3.4 1.4 0.3
8 5.0 3.4 1.5 0.2
9 4.4 2.9 1.4 0.2
10 4.9 3.1 1.5 0.1
Here, for illustration, I'm using the iris dataset and filtering out factors.
To select those columns that appear as "integer" (according to your comment) you could select columns where all as.integer values correspond to the original ones using sapply.
mtcars[sapply(mtcars, function(i) all(as.integer(i) - i == 0))]
# cyl hp vs am gear carb
# Mazda RX4 6 110 0 1 4 4
# Mazda RX4 Wag 6 110 0 1 4 4
# Datsun 710 4 93 1 1 4 1
# Hornet 4 Drive 6 110 1 0 3 1
# Hornet Sportabout 8 175 0 0 3 2
# Valiant 6 105 1 0 3 1
# Duster 360 8 245 0 0 3 4
# Merc 240D 4 62 1 0 4 2
# Merc 230 4 95 1 0 4 2
# Merc 280 6 123 1 0 4 4
# Merc 280C 6 123 1 0 4 4
# Merc 450SE 8 180 0 0 3 3
# Merc 450SL 8 180 0 0 3 3
# Merc 450SLC 8 180 0 0 3 3
# Cadillac Fleetwood 8 205 0 0 3 4
# Lincoln Continental 8 215 0 0 3 4
# Chrysler Imperial 8 230 0 0 3 4
# Fiat 128 4 66 1 1 4 1
# Honda Civic 4 52 1 1 4 2
# Toyota Corolla 4 65 1 1 4 1
# Toyota Corona 4 97 1 0 3 1
# Dodge Challenger 8 150 0 0 3 2
# AMC Javelin 8 150 0 0 3 2
# Camaro Z28 8 245 0 0 3 4
# Pontiac Firebird 8 175 0 0 3 2
# Fiat X1-9 4 66 1 1 4 1
# Porsche 914-2 4 91 0 1 5 2
# Lotus Europa 4 113 1 1 5 2
# Ford Pantera L 8 264 0 1 5 4
# Ferrari Dino 6 175 0 1 5 6
# Maserati Bora 8 335 0 1 5 8
# Volvo 142E 4 109 1 1 4 2
If the which is mandatory you could built it in like so:
mtcars[which(sapply(mtcars, function(i) all(as.integer(i) - i == 0)))]
We can use negate from purrr
library(purrr)
library(dplyr)
mtcars %>%
select_if(negate(is.integer))
I hope I can explain what I'm trying to do sufficiently. I'm working in R and for a dataset I'm trying to keep only observations where for one variable, another variable satisfies two conditions.
Specifically, I want to keep only rows where for a particular "cyl", there is at least one mpg value >20, and at least one <20. Here is some example data from mtcars similar to what I'm working with.
mpg cyl
1 21.0 6
2 21.0 6
3 22.8 4
4 21.4 6
5 18.7 8
6 18.1 6
7 14.3 8
8 24.4 4
9 22.8 4
10 19.2 6
11 17.8 6
12 16.4 8
13 17.3 8
14 15.2 8
15 10.4 8
16 10.4 8
17 14.7 8
18 32.4 4
19 30.4 4
20 33.9 4
Ideally, my output for the above example would be what's below.
mpg cyl
1 21.0 6
2 21.0 6
4 21.4 6
6 18.1 6
10 19.2 6
11 17.8 6
Thanks in advance!
Assuming your dataframe input is DF, try this:
library(dplyr)
DF %>%
group_by(cyl) %>%
filter(sum(mpg > 20) > 1 & sum(mpg < 20) > 1)
# A tibble: 7 x 2
# Groups: cyl [1]
# mpg cyl
# <dbl> <dbl>
# 1 21 6
# 2 21 6
# 3 21.4 6
# 4 18.1 6
# 5 19.2 6
# 6 17.8 6
# 7 19.7 6
data
DF <- mtcars[,1:2]
I tried using the code presented here to find ALL duplicated elements with dplyr like this:
library(dplyr)
mtcars %>%
mutate(cyl.dup = cyl[duplicated(cyl) | duplicated(cyl, from.last = TRUE)])
How can I convert code presented here to find ALL duplicated elements with dplyr? My code above just throws an error? Or even better, is there another function that will achieve this more succinctly than the convoluted x[duplicated(x) | duplicated(x, from.last = TRUE)]) approach?
I guess you could use filter for this purpose:
mtcars %>%
group_by(carb) %>%
filter(n()>1)
Small example (note that I added summarize() to prove that the resulting data set does not contain rows with duplicate 'carb'. I used 'carb' instead of 'cyl' because 'carb' has unique values whereas 'cyl' does not):
mtcars %>% group_by(carb) %>% summarize(n=n())
#Source: local data frame [6 x 2]
#
# carb n
#1 1 7
#2 2 10
#3 3 3
#4 4 10
#5 6 1
#6 8 1
mtcars %>% group_by(carb) %>% filter(n()>1) %>% summarize(n=n())
#Source: local data frame [4 x 2]
#
# carb n
#1 1 7
#2 2 10
#3 3 3
#4 4 10
Another solution is to use janitor package:
mtcars %>% get_dupes(wt)
We can find duplicated elements with dplyr as follows.
library(dplyr)
# Only duplicated elements
mtcars %>%
filter(duplicated(.[["carb"]])
# All duplicated elements
mtcars %>%
filter(carb %in% unique(.[["carb"]][duplicated(.[["carb"]])]))
The original post contains an error in using the solution from the related answer. In the example given, when you use that solution inside mutate, it tries to subset the cyl vector which will not be of the same length as the mtcars dataframe.
Instead you can use the following example with filter returning all duplicated elements or mutate with ifelse to create a dummy variable which can be filtered upon later:
library(dplyr)
# Return all duplicated elements
mtcars %>%
filter(duplicated(cyl) | duplicated(cyl, fromLast = TRUE))
# Or for making dummy variable of all duplicated
mtcars %>%
mutate(cyl.dup =ifelse(duplicated(cyl) | duplicated(cyl, fromLast = TRUE), 1,0))
# Adding a shortcut to the answer above
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
mtcars %>% count(carb)
#> # A tibble: 6 x 2
#> carb n
#> <dbl> <int>
#> 1 1. 7
#> 2 2. 10
#> 3 3. 3
#> 4 4. 10
#> 5 6. 1
#> 6 8. 1
mtcars %>% count(carb) %>% filter(n > 1)
#> # A tibble: 4 x 2
#> carb n
#> <dbl> <int>
#> 1 1. 7
#> 2 2. 10
#> 3 3. 3
#> 4 4. 10
# Showing an alternative that follows the apparent intention if the asker
duplicated_carb <- mtcars %>%
mutate(dup_carb = duplicated(carb)) %>%
filter(dup_carb)
duplicated_carb
#> mpg cyl disp hp drat wt qsec vs am gear carb dup_carb
#> 1 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 TRUE
#> 2 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 TRUE
#> 3 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 TRUE
#> 4 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 TRUE
#> 5 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 TRUE
#> 6 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 TRUE
#> 7 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 TRUE
#> 8 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 TRUE
#> 9 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 TRUE
#> 10 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 TRUE
#> 11 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 TRUE
#> 12 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 TRUE
#> 13 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 TRUE
#> 14 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 TRUE
#> 15 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 TRUE
#> 16 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 TRUE
#> 17 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 TRUE
#> 18 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 TRUE
#> 19 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 TRUE
#> 20 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 TRUE
#> 21 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 TRUE
#> 22 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 TRUE
#> 23 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 TRUE
#> 24 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 TRUE
#> 25 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 TRUE
#> 26 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 TRUE
You can create a Boolean mask with duplicated():
iris %>% duplicated()
[1] FALSE FALSE FALSE .... TRUE FALSE
[145] FALSE FALSE FALSE FALSE FALSE FALSE
And pass through square brackets indexing:
iris[iris %>% duplicated(),]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
143 5.8 2.7 5.1 1.9 virginica
Note: This approach is the closest thing to Pandas
that could be done with R and dplyr:
iris[iris %>% duplicated(), c("Petal.Length","Petal.Width","Species")]
Petal.Length Petal.Width Species
143 5.1 1.9 virginica
A more general solution if you want to group duplicates using many columns
df%>%
select(ID,COL1,COL2,all_of(vector_of_columns))%>%
distinct%>%
ungroup%>%rowwise%>%
mutate(ID_GROUPS=paste0(ID,"_",cur_group_rows()))%>%
ungroup%>%
full_join(.,df,by=c("INFO_ID","COL1","COL2",vector_of_columns))->chk
Find duplicate value in data frame with column
df<-dataset[duplicated(dataset$columnname),]