I want to select items by index from a list before applying another function to them using purrr::map. I have tried the following, but can't find a way that works.
require(dplyr)
require(purrr)
dat <- list(1:3,
4:6,
letters[1:3])
# I can select one item
dat[1]
# I can select two items
dat[c(1,2)]
# But how can I do this in a pipeline by index?
dat %>% map(mean)
dat %>%
filter(c(1,2)) %>%
map(mean)
dat %>%
keep(1,2) %>%
map(mean)
dat %>%
select(1,2) %>%
map(mean)
We can use `[` directly in the pipeline:
dat %>%
.[c(1, 2)] %>%
map(., mean)
#[[1]]
#[1] 2
#[[2]]
#[1] 5
Or define an alias, as the magrittr package does:
extract <- `[` # literally the same as magrittr::extract
dat %>%
extract(c(1, 2)) %>%
map(., mean)
Which could also be written as
dat %>% `[`(c(1,2))
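Since magrittr already exports this alias, defining it by hand is optional. A minimal sketch, assuming magrittr and purrr are installed and using the question's dat; map_dbl is used here only to get a plain numeric vector back:

```r
library(magrittr)
library(purrr)

dat <- list(1:3, 4:6, letters[1:3])

# magrittr::extract is an alias for `[`, so this keeps elements 1 and 2
res <- dat %>%
  magrittr::extract(c(1, 2)) %>%
  map_dbl(mean)

res
```

Writing the namespace explicitly avoids a clash with tidyr::extract if tidyr is also attached.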
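Using base R's native pipe operator, this would read
dat |>
(\(x) x[c(1, 2)])() |> # R >= 4.1.0; syntactically special functions such as `[` are not allowed as the call on the right-hand side of |>
lapply(mean)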
#[[1]]
#[1] 2
#
#[[2]]
#[1] 5
One option is keep with a logical index:
library(tidyverse)
keep(dat, seq_along(dat) %in% 1:2) %>%
map(mean)
#[[1]]
#[1] 2
#[[2]]
#[1] 5
Or map with pluck
map(1:2, ~ pluck(dat, .x) %>%
mean)
Or with assign_in (recent purrr versions document zap() rather than NULL as the way to remove an element)
assign_in(dat, 3, NULL) %>%
map(mean)
Or another option is map_if
map_if(dat, is.numeric, mean, .else = ~ NULL) %>%
discard(is.null)
Or with discard
discard(dat, is.character) %>%
map(mean)
or with Filter and map
Filter(is.numeric, dat) %>%
map(mean)
NOTE: All of these give the expected output.
Related
I know for a df, I can do df<-df %>% modify_if(is.POSIXt, as.character). If I have a list of df lst, how can I do this through every df in lst? I know I probably need to use map or lapply, but I am not sure how. Could someone give me some guidance?
Thanks.
If we have a list of datasets and want to change the column type while keeping the same data.frame format:
library(purrr)
library(dplyr)
library(lubridate)
map(lst, ~ .x %>%
mutate(across(where(is.POSIXt), as.character)))
Or with inherits
map(lst, ~ .x %>%
mutate(across(where(~ inherits(., 'POSIXct')), as.character )))
You're on the right track:
new.lst <- lapply( lst, \(x) x %>% modify_if( is.POSIXt, as.character) )
For versions of R before 4.1.0, which lack the \(x) lambda syntax:
new.lst <- lst %>% map( ~ .x %>% modify_if( is.POSIXt, as.character) )
Or lapply with function(x):
new.lst <- lapply( lst, function(x) x %>% modify_if( is.POSIXt, as.character) )
With minimal data to demonstrate:
library(lubridate)
lst <- list( data.frame(x=now(),y=1), data.frame(z=now()-days(2),t=3) )
lst[[1]]$x
new.lst <- lapply( lst, \(x) x %>% modify_if( is.POSIXt, as.character) )
new.lst[[1]]$x
Output:
> lst[[1]]$x
[1] "2021-05-20 00:48:34 CEST"
> new.lst[[1]]$x
[1] "2021-05-20 00:48:34"
(the CEST time zone suffix shows that the first output is still a date-time, while the second is a plain character string)
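For comparison, the same conversion can be sketched in base R alone, without purrr or lubridate. This is a swapped-in technique, not what the answers above use; the helper name posix_to_character is made up for illustration and Sys.time() stands in for lubridate's now():

```r
# Two small data frames, each with one date-time column
lst <- list(
  data.frame(x = Sys.time(), y = 1),
  data.frame(z = Sys.time() - 2 * 24 * 60 * 60, t = 3)
)

# Hypothetical helper: convert every POSIXt column of one data frame to character
posix_to_character <- function(d) {
  is_dt <- vapply(d, inherits, logical(1), what = "POSIXt")
  if (any(is_dt)) {
    d[is_dt] <- lapply(d[is_dt], as.character)
  }
  d
}

# Apply the helper to every data frame in the list
new_lst <- lapply(lst, posix_to_character)

sapply(new_lst[[1]], class)
```

Columns that are not date-times are left untouched, and each element of new_lst is still a data.frame.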
It may seem silly, or the answer may be right in front of my nose, but I'm not sure how to make this function behave properly.
I want to write a function called loop, which takes a data frame df, splits it by group, and then runs some function called test on each of the resulting data frames.
test, in turn, takes a row index i and a dataset df as its arguments and returns a list.
For the sake of exposition, assume this:
test <- function(i, df){
df$V2[1:i]
}
Afterward, loop does some other things that don't really matter for this question.
Here is what I've tried to do:
loop1 <- function(df){
df1 <- df %>%
group_split(UF)
x <- df1 %>%
map(~ .x %>%
nrow() %>%
seq())
z <- map2(x, df1, ~ .x %>% map(~ .x %>% test(df = .y)))
return(z)
}
If I were to run test for only one data frame d, I would do map(1:nrow(d), ~ test(., df = d))
Unfortunately, my function loop1 is not working. How could I adjust it so that it runs test for each row of each dataset in df1?
Here is an example:
df <- data.frame(UF = c(1, 1, 2, 2, 3, 4, 4, 4),
V2 = 1:8)
And what output I expect as a result:
list(list(1, 1:2),
list(3, 3:4),
list(5),
list(6, 6:7, 6:8)
)
In this line
z <- map2(x, df1, ~ .x %>% map(~ .x %>% test(df = .y)))
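the scope of .x is confusing: the inner formula lambda defines its own .x and .y, which mask those of the outer map2 lambda, so .y no longer refers to the data frame. Use a named anonymous function to make the scope explicit, or use the function below, which needs only a minimal change to your code.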
library(dplyr)
library(purrr)
loop1 <- function(df){
df1 <- df %>% group_split(UF)
x <- df1 %>% map(~ .x %>% nrow() %>% seq())
z <- map2(x, df1, ~map(.x, test, df = .y))
return(z)
}
loop1(df)
which returns:
#[[1]]
#[[1]][[1]]
#[1] 1
#[[1]][[2]]
#[1] 1 2
#[[2]]
#[[2]][[1]]
#[1] 3
#[[2]][[2]]
#[1] 3 4
#[[3]]
#[[3]][[1]]
#[1] 5
#[[4]]
#[[4]][[1]]
#[1] 6
#[[4]][[2]]
#[1] 6 7
#[[4]][[3]]
#[1] 6 7 8
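The other suggestion, spelling out anonymous functions so that each argument has its own name, could look roughly like this. It is a sketch under the question's definitions of test and df; the name loop2 and the argument names d and i are made up for illustration:

```r
library(dplyr)
library(purrr)

test <- function(i, df) {
  df$V2[1:i]
}

# Hypothetical variant of loop1 with explicit argument names instead of .x/.y
loop2 <- function(df) {
  df %>%
    group_split(UF) %>%
    map(function(d) {
      # d is one group's data frame; i runs over its row indices
      map(seq_len(nrow(d)), function(i) test(i, df = d))
    })
}

df <- data.frame(UF = c(1, 1, 2, 2, 3, 4, 4, 4),
                 V2 = 1:8)
res <- loop2(df)
str(res)
```

Because d and i are ordinary function arguments, the inner function can refer to the outer data frame without any masking.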
I want to make a bunch of new variables a,b,c,d.....z to store tibble data frames. I will then rbind the new variables that store tibble data frames and export them as a csv. How do I do this faster without having to specify the new variables each time?
a<- subset(data.frame, variable1="condition1",....,) %>% group_by() %>% summarize( a=mean())
b<-subset(data.frame, variable1="condition2",....,) %>% group_by() %>% summarize( a=mean())
....
z<-subset(data.frame, variable1="condition2",....,) %>% group_by() %>% summarize( a=mean())
rbind(a,b,....,z)
There's got to be a faster way to do this. My data set is large so having it stored in memory as partitions of a,b,c,....z is causing the computer to crash. Typing the subset conditions to form the partitions repeatedly is tedious.
You could do something like this using the purrr package.
Depending on what your conditions look like, you may need non-standard evaluation (NSE); see the "Programming with dplyr" vignette for reference.
purrr::map_df(
c("condition1","condition2",..., "conditionn"),
# .x for each condition
~ subset(your_data_frame, variable1 == .x,....,) %>% group_by(some_columns) %>% summarise(a = mean(some_columns))
)
Example using iris:
library(rlang)
conditions <- c("Petal.Length>1.5","Species == 'setosa'","Sepal.Length > 5")
map(conditions, function(x){
iris %>%
dplyr::filter(!!rlang::parse_expr(x)) %>%
head()
})
Another example using iris, this time counting the matching rows:
conditions <- c("Petal.Length>1.5","Species == 'setosa'","Sepal.Length > 5")
map(conditions, ~ iris %>% dplyr::filter(!!rlang::parse_expr(.x)) %>% nrow())
# or (!! is almost equivalent to eval or rlang::eval_tidy())
map(conditions, ~ iris %>% dplyr::filter(eval(rlang::parse_expr(.x))) %>% nrow())
[[1]]
[1] 113
[[2]]
[1] 50
[[3]]
[1] 118
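Since the goal in the question is one combined table for export, the per-condition results can also be bound into a single data frame. A minimal sketch, assuming the conditions are given as strings as above; the labels long_petal, setosa and long_sepal and the file name summaries.csv are made up for illustration:

```r
library(dplyr)
library(purrr)
library(rlang)

# Named conditions: the names become a label column in the result
conditions <- c(long_petal = "Petal.Length > 1.5",
                setosa     = "Species == 'setosa'",
                long_sepal = "Sepal.Length > 5")

summaries <- conditions %>%
  map(~ iris %>%
        filter(!!parse_expr(.x)) %>%
        summarise(n = n(), a = mean(Sepal.Length))) %>%
  bind_rows(.id = "condition")   # one row per condition

summaries

# Export once at the end (hypothetical file name):
# write.csv(summaries, "summaries.csv", row.names = FALSE)
```

Only the small summary rows are kept, so the filtered subsets never need to be stored as separate objects.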
Instead of creating multiple objects in the global environment, read them into a list and bind them
library(data.table)
files <- list.files(pattern = "\\.csv", full.names = TRUE)
rbindlist(lapply(files, fread))
fread is usually considerably faster than the other common ways of reading CSV files
If we are using strings to be passed onto group_by, convert the string to symbol with sym from rlang and evaluate (!!)
library(purrr)
map2_df(c("condition1", "condition2"), c("a", "b"), ~ df1 %>%
group_by(!! rlang::sym(.x)) %>%
summarise(!! .y := mean(colname)))
If 'condition1', 'condition2', etc. are expressions, wrap them as quosures with quos and evaluate them
map2_df(quos(condition1, condition2), c("a", "b"), ~ df1 %>%
filter(!! .x) %>%
summarise(!! .y := mean(colname)))
Using a reproducible example
conditions <- quos(Petal.Length>1.5,Species == 'setosa',Sepal.Length > 5)
map2(conditions, c('a', 'b', 'c'), ~
iris %>%
filter(!! .x) %>%
summarise(!! .y := mean(Sepal.Length)))
#[[1]]
# a
#1 6.124779
#[[2]]
# b
#1 5.006
#[[3]]
# c
#1 6.129661
Using map2_dfc instead would give a single three-column dataset.
NOTE: It is not clear whether the OP meant 'condition1', 'condition2' as expressions to be passed on for filtering the rows or not.
I have a structure on which I want to apply a function, but cannot get it right using purrr::map.
There are two nested data frames within a list. The function needs to be applied to all of the elements of the nested data frames. To reproduce the data structure:
df1 <- data.frame(a = c(1,1,2,2,3,3),
b = c(1,2,3,4,5,6))
df1 <- df1 %>%
group_by(a) %>%
nest()
df2 <- data.frame(m = c(1,1,1,2,3,3),
n = c(6:11))
df2 <- df2 %>%
group_by(m) %>%
nest()
ls1 <- list(df1,df2)
A simple function such as mean or max could be used:
f1 <- function(x) {
x %>%
unnest() %>%
summarise(b = sum(b))
}
ls2 <- ls1 %>% map(~ .x, f1)
This doesn't do the job. Solutions using purrr would be ideal, but any approach is welcome.
I don't know if this is the best solution, but it should do the job (the output below corresponds to the two data frames before nest() is applied):
library(purrr)
map(ls1, function(x) {
map(x, mean)
})
# [[1]]
# [[1]]$a
# [1] 2
#
# [[1]]$b
# [1] 3.5
#
#
# [[2]]
# [[2]]$m
# [1] 1.833333
#
# [[2]]$n
# [1] 8.5
Basically, I nested two map calls, as you can see. Remember that purrr gives you finer control over the output type through variants such as map_df or map_dbl, unlike some of the *apply functions.
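For the nested structure from the question, one possible reading is that each data frame should be unnested and then summarised per group. The sketch below assumes that reading and assumes tidyr is available; f1 is rewritten with across so it does not depend on a column named b, which is an illustrative choice rather than the asker's exact function:

```r
library(dplyr)
library(tidyr)
library(purrr)

df1 <- data.frame(a = c(1, 1, 2, 2, 3, 3),
                  b = c(1, 2, 3, 4, 5, 6)) %>%
  group_by(a) %>%
  nest()
df2 <- data.frame(m = c(1, 1, 1, 2, 3, 3),
                  n = 6:11) %>%
  group_by(m) %>%
  nest()
ls1 <- list(df1, df2)

# Unnest the list-column, then sum every non-grouping column within each group
f1 <- function(x) {
  x %>%
    unnest(cols = data) %>%
    summarise(across(everything(), sum), .groups = "drop")
}

# Pass the function itself to map, not a lambda that ignores it
ls2 <- map(ls1, f1)
ls2
```

The original call map(~ .x, f1) returns each element unchanged, because the lambda ~ .x never calls f1.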
I am trying to apply a custom function to a data.frame row by row, but I can't figure out how to apply the function row by row. I'm trying rowwise() as in the simple artificial example below:
library(tidyverse)
my_fun <- function(df, col_1, col_2){
df[,col_1] + df[,col_2]
}
dff <- data.frame("a" = 1:10, "b" = 1:10)
dff %>%
rowwise() %>%
mutate(res = my_fun(., "a", "b"))
However, the data does not get passed row by row. How can I achieve that?
Under dplyr's rowwise(), the .data pronoun refers to the current row and behaves like a list, so you need to use [[ rather than [ , ]. You also need to use .data rather than ., because the pipe binds . to the entire dff rather than to the individual rows.
my_fun <- function(df, col_1, col_2){
df[[col_1]] + df[[col_2]]
}
dff %>%
rowwise() %>%
mutate(res = my_fun(.data, 'a', 'b'))
You can see what .data looks like with the code below
dff %>%
rowwise() %>%
do(res = .data) %>%
.[[1]] %>%
head(1)
# [[1]]
# [[1]]$a
# [1] 1
#
# [[1]]$b
# [1] 1
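If the custom function only needs the values of a few columns in each row, c_across is another way to get at them under rowwise(). This is a sketch of a different technique from the .data approach above, with sum standing in for the custom function:

```r
library(dplyr)

dff <- data.frame(a = 1:10, b = 1:10)

res <- dff %>%
  rowwise() %>%
  # c_across collects the selected columns of the current row into one vector
  mutate(res = sum(c_across(c(a, b)))) %>%
  ungroup()

res
```

Calling ungroup() at the end drops the row-wise grouping so later verbs operate on whole columns again.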