How to dynamically name files in lapply? - r

Here's some sample code:
library(dplyr)     # %>% and mutate()
library(reshape2)  # melt(); assumed here to come from reshape2
dat <- data.frame(test1=runif(100,0,1),
test2=runif(100,0,1),
test3=runif(100,0,1))
variable_of_interest <- "test1"
dat_multi <- 3
test_save <- function(dat, x_seq) {
saveRDS(dat, file=paste0("data/", variable_of_interest,"_", x_seq,".RDS"))
}
test_func <- function(dat, dat_multi,x) {
res <- as.data.frame(as.matrix(dat*dat_multi))
dat_test <- rbind(melt(res$test1) %>% mutate(Var = "First_Testing_1"),
melt(res$test2) %>% mutate(Var = "Different_Testing_2"),
melt(res$test3) %>% mutate(Var = "Very_Different_Testing_3"))
test_save(dat_test,x)
}
x_seq <- 1:3
lapply(x_seq, function(x) test_func(dat, dat_multi, x))
Just looking to save files with the variable_of_interest and the iteration in the filename:
data/test1_1.RDS
data/test1_2.RDS
data/test1_3.RDS

The loop can be
lapply(x_seq, function(x) test_save(dat, x))
It is better to give the lambda argument a name different from any object that already exists in the global environment. Also, if 'dat' is the same in every iteration, the lambda function only needs a single argument.
In the updated function there are some issues in the code, e.g. $ is used for extraction on a matrix object; for a matrix it should be [ instead. The function can be made more compact with
test_func <- function(dat, dat_multi, x) {
res <- as.matrix(dat*dat_multi)
colnames(res) <- paste0("Testing_", seq_len(ncol(res)))  # a matrix has column names, so use colnames() rather than names()
test_save(melt(res), x)
}
lapply(x_seq, function(x) test_func(dat, dat_multi, x))
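A minimal sketch of the same file-naming idea, not the asker's exact setup: the helper name save_iteration, its prefix argument and the use of tempdir() are assumptions made for illustration. It passes the prefix in explicitly instead of reading a global variable, and creates the output directory first, since saveRDS() fails if the target directory does not exist.

```r
# Write to a throwaway "data" directory under tempdir() (assumption for the demo)
out_dir <- file.path(tempdir(), "data")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Hypothetical helper: builds "<prefix>_<i>.RDS" and returns the path it wrote
save_iteration <- function(obj, prefix, i, dir = out_dir) {
  path <- file.path(dir, paste0(prefix, "_", i, ".RDS"))
  saveRDS(obj, file = path)
  path
}

dat <- data.frame(test1 = runif(5), test2 = runif(5))
x_seq <- 1:3

# vapply() returns a plain character vector of the paths written
paths <- vapply(x_seq,
                function(i) save_iteration(dat * 3, "test1", i),
                character(1))
basename(paths)
```

Returning the path from the helper makes it easy to confirm afterwards which files were actually written.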

Related

Error with tidy select when feeding column names into purrr::map for user function

I have a long function that uses a dataframe column name as an input and am trying to apply it to several different column names without a new line of code each time. I am having issues with tidyselect within the function called by map. I believe the issue is related to defusing, but I cannot figure it out. A toy example using mtcars data is below.
This works correctly with map:
library(tidyverse)
sum_dplyr <- function(df, x) {
res <- df %>% summarise(mean = mean({{x}}, na.rm = TRUE))
return(res)
}
sum_dplyr(mtcars, disp)
map(names(mtcars), ~ sum_dplyr(mtcars, mtcars[[.]])) # all columns -> works fine
While this gives the error "Must subset columns with a valid subscript vector" when feeding the function through map:
library(tidyverse)
sel_dplyr <- function(df, x) {
res <- df %>% dplyr::select({{x}})
return(res)
}
sel_dplyr(mtcars, disp) # ok
map(names(mtcars), ~ sel_dplyr(mtcars, mtcars[[.]])) # all columns -> error
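What am I missing here? Many thanks!
In the failing call, mtcars[[.]] passes the column's values rather than its name, so select() tries to use those numbers as column positions. It may be better to change the functions so that they accept both unquoted and quoted column names. With map we are passing a character string, so instead of {{}} we can use ensym with !!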
sum_dplyr <- function(df, x) {
x <- rlang::ensym(x)
res <- df %>%
summarise(mean = mean(!!x, na.rm = TRUE))
return(res)
}
Similarly for sel_dplyr
sel_dplyr <- function(df, x) {
x <- rlang::ensym(x)
res <- df %>%
dplyr::select(!! x)
return(res)
}
and then test as
library(purrr)
library(dplyr)
map(names(mtcars), ~ sel_dplyr(mtcars, !!.x))
sel_dplyr(mtcars, carb)
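If the functions only ever need to accept column names as strings, a simpler variant is possible. The sketch below is an alternative, not the answer's method: sum_str and sel_str are hypothetical names, and it relies on the .data pronoun and all_of(), both of which work with character input.

```r
library(dplyr)

# String-only versions: `col` must be a character column name
sum_str <- function(df, col) {
  summarise(df, mean = mean(.data[[col]], na.rm = TRUE))
}

sel_str <- function(df, col) {
  select(df, all_of(col))
}

# Apply to every column name; no unquoting is needed at the call site
means <- lapply(names(mtcars), function(nm) sum_str(mtcars, nm))
cols  <- lapply(names(mtcars), function(nm) sel_str(mtcars, nm))
```

The trade-off is that these versions no longer accept bare names such as sel_str(mtcars, carb); the ensym version above handles both.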

I want to express this code with for loop or function

I have a large data frame.
As you can see, there is a pattern in the code below:
data_1<-data_1
data_2<-data_2 %>% filter(rowSums(data_2[,1:1])==0)
data_3<-data_3 %>% filter(rowSums(data_3[,1:2])==0)
data_4<-data_4 %>% filter(rowSums(data_4[,1:3])==0)
data_5<-data_5 %>% filter(rowSums(data_5[,1:4])==0)
data_6<-data_6 %>% filter(rowSums(data_6[,1:5])==0)
data_7<-data_7 %>% filter(rowSums(data_7[,1:6])==0)
data_8<-data_8 %>% filter(rowSums(data_8[,1:7])==0)
data_9<-data_9 %>% filter(rowSums(data_9[,1:8])==0)
data_10<-data_10 %>% filter(rowSums(data_10[,1:9])==0)
data_11<-data_11 %>% filter(rowSums(data_11[,1:10])==0)
data_12<-data_12 %>% filter(rowSums(data_12[,1:11])==0)
data_13<-data_13 %>% filter(rowSums(data_13[,1:12])==0)
data_14<-data_14 %>% filter(rowSums(data_14[,1:13])==0)
data_15<-data_15 %>% filter(rowSums(data_15[,1:14])==0)
data_16<-data_16 %>% filter(rowSums(data_16[,1:15])==0)
data_17<-data_17 %>% filter(rowSums(data_17[,1:16])==0)
data_18<-data_18 %>% filter(rowSums(data_18[,1:17])==0)
data_19<-data_19 %>% filter(rowSums(data_19[,1:18])==0)
data_20<-data_20 %>% filter(rowSums(data_20[,1:19])==0)
data_21<-data_21 %>% filter(rowSums(data_21[,1:20])==0)
I tried to write a loop like this:
for(i in 1:21){
data_i <- data_i %>% filter(rowSums(data_i[,1:i-1])==0)
but data_i does not do what I intended.
How do I solve this problem?
1) for We use the test data in the Note at the end, which is based on the built-in anscombe data frame that comes with R. It is best to keep related data frames in a list, so we first create such a list L and then iterate over it, producing a new list L2 so that we don't overwrite the original. Keeping the input and output separate makes it easier to debug.
We could alternately write seq_along(L)[-1] as seq(2, length(L)), and seq_len(i-1) as seq(1, i-1). Note that if DF is a data frame then DF[, 1] is the first column as a plain vector, whereas DF[, 1, drop = FALSE] is a one-column data frame.
No packages are used.
L <- mget(ls(pattern = "^data_\\d+$"))
L2 <- L
for(i in seq_along(L)[-1]) {
Li <- L[[i]]
Sum <- rowSums(Li[, seq_len(i-1), drop = FALSE])
L2[[i]] <- Li[Sum == 0, ]
}
2) lapply Alternately we could use lapply:
L <- mget(ls(pattern = "^data_\\d+$"))
L2 <- L
L2[-1] <- lapply(seq_along(L)[-1], function(i) {
Li <- L[[i]]
Sum <- rowSums(Li[, seq_len(i-1), drop = FALSE])
Li[Sum == 0, ]
})
3) Map Alternately we could use Map:
L3 <- L
f3 <- function(d, i) {
Sum <- rowSums(d[, seq_len(i-1), drop = FALSE])
d[Sum == 0, ]
}
L3[-1] <- Map(f3, L[-1], seq_along(L)[-1])
Or special-case the first element like this. Note that Map takes the component names from its first argument after the function, so it is important that f4 takes the data frame as its first argument and that L is passed first.
f4 <- function(d, i) {
if (i == 1) d
else {
Sum <- rowSums(d[, seq_len(i-1), drop = FALSE])
d[Sum == 0, ]
}
}
L4 <- Map(f4, L, seq_along(L))
Note
# create test data
data_1 <- anscombe
data_1[1, 1] <- 0
data_2 <- 10 * anscombe
data_2[2, 1:2] <- 0
data_3 <- 100 * anscombe
data_3[3, 1:3] <- 0
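As a quick sanity check, the for loop from 1) and the second Map variant from 3) can be run side by side on the Note's test data. In this sketch the list is built directly with list() rather than mget(), purely so that the example does not depend on what else is in the workspace.

```r
# Test data from the Note, kept in a named list
data_1 <- anscombe;       data_1[1, 1]   <- 0
data_2 <- 10 * anscombe;  data_2[2, 1:2] <- 0
data_3 <- 100 * anscombe; data_3[3, 1:3] <- 0
L <- list(data_1 = data_1, data_2 = data_2, data_3 = data_3)

# 1) for loop: keep rows whose first i-1 columns sum to zero
L2 <- L
for (i in seq_along(L)[-1]) {
  Li <- L[[i]]
  Sum <- rowSums(Li[, seq_len(i - 1), drop = FALSE])
  L2[[i]] <- Li[Sum == 0, ]
}

# 3) Map, with the first element special-cased
f4 <- function(d, i) {
  if (i == 1) d
  else {
    Sum <- rowSums(d[, seq_len(i - 1), drop = FALSE])
    d[Sum == 0, ]
  }
}
L4 <- Map(f4, L, seq_along(L))

identical(L2, L4)   # do both approaches agree?
sapply(L2, nrow)    # data_1 keeps all 11 rows; data_2 and data_3 keep 1 row each
```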

Dplyr indirection / pipe doesn't work inside a closure

I have a code which uses dplyr indirection:
library(dplyr)
createGenerator <- function(data, column)
{
values <- data %>% pull({{column}})
function(n)
{
values %>% sample(n)
}
}
df <- data.frame(x = 1:10, y = 1:10)
df %>% createGenerator(x)(1)
It gives me an error
Error in pull(., { : object 'x' not found
However if I don't create a closure it works, like in code below
createGenerator <- function(data, column, n)
{
values <- data %>% pull({{column}}) %>% sample(n)
}
But I need a possibility to create a closure. What am I missing in closure creation code?
There is a problem with the pipes, specifically the pipe within the enclosed function. I suspect a scoping problem, since different environments are involved and the arguments are promises rather than existing objects.
Without a pipe (which I personally prefer, but that's a matter of taste):
library(dplyr)
createGenerator <- function(data, column) {
values <- pull(data, {{ column }})
function(n) {
sample(values, n)
}
}
df <- data.frame(x = 1:10, y = 1:10)
createGenerator(df, x)(2)
#> [1] 4 5
Or create values within the enclosed function; then the pipe works:
createGenerator <- function(data, column) {
function(n) {
values <- data %>% pull({{column}})
values %>% sample(n)
}
}
createGenerator(df, x)(2)
#> [1] 7 5
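Another possible reading of the error, offered as an assumption rather than a confirmed diagnosis: in df %>% createGenerator(x)(1), magrittr appears to insert df as the first argument of the outermost call, giving createGenerator(x)(df, 1), so createGenerator receives x as its data argument and x cannot be found. Under that assumption the original closure works unchanged as long as the generator is built first and called afterwards:

```r
library(dplyr)

# The closure exactly as in the question
createGenerator <- function(data, column) {
  values <- data %>% pull({{ column }})
  function(n) {
    values %>% sample(n)
  }
}

df <- data.frame(x = 1:10, y = 1:10)

# Build the generator first, then call it
gen <- df %>% createGenerator(x)
draw1 <- gen(2)

# Or wrap the piped expression in parentheses before calling the result
draw2 <- (df %>% createGenerator(x))(2)
```

Both draws are random samples of size 2 from df$x.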

FUN == 'x' does not work, how to go around it in R

I am trying to write a function that takes some other function FUN as an argument. Among other things, I want to change what it does when I set FUN = match0.
library(dplyr)
library(purrr)
f <- function(df, pair, FUN, ...){
df1 <- df %>%
group_split()
w <- df1 %>%
map(~ .x %>%
nrow() %>%
seq())
x <- map2(w, df1, ~map(.x, mean, df = .y))
y <- map(x, unlist)
l <- map2(y, df1, ~map(.x, function(x, df = .y){
if(deparse(substitute(FUN)) == 'match0'){
out <- x
} else{
out <- df[x, pair]}
return(out)
}
)) %>% unlist()
df <- bind_rows(df1) %>% bind_cols(index = l)
return(df)
}
If I run, for instance:
a <- data.frame(n = c(15,20,15,20,15,20)) %>% group_by(n)
x <- f(a, pair = 'pairs0', FUN = match0)
I get the error Column 'pairs0' doesn't exist.
This is what would happen if the conditional statement in fact evaluated to FALSE. How can I change this?
To be honest, I'm not quite sure how to use deparse, substitute and the like; I've just tried to follow some other posts. FWIW, I thought it would work because if I test deparse(substitute(match0)) == 'match0', I get TRUE.
Any help?
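A likely cause, sketched here in base R with hypothetical function names (name_inside, name_outside): substitute() only looks at the arguments of the function it is called in. Inside the inner anonymous function, FUN is not an argument, so substitute(FUN) returns the symbol FUN itself and the comparison with 'match0' is FALSE. Capturing the name once at the top of the outer function avoids this.

```r
# substitute() called inside a nested function: FUN is not an argument there
name_inside <- function(FUN) {
  inner <- function() deparse(substitute(FUN))
  inner()
}

# substitute() called in the outer function, where FUN is an argument
name_outside <- function(FUN) {
  fun_name <- deparse(substitute(FUN))  # captured once, before any nesting
  inner <- function() fun_name
  inner()
}

name_inside(match0)   # "FUN"
name_outside(match0)  # "match0"
```

Applied to f above, this would mean computing fun_name <- deparse(substitute(FUN)) on the first line of f and testing fun_name == 'match0' inside the map2 call. Note that match0 is never evaluated here, so it does not need to exist for the name to be captured.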

summary on list with all the info you need using map

I have a list with 6 elements and need some summary statistics. Each element has a different number of observations, but all have the same 9 variables. They were created using split, so each element is a different treatment. I want to create a table with: n, min, max, mean, SD, 5th quantile, and 95th quantile.
I am using the purrr package for the function map(), which is similar to lapply() but easier to debug.
From lapply(test, summary) I can get min, max and mean.
I was able to get n and sd with the following:
mdl_summary= map(test,col_summary, sd) %>%
lapply(., setNames, nm=colnames(mdl$Statistics[2:10])) %>%
map(.,t) %>%
lapply(., function(x) {
row.names(x)=deparse(substitute(sd))
return (x)
})
where col_summary is another function:
col_summary <- function(df, fun) {
output <- vector("numeric", length(df))
for (i in seq_along(df)) {
output[[i]] <- fun(df[[i]])
}
output
}
But I can't get the quantiles...
I also tried to generalize mdl_summary into mdl_summary_fun, but I can't get the row name to show the function that was run.
mdl_summary_fun= function (x,f)
map(x,col_summary, f) %>%
lapply(., setNames, nm=colnames(mdl$Statistics[2:10])) %>%
map(.,t) %>%
lapply(., function(x) {
row.names(x)=deparse(substitute(f))
return (x)
})
I just had to modify the col_summary function I created to include extra arguments:
col_summary <- function(df, fun, ...) {
output <- vector("numeric", length(df))
for (i in seq_along(df)) {
output[[i]] <- fun(df[[i]], ...)
}
output
}
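A short usage sketch of the modified col_summary, assuming a small made-up list in place of the real test data: the extra arguments in ... are forwarded to fun, so probs reaches quantile(). The helper summarise_one is a hypothetical name, and the row names are set explicitly in rbind() rather than recovered with deparse(substitute()).

```r
col_summary <- function(df, fun, ...) {
  output <- vector("numeric", length(df))
  for (i in seq_along(df)) {
    output[[i]] <- fun(df[[i]], ...)  # extra arguments are passed on to fun
  }
  output
}

# Made-up stand-in for the split list (assumption for illustration)
test <- list(
  a = data.frame(u = 1:5, v = c(10, 20, 30, 40, 50)),
  b = data.frame(u = 1:3, v = c(2, 4, 6))
)

# Hypothetical helper: one row per statistic, one column per variable
summarise_one <- function(d) {
  out <- rbind(
    n   = col_summary(d, length),
    sd  = col_summary(d, sd),
    q05 = col_summary(d, quantile, probs = 0.05),
    q95 = col_summary(d, quantile, probs = 0.95)
  )
  colnames(out) <- names(d)
  out
}

stats <- lapply(test, summarise_one)
stats$a
```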
