magrittr + lapply where the first argument isn't the LHS [duplicate] - r

This question already has answers here:
Use pipe without feeding first argument
(2 answers)
Closed 6 years ago.
I'd like to pass a data frame into lapply via %>%, but I need to be able to access the names of the columns, so my lapply arguments are something like this:
mydf %>%
  lapply(1:length(.), function(x) {
    manipulate_df(mydf[x], using_column_names(names(mydf)[x]))
  })
However, when I try that, I get the following error:
Error in match.fun(FUN) :
'1:length(.)' is not a function, character or symbol
As far as I can tell R and lapply don't like 1:length(.). I suppose a valid option is breaking the chain, but I'd like to learn how to do it properly.

Your issue here is that %>% inserts mydf as the first argument, so three arguments are passed to lapply: mydf, 1:length(.), and the function. lapply then treats 1:length(.) as FUN, which produces the match.fun error. Try wrapping the entire lapply expression in curly braces. This prevents the insertion behavior:
mydf %>%
  { lapply(1:length(.), function(x) {
      manipulate_df(mydf[x], using_column_names(names(mydf)[x]))
    }) }
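To make the difference concrete, here is a minimal sketch using the built-in mtcars data (the variable names are only illustrative). It assumes magrittr is installed and shows one call where the left-hand side is inserted as the first argument and two where the braces suppress that.

```r
library(magrittr)

# Without braces, %>% inserts the left-hand side as the first argument,
# so this call is paste(c("a", "b"), "x", sep = "-").
inserted <- c("a", "b") %>% paste("x", sep = "-")

# With braces, nothing is inserted; the left-hand side is only available as `.`.
col_names <- mtcars %>% { names(.)[1:2] }

# The same idea with lapply(): the index vector is now really the first argument.
indexed <- mtcars %>% { lapply(seq_along(.), function(i) names(.)[i]) }
```

Inside the braces you can refer to `.` as often as you like, which is why the original call works once it is wrapped.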
I think the prettiest fix would be to make a new function:
manipulate_whole_df = function(mydf)
lapply( 1:length(mydf), function(x)
manipulate_df( mydf[x], using_column_names(names(mydf)[x] ) ) )
mydf %>%
manipulate_whole_df
Or even
library(dplyr)  # group_by(), do() and first() come from dplyr
library(tidyr)
mydf %>%
  gather(variable, value) %>%
  group_by(variable) %>%
  do(manipulate_df(.$value,
                   .$variable %>% first %>% using_column_names))

The function in lapply() only references the column indexes / column names, referencing mtcars in a way that does not depend on the iteration in lapply, so pipe the names
names(mtcars) %>% lapply(function(x) mtcars[x])
or write a proper closure
names(mtcars) %>% lapply(function(x, df) df[x], df=mtcars)
or perhaps you don't really need to access the names but just the columns?
mtcars %>% lapply(function(x) sqrt(sum(x)))
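If you need both the column and its name at each step, base R's Map() can walk the two in parallel. This is only a sketch; describe_column is a hypothetical stand-in for whatever manipulate_df and using_column_names do in the question.

```r
# Hypothetical per-column summary: it needs both the column and its name.
describe_column <- function(values, name) {
  paste0(name, ": mean = ", round(mean(values), 2))
}

# Map() walks the columns and their names in parallel.
# Because the first vector (the data frame) has names, the result is named too.
summaries <- Map(describe_column, mtcars[1:3], names(mtcars)[1:3])
```

The result is a named list, so summaries$mpg gives the entry for the mpg column.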

I think what you want is the following:
mydf %>% length %>% seq %>%
lapply(function(x) {
manipulate_df( mydf[x], using_column_names(names(mydf)[x] )
})
or you can use a lambda function:
mydf %>% {1:length(.)} %>%
lapply(function(x) {
manipulate_df( mydf[x], using_column_names(names(mydf)[x] )
})
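One caveat with 1:length(.) that applies to all of the variants above: it counts backwards when the data frame has no columns. A small sketch of the difference, using seq_along() as the safer alternative (the variable names are illustrative):

```r
library(magrittr)

empty_df <- data.frame()

# 1:length() counts down from 1 to 0 on an empty input ...
risky_index <- 1:length(empty_df)
# ... while seq_along() returns an empty index vector.
safe_index <- seq_along(empty_df)

# The same pipe as above, written with seq_along().
col_names <- mtcars %>% seq_along %>%
  lapply(function(i) names(mtcars)[i])
```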

Related

Using a for loop in R to assign value labels

Context: I have a large dataset (CoreData) with an accompanying datafile (CoreValues) that contains the code and values for each variable within the dataset.
Problem: I want to use a loop to assign each variable within the dataset (CoreData) the correct value labels (from the CoreValues data).
What I've tried so far:
I have created a character vector that identifies which variables within my main data (CoreData) have values that need to be added:
Core_VarwithValueLabels<- unique(CoreValues$Abbreviation)
I have tried a for loop over that vector to create the vectors for both the labels and levels arguments that feed into the factor() function.
for (i in Core_VarwithValueLabels){
assign(paste0(i, 'Labels'),
CoreValues %>%
filter(Abbreviation == i) %>%
select(Description) %>%
unique() %>%
unlist()
)
assign(paste0(i, 'Levels'),
CoreValues %>%
filter(Abbreviation == i) %>%
select(Code) %>%
unique() %>%
unlist()
)
CoreData[i] <- factor(CoreData[i], levels=paste0(i, 'Levels'), labels = paste0(i, 'Labels'))
}
This creates the correct label and level vectors, however, they are not being picked up properly within the factor function.
Question: Can you help me identify how to get my factor function to work within this loop or if there is a more appropriate method?
Sample data:
CoreValues:
example data from CoreValues
CoreData:
example data from CoreData
UPDATE: RESOLVED
I have now resolved this by calling get() inside factor(): get() takes the string I've built with paste0() and returns the vector of that name.
for (i in Core_VarwithValueLabels){
assign(paste0(i, 'Labels'),
CoreValues %>%
filter(Abbreviation == i) %>%
select(Description) %>%
unique() %>%
unlist()
)
assign(paste0(i, 'Levels'),
CoreValues %>%
filter(Abbreviation == i) %>%
select(Code) %>%
unique() %>%
unlist()
)
# [[ ]] passes the column itself to factor(); [ ] would pass a one-column data frame
CoreData[[i]] <- factor(CoreData[[i]], levels = get(paste0(i, 'Levels')), labels = get(paste0(i, 'Labels')))
}
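For comparison, here is a sketch of the same loop without assign() and get(), keeping the levels and labels in a local lookup table instead. The sample CoreValues and CoreData below are made up for illustration, since the real data is not shown.

```r
# Hypothetical stand-ins for the lookup table and the data.
CoreValues <- data.frame(
  Abbreviation = c("sex", "sex", "smoker", "smoker"),
  Code         = c(1, 2, 0, 1),
  Description  = c("Male", "Female", "No", "Yes"),
  stringsAsFactors = FALSE
)
CoreData <- data.frame(sex = c(1, 2, 2), smoker = c(0, 1, 0), age = c(30, 41, 25))

for (v in unique(CoreValues$Abbreviation)) {
  # Codes and descriptions that belong to this variable.
  lookup <- unique(CoreValues[CoreValues$Abbreviation == v, c("Code", "Description")])
  # [[ ]] extracts the column as a vector; [ ] would return a one-column data frame.
  CoreData[[v]] <- factor(CoreData[[v]],
                          levels = lookup$Code,
                          labels = lookup$Description)
}
```

Columns that do not appear in CoreValues (age here) are left untouched.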

How to drop columns that meet a certain pattern over a list of dataframes

I'm trying to drop columns that have the suffix .1, which indicates a repeated column name. This needs to work over a list of data frames.
I have written a function:
drop_duplicated_columns <- function (df) {
lapply(df, function(x) {
x <- x %>% select(-contains(".1"))
x
})
return(df)
}
However it is not working. Any ideas why?
One tidy way to solve this problem is to first write a function that works for one data.frame and then map that function over the list:
library(tidyverse)
drop_duplicated_columns <- function(df) {
df %>%
select(-contains(".1"))
}
Or even better
drop_duplicated_columns <- . %>%
select(-contains(".1"))
Usage in pipes: combine it with map
list_dfs <- list(mtcars,mtcars)
list_dfs %>%
map(drop_duplicated_columns)
If you need just one function, you can build a new functional sequence from the code you have already tested:
drop_duplicated_columns_list <- . %>%
map(drop_duplicated_columns)
list_dfs %>%
drop_duplicated_columns_list()
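As for why the original function appears to do nothing: the result of lapply() is never stored or returned, and return(df) hands back the untouched input. A minimal sketch of a corrected version follows, assuming dplyr is installed; the sample list is made up. It uses ends_with(".1") rather than contains(".1"), since contains() would also drop a column such as v.10.

```r
library(dplyr)

drop_duplicated_columns <- function(dfs) {
  # Return the result of lapply() instead of the untouched input list.
  lapply(dfs, function(x) select(x, -ends_with(".1")))
}

# Hypothetical input: two data frames, each with a repeated column.
dfs <- list(
  data.frame(a = 1:2, a.1 = 3:4, b = 5:6),
  data.frame(x = 1, x.1 = 2)
)
cleaned <- drop_duplicated_columns(dfs)
```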

Problem with mutate keyword and functions in R

I have a problem with mutate; please see the following code block.
output1 <- mytibble %>%
mutate(newfield = FND(mytibble$ndoc))
output1
Where FND function is a FILTER applied to a large file (5GB):
FND <- function(n){
result <- LARGETIBBLE %>% filter(LARGETIBBLE$id == n)
return(paste(unique(result$somefield),collapse=" "))
}
I want to execute FND function for each row of output1 tibble, but it just executes one time.
Avoid $ inside dplyr pipes; it is rarely needed there. You can change your FND function to:
library(dplyr)
FND <- function(n){
LARGETIBBLE %>% filter(id == n) %>% pull(somefield) %>%
unique %>% paste(collapse = " ")
}
Now apply this function to every ndoc value in mytibble.
mytibble %>% mutate(newfield = purrr::map_chr(ndoc, FND))
You can also use sapply :
mytibble$newfield <- sapply(mytibble$ndoc, FND)
The form FND(mytibble$ndoc) belongs to base R data frame code. When you use functions such as mutate on a tibble, there is no need to specify the name of the tibble, only that of the column. The %>% operator already passes the tibble in as the data. Thus your example would be:
output1 <- mytibble %>%
mutate(newfield = FND(ndoc))
FND <- function(n){
result <- LARGETIBBLE %>% filter(id == n)
return(paste(unique(result$somefield),collapse=" "))
}
In theory this should work, but I do not know whether your function FND will. Try it, and if it does not work, post a practical example with data and what you are trying to achieve.
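The reason FND runs only once is that mutate() evaluates its expression a single time with the whole column as input, so a function that is not vectorised has to be applied element by element. Here is a small self-contained sketch of that, assuming dplyr is installed; large_tbl, my_tbl and find_fields are hypothetical stand-ins for LARGETIBBLE, mytibble and FND.

```r
library(dplyr)

# Hypothetical stand-ins for LARGETIBBLE and mytibble.
large_tbl <- tibble(id = c(1, 1, 2, 3), somefield = c("a", "b", "c", "d"))
my_tbl    <- tibble(ndoc = c(1, 2, 3))

# Looks up one id and collapses its distinct somefield values into one string.
find_fields <- function(n) {
  large_tbl %>%
    filter(id == n) %>%
    pull(somefield) %>%
    unique() %>%
    paste(collapse = " ")
}

# mutate() evaluates the expression once with the whole column,
# so the function is applied to each element explicitly with vapply().
result <- my_tbl %>%
  mutate(newfield = vapply(ndoc, find_fields, character(1)))
```

vapply() plays the same role as purrr::map_chr here; with a 5 GB table, a join followed by a grouped summary may be worth trying instead of one filter per row.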

Dplyr conditional select and mutate

I have working code which excludes columns based on a parameter and mutates certain columns based on other parameters. There is this SO question Can dplyr package be used for conditional mutating? but it does not address conditional select
Is there a way to have pure dplyr code without the if statements?
Working R Code:
# Loading
diamonds_tbl <- diamonds
head(diamonds_tbl)
# parameters
initialColumnDrop <- c('x','y','z')
forceCategoricalColumns <- c('carat','cut', 'color')
forceNumericalColumns <- c('')
# Main Code
if(length(which(colnames(diamonds_tbl) %in% initialColumnDrop))>=1){
diamonds_tbl_clean <- diamonds_tbl %>%
select(-one_of(initialColumnDrop)) #Drop specific columns in columnDrop
}
if(length(which(colnames(diamonds_tbl_clean) %in% forceCategoricalColumns))>=1){
diamonds_tbl_clean <- diamonds_tbl_clean %>%
mutate_at(forceCategoricalColumns,funs(as.character)) #Force columns to be categorical
}
if(length(which(colnames(diamonds_tbl_clean) %in% forceNumericalColumns))>=1){
diamonds_tbl_clean <- diamonds_tbl_clean %>%
mutate_at(forceNumericalColumns,funs(as.numeric)) #Force columns to be numeric
}
I don't really understand the desire for a "pure dplyr" solution, but you can make any problem easier with helper functions. For example, you could write a function that runs a transformation only if certain columns are found:
run_if_cols_match <- function(data, cols, expr) {
if (any(names(data) %in% cols)) {
expr(data)
} else {
data
}
}
Then you could use that in a pipe
diamonds_tbl_clean <- diamonds_tbl %>%
run_if_cols_match(initialColumnDrop,
. %>% select(-one_of(initialColumnDrop))) %>%
run_if_cols_match(forceCategoricalColumns,
. %>% mutate_at(forceCategoricalColumns,funs(as.character))) %>%
run_if_cols_match(forceNumericalColumns,
. %>% mutate_at(forceNumericalColumns,funs(as.numeric)))
which does the same thing as your code. Here we just conditionally run different anonymous pipes.
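In current dplyr (1.0 or later, which is an assumption about your installation) funs() is deprecated, and the any_of() selection helper ignores names that are not present, so the conditions can disappear entirely. A sketch on a small made-up table rather than diamonds:

```r
library(dplyr)

# Hypothetical small table standing in for diamonds.
tbl <- tibble(carat = c(0.2, 0.3), cut = c("Ideal", "Good"), x = c(3.9, 4.1))

drop_cols        <- c("x", "y", "z")            # "y" and "z" are absent
categorical_cols <- c("carat", "cut", "color")  # "color" is absent
numerical_cols   <- character(0)                # nothing to convert

# any_of() silently skips names that are not columns of the table,
# so each step is a no-op when none of its columns exist.
tbl_clean <- tbl %>%
  select(-any_of(drop_cols)) %>%
  mutate(across(any_of(categorical_cols), as.character)) %>%
  mutate(across(any_of(numerical_cols), as.numeric))
```

Note that the empty selection is written as character(0) here, not c('') as in the question.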

Use input of purrr's map function to create a named list as output in R

I am using the map function of the purrr package in R which gives as output a list. Now I would like the output to be a named list based on the input. An example is given below.
input <- c("a", "b", "c")
output <- purrr::map(input, function(x) {paste0("test-", x)})
From this I would like to access elements of the list using:
output$a
Or
output$b
We just need to name the list
names(output) <- input
and then extract the elements based on the name
output$a
#[1] "test-a"
If this needs to be done using tidyverse
library(tidyverse)
output <- map(input, ~paste0('test-', .)) %>%
setNames(input)
Update
Now in 2020 the answer from @mihagazvoda describes the correct approach: simply call set_names before applying map
c("a", "b", "c") %>%
purrr::set_names() %>%
purrr::map(~paste0('test-', .))
Outdated answer
The accepted solution works, but suffers from a repeated argument (input) which may cause errors and interrupts the flow when using piping with %>%.
An alternative solution would be to use a bit more power of the %>% operator
1:5 %>% { set_names(map(., ~ .x + 3), .) } %>% print # ... or something else
This takes the argument from the pipe but still lacks some beauty. An alternative could be a small helper method such as
map_named = function(x, ...) map(x, ...) %>% set_names(x)
1:5 %>% map_named(~ .x + 1)
This already looks prettier and more elegant, and would be my preferred solution.
Finally, we could even overwrite purrr::map in case the argument is a character or integer vector and produce a named list in such a case.
map = function(x, ...){
if (is.integer(x) || is.character(x)) {
purrr::map(x, ...) %>% set_names(x)
}else {
purrr::map(x, ...)
}
}
1 : 5 %>% map(~ .x + 1)
However, the optimal solution would be for purrr to implement such behaviour out of the box.
The recommended solution:
c("a", "b", "c") %>%
purrr::set_names() %>%
purrr::map(~paste0('test-', .))
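To check that this gives the access pattern asked for in the question, here is a short sketch (assuming purrr is installed), together with a base R call that gives the same names for a character input.

```r
library(purrr)

input <- c("a", "b", "c")

# set_names() with no second argument names each element after itself,
# and map() carries those names through to the output list.
output <- map(set_names(input), ~ paste0("test-", .x))

# Base R: sapply() uses a character input as names when simplify = FALSE.
base_output <- sapply(input, function(x) paste0("test-", x), simplify = FALSE)

first_element <- output$a
```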
