I am trying to use the assertive package for run-time testing, and I would like to pass column names using the pipe.
Here's a simple example:
library(tidyverse)
library(assertive)
df <- tibble(Name = c("DONALD", "JAIME", "LINDA"))
This works but doesn't use the pipe:
assertive::assert_all_are_true(df$Name == str_to_upper(df$Name))
This uses the pipe, but doesn't work:
df %>% assertive::assert_all_are_true(Name == str_to_upper(Name))
#> Error in match.arg(severity): object 'Name' not found
How can I pipe column names to assertive?
We can use with(), which evaluates the assertion inside the data frame so that Name resolves to the column:
library(dplyr)
df %>%
with(., assertive::assert_all_are_true(Name == str_to_upper(Name)))
Or extract the column with .$
df %>%
{assertive::assert_all_are_true(.$Name == str_to_upper(.$Name))}
Or with |> from R 4.1.0
df |>
{\(x) assertive::assert_all_are_true(x$Name == str_to_upper(x$Name))}()
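The original call fails because the pipe passes df as the first argument, so the comparison lands in the next argument and Name is looked up outside the data frame. As a rough sketch of another way around this, a small wrapper can take the data frame first and the column name second. assert_column_upper is a hypothetical helper written for this example (it is not part of assertive) and uses base stopifnot(); the native pipe needs R 4.1.0 or later.

```r
df <- data.frame(Name = c("DONALD", "JAIME", "LINDA"))

# Hypothetical helper (not from the assertive package): checks that every
# value in the named column is already upper case.
assert_column_upper <- function(data, column) {
  values <- data[[column]]
  stopifnot(is.character(values), all(values == toupper(values)))
  invisible(data)  # return the data so the pipeline can continue
}

checked <- df |> assert_column_upper("Name")
```

Returning the data invisibly lets the check sit in the middle of a longer pipeline.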
Related
Using the code below, I am trying to generate a new character variable (cat) from a numeric variable (AgeatDeath). I get the error: could not find function "%>%<-", even though dplyr is loaded. map_GIS is my data.
Thanks,
Nader
map_GIS %>%
filter(Disability=='Cerebral palsy') %>%
cat ['AgeatDeath' > 40] <- "Elder"
If the intention is to create a new column, named 'cat' (cat is a function name as well), we can use mutate
library(dplyr)
map_GIS2 <- map_GIS %>%
filter(Disability == 'Cerebral palsy') %>%
mutate(cat = case_when(AgeatDeath > 40 ~ 'Elder'))
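Note that with a single condition, case_when returns NA for every row that does not match. A minimal sketch with a fallback branch follows; the toy map_GIS data, the column name age_group, and the label "Not elder" are all invented for illustration.

```r
library(dplyr)

# Toy stand-in for map_GIS; the values are invented for illustration only.
map_GIS <- tibble(
  Disability = c("Cerebral palsy", "Cerebral palsy", "Other"),
  AgeatDeath = c(55, 30, 70)
)

map_GIS2 <- map_GIS %>%
  filter(Disability == "Cerebral palsy") %>%
  mutate(age_group = case_when(
    AgeatDeath > 40 ~ "Elder",
    TRUE ~ "Not elder"  # hypothetical fallback; without it these rows are NA
  ))
```

Naming the column age_group rather than cat also avoids confusion with the base function cat().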
(I am new in R)
I am trying to convert the variables of the data frame members to factors when their names appear in the list to_factors_list.
I have tried mutate(across()), but it gives errors.
Data prep.:
library(tidyverse)
# tidytuesday himalayan data
members <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-22/members.csv")
# creating list of names
to_factors_list <- members %>%
map_df(~(data.frame(n_distinct = n_distinct(.x))),
.id = "var_name") %>%
filter(n_distinct < 15) %>%
select(var_name) %>% pull()
to_factors_list
############### output ###############
'season' 'sex' 'hired' 'success' 'solo' 'oxygen_used' 'died' 'death_cause' 'injured' 'injury_type'
Both attempts below give errors:
members %>%
mutate(across(~.x %in% to_factors_list, factor))
members %>%
mutate_if( ~.x %in% to_factors_list, factor)
I am not sure what is wrong. How can I make this work?
In base R, this can be done with lapply
members[to_factors_list] <- lapply(members[to_factors_list], factor)
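The correct syntax selects the columns by name. Wrapping the character vector in all_of() avoids the ambiguity warning that recent versions of dplyr give for a bare external vector:
members %>% mutate(across(all_of(to_factors_list), factor))
Or if you prefer the older dplyr syntax, mutate_at accepts a character vector of column names directly:
members %>% mutate_at(to_factors_list, factor)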
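The attempts in the question fail because across() and mutate_if() apply the lambda to each column's values, not to its name, so .x %in% to_factors_list never tests the column name. A small sketch on invented toy data (members_toy is a stand-in, not the real members table) shows selecting by name and, as an alternative, selecting by a predicate on the values:

```r
library(dplyr)

# Toy stand-in for the members data; the values are invented for illustration.
members_toy <- tibble(
  season = c("Spring", "Autumn", "Spring"),
  sex    = c("M", "F", "M"),
  age    = c(30, 41, 25)
)
to_factors_list <- c("season", "sex")

# Select columns by name from a character vector.
by_name <- members_toy %>%
  mutate(across(all_of(to_factors_list), factor))

# Alternative: select columns by a predicate on their values,
# which skips building the list of names altogether.
by_predicate <- members_toy %>%
  mutate(across(where(~ n_distinct(.x) < 3), factor))
```

The threshold of 3 is chosen only to fit the toy data; the question uses 15.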
I have a problem using mutate; please see the following code block.
output1 <- mytibble %>%
mutate(newfield = FND(mytibble$ndoc))
output1
where FND is a function that filters a large (5 GB) table:
FND <- function(n){
result <- LARGETIBBLE %>% filter(LARGETIBBLE$id == n)
return(paste(unique(result$somefield),collapse=" "))
}
I want to run FND once for each row of mytibble, but it runs only once.
Avoid $ inside dplyr pipes; it is rarely needed there because columns can be referred to by name. You can change your FND function to:
library(dplyr)
FND <- function(n){
LARGETIBBLE %>% filter(id == n) %>% pull(somefield) %>%
unique %>% paste(collapse = " ")
}
Now apply this function to every ndoc value in mytibble.
mytibble %>% mutate(newfield = purrr::map_chr(ndoc, FND))
You can also use sapply :
mytibble$newfield <- sapply(mytibble$ndoc, FND)
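To make the per-row behaviour concrete, here is a small self-contained sketch. The contents of LARGETIBBLE and mytibble are invented toy data, and vapply() is used as a base R stand-in for purrr::map_chr() so that only dplyr is needed.

```r
library(dplyr)

# Toy stand-ins for the real tables; the values are invented for illustration.
LARGETIBBLE <- tibble(
  id        = c(1, 1, 2, 3, 3),
  somefield = c("a", "b", "c", "d", "d")
)
mytibble <- tibble(ndoc = c(1, 2, 3))

# Collapse the unique somefield values for a single id into one string.
FND <- function(n) {
  LARGETIBBLE %>%
    filter(id == n) %>%
    pull(somefield) %>%
    unique() %>%
    paste(collapse = " ")
}

# vapply() calls FND once per element of ndoc and checks that each
# result is a single string.
output1 <- mytibble %>%
  mutate(newfield = vapply(ndoc, FND, character(1)))
```

Each call still scans the whole of LARGETIBBLE, so on a very large table this may be slow.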
There is no need to write mytibble$ndoc inside mutate. When you call functions such as mutate on a tibble, you refer to columns by name only; the pipe (%>%) passes the tibble in as the data, and the column names are looked up inside it. Thus your example would be:
output1 <- mytibble %>%
mutate(newfield = FND(ndoc))
FND <- function(n){
result <- LARGETIBBLE %>% filter(id == n)
return(paste(unique(result$somefield),collapse=" "))
}
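This should work in theory, but I have not tested FND. Note that mutate still calls FND once with the whole ndoc column rather than once per row, so it may not give the result you want. If it does not, please post a small reproducible example with data and the output you are trying to achieve.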
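A quick way to see how many values a function receives inside mutate is to have it report the length of its input. The sketch below uses invented toy data and a hypothetical helper, describe_input, and contrasts a plain mutate with rowwise(), which is available in dplyr 1.0.0 and later.

```r
library(dplyr)

toy <- tibble(ndoc = c(10, 20, 30))  # invented values for illustration

# Hypothetical helper: reports how many values it was handed.
describe_input <- function(x) paste("received", length(x), "value(s)")

# Plain mutate: the function sees the whole column in one call.
whole_column <- toy %>%
  mutate(info = describe_input(ndoc))

# rowwise(): the function sees one value at a time.
per_row <- toy %>%
  rowwise() %>%
  mutate(info = describe_input(ndoc)) %>%
  ungroup()
```

This is why a function that is not vectorised appears to run only once in a plain mutate.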
Pretty basic but I don't think I really understand the change:
library(dplyr)
library(lubridate)
Lab_import_sql <- Lab_import %>%
select_if(~sum(!is.na(.)) > 0) %>%
mutate_if(is.factor, as.character) %>%
mutate_if(is.character, funs(ifelse(is.character(.), trimws(.),.))) %>%
mutate_at(.vars = Lab_import %>% select_if(grepl("'",.)) %>% colnames(),
.funs = gsub,
pattern = "'",
replacement = "''") %>%
mutate_if(is.character, funs(ifelse(is.character(.), paste0("'", ., "'"),.))) %>%
mutate_if(is.Date, funs(ifelse(is.Date(.), paste0("'", ., "'"),.)))
Edit:
Thanks everyone for the input, here's reproducible code and my solution:
library(dplyr)
library(lubridate)
import <- data.frame(Test_Name = "Fir'st Last",
Test_Date = "2019-01-01",
Test_Number = 10)
import_sql <-import %>%
select_if(~!all(is.na(.))) %>%
mutate_if(is.factor, as.character) %>%
mutate_if(is.character, trimws) %>%
mutate_if(is.character, list(~gsub("'", "''",.))) %>%
mutate_if(is.character, list(~paste0("'", ., "'"))) %>%
mutate_if(is.Date, list(~paste0("'", ., "'")))
As of dplyr 0.8.0, the documentation states that we should use list instead of funs, giving the example:
Before:
funs(name = f(.))
After:
list(name = ~f(.))
So here, the call funs(ifelse(is.character(.), trimws(.),.)) can instead become list(~ifelse(is.character(.), trimws(.),.)). This uses the tidyverse formula notation for anonymous functions, where a one-sided formula (an expression beginning with ~) is interpreted as function(x), and wherever x would go in the function body is represented by a dot (.). You can still use full functions inside list.
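As a small illustration of the notation, the three calls below should give the same result on an invented toy data frame: a one-sided formula, a full anonymous function, and the bare function name.

```r
library(dplyr)

# Toy data, invented for illustration.
df <- tibble(x = c(" a ", "b "), n = c(1, 2))

# One-sided formula: the dot stands for the column being transformed.
with_formula <- df %>% mutate_if(is.character, ~ trimws(.))

# The same thing written as a full anonymous function.
with_function <- df %>% mutate_if(is.character, function(col) trimws(col))

# A bare function name is enough when no extra arguments are needed.
with_name <- df %>% mutate_if(is.character, trimws)
```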
Note the difference between the .funs argument of mutate_if and the funs() function which wrapped other functions to pass to .funs; i.e. .funs = gsub still works. You only needed funs() if you needed to apply multiple functions to selected columns or to name them something by passing them as named arguments. You can do all the same things with list().
You also are duplicating work by adding ifelse inside mutate_if; that line could be simplified to mutate_if(is.character, trimws) since if the column is character already you don't need to check it again with ifelse. Since you apply only one function, no need for funs or list at all.
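For readers on dplyr 1.0.0 or later, the same cleaning steps can also be sketched with across() and where(), which supersede the mutate_if family. This is an alternative to the answer above rather than part of it; the toy data mirrors the question's example, with extra surrounding spaces added so that the trimming is visible.

```r
library(dplyr)

# Toy data modelled on the question; the extra spaces are added for illustration.
import <- data.frame(
  Test_Name   = " Fir'st Last ",
  Test_Number = 10,
  stringsAsFactors = FALSE
)

# For every character column: trim whitespace, double any single quotes,
# then wrap the value in single quotes.
import_sql <- import %>%
  mutate(across(
    where(is.character),
    ~ paste0("'", gsub("'", "''", trimws(.x)), "'")
  ))
```

Doing the three steps in one lambda avoids repeating the is.character selection.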