Mutating Columns with paste0 - r

I'm looking to dynamically name columns. I need to duplicate variables with new names. Why isn't the new_sepal_length_2 variable the same as new_sepal_length? How can I fix this?
new_var = 'Sepal.Length'
iris %>% mutate(new_sepal_length = Sepal.Length,
new_sepal_length_2 = noquote(paste0(new_var)))

We can convert the string to a symbol (sym) and unquote it (!!) so that it is evaluated as a column:
library(dplyr)
library(stringr)
iris %>%
mutate(new_sepal_length = str_c(!!rlang::sym(new_var), collapse=", "))
Another option is mutate_at, whose vars() accepts column names as strings (mutate_at is superseded by across() in dplyr 1.0.0 and later):
iris %>%
mutate_at(vars(new_var), list(new= ~ str_c(., collapse=", ")))
Or use paste
iris %>%
mutate(new_sepal_length = paste(!!rlang::sym(new_var), collapse = ", "))
paste0 or paste on its own only converts its input to character class. To combine the values into a single string, we need to pass the collapse argument of paste.
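If the goal is simply to copy the column whose name is stored in new_var, as the question asks, the column has to be looked up by name rather than pasted. The following is a minimal sketch, assuming dplyr 1.0.0 or later; the output column names (pasted, new_sepal_length_3, new_Sepal.Length) are illustrative and not from the original post.

```r
library(dplyr)

new_var <- "Sepal.Length"

out <- iris %>%
  mutate(
    new_sepal_length   = Sepal.Length,
    # paste0() only returns the string "Sepal.Length", recycled to every row
    pasted             = paste0(new_var),
    # .data[[ ]] looks the column up by the name held in new_var
    new_sepal_length_2 = .data[[new_var]],
    # equivalent: turn the string into a symbol and unquote it
    new_sepal_length_3 = !!rlang::sym(new_var)
  )

# across() with all_of() is the current replacement for mutate_at(vars(...))
out2 <- iris %>%
  mutate(across(all_of(new_var), ~ .x, .names = "new_{.col}"))
```

Here new_sepal_length_2 and new_sepal_length_3 hold the same numeric values as new_sepal_length, while pasted holds the literal column name in every row, which is what the code in the question produced.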

Related

Is there a R function that allows you to insert a variable inside a command?

For example:
Imagine I have an object named "cors" which contains a string only ("Spain" for example). I would like then for "cors" to be replaced in the expression (1) below by "Spain", resulting in the expression (2):
#(1)
DF <- DF %>% filter(str_detect(Country, "Germany|cors", negate = TRUE))
#(2)
DF <- DF %>% filter(str_detect(Country, "Germany|Spain", negate = TRUE))
P.S: I know that in MATLAB this could be handled with the "eval()" command, though in R it apparently has a completely different applicability.
If the value is stored in an object, keep the object outside the quotes and build the pattern string with paste or str_c:
library(dplyr)
library(stringr)
cors <- "Spain"
pat <- str_c(c("Germany", cors), collapse = "|")
DF %>%
filter(str_detect(Country, pat, negate = TRUE))
Another option is string interpolation with glue (assuming the cors object holds only a single string element):
DF %>%
filter(str_detect(Country, glue::glue("Germany|{cors}"), negate = TRUE))
Or this can be done in base R with grepl
pat <- paste(c("Germany", cors), collapse = "|")
subset(DF, !grepl(pat, Country))
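To see that the stringr and base R versions select the same rows, here is a small self-contained sketch. The DF contents are invented for illustration (the original post does not show the data), and the negate argument assumes stringr 1.4.0 or later.

```r
library(dplyr)
library(stringr)

# Hypothetical data; the real DF is not shown in the question
DF <- tibble(Country = c("Germany", "Spain", "France", "Italy"))
cors <- "Spain"

# Build the alternation pattern "Germany|Spain" from the variable
pat <- str_c(c("Germany", cors), collapse = "|")

# stringr version: keep rows that do NOT match the pattern
res_stringr <- DF %>%
  filter(str_detect(Country, pat, negate = TRUE))

# base R version of the same filter
res_base <- subset(DF, !grepl(pat, Country))
```

Both results keep only the France and Italy rows. Note that the variable is treated as a regular expression, so a value containing characters such as "." or "(" would need escaping first.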
If you really want eval, you could do:
cors <- 'Spain'
DF <- DF %>% filter(
  eval(
    parse(text = paste0('str_detect(Country, "Germany|', cors, '", negate=TRUE)'))
  ))

In R how to pass a column as parameter to strsplit?

What is the proper way to pass a column as parameter to a str_split function and have it recognized as a column?
library(tidyverse)
library(lazyeval)
df = data.frame("x"=c("apple/pear","pear/banana/kiwi","orange/pear"))
function (col) {
mtcars %>%
select(col) %>%
transform(col = interp(strsplit(~v, "/"), v=as.name(col)) )
}
Currently this returns the error 'Error in strsplit(~v, "-") : non-character argument'
We can use tidyverse functions throughout instead of mixing base R with the tidyverse. separate_rows from tidyr splits the column on the delimiter and reshapes the result to 'long' format, with one row per piece. Inside the function, the curly-curly operator ({{ }}) forwards the unquoted column name that the caller passes in:
library(dplyr)
library(tidyr)
f1 <- function(data, col) {
  data %>%
    separate_rows({{col}}, sep="/")
}
f1(df, x)
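For comparison, the sketch below runs f1 on the example data and adds two variants that are not in the original answer: split_col, which keeps one row per input and stores the pieces in a list column (closer to what strsplit returns), and f1_chr, which takes the column name as a string. Both helper names are hypothetical.

```r
library(dplyr)
library(tidyr)
library(stringr)

df <- data.frame(x = c("apple/pear", "pear/banana/kiwi", "orange/pear"))

# Long format: one row per fruit (same as f1 in the answer)
f1 <- function(data, col) {
  data %>% separate_rows({{ col }}, sep = "/")
}
long <- f1(df, x)

# Hypothetical variant: keep one row per input, pieces in a list column
split_col <- function(data, col) {
  data %>% mutate(parts = str_split({{ col }}, "/"))
}
nested <- split_col(df, x)

# Hypothetical variant: column name supplied as a string
f1_chr <- function(data, col) {
  data %>% separate_rows(all_of(col), sep = "/")
}
long_chr <- f1_chr(df, "x")
```

The long result has seven rows, one per fruit, while nested keeps the three original rows with list elements of length 2, 3 and 2.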

str_extract_all: return all patterns found in string concatenated as vector

I want to extract everything except a pattern and return the matches concatenated into a single string.
I tried to combine str_extract_all with sapply and cat:
x = c("a_1","a_20","a_40","a_30","a_28")
data <- tibble(age = x)
# extracting just the first pattern is easy
data %>%
mutate(age_new = str_extract(age,"[^a_]"))
# combining str_extract_all and sapply doesn't work
data %>%
mutate(age_new = sapply(str_extract_all(x,"[^a_]"),function(x) cat(x,sep="")))
class(str_extract_all(x,"[^a_]"))
sapply(str_extract_all(x,"[^a_]"),function(x) cat(x,sep=""))
This returns NULL instead of the concatenated patterns.
Instead of cat, we can use paste: cat only prints its arguments and returns NULL, whereas paste returns the string. With the tidyverse, we can also use map_chr together with str_c (the stringr counterpart of paste):
library(tidyverse)
data %>%
  mutate(age_new = map_chr(str_extract_all(age, "[^a_]+"), ~ str_c(.x, collapse="")))
Using the OP's code:
data %>%
mutate(age_new = sapply(str_extract_all(x,"[^a_]"),
function(x) paste(x,collapse="")))
If the intention is to get the numbers
library(readr)
data %>%
  mutate(age_new = parse_number(age))
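The reason the cat version in the question returns NULL can be seen in isolation: cat writes to the console as a side effect and its return value is NULL, so sapply collects a list of NULLs. A minimal base R sketch (the variable names are illustrative):

```r
parts <- c("2", "0")

# cat() writes "20" to the console and returns NULL (invisibly)
res_cat <- cat(parts, sep = "")
cat("\n")

# paste() returns the concatenated string instead of printing it
res_paste <- paste(parts, collapse = "")

is.null(res_cat)  # TRUE
res_paste         # "20"
```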
Here is a solution that uses only stringr and base R's apply (column and regex_command are placeholders for your own vector and pattern):
apply(str_extract_all(column,regex_command,simplify = TRUE),1,paste,collapse="")
simplify = TRUE makes str_extract_all return a character matrix, and apply then iterates over its rows. I got the idea from https://stackoverflow.com/a/4213674/8427463
Example: extract every 'r' in rownames(mtcars) and concatenate the matches for each name:
library(stringr)
apply(str_extract_all(rownames(mtcars),"r",simplify = TRUE),1,paste,collapse="")
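Applied to the data from the question, the matrix approach can be sketched as follows. The intermediate object m is introduced here only to show the shape that simplify = TRUE produces.

```r
library(stringr)

x <- c("a_1", "a_20", "a_40")

# One row per input string, one column per match;
# rows with fewer matches are padded with ""
m <- str_extract_all(x, "[^a_]", simplify = TRUE)

# Paste each row back together into a single string
res <- apply(m, 1, paste, collapse = "")
```

Because the padding consists of empty strings, pasting a row with collapse = "" leaves shorter results unchanged, so res is "1", "20", "40".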

dplyr works using funs, but gives error with list [duplicate]

Pretty basic but I don't think I really understand the change:
library(dplyr)
library(lubridate)
Lab_import_sql <- Lab_import %>%
select_if(~sum(!is.na(.)) > 0) %>%
mutate_if(is.factor, as.character) %>%
mutate_if(is.character, funs(ifelse(is.character(.), trimws(.),.))) %>%
mutate_at(.vars = Lab_import %>% select_if(grepl("'",.)) %>% colnames(),
.funs = gsub,
pattern = "'",
replacement = "''") %>%
mutate_if(is.character, funs(ifelse(is.character(.), paste0("'", ., "'"),.))) %>%
mutate_if(is.Date, funs(ifelse(is.Date(.), paste0("'", ., "'"),.)))
Edit:
Thanks everyone for the input, here's reproducible code and my solution:
library(dplyr)
library(lubridate)
import <- data.frame(Test_Name = "Fir'st Last",
Test_Date = "2019-01-01",
Test_Number = 10)
import_sql <-import %>%
select_if(~!all(is.na(.))) %>%
mutate_if(is.factor, as.character) %>%
mutate_if(is.character, trimws) %>%
mutate_if(is.character, list(~gsub("'", "''",.))) %>%
mutate_if(is.character, list(~paste0("'", ., "'"))) %>%
mutate_if(is.Date, list(~paste0("'", ., "'")))
As of dplyr 0.8.0, the documentation states that we should use list instead of funs, giving the example:
Before:
funs(name = f(.))
After:
list(name = ~f(.))
So here, the call funs(ifelse(is.character(.), trimws(.),.)) can instead become list(~ifelse(is.character(.), trimws(.),.)). This uses the tidyverse formula notation for anonymous functions: a one-sided formula (an expression beginning with ~) is interpreted as function(x), and the argument x is written as a dot (.) inside the formula. You can still use full functions inside list.
Note the difference between the .funs argument of mutate_if and the funs() function which wrapped other functions to pass to .funs; i.e. .funs = gsub still works. You only needed funs() if you needed to apply multiple functions to selected columns or to name them something by passing them as named arguments. You can do all the same things with list().
You also are duplicating work by adding ifelse inside mutate_if; that line could be simplified to mutate_if(is.character, trimws) since if the column is character already you don't need to check it again with ifelse. Since you apply only one function, no need for funs or list at all.
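The three equivalent spellings discussed above can be compared side by side. This is a sketch under assumptions: the one-row import data is modelled on the question, sql_quote is a hypothetical helper that bundles the trim, escape and quote steps, and the across() form requires dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical one-row input, modelled on the example in the question
import <- data.frame(
  Test_Name = " Fir'st Last ",
  Test_Number = 10,
  stringsAsFactors = FALSE
)

# Hypothetical helper: trim, double any single quotes, wrap in quotes
sql_quote <- function(x) paste0("'", gsub("'", "''", trimws(x)), "'")

# A bare function needs neither funs() nor list()
bare_style <- import %>% mutate_if(is.character, sql_quote)

# list() with a formula, the direct replacement for funs()
list_style <- import %>% mutate_if(is.character, list(~ sql_quote(.)))

# across(), which supersedes the scoped verbs
across_style <- import %>% mutate(across(where(is.character), sql_quote))
```

All three overwrite Test_Name with 'Fir''st Last' (quotes included) and leave the numeric column untouched.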
