Can I use a function when naming new columns with quasiquotation? - r

I want to make a new column with a name that is a combo of two arguments I gave a function.
Here is some data:
library(tidyverse)  # provides tribble(), mutate(), str_c() and %>%
data <- tribble(
  ~one, ~two, ~three,
  'a', 'b', 'c',
  'd', 'e', 'f'
)
If I just want to give it a normal name, this works fine:
normal_naming_func <- function(data, name) {
data %>%
mutate({{name}} := str_c(one, two))
}
But what if I want the name to be a combination of two different function parameters?
This doesn't work:
naming_func <- function(data, name_part1, name_part2) {
data %>%
mutate(str_c({{name_part1}}, {{name_part2}}) := str_c(one, two))
}
I get the error:
Error: The LHS of `:=` must be a string or a symbol
Neither does this:
naming_func <- function(data, name_part1, name_part2) {
data %>%
mutate(str_glue("{{name_part1}}, {{name_part2}}") := str_c(one, two))
}
Thanks for your help.

You forgot to unquote the LHS. Furthermore, you need to convert the unevaluated names to strings before you can concatenate them:
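naming_func <- function(data, name_part1, name_part2) {
  # Capture each argument as a symbol, then convert it to a string
  name1 <- as.character(ensym(name_part1))
  name2 <- as.character(ensym(name_part2))
  data %>%
    # !! unquotes the concatenated string so it becomes the new column name
    mutate(!!str_c(name1, name2) := str_c({{ name_part1 }}, {{ name_part2 }}))
}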
Remember, {{…}} is a shortcut for enquote-then-unquote. However, to construct the new column name you need a slightly different operation: enquote-then-to-string-then-concatenate-then-unquote.
{{…}} does not allow you to insert operations in between the quoting and unquoting so the only way to achieve this is to split the operations up and perform them manually, as is done in the code above.
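As a possible alternative to splitting the steps up by hand, reasonably recent versions of rlang and dplyr (roughly dplyr 1.0 onwards, as far as I know) accept a glue-style string on the left of `:=`. The sketch below assumes such a version is installed; `naming_func_glue` is a name made up for this illustration, and the value of the new column follows the question (`str_c(one, two)`).

```r
library(dplyr)
library(stringr)

# Glue-style name injection: "{{ arg }}" inside the string on the left of :=
# is replaced by the label of the expression supplied for that argument.
naming_func_glue <- function(data, name_part1, name_part2) {
  data %>%
    mutate("{{ name_part1 }}{{ name_part2 }}" := str_c(one, two))
}

data <- tibble::tribble(
  ~one, ~two, ~three,
  "a",  "b",  "c",
  "d",  "e",  "f"
)

result <- naming_func_glue(data, foo, bar)
names(result)
```

Called as `naming_func_glue(data, foo, bar)`, this should add a column named `foobar`, with no explicit `ensym()` or `!!` needed.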

Related

R function used to rename columns of a data frames

I have a data frame, say acs10, and I need to relabel its columns. To do so, I created another data frame, labelName, with two columns: the first contains the old column names and the second contains the names I want to use, as in the table below:
column_1    column_2
oldLabel1   newLabel1
oldLabel2   newLabel2
Then, I wrote a for loop to change the column names:
for (i in seq_len(nrow(labelName))) {
  names(acs10)[names(acs10) == labelName[i, 1]] <- labelName[i, 2]
}
This works.
However, when I tried to put the for loop into a function, because I need to rename column names for other data frames as well, the function failed. The function I wrote looks like below:
renameDF <- function(dataF,varName){
for (i in seq_len(nrow(varName))){
names(dataF)[names(dataF) == varName[i,1]] <- varName[i,2]
print(varName[i,1])
print(varName[i,2])
print(names(dataF))
}
}
renameDF(acs10, labelName)
where dataF is the data frame whose names I need to change, and varName is another data frame in which old and new variable names are paired. I used print(names(dataF)) to debug, and the printout suggests that the function works. However, calling the function does not actually change the column names. I suspect it has something to do with scope, but I want to know how to make it work.
In your function you need to return the changed data frame. R modifies a local copy of dataF inside the function, so the caller must also assign the result back, e.g. acs10 <- renameDF(acs10, labelName).
renameDF <- function(dataF,varName){
for (i in seq_len(nrow(varName))){
names(dataF)[names(dataF) == varName[i,1]] <- varName[i,2]
}
return(dataF)
}
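You can also simplify this and avoid the for loop by using match (this assumes every column of dataF appears in varName; a column without a match would get NA as its name):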
renameDF <- function(dataF,varName){
names(dataF) <- varName[[2]][match(names(dataF), varName[[1]])]
return(dataF)
}
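If some columns are not listed in the lookup table, a variant that only touches the matched names may be safer. The following is a minimal base R sketch, not the answerer's code; `rename_known` and the example column `other` are hypothetical names used for illustration.

```r
# Rename only the columns that appear in the first column of varName;
# every other column keeps its current name.
rename_known <- function(dataF, varName) {
  idx <- match(names(dataF), varName[[1]])   # NA where there is no match
  found <- !is.na(idx)
  names(dataF)[found] <- as.character(varName[[2]][idx[found]])
  dataF                                      # return the modified copy
}

acs10 <- data.frame(oldLabel1 = 1, oldLabel2 = 2, other = 3)
labelName <- data.frame(
  column_1 = c("oldLabel1", "oldLabel2"),
  column_2 = c("newLabel1", "newLabel2"),
  stringsAsFactors = FALSE
)

renamed <- rename_known(acs10, labelName)
names(renamed)   # renamed copy
names(acs10)     # original is unchanged until the result is assigned back
```

The last two lines illustrate the scoping point from the question: the function works on a copy, so the original only changes once you write `acs10 <- rename_known(acs10, labelName)`.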
This should do the whole thing in one line.
colnames(acs10)[colnames(acs10) %in% labelName$column_1] <- labelName$column_2[match(colnames(acs10)[colnames(acs10) %in% labelName$column_1], labelName$column_1)]
This will work if the column name isn't in the data dictionary, but it's a bit more convoluted:
library(tibble)
df <- tribble(~column_1,~column_2,
"oldLabel1", "newLabel1",
"oldLabel2", "newLabel2")
d <- tibble(oldLabel1 = NA, oldLabel2 = NA, oldLabel3 = NA)
fun <- function(dat, dict) {
names(dat) <- sapply(names(dat), function(x) ifelse(x %in% dict$column_1, dict[dict$column_1 == x,]$column_2, x))
dat
}
fun(d, df)
You can create a function containing just one line of code. Note that pmatch() allows partial matches, and columns without a match get NA as their name:
renameDF <- function(df, varName){
setNames(df,varName[[2]][pmatch(names(df),varName[[1]])])
}
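For readers already using dplyr, a named lookup vector passed through `any_of()` is another way to get the same effect while leaving unlisted columns alone. This is a sketch under the assumption that dplyr 1.0 or later is available; `rename_with_dict` is a hypothetical helper name.

```r
library(dplyr)

# Build a named vector of the form c(new_name = "old_name") and let rename()
# apply whichever entries actually exist in the data frame.
rename_with_dict <- function(df, dict) {
  lookup <- setNames(as.character(dict[[1]]), as.character(dict[[2]]))
  rename(df, any_of(lookup))
}

d <- data.frame(oldLabel1 = NA, oldLabel3 = NA)
dict <- data.frame(
  column_1 = c("oldLabel1", "oldLabel2"),
  column_2 = c("newLabel1", "newLabel2"),
  stringsAsFactors = FALSE
)

out <- rename_with_dict(d, dict)
names(out)
```

Here `oldLabel2` is in the dictionary but not in the data, and `oldLabel3` is in the data but not in the dictionary; neither case should raise an error.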

Removing a row by string-matching in R regardless of whether it exists or not

I am trying to remove a row in a dataframe based on string matching. I'm using:
data <- data[- grep("my_string", data$field1),]
When there's an actual row with the value "my_string" in data$field1, this works as expected and drops that row. However, if no row contains "my_string", it returns an empty data frame. How do I write this so that it allows for the string not being present and still keeps my data frame intact?
It may be better to use grepl and negate with !
data[!grepl("my_string", data$field1),]
Or another option is setdiff on grep
data[setdiff(seq_len(nrow(data)), grep("my_string", data$field1)),]
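The reason the original approach empties the data frame is that grep() returns integer(0) when nothing matches, and negating an empty index still selects nothing. A small sketch with made-up data (the data frame `df` and the pattern "absent" are illustrative only):

```r
df <- data.frame(field1 = c("my_string", "other"), n = 1:2)

idx <- grep("absent", df$field1)   # no match, so idx is integer(0)
emptied <- df[-idx, ]              # -integer(0) is still empty: zero rows selected

# A logical mask has one entry per row, so "no match" means "keep everything"
kept <- df[!grepl("absent", df$field1), ]

nrow(emptied)
nrow(kept)
```

Because grepl() always returns a logical vector as long as its input, the negated mask behaves the same whether or not the pattern occurs.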
You can use a plain if statement:
df <- data.frame(field = c("my_string", "my_string_not", "something", "something_else"),
                 numbers = 1:4)
# No match: grep() returns integer(0), so df is left untouched
result <- grep("gabriel", df$field)
if (length(result)) {
  df <- df[-result, ]
}
df
# Match found: the matching rows are dropped
result <- grep("my_string", df$field)
if (length(result)) {
  df <- df[-result, ]
}
df

Refer to a variable by pasting strings, then make changes and see them reflected in the original variable

my_mtcars_1 <- mtcars
my_mtcars_2 <- mtcars
my_mtcars_3 <- mtcars
for(i in 1:3) {get(paste0('my_mtcars_', i))$blah <- 1}
Error in get(paste0("my_mtcars_", i))$blah <- 1 :
target of assignment expands to non-language object
I would like each of my 3 data frames to have a new field called blah that has a value of 1.
How can I iterate over a range of numbers in a loop and refer to DFs by name by pasting the variable name into a string and then edit the df in this way?
These three options all assume you want to modify them and keep them in the environment.
So, if they must remain separate data frames (in your environment and modified in a loop), you could do something like this:
for(i in 1:3) {
obj_name = paste0('my_mtcars_', i)
obj = get(obj_name)
obj$blah = 1
assign(obj_name, obj, envir = .GlobalEnv) # Send back to global environment
}
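The original attempt fails because get() returns a value, not a place that can be assigned to, so the pattern is always fetch, modify, then assign back under the same name. The sketch below shows the same idea inside a dedicated environment rather than .GlobalEnv; using a separate environment is my own assumption for keeping the example self-contained, not something the answer requires.

```r
# Hypothetical setup: three copies of mtcars stored in their own environment
env <- new.env()
for (i in 1:3) {
  assign(paste0("my_mtcars_", i), mtcars, envir = env)
}

# Fetch each object by name, modify the copy, and write it back
for (nm in ls(env, pattern = "^my_mtcars_\\d+$")) {
  obj <- get(nm, envir = env)
  obj$blah <- 1
  assign(nm, obj, envir = env)
}

# Check that every data frame now has the new column
has_blah <- vapply(mget(ls(env), envir = env),
                   function(x) "blah" %in% names(x),
                   logical(1))
has_blah
```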
I agree with @Duck that a list is a better format (and preferable to the above loop). So, if you use a list and need the results in your environment, use what Duck suggested with list2env() and send everything back to .GlobalEnv, i.e. (in one ugly line):
list2env(lapply(mget(ls(pattern = "my_mtcars_")), function(x) {x[["blah"]] = 1; x}), .GlobalEnv)
Or, if you are amenable to working with data.table, you could use the set() function to add columns:
library(data.table)
# assuming my_mtcars_* is already a data.table
for(i in 1:3) {
set(get(paste0('my_mtcars_', i)), NULL, "blah", 1)
}
As a suggestion, it is better to keep the data frames in a list and use lapply() instead of a loop:
# Collect the data frames in a named list
List <- list(my_mtcars_1 = mtcars,
             my_mtcars_2 = mtcars,
             my_mtcars_3 = mtcars)
# Add the new column to every element
List2 <- lapply(List, function(x) {x$blah <- 1; return(x)})
If the data frames already exist in your environment, you can collect them into a list like this:
# Gather all objects whose names match the pattern
List <- mget(ls(pattern = 'my_mt'))
That way there is no need to list each dataset individually.
We can use tidyverse
library(dplyr)
library(purrr)
map(mget(ls(pattern = '^my_mtcars_\\d+$')), ~ .x %>%
mutate(blah = 1)) %>%
list2env(.GlobalEnv)

Mutate a column selected by a string converted to symbol

I'm trying to convert a column of my dataset to lower case.
I wrote a basic function:
library(dplyr)
cleaning_tags<-function(data,col)
{
data<-data%>%mutate(!!sym(col)=tolower(!!sym(col)))
return (data)
}
where data is a data.frame and col is the column name as a string.
I don't understand the error I'm getting:
Error: unexpected '=' in "data%>%dplyr::mutate(!!sym("GROUPDSC") ="
The sym operator seems to work correctly because if I'm trying to do
data%>%select(!!sym(col))
it selects the desired column.
Thanks.
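Use := instead of = when the column name on the left-hand side is unquoted. R's parser only accepts a plain name before = in a function call, which is why !!sym(col) = ... is a syntax error: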
library(dplyr)
library(rlang)
cleaning_tags<-function(data,col) {
data %>% mutate(!!sym(col) := tolower(!!sym(col)))
}
df <- data.frame(a = c("ABC", "DEF"))
cleaning_tags(df, "a")
# a
#1 abc
#2 def
There are a few problems in your code: you can't assign the new column name with = like this, and the code is hard to read. mutate_at() accepts the column name as a string directly:
library(tidyverse)
cleaning_tags <- function(data, col) {
  data %>%
    mutate_at(col, tolower)
}
ir <- cleaning_tags(iris, "Species")
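In dplyr 1.0 and later, the scoped verbs such as mutate_at() are superseded by across(). A minimal sketch of the same function written that way, assuming a dplyr version that provides across() and all_of(); the name `cleaning_tags_across` is hypothetical.

```r
library(dplyr)

# col is a character vector of column names; all_of() selects exactly those
# columns and errors if one of them is missing.
cleaning_tags_across <- function(data, col) {
  data %>%
    mutate(across(all_of(col), tolower))
}

df <- data.frame(a = c("ABC", "DEF"), b = c("X", "Y"), stringsAsFactors = FALSE)
lowered <- cleaning_tags_across(df, "a")
lowered
```

Because the column is selected by string, no `sym()`, `!!` or `:=` is needed in this version.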

R - Passing a string as a string in a user defined function R

I am trying to write a function which subsets a dataset containing a certain string.
Mock data:
library(stringr)
set.seed(1)
codedata <- data.frame(
Key = sample(1:10),
ReadCodePreferredTerm = sample(c("yes", "prefer", "Had refer"), 20, replace=TRUE)
)
User defined function:
findterms <- function(inputdata, variable, searchterm) {
outputdata <- inputdata[str_which(inputdata$variable, regex(searchterm, ignore_case=TRUE)), ]
return(outputdata)
}
I am expecting at least a couple of rows returned, but I get 0 when I run the following code:
findterms(codedata, ReadCodePreferredTerm, " refer") #the space in front of this word is deliberate
I realise I am trying to do something quite simple... but can't find out why it isn't working.
Note, the code works fine when not defined as a function:
referterms <- codedata[str_which(codedata$ReadCodePreferredTerm, regex(" refer", ignore_case=TRUE)), ]
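Inside the function, inputdata$variable looks for a column literally named variable, which does not exist, so it returns NULL and no rows match. You can use dplyr and stringr to do this simply: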
library(magrittr) # For the pipe (%>%)
library(dplyr)
library(stringr)
codedata %>%
dplyr::filter(str_detect(ReadCodePreferredTerm, '\\brefer\\b'))
You can also write your own function if you like. If you don't want to pass the variable name as a string, you will need rlang as well. Something like this works:
library(rlang)
findterms <- function(df, variable, searchterm) {
variable <- enquo(variable)
return(
df %>%
dplyr::filter(str_detect(!!variable, str_interp('\\b${ searchterm }\\b')))
)
}
findterms(codedata, ReadCodePreferredTerm, 'refer')
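If passing the column name as a string is acceptable, the original function can be kept almost as it was by replacing `$variable` with `[[variable]]`, which looks the column up by the value of the argument. This is a sketch of that variant, not the answerer's code; `findterms_str` is a hypothetical name and the small data frame stands in for the question's randomly generated one.

```r
library(stringr)

# variable is a string naming the column to search
findterms_str <- function(inputdata, variable, searchterm) {
  hits <- str_which(inputdata[[variable]], regex(searchterm, ignore_case = TRUE))
  inputdata[hits, , drop = FALSE]
}

codedata <- data.frame(
  Key = 1:4,
  ReadCodePreferredTerm = c("yes", "prefer", "Had refer", "Had Refer"),
  stringsAsFactors = FALSE
)

matches <- findterms_str(codedata, "ReadCodePreferredTerm", " refer")
matches
```

The leading space in " refer" keeps "prefer" out of the result, as in the question, and `ignore_case = TRUE` lets both capitalisations through.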
