How to pass column names in dplyr select without evaluation - R

I'm trying to programmatically pass column names to a function so that they can be selected in dplyr. The column names will vary, so I've tried to use select_(), the standard-evaluation version of select(). The column names themselves are a bit unusual, as they contain + and - characters, which I think is causing the issue. Below is a simple example that reproduces the error.
library(tibble)
library(dplyr)
# tibble() replaces the deprecated data_frame(); backticks allow non-syntactic names
data <- tibble(target_id = 'xyz',
               `CH4+Sulfate-1` = 1.2,
               `CH4+Sulfate-2` = 2,
               `CH4+Sulfate-3` = 3)
columns <- c('CH4+Sulfate-1', 'CH4+Sulfate-2', 'CH4+Sulfate-3')
select_(data, .dots = columns)
I get the following error:
Error in eval(expr, envir, enclos) : object 'CH4' not found
This leads me to believe that the names are being evaluated as code rather than taken as strings. How can I get around this problem without renaming the columns of the table?

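Wrapping the names in backticks does the job, because select_() parses each string as an R expression and the backticks turn each one into a single (non-syntactic) column name: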
columns <- c('`CH4+Sulfate-1`', '`CH4+Sulfate-2`', '`CH4+Sulfate-3`')
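On current dplyr (1.0.0 or later, which is an assumption about your installed version), the underscored verbs such as select_() are deprecated. A minimal sketch of the modern equivalent, not taken from the original answer: pass the character vector through tidyselect's all_of(), which matches each string literally against the column names, so the + and - are never parsed as code and no backticks are needed.

```r
library(tibble)
library(dplyr)

data <- tibble(target_id = 'xyz',
               `CH4+Sulfate-1` = 1.2,
               `CH4+Sulfate-2` = 2,
               `CH4+Sulfate-3` = 3)
columns <- c('CH4+Sulfate-1', 'CH4+Sulfate-2', 'CH4+Sulfate-3')

# all_of() matches the strings against names(data) literally; it errors
# if any name is missing (any_of() would silently skip missing ones).
selected <- select(data, all_of(columns))
print(names(selected))
```

Choosing all_of() over any_of() makes a typo in a column name fail loudly instead of quietly dropping the column.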

Related

R Tidymodels: error columns don't exist when using function argument to specify column

I'm trying to write a function to use the R tidymodels function initial_split with an argument that would let me change the strata to a different variable each time I call the function.
Using initial_split regularly like this works perfectly:
split_glab=initial_split(data,prop=0.7,strata=sp_glabrata)
Then I converted it to a function and plugged in my species parameter:
split_data=function(df,species){
initial_split(df,prop=0.7,strata=species)
}
split_data(data,species=sp_glabrata)
And get the following error:
Error: Can't subset columns that don't exist.
x Column `species` doesn't exist.
Of course, this column doesn't exist in my data, since it's just an argument of my function; the column I'm trying to reference is called sp_glabrata. I can't figure out how to make my function reference the column instead of the parameter. I don't want to type the column name each time, since I have to apply many similar functions to several columns and it would take forever.
Any guidance would be appreciated!
Because initial_split() comes from a tidyverse-style package (rsample), you can use the curly-curly operator ({{ }}) so that the unquoted argument is evaluated as a column name:
library(tidymodels)
split_data <- function(df, species){
initial_split(df, prop=0.7, strata={{species}})
}
# testing
split_data(iris, species = Species)
#<Analysis/Assess/Total>
#<105/45/150>
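The same pattern works with other tidyverse verbs. A minimal sketch, where group_mean() and group_mean_chr() are hypothetical helpers written only for illustration: {{ }} (rlang 0.4.0 or later) forwards an unquoted column, while the .data pronoun covers the case where the column name arrives as a string. Both assume dplyr 1.0.0 or later for the .groups argument.

```r
library(dplyr)

# Unquoted column arguments: {{ }} forwards the caller's expressions.
group_mean <- function(df, group_col, value_col) {
  df %>%
    group_by({{ group_col }}) %>%
    summarise(mean_value = mean({{ value_col }}), .groups = "drop")
}

# String column arguments: .data[[name]] looks the column up by name.
group_mean_chr <- function(df, group_col, value_col) {
  df %>%
    group_by(.data[[group_col]]) %>%
    summarise(mean_value = mean(.data[[value_col]]), .groups = "drop")
}

res <- group_mean(iris, Species, Sepal.Length)
res2 <- group_mean_chr(iris, "Species", "Sepal.Length")
print(res)
```

The string form is handy when looping over many columns, for example with lapply() over a character vector of names.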

Paste0, subset Error: 'subset' must be logical

I would like to use paste0 to create a long string containing the conditions for the subset function.
I tried the following:
#rm(list=ls())
set.seed(1)
id<-1:20
ids<-sample(id, 3)
d <- subset(id, noquote(paste0("id==",ids,collapse="|")))
I get the following error:
Error in subset.default(id, noquote(paste0("id==", ids, collapse = "|"))) :
'subset' must be logical
I tried the same without noquote(). Interestingly, when I run
noquote(paste0("id==",ids,collapse="|"))
I get [1] id==4|id==7|id==1. When I then paste this by hand into the subset() call
d2<-subset(id,id==4|id==7|id==1)
everything runs fine. But why does subset(id, noquote(paste0("id==",ids,collapse="|"))) not work, although it seems to be the same? Thanks a lot for your help!
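A minimal sketch, not from the original thread, of why this fails and two workarounds: noquote() only changes how a character string prints, so the value handed to subset() is still a string, whereas subset() on a plain vector requires a logical vector. Building the logical vector directly with %in% avoids strings altogether; parsing the string into an expression also works but is generally discouraged.

```r
set.seed(1)
id <- 1:20
ids <- sample(id, 3)

# Preferred: compute the logical condition directly.
d1 <- subset(id, id %in% ids)

# If the condition really must come from a string, parse it into an
# expression and evaluate it to get the logical vector subset() needs.
cond <- paste0("id==", ids, collapse = "|")
d2 <- subset(id, eval(parse(text = cond)))

print(d1)
```

Both calls return the selected ids in their original order within id, not in the order sample() drew them.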

Renaming an unnamed variable with dplyr

I have to read a bunch of .xlsx files into R, which I do with readxl::read_excel(). None of these files has a variable name for the first column. Since there are many files, I do not want to fix them manually.
In order to process the data properly, these first columns need a name. In the end, I want to write a function that I can call for each of these .xlsx files (e.g. using purrr::map()), and within this function I would prefer a solution that fits in a single pipe.
Unfortunately, dplyr::rename(df, timeseries = ``) throws the following error:
Error: attempt to use zero-length variable name
Using the column index (dplyr::rename(df, timeseries = 1)) does not work either:
Error: Arguments to rename() must be unquoted variable names.
Argument timeseries is not.
How can I rename the variable without interrupting the pipe to run names(df)[1] <- "timeseries"?
This can be accomplished with dplyr::select() in the following way:
select(df, timeseries = 1, everything())
Obviously, dplyr::select() can handle column indices, which allows this solution.
Please comment if you are aware of any particular reason why this is not possible with dplyr::rename()!
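If you want to use rename with a column index (in this case 1), you can do the following (note that rename_() and the other underscored verbs were deprecated in dplyr 0.7.0):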
rename_(df, timeseries = names(df)[1])
When chaining, use a dot:
df %>% ... %>% rename_(timeseries = names(.)[1])
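Since rename_() is deprecated, a pipe-friendly alternative is stats::setNames(), which never goes through tidyselect and so does not care that the first name is empty. A minimal sketch, assuming a small hypothetical data frame that stands in for a sheet with a blank first header; whether rename() itself copes with an empty column name depends on your dplyr version, which is why this sketch avoids it.

```r
library(dplyr)

# Hypothetical stand-in for a sheet whose first column has no header.
df <- data.frame(x = c("2020-01", "2020-02"), value = c(1.5, 2.5))
names(df)[1] <- ""

# magrittr passes df as setNames()'s first argument, and the `.` inside
# names(.) also refers to df, so the pipe is never interrupted.
renamed <- df %>% setNames(c("timeseries", names(.)[-1]))
print(names(renamed))
```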

Specifying variables in cor.matrix

Trying to use Deducer's cor.matrix to create a correlation matrix to be used in ggcorplot.
Trying to run a simple example. Only explicitly specifying the variable names in the data works:
cor.mat<-cor.matrix(variables=d(Sepal.Length,Sepal.Width,Petal.Length,Petal.Width),
data=iris[1:4],test=cor.test,method='p')
But I'd like it to simply use all columns in the provided data.
This:
cor.mat<-cor.matrix(data=iris[1:4],test=cor.test,method='p')
throws an error:
Error in eval(expr, envir, enclos) : argument is missing, with no default
This:
cor.mat<-cor.matrix(variables=d(as.name(paste(colnames(iris)[1:4]),collapse=",")),
data=iris[1:4],test=cor.test,method='p')
Error in as.name(paste(colnames(iris)[1:4]), collapse = ",") :
unused argument (collapse = ",")
So is there any way to tell variables to use all columns in data without explicitly specifying them?
The first argument of the function is variables =, which is required but you did not specify (you had data =). Try
cor.mat <- cor.matrix(variables = iris[1:4], test = cor.test, method = 'p')
ggcorplot(cor.mat, data=iris)
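If you only need the numeric correlation matrix and not Deducer's cor.matrix object for ggcorplot(), base R's cor() correlates every column of a numeric data frame without naming them. A minimal sketch of that swapped-in technique (it is not Deducer's API):

```r
# Base R alternative: Pearson correlations of all four numeric columns.
num_cols <- iris[1:4]
m <- cor(num_cols, method = "pearson")
print(round(m, 2))
```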

ldply with subset does not see local variable

I have a list of team names (teamNames) and a list of data frames (weekSummaries).
I want to get a list of team summaries by week:
teamSummaries <- llply(teamNames,getTeamSubset)
getTeamSubset = function(teamName){
temp=ldply(weekSummaries,subset,team_name==teamName)
}
However, when I run this I get an error:
Error in eval(expr, envir, enclos) : object 'teamName' not found
But when I run the command
ldply(weekSummaries,subset,team_name=="Denver Broncos")
I get a data frame with the information I need for one team. Can somebody point out what I'm doing wrong?
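It looks like the answer is not to use the subset function, and instead to pass a custom function that receives each data frame and subsets it with bracket notation. subset() evaluates its condition inside the data frame and then in the environment of whatever called it, which here is plyr's internal code rather than getTeamSubset(), so the local teamName is never found. An anonymous function defined inside getTeamSubset() can see it:
library(plyr)
# Define the helper before it is used by llply()
getTeamSubset <- function(teamName){
  # The anonymous function is a closure over teamName
  ldply(weekSummaries, function(week){
    week[week$team_name == teamName, ]
  })
}
teamSummaries <- llply(teamNames, getTeamSubset)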
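The same closure idea works without plyr. A minimal base R sketch, using lapply() plus do.call(rbind, ...) as a swapped-in equivalent of llply()/ldply(); the weekSummaries and teamNames values below are hypothetical data made up for illustration.

```r
# Hypothetical data standing in for the real weekSummaries and teamNames.
weekSummaries <- list(
  data.frame(team_name = c("Denver Broncos", "Dallas Cowboys"), points = c(24, 17)),
  data.frame(team_name = c("Denver Broncos", "Dallas Cowboys"), points = c(10, 31))
)
teamNames <- c("Denver Broncos", "Dallas Cowboys")

getTeamSubset <- function(teamName) {
  # Bracket subsetting inside a closure sees the local teamName directly;
  # no non-standard evaluation is involved.
  pieces <- lapply(weekSummaries, function(week) week[week$team_name == teamName, ])
  do.call(rbind, pieces)
}

teamSummaries <- lapply(teamNames, getTeamSubset)
names(teamSummaries) <- teamNames
print(teamSummaries[["Denver Broncos"]])
```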
