ldply with subset does not see local variable

I have a list of team names (teamNames) and a list of data frames (weekSummaries).
I want to get a list of team summaries by week:
getTeamSubset = function(teamName){
  temp=ldply(weekSummaries,subset,team_name==teamName)
}
teamSummaries <- llply(teamNames,getTeamSubset)
However, when I run this I get an error:
Error in eval(expr, envir, enclos) : object 'teamName' not found
But when I run the command
ldply(weekSummaries,subset,team_name=="Denver Broncos")
I get a data frame with the information I need for one team. Can somebody point out what I'm doing wrong?

It looks like the answer is to avoid subset here. subset evaluates its condition with non-standard evaluation, so when it is called indirectly through ldply it does not find the local variable teamName. Instead, pass ldply a custom function that takes each data frame and subsets it with bracket notation, like this:
getTeamSubset = function(teamName){
  ldply(weekSummaries,function(week){
    week[week$team_name==teamName,]
  })
}
teamSummaries <- llply(teamNames,getTeamSubset)
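
For anyone who wants to try the bracket-subsetting idea without plyr installed, here is a minimal sketch in base R. The sample data (team names and a points column) is made up for illustration, and lapply plus do.call(rbind, ...) stands in for llply and ldply; it is not the original poster's data or exact code.

```r
# Hypothetical sample data standing in for weekSummaries and teamNames
weekSummaries <- list(
  data.frame(team_name = c("Denver Broncos", "Seattle Seahawks"),
             points = c(24, 17), stringsAsFactors = FALSE),
  data.frame(team_name = c("Denver Broncos", "Seattle Seahawks"),
             points = c(31, 20), stringsAsFactors = FALSE)
)
teamNames <- c("Denver Broncos", "Seattle Seahawks")

getTeamSubset <- function(teamName) {
  # Bracket indexing uses ordinary evaluation, so teamName is looked up
  # in this function's own scope
  rows <- lapply(weekSummaries, function(week) {
    week[week$team_name == teamName, ]
  })
  # Stack the per-week rows into one data frame (base R stand-in for ldply)
  do.call(rbind, rows)
}

teamSummaries <- lapply(teamNames, getTeamSubset)
names(teamSummaries) <- teamNames
```

Each element of teamSummaries is then a data frame holding one team's rows across all weeks.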

Related

Creating a list for formula in R for raking

I am trying to generate a list to match the required parameter for the rake function in the survey package. The parameter in question is called sample.margins, and the documentation (page 27) describes it as a "list of formulas or data frames describing sample margins, which must not contain missing values".
I am using the following list:
[[1]]
~age_bucket
<environment: 0x125f741c8>
[[2]]
~educ_bucket
The list is created as:
rake_sample_margins <- lapply(1:length(column_names), function(x) {
  as.formula(paste0("~", column_names))
})
rake_sample_margins[[length(rake_sample_margins) + 1]] <- as.formula(~educ_bucket)
So ~educ_bucket is created after the lapply, whereas ~age_bucket is created by iterating through the column_names vector (which in this case contains only the string age_bucket).
When I run my code using this list for the parameter of the rake function, I get this error:
Error in eval(expr, envir, enclos) :
object 'educ_bucket' not found
This happens even though, when I examine the corresponding data frame in the other parameter, the column is indeed there (as the 2nd element of that list of data frames):
[[2]]
educ_bucket FREQ
1: groupA .002
2: groupB .08
I've been told that the list of formulae must exactly match a column in the data frame in the list of dataframes passed, but I believe I am matching this. Is there something else about this I'm missing? Also, I notice I have an irregularity with the list. The first formula includes: <environment: 0x125f741c8> underneath the formula when printed out, whereas the second does not. What, if anything, does that mean?

how to pass column names in dplyr select without evaluation

I'm trying to programmatically pass column names to a function so that they can be selected in dplyr. The column names will vary so I've tried to use the standard evaluation version of the select function select_. The column names themselves are a bit funny as they contain + and - characters, which I think is causing the issue. Below is a simple example that replicates the error.
library(tibble)
library(dplyr)
data <- data_frame(target_id = 'xyz',
                   `CH4+Sulfate-1` = 1.2,
                   `CH4+Sulfate-2` = 2,
                   `CH4+Sulfate-3` = 3)
columns <- c('CH4+Sulfate-1', 'CH4+Sulfate-2', 'CH4+Sulfate-3')
select_(data, .dots = columns)
I get the following error:
Error in eval(expr, envir, enclos) : object 'CH4' not found
This leads me to believe that the names are being evaluated rather than taken as strings. How can I get around this problem without having to rename the columns of the table?

Wrapping the names in backticks does the job.
columns <- c('`CH4+Sulfate-1`', '`CH4+Sulfate-2`', '`CH4+Sulfate-3`')
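
One way to see why the backticks matter, assuming select_ parses each string as R code (which the error message suggests), is to parse the names yourself in base R. This is only an illustrative sketch of the parsing behaviour, not of dplyr's internals; the variable names unquoted and quoted are made up for the example.

```r
# Without backticks, the string parses as arithmetic on CH4, Sulfate and 1
unquoted <- parse(text = "CH4+Sulfate-1")[[1]]

# With backticks, the whole string parses as a single symbol (one name)
quoted <- parse(text = "`CH4+Sulfate-1`")[[1]]

is.call(unquoted)     # an expression to be evaluated
all.vars(unquoted)    # the variables that expression refers to
is.symbol(quoted)     # a single column-like name
as.character(quoted)  # the name itself, without the backticks
```

The unquoted form refers to a variable called CH4, which matches the object named in the error above.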

Specifying variables in cor.matrix

Trying to use Deducer's cor.matrix to create a correlation matrix to be used in ggcorplot.
Trying to run a simple example. Only explicitly specifying the variable names in the data works:
cor.mat<-cor.matrix(variables=d(Sepal.Length,Sepal.Width,Petal.Length,Petal.Width),
                    data=iris[1:4],test=cor.test,method='p')
But I'd like it to simply use all columns in the provided data.
This:
cor.mat<-cor.matrix(data=iris[1:4],test=cor.test,method='p')
throws an error:
Error in eval(expr, envir, enclos) : argument is missing, with no default
And this:
cor.mat<-cor.matrix(variables=d(as.name(paste(colnames(iris)[1:4]),collapse=",")),
                    data=iris[1:4],test=cor.test,method='p')
fails with:
Error in as.name(paste(colnames(iris)[1:4]), collapse = ",") :
unused argument (collapse = ",")
So is there any way to tell variables to use all columns in data without explicitly specifying them?

The first argument of the function is variables =, which is required, but you did not supply it (you passed data = instead). Try:
cor.mat <- cor.matrix(variables = iris[1:4], test = cor.test, method = 'p')
ggcorplot(cor.mat, data=iris)

changing first colname in a list of dataframes

I've got a list of data frames and am trying to change the first column name using lapply:
frames<-lapply(frames,function(x){ colnames(frames[[x]])[1]<-"date"})
This returns the error:
Error in `*tmp*`[[x]] : invalid subscript type 'list'
I am not sure why it would produce this error as my understanding is that this should apply
colname[1]<-"date"
to every data frame in the list
If anyone can tell me the root of this error I would be very grateful!

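You do not need to reference the frames list inside of lapply. lapply passes each element of frames to your function as x, so x is already a data frame, not an index. The function also has to return x, otherwise lapply collects the value of the assignment ("date") instead of the modified data frame. Try this: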
frames <- lapply(frames, function(x) { colnames(x)[1] <- "date"; return(x) })
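
As a quick check of this pattern, here is a small self-contained sketch. The two data frames and their column names are made up for illustration.

```r
# Hypothetical list of data frames with different first column names
frames <- list(
  data.frame(a = 1:2, b = 3:4),
  data.frame(c = 5:6, d = 7:8)
)

frames <- lapply(frames, function(x) {
  colnames(x)[1] <- "date"  # rename only the first column
  x                         # return the modified data frame
})

# First column name of each data frame after the rename
sapply(frames, function(x) colnames(x)[1])
```

The remaining columns keep their original names.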

invalid 'envir' argument of type 'character' -- in self-defined function with lattice histogram

I want a function with parameters such as the data name (dat), a factor (myfactor), and a variable name (myvar) to dynamically generate histograms (I have to use lattice).
Using iris as a minimal example:
data(iris)
my_histogram <- function(myvar,myfactor,dat){
  listofparam <- c(myvar,myfactor)
  myf <- as.formula(paste("~",paste(listofparam,collapse="|")))
  histogram(myf,
            data=dat,
            main=bquote(paste(.(myvar),"distribution by",.(myfactor),seq=" ")))
}
my_histogram("Sepal.Length","Species","iris")
I also tried do.call as some posts indicated:
my_histogram <- function(myvar,myfactor,dat){
  listofparam <- c(myvar,myfactor)
  myf <- as.formula(paste("~",paste(listofparam,collapse="|")))
  p <- do.call("histogram",
               args = list(myf,
                           data=dat))
  print(p)
}
my_histogram("Sepal.Length","Species","iris")
But the error appears: invalid 'envir' argument of type 'character'. I think the program doesn't know where to look for this myf string. How can I fix this, or is there a better way?

Readers should be aware that the question has completely mutated from an earlier version and doesn't really match up with this answer anymore. The answer to the new question appears in the comments.
There is no object named Sepal.Length. (So R is creating an error even before that my_function gets called.) There is only a column name, and it would need to be quoted to pass it to a function. (The data object could not be created because that URL fails to deliver the data. Why aren't you using the built-in copy of the iris data object?)
You will also need to build a formula from myvar and fac. Formulas are expressions and get parsed without evaluation of their tokens. You need to build a formula inside your function that looks like ~Sepal.Length|Species and then pass it to the histogram call. Consult ?as.formula
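
Building the formula from strings can be sketched in base R as follows. build_hist_formula is a hypothetical helper name used only for this illustration; it is not part of lattice or of the original code.

```r
# build_hist_formula is a hypothetical helper, named here for illustration only
build_hist_formula <- function(myvar, myfactor) {
  # Paste the pieces into "~ variable | factor" and convert to a formula
  as.formula(paste("~", myvar, "|", myfactor))
}

f <- build_hist_formula("Sepal.Length", "Species")
print(f)
```

With lattice loaded, the formula would then be passed together with the data frame itself rather than its name, for example histogram(f, data = iris). A character string such as "iris" is not a valid data argument, which is consistent with the 'envir' error reported in the question.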
