I'm trying to use Deducer's cor.matrix to create a correlation matrix for use in ggcorplot.
In a simple example, only explicitly specifying the variable names in the data works:
cor.mat<-cor.matrix(variables=d(Sepal.Length,Sepal.Width,Petal.Length,Petal.Width),
data=iris[1:4],test=cor.test,method='p')
But I'd like it to simply use all columns in the provided data.
This:
cor.mat<-cor.matrix(data=iris[1:4],test=cor.test,method='p')
throws an error:
Error in eval(expr, envir, enclos) : argument is missing, with no default
This:
cor.mat<-cor.matrix(variables=d(as.name(paste(colnames(iris)[1:4]),collapse=",")),
data=iris[1:4],test=cor.test,method='p')
Error in as.name(paste(colnames(iris)[1:4]), collapse = ",") :
unused argument (collapse = ",")
So is there any way to tell variables to use all columns in data without explicitly specifying them?
The function's first argument, variables, is required, but you did not supply it (you passed data = instead). Try:
cor.mat <- cor.matrix(variables = iris[1:4], test = cor.test, method = 'p')
ggcorplot(cor.mat, data=iris)
Given these strings:
data = "mtcars"
y = "mpg"
x = c("cyl","disp")
I am trying to fit a linear model. I tried things like
epp=function(x) eval(parse(text=paste0(x,collapse="+")))
lm(data=epp(data),epp(y)~epp(x))
# Error in eval(expr, envir, enclos) : object 'cyl' not found
where the last line was aimed to be equivalent to
lm(data=mtcars,mpg~cyl+disp)
This involves two operations, each described in multiple SO entries that use either the get or the as.formula function on its own:
lm(data=get(data),
formula=as.formula( paste( y, "~", paste(x, collapse="+") ) )
)
In both cases you use a character object to obtain something R can work with directly. In the first argument, get returns the object bound to that name (here the mtcars data frame), and in the second, as.formula returns a 'formula' object. @blmoore is correct in advising us that lm will accept a character object, so the as.formula call is not needed here.
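As a hedged alternative to pasting the formula together by hand, base R's reformulate can build it from the same character vectors. This is only a sketch: the names model_formula and fit are introduced here for illustration and are not part of the original answer.

```r
data <- "mtcars"
y <- "mpg"
x <- c("cyl", "disp")

# reformulate() builds `response ~ term1 + term2` from character vectors
model_formula <- reformulate(termlabels = x, response = y)

# get() looks up the object named by the string (here the built-in mtcars)
fit <- lm(model_formula, data = get(data))

print(model_formula)
print(coef(fit))
```

The original epp attempt failed because eval(parse(text = "cyl+disp")) tries to evaluate cyl and disp immediately in the calling environment, where no such objects exist, instead of leaving them as unevaluated terms for lm to look up in the data.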
I'm trying to programmatically pass column names to a function so that they can be selected in dplyr. The column names will vary so I've tried to use the standard evaluation version of the select function select_. The column names themselves are a bit funny as they contain + and - characters, which I think is causing the issue. Below is a simple example that replicates the error.
library(tibble)
library(dplyr)
data <- data_frame(target_id = 'xyz',
`CH4+Sulfate-1` = 1.2,
`CH4+Sulfate-2` = 2,
`CH4+Sulfate-3` = 3)
columns <- c('CH4+Sulfate-1', 'CH4+Sulfate-2', 'CH4+Sulfate-3')
select_(data, .dots = columns)
I get the following error
Error in eval(expr, envir, enclos) : object 'CH4' not found
Which leads me to believe that the names are being evaluated rather than taken as the string. How can I get around this problem without having to rename the columns of the table?
Wrapping the names in backticks does the job.
columns <- c('`CH4+Sulfate-1`', '`CH4+Sulfate-2`', '`CH4+Sulfate-3`')
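For what it's worth, here is a sketch of another route that avoids backticks, assuming a dplyr version recent enough to provide all_of (dplyr 1.0.0 or later; select_ is deprecated there). The name selected is introduced for illustration only.

```r
library(dplyr)

data <- tibble(target_id = "xyz",
               `CH4+Sulfate-1` = 1.2,
               `CH4+Sulfate-2` = 2,
               `CH4+Sulfate-3` = 3)

columns <- c("CH4+Sulfate-1", "CH4+Sulfate-2", "CH4+Sulfate-3")

# all_of() treats each string as a literal column name, so nothing is parsed
selected <- select(data, all_of(columns))
print(names(selected))
```

Because the strings are matched as names rather than parsed as R expressions, the + and - characters are never evaluated.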
I have a list of team names (teamNames) and a list of data frames (weekSummaries).
I want to get a list of team summaries by week:
teamSummaries <- llply(teamNames,getTeamSubset)
getTeamSubset = function(teamName){
temp=ldply(weekSummaries,subset,team_name==teamName)
}
However, when I run this I get an error:
>Error in eval(expr, envir, enclos) : object 'teamName' not found
But when I run the command
>ldply(weekSummaries,subset,team_name=="Denver Broncos")
I get a data frame with the information I need for one team. Can somebody point out what I'm doing wrong?
Looks like the answer is to avoid the subset function and instead give ldply a custom function that takes each data frame and subsets it with bracket notation, like this:
library(plyr)
# define the helper before it is used
getTeamSubset = function(teamName){
  ldply(weekSummaries,function(week){
    week[week$team_name==teamName,]
  })
}
teamSummaries <- llply(teamNames,getTeamSubset)
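The same idea can be sketched in base R without plyr. The toy data below (team names and points) is made up purely so the example runs on its own, and lapply with do.call(rbind, ...) stands in for llply and ldply.

```r
# Hypothetical stand-ins for the asker's data
weekSummaries <- list(
  data.frame(team_name = c("Denver Broncos", "Oakland Raiders"), points = c(24, 17)),
  data.frame(team_name = c("Denver Broncos", "Oakland Raiders"), points = c(31, 10))
)
teamNames <- c("Denver Broncos", "Oakland Raiders")

getTeamSubset <- function(teamName) {
  # The anonymous function is a closure, so teamName is found in the
  # enclosing function's environment
  rows <- lapply(weekSummaries, function(week) {
    week[week$team_name == teamName, , drop = FALSE]
  })
  do.call(rbind, rows)
}

teamSummaries <- lapply(teamNames, getTeamSubset)
names(teamSummaries) <- teamNames
print(teamSummaries[["Denver Broncos"]])
```

The subset version fails because the condition team_name==teamName is evaluated by non-standard evaluation from inside plyr's internals, where teamName is not visible; the closure does not have that problem.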
I cannot understand what is going wrong here.
data.train <- read.table("Assign2.WineComplete.csv",sep=",",header=T)
# Building decision tree
Train <- data.frame(residual.sugar=data.train$residual.sugar,
total.sulfur.dioxide=data.train$total.sulfur.dioxide,
alcohol=data.train$alcohol,
quality=data.train$quality)
Pre <- as.formula("pre ~ quality")
fit <- rpart(Pre, method="class",data=Train)
I am getting the following error :
Error in eval(expr, envir, enclos) : object 'pre' not found
Don't know why @Janos deleted his answer, but it's correct: your data frame Train doesn't have a column named pre. When you pass a formula and a data frame to a model-fitting function, the names in the formula have to refer to columns in the data frame. Your Train has columns called residual.sugar, total.sulfur.dioxide, alcohol and quality. You need to change either your formula or your data frame so they're consistent with each other.
And just to clarify: Pre is an object containing a formula. That formula contains a reference to the variable pre. It's the latter that has to be consistent with the data frame.
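One way to catch this kind of mismatch before fitting is to compare the variables in the formula with the data frame's column names. The sketch below uses made-up values for Train, and missing_vars is a name introduced here for illustration.

```r
# Hypothetical rows; only the column names matter for this check
Train <- data.frame(residual.sugar = c(1.9, 2.6),
                    total.sulfur.dioxide = c(34, 67),
                    alcohol = c(9.4, 9.8),
                    quality = c(5, 5))

Pre <- as.formula("pre ~ quality")

# all.vars() lists every variable name that appears in the formula
missing_vars <- setdiff(all.vars(Pre), names(Train))
print(missing_vars)
```

Any name reported here would trigger the "object not found" error when the model function evaluates the formula. Note that a formula such as quality ~ . would report "." as well, so that symbol should be ignored when using this check.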
This can happen if you don't attach your dataset.
I think I got what I was looking for..
data.train <- read.table("Assign2.WineComplete.csv",sep=",",header=T)
fit <- rpart(quality ~ ., method="class",data=data.train)
plot(fit)
text(fit, use.n=TRUE)
summary(fit)
I used
colnames(train) = paste("A", colnames(train))
and ran into the same problem as you.
I finally figured out that randomForest is stricter than rpart: it can't handle column names containing spaces, commas or other special punctuation.
By default, paste prepends "A" followed by a " " separator to each column name,
so we need to avoid the space and use this statement instead:
colnames(train) = paste("A", colnames(train), sep = "")
This prepends the string without a space.
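A small sketch of the difference, using hypothetical column names "1" and "2". It also shows make.names, a base R function that converts arbitrary strings into syntactically valid names, as one possible way to clean names that already contain spaces.

```r
cols <- c("1", "2")                 # hypothetical original column names

with_space <- paste("A", cols)      # default separator is a single space
no_space <- paste0("A", cols)       # same as paste(..., sep = "")
cleaned <- make.names(with_space)   # invalid characters become "."

print(with_space)
print(no_space)
print(cleaned)
```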
Using a dataset w, which includes a numeric column PY, I can do:
nrow(subset(w, PY==50))
and get the correct answer. If, however, I try to create a function:
fxn <- function(dataset, fac, lev){nrow(subset(dataset, fac==lev))}
and run
fxn(w, PY, 50)
I get the following error:
Error in eval(expr, envir, enclos) : object 'PY' not found
What am I doing wrong? Thanks.
From the documentation of subset:
Warning
This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.
This rather obscure warning was very well explained here: Why is `[` better than `subset`?
The bottom line is that subset should only be used interactively, and in particular not inside a wrapper like the one you are trying to write. Use [ instead, and pass the column name as a string, e.g. fxn(w, "PY", 50):
fxn <- function(dataset, fac, lev) nrow(dataset[dataset[fac] == lev, , drop = FALSE])
or rather simply:
fxn <- function(dataset, fac, lev) sum(dataset[fac] == lev)
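A runnable sketch of this approach on a made-up data frame w. The function name count_matches and the use of [[ with na.rm = TRUE are choices made for this illustration, not part of the answer above.

```r
# Hypothetical stand-in for the asker's dataset
w <- data.frame(PY = c(50, 50, 10, NA), other = c("a", "b", "c", "d"))

count_matches <- function(dataset, fac, lev) {
  # fac is a column name given as a string; [[ extracts it as a vector
  sum(dataset[[fac]] == lev, na.rm = TRUE)
}

n <- count_matches(w, "PY", 50)
print(n)
```

Like subset, this ignores rows where the column is NA, so the count agrees with nrow(subset(w, PY==50)) when run interactively.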