I cannot understand what is going wrong here.
data.train <- read.table("Assign2.WineComplete.csv",sep=",",header=T)
# Building decision tree
Train <- data.frame(residual.sugar=data.train$residual.sugar,
total.sulfur.dioxide=data.train$total.sulfur.dioxide,
alcohol=data.train$alcohol,
quality=data.train$quality)
Pre <- as.formula("pre ~ quality")
fit <- rpart(Pre, method="class",data=Train)
I am getting the following error:
Error in eval(expr, envir, enclos) : object 'pre' not found
Don't know why @Janos deleted his answer, but it's correct: your data frame Train doesn't have a column named pre. When you pass a formula and a data frame to a model-fitting function, the names in the formula have to refer to columns in the data frame. Your Train has columns called residual.sugar, total.sulfur.dioxide, alcohol and quality. You need to change either your formula or your data frame so they're consistent with each other.
And just to clarify: Pre is an object containing a formula. That formula contains a reference to the variable pre. It's the latter that has to be consistent with the data frame.
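A quick way to check for this kind of mismatch is to compare the variable names in the formula with the column names of the data frame. The following is only a minimal sketch: the small Train data frame is made-up toy data, not the asker's wine file, and the names Fixed and missing_vars are introduced here for illustration.

```r
# Toy stand-in for the asker's data (hypothetical values)
Train <- data.frame(alcohol = c(9.4, 9.8, 10.5),
                    quality = c(5, 5, 6))

# The formula from the question refers to 'pre', which is not a column
Pre <- as.formula("pre ~ quality")

# all.vars() lists the variable names used in a formula;
# setdiff() keeps those that are not columns of Train
missing_vars <- setdiff(all.vars(Pre), names(Train))
print(missing_vars)

# A formula that only uses existing columns leaves nothing missing
Fixed <- as.formula("quality ~ alcohol")
print(setdiff(all.vars(Fixed), names(Train)))
```

If missing_vars is non-empty, a model-fitting call with that formula and data frame will look for those names outside the data frame and will usually fail with "object ... not found".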
This can also happen if you refer to a column by its bare name without attaching your dataset or passing it through the data argument.
I think I got what I was looking for..
data.train <- read.table("Assign2.WineComplete.csv",sep=",",header=T)
fit <- rpart(quality ~ ., method="class",data=data.train)
plot(fit)
text(fit, use.n=TRUE)
summary(fit)
I used
colnames(train) = paste("A", colnames(train))
and ran into the same problem as you.
I finally figured out that randomForest is stricter than rpart: it can't handle column names containing a space, a comma or other special punctuation.
By default, paste joins "A" and each column name with " " as the separator.
So we need to avoid the space and use this statement instead:
colnames(train) = paste("A", colnames(train), sep = "")
This prepends the string without a space.
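To make the difference visible, here is a small sketch using a made-up vector of column names (cols is hypothetical). It also shows make.names, a base R function that turns arbitrary strings into syntactically valid names, which may be a simpler option when the names already contain spaces or commas.

```r
cols <- c("x", "y")

# Default separator is a single space, so the new names contain a space
with_space <- paste("A", cols)
print(with_space)

# sep = "" (or paste0) joins the pieces with nothing in between
no_space <- paste("A", cols, sep = "")
print(no_space)
print(identical(no_space, paste0("A", cols)))

# make.names() replaces characters that are not allowed in names with "."
clean <- make.names(c("total sulfur", "a,b"))
print(clean)
```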
From these strings
data = "mtcars"
y = "mpg"
x = c("cyl","disp")
I am trying to fit a linear model. I tried things like
epp=function(x) eval(parse(text=paste0(x,collapse="+")))
lm(data=epp(data),epp(y)~epp(x))
# Error in eval(expr, envir, enclos) : object 'cyl' not found
where the last line was aimed to be equivalent to
lm(data=mtcars,mpg~cyl+disp)
This involves two operations, each described separately in multiple SO entries: get, to look up an object by name, and as.formula, to build a formula from a string:
lm(data=get(data),
formula=as.formula( paste( y, "~", paste(x, collapse="+") ) )
)
In both cases you are using a character object to obtain something else: get returns the object named by the string (here the mtcars data frame), and as.formula returns a 'formula' object. @blmoore is correct in advising us that lm will accept a character object, so the as.formula call is not needed here.
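As an alternative sketch of the same idea, base R's reformulate builds the formula directly from the character vectors, so no manual pasting is needed. The names form and fit are introduced here for illustration only.

```r
data <- "mtcars"
y <- "mpg"
x <- c("cyl", "disp")

# reformulate() builds mpg ~ cyl + disp from the term labels and the response
form <- reformulate(x, response = y)
print(form)

# get() looks up the data frame by its name
fit <- lm(form, data = get(data))
print(names(coef(fit)))
```

This should give the same fit as lm(data=mtcars,mpg~cyl+disp).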
Trying to use Deducer's cor.matrix to create a correlation matrix to be used in ggcorplot.
Trying to run a simple example. Only explicitly specifying the variable names in the data works:
cor.mat<-cor.matrix(variables=d(Sepal.Length,Sepal.Width,Petal.Length,Petal.Width),
data=iris[1:4],test=cor.test,method='p')
But I'd like it to simply use all columns in the provided data.
This:
cor.mat<-cor.matrix(data=iris[1:4],test=cor.test,method='p')
throws an error:
Error in eval(expr, envir, enclos) : argument is missing, with no default
This attempt also fails:
cor.mat<-cor.matrix(variables=d(as.name(paste(colnames(iris)[1:4]),collapse=",")),
data=iris[1:4],test=cor.test,method='p')
Error in as.name(paste(colnames(iris)[1:4]), collapse = ",") :
unused argument (collapse = ",")
So is there any way to tell variables to use all columns in data without explicitly specifying them?
The first argument of the function is variables =, which is required, but you did not specify it (you supplied data = instead). Try
cor.mat <- cor.matrix(variables = iris[1:4], test = cor.test, method = 'p')
ggcorplot(cor.mat, data=iris)
I have a list of team names (teamNames) and a list of data frames (weekSummaries).
I want to get a list of team summaries by week:
teamSummaries <- llply(teamNames,getTeamSubset)
getTeamSubset = function(teamName){
temp=ldply(weekSummaries,subset,team_name==teamName)
}
However, when I run this I get an error:
Error in eval(expr, envir, enclos) : object 'teamName' not found
But when I run the command
ldply(weekSummaries,subset,team_name=="Denver Broncos")
I get a data frame with the information I need for one team. Can somebody point out what I'm doing wrong?
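Looks like the answer is not to use the subset function here. subset evaluates its condition with non-standard evaluation, and when it is called through ldply the condition is apparently not evaluated in a frame where teamName is visible. Instead, use a custom function that takes the data frame and subsets it with bracket notation, such as this (the function is defined before it is used):
getTeamSubset = function(teamName){
  ldply(weekSummaries,function(week){
    week[week$team_name==teamName,]
  })
}
teamSummaries <- llply(teamNames,getTeamSubset)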
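For readers without plyr installed, the same pattern can be sketched in base R with lapply and do.call(rbind, ...). This is not the answerer's code: the weekSummaries and teamNames values below are made-up toy data, and base functions are swapped in for ldply and llply.

```r
# Toy data (hypothetical): two weeks of results for two teams
weekSummaries <- list(
  data.frame(team_name = c("Denver Broncos", "Miami Dolphins"),
             points = c(24, 17), stringsAsFactors = FALSE),
  data.frame(team_name = c("Denver Broncos", "Miami Dolphins"),
             points = c(31, 10), stringsAsFactors = FALSE)
)
teamNames <- c("Denver Broncos", "Miami Dolphins")

getTeamSubset <- function(teamName) {
  # Bracket subsetting uses ordinary scoping, so teamName is found
  # in the enclosing function's environment
  rows <- lapply(weekSummaries, function(week) {
    week[week$team_name == teamName, , drop = FALSE]
  })
  do.call(rbind, rows)  # stack the per-week rows into one data frame
}

teamSummaries <- lapply(teamNames, getTeamSubset)
names(teamSummaries) <- teamNames
print(teamSummaries[["Denver Broncos"]])
```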
I would like to assign a formula to a variable, and then take a data frame that contains the variables in the formula and make a new column with the result. I thought the function I could use would be model.frame but I'm not sure. Any idea how I can do this?
names(mydata)
[1] "STATION_NAME" "LATITUDE" "LONGITUDE" "DATE" "SNOW"
[6] "TMAX" "TMIN" "PRCP"
varForm <- "(TMIN+TMAX)/2"
calcVect <- model.frame(varForm, data = mydata)
Error in eval(expr, envir, enclos) : object 'TMIN' not found
mydata$calcField <- calcVect
Error: object 'calcVect' not found
Both mydata and varForm would be parameters for a user defined function I am working on. That's the reason for not just directly calculating the field. Thanks!
I think you can do that with within:
eq <- "varForm = (TMIN+TMAX)/2"
mydata <- within(mydata, eval(parse(text = eq)))
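Since the question asks for a user-defined function that takes both the data frame and the expression string, another possible sketch is to evaluate the parsed string with the data frame as the evaluation environment. The function name addCalcField and the toy values are assumptions made for illustration; only TMIN, TMAX, varForm and calcField come from the question.

```r
# Toy data (hypothetical values) with the two columns used in the expression
mydata <- data.frame(TMIN = c(10, 12), TMAX = c(20, 26))
varForm <- "(TMIN+TMAX)/2"

# Hypothetical helper: parse the string and evaluate it inside the data frame,
# so that TMIN and TMAX resolve to its columns
addCalcField <- function(df, exprText) {
  df$calcField <- eval(parse(text = exprText), envir = df, enclos = parent.frame())
  df
}

mydata <- addCalcField(mydata, varForm)
print(mydata)
```

Note that evaluating arbitrary text runs whatever code the string contains, so this is only appropriate when the expression comes from a trusted source.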
I am struggling with an assignment and I would like your input.
Note: this is homework, but when I tried to add the tag it said not to add it.
I don't want the resulting code, just suggestions on how to get this working :)
So, I have a t.test function as such:
my.t.test <- function(x,s1,s2){
x1 <- x[s1]
x2 <- x[s2]
x1 <- as.numeric(x1)
x2 <- as.numeric(x2)
t.out <- t.test(x1,x2,alternative="two.sided",var.equal=T)
out <- as.numeric(t.out$p.value)
return(out)
}
I also have a matrix called data with 30 columns and 12k rows, and an annotation file called dataAnn containing the column names and information about the columns.
The first column of dataAnn contains M (male) or F (female) for each sample (column) in data, in the same order as the columns of data. I have to run a t.test comparing the two groups of samples and get the p-values out.
When I call
raw.pValue <- apply(data,1,my.t.test,s1=dataAnn[,1]=="M",s2=dataAnn[,1]=="F")
I get the error
Error in t.test(x1, x2, alternative = "two.sided", var.equal = T) :
unused argument(s) (alternative = "two.sided", var.equal = T)
I even tried to use
raw.pValue <- apply(data,1,my.t.test,s1=unlist(data[,1:18]),s2=unlist(data[,19:30]))
to split up the columns I want to compare, but in this case I get the error
Error in x[s1] : invalid subscript type 'list'
I have been looking online, and I understand that the second error is caused by an index being a list, but this didn't really clarify it for me.
Any input would be appreciated!
You have overwritten the t.test function. Try calling it something like my.t.test, or when you want to call the original one use stats::t.test (this calls the one from the stats namespace). Remember that when you have overwritten a function you need to rm it from your workspace before you can use the original one without specifying the namespace.
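The masking described above can be reproduced in a few lines. This is only an illustrative sketch: the replacement t.test defined here is a made-up stand-in for whatever the asker had in the workspace, and x1 and x2 are random toy samples.

```r
set.seed(1)
x1 <- rnorm(10)
x2 <- rnorm(10, mean = 1)

# A user-defined function with the same name masks stats::t.test
# (hypothetical stand-in for the asker's earlier definition)
t.test <- function(x, s1, s2) stop("this is the user-defined version")

# Naming the namespace explicitly still reaches the original function
p1 <- stats::t.test(x1, x2, alternative = "two.sided", var.equal = TRUE)$p.value

# After removing the masking object, plain t.test is the stats version again
rm(t.test)
p2 <- t.test(x1, x2, alternative = "two.sided", var.equal = TRUE)$p.value

print(identical(p1, p2))
```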