R - new column in data frame calculated with a formula variable

I would like to assign a formula to a variable, and then take a data frame that contains the variables in the formula and make a new column with the result. I thought the function I could use would be model.frame but I'm not sure. Any idea how I can do this?
names(mydata)
[1] "STATION_NAME" "LATITUDE" "LONGITUDE" "DATE" "SNOW"
[6] "TMAX" "TMIN" "PRCP"
varForm <- "(TMIN+TMAX)/2"
calcVect <- model.frame(varForm, data = mydata)
Error in eval(expr, envir, enclos) : object 'TMIN' not found
mydata$calcField <- calcVect
Error: object 'calcVect' not found
Both mydata and varForm would be parameters for a user defined function I am working on. That's the reason for not just directly calculating the field. Thanks!

Hi, I think you can do that with within(). Note that the new column takes its name from the left-hand side of the string (here varForm):
eq <- "varForm = (TMIN+TMAX)/2"
mydata <- within(mydata, eval(parse(text = eq)))
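Since the question asks for a user-defined function that takes both the data frame and the formula string, here is a minimal sketch of one way to wrap the same parse-and-evaluate idea. The helper name add_calc_column and its new_name argument are hypothetical, and the toy data only stands in for the question's mydata.

```r
# Hypothetical helper (not from the original post): evaluates a string
# expression against the columns of a data frame and stores the result.
add_calc_column <- function(data, expr_text, new_name = "calcField") {
  # parse() turns the string into an expression; eval() with envir = data
  # lets names such as TMIN and TMAX resolve to the data frame's columns.
  data[[new_name]] <- eval(parse(text = expr_text), envir = data)
  data
}

# Toy data standing in for the question's mydata (assumed values)
mydata <- data.frame(TMIN = c(10, 12, 8), TMAX = c(20, 22, 18))
varForm <- "(TMIN+TMAX)/2"
mydata <- add_calc_column(mydata, varForm)
mydata$calcField
```

Evaluating arbitrary strings is only reasonable when the strings come from a trusted source, since eval(parse()) will run whatever code it is given.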

Related

Specifying variables in cor.matrix

Trying to use Deducer's cor.matrix to create a correlation matrix to be used in ggcorplot.
Trying to run a simple example. Only explicitly specifying the variable names in the data works:
cor.mat<-cor.matrix(variables=d(Sepal.Length,Sepal.Width,Petal.Length,Petal.Width),
data=iris[1:4],test=cor.test,method='p')
But I'd like it to simply use all columns in the provided data.
This:
cor.mat<-cor.matrix(data=iris[1:4],test=cor.test,method='p')
throws an error:
Error in eval(expr, envir, enclos) : argument is missing, with no default
This:
cor.mat<-cor.matrix(variables=d(as.name(paste(colnames(iris)[1:4]),collapse=",")),
data=iris[1:4],test=cor.test,method='p')
Error in as.name(paste(colnames(iris)[1:4]), collapse = ",") :
unused argument (collapse = ",")
So is there any way to tell variables to use all columns in data without explicitly specifying them?

The first argument of the function is variables, which is required, but you did not supply it (you passed data = instead). Try
cor.mat <- cor.matrix(variables = iris[1:4], test = cor.test, method = 'p')
ggcorplot(cor.mat, data=iris)

Understand the call function in R

I copied the function from the web:
# function used to predict Best Subset Selection Regression
predict.regsubsets = function(object, newdata, id, ...) {
form = as.formula(object$call[[2]])
mat = model.matrix(form, newdata)
coefi = coef(object, id = id)
mat[, names(coefi)] %*% coefi
}
However, when I try to use the above function within another function, I keep getting the following error.
library(leaps)
abc <- function(){
regfit <- regsubsets(lpsa ~.,data = XTraining, nvmax = 8)
predict.regsubsets(regfit, data = XTesting, id = 1)
}
abc()
Error in object$call[[2]] : subscript out of bounds
I have already read ?call, but it doesn't help me understand what went wrong here, in particular what $call[[2]] is.
How can I edit the function above so that I don't get an error when I call it inside another function?

The culprit is the line
form = as.formula(object$call[[2]])
This implies that object (which is the variable you pass to the function, in your example regfit) has a member called call, which is a list with at least two elements. [[ ]] is the R operator used to take the elements of a list.
For instance:
> a <- list(1:10, 1:5, letters[15:20])
> a[[2]]
[1] 1 2 3 4 5
> a[[3]]
[1] "o" "p" "q" "r" "s" "t"
However
> a[[5]] # This does not work, as a only has three elements
Error in a[[5]] : subscript out of bounds
You should not check ?call but rather the help for the function that generates object, in your case regsubsets.
As you can see from ?regsubsets, or by using str(regfit), that function does not return an object with a member named call.
To get the formula from a regsubsets object you need to look at the obj member of the summary.
For instance you could use:
sm <- summary(regfit)
sm$obj$call

Your mistake is in the function abc. The argument of predict.regsubsets is called newdata, but you refer to it as data.

The object is probably the output of a previous analysis (check the source you copied the code from to see which function produced it). The line form = as.formula(object$call[[2]]) extracts the formula used to create the object and stores it in form. The next lines use it to build the model matrix for the new data and then to predict on the new data.
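To make the role of object$call[[2]] concrete, here is a small sketch using base R's lm on the built-in mtcars data. lm is used here as a stand-in only because its stored call is easy to inspect; this is not a statement about how regsubsets objects are laid out.

```r
# Fit a model whose call is stored in the fitted object
fit <- lm(mpg ~ wt, data = mtcars)

fit$call[[1]]  # first element of the call: the function name
fit$call[[2]]  # second element: the first argument, here the formula

# Same steps as predict.regsubsets, applied to the lm fit
form <- as.formula(fit$call[[2]])
newdata <- mtcars[1:3, ]
mat <- model.matrix(form, newdata)      # design matrix for the new rows
coefi <- coef(fit)
pred <- mat[, names(coefi)] %*% coefi   # manual prediction

# The manual result should agree with the built-in predict method
all.equal(as.numeric(pred), as.numeric(predict(fit, newdata)))
```

The approach depends on the formula being the second element of the stored call, so it can break whenever the stored call has a different shape.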

ldply with subset does not see local variable

I have a list of team names (teamNames) and a list of data frames (weekSummaries).
I want to get a list of team summaries by week:
teamSummaries <- llply(teamNames,getTeamSubset)
getTeamSubset = function(teamName){
temp=ldply(weekSummaries,subset,team_name==teamName)
}
However, when I run this I get an error:
>Error in eval(expr, envir, enclos) : object 'teamName' not found
But when I run the command
>ldply(weekSummaries,subset,team_name=="Denver Broncos")
I get a data frame with the information I need for one team. Can somebody point out what I'm doing wrong?

It looks like the answer is not to use the subset function but a custom function instead: pass it the data frame and subset with bracket notation, like this:
# Define the helper before it is used
getTeamSubset = function(teamName){
ldply(weekSummaries,function(week){
week[week$team_name==teamName,]
})
}
teamSummaries <- llply(teamNames,getTeamSubset)
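The bracket-notation approach can be sketched without plyr as well. The version below swaps in base R's lapply and do.call(rbind, ...) for llply and ldply, and the toy weekSummaries and teamNames are made-up stand-ins for the question's data.

```r
# Toy stand-ins for the question's objects (assumed data)
weekSummaries <- list(
  data.frame(team_name = c("Denver Broncos", "Seattle Seahawks"),
             points = c(21, 17)),
  data.frame(team_name = c("Denver Broncos", "Seattle Seahawks"),
             points = c(14, 28))
)
teamNames <- c("Denver Broncos", "Seattle Seahawks")

getTeamSubset <- function(teamName) {
  # The anonymous function is a closure, so teamName is found through
  # ordinary lexical scoping rather than non-standard evaluation.
  rows <- lapply(weekSummaries, function(week) {
    week[week$team_name == teamName, , drop = FALSE]
  })
  do.call(rbind, rows)  # stack the per-week rows into one data frame
}

teamSummaries <- lapply(teamNames, getTeamSubset)
names(teamSummaries) <- teamNames
teamSummaries[["Denver Broncos"]]$points
```

Plain logical indexing avoids the non-standard evaluation that subset performs on its condition, which is the usual reason to prefer it inside functions.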

Error in eval(expr, envir, enclos) : object not found

I cannot understand what is going wrong here.
data.train <- read.table("Assign2.WineComplete.csv",sep=",",header=T)
# Building decision tree
Train <- data.frame(residual.sugar=data.train$residual.sugar,
total.sulfur.dioxide=data.train$total.sulfur.dioxide,
alcohol=data.train$alcohol,
quality=data.train$quality)
Pre <- as.formula("pre ~ quality")
fit <- rpart(Pre, method="class",data=Train)
I am getting the following error:
Error in eval(expr, envir, enclos) : object 'pre' not found

Don't know why @Janos deleted his answer, but it's correct: your data frame Train doesn't have a column named pre. When you pass a formula and a data frame to a model-fitting function, the names in the formula have to refer to columns in the data frame. Your Train has columns called residual.sugar, total.sulfur.dioxide, alcohol and quality. You need to change either your formula or your data frame so they're consistent with each other.
And just to clarify: Pre is an object containing a formula. That formula contains a reference to the variable pre. It's the latter that has to be consistent with the data frame.
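One way to spot this kind of mismatch before fitting is to compare the variable names in the formula with the column names of the data frame. A minimal sketch with made-up values standing in for the wine data:

```r
# Toy stand-in for Train (assumed values, same column names as the question)
Train <- data.frame(residual.sugar = c(1.9, 2.6, 2.3),
                    total.sulfur.dioxide = c(34, 67, 54),
                    alcohol = c(9.4, 9.8, 9.8),
                    quality = c(5, 5, 6))

Pre <- as.formula("pre ~ quality")

# all.vars() lists the variable names used in the formula;
# setdiff() keeps those that are not columns of Train.
missing_vars <- setdiff(all.vars(Pre), names(Train))
missing_vars
```

Any name reported here has to be added to the data frame or removed from the formula. Note that a formula using the "." shorthand would also list "." itself, so this check is only a rough first pass.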

This can happen if you don't attach your dataset.

I think I got what I was looking for:
data.train <- read.table("Assign2.WineComplete.csv",sep=",",header=T)
fit <- rpart(quality ~ ., method="class",data=data.train)
plot(fit)
text(fit, use.n=TRUE)
summary(fit)

I used
colnames(train) = paste("A", colnames(train))
and ran into the same problem as you.
I finally figured out that randomForest is stricter than rpart: it cannot handle column names containing spaces, commas, or other special punctuation.
By default, paste joins "A" and each column name with a space (" ") as the separator.
So we need to avoid the space and use this statement instead:
colnames(train) = paste("A", colnames(train), sep = "")
This prepends the string without a space.
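The difference between the two paste calls is easy to check on a plain character vector. The column names below are made up for illustration.

```r
cols <- c("x1", "x2")  # hypothetical column names

with_space <- paste("A", cols)            # default separator is a space
no_space   <- paste("A", cols, sep = "")  # no separator
same_thing <- paste0("A", cols)           # paste0 is shorthand for sep = ""

with_space
no_space
identical(no_space, same_thing)
```

If the original names themselves contain spaces or punctuation, base R's make.names() can be used to convert them to syntactically valid names.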

Creating models on subsets with data.table inside a function

Using data.table, I am trying to write a function that takes a data table, a formula object, and a string as arguments, and creates and stores multiple model objects.
myData <- data.table(c("A","A","A","B","B","B"),c(1,2,1,4,5,5),c(1,1,2,5,6,4))
## This works.
ModelsbyV1 <- myData[,list(model=list(lm(V2~V3))),by=V1]
##This does not.
SectRegress <- function (df,eq,sectors) {
Output <- df[,list(model=list(lm(eq))),
by=sectors]
return(Output)
}
Test <- SectRegress(myData,formula(V2~V3),sectors="V1")
##Error in eval(expr, envir, enclos) : object 'X' not found
I have tried attaching the df inside the function, but that removes the ability to group by type. colnames(df) inside the function includes "X". I'm stumped.

You have to evaluate it with .SD as the data (otherwise lm cannot "see" V2 and V3):
SectRegress <- function (df,eq,sectors) {
Output <- df[, list(model=list(lm(eq, .SD))), by=sectors]
return(Output)
}
Test <- SectRegress(myData,formula(V2~V3),sectors="V1")
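The same principle, passing the per-group data to lm explicitly, can be sketched in base R with split and lapply. This is a base R analogue for illustration, not the data.table solution above, and the function name SectRegressBase is hypothetical.

```r
# Same toy data as the question, as a plain data frame with explicit names
myData <- data.frame(V1 = c("A", "A", "A", "B", "B", "B"),
                     V2 = c(1, 2, 1, 4, 5, 5),
                     V3 = c(1, 1, 2, 5, 6, 4))

# Hypothetical base R counterpart of SectRegress
SectRegressBase <- function(df, eq, sectors) {
  groups <- split(df, df[[sectors]])  # one data frame per group
  # Passing data = d plays the role that .SD plays in data.table:
  # it tells lm where to look up the variables named in eq.
  lapply(groups, function(d) lm(eq, data = d))
}

models <- SectRegressBase(myData, V2 ~ V3, sectors = "V1")
names(models)
class(models[["A"]])
```

Each element of models is an ordinary lm fit, so coef() or summary() can be applied to it per group.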
