What does "invalid type (closure) for variable 'variable1'" mean and how do I fix it? - r

I am trying to write a function in R that wraps a function from another package. The code works perfectly outside the function.
I am guessing it might have something to do with the package I am using (survey).
Here is a self-contained code example:
# activating the packages (read.spss() comes from the foreign package)
library(foreign)
library(survey)
# reading the dataset into R
tm <- read.spss("tm.sav", to.data.frame = TRUE, max.value.labels = 5)
# creating the svydesign object; it contains the weights used to adjust the variables (persgew is a column in the tm dataset)
tm_w <- svydesign(ids=~0, weights = ~persgew, data = tm)
# overview of the welle variable (part of the tm dataset, needed for the following steps)
table(tm$welle)
# data manipulation: cross-tabulate v12d_gr and welle via the svydesign object, then turn the result into a data frame that can be passed to ggplot
t <- svytable(~v12d_gr+welle, tm_w)
tt <- round(prop.table(t,2)*100, digits=0)
v12d <- tt[2,]
v12d <- as.data.frame(v12d)
This is the code outside the function, and it works perfectly. Since I have to transform quite a few variables in exactly the same way, I want to create a function to save some time.
The following function is supposed to take the variable to be transformed as an argument (v12sd2_gr).
# making sure the survey design object exists
tm_w <- svydesign(ids=~0, weights = ~persgew, data = tm)
#trying to write a function containing the code from above
ltd_zsw <- function(variable1){
  t <- svytable(~variable1+welle, tm_w)
  tt <- round(prop.table(t,2)*100, digits=0)
  var_ltd_zsw <- tt[2,]
  var_ltd_zsw <- as.data.frame(var_ltd_zsw)
  return(var_ltd_zsw)
}
Calling the function:
#as v12d has been altered already, I am trying to transform another variable v12sd2_gr
v12sd2 <- ltd_zsw(v12sd2_gr)
Console output:
Error in model.frame.default(formula = weights ~ variable1 + welle, data = model.frame(design)) :
invalid type (closure) for variable 'variable1'
Called from: model.frame.default(formula = weights ~ variable1 + welle, data = model.frame(design))
How do I fix it? And what does it mean to dynamically build a formula, for example with reformulate()?
PS: I hope this is the appropriate way to respond to the feedback in the comments.
Update: I think I was able to trace the problem back to the argument I am passing (variable1), and I am guessing it has something to do with the fact that I build a formula inside the function. But when I wrap the call as as.formula(svytable(~variable1+welle, tm_w)) it still doesn't work.
What should I do?

I have found a solution to the problem.
Here is the tested and working function:
ltd_test <- function(var, x, string1="con", string2="pro") {
  print(table(var))
  x$w12d_gr <- ifelse(as.numeric(var)>2, 1, 0)
  x$w12d_gr <- factor(x$w12d_gr, levels = c(0,1), labels = c(string1, string2))
  print(table(x$w12d_gr))
  x_w <- svydesign(ids=~0, weights = ~persgew, data = x)
  t <- svytable(~w12d_gr+welle, x_w)
  tt <- round(prop.table(t,2)*100, digits=0)
  w12d <- tt[2,]
  w12d <- as.data.frame(w12d)
}
The problem appeared to be caused by the svydesign() function. It produces an object that is then used in the formula passed to svytable(). That is why it is imperative to first create the x_w object with svydesign() and only then call svytable() to create the t object.
In the code snippet I originally posted in the question, the tm_w object had been created and stored globally.
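For completeness, the original error can also be avoided by building the formula dynamically from the variable name, which is what the comments meant by reformulate(). A minimal sketch (an illustration, assuming the variable name is passed as a string and tm_w exists as above):
ltd_zsw2 <- function(varname, design = tm_w) {
  # reformulate() turns c("v12sd2_gr", "welle") into ~v12sd2_gr + welle
  f <- reformulate(c(varname, "welle"))
  t <- svytable(f, design)
  tt <- round(prop.table(t, 2) * 100, digits = 0)
  as.data.frame(tt[2, ])
}
# called with the variable name quoted:
# v12sd2 <- ltd_zsw2("v12sd2_gr")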
Thanks to everyone for the help. I hope this will be of use to someone one day!

Related

R nlstools Regression, preview function doesn't take variables

I'm quite new to R but wanted to use nls together with the nlstools package, since it has nice tools for analysis and evaluation.
The code I use is:
# preview() comes from the nlstools package
library(nlstools)
conB1_2015 <- read.csv("C:\\Path_to_File\\conB1_2015.csv")
conB1_2015 <- na.omit(conB1_2015)
tRef <- mean(conB1_2015$Mean_Soil_Temp_V2..C., na.rm=TRUE)
rRef <- conB1_2015$Lin_Flux..mymol.m.2.s.1.[which.min(abs(conB1_2015$Mean_Soil_Temp_V2..C.-tRef))]
rMax <- max(conB1_2015$Lin_Flux..mymol.m.2.s.1., na.rm=TRUE)
half <- rMax/2
half_SM <- conB1_2015$Soil_Moist_V3[which.min(abs(conB1_2015$Lin_Flux..mymol.m.2.s.1.-half))]
form <- as.formula(Lin_Flux..mymol.m.2.s.1. ~ (rRef)*a*exp(b*Mean_Soil_Temp_V2..C.)*Soil_Moist_V3/(half_SM)+Soil_Moist_V3)
preview(form, data = conB1_2015, start = c(a = -1.98, b = -0.05), variable = 1)
The problem is that I get this error when running the code:
Error in data.frame(value, row.names = rn, check.names = FALSE) :
row names supplied are of the wrong length
When I change the variables in
form <- as.formula(Lin_Flux..mymol.m.2.s.1. ~ (rRef)*a*exp(b*Mean_Soil_Temp_V2..C.)*Soil_Moist_V3/(half_SM)+Soil_Moist_V3)
to
form <- as.formula(Lin_Flux..mymol.m.2.s.1.~(rRef<-4.41)*a*exp(b*Mean_Soil_Temp_V2..C.)*Soil_Moist_V3/(half_SM<-7.19)+Soil_Moist_V3)
the function works fine.
I wanted to automate the script to run over several csv's to test different models on different data. Is it really not possible to pass variables into the preview function, or am I missing something? There can't be a problem with the headers or the data table, since it works fine in the second example.
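Since hard-coding the numbers into the formula works, one possible workaround (an assumption on my part, not taken from the original thread) is to splice the computed values into the formula before handing it to preview(), for example with bquote():
# hedged sketch: inline the computed scalars so preview() does not have to
# resolve rRef and half_SM outside the data frame
form <- eval(bquote(
  Lin_Flux..mymol.m.2.s.1. ~ .(rRef) * a * exp(b * Mean_Soil_Temp_V2..C.) *
    Soil_Moist_V3 / .(half_SM) + Soil_Moist_V3
))
preview(form, data = conB1_2015, start = c(a = -1.98, b = -0.05), variable = 1)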

R Passing linear model to another function inside a function

I am trying to find the optimal "lambda" parameter for the Box-Cox transformation.
I am using the implementation from the MASS package, so I only need to create the model and extract the lambda.
Here is the code for the function:
library(MASS)
find_lambda <- function(x) {
  # Function to find the best lambda for the Box-Cox transform
  my_tmp <- data.frame(x = x) # Create a temporary data frame, to use it with the lm
  str(my_tmp) # Gives the expected output
  the_lm <- lm(x ~ 1, data = my_tmp) # Creates the linear model, no error here
  print(summary(the_lm)) # Prints the summary, as expected
  out <- boxcox(the_lm, plotit=FALSE) # Gives the error
  best_lambda <- out$x[which.max(out$y)] # Extracting the best fitting lambda
  return(best_lambda)
}
find_lambda(runif(100))
It gives the following error:
Error in is.data.frame(data) : object 'my_tmp' not found
The interesting thing is that the very same code is working outside the function. In other words, for some reason, the boxcox function from the MASS package is looking for the variable in the global environment.
I don't really understand what exactly is going on... Do you have any ideas?
P.S. I do not provide a software/hardware specification, since this error was successfully replicated on a number of my friends' laptops.
P.P.S. I have found the way to solve the initial problem in the forecast package, but I still would like to know, why this code is not working.
User-contributed packages don't always do a great job of tracking the environments where calls were executed when they manipulate function calls. The quickest fix for you would be to change the line from
the_lm <- lm(x ~ 1, data = my_tmp)
to
the_lm <- lm(x ~ 1, data = my_tmp, y=TRUE, qr=TRUE)
If y and qr are not requested in the lm call, the boxcox function tries to re-run lm with those parameters via an update call, and things get mucked up inside the function's scope.
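Put together, a sketch of the question's function with that one change (keeping y and qr in the fitted object so boxcox() can reuse it without calling update()):
find_lambda <- function(x) {
  my_tmp <- data.frame(x = x)
  # keeping y and qr in the fit means boxcox() does not need to rebuild the model
  the_lm <- lm(x ~ 1, data = my_tmp, y = TRUE, qr = TRUE)
  out <- boxcox(the_lm, plotit = FALSE)
  out$x[which.max(out$y)] # best fitting lambda
}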
Why not let boxcox do the fitting?
find_lambda <- function(x) {
  # Function to find the best lambda for the Box-Cox transform
  my_tmp <- data.frame(x = x) # Create a temporary data frame
  out <- boxcox(x ~ 1, data = my_tmp, plotit=FALSE) # boxcox fits the model itself, no error
  best_lambda <- out$x[which.max(out$y)] # Extracting the best fitting lambda
  return(best_lambda)
}
I think your scoping issue is with update.default, which calls eval(call, parent.frame()), and my_tmp doesn't exist in the frame where boxcox runs. Please correct me if I'm wrong on this.
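A quick way to see the same failure outside of boxcox (a hedged illustration with made-up function names f and g, not code from the thread):
f <- function() {
  d <- data.frame(y = rnorm(10))
  m <- lm(y ~ 1, data = d)
  g(m)
}
# update() re-evaluates the original lm call in g's frame, where 'd' does not exist
g <- function(m) update(m, qr = TRUE)
# f() # errors: object 'd' not found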
boxcox cannot find your data. This may be because of a scoping issue.
You can feed the data into the boxcox function directly:
find_lambda <- function(x) {
  # Function to find the best lambda for the Box-Cox transform
  my_tmp <- data.frame(x = x) # Create a temporary data frame, to use it with the lm
  str(my_tmp) # Gives the expected output
  the_lm <- lm(x ~ 1, data = my_tmp) # Creates the linear model, no error here
  print(summary(the_lm)) # Prints the summary, as expected
  out <- boxcox(the_lm, plotit=FALSE, data = my_tmp) # feed data in here
  best_lambda <- out$x[which.max(out$y)] # Extracting the best fitting lambda
  return(best_lambda)
}
find_lambda(runif(100))

bic.glm predict error: "newdata is missing variables"

I've spent a lot of time trying to solve this error and searching for solutions without any luck, and I thank you in advance for your help.
I'm trying to create predicted values from the coefficients created via BMA. Whenever I run my predict function, I am getting a "newdata is missing variables" error. All variables included in the original model are present in the new dataframe, so I'm not quite sure what the problem is. I'm working with a fairly large dataset with many independent variables. I'm fairly new to R, so I apologize if this is an obvious question!
y<-df$y
x<-df
x$y<-NULL
bic.glm<-bic.glm(x, y, strict=FALSE, OR=20, glm.family="binomial", factortype=TRUE)
predict(bic.glm.bwt, x)
I've also tried it this way:
bic.glm<-bic.glm(y~., data=df, strict=FALSE, OR=20, glm.family="binomial", factortype=TRUE)
predict(bic.glm, x)
And also with creating a new data frame...
bic.glm<-bic.glm(y~., data=df, strict=FALSE, OR=20, glm.family="binomial", factortype=TRUE)
newdata<-x
predict(bic.glm, newdata=x)
Each time I receive the same error message:
Error in predict.bic.glm(bic.glm, newdata=x) :
newdata is missing variables
Any help is very much appreciated!
First, it is bad practice to give the result the same name as the function you are calling. You may be masking the function bic.glm from further use.
That minor comment aside... I just encountered the same error. After some digging, it seems that predict.bic.glm checks the names against the mle matrix in the bic.glm object. The problem is that somewhere in bic.glm, if factors are used, those names get a '.x' or just '.' appended at the end. Therefore, whenever you use factors you will get this error.
I communicated this to the package maintainers. Meanwhile, you can work around the bug by renaming the column names of the mle object, like this (using your example):
fittedBMA<-bic.glm(y~., data=df)
colnames(fittedBMA$mle)=colnames(model.matrix(y~., data=df)) ### this is the workaround
predict(fittedBMA,newdata=x) ### should work now, if x has the same variables as df
Okay, so first look at the usage section in the CRAN documentation for BMA::bic.glm.
The following example from there is instructive for a data.frame.
Example 2 (binomial)
library(MASS)
data(birthwt)
y <- birthwt$lo
x <- data.frame(birthwt[,-1])
x$race <- as.factor(x$race)
x$ht <- (x$ht>=1)+0
x <- x[,-9]
x$smoke <- as.factor(x$smoke)
x$ptl <- as.factor(x$ptl)
x$ht <- as.factor(x$ht)
x$ui <- as.factor(x$ui)
bic.glm.bwT <- bic.glm(x, y, strict = FALSE, OR = 20,
glm.family="binomial",
factor.type=TRUE)
predict( bic.glm.bwT, newdata = x)
bic.glm.bwF <- bic.glm(x, y, strict = FALSE, OR = 20,
glm.family="binomial",
factor.type=FALSE)
predict( bic.glm.bwF, newdata = x)

predict in caret ConfusionMatrix is removing rows

I'm fairly new to using the caret library and it's causing me some problems. Any help/advice would be appreciated. My situation is as follows:
I'm trying to run a general linear model on some data and, when I run it
through the confusionMatrix, I get 'the data and reference factors must have
the same number of levels'. I know what this error means (I've run into it before), but I've double and triple checked my data manipulation and it all looks correct (I'm using the right variables in the right places), so I'm not sure why the two values in the confusionMatrix are disagreeing. I've run almost the exact same code for a different variable and it works fine.
I went through every variable and everything was balanced until I got to the
confusionMatrix predict. I discovered this by doing the following:
a <- table(testing2$hold1yes0no)
a[1]+a[2]
1543
b <- table(predict(modelFit,trainTR2))
dim(b)
[1] 1538
Those two values shouldn't disagree. Where are the missing 5 rows?
My code is below:
set.seed(2382)
inTrain2 <- createDataPartition(y=HOLD$hold1yes0no, p = 0.6, list = FALSE)
training2 <- HOLD[inTrain2,]
testing2 <- HOLD[-inTrain2,]
preProc2 <- preProcess(training2[-c(1,2,3,4,5,6,7,8,9)], method="BoxCox")
trainPC2 <- predict(preProc2, training2[-c(1,2,3,4,5,6,7,8,9)])
trainTR2 <- predict(preProc2, testing2[-c(1,2,3,4,5,6,7,8,9)])
modelFit <- train(training2$hold1yes0no ~ ., method ="glm", data = trainPC2)
confusionMatrix(testing2$hold1yes0no, predict(modelFit,trainTR2))
I'm not sure as I don't know your data structure, but I wonder if this is due to the way you set up your modelFit, using the formula method. In this case, you are specifying y = training2$hold1yes0no and x = everything else. Perhaps you should try:
modelFit <- train(trainPC2, training2$hold1yes0no, method="glm")
This specifies y = training2$hold1yes0no and x = trainPC2.
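As for the five missing rows, one hedged guess (not part of the original answer): predict() drops rows that contain missing values under its default na.action, which you could check with something like:
# hypothetical check: count rows of the preprocessed test set that contain NAs
sum(!complete.cases(trainTR2))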

Calling mlogit() from inside another function, scoping problem with variables when using attach

I need to call the mlogit() R function from inside another function.
This is a function for demonstrative purposes:
#-------------------------
# DEMO FUNCTION
#-------------------------
# f = formula (string)
# fData = data.frame
# cVar = choice variable (string)
# optVar = alternative variable (string)
##########################
mlogitSum <- function(f, fData, cVar="choice", optVar="option"){
  library(mlogit)
  r2 <- mlogit(as.formula(f), shape = "long", data = fData, alt.var=optVar, choice = cVar)
  return(summary(r2))
}
Apparently there is an environment problem: variables that are not declared globally are not found by mlogit() when they are passed as arguments.
This example doesn't work:
mydata <- read.csv(url("http://www.ats.ucla.edu/stat/r/dae/mlogit.csv"))
attach(mydata)
library(mlogit)
mydata$brand<-as.factor(mydata$brand)
mlData<-mlogit.data(mydata, varying=NULL, choice="brand", shape="wide")
myFormula <-"brand~1|female+age"
var1 <- "brand"
var2 <- "alt"
mlogitSum(myFormula, fData = mlData, var1, var2)
Whereas if the variables are assigned in the global environment, it works:
mydata <- read.csv(url("http://www.ats.ucla.edu/stat/r/dae/mlogit.csv"))
attach(mydata)
library(mlogit)
mydata$brand<-as.factor(mydata$brand)
fData<-mlogit.data(mydata, varying=NULL, choice="brand", shape="wide")
myFormula <-"brand~1|female+age"
cVar <- "brand"
optVar <- "alt"
mlogitSum(myFormula, fData, cVar, optVar)
Alternatively, it works if I assign the variables globally from inside the function:
#-------------------------
# DEMO FUNCTION
#-------------------------
# f = formula (string)
# fData = data.frame
# cVar = choice variable (string)
# optVar = alternative variable (string)
##########################
mlogitSum_rev <- function(f, fData, cVar="choice", optVar="option"){
  fData <<- fData
  cVar <<- cVar
  optVar <<- optVar
  #return(head(lcmData))
  library(mlogit)
  # needed later to extract model.matrix(r2); otherwise this would be redundant
  r2 <- mlogit(as.formula(f), shape = "long", data = fData, alt.var=optVar, choice = cVar)
  return(summary(r2))
}
mydata <- read.csv(url("http://www.ats.ucla.edu/stat/r/dae/mlogit.csv"))
attach(mydata)
library(mlogit)
mydata$brand<-as.factor(mydata$brand)
mlData<-mlogit.data(mydata, varying=NULL, choice="brand", shape="wide")
myFormula <-"brand~1|female+age"
var1 <- "brand"
var2 <- "alt"
mlogitSum_rev(myFormula, mlData, var1, var2)
Any idea how to avoid assigning the variables globally?
tl;dr this appears to be a bug in mlogit, which you can fix yourself (see below) or ask the maintainer to fix.
Deep inside mlogit, the function tries to evaluate the data as follows:
nframe <- length(sys.calls()) ## line 11
...
data <- eval(mldata, sys.frame(which = nframe)) ## line 44
This is moderately sophisticated messing about with R's scoping structures -- it's trying to evaluate mldata in the frame one above the current frame, and it will fail if someone does something tricky (but perfectly reasonable!) like call mlogit from within a function.
I solved the problem (sort of!) by running fix(mlogit), which will dump you into an editor and allow you to modify the function. I changed line 44 to
data <- eval(mldata, parent.frame())
after which the code seemed to work.
If this works for you, you can either (1) fix() mlogit every time you need to use it; (2) download a copy of the source (.tar.gz) package, modify it, and install it; or (3) [preferably!] contact the package maintainer, let them know about the issue, and ask them to release a patched version ...
PS depending on your general data analysis protocol, you may want to get out of the habit of using attach: Why is it not advisable to use attach() in R, and what should I use instead?
