"undefined columns selected" in lapply() call

Error in `[.data.frame`(meuse@data, , x) : undefined columns selected
MWE:
library(automap)
data(meuse)
coordinates(meuse) = ~ x+y
lapply(1:1, function (x) {
automap::autofitVariogram(meuse@data[, x] ~ 1, input_data = meuse)
})
Executing meuse@data[, 1] outside the lapply call works fine and returns a numeric vector.
Also, automap::autofitVariogram(meuse@data[, 1] ~ 1, input_data = meuse) runs fine.
Hence I expected the problem to be caused by the lapply call. However, using another dataset of mine (SpPointsDaFr) does not cause the problem and runs fine.
Looking at the error message more closely, I am not sure whether the second comma after "meuse@data," is always present in 'subset' error messages.
Edit:
Another approach that does not work: addressing the column via a string (note that I only use [1:1] instead of [1] for further function use)
cols <- names(meuse@data)[1:1]
lapply(cols, function (x) {
automap::autofitVariogram(meuse@data[, x] ~ 1, input_data = meuse)
})

I found a workaround. Subsetting the required values of meuse before the call to autofitVariogram and then passing the resulting object tmp works.
lapply(1:1, function (x) {
tmp <- meuse@data[, x]
emp.svgm <- automap::autofitVariogram(tmp ~ 1, meuse)
})
Why subsetting inside the function call fails is still open for discussion, though.
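One plausible explanation, not confirmed in the thread, is a name clash. Variables in a formula are looked up in the supplied data first and only then in the environment where the formula was created. If the data exposes a variable named x (meuse has x and y coordinates), the x inside meuse@data[, x] may resolve to that variable instead of the lapply argument. The sketch below reproduces the effect with base R's model.frame(); the data frame d and all helper names are made up for illustration.

```r
# Toy data with a column that happens to be called "x" (hypothetical values)
d <- data.frame(x = c(181072, 181025, 181165), zinc = c(1022, 1141, 640))

# Mirrors the failing pattern: the function argument is also called x,
# but inside the formula `x` is found in the data first
fails <- function(x) model.frame(d[, x] ~ 1, data = d)
msg <- tryCatch(fails(2), error = function(e) conditionMessage(e))
print(msg)

# Workaround from the post: resolve the column before building the formula
works <- function(x) {
  tmp <- d[, x]
  model.frame(tmp ~ 1, data = d)
}
ok <- works(2)

# Alternative: rename the argument so it cannot clash with a data column
renamed <- function(col_idx) model.frame(d[, col_idx] ~ 1, data = d)
ok2 <- renamed(2)
```

This would also fit the observation that a different dataset runs fine, provided that dataset has no variable called x.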

Related

How to use strings from a list as variables in mediate and lm in R?

I am trying to run lots of mediation analyses and to make it quicker I'm trying to put the lm() and mediate() functions inside a for loop. I then pass a list of lists into the loop where each item of the list is a list of three in the form c("", "", "").
Passing the items into the loop and unlisting them to have single strings for X, M and Y variables is fine. I've tried many variations on get(), eval() and assign() within the mediate() function to no avail. I think this is due to my use of get() within lm().
The way I think my code should look:
MedVarList <- list(c('SCI', 'rMEQ', 'SIDAS'))
for(i in MedVarList){
X <- unlist((i)[1])
M <- unlist((i)[2])
Y <- unlist((i)[3])
model.M <- lm(get(M) ~ get(X), data = NewScDat)
model.Y <- lm(get(Y) ~ get(X) + get(M), data = NewScDat)
results <- mediate(model.M, model.Y, treat=get(X), mediator=get(M),
boot=TRUE, sims=500)
}
The model.M and model.Y bits work fine. It's the treat= and mediator= inside mediate() that I simply cannot figure out. I get this error:
Error in get(X) : object 'SCI' not found
If I change the mediate() call to include the variable names directly I get a different error:
results <- mediate(model.M, model.Y, treat='SCI', mediator='rMEQ',
boot=TRUE, sims=500)
Error in `[.data.frame`(m.data, , treat) : undefined columns selected
I then thought that lm() may be using "get(X)" as a variable name instead of "SCI", which is what get(X) spits out initially:
results <- mediate(model.M, model.Y, treat='get(X)', mediator='get(M)',
boot=TRUE, sims=500)
Error in get(M) : object 'rMEQ' not found
And just to test what's going on I looked at what get(X) and get(M) are now spitting out:
get(X)
Error in get(X) : object 'SCI' not found
get(M)
Error in get(M) : object 'rMEQ' not found
What I'm really trying to achieve is to be able to run mediate() inside a loop using a list of lists as described above. I'm doing this to avoid having multiple mediate() functions repeated with manual setup.
Here's my MWE of the successful solution:
library(mediation)
MedVarList <- list(c('SCI', 'rMEQ', 'SIDAS'))
for(i in MedVarList){
X <- unlist((i)[1])
M <- unlist((i)[2])
Y <- unlist((i)[3])
FormulaM <- paste(M,X,sep = " ~ ") # Results in a string "rMEQ ~ SCI"
FormulaY <- paste(Y,"~", X,"+",M,sep=' ') # Results in a string "SIDAS ~ SCI + rMEQ"
model.M <- lm(FormulaM, data = NewScDat)
model.Y <- lm(FormulaY, data = NewScDat)
results <- mediate(model.M, model.Y, treat=X, mediator=M,
boot=TRUE, sims=500)
}
Thanks for the tips and suggestions all. @Parfait - I've included the dput() but could you point me towards an FAQ or similar explaining the reasoning behind this?
EDIT - I understand what dput() is and does now so I've removed it from the MWE because I'd used it inappropriately.
Fuller example including useful recording of results for anyone that needs it:
MedVarList <- list(c('SCI', 'rMEQ', 'SIDAS'))
NBootstraps = 5000
MediationResults <- list()
j <- 1
for(i in MedVarList){
X <- unlist((i)[1])
M <- unlist((i)[2])
Y <- unlist((i)[3])
FormulaM <- paste(M,X,sep = " ~ ")
FormulaY <- paste(Y,"~", X,"+",M,sep=' ')
model.M <- lm(FormulaM, data = NewScDat)
model.Y <- lm(FormulaY, data = NewScDat)
MediationResults[[j]] <- summary(mediate(model.M, model.Y, treat=X, mediator=M,
boot=TRUE, sims=NBootstraps))
j <- j + 1
}
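A minimal sketch of why building the formula from strings helps, using only base R and the built-in mtcars data as a stand-in (the column names wt, hp and mpg are placeholders for SCI, rMEQ and SIDAS). With get() inside the formula, the model frame columns are literally named "get(M)" and "get(X)", so a later lookup by the real variable name cannot succeed. reformulate() is used here as an alternative to paste(); both produce ordinary formulas.

```r
dat <- mtcars
X <- "wt"; M <- "hp"; Y <- "mpg"   # placeholder treatment, mediator, outcome

# get() in the formula: the fit works, but the stored column names are useless
bad <- lm(get(M) ~ get(X), data = dat)
bad_names <- names(model.frame(bad))
print(bad_names)

# Formulas built from the strings keep the real variable names
form_m <- reformulate(X, response = M)         # hp ~ wt
form_y <- reformulate(c(X, M), response = Y)   # mpg ~ wt + hp
model_m <- lm(form_m, data = dat)
model_y <- lm(form_y, data = dat)
good_names <- names(model.frame(model_y))
print(good_names)
```

With models fitted this way, the strings in X and M match column names in the model frame, which is what treat= and mediator= appear to need.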

How to use lapply or a family of the apply function for calling a function within a function in R?

I have a parent function (hrat) that calls a sister function (drat) within it. I would like to apply this function over a vector. The code below demonstrates my logic, and I get the following error message.
Code:
drat <- function(y){
x <- y * 5
return(x)
}
hrat <- function(z, j, drat){
y <- z +1
w <- drat(y) + j
return(w)
}
z <- c(1:5)
j <- 4
result <- lapply(z,j, function(x) hrat(x, drat(x)))
ERROR MESSAGE:
Error in get(as.character(FUN), mode = "function", envir = envir) :
object 'j' of mode 'function' was not found
Any help will be appreciated. Thank you
To avoid confusion, it is better to wrap the call in an anonymous function and pass j and drat explicitly:
lapply(z, function(x) hrat(x, j, drat))
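As a sketch of an alternative, extra arguments can be handed to the applied function through the ... argument of lapply(), which avoids the anonymous function altogether. The original error arises because the second positional argument of lapply() is FUN, so j was being treated as the function to apply. The functions are the ones from the question; result_vec is an added name for illustration.

```r
drat <- function(y) y * 5

hrat <- function(z, j, drat) {
  y <- z + 1
  drat(y) + j
}

z <- 1:5
j <- 4

# Named extra arguments are forwarded to hrat() for every element of z
result <- lapply(z, hrat, j = j, drat = drat)

# vapply() does the same but returns a plain numeric vector
result_vec <- vapply(z, hrat, numeric(1), j = j, drat = drat)
print(result_vec)
```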

R: eval parse function call not accessing correct environments

I'm trying to read a function call as a string and evaluate this function within another function. I'm using eval(parse(text = )) to evaluate the string. The function I'm calling in the string doesn't seem to have access to the environment in which it is nested. In the code below, my "isgreater" function finds the object y, defined in the global environment, but can't find the object x, defined within the function. Does anybody know why, and how to get around this? I have already tried adding the argument envir = .GlobalEnv to both of my evals, to no avail.
str <- "isgreater(y)"
isgreater <- function(y) {
return(eval(y > x))
}
y <- 4
test <- function() {
x <- 3
return(eval(parse(text = str)))
}
test()
Error:
Error in eval(y > x) : object 'x' not found
Thanks to @MrFlick and @r2evans for their useful and thought-provoking comments. As for a solution, I've found that this code works. x must be passed into the function and cannot be a default value. In the code below, my function generates a list of results with the x variable being changed within the function. If anyone knows why this is, I would love to know.
str <- "isgreater(y, x)"
isgreater <- function(y, x) {
return(eval(y > x))
}
y <- 50
test <- function() {
list <- list()
for(i in 1:100) {
x <- i
bool <- eval(parse(text = str))
list <- append(list, bool)
}
return(list)
}
test()
After considering the points made by @r2evans, I have elected to change my approach to the problem so that I do not arrive at this string-parsing step. Thanks a lot, everyone.
I offer the following code, not as a solution, but rather as an insight into how R "works". The code does things that are quite dangerous and should only be examined for its demonstration of how to assert a value for x. Unfortunately, that assertion does destroy the x-value of 3 inside the isgreater-function:
str <- "isgreater(y)"
isgreater <- function(y) {
return(eval( y > x ))
}
y <- 4
test <- function() {
environment(isgreater)$x <- 5
return(eval(parse(text = str) ))
}
test()
#[1] FALSE
The environment<- function is used in the R6 programming paradigm. Take a look at ?R6 if you are interested in working with a more object-oriented set of structures and syntax. (I will note that when I first ran your code, there was an object named x in my workspace and some of my efforts were able to succeed to the extent of not throwing an error, but they were finding that length-10000 vector and filling up my console with logical results until I escaped the console. Yet another argument for passing both x and y to isgreater.)
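The behaviour in the question follows from lexical scoping: a function resolves free variables in the environment where it was defined, not in the environment of whoever calls it. The sketch below illustrates this with made-up names (is_greater, caller, threshold) chosen so that they are unlikely to clash with objects already in a workspace.

```r
# `threshold` is a free variable: it is looked up where is_greater was
# defined, not in the frame of the function that calls it
is_greater <- function(y) y > threshold

caller <- function() {
  threshold <- 3   # local to caller(), invisible to is_greater()
  tryCatch(is_greater(4), error = function(e) conditionMessage(e))
}
msg <- caller()
print(msg)

# Passing the value explicitly removes the dependence on scoping rules
is_greater2 <- function(y, threshold) y > threshold

caller2 <- function() {
  threshold <- 3
  is_greater2(4, threshold)
}
res <- caller2()
print(res)
```

This is also why y was found in the question: it lives in the global environment, which is on the lookup path of a function defined at top level.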

How to debug "invalid subscript type 'list'" error in R (genalg package)

I am new to genetic algorithms and am trying a simple variable selection code based on the example on genalg package's documentation:
data(iris)
library(MASS)
X <- cbind(scale(iris[,1:4]), matrix(rnorm(36*150), 150, 36))
Y <- iris[,5]
iris.evaluate <- function(indices) {
result = 1
if (sum(indices) > 2) {
huhn <- lda(X[,indices==1], Y, CV=TRUE)$posterior
result = sum(Y != dimnames(huhn)[[2]][apply(huhn, 1,
function(x)
which(x == max(x)))]) / length(Y)
}
result
}
monitor <- function(obj) {
minEval = min(obj$evaluations);
plot(obj, type="hist");
}
woppa <- rbga.bin(size=40, mutationChance=0.05, zeroToOneRatio=10,
evalFunc=iris.evaluate, verbose=TRUE, monitorFunc=monitor)
The code works just fine on its own, but when I try to apply my dataset (here), I get the following error:
X <- reducedScaledTrain[,-c(541,542)]
Y <- reducedScaledTrain[,542]
ga <- rbga.bin(size=540, mutationChance=0.05, zeroToOneRatio=10,
evalFunc=iris.evaluate, verbose=TRUE, monitorFunc=monitor)
Testing the sanity of parameters...
Not showing GA settings...
Starting with random values in the given domains...
Starting iteration 1
Calucating evaluation values... Error in dimnames(huhn)[[2]][apply(huhn, 1, function(x) which(x == max(x)))] :
invalid subscript type 'list'
I am trying to perform feature selection on 540 variables (I've eliminated the variables with 100% correlation) using LDA. I've tried transforming my data into numeric or list, but to no avail. I have also tried entering the line piece by piece, and the 'huhn' line works just fine with my data. Please help, I might be missing something...
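One possible cause, offered as an assumption since the thread has no confirmed answer: apply() returns a list whenever the applied function returns results of different lengths. which(x == max(x)) returns two indices for a row with tied maxima and none for a row containing NaN, and a list cannot be used as a subscript. The sketch below uses a small made-up posterior matrix to show the effect and a which.max() variant that always returns exactly one index per row.

```r
# Hypothetical posterior matrix; row 2 has a tie between the two classes
post <- matrix(c(0.7, 0.3,
                 0.5, 0.5,
                 0.2, 0.8),
               ncol = 2, byrow = TRUE,
               dimnames = list(NULL, c("a", "b")))

# Results have lengths 1, 2, 1, so apply() cannot simplify and returns a list
idx_list <- apply(post, 1, function(p) which(p == max(p)))
msg <- tryCatch(colnames(post)[idx_list], error = function(e) conditionMessage(e))
print(msg)

# which.max() returns the first maximum only, so the result is a plain vector
idx <- apply(post, 1, which.max)
pred <- colnames(post)[idx]
print(pred)
```

Checking is.list() on the apply() result for the real data, or looking for NaN in the posterior, would confirm or rule this out.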

I do not understand error "object not found" inside the function

I have roughly this function:
plot_pca_models <- function(models, id) {
library(lattice)
splom(models, groups=id)
}
and I'm calling it like this:
plot_pca_models(data.pca, log$id)
which results in this error:
Error in eval(expr, envir, enclos) : object 'id' not found
when I call it without the wrapping function:
splom(data.pca, groups=log$id)
it raises this error:
Error in log$id : object of type 'special' is not subsettable
but when I do this:
id <- log$id
splom(models, groups=id)
it behaves as expected.
Please can anybody explain why it behaves like this and how to correct it? Thanks.
btw:
I'm aware of similar questions here, eg:
Help understand the error in a function I defined in R
Object not found error with ddply inside a function
Object disappears from namespace in function
but none of them helped me.
edit:
As requested, here is the full "plot_pca_models" function:
plot_pca_models <- function(data, id, sel=c(1:4), comp=1) {
# 'data' ... princomp objects
# 'id' ... list of samples id (classes)
# 'sel' ... list of models to compare
# 'comp' ... which pca component to compare
library(lattice)
models <- c()
models.size <- 1:length(data)
for(model in models.size) {
models <- c(models, list(data[[model]]$scores[,comp]))
}
names(models) <- 1:length(data)
models <- do.call(cbind, models[sel])
splom(models, groups=id)
}
edit2:
I've managed to make the problem reproducible.
require(lattice)
my.data <- data.frame(pca1 = rnorm(100), pca2 = rnorm(100), pca3 = rnorm(100))
my.id <- data.frame(id = sample(letters[1:4], 100, replace = TRUE))
plot_pca_models2 <- function(x, ajdi) {
splom(x, group = ajdi)
}
plot_pca_models2(x = my.data, ajdi = my.id$id)
which produces the same error as above.
The problem is that splom evaluates its groups argument in a nonstandard way. A quick fix is to rewrite your function so that it constructs the call with the appropriate syntax:
f <- function(data, id)
eval(substitute(splom(data, groups=.id), list(.id=id)))
# test it
ir <- iris[-5]
sp <- iris[, 5]
f(ir, sp)
log is a function in base R. Good practice is to not name objects after functions...it can create confusion. Type log$test into a clean R session and you'll see what's happening:
object of type 'special' is not subsettable
Here's a modification of Hong Ooi's answer. First, I would recommend including id in the main data frame, i.e.
my.data <- data.frame(pca1 = rnorm(100), pca2 = rnorm(100), pca3 = rnorm(100), id = sample(letters[1:4], 100, replace = TRUE))
.. and then
plot_pca_models2 <- function(x, ajdi) {
Call <- bquote(splom(x, group = x[[.(ajdi)]]))
eval(Call)
}
plot_pca_models2(x = my.data, ajdi = "id")
The cause of the confusion is the following line in lattice:::splom.formula:
groups <- eval(substitute(groups), data, environment(formula))
... whose only point is to be able to specify groups without quotation marks, that is,
# instead of
splom(DATA, groups="ID")
# you can now be much shorter, thanks to eval and substitute:
splom(DATA, groups=ID)
But of course, this makes splom (and other functions, e.g. substitute, which use "nonstandard evaluation") harder to use from within other functions, and goes against the philosophy that is "mostly" followed in the rest of R.
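The mechanism can be sketched without lattice. toy_plot below is a hypothetical stand-in, not lattice code: it evaluates its groups argument in the data and then in its own enclosing environment, never in the caller's frame, which is roughly the situation described for splom. A wrapper that forwards its own argument name then fails, while a wrapper that injects the value into the call with bquote() works.

```r
# Hypothetical stand-in for a function with a nonstandard `groups` argument
toy_plot <- function(data, groups) {
  enclos <- parent.env(environment())   # where toy_plot was defined
  eval(substitute(groups), data, enclos)
}

d <- data.frame(v = 1:3, g = c("a", "b", "a"))

# Direct use: the bare column name is found in the data
direct <- toy_plot(d, groups = g)

# Naive wrapper: the symbol `grp_arg` exists only in the wrapper's frame
wrapper <- function(d, grp_arg) toy_plot(d, groups = grp_arg)
err <- tryCatch(wrapper(d, c("u", "v", "u")), error = function(e) conditionMessage(e))
print(err)

# Fixed wrapper: .() splices the value itself into the call before evaluation
wrapper_fixed <- function(d, grp_arg) eval(bquote(toy_plot(d, groups = .(grp_arg))))
fixed <- wrapper_fixed(d, c("u", "v", "u"))
print(fixed)
```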
