I'm attempting to use the bnlearn package to calculate conditional probabilities, and I'm running into a problem when the "cpquery" function is used within a loop. I've created an example, shown below, using data included with the package. When using the cpquery function in a loop, a variable created in the loop ("evi" in the example) is not recognized by the function. I receive the error:
Error in parse(text = evi) : object 'evi' not found
The creation steps of "evi" are based on examples provided by the author.
Any help you could provide would be great. I'm desperate to find a way that I can apply the cpquery function for a large number of observations.
library(bnlearn)
data(learning.test)
fitted = bn.fit(hc(learning.test), learning.test)
bn.function <- function(network, evidence_data) {
a <- NULL
b <- nrow(evidence_data)
for (i in 1:b) {
evi <- paste("(", names(evidence_data), "=='",
sapply(evidence_data[i,], as.character), "')",
sep = "", collapse = " & ")
a[i] <- cpquery(network, (C=='c'), eval(parse(text=evi)))
}
return(a)
}
test <- bn.function(fitted, learning.test)
Thanks in advance!
I don't know if this is due to a bugfix or just because I tried another approach - anyways, looping works if you iteratively build up the evidence list outside of the cpquery-function.
An example for an iteration through a list called evidenceData with all-positive evidences:
for(i in names(evidenceData)){
loopEvidenceList <- list()
loopEvidenceList[[i]] <- "TRUE"
a =cpquery(fitted = bayesNet, event = queryNode == "TRUE",
evidence = loopEvidenceList, method = "lw", n = 100000)
print(a)
}
Depending on the way your evidence is availible, you might need more sophisticated preparation of the "loopEvidenceList" but once you got that prepared, it works fine.
To avoid the scoping problem, you can postpone the call to eval and do it inside the cpquery function. If you directly pass evi (the character variable) to cpquery and then parse it inside the definition, the chain of environments gets shifted and cpquery will have access to evi.
You can use m.cpquery <- edit(cpquery) to fork your own version of the function and insert the following line at its beginning:
evidence = parse(text = evidence)
and then save your new function.
So the heading of m.cpquery will look like:
> m.cpquery
function (fitted, event, evidence, cluster = NULL, method = "ls",
..., debug = FALSE)
{
evidence = parse(text = evidence)
check.fit(fitted)
check.logical(debug)
...
Now you can use m.cpquery in your own function like before, except we'll pass the plain character variable to it:
a[i] <- m.cpquery(network, (C=='c'), evi)
Note that in the first line of m.cpquery, we only parsed the evidence character variable and didn't call eval on it. cpquery is a front-end to conditional.probability.query (see here) and we're relying on conditional.probability.query's subsequent call to eval.
I should say that this is a rather ugly workaround. And it only works if you are using logic sampling (method='ls'). But if you want to use likelihood weighting, the check.mutilated.evidence function will raise an error. I haven't checked if injecting an eval expression before it gets called would result in a mayhem of subsequent errors leading to hell.
I feel like the problem is you are using the same variable in evidence as well as event. Learning.test contains the values of "C" variable. then we are trying to predict C as the event. Maybe using a subset of the original dataset excluding C will do the trick
Related
In my code there is a situation where I conditionally want to use one accessor function or another throughout the code. Instead of having an if-else statement for every time I want to pick which accessor to use and coding it explicitly, I tried to conditionally assign either of the accessor functions to a new function called accessor_fun and use it throughout the code, but this returns an error when I use the accessor function to reassign the values it accesses. Here is a simplified example of the problem I am having:
#reassigning the base r function names to a new function name
alt_names_fun <- names
example_list <- list(cat = 7, dog = 8, fish = 33)
other_example_list <- list(table = 44, chair = 101, desk = 35)
#works
alt_names_fun(example_list)
#throws error
alt_names_fun(example_list) <- alt_names_fun(other_example_list)
#still throws error
access_and_assign <- function(x, y, accessor) {
accessor(x) <- accessor(y)
}
access_and_assign(x = example_list, y = other_example_list, accessor = alt_names_fun)
#still throws error
alt_names_fun_2 <- function(x){names(x)}
alt_names_fun_2(example_list) <- alt_names_fun_2(other_example_list)
#works
names(example_list) <- names(other_example_list)
As you see if you try the code above, an example of the kind of error I am getting is
Error in alt_names_fun(example_list) <- alt_names_fun(other_example_list) :
could not find function "alt_names_fun<-"
So my question is, is there a way to do the reassignment of R accessor functions and use them in a way like I am trying to in the example above?
Accessor functions are really pairs of functions. One for retrieval and one for assignment. If you want to replicate that, you need to replicate both parts
alt_names_fun <- names
`alt_names_fun<-` <- `names<-`
The assignment versions have <- in their name. This is a special naming convection that R uses to find them. Since these are character normally not allowed in basic symbol names, you need to use the back ticks to enclose the function names.
I have script main.R, where I create inv_cov_mat variable. I later load metrics.R and use it to calculate function value (I use it as kind of inter-script function closure). I get error "object 'inv_cov_mat' not found". My code:
main.R:
knn <- function(...)
{
# some code
source("./source/metrics.R")
if (metric == "mahalanobis")
inv_cov_mat <- solve(cov(training_set))
# other code
# calculate distance in given metric between current vector and every row vector from training set matrix
distances <- apply(training_set, 1, metric, vec2=curr_vec) # error
metrics.R:
mahalanobis <- function(vec1, vec2)
{
diff <- vec1 - vec2
sqrt(t(diff) %*% inv_cov_mat %*% diff)
}
I've found simple, even if not elegant answer: use inv_cov_mat as global variable, not inside knn function. Then other scripts can see it.
It's not entirely clear what you want, but if I understand you correctly---you have a character string identifying the metric you want to use, and a function with the same name. So you should be able to use get to retrieve the function based on the name.
metric == "mahalanobis"
metric.fun = get(metric)
distances <- apply(training_set, 1, metric.fun, vec2=curr_vec)
That said, there are probably better ways to organize your code that would avoid this problem entirely, e.g. create a named list of functions for accessing metrics.
EDIT regarding the issue of inv_cov_mat, either pass it as an argument to your metric function or use get inside that function to access variables from the parent environment using the envir argument. Passing the variable as an argument to your metric function is definitely the better and cleaner approach.
I'm trying to pass two file paths as parameters to a function. But it's not accepting the inputs. Here's what I'm doing:
partition<-function(d1,p2){
d1<-read.table(file = d1, fill = TRUE)
p2<-read.table(file = p2, fill = TRUE)
}
and while calling the function:
partition("samcopy.txt","partcopy.txt")
The .txt is not being read by the variables inside the function. How to make the variables read the table?
AidanGawronSki's approach works, but from a programming standpoint should be avoided! Here is a more traditional answer to your problem.
partition<-function(d1,p2){
a <- read.table(file = d1, fill = TRUE)
b <- read.table(file = p2, fill = TRUE)
res <- list(a,b)
names(res) <- c(d1,p2)
res
}
To understand why the above approach is "better", it is important to understand what environments are and more generally the R scoping rules. Environments are essentially your workspace. For example, when you first open R and begin assigning objects, these objects are stored within the Global Environment. Another example of an environment is when you call a function, the function creates it own environment comprised of any parameters you have passed to the function. By doing this R ensures that when you call a function, it has no "side effects" or said another way it does not affect the global environment.
Let me show you an example. Imagine you begin an R session, and assign d1 <- 1 in your Global Environment. You're going to want to use d1 later on in your analysis and it would be a shame if it changed without you knowing it, right?
If you utilize AidanGawronSki's approach when you call
partition<-function(d1,p2){
d1 <<- read.table(file = d1, fill = TRUE)
}
The d1 in your Global Environment will change to be read.table(file = d1, fill = TRUE). This is very very dangerous! A object you previously assigned to be one thing is now another thing and you are not even warned of this change.
The same problem, however, will never occur with the approach I have proposed. I strongly recommend you get in the habit of using this approach! If you don't any function can change things in your Global Environment without you knowing.
For more info read this, this or just google something like "functions with no side effects"
FYI there are also several other problems with your code. First you need to tell your function what to return. All you did is call a function, assign stuff to the local environment and then close the function. Functions will always return the last line (as long as it is not an assignment). This is why in my example, I put res as the last line of the function. Also you are not correctly assigning your object. You pass a string like d1 <- "text.txt", to your function and then ask your function to do the following, "text.txt" <- read.table("text.txt",...). That simply does not make sense. You need to assign the output from read.table to an object. In my example, I assign them to a and b.
use the super assignment operator <<-
partition<-function(d1,p2){
d1 <<- read.table(file = d1, fill = TRUE)
p2 <<- read.table(file = p2, fill = TRUE)
}
calld=data.frame(matrix(rnorm(100*50,0,1),1000,50))
for (x in names(calld)) {
assign(paste("calld$",x,sep=""),pnorm(get(paste("calld$",x,sep="")),0,1,lower.tail=T,log.p=F))
}
Error in get(paste("calld$", x, sep = "")) : object 'calld$X1' not found
Am I using the get function correctly?? I am trying to concatenate the names of the data set via a loop and paste of it's existing valued by passing the values through a pnorm (cumulative normal distribution function). But I keep getting an error. The function works when I call the variable names in the "calld" dataframe. The problem is the concentration process of creating the loop. Where am I going wrong? I appreciate your help
Update::
I took your advice guys and reedited the loop, to.
for (n in names(calld)) {
get("calld")[[n]]=pnorm(get("calld")[[n]],0,1,lower.tail=T,log.p=F)
}
Error in get("calld")[[n]] = pnorm(get("calld")[[n]], 0, 1, lower.tail = T, :
target of assignment expands to non-language object
But now I am getting this new error. So everything on the right hand side of the equation in the loop when I tested it it works. The error arises when I set it the value equal to itself, replacing the prior values.
Have mercy on kittens!
You can't use assign this way, nor get.
calld[] <- lapply(calld, pnorm, mean = 0, sd = 1)
Explanantion: calld[]<- replaces all existing columns of calld (whilst retaining the structure as a data.frame) with the results of lapply(calld, pnorm, mean = 0, sd = 1) which cycles through all columns of calld, applying pnorm on each one.
library(fortunes)
fortune(312)
The problem here is that the $ notation is a magical shortcut and like any other magic if used incorrectly is likely to do the programmatic equivalent of turning yourself into a toad.
-- Greg Snow (in response to a user that wanted to access a column whose name is stored in y via x$y rather than x[[y]])
R-help (February 2012)
I am trying to write an R function that takes a data set and outputs the plot() function with the data set read in its environment. This means you don't have to use attach() anymore, which is good practice. Here's my example:
mydata <- data.frame(a = rnorm(100), b = rnorm(100,0,.2))
plot(mydata$a, mydata$b) # works just fine
scatter_plot <- function(ds) { # function I'm trying to create
ifelse(exists(deparse(quote(ds))),
function(x,y) plot(ds$x, ds$y),
sprintf("The dataset %s does not exist.", ds))
}
scatter_plot(mydata)(a, b) # not working
Here's the error I'm getting:
Error in rep(yes, length.out = length(ans)) :
attempt to replicate an object of type 'closure'
I tried several other versions, but they all give me the same error. What am I doing wrong?
EDIT: I realize the code is not too practical. My goal is to understand functional programming better. I wrote a similar macro in SAS, and I was just trying to write its counterpart in R, but I'm failing. I just picked this as an example. I think it's a pretty simple example and yet it's not working.
There are a few small issues. ifelse is a vectorized function, but you just need a simple if. In fact, you don't really need an else -- you could just throw an error immediately if the data set does not exist. Note that your error message is not using the name of the object, so it will create its own error.
You are passing a and b instead of "a" and "b". Instead of the ds$x syntax, you should use the ds[[x]] syntax when you are programming (fortunes::fortune(312)). If that's the way you want to call the function, then you'll have to deparse those arguments as well. Finally, I think you want deparse(substitute()) instead of deparse(quote())
scatter_plot <- function(ds) {
ds.name <- deparse(substitute(ds))
if (!exists(ds.name))
stop(sprintf("The dataset %s does not exist.", ds.name))
function(x, y) {
x <- deparse(substitute(x))
y <- deparse(substitute(y))
plot(ds[[x]], ds[[y]])
}
}
scatter_plot(mydata)(a, b)