I have a function called "p". Inside this function there is an object called "marginal" which I need to make recognizable and retrievable outside the function. Right now, when I reference the object "marginal" outside the function, I get an error message:
object 'marginal' not found
Question: How can I make "marginal" recognizable outside the function?
p <- function(t, N1, N2=NULL, delta) {
  efN = ifelse(is.null(N2), N1, N1*N2/(N1+N2))
  df = ifelse(is.null(N2), N1 - 1, N1 + N2 - 2)
  prior <- function(delta) dnorm(delta, 0, 1)
  likelihood <- function(delta) dt(t, df, delta*sqrt(efN))
  marginal <- integrate(function(x) prior(x)*likelihood(x), -Inf, Inf)[[1]]
  post <- function(x) prior(x)*likelihood(x) / marginal
  return(post(delta))
  list(marginal) ## What should I use instead of list() to make the object
                 ## "marginal" recognizable outside the function?
}
marginal
object 'marginal' not found
The best solution for you would be to refactor your current function, possibly into smaller functions, in such a way that you can easily get the value of marginal separately. Then, use it to complete your current calculation.
That being said, if you want a quick fix you could resort to using the parent scope assignment operator <<-, e.g.
marginal <- NULL
p <- function(t, N1, N2=NULL, delta) {
  efN = ifelse(is.null(N2), N1, N1*N2/(N1+N2))
  df = ifelse(is.null(N2), N1 - 1, N1 + N2 - 2)
  prior <- function(delta) dnorm(delta, 0, 1)
  likelihood <- function(delta) dt(t, df, delta*sqrt(efN))
  # note the assignment being done here carefully
  marginal <<- integrate(function(x) prior(x)*likelihood(x), -Inf, Inf)[[1]]
  post <- function(x) prior(x)*likelihood(x) / marginal
  return(post(delta))
}
# now, after calling p(), marginal contains the value assigned inside the function
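For example, with some made-up argument values:
p(t = 2, N1 = 20, delta = 0.3)
marginal   # no longer NULL: it now holds the integral computed inside p()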
However, it is usually not recommended to take such an approach. I only offer it as a quick fix, with the strong suggestion that you rethink your code design.
To access multiple values from inside a function, make sure all the values you want to access get returned by putting them in a list. Then return the list, assign the results of the function call, and access the values you want:
multi_return = function() {
  x = 3
  y = 4
  res = list(x = x, y = y)
  return(res)
}
results = multi_return()
y = results$y
y
# Output:
## [1] 4
This kind of approach is generally preferred over having functions make global assignments, as then calling a function can have unpredictable side effects.
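To connect this back to the original question, here is a sketch of p() refactored to return both values in a list (the argument values in the example call are made up):
p <- function(t, N1, N2 = NULL, delta) {
  efN <- ifelse(is.null(N2), N1, N1*N2/(N1+N2))
  df  <- ifelse(is.null(N2), N1 - 1, N1 + N2 - 2)
  prior      <- function(delta) dnorm(delta, 0, 1)
  likelihood <- function(delta) dt(t, df, delta*sqrt(efN))
  marginal <- integrate(function(x) prior(x)*likelihood(x), -Inf, Inf)[[1]]
  post <- function(x) prior(x)*likelihood(x) / marginal
  # return both the posterior density at delta and the marginal
  list(posterior = post(delta), marginal = marginal)
}
res <- p(t = 2, N1 = 20, delta = 0.3)
res$marginal   # accessible outside the function, no global assignment needed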
Related
I am trying to work through a simulation of the Sierpinski triangle in R with affine transformations and an Iterated Function System (IFS). Hopefully, I can then extend the exercise to simulate Barnsley's fern as well. For those who know Chinese, this video is my starting point for this exercise.
Here is a short introduction of the simulation process:
1. Create an equilateral triangle and name the vertices A, B, C.
2. Create a random initial point lying inside the triangle ABC.
3. Sample A, B, or C with equal probability.
4. If the outcome is A, move the current point to the midpoint of A and itself (and likewise for B or C).
5. Repeat steps 3 and 4, each time moving the last point to the midpoint of the sampled vertex and itself.
By doing this repeatedly, we should see that the path of the points looks like a Sierpinski triangle.
I wonder how variable assignment works inside a user-defined function. I would like to create an object (a matrix or a data frame) to store the path of simulated points and keep updating the object to keep track of how the points move.
The following is my current code:
# create the triangle
triangle <- matrix(c(A = c(-1, 0),
                     B = c(1, 0),
                     C = c(0, sqrt(3))),
                   byrow = TRUE, nrow = 3, ncol = 2)
colnames(triangle) <- c("X", "Y") # axis name
rownames(triangle) <- c("A", "B", "C")
# sample an initial point inside the triangle ABC
sampleInit <- function(){
  X <- runif(1, min = -1, max = 1)
  Y <- runif(1, min = 0, max = sqrt(3))
  if( (Y >= 0) && (Y <= (sqrt(3)*X + sqrt(3))) && (Y <= -sqrt(3)*X + sqrt(3)) ){
    return(cbind(X, Y))
  } else {
    sampleInit()
  }
}
### graph: plot the triangle and the initial point together
graphics.off()
plot(triangle, xlim = c(-1, 1), ylim = c(0, sqrt(3)))
par(new = TRUE)
plot(sampleInit(), xlim = c(-1, 1), ylim = c(0, sqrt(3)), col = "red")
### a three-sided dice: determine the direction to move along
diceRoll <- function(){
  return(sample(c("A", "B", "C"), size = 1, prob = c(1/3, 1/3, 1/3)))
}
## path
stepTrace <- as.data.frame(sampleInit())
move <- function(diceOutCome, stepTrace){
  lastStep <- tail(stepTrace, 1)
  if(diceOutCome == "A"){
    X <- (-1 + lastStep[,1])/2
    Y <- (0 + lastStep[,2])/2
  } else if(diceOutCome == "B"){
    X <- (1 + lastStep[,1])/2
    Y <- (0 + lastStep[,2])/2
  } else if(diceOutCome == "C"){
    X <- (0 + lastStep[,1])/2
    Y <- (sqrt(3) + lastStep[,2])/2
  }
  lastStep <- cbind(X, Y)
  stepTrace <- rbind(stepTrace, lastStep)
}
move(diceRoll(), stepTrace)
View(stepTrace)
Sorry for the long story and for not jumping straight to the key question. My question is that stepTrace (the object in which I would like to store the path) doesn't get updated when I execute the last two lines.
What I imagined was that the assignment inside move() would update the data frame stepTrace; however, it turns out it doesn't. I checked my code in the debugger and found that stepTrace did get updated inside the function call, but the newly assigned value didn't make it outside the function call. That's why I would like to ask how assignment works in R. What is the difference between this behaviour and that of other general-purpose languages such as Java? (I imagine doing this exercise in Java would not run into this kind of assignment issue; correct me if I am wrong, since I am still new to Java.)
Similar problems bother me when I try to assign variables inside a loop. I know there is a base function assign that helps resolve this issue, but I just don't know what the mechanism behind it is.
I tried to google my question, but I am not sure which keyword I should use, and I didn't find direct answers to my question. Any comment, keyword or external resource to the documentation is appreciated!
In short, your move function does what you want, but it is not advisable to write it like that. In its current form, stepTrace is updated in the function's local environment, but not in the global environment, where your stepTrace lives. They are not the same stepTrace. To fix it, you can run stepTrace <- move(diceRoll(), stepTrace), but beware of the "second circle" (growing objects in a loop, as The R Inferno calls it). For a cleaner approach, remove the last stepTrace assignment from move.
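For instance, a minimal sketch of that cleaner approach (reusing sampleInit() and diceRoll() from the question; the iteration count is arbitrary):
move <- function(diceOutcome, stepTrace) {
  lastStep <- unlist(tail(stepTrace, 1))       # current point as a named vector
  vertex <- switch(diceOutcome,
                   A = c(-1, 0),
                   B = c(1, 0),
                   C = c(0, sqrt(3)))
  rbind(stepTrace, (vertex + lastStep) / 2)    # last expression is returned, no assignment
}
stepTrace <- as.data.frame(sampleInit())
for (i in 1:5000) stepTrace <- move(diceRoll(), stepTrace)  # reassign in the caller
plot(stepTrace, pch = ".")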
From ?return: If the end of a function is reached without calling return, the value of the last evaluated expression is returned.
Consider the following examples:
x <- 5
a <- b <- c <- d <- 1
f1 <- function(x) x + 1
f2 <- function(x) return(x + 1)
f3 <- function(x) x <- x + 1
f4 <- function(x) x <<- x + 1
f1(1)
f2(1)
f3(1) # your problem: prints nothing, since the assignment's value is returned invisibly
f4(1) # assigns 2 to x in the global environment (replacing the earlier x <- 5)
a <- b <- c <- d <- 1
a <- f1(1)
b <- f2(1)
c <- f3(1)
d <- f4(1)
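For the record, all four variables end up as 2 (the value of an assignment is returned, just invisibly), and f4 has again overwritten the global x:
c(a, b, c, d)
#> [1] 2 2 2 2
x
#> [1] 2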
f3 and f4 are generally considered bad practice because of side effects, i.e. they (can) modify a non-local variable; the explicit return() in f2 might merely trigger a style discussion. For f3, see the result of
c(f3(1))
#> [1] 2
Given that calling f3(1) by itself printed nothing, we might have expected a vector of length 0. Consider removing any assignment as the last operation within your functions, and avoid naming your function arguments the same as the objects you intend to change.
@DonaldSeinen explained how to fix your code in his answer. I'll try to point you to documentation for more details.
First, you don't need to go to external documentation. An Introduction to R and The R Language Definition manuals are included in R distributions. The Introduction describes what's going on in lots of detail in section 10.7, "Scope". There's a different description in the Language Definition in section 3.5, "Scope of Variables".
Some people find the language in those manuals to be too technical. An easier to read external reference that gets it right is Wickham's Advanced R, readable online at https://adv-r.hadley.nz/. Scoping is discussed in chapters 6 and 7, especially sections 6.4 and 7.2.
I am working on an R project, and I have many different functions (I'm calculating RMSEs on various data sets with various requirements).
I am currently using the "do.call()" function to invoke the function name I'm passing in, but
this causes my whole system to stall and nothing works. This has happened many times over, and I've had to restart R Studio (using version 4.0.2).
I would like to pass in a function as an argument into my parent function (which is recursive but only to 2 passes), and I would like to be able to pass in the parameters from the parent function to the child functions, as well as the recursive function call.
I'm not sure of the correct execution of this.
Any help on where I'm going wrong is greatly appreciated.
Currently, my code is as follows:
#find_generic_lambda is the parent function that is called, and the FUN argument is the named function I would like to pass in to execute inside
find_generic_lambda <- function(seq_start, seq_end, seq_increment, FUN, detailed_flag = FALSE, training_set, testing_set)
{
  lambdas <- seq(seq_start, seq_end, seq_increment)
  params = c(lambdas, train_set, test_set)
  # invoking the passed-in function here with the parameters I'm setting
  # this is where the code stumbles
  RMSE <- sapply(lambdas, do.call(FUN, params))
  # find the smallest lambda
  qplot(lambdas, RMSE)
  # saving the first-round lambda
  min_lambda_first_try <- lambdas[which.min(RMSE)]
  min_lambda_first_try
  if (detailed_flag)
  {
    # if this is the first iteration of the function, continue by taking lambda values
    # 10% lower and 10% higher to iterate through new lambdas that are much more
    # granular, with increments at 10% of what they were previously
    new_lambda_range = (seq_end + seq_start)/10
    new_lambda_range
    min_lambda_first_try <- find_generic_lambda(seq_start = min_lambda_first_try - new_lambda_range,
                                                seq_end = min_lambda_first_try + new_lambda_range,
                                                seq_increment = seq_increment/10, FUN, detailed_flag = FALSE,
                                                training_set = training_set, testing_set = testing_set)
  }
  return(min_lambda_first_try)
}
#this is one of the functions that will be passed in as a parameter
regularized_rmse_3 <- function(l, train_set, test_set)
{
  mu <- mean(train_set$rating)
  just_the_sum <- train_set %>%
    group_by(movieId) %>%
    summarize(s = sum(rating - mu), n_i = n())
  predicted_ratings <- test_set %>%
    left_join(just_the_sum, by='movieId') %>%
    mutate(b_i = s/(n_i+l)) %>%
    mutate(pred = mu + b_i) %>%
    pull(pred)
  return(RMSE(predicted_ratings, test_set$rating))
}
rmse3_lambda <- find_generic_lambda(seq_start=0, seq_end=10, seq_increment=0.5,
                                    FUN="regularized_rmse_3",
                                    detailed_flag = TRUE, training_set=training_set, testing_set=testing_set)
Expanding on my comments:
Here's a simplified version of your functions (so I can make an example dataset):
f <- function (l_candidate, FUN) {
  RMSE <- sapply(l_candidate, FUN)
  l_min_RMSE <- l_candidate[which.min(RMSE)]
  return(l_min_RMSE)
}
g <- function (l, trainset, testset) {
  p <- mean(trainset + l)
  sqrt(mean((testset - p)^2))
}
trainset <- c(1, 1, 2, 1)
testset <- c(3, 4)
Then:
f(1:5, FUN = function (x) g(x, trainset, testset))
# [1] 2
So you pass the function g via a wrapper function into f and it will do the job for you.
Alternative
R allows you to create a function out of another function:
g <- function (trainset, testset) function (l) {
  p <- mean(trainset + l)
  sqrt(mean((testset - p)^2))
}
g1 <- g(trainset, testset)
g1(1)
# [1] 1.346291
In this situation, g() takes two arguments and returns a function that takes one argument, l. So you can create a new function g1() out of g().
Then you can pass it to your parent function giving you the same results in this example:
f(1:5, FUN = g1)
# [1] 2
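Mapped back to your original code, a sketch might look like this (assuming find_generic_lambda is adjusted to call sapply(lambdas, FUN) directly, as f does above):
# build a one-argument RMSE function from your three-argument one
rmse3 <- function(train_set, test_set) function(l) {
  regularized_rmse_3(l, train_set, test_set)
}
rmse3_lambda <- find_generic_lambda(seq_start = 0, seq_end = 10, seq_increment = 0.5,
                                    FUN = rmse3(training_set, testing_set),
                                    detailed_flag = TRUE,
                                    training_set = training_set, testing_set = testing_set)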
I'm trying to recreate the functionality of the memoise package in base R by saving the outputs of a recursive function in a data frame. I have a function "P", and I made a "metaP" wrapper that runs P(n) and saves the result if metaP(n) hasn't been run before, or otherwise returns the previously saved output. My issue is that it only works at the first level. If I run metaP(5), it will save the output of metaP(5); but to get P(5) it also had to calculate P(4), and the results of P(4) aren't getting saved. I'm assuming it's getting lost in the recursive environments, but even when I tried using the assign function with the global environment, it still didn't work.
In the example below, I run metaP on 5 through 10, and df has 5 through 10 saved, but it doesn't have 1 through 4 saved, some of which must have been calculated to come up with the answers for 5 through 10.
df <- data.frame(n = 0, pn = 1)
metaP <- function(n) {
  if (!n %in% df$n) df <<- rbind(df, data.frame(n = n, pn = P(n)))
  df[df$n == n, "pn"]
}
P <- function(n) {
  if (n < 0) return(0)
  k <- rep(1:((sqrt(24 * n + 1) + 1) / 6), each = 2) * c(1, -1)
  return(sum((-1) ^ (k + 1) * sapply(n - k * (3 * k - 1) / 2, metaP)) %% 1e6)
}
sapply(5:10, metaP)
df
The issue here is kind of subtle. The expression
df <<- rbind(df, data.frame(n = n, pn = P(n)))
is ambiguous, because the ?rbind documentation doesn't define the order in which the two arguments to rbind() are evaluated. It appears that R is evaluating df, then doing the recursive call, then appending that result to the saved value of df. Any changes to the global variable that happened during the recursive call are lost.
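Here's a minimal sketch of the same trap, with a hypothetical f() standing in for the recursive P(n):
x <- 1
f <- function() { x <<- x + 1; 10 }
c(x, f())    # x is captured as 1 before f() runs its side effect
#> [1]  1 10
x            # the global x was still updated, but too late for c()
#> [1] 2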
To fix this, rewrite the conditional part as
if (!n %in% df$n) {
  newval <- data.frame(n = n, pn = P(n))
  df <<- rbind(df, newval)
}
(I'd also suggest adding parentheses to the test and writing it as if (!(n %in% df$n)), because it's not immediately obvious that these are the same. I was confused about this in an earlier answer to this question. But checking ?Syntax shows that %in% has higher precedence than !.)
I want to use the ICC::ICCbare function within a loop. However, the ICCbare uses the concrete variable names as input, e.g.:
ICCbare(x = group, y = variable1, data = dat)
whereby both "group" and "variable1" are columns of the data.frame "dat" (i.e., dat$variable1); ICCbarecannot be used with y = dat[, i].
In order to program a loop I therefore need to evaluate some R code within the function call of ICCbare. My idea was the following:
for(i in 1:10){
  ICCbare(group, names(dat)[i], data = dat)
}
However, this does not work. The following error is printed:
Error in `[.data.frame`(data, yc) : undefined columns selected
Is there a way to evaluate the statement names(dat)[i] first, before it is passed to the function call?
Here is a minimum working example for my problem:
# Create data set
dat <- data.frame(group=c(rep("A",5),
rep("B",5)),
variable1=1:10,
variable2=rnorm(10))
# Loop
for (i in names(dat)[2:3]){
  ICCbare("group", i, data = dat)
}
I agree with @agstudy. This is a bad example of non-standard evaluation. You can use this as a workaround:
v <- "variable1"
ICCbare("group", v, data = dat)
#Error in `[.data.frame`(data, yc) : undefined columns selected
eval(bquote(ICCbare("group", .(v), data = dat)))
#$ICC
#[1] 0.8275862
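Applied to the loop in the question, the same workaround might look like this (a sketch, using the dat defined above):
for (i in names(dat)[2:3]) {
  # .(i) splices the current column name into the call before evaluation
  print(eval(bquote(ICCbare("group", .(i), data = dat))))
}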
It is a bug in ICCbare, which tries to manage its arguments as names in a bad manner.
function (x, y, data)
{
  ICCcall <- Call <- match.call()
  xc <- as.character(ICCcall[[2L]]) ## this is ugly!
  yc <- as.character(ICCcall[[3L]])
  inds <- unique(data[xc])[[1]]
  tdata <- data.frame(data[yc], data[xc])
Personally, I would remove the first lines and just assume that the arguments are column names.
ICCbare_simple <- function (xc, yc, data)
{
  ## remove lines before this one
  inds <- unique(data[xc])[[1]]
  ## the rest of the code
  .....
}
I'm the maintainer of ICC and I want to thank you for the excellent discussion. I know this is a very late reply, but I just updated the package and the new version (v2.3.0) should fix the "ugly" code and the problem encountered by the OP. See examples in this gist.
I just wanted to post this here in case anyone was searching with a similar problem. Thanks again, sorry for the delay.
Here is the content of the gist:
ICC non-standard evaluation examples
The ICC package for R calculates the intraclass correlation coefficient (ICC) from a one-way analysis of variance. Recently, the package was updated to better execute R's non-standard evaluation within each function (version 2.3.0 and higher). The package functions should now be able to handle a range of possible scenarios for calling the functions in what I hope is a less grotesque and more standard way of writing R functions. To demonstrate, below are some of those scenarios. Note, the examples use the ICCbare function, but the way in which the function arguments are supplied applies to all of the functions in ICC.
First, load the package (and make sure the version is >= 2.3.0)
library(ICC)
packageVersion("ICC")
Columns of a data.frame
Here we supply the column names and the data.frame that contains the data to calculate the ICC. We will use the ChickWeight data frame.
data(ChickWeight)
ICCbare(x = Chick, y = weight, data = ChickWeight)
#$ICC
#[1] 0.1077609
Iterating through columns of a data.frame
In this case, we might have a data.frame in which we want to estimate the ICC for a number of different types of measurements that each has the same grouping or factor variable (e.g., x). The extreme of this might be in a simulation or bootstrapping scenario or even with some fancy high-throughput phenotyping/data collection. The point being, we want to automate the calculation of the ICC for each column.
First, we will simulate our own dataset with 3 traits to use in the example:
set.seed(101)
n <- 15 # number of individuals/groups/categories/factors
k <- 3 # number of measures per 'n'
va <- 1 # variance among
icc <- 0.6 # expected ICC
vw <- (va * (1 - icc)) / icc # solve for variance within
simdf <- data.frame(ind = rep(LETTERS[1:n], each = k),
                    t1 = rep(rnorm(n, 10, sqrt(va)), each = k) + rnorm(n*k, 0, sqrt(vw)),
                    t2 = rep(rnorm(n, 10, sqrt(va)), each = k) + rnorm(n*k, 0, sqrt(vw)),
                    t3 = rep(rnorm(n, 10, sqrt(va)), each = k) + rnorm(n*k, 0, sqrt(vw)))
Two ways to run through the columns come to mind: iteratively pass the name of each column or iteratively pass the column index. I will demonstrate both below. I do these in for loops so it is easier to see, but an easy extension would be to vectorise this by using something from the apply family of functions. First, passing the name:
for(i in names(simdf)[-1]){
  cat(i, ":")
  tmp.icc <- ICCbare(x = ind, y = i, data = simdf)
  cat(tmp.icc, "\n")
}
#t1 : 0.60446
#t2 : 0.6381197
#t3 : 0.591065
or even like this:
for(i in 1:3){
  cat(paste0("t", i), ": ")
  tmp.icc <- ICCbare(x = ind, y = paste0("t", i), data = simdf)
  cat(tmp.icc, "\n")
}
#t1 : 0.60446
#t2 : 0.6381197
#t3 : 0.591065
Alternatively, pass the column index:
for(i in 2:ncol(simdf)){
  cat(names(simdf)[i], ": ")
  tmp.icc <- ICCbare(x = ind, y = simdf[, i], data = simdf)
  cat(tmp.icc, "\n")
}
#t1 : 0.60446
#t2 : 0.6381197
#t3 : 0.591065
Passing a character as an argument is deprecated
Note that the function will still work if a character is passed directly (e.g., "t1"), albeit with a warning. The warning just means that this may no longer work in future versions of the package. For example:
ICCbare(x = ind, y = "t1", data = simdf)
#[1] 0.60446
#Warning message:
#In ICCbare(x = ind, y = "t1", data = simdf) :
# passing a character string to 'y' is deprecated since ICC version
# 2.3.0 and will not be supported in future versions. The argument
# to 'y' should either be an unquoted column name of 'data' or an object
Note, however, that an expression evaluating to a character (e.g., paste0("t", 1)) doesn't throw the warning, which is nice!
I have a function that I use to get a "quick look" at a data.frame... I deal with a lot of survey data and this acts as a quick tool to see what's what.
f.table <- function(x) {
  if (is.factor(x[[1]])) {
    frequency <- function(x) {
      x <- round(length(x)/n, digits=2)
    }
    x <- na.omit(melt(x, c()))
    x <- cast(x, variable ~ value, frequency)
    x <- cbind(x, top2=x[,ncol(x)]+x[,ncol(x)-1], bottom=x[,2])
  }
  if (is.numeric(x[[1]])) {
    frequency <- function(x) {
      x[x > 1] <- 1
      x[is.na(x)] <- 0
      x <- round(sum(x)/n, digits=2)
    }
    x <- na.omit(melt(x))
    x <- cast(x, variable ~ ., c(frequency, mean, sd, min, max))
    x <- transform(x, variable=reorder(variable, frequency))
  }
  return(x)
}
What I find happens is that if I don't define "frequency" outside of the function, it returns wonky results for data frames with continuous variables. It doesn't seem to matter which definition I use outside of the function, so long as I do.
try:
n <- 100
x <- data.frame(a=c(1:25),b=rnorm(100),c=rnorm(100))
x[x > 20] <- NA
Now, select either one of the frequency functions and paste them in and try it again:
frequency <- function(x) {
  x <- round(length(x)/n, digits=2)
}
f.table(x)
Why is that?
Crucially, I think this is where your problem is. cast() is evaluating those functions without reference to the function it was called from. Inside cast() it evaluates fun.aggregate via funstofun and, although I don't really follow what it is doing, it is getting stats:::frequency and not your local one.
Hence my comment to your Q. What do you want the function to do? At the moment it would seem necessary to define a "frequency" function in the global environment so that cast() or funstofun() finds it. Give it a unique name that is unlikely to clash with anything, so it will be the only thing found, say .Frequency(). Without knowing what you want to do with the function (rather than what you thought the function [f.table] should do) it is a bit difficult to provide further guidance, but why not have .FrequencyNum() and .FrequencyFac() defined in the global workspace and rewrite your f.table() wrapper's calls to cast() to use the relevant one?
.FrequencyFac <- function(X, N) {
  round(length(X)/N, digits=2)
}

.FrequencyNum <- function(X, N) {
  X[X > 1] <- 1
  X[is.na(X)] <- 0
  round(sum(X)/N, digits=2)
}

f.table <- function(x, N) {
  if (is.factor(x[[1]])) {
    x <- na.omit(melt(x, c()))
    x <- dcast(x, variable ~ value, .FrequencyFac, N = N)
    x <- cbind(x, top2=x[,ncol(x)]+x[,ncol(x)-1], bottom=x[,2])
  }
  if (is.numeric(x[[1]])) {
    x <- na.omit(melt(x))
    x <- cast(x, variable ~ ., c(.FrequencyNum, mean, sd, min, max), N = N)
    ## x <- transform(x, variable=reorder(variable, frequency))
    ## left this out as I wanted to see what cast returned
  }
  return(x)
}
Which I thought would work, but it is not finding N, and it should be. So perhaps I am missing something here?
By the way, it is probably not a good idea to rely on a function picking up something like n (in your version) from outside itself. Always pass in the variables you need as arguments.
I don't have the package that contains melt, but there are a couple potential issues I can see:
Your frequency functions do not return anything.
It's generally bad practice to alter function inputs (x is the input and the output).
There is already a generic frequency function in the stats package in base R, which may cause issues with method dispatch (I'm not sure; see the quick check below).
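On that last point, a quick base-R check shows the name is indeed already taken in a fresh session, which is consistent with cast() finding stats:::frequency instead of the local definition:
exists("frequency")              # TRUE even before you define your own
#> [1] TRUE
environment(stats::frequency)    # the generic lives in the stats namespace
#> <environment: namespace:stats>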