Should attach be avoided in this situation? - r

Although there are some questions about this topic (e.g. this question), none of them answer my particular questions (as far as I could tell anyway).
Suppose I have a function which depends on a lot of parameters. For demonstration purposes I chose 3 parameters:
myfun <- function(x1, x2, x3){
some code containing x1, x2, x3
}
Often the input parameters are already contained in a list:
xlist <- list(x1 = 1, x2= 2, x3 = 3)
I want to run myfun with the inputs contained in xlist like this:
myfun(xlist$x1, xlist$x2, xlist$x3)
However this seems like too big of an effort (because of the high number of parameters).
So I decided to modify myfun: instead of taking all the input parameters separately, it now gets the whole list as a single input. At the beginning of the code I use attach so that I can reuse the same code as above.
myfun2 <- function(xlist){
attach(xlist)
same code as in myfun containing x1, x2, x3
detach(xlist)
}
I thought that this would be quite a neat solution, but a lot of users advise to not use attach.
What do you think? Are there any arguments to prefer myfun over myfun2?
Thanks in advance.

I think you'd be better off using do.call. do.call accepts a function and a list, and passes the list's elements to the function as arguments.
myfun <- function(x1, x2, x3){
x1 + x2 + x3
}
xlist <- list(x1 = 1, x2= 2, x3 = 3)
do.call(myfun, xlist)
This has the benefit of being explicit about what the arguments are, which makes it much easier to reason with the code, maintain it, and debug it.
The place where this gets tricky is if xlist has more values in it than just those required by the function. For example, the following throws an error:
xlist <- list(x1 = 1, x2 = 2, x3 = 3, x4 = 4)
do.call(myfun, xlist)
You can circumvent this by matching arguments with the formals
do.call(myfun, xlist[names(xlist) %in% names(formals(myfun))])
It's still a bit of typing, but if you're talking about 10+ arguments, it's still a lot easier than xlist$x1, xlist$x2, xlist$x3, etc.
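As a minimal sketch of the two calls above (the names res_err, keep and res_ok are only illustrative), the error can be captured with tryCatch, and the filtered call can be written with intersect, which is equivalent to the %in% subsetting for uniquely named lists:

```r
myfun <- function(x1, x2, x3) {
  x1 + x2 + x3
}

xlist <- list(x1 = 1, x2 = 2, x3 = 3, x4 = 4)

# The extra element x4 has no matching formal argument, so this call fails;
# tryCatch captures the error message instead of stopping the script.
res_err <- tryCatch(do.call(myfun, xlist),
                    error = function(e) conditionMessage(e))

# Keep only the list elements whose names are formal arguments of myfun.
keep <- intersect(names(xlist), names(formals(myfun)))
res_ok <- do.call(myfun, xlist[keep])
res_ok
```

Note that this filtering silently drops anything the function does not know about, so a misspelled name in the list would go unnoticed until the function complains about a missing argument.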
LAP gives a useful solution as well, but with is better used outside the function call:
with(xlist, myfun(x1, x2, x3))
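On the original question of why attach is discouraged: one concrete hazard is that an attached list sits on the search path behind the global environment, so a global variable with the same name wins. The sketch below assumes a stray global x1 exists; the on.exit call is an addition that guarantees the detach even if the body fails.

```r
x1 <- 100  # assumed stray global variable with the same name as a list element

xlist <- list(x1 = 1, x2 = 2, x3 = 3)

myfun2 <- function(xlist) {
  attach(xlist)            # may print a message about x1 being masked
  on.exit(detach(xlist))   # detach even if the code below throws an error
  x1 + x2 + x3             # x1 resolves to the global 100, not xlist$x1
}

res <- myfun2(xlist)
res
```

With do.call or with, the values come from the list itself, so the global x1 plays no role.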

You could just use with():
xlist <- list(x1 = 1, x2= 2, x3 = 3)
FOO <- function(mylist){
with(mylist,
x1+x2+x3
)
}
> FOO(xlist)
[1] 6
I'm not convinced of this approach, though. The function would depend on the correctly named elements within the list.
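One way to soften that dependency is to check the names up front. This is only a sketch; FOO_checked and its required argument are hypothetical names, not part of the original answer.

```r
# Hypothetical variant of FOO that validates the list before using it.
FOO_checked <- function(mylist, required = c("x1", "x2", "x3")) {
  missing_names <- setdiff(required, names(mylist))
  if (length(missing_names) > 0) {
    stop("missing list elements: ", paste(missing_names, collapse = ", "))
  }
  with(mylist, x1 + x2 + x3)
}

ok <- FOO_checked(list(x1 = 1, x2 = 2, x3 = 3))

# A list without x3 now fails with a clear message instead of a lookup error.
bad <- tryCatch(FOO_checked(list(x1 = 1, x2 = 2)),
                error = function(e) conditionMessage(e))
```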

My approach would be something like this:
testfun <- function (a_list)
{
args = a_list
print(args$x1)
print(args$x2)
print(args$x3)
}
my_list <- list(x1=2, x2=3, x3=4)
testfun(my_list)
However, you would need to know the names of the parameters within the function.
Perhaps the do.call() function can come into play here.
do.call('fun', list)

You could assign the list to the environment of the function:
myfun <- function(xlist) {
for (i in seq_along(xlist)) {
assign(names(xlist)[i], xlist[[i]], envir = environment())
}
# or if you dislike for-loops
# lapply(seq_along(xlist), function(i) assign(names(xlist)[i], xlist[[i]], envir = parent.env(environment())))
print(paste0(x2, x3)) # do something with x2 and x3
print(x1 * x3) # do something with x1 and x3
}
myfun(list(x1 = 4, x2 = "dc", x3 = c(3,45,21)))
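The same effect can probably be had without a loop by using base R's list2env, which copies every named element of a list into an environment. A minimal sketch, with myfun_l2e as an illustrative name:

```r
myfun_l2e <- function(xlist) {
  # Copy all named elements of xlist into this function's own environment.
  list2env(xlist, envir = environment())
  x1 * x3  # x1 and x3 are now ordinary local variables
}

res <- myfun_l2e(list(x1 = 4, x2 = "dc", x3 = c(3, 45, 21)))
res
```

As with the loop version, the function still relies on the list carrying the expected names.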

Related

Improve performance of loop

I am struggling to improve the performance of the below code, which is running for about 2M entries. First, the condition was inside the loop, and now it is outside, and this brought some improvements, but not enough.
Do you have any other ideas?
if (Floor=="Yes") {
for (i in 1:length(X)){
base_short_term[i] <- pmax(numeric_vector1[i],(1+numeric_vector2[i])^((numeric_vector3[i])/(1+numeric_vector4[i])))
}
} else {
for (i in 1:length(X)){
base_short_term[i] <- pmin(numeric_vector5[i],(1+numeric_vector3[i])^((numeric_vector5[i])/(1+numeric_vector7[i])))
}
}
Explicit element-by-element loops are slow in R and should be avoided whenever a vectorized alternative exists. That is the case here: a vectorized operation is far more efficient (the loop adds overhead on every iteration) and gives more readable code:
df <- data.frame(x1 = numeric_vector1,
x2 = numeric_vector2,
x3 = numeric_vector3,
x4 = numeric_vector4,
x5 = numeric_vector5,
x7 = numeric_vector7)
if (Floor == "Yes"){
df$base_short_term <- pmax(df$x1, (1+df$x2)^(df$x3/(1+df$x4)))
} else{
df$base_short_term <- pmin(df$x5, (1+df$x3)^(df$x5/(1+df$x7)))
}
If loops cannot be avoided, it's better to use lapply or favor Rcpp
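To make the equivalence concrete, here is a small sketch on made-up random data (v1 to v4 stand in for numeric_vector1 to numeric_vector4) that computes the floor case both ways and compares the results:

```r
set.seed(1)
n <- 1000
v1 <- runif(n)
v2 <- runif(n)
v3 <- runif(n)
v4 <- runif(n)

# Element-by-element loop, as in the question.
loop_res <- numeric(n)
for (i in seq_len(n)) {
  loop_res[i] <- pmax(v1[i], (1 + v2[i])^(v3[i] / (1 + v4[i])))
}

# Vectorized version: the arithmetic and pmax work on whole vectors at once.
vec_res <- pmax(v1, (1 + v2)^(v3 / (1 + v4)))

same <- isTRUE(all.equal(loop_res, vec_res))
same
```

Wrapping each version in system.time() on the real 2M-element vectors should show how large the difference is in practice.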
Update
If the vectors have different lengths, you will lose some performance, because you will first need to slice each one from 1 to length(X), or use lapply.
Slicing vector
df <- data.frame(x1 = numeric_vector1[seq_along(X)],
x2 = numeric_vector2[seq_along(X)],
x3 = numeric_vector3[seq_along(X)],
x4 = numeric_vector4[seq_along(X)],
x5 = numeric_vector5[seq_along(X)],
x7 = numeric_vector7[seq_along(X)])
(this solution is possible because even if vectors do not have the same length, you are only using indices up to length(X), for all your vectors)
lapply
This looks much like your for loop, but it is more efficient since it avoids creating and discarding objects at each iteration.
For instance, if Floor is "Yes":
base_short_term <- lapply(seq_along(X), function(i) {
pmax(numeric_vector1[i],(1+numeric_vector2[i])^((numeric_vector3[i])/(1+numeric_vector4[i])))
})
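One caveat: lapply returns a list, whereas the original loop filled a numeric vector. A sketch of the difference on made-up data, using vapply to get a numeric vector back directly:

```r
set.seed(42)
X <- 1:5
numeric_vector1 <- runif(5)
numeric_vector2 <- runif(5)
numeric_vector3 <- runif(5)
numeric_vector4 <- runif(5)

one_value <- function(i) {
  pmax(numeric_vector1[i],
       (1 + numeric_vector2[i])^(numeric_vector3[i] / (1 + numeric_vector4[i])))
}

as_list <- lapply(seq_along(X), one_value)             # list of length 5
as_vec  <- vapply(seq_along(X), one_value, numeric(1)) # numeric vector of length 5
```

unlist(as_list) would give the same vector; vapply additionally checks that every iteration returns exactly one number.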

How to use a dataframe in a function in r

I need to insert the variables of a dataframe into a function in r. The function in question is "y=[1- (x1-x2) / x3]". When I write, and enter the variables manually it works, however, I need to use the random numbers from the dataframe.
#Original function
f<-function(x1, x2, x3)
{return(1-(x1-x2)/x3)}
f(0.9, 0.5, 0.5)
#Dataframe function
f<-function(x1, x2, x3)
{return(1-(x1-x2)/x3)}
f(x1 = x1, x2 = x2, x3 = x3, DATA = DF)
The first call works; the second, however, produces an error message: Error in f(VMB = VMB, VMR = VMR, DATA = DATA1) : unused argument (DATA = DATA1). I know I'm not properly inserting the dataframe into the code, but I'm going in circles. Can anyone help me?
As the comments suggest, your problem is that the function doesn't have a data argument. R doesn't know where x1, x2, x3 come from; it will only look through the calling environment (here the global environment) to find them. If they are contained in a data frame, R doesn't know that it should take them from there, and the call will fail.
For example
f <- function(x,y,z)
1 - (x-y)/z
f(0.9, 0.5, 0.5)
will work, because it knows where to retrieve the values. So will
x1 <- 0.9
x2 <- 0.5
x3 <- 0.5
f(x1, x2, x3)
because R finds x1, x2 and x3 in the global environment, but
df <- data.frame(x = 0.9, y = 0.5, z = 0.5)
f(x, y, z) #fails
fails, because it doesn't look for them in df. Instead you can use
f(df$x, df$y, df$z)
with(df, f(x, y, z)) #same
which lets R know where to get the variables. (Here I used x, y and z to avoid name conflicts.)
If this function should always take a data.frame and use columns x1, x2, x3, you could rewrite it to incorporate this, as below.
f <- function(df){
with(df, 1 - (x1-x2)/x3)
}
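Putting this together for a data frame with several rows, a minimal sketch could look like the following. DF and its values are made up for illustration, and f is the three-argument function from the question. Because the arithmetic is vectorized, the function is evaluated for every row at once.

```r
f <- function(x1, x2, x3) {
  1 - (x1 - x2) / x3
}

# Made-up example data; the column names match the function's arguments.
DF <- data.frame(x1 = c(0.9, 0.8),
                 x2 = c(0.5, 0.4),
                 x3 = c(0.5, 0.2))

# Option 1: evaluate the call with the columns of DF in scope.
DF$y <- with(DF, f(x1, x2, x3))

# Option 2: a data frame is a list of columns, so do.call can pass them
# to f as named arguments.
y2 <- do.call(f, DF[c("x1", "x2", "x3")])
```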

Rename objects in environment r

I'd like to rename objects in an R environment. For example,
y1 <- vector('list', 3)
x1 <- matrix(0, 3, 3)
x2 <- matrix(1, 3, 3)
x3 <- matrix(2, 3, 3)
y1[[1]] <- x1
y1[[2]] <- x2
y1[[3]] <- x3
y2 <- vector('list', 3)
y2[[1]] <- x1
y2[[2]] <- x2
y2[[3]] <- x3
y <- new.env()
y$y1 <- y1
y$y2 <- y2
names(y)
names(y) <- c('a', 'b')
I expected the lists inside y to be named a and b, that is, names(y) to equal c('a', 'b').
Obs.: I can't rename manually the variables y1 and y2, I need to change them inside the environment.
If you can’t assign them directly with the correct name, then the easiest option is to replace the environment by a new one. If you absolutely need to preserve the environment (because it’s referenced elsewhere), you can replace its contents using the same trick:
objs = mget(ls(env), env)
rm(list = ls(env), envir = env)
list2env(setNames(objs, new_names), env)
The relevant part here is the last parameter to list2env: if you leave it off, this just creates a new environment. If you specify an existing environment, the names are added to that instead.
This code will leave hidden names (i.e. names starting with .) untouched; to change this, pass all.names = TRUE to ls, or use names.
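A self-contained sketch of these three lines, with env and new_names filled in by made-up values. ls returns names in sorted order, so new_names has to be given in that same order.

```r
env <- new.env()
env$y1 <- 1:3
env$y2 <- c("p", "q")

new_names <- c("a", "b")  # new names, in the order returned by ls(env)

objs <- mget(ls(env), envir = env)                # copy the objects out
rm(list = ls(env), envir = env)                   # remove the old names
list2env(setNames(objs, new_names), envir = env)  # put them back, renamed

renamed <- ls(env)
renamed
```

The environment object itself is unchanged, so any other reference to it sees the renamed contents.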
R doesn't really have a built-in operation to rename variables in an environment. You could write a simple helper function to do that.
env_rename <- function(e, new_names, old_names = names(e)) {
stopifnot(length(new_names)==length(old_names))
orig_val <- mget(old_names, envir=e)
rm(list=old_names, envir=e)
for(i in seq_along(old_names)) {
assign(new_names[i], orig_val[[i]], envir=e)
}
}
and call that with
env_rename(y, c("a","b"))
Do you really need an environment, or could a list do the job?
If so, you could rename the list items easily:
...
...
y=list()
y$y1 <- y1
y$y2 <- y2
names(y)=c('a','b')
names(y)
[1] "a" "b"
I have the opposite problem: getSymbols put the result in an environment and I changed it to a list to rename them:
acao
[1] "PETR4.SA" "VALE3.SA" "ITUB4.SA"
require(quantmod)
e1=new.env()
x=getSymbols(acao,env=e1)
e1=as.list(e1)
names(e1)
[1] "ITUB4.SA" "VALE3.SA" "PETR4.SA"
names(e1)=sub('.SA$','',names(e1))
names(e1)
[1] "ITUB4" "VALE3" "PETR4"

Looping multiple listed data frames into a single function

I am trying to execute the function varipart() from the package ade4. I am trying to use the same number dataframe from each list in the different parts of the same function. I need to pass this for each set of dataframes.
########### DATA BELOW
d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
d2 <- data.frame(y1 = c(3, 2, 1), y2 = c(6, 5, 4))
d3 <- data.frame(y1 = c(2, 1, 2), y2 = c(5, 6, 4))
spec.list <- list(d1, d2, d3)
d1 <- data.frame(y1 = c(20, 87, 39), y2 = c(46, 51, 8))
d2 <- data.frame(y1 = c(30, 21, 12), y2 = c(61, 51, 33))
d3 <- data.frame(y1 = c(2, 11, 14), y2 = c(52, 16, 1))
env.list <- list(d1, d2, d3)
d1 <- data.frame(y1 = c(0.15, 0.1, 0.9), y2 = c(0.46, 0.51, 0.82))
d2 <- data.frame(y1 = c(0.13, 0.31, 0.9), y2 = c(0.11, 0.51, 0.38))
d3 <- data.frame(y1 = c(0.52, 0.11, 0.14), y2 = c(0.52, 0.36, 0.11))
spat.list <- list(d1, d2, d3)
###############
# I have tried two ways
library(parallel)
library(ade4)
output_varpart <- mclapply(spec.list, function(x){
varipart(x, env.list, spat.list, type = "parametric")
})
output_varpart <- mclapply(x, function(x){
varipart(spec.list[[x]], env.list[[x]], spat.list[[x]], type = "parametric")
})
for(i in 1:length(x)){
results <- varipart(spec.list, env.list, spat.list, type = "parametric")
}
None of these methods work! Please be gentle, I'm new to list syntax and looping. Errors are "Warning message:
In mclapply(output.spectrans.dudi, function(x) { :
all scheduled cores encountered errors in user code" and "Error in x * w : non-numeric argument to binary operator", respectively.
You were close, but I'll explain a bit how lapply (and mclapply) work, because it feels like you're mixing up what the role of x is. First, this should work:
output_varpart <- mclapply(1:3, function(x){
varipart(spec.list[[x]], env.list[[x]], spat.list[[x]], type = "parametric")
})
But why?
The function lapply means: apply a function (2nd argument) to all values in a list (first argument). So lapply(list('Hello', 'World', '!'), print) will do
print('Hello')
print('World')
print('!')
and it will return a list of length 3 with the results (the return of print is the value that was printed)
But quite often, there is not one function that does exactly what you want. You can always define a function, like this:
my_vari_fun <- function(index) {
varipart(spec.list[[index]], env.list[[index]], spat.list[[index]], type = "parametric")
}
You can then call it like my_vari_fun(1), and it doesn't matter at all if the argument is called x or index, or something else. I'm sure you get it. So a next step would be
output_varpart <- lapply(list(1,2,3), my_vari_fun)
The disadvantage of this is that it takes multiple lines of code, and we probably won't use my_vari_fun again. That's why we can provide an anonymous function: we just give a function to lapply without assigning it to a name. We simply replace my_vari_fun with its "value" (which happens to be a function).
However, outside this function, x doesn't mean anything. We could as well have called it any other name.
We just need to tell lapply what values to input: list(1,2,3). Or simpler as a vector, which lapply will convert: 1:3
By the way, I've just inserted 3 here, but for the general case you can use 1:length(spec.list), you just have to make sure all lists are the same length.
Finally, I've talked about lapply now, but it all works the same for mclapply. The difference is only under the hood, mclapply will spread its work over multiple cores.
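As an alternative to indexing, base R's Map walks several lists in parallel and passes the matching elements to the function. The sketch below does not call varipart; combine3 is a hypothetical stand-in that takes three data frames, and the data is made up, so that the example runs without ade4.

```r
spec.list <- list(data.frame(y1 = 1:3), data.frame(y1 = 4:6))
env.list  <- list(data.frame(y1 = 7:9), data.frame(y1 = 1:3))
spat.list <- list(data.frame(y1 = 2:4), data.frame(y1 = 5:7))

# Hypothetical stand-in for varipart(): any function of three data frames.
combine3 <- function(spec, env, spat) {
  sum(spec$y1) + sum(env$y1) + sum(spat$y1)
}

# Index-based version, as in the answer above.
by_index <- lapply(seq_along(spec.list), function(i) {
  combine3(spec.list[[i]], env.list[[i]], spat.list[[i]])
})

# Map version: the i-th elements of all three lists are passed together.
by_map <- Map(combine3, spec.list, env.list, spat.list)
```

Both versions return a list with one result per set of data frames; the index-based form has the advantage that it carries over to mclapply unchanged.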
Edit: debugging
In debugging, there is more difference between lapply and mclapply. I will first talk about lapply.
If there is some error in your code that gets executed inside the lapply, the entire lapply will fail, and nothing gets assigned. Which sometimes makes it hard to spot exactly where an error takes place, but it can be done. A simple workaround may be feeding lapply just parts of your input, to see where it breaks.
But R also comes with some debugging tools, where execution freezes as soon as an error is encountered. I find recover the most useful tool.
You can set it by options(error=recover), and every time an error is encountered, it gives you a backwards list of the function that threw the error, by which function it was called, by which function that was called, ...
Then you can choose a number to explore the environment in which that function was running. When I try to emulate your error, I get this:
Error in x * w : non-numeric argument to binary operator
Enter a frame number, or 0 to exit
1: source("~/.active-rstudio-document")
2: withVisible(eval(ei, envir))
3: eval(ei, envir)
4: eval(ei, envir)
5: .active-rstudio-document#20: lapply(1:3, function(x) {
varipart(spec.list[[x]], env.list[[x]], spat.list[
6: FUN(X[[i]], ...)
7: .active-rstudio-document#21: varipart(spec.list[[x]], env.list[[x]], spat.list[[x]], type = "parametric")
8: as.matrix(scalewt(Y, scale = scale))
9: scalewt(Y, scale = scale)
10: apply(df, 2, weighted.mean, w = wt)
11: FUN(newX[, i], ...)
12: weighted.mean.default(newX[, i], ...)
A lot of them are internal functions by R, and you can see what varipart does: it passes on stuff to lower functions, who pass it on, etc.
For our purposes, we want number 6: here the lapply calls your function, with the i-th input value.
As soon as we enter 6, we get a new prompt, that reads Browse[1]> (in some cases it may be another number), and we are in the environment as if we just entered our
function(x){
varipart(spec.list[[x]], env.list[[x]], spat.list[[x]], type = "parametric")
}
Which means typing x will give you the value for which this function fails, and spec.list[[x]] etc. will tell you for which inputs varipart failed. Then the final step is deciding what this means: either varipart is broken, or one of your inputs is.
In this case, I noticed I can get the same error by having one of the columns in the data.frame be something other than numeric. You'll have to check whether that is your problem as well, but debugging becomes a whole lot easier once you've figured out where the problem is.
With mclapply
mclapply runs on multiple cores, which means that if there is an error in one core, the other cores still finish their jobs.
For calculations where a forked process encountered an error, that error will be the return value, in the form of a try-error-object.
But note that that will be the case for other iterations by the same core as well. So if for mclapply(1:10, fun), fun(1) will throw an error, in the case of 2 cores, all odd inputs will show that error.
So we can look at the return value, to narrow our search down:
sapply(output_varpart, class)
The error(s) is/are in the iterations where the output-class is try-error, but we can't know exactly which one.
How to practically solve it depends on the size of the calculations.
If they were really extensive, it may be worth it to keep the values that did succeed, and narrow it down again by re-running only the failed parts.
Or if we see just one try-error, we don't need to look any further.
But usually, I find it most useful to change the mclapply to a regular lapply, and use the approach above.
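When switching to a regular lapply, one way to get mclapply-like try-error return values while still learning exactly which input failed is to wrap each call in try. This is a sketch of that idea on made-up inputs, not the answer's own method; x * 2 stands in for the real computation.

```r
inputs <- list(1, "a", 3)  # the second element will cause an error

# try(..., silent = TRUE) returns an object of class "try-error" on failure
# instead of aborting the whole lapply.
results <- lapply(inputs, function(x) try(x * 2, silent = TRUE))

# Indices of the iterations that failed.
failed <- which(vapply(results, inherits, logical(1), what = "try-error"))
failed

# The successful results are still available.
succeeded <- results[-failed]
```

Unlike with mclapply, only the iteration that actually failed is marked here, so failed points straight at the offending input.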

Determine if a sequence has "gaps" in R

I would like to determine if a sequence contains any gaps or irregular steps? Not sure if this is the right way to phrase this and there's a good chance that this is duplicate (but I was unable to find a good question).
The following has_gap function gives me the correct results, but seems a bit clunky? Perhaps there is something built-in that I haven't discovered?
x1 <- c(1:5, 7:10)
x2 <- 1:10
x3 <- seq(1, 10, by = 2)
x4 <- c(seq(1, 6, by = 2), 6, seq(7, 10, by = 2))
has_gap <- function(vec) length(unique(diff(vec))) != 1
vecs <- list(x1, x2, x3, x4)
sapply(vecs, has_gap)
# [1] TRUE FALSE FALSE TRUE
library(zoo)
is.regular(x3, strict=TRUE)
is.regular(x3, strict=FALSE)
As noted by G. Grothendieck in the comments, one approach is:
has_gaps <- \(x)!!diff(range(diff(x)))
Another approach might be:
has_gaps2 <- \(x)var(diff(x))>0
If performance is an issue, rawr suggested:
has_gaps3 <- \(x)!isTRUE(all.equal(cor(x,seq_along(x)),1))
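For non-integer sequences, the differences may differ by floating-point rounding error, so an exact comparison can report a gap that isn't really there. A sketch of a tolerance-based variant of the first approach; has_gaps_tol and its default tol are assumptions, not something from the answers above.

```r
# Hypothetical variant: treat steps as equal if they agree within tol.
has_gaps_tol <- function(x, tol = 1e-8) {
  diff(range(diff(x))) > tol
}

x1 <- c(1:5, 7:10)           # one missing value
x2 <- 1:10                   # regular integer sequence
x5 <- seq(0, 1, by = 0.1)    # regular, but steps are not exactly representable

res <- vapply(list(x1, x2, x5), has_gaps_tol, logical(1))
res
```

The right tolerance depends on the scale of the data, so the default here is only a placeholder.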

Resources