Why is using `<<-` frowned upon and how can I avoid it?

I followed the discussion here and am curious why using <<- is frowned upon in R. What kind of confusion will it cause?
I would also like some tips on how to avoid <<-. I use it quite often. For example:
### Create dummy data frame of 10 x 10 integer matrix.
### Each cell contains a number that is between 1 to 6.
df <- do.call("rbind", lapply(1:10, function(i) sample(1:6, 10, replace = TRUE)))
What I want to achieve is to shift every number down by 1, i.e. all the 2s become 1s, all the 3s become 2s, etc. In other words, every n becomes n-1. I achieve this with the following:
df.rescaled <- df
sapply(2:6, function(i) df.rescaled[df.rescaled == i] <<- i - 1)
In this instance, how can I avoid <<-? Ideally I would want to be able to pipe the sapply results into another variable along the lines of:
df.rescaled <- sapply(...)

First point
<<- is NOT an operator that assigns to a global variable. It assigns to the variable in the nearest enclosing (parent) environment in which it is found. So, for example, this can cause confusion:
f <- function() {
  a <- 2
  g <- function() {
    a <<- 3   # modifies the `a` inside f, not the global `a`
  }
  g()
}
then,
> a <- 1
> f()
> a # the global `a` is not affected
[1] 1
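By contrast (a small added sketch, not part of the original answer): if no enclosing function defines a local `a`, <<- keeps searching outward and does end up modifying the global `a`:
h <- function() {
  a <<- 3   # no `a` in any enclosing function environment
}
then,
> a <- 1
> h()
> a # this time the global `a` IS modified
[1] 3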
Second point
You can do that by using Reduce:
Reduce(function(a, b) {a[a==b] <- a[a==b]-1; a}, 2:6, df)
or apply
apply(df, c(1, 2), function(i) if(i >= 2) {i-1} else {i})
But
simply, this is sufficient:
ifelse(df >= 2, df-1, df)

You can think of <<- as global assignment (approximately, because, as kohske points out, it assigns in the nearest enclosing environment in which the variable already exists, falling back to the global environment otherwise). Examples of why this is bad are here:
Examples of the perils of globals in R and Stata
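More generally, the usual way to avoid <<- is to have the function return the modified object and assign the result at the call site; a minimal sketch, reusing the ifelse() one-liner from the first answer:
rescale <- function(x) ifelse(x >= 2, x - 1, x)  # returns a new object, no side effects
df.rescaled <- rescale(df)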

Related

How to define a recursive for loop in R?

I have a number of variables that is not known in advance, and for each variable I need to define a for loop and perform a series of operations. For each subsequent variable, I need to define a nested loop inside the previous one, performing the same operations. I guess there must be a way of doing this recursively, but I am struggling with it.
Consider for instance the following easy example:
results = c()
index = 0
for(i in 1:5)
{
  a = i*2
  for(j in 1:5)
  {
    b = a*2 + j
    for(k in 1:5)
    {
      index = index + 1
      c = b*2 + k
      results[index] = c
    }
  }
}
In this example, I would have 3 variables. The loop on j requires information from the loop on i, and the loop on k requires information from the loop on j. This is a simplified example of my problem and the operations here are pretty simple. I am not interested in another way of getting the "results" vector; what I would like to know is whether there is a way to do these operations recursively for an unknown number of variables, let's say 10 variables, so that I do not need to nest 10 loops manually.
Here is one approach that you might be able to modify for your situation...
results <- 0 # initialise
for(level in 1:3){ # 3 nested loops - change as required
  results <- c( # converts output to a vector
    outer(results, # results so far
          1:5, # as in your loops
          FUN = function(x, y) {x*2 + y} # as in your loops
    )
  )
}
The two problems with this are
a) that your formula is different in the first (outer) loop, and
b) the order of results is different from yours
However, you might be able to find workarounds for these depending on your actual problem.
I have tried to change the code into a function that lets you define how many iterations need to happen.
library(tidyverse)
fc <- function(i_end, j_end, k_end){
  i <- 1:i_end
  j <- 1:j_end
  k <- 1:k_end
  df <- crossing(i, j, k) %>%
    mutate(
      a = i*2,
      b = a*2 + j,
      c = b*2 + k,
      index = row_number())
  df
}
fc(5,5,5)
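As a sketch of my own (not from either answer above): since each level of the question's loops applies the same recurrence to the value from the level above, you can build every index combination with expand.grid() and fold the recurrence over each row, so the number of "loops" is just the length of a vector. nested_results is a hypothetical name:
nested_results <- function(loop_sizes) {
  grid <- expand.grid(lapply(loop_sizes, seq_len))   # one row per index combination
  apply(grid, 1, function(idx) {
    # v1 = 2*i1, then v_m = 2*v_{m-1} + i_m, as in the question's loops
    Reduce(function(acc, x) 2 * acc + x, idx[-1], init = 2 * idx[1])
  })
}
res <- nested_results(c(5, 5, 5))  # same values as the question's 3-level loop,
                                   # but ordered with the first index varying fastest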

Efficient algorithm to turn matrix subdiagonals into columns in R

I have a non-square matrix and need to do some calculations on its subdiagonals. I figured out that the best way is to turn the subdiagonals into columns/rows and use functions like cumprod. Right now I use a for loop and exdiag defined as below:
exdiag <- function(mat, off=0) {mat[row(mat) == col(mat)+off]}
However, it does not seem to be very efficient. Do you know of any other algorithm to achieve that kind of result?
A little example to show what I am doing:
exdiag <- function(mat, off=0) {mat[row(mat) == col(mat)+off]}
mat <- matrix(1:72, nrow = 12, ncol = 6)
newmat <- matrix(nrow=11, ncol=6)
for (i in 1:11){
  newmat[i,] <- c(cumprod(exdiag(mat, i)), rep(0, max(6 - 12 + i, 0)))
}
Best regards,
Artur
The fastest, but by far the most cryptic, solution to get all possible subdiagonals from a non-square matrix is to treat your matrix as a vector and simply construct an id vector for selection. In the end you can transform it back to a matrix if you want.
The following function does that:
exdiag <- function(mat){
  NR <- nrow(mat)
  NC <- ncol(mat)
  smalldim <- min(NC, NR)
  if(NC > NR){
    id <- seq_len(NR) +
      seq.int(0, NR-1)*NR +
      rep(seq.int(1, NC - 1), each = NR)*NR
  } else if(NC < NR){
    id <- seq_len(NC) +
      seq.int(0, NC-1)*NR +
      rep(seq.int(1, NR - 1), each = NC)
  } else {
    return(diag(mat))
  }
  out <- matrix(mat[id], nrow = smalldim)
  id <- (ncol(out) + 1 - row(out)) - col(out) < 0
  out[id] <- NA
  return(out)
}
Keep in mind you have to take into account how your matrix is formed.
In both cases I follow the same logic:
First construct a sequence indicating positions along the smallest dimension.
To this sequence, add 0, 1, 2, ... times the row length.
This creates the first diagonal. After doing this, you simply add a sequence that shifts the entire previous sequence by 1 (either down or to the right) until you reach the end of the matrix. To shift right, I need to multiply this sequence by the number of rows.
In the end you can use these indices to select the correct positions from mat, and return all of that as a matrix. Due to the vectorized nature of this code, you have to check that the last subdiagonals are correct. These contain fewer elements than the first, so you have to replace the values that are not part of that subdiagonal with NA. Here too you can simply use an indexing trick.
You can use it as follows:
> diag1 <- exdiag(amatrix)
> diag2 <- exdiag(t(amatrix))
> identical(diag1, diag2)
[1] TRUE
To arrive at your result:
amatrix <- matrix(1:72, ncol = 6)
diag1 <- exdiag(amatrix)
res <- apply(diag1,2,cumprod)
res[is.na(res)] <- 0
t(res)
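As a quick sanity check (an addition of mine, not part of the original answer), column k of this exdiag() should match the k-th subdiagonal selected with the row/col test from the question:
identical(exdiag(amatrix)[, 3],                       # 3rd column of the new exdiag()
          amatrix[row(amatrix) == col(amatrix) + 3])  # exdiag(amatrix, 3) from the question
# [1] TRUE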
You can modify the diag() function.
exdiag <- function(mat, off=0) {mat[row(mat) == col(mat)+off]}
exdiag2 <- function(matrix, off){diag(matrix[-1:-off,])}  # drop the first `off` rows, then take the diagonal of what is left
Speed Test:
mat = diag(10, 10000,10000)
off = 4
> system.time(exdiag(mat,4))
user system elapsed
7.083 2.973 10.054
> system.time(exdiag2(mat,4))
user system elapsed
5.370 0.155 5.524
> system.time(diag(mat))
user system elapsed
0.002 0.000 0.002
It looks like the subsetting from the matrix takes a lot of time, but it still performs better than your implementation. Maybe there are other subsetting approaches that outperform my solution. :)
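A further option (a sketch of my own, not from the answer above): compute the linear indices of the off-th subdiagonal directly, which avoids building the two large row()/col() comparison matrices; exdiag3 is a hypothetical name:
exdiag3 <- function(mat, off = 0) {
  n <- min(nrow(mat) - off, ncol(mat))   # number of elements on that subdiagonal
  mat[seq.int(1 + off, by = nrow(mat) + 1, length.out = n)]
}
# exdiag3(mat, 4) returns the same vector as exdiag(mat, 4), without the logical mask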

How to store data from a for loop inside of a for loop? (rolling correlation in R)

require(quantmod)
require(TTR)
iris2 <- iris[1:4]
b <- NULL
for (i in 1:ncol(iris2)){
  for (j in 1:ncol(iris2)){
    a <- runCor(iris2[,i], iris2[,j], n = 21)
    b <- cbind(b, a)
  }
}
I want to calculate a rolling correlation of different columns within a dataframe and store the data separately by column. Although the code above stores the data in variable b, it is not very useful because it just dumps all the results together. What I would like is to be able to create a different dataframe for each i.
In this case, as I have 4 columns, what I would ultimately want are 4 dataframes, each containing 4 columns of rolling correlations (i.e. df1 = corr of col 1 vs cols 1,2,3,4, df2 = corr of col 2 vs cols 1,2,3,4, etc.).
I thought of using lapply or rollapply, but ran into the same problem.
d <- NULL
for (i in 1:ncol(iris2)){
  for (j in 1:ncol(iris2)){
    c <- rollapply(iris2, 21, function(x) cor(x[,i], x[,j]), by.column = FALSE)
    d <- cbind(d, c)
  }
}
Would really appreciate any inputs.
If you want to keep the expanded loop, how about a list of dataframes?
e <- vector("list", length = ncol(iris2))  # preallocate: one list slot per column
for (i in 1:ncol(iris2)) {
  d <- matrix(0, nrow = length(iris2[,1]), ncol = length(iris2[1,]))
  for (j in 1:ncol(iris2)) {
    d[,j] <- runCor(iris2[,i], iris2[,j], n = 21)
  }
  e[[i]] <- d
}
It's also a good idea to allocate the amount of space you want with placeholders and put items into that space rather than use rbind or cbind.
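For example (an illustrative sketch with made-up sizes): both loops below build a 100 x 1000 matrix of random draws, but the first re-allocates and copies out on every iteration, while the second writes into space allocated once:
out <- NULL
for (i in 1:1000) out <- cbind(out, rnorm(100))   # grows: copies the whole matrix each time

out <- matrix(0, nrow = 100, ncol = 1000)         # allocated once up front
for (i in 1:1000) out[, i] <- rnorm(100)          # fills in place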
Although it is not good practice to create dataframes on the fly in R (you should prefer putting them in a list, as in the other answer), the way to do so is to use the assign and get functions.
for (i in 1:ncol(iris2)) {
  d <- NULL
  for (j in 1:ncol(iris2)){
    d <- cbind(d, runCor(iris2[,i], iris2[,j], n = 21))
  }
  # Assign 'd' to the name df1, df2...
  assign(paste0("df", i), d)
}
# to have access to the dataframe:
get("df1")
# or inside a loop
get(paste0("df", i))
Since you stated your computation was slow, I wanted to provide you with a parallel solution. If you have a modern computer, it probably has 2 cores, if not 4 (or more!). You can easily check this via:
require(parallel) # for parallelization
detectCores()
Now the code:
require(quantmod)
require(TTR)
iris2 <- iris[,1:4]
Parallelization requires that the functions and variables used be placed into a special environment that is created and destroyed with each process. That means a wrapper function must be created to define those variables and functions.
wrapper <- function(data, n) {
  # variables placed into environment
  force(data)
  force(n)
  # functions placed into environment
  # same inner loop as written in the earlier answer
  runcor <- function(data, n, i) {
    d <- matrix(0, nrow = length(data[,1]), ncol = length(data[1,]))
    for (j in 1:ncol(data)) {
      d[,j] <- TTR::runCor(data[,i], data[,j], n = n)
    }
    return(d)
  }
  # call the function to loop over iterator i
  worker <- function(i) {
    runcor(data, n, i)
  }
  return(worker)
}
Now create a cluster on your local computer. This allows the multiple cores to run separately.
parallelcluster <- makeCluster(parallel::detectCores())
models <- parallel::parLapply(parallelcluster, 1:ncol(iris2),
                              wrapper(data = iris2, n = 21))
stopCluster(parallelcluster)
Stop and close the cluster when finished.
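The result models is then a list with one matrix of rolling correlations per column of iris2; a quick way to check (my addition, assuming the code above ran as-is):
length(models)     # 4: one element per column of iris2
dim(models[[1]])   # 150 x 4: column 1 of iris2 against columns 1 to 4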

Set matrix element using apply in R

I am trying to assign values from a dataframe into a matrix. Columns 2 and 3 of the dataframe are mapped to the rows and columns, respectively, of the matrix. This is not working, since sim.mat is not storing the values.
score <- function(x, sim.mat) {
  r <- as.numeric(x[2])
  c <- as.numeric(x[3])
  sim.mat[r,c] <- as.numeric(x[4])
}
mat <- apply(sim.data, 1, score, sim.mat)
Is this the right approach? If yes, how can I get it to work?
No need for apply, try this:
score <- function(x, sim.mat) {
  r <- as.numeric(x[[2]])
  c <- as.numeric(x[[3]])
  sim.mat[cbind(r,c)] <- as.numeric(x[[4]])
  sim.mat
}
mat <- score(sim.data, sim.mat)
Check the "Matrices and arrays" section of ?"[" for documentation.
If you really wanted to use apply like you did, you would need your function to modify sim.mat in the calling environment. To do that:
score <- function(x, sim.mat) {
  r <- as.numeric(x[2])
  c <- as.numeric(x[3])
  sim.mat[r,c] <<- as.numeric(x[4])
}
apply(sim.data, 1, score, sim.mat)
sim.mat
This type of programming where functions have side-effects is really not recommended.

Store values in For Loop

I have a for loop in R in which I want to store the result of each calculation (for all the values looped through). In the for loop a function is called and, at the moment, its output is stored in a variable. However, this is overwritten in each successive iteration. How can I store the result of each pass through the loop and access it afterwards?
Thanks,
Example:
for (par1 in 1:n) {
  var <- some_function(par1, par2)   # placeholder: any function of par1 and par2
  var2 <- c(var, par1)
  print(var2)
}
So print shows every instance of var2, but var2 itself only keeps the value from the last iteration... is there any way to get an array of the data or something?
Initialise an empty object and then assign the values by indexing:
a <- 0
for (i in 1:10) {
  a[i] <- mean(rnorm(50))
}
print(a)
EDIT:
To include an example with two output variables, in the most basic case, create an empty matrix with the number of columns corresponding to your output parameters and the number of rows matching the number of iterations. Then save the output in the matrix, by indexing the row position in your for loop:
n <- 10
mat <- matrix(ncol=2, nrow=n)
for (i in 1:n) {
  var1 <- function_one(i, par1)
  var2 <- function_two(i, par2)
  mat[i,] <- c(var1, var2)
}
print(mat)
The iteration number i corresponds to the row number in the mat object. So there is no need to explicitly keep track of it.
However, this is just to illustrate the basics. Once you understand the above, it is more efficient to use the elegant solution given by @eddi, especially if you are handling many output variables.
To get a list of results:
n = 3
lapply(1:n, function(par1) {
  # your function and whatnot, e.g.
  par1*par1
})
Or sapply if you want a vector instead.
A bit more complicated example:
n = 3
some_fn = function(x, y) { x + y }
par2 = 4
lapply(1:n, function(par1) {
  var = some_fn(par1, par2)
  return(c(var, par1)) # don't have to type return, but I chose to make it explicit here
})
#[[1]]
#[1] 5 1
#
#[[2]]
#[1] 6 2
#
#[[3]]
#[1] 7 3
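If you would rather end up with a matrix than a list, the lapply() output can be bound together afterwards; a small follow-up sketch using the same some_fn, par2 and n as above:
out <- lapply(1:n, function(par1) c(some_fn(par1, par2), par1))
do.call(rbind, out)
#      [,1] [,2]
# [1,]    5    1
# [2,]    6    2
# [3,]    7    3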
