replacing loops when comparing multidimensional arrays - r

I am working with multi-dimensional arrays and making comparisons for each element. So far I have been using loops, but I was wondering how I could use apply (or another, better function) to avoid the loops. I tried several ways, but none of them work properly.
Take the following example, where I compute the 95th percentile over the third dimension and then make a comparison:
m <- array(1:30, c(5,4,3))
mp <- apply(m, 1:2, quantile, probs = 0.95, na.rm = TRUE)
temp <- array(dim = dim(m))
for (i in 1:5) {
  for (j in 1:4) {
    temp[i,j,] <- m[i,j,] > mp[i,j]
  }
}
I don't know if apply can be used here (I read some posts but am still not sure). Is there any other way to avoid the loops?
Thanks in advance!

You can use vectorization and assign the dimensions after evaluating your condition:
array(as.vector(m) > as.vector(mp), dim(m))
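If you prefer not to flatten the arrays yourself, the same comparison can (I believe) also be written with sweep(), which recycles mp across the third dimension for you; a minimal sketch:
temp2 <- sweep(m, 1:2, mp, FUN = ">")                          # compare each slice m[,,k] against mp
identical(temp2, array(as.vector(m) > as.vector(mp), dim(m)))  # TRUE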

Related

Which function inside a loop is more efficient (ncol/nrow() or dim())

In an exercise I am trying to create a multiplication table using a for loop. I am new to programming and R is the first language I am learning, so I would like to know which functions inside loops are faster and more efficient. For now I am not using the apply family because I think understanding basic constructs like loops is important first.
Here are two ways that I use to create a multiplication table:
Using the dim() function:
mtx <- matrix(nrow=10, ncol=10)
for (i in 1:dim(mtx)[1]) {
  for (j in 1:dim(mtx)[2]) {
    mtx[i,j] <- i*j
  }
}
Using the nrow()/ncol() functions:
mtx <- matrix(nrow=10, ncol=10)
for (i in 1:nrow(mtx)) {
  for (j in 1:ncol(mtx)) {
    mtx[i,j] <- i*j
  }
}
Which way is more efficient and generally better to use?
Thank you
If you use the functions the way you do in your example, the difference is negligible. This is because the functions get called only once when the loop is set up (not on every iteration!).
I would definitely prefer ncol/nrow because it's much easier to read than dim(x)[1].
That being said, if you just go for the timings, the dim function is faster than ncol/nrow. If you look at the source code, you can see that ncol is implemented as
function (x)
dim(x)[2L]
which means that ncol calls dim and is therefore marginally slower.
If you really want to squeeze out some speed with big matrices, I would suggest creating the loop vectors beforehand, like this:
rows <- 1:nrow(mtx)
cols <- 1:ncol(mtx)
for (i in rows) {
  for (j in cols) {
    mtx[i, j] <- i * j
  }
}
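For what it's worth, a multiplication table is also a textbook case for dropping the loop entirely: base R's outer() builds the whole table in one call. A minimal sketch:
mtx2 <- outer(1:10, 1:10)  # applies "*" to every combination of row and column index
mtx2[3, 4]                 # 12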

R stats - generate incremental variables

I am quite new to R programming and I am facing a simple problem, but I can't find a solution.
In other programming languages it is possible to generate incrementally named variables in a loop. Is this possible in R? How?
For example, I would like to save the output of an operation into a new variable each time a loop is done:
for (i in 1:5) {
  var_[i] <- i + pi
}
In this way, I would end up with var_1, var_2,..., var_5.
Thank you in advance for any help.
The literal version of what you're attempting is generally considered bad practice in R.
We generally avoid creating large collections of isolated data structures. It is much cleaner to put them all in a list and then set their names attribute:
x <- vector("list",5)
for (i in seq_len(5)){
x[[i]] <- i + pi
}
names(x) <- paste0("var_",1:5)
And then you'd refer to them via things like:
x[["var_1"]]
While this is possible in R, it is highly discouraged. It is better to work with named lists or vectors, or to accumulate results. For example, here you can store them as a vector:
myvar <- 1:5 + pi
# myvar[1] == 4.141593
# myvar[5] == 8.141593
or if you wanted to create a list you could use
myvar <- lapply(1:5, function(x) x + pi)
names(myvar) <- paste("var", 1:5, sep = "_")
# myvar[["var_1"]] == myvar[[1]] == 4.141593
But if you really need to create a bunch of separate variables (and you don't), you can use the assign() function.
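For completeness, a minimal sketch of what that would look like (the list approach above is still preferable):
for (i in 1:5) {
  assign(paste0("var_", i), i + pi)  # creates var_1 ... var_5 in the current environment
}
var_3  # 6.141593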

How to avoid a loop here in R?

In my R program I have a "for" loop of the following form:
for (i in 1:I) {
  res[i] <- a[i:I] %*% b[i:I]
}
where res, a and b are vectors of length I.
Is there any straightforward way to avoid this loop and calculate res directly? If so, would that be more efficient?
Thanks in advance!
This is the "reverse cumsum" of a*b
rev(cumsum(rev(a) * rev(b)))
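A quick sanity check on made-up data (names chosen purely for illustration) confirms the loop and the reverse-cumsum version agree:
set.seed(1)
I <- 6
a <- rnorm(I); b <- rnorm(I)
res <- numeric(I)
for (i in 1:I) res[i] <- a[i:I] %*% b[i:I]
all.equal(res, rev(cumsum(rev(a) * rev(b))))  # TRUE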
So long as res is already of length I, the for loop isn't "incorrect", and the apply solutions will not really be any faster. However, using apply can be more succinct (if potentially less readable).
Something like this:
res <- sapply(seq_along(a), function(i) a[i:I] %*% b[i:I])
should work as a one-liner.
Expanding on my first sentence: while using the inherent vectorization available in R is very handy and often the fastest way to go, it isn't always critical to avoid for loops. Underneath, the apply family determines the size of the output and pre-allocates it before "looping".

In R, when is it smart to formally declare a data.frame?

In other R code, it is common to see a data.frame declared before a loop is started.
Suppose I have data frame data1 with 2000 rows.
In a loop, I am calling a web service for each row of data1 to create a new data.frame data2. (Please don't recommend not using a loop.)
In data2$result and data2$pubcount I need to store different values for each of the 2000 items of data1.
Do I HAVE to declare, before the loop,
data2 = data.frame()
and do I have to tell R how many rows and what columns I will later use? I know that columns can be added without declaring them. What about rows? Is there an advantage in doing:
data2 <- data.frame(id = data1$id)
I would like to declare only what I absolutely HAVE to.
Why does the empty declaration give an error once inside the loop?
Later edit: Speed and memory are not an issue. 10s vs. 30s makes no difference, the data are under 100MB, and the PC is big (8GB). A matrix is not an option since the data mix numbers and text, so I have to use something other than a matrix.
Something like this:
df <- data.frame(a = numeric(n), b = character(n))
for (i in 1:n) {
  # <do stuff>
  df[i, 1] <- ...
  df[i, 2] <- ...
}
You should avoid manipulating data.frames in a loop, since subsetting a data.frame is a slow operation:
a <- numeric(n)
b <- character(n)
for (i in 1:n) {
  # <do stuff>
  a[i] <- ...
  b[i] <- ...
}
df <- data.frame(a, b)
Of course, there are often better ways than a for loop. But it is strongly recommended to avoid growing objects (and I won't teach you how to do that). Pre-allocate as shown here.
Why should you pre-allocate? Because growing objects in a loop is sloooowwwww and that's one of the main reasons why people think loops in R are slow.
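To make that concrete, here is a small illustrative sketch of the two patterns side by side (sizes and names are made up):
n <- 1e4
grow <- c()
for (i in 1:n) grow <- c(grow, i^2)   # anti-pattern: the vector is copied on every iteration
pre <- numeric(n)
for (i in 1:n) pre[i] <- i^2          # pre-allocated: values are written into existing slots
identical(grow, pre)  # TRUE, but the pre-allocated version scales far better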

Loop over two variables where one is conditional on the other

I want to make a loop that contains two variables, i and j. For each i in 1:24, j can be 1:24,
but I don't know how to write this loop:
i = 1
while (i <= 24) {
  j = seq(1, 24, by = 1)
  for (j in j) {
    cor[i, j]
  }
}
i = i + 1
Is this right? My output is cor[i,j].
In order to accomplish your final goal try...
cor(myMatrix)
The result is a matrix containing all of the correlations of all of the columns in myMatrix.
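For example, with a matrix of simulated data (made up here purely for illustration), any pairwise correlation can be read straight out of that result:
myMatrix <- matrix(rnorm(240), ncol = 24)  # 10 rows, 24 columns of fake data
cm <- cor(myMatrix)                        # 24 x 24 matrix of pairwise correlations
cm[3, 7]                                   # correlation between columns 3 and 7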
If you want to go about it the way you were, it's probably best to generate a matrix of all of the possible combinations of your items using combn. Try combn(1:4, 2) and see what it looks like for a small example. For your example with 24 columns, the best way to cycle through all combinations using a for loop is...
myMatrix <- matrix(rnorm(240), ncol = 24)
myIndex <- combn(1:24, 2)
for (i in seq_len(ncol(myIndex))) {
  temp <- cor(myMatrix[, myIndex[1, i]], myMatrix[, myIndex[2, i]])
  print(c(myIndex[, i], temp))
}
So, while it's possible to do it with a for loop in R, you'd never do it that way.
(and this whole answer is based on a wild guess about what you're actually trying to accomplish because the question, and your comments, are very hard to figure out)
