Speed of for loop is decreasing - r

I'm currently looping through a large data set, and I've discovered that the higher the loop index, the slower the loop runs. It goes pretty fast at the beginning, but it's incredibly slow at the end. What's the reason for this? Is there any way to get around it?
Remarks:
1) I can't use plyr because the calculation is recursive.
2) The length of output vector is not known in advance.
My code looks roughly like this:
for (i in 1:20000) {
  if (i == 1) {
    temp <- some_function(input_data[i])
    out <- temp
  } else {
    temp <- some_function(input_data[i], temp)
    out <- rbind(out, temp)
  }
}

The problem is that you are growing the object out at each iteration, which will entail larger and larger amounts of copying as the size of out increases (as your loop index increases).
In this case, you know the loop needs a vector of 20000 elements, so create one up front and fill it in as you loop. Doing this also removes the need for the if () ... else, which adds a small cost on every iteration that becomes appreciable as the number of iterations grows.
For example, you could do:
out <- numeric(20000)
out[1] <- foo(data[1])
for (i in 2:length(out)) {
  out[i] <- foo(data[i], out[i-1])
}
What out needs to be when you create it will depend on what foo() returns. Adjust creation of out accordingly.
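If each step can return a different number of rows, so the final size of out really is unknown, one common alternative is to store each result in a list and combine everything once after the loop. The following is only a sketch: foo here is a made-up stand-in for the recursive step, and data is a toy input.

```r
# Hypothetical stand-in for the recursive step; not from the original post.
foo <- function(x, prev = 0) x + 0.5 * prev

data <- c(1, 2, 3, 4, 5)

# One list slot per iteration; each slot may hold a result of any size.
results <- vector("list", length(data))
results[[1]] <- foo(data[1])
for (i in 2:length(data)) {
  results[[i]] <- foo(data[i], results[[i - 1]])
}

# Combine once, after the loop, instead of calling rbind() on every iteration.
out <- do.call(rbind, results)
```

The number of iterations is known even when the size of each result is not, so the list itself can still be preallocated.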

Related

Iterating over a changing sequence in a for loop

Suppose we have syntax as follows:
for (i in sequence) {
  ...
  sequence <- append(sequence, i, after = 3)
}
But the code does not work as expected because sequence is not updated within the loop. I have come up with a similar solution using a while loop instead. Is it possible to do this with a for loop anyway?
The for loop in R only evaluates the seq value (sequence in your example) at the beginning. So changing sequence in your loop will have no effect over how many times the loop will run.
For example,
sequence <- 1:2
for (i in sequence) {
  print(i)
  sequence <- 0
}
will print the numbers 1 and 2, and then will finish, with sequence containing a single zero.
This is described in the help page ?"for":
The seq in a for loop is evaluated at the start of the loop; changing
it subsequently does not affect the loop. If seq has length zero the
body of the loop is skipped. Otherwise the variable var is assigned in
turn the value of each element of seq. You can assign to var within
the body of the loop, but this will not affect the next iteration.
When the loop terminates, var remains as a variable containing its
latest value.
No. A for loop in R can be thought of as a function call that iterates over its input "by value" (i.e. over a copy of the input). Any change to the iterator sequence after the loop has started leaves the iteration unaffected. This is generally a good thing, since it prevents a whole class of bugs and accidental infinite loops. A simple example illustrates it:
idx <- 1:15
for (i in idx) {
  idx <- head(idx, -1)
  cat('idx: ', idx, '\n')
  cat('i: ', i, '\n')
}
If you want to update the iterator, your best bet is either repeat or while, but be careful, as this increases the potential for errors and unexpected infinite loops.
idx <- 1:15
while (length(idx) != 0) {
  i <- head(idx, 1)
  idx <- tail(idx, -1)
  cat('idx: ', idx, '\n')
  cat('i: ', i, '\n')
}
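The same idea also covers the case from the question, where the sequence grows while it is being traversed. Here is a minimal sketch, assuming an explicit position counter; the names queue, pos and visited are made up for illustration. Because the while condition is re-evaluated on every pass, elements appended inside the loop are still visited.

```r
queue <- c(1, 2, 3)
pos <- 1
visited <- numeric(0)

while (pos <= length(queue)) {   # length(queue) is re-checked on every pass
  i <- queue[pos]
  visited <- c(visited, i)
  # Grow the sequence while iterating: small values add a follow-up item.
  if (i < 3) queue <- c(queue, i + 10)
  pos <- pos + 1
}

print(visited)
```

The loop only terminates because the appended items (11 and 12) do not trigger further appends; a rule that always appends would run forever.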

Problem with checking logical within for loop

Inspired by the leetcode challenge for two sum, I wanted to solve it in R. But while trying to solve it by brute force I ran into an issue with my for loop.
So the basic idea is: given a vector of integers, which two integers in the vector sum to a given target integer?
First I create 10000 integers:
set.seed(1234)
n_numbers <- 10000
nums <- sample(-10^4:10^4, n_numbers, replace = FALSE)
Then I do a for loop within a for loop to check every single element against each other.
# ensure that it is actually solvable
target <- nums[11] + nums[111]
test <- 0
for (i in 1:(length(nums)-1)) {
  for (j in 1:(length(nums)-1)) {
    j <- j + 1
    test <- nums[i] + nums[j]
    if (test == target) {
      print(i)
      print(j)
      break
    }
  }
}
My problem is that it starts wildly printing numbers before ever getting to the right condition of test == target. And I cannot seem to figure out why.
I think there are several issues with your code:
First, you don't have to increase j manually; you can do this in the for statement. So if you really want j shifted by 1 in every step, you can just write:
for (j in 2:(length(nums)))
Second, break only exits the inner for loop, so the outer loop keeps running. See the question "Breaking out of nested loops in R" for further information on that.
Third, there are several pairs in nums that give the "right" result target. Your if condition therefore works as written and prints every combination where nums[i] + nums[j] equals target.
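Putting these points together, one possible brute-force version that stops at the first matching pair might look like the sketch below. The found variable is an addition for illustration, and starting j at i + 1 is a design choice not in the original code: it checks each pair once and never pairs an element with itself.

```r
set.seed(1234)
nums <- sample(-10^4:10^4, 10000, replace = FALSE)
target <- nums[11] + nums[111]   # guarantees at least one solution

found <- NULL
for (i in seq_len(length(nums) - 1)) {
  for (j in (i + 1):length(nums)) {
    if (nums[i] + nums[j] == target) {
      found <- c(i, j)
      break                      # leaves the inner loop only
    }
  }
  if (!is.null(found)) break     # leaves the outer loop as well
}

print(found)
```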

Iteration in r for loop

I want to write a for loop that iterates over a vector or list while I add values to it in each iteration. I came up with the following code, but it doesn't run for more than one iteration. I don't want to use a while loop for this program. I want to know how I can control a for loop's iterator. Thanks in advance.
steps <- 1
random_number <- c(sample(20, 1))
for (item in random_number) {
  if (item < 18) {
    random_number <- c(random_number, sample(20, 1))
    steps <- steps + 1
  }
}
print(paste0("It took ", steps, " steps."))
It really depends on what you want to achieve. Either way, I am afraid you cannot change the iterator on the fly. while seems reasonable in this context, or, if you know a plausible maximum number of iterations, you could loop over those and deal with the needless iterations via an if statement. Based on your code, something more like:
steps <- 1
random_number <- sample(20, 1)
for (item in 1:100) {
  # Keep drawing only while the latest number is below 18;
  # later iterations do nothing once the condition fails.
  if (random_number[steps] < 18) {
    random_number <- c(random_number, sample(20, 1))
    steps <- steps + 1
  }
}
print(paste0("It took ", steps, " steps."))
Which to be honest is not really different from a while() combined with an if statement to make sure it doesn't run forever.
This can't be done. The vector used in the for loop is evaluated at the start of the loop and any changes to that vector won't affect the loop. You will have to use a while loop or some other type of iteration.
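For comparison, a while-based version of the question's code could look like the sketch below. The seed is arbitrary and only there to make the run repeatable; it is not part of the original post.

```r
set.seed(42)  # arbitrary seed, an assumption for repeatability

random_number <- sample(20, 1)
steps <- 1
# The condition is re-evaluated on every pass, so the loop sees each new draw.
while (random_number[steps] < 18) {
  random_number <- c(random_number, sample(20, 1))
  steps <- steps + 1
}
print(paste0("It took ", steps, " steps."))
```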

Using for loop to append vectors of variable length

I am trying to create a vector or list of values based on the output of a function performed on individual elements of a column.
library(hpoPlot)
xyz_hpo <- c("HP:0003698", "HP:0007082", "HP:0006956")
getallancs <- function(hpo_col) {
  for (i in 1:length(hpo_col)) {
    anc <- get.ancestors(hpo.terms, hpo_col[i])
    output <- list()
    output[[length(anc) + 1]] <- append(output, anc)
  }
  return(anc)
}
all_ancs <- getallancs(xyz_hpo)
get.ancestors outputs a character vector of variable length depending on each term. How can I loop through hpo_col adding the length of each ancs vector to the output vector?
Welcome to Stack Overflow :) Great job on providing a minimal reproducible example!
As mentioned in the comments, you need to move the output <- list() outside of your for loop, and return it after the loop. At present it is being reset for each iteration of the loop, which is not what you want. I also think you want to return a vector rather than a list, so I have changed the type of output.
Also, in your original question, you say that you want to return the length of each anc vector in the loop, so I have changed the function to output the length of each iteration, rather than the whole vector.
getallancs <- function(hpo_col) {
  output <- numeric()
  # seq_along() is safe even when hpo_col is empty, unlike 1:length(hpo_col)
  for (i in seq_along(hpo_col)) {
    anc <- get.ancestors(hpo.terms, hpo_col[i])
    output <- append(output, length(anc))
  }
  return(output)
}
If you are only doing this for a few cases, such as your example, this approach will be fine. However, growing a vector inside a loop is typically quite slow in R, and it's better to vectorise this style of calculation if possible. This is especially important if you are running it over a large number of elements, where computation will take more than a few seconds.
For example, one way the function above could be vectorised is like so:
all_ancs <- sapply(xyz_hpo, function(x) length(get.ancestors(hpo.terms, x)))
If in fact you did mean to output the whole vector of anc, not just the lengths, the original function would look like this:
getallancs <- function(hpo_col) {
  output <- character()
  # seq_along() is safe even when hpo_col is empty, unlike 1:length(hpo_col)
  for (i in seq_along(hpo_col)) {
    anc <- get.ancestors(hpo.terms, hpo_col[i])
    output <- c(output, anc)
  }
  return(output)
}
Or a vectorised version could be
all_ancs <- unlist(lapply(xyz_hpo, function(x) get.ancestors(hpo.terms, x)))
Hope that helps. If it solves your problem, please mark this as the answer.
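Both vectorised forms can be tried without hpoPlot installed. The sketch below uses fake_ancestors, a made-up stand-in for get.ancestors that simply returns a character vector of varying length; vapply and lengths are base R alternatives to the sapply call above.

```r
# Hypothetical stand-in for get.ancestors(); the returned length
# depends on the input term, as described in the question.
fake_ancestors <- function(term) {
  rep(term, times = nchar(term) %% 3 + 1)
}

terms <- c("a", "bb", "ccc")

anc_list    <- lapply(terms, fake_ancestors)  # one character vector per term
anc_lengths <- lengths(anc_list)              # length of each vector
all_ancs    <- unlist(anc_list)               # everything in one vector

# vapply() is a stricter sapply(): it checks that each result is one integer.
anc_lengths2 <- vapply(terms, function(x) length(fake_ancestors(x)),
                       integer(1), USE.NAMES = FALSE)
```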

Can I stop and start a script in R

I am working with R and my script is taking a very long time. I was thinking I can stop it and then start it again by changing my counters.
My code is this
NC <- MLOA
for (i in 1:313578){
  len_mods <- length(MLOA[[i]])
  for (j in 1:2090){
    for(k in 1:len_mods){
      temp_match <- matchv[j]
      temp_rep <- replacev[j]
      temp_mod <- MLOA[[i]][k]
      is_found <- match(temp_mod,temp_match, nomatch = 0, incomparables = 0)
      if(is_found[1] == 1) NC[[i]][k] <- temp_rep
      rm(temp_match,temp_rep,temp_mod)
    }
  }
}
I am thinking that I can stop my script, then re-start it by checking what values of i,j and k are and changing the counts to start at their current values. So instead of counting "for (i in 1:313578)" if i is up to 100,000 I could do (i in 100000:313578).
I don't want to stop my script though before checking that my logic about restarting it is solid.
Thanks in anticipation
I'm a bit confused about what you are doing. Generally on this forum it is a good idea to greatly simplify your code and present only the core of the problem in a very simple example. That notwithstanding, this might help. Put your for loop in a function whose parameter is the first element of the sequence of numbers you loop over. For example:
myloop <- function(x,...){
for (i in seq(x,313578,1)){
...
This way you can easily control where your loop starts.
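A complete toy version of this idea is sketched below. process_item and run_from are hypothetical names, and the "interruption" is simulated by running a shorter range first. Note that if a function is interrupted mid-run, assignments made inside it are lost, so a real restartable job would also need to save its partial results somewhere.

```r
# Hypothetical per-item work; a stand-in for the body of the original loop.
process_item <- function(i) i^2

# Run the loop over start..end and return the updated results vector.
run_from <- function(start, end, results) {
  for (i in seq(start, end)) {
    results[i] <- process_item(i)
  }
  results
}

results <- rep(NA_real_, 10)
results <- run_from(1, 4, results)   # first run covers items 1 to 4 only

# Resume at the first index that has not been filled in yet.
next_i <- which(is.na(results))[1]
results <- run_from(next_i, 10, results)
```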
The more important question, however, is why you are using for loops in the first place. In R, for loops are best avoided whenever a vectorized alternative exists. By vectorizing your code you can greatly increase its speed. I have seen speed-ups of a factor of 500!
In general, the only reason you use a for loop in R is if current iterations of the for loop depend on previous iterations. If this is the case then you are likely bound to the slow for loop.
Depending on your computer skills, however, even for loops can be made faster in R. If you know C, or are willing to learn a bit, interfacing with C can dramatically increase the speed of your code.
An easier way to increase the speed of your code, which unfortunately will not yield the same speed-up as interfacing with C, is to use R's byte compiler. Check out the cmpfun function in the compiler package.
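A minimal sketch of cmpfun in use follows; sum_squares is a made-up example function. Since R 3.4.0 the just-in-time compiler is enabled by default, so on current versions the explicit call may make little difference.

```r
library(compiler)  # ships with base R

# Made-up example: a loop-heavy function worth compiling.
sum_squares <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + i^2
  total
}

# cmpfun() returns a byte-compiled version with the same behaviour.
sum_squares_c <- cmpfun(sum_squares)

print(sum_squares_c(10))
```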
One final thing on speeding up code: the line temp_match <- matchv[j] looks innocuous enough, but it can really slow things down. Every time you assign matchv[j] to temp_match, R creates a new object holding a copy of that element, which means your computer needs to find somewhere to store it in RAM. R is smart: as you make more and more copies, it cleans up after you and throws away the ones you are no longer using via the garbage collector. However, both finding places to store the copies and running the garbage collector take time. Read this if you want to learn more: http://adv-r.had.co.nz/memory.html.
You could also use while loops for your 3 loops and maintain the counters yourself. In the following, you can stop the script at any time (and view the intermediate results) and restart by setting continue = TRUE or by simply re-running the loop part of the script:
n <- 6
res <- array(NaN, dim=rep(n,3))
continue = FALSE
if(!continue){
  i <- 1
  j <- 1
  k <- 1
}
while(k <= n){
  while(j <= n){
    while(i <= n){
      res[i,j,k] <- as.numeric(paste0(i,j,k))
      Sys.sleep(0.1)
      i <- i+1
    }
    j <- j+1
    i <- 1
  }
  k <- k+1
  j <- 1
}
i;j;k
res
This is what I got to....
for (i in 1:313578) {
  mp <- match(MLOA[[i]], matchv, nomatch = 0, incomparables = 0)
  lgic <- which(as.logical(mp), arr.ind = FALSE, useNames = TRUE)
  NC[[i]][lgic] <- replacev[mp]
}
Thanks to those who responded. Jacob H, you are right, I am definitely a newbie with R, and your response was useful. Frank, your pointers helped.
My solution probably still isn't an optimal one. All I wanted to do was a find and replace. matchv was the vector in which I was searching for a match for each MLOA[i], with replacev being the vector of replacement information.
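The same find and replace can also be written without an index loop by applying match to each list element with lapply. This is only a sketch on toy data; the values of matchv, replacev and MLOA below are invented for illustration.

```r
# Toy data, invented for illustration.
matchv   <- c("a", "b")
replacev <- c("A", "B")
MLOA     <- list(c("a", "x"), c("b", "a", "y"))

NC <- lapply(MLOA, function(v) {
  pos <- match(v, matchv, nomatch = 0)  # 0 where there is no match
  v[pos > 0] <- replacev[pos]           # zero indices are dropped by `[`
  v
})

print(NC)
```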
