R: leave a dataset out in a loop

I have a problem doing some calculations in R: I have a large number of datasets A[i], and on each of them I run an iterated calculation until the difference between two successive iterations becomes small enough. However, for one particular A[j] the calculation takes far too long, so I suspect that dataset doesn't suit my method and I want to leave it out.
My question is: can I write a condition into my while loop so that, once a certain time limit is exceeded, R simply disregards that dataset and moves on to the next one? In other words, does something like this exist:
while (abs(a - b) > 0.01) {
  # calculations
  for (j in 1:n) {
    if (time > amount) { results.A[j] <- 0 }
  }
}
Thanks in advance!

Record Sys.time() before you enter the loop, then check the time again inside the loop to see how long it has been running.
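A minimal sketch of that idea, assuming A is your list of datasets, a and b are the quantities compared for convergence, and time.limit (an assumed value) is the number of seconds you will allow per dataset:

time.limit <- 60                      # seconds allowed per dataset (an assumed value)
results <- vector("list", length(A))  # one slot per dataset

for (j in seq_along(A)) {
  start <- Sys.time()
  a <- 0; b <- 1                      # placeholder values for the convergence check

  while (abs(a - b) > 0.01) {
    # ... one iteration of your calculation on A[[j]], updating a and b ...
    if (difftime(Sys.time(), start, units = "secs") > time.limit) {
      results[[j]] <- 0               # took too long: flag this dataset and give up
      break                           # leaves the while loop; the for loop moves on
    }
  }
}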


How can I add a simulation counter to each simulation in parallel processing in R?

I need to add a simulation counter to each simulation in my code using parallel processing. For each simulation there should be an additional value stating that this is "simulation x", the next simulation "simulation x+1", and so on, stored in an additional column. The problem is that when I attempt to add a counter with a for loop, the counter stores only one value for each combination of beta, theta and delta, not one for each iteration as well. The pseudocode of this attempted solution is:
counter <- 1
start parallelisation {
  function
  counter <- counter + 1
}
end parallelisation
I've created a very simplified version of my code; hopefully, if you can find a solution to this problem, I can apply the same solution to the more complex script. Note that I am using 20 cores; you will of course need to specify a reasonable number of cores based on your own PC's specifications. Below is the code:
library("parallel")
betavalues<-seq(from=50,to=150,length.out=3)
thetavalues<-seq(from=200,to=300,length.out=3)
deltavalues<-seq(from=20,to=140,length.out=3)
outputbind<-c()
iterations<-5
examplefunction<- function(i=NULL){
for (j in betavalues){
for(k in thetavalues){
for(l in deltavalues){
output<-data.frame(beta=j,theta=k,delta=l)
outputbind<-rbind(outputbind,output)
}
}
}
data<-data.frame(beta=outputbind$beta,theta=outputbind$theta,delta=outputbind$delta)
}
cl <- makeCluster(mc <- getOption("cl.cores", 20))
clusterExport(cl=cl, varlist=ls())
par_results <- parLapply(1:iterations, fun=examplefunction, cl=cl)
clusterEvalQ(cl,examplefunction)
stopCluster(cl)
data <- do.call("rbind", par_results)
To clarify, I wish to add an additional column to data stating the simulation number.
This problem has been bugging me for weeks, and a solution would be immensely appreciated!
Edit: Adding a sequence of numbers based on the length of data after parallel processing is not a sufficient solution, as the length of each simulation will vary in the more complicated script. The solution therefore needs to be applied within, or prior to, the creation of the object data.
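One possible approach, sketched below: parLapply already passes each element of 1:iterations to the function as its argument i, so the simulation number can be attached as a column inside the function itself. In this sketch the nested loops are replaced with expand.grid, which builds the same beta/theta/delta combinations:

library(parallel)

betavalues  <- seq(from = 50,  to = 150, length.out = 3)
thetavalues <- seq(from = 200, to = 300, length.out = 3)
deltavalues <- seq(from = 20,  to = 140, length.out = 3)
iterations  <- 5

examplefunction <- function(i) {
  # expand.grid builds all beta/theta/delta combinations, as the nested loops did
  out <- expand.grid(beta = betavalues, theta = thetavalues, delta = deltavalues)
  out$simulation <- i  # i is the simulation number supplied by parLapply
  out
}

cl <- makeCluster(getOption("cl.cores", 2))  # adjust the core count to your machine
clusterExport(cl, varlist = c("betavalues", "thetavalues", "deltavalues"))
par_results <- parLapply(cl, 1:iterations, examplefunction)
stopCluster(cl)

data <- do.call(rbind, par_results)
head(data)

Because the column is attached inside each worker's call, this still works when the number of rows per simulation varies.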

Terminate a loop and retain results so far

I am curious whether there is a way in R to run some function, either forever or for some long time, which you can later terminate without losing the results.
For example, say I would like to fit lots of linear models to some randomly generated data like so
dat <- list()
for (i in 1:1e9) {  # effectively forever (1:1e99 errors: the vector would be too long)
  x <- 1:10
  y <- cumsum(runif(10))
  dat[[i]] <- lm(y ~ x)
}
I would like to leave my computer for a long time and, when I return, stop the function while keeping all of the models that have been built.
Basically, I want it to do as much as it can before I get back and not lose its progress when I stop it.
Does anyone know a way of accomplishing this in R?
You run the loop, then hit the stop button (or press Escape) when you get back; everything assigned to dat so far is retained.
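If you would rather handle the interruption explicitly, an interrupt (Escape or Ctrl-C) is raised as a condition in R, so it can be caught with tryCatch; a minimal sketch:

dat <- list()

run <- function() {
  i <- 1
  repeat {
    x <- 1:10
    y <- cumsum(runif(10))
    dat[[i]] <<- lm(y ~ x)  # superassign so results survive outside run()
    i <- i + 1
  }
}

# an interrupt is a condition, so tryCatch can intercept it instead of aborting
tryCatch(run(), interrupt = function(cond) {
  message("Interrupted; ", length(dat), " models kept.")
})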

Multiple regressions with loop in loop in R

I want to run the following regressions. The problematic variable is EP, a dummy variable, and I must check different cases; z (length = 1000) is the threshold variable. I want to create 1000 different versions of EP from the z variable and save the coefficients. I use a loop within a loop, but the results are completely wrong. The code runs and does not raise an error; the braces and parentheses shown below are exactly as in the code I run. The problem is a huge delay: after two hours it is still running.
I reduced the sample by 99% and still did not get a result, although the code ran without problems.
I do not want anything special: just to run a different regression for each value of z and end up with the estimates stored. I cannot understand why it takes so long. Any ideas?
for (k in 1:1000){
  z <- u[k]
  for (i in 1:length(dS)){
    if (dS[i] >= z) {
      EP[i] = 1
    } else {
      EP[i] = 0
    }
    fitT <- dynlm(dR ~ L(dR,1) + L(EN) + L(EP) + L(ΚΜ,1)
    prob[[k]] <- summary(fitT)$coefficients[1, 2]
  }
You don't have a closing } for the i-loop; you also don't have a closing ) for the dynlm call.
Note that you can replace your entire i-loop with
EP <- as.integer(dS >= z)
Next time, when asking a question, be clear and specific. What do you mean by "I use a loop in loop but the results are completely wrong"? Is there an error message, etc.?
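Putting both fixes together, the loop could look like this; a sketch assuming u, dS, dR, EN and ΚΜ are defined as in the question:

library(dynlm)  # provides dynlm() and the L() lag operator

prob <- numeric(1000)
for (k in 1:1000) {
  z  <- u[k]
  EP <- as.integer(dS >= z)  # vectorized replacement for the inner i-loop
  fitT <- dynlm(dR ~ L(dR, 1) + L(EN) + L(EP) + L(ΚΜ, 1))
  prob[k] <- summary(fitT)$coefficients[1, 2]
}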

How to simplify several for loops into a single loop or function in R

I am trying to combine several for loops into a single loop or function. Each loop evaluates whether an individual is present at a protected site and, based on that, assigns a number (numbers represent sites) at each time step. The results for each time step are then stored in a matrix and later used in other analyses. The problem is that I am repeating the same loop several times to evaluate the different scenarios (10%, 50%, 100% of sites protected). Since I need to store my results for each scenario, I can't think of a better way to simplify this into a single loop or function. Any ideas or suggestions would be appreciated. This is a very small and simplified version of the problem. I would like to keep the structure of the loop, since my original loop uses several if statements; the only thing that changes is the proportion of sites that are protected.
N <- 10  # number of sites
sites   <- factor(seq(from = 1, to = N))
sites10 <- as.factor(sample(sites, N * 1))    # 100% of sites protected
sites5  <- as.factor(sample(sites, N * 0.5))  # 50% of sites protected
sites1  <- as.factor(sample(sites, N * 0.1))  # 10% of sites protected
steps  <- 10
P.stay <- 0.9

# storing results
result <- matrix(0, nrow = steps)
time.step <- seq(1, steps)
time.step <- data.frame(time.step)
time.step$event <- 0
j <- numeric(steps)
j[1] <- sample(1:N, 1)
time.step$event[1] <- j[1]
for (i in 1:(steps - 1)) {
  if (j[i] %in% sites1) {
    if (rbinom(1, 1, P.stay) == 1) {
      time.step$event[i + 1] <- j[i + 1] <- j[i]
    } else {
      time.step$event[i + 1] <- 0
    }
  }
  time.step$event[i + 1] <- j[i + 1] <- sample(1:N, 1)
}
results.sites1 <- as.factor(result)

###
result <- matrix(0, nrow = steps)
time.step <- seq(1, steps)
time.step <- data.frame(time.step)
time.step$event <- 0
j <- numeric(steps)
j[1] <- sample(1:N, 1)
time.step$event[1] <- j[1]
for (i in 1:(steps - 1)) {
  if (j[i] %in% sites5) {
    if (rbinom(1, 1, P.stay) == 1) {
      time.step$event[i + 1] <- j[i + 1] <- j[i]
    } else {
      time.step$event[i + 1] <- 0
    }
  }
  time.step$event[i + 1] <- j[i + 1] <- sample(1:N, 1)
}
results.sites5 <- as.factor(result)

###
result <- matrix(0, nrow = steps)
time.step <- seq(1, steps)
time.step <- data.frame(time.step)
time.step$event <- 0
j <- numeric(steps)
j[1] <- sample(1:N, 1)
time.step$event[1] <- j[1]
for (i in 1:(steps - 1)) {
  if (j[i] %in% sites10) {
    if (rbinom(1, 1, P.stay) == 1) {
      time.step$event[i + 1] <- j[i + 1] <- j[i]
    } else {
      time.step$event[i + 1] <- 0
    }
  }
  time.step$event[i + 1] <- j[i + 1] <- sample(1:N, 1)
}
results.sites10 <- as.factor(result)

#
results.sites1
results.sites5
results.sites10
Instead of doing this:
sites10 <- as.factor(sample(sites, N * 1))
sites5  <- as.factor(sample(sites, N * 0.5))
sites1  <- as.factor(sample(sites, N * 0.1))
and running distinct loops for each of the three variables, you can write one general loop, put it in a function, and then use one of the *apply functions to call it with specific parameters. For example:
N <- 10  # number of sites
sites <- factor(seq(from = 1, to = N))
steps <- 10
P.stay <- 0.9

simulate.n.sites <- function(n) {
  n.sites <- sample(sites, n)
  result <- matrix(0, nrow = steps)
  time.step <- seq(1, steps)
  time.step <- data.frame(time.step)
  time.step$event <- 0
  j <- numeric(steps)
  j[1] <- sample(1:N, 1)
  time.step$event[1] <- j[1]
  for (i in 1:(steps - 1)) {
    if (j[i] %in% n.sites) {
      ...etc...
  return(result)
}
results <- lapply(c(1, 5, 10), simulate.n.sites)
Now results will be a list with three matrix elements.
The key is to identify places where you repeat yourself, and then refactor those areas into functions. Not only is this more concise, but it's easy to extend in the future. Want to sample 2 sites? Put a 2 in the vector you pass to lapply.
If you're unfamiliar with the *apply family of functions, definitely look into them.
I also suspect that much of the rest of your code could be simplified, but I think you've gutted it too much for me to make sense of it. For example, you define an element of time.step$event based on a condition, but then you overwrite that element. Surely that isn't what the actual code does?
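For completeness, here is a runnable assembly of that sketch, with the loop body copied from the question. One hedged change: it returns time.step rather than result, since result is never actually written to in the loops above.

simulate.n.sites <- function(n) {
  n.sites <- sample(sites, n)
  time.step <- data.frame(time.step = seq(1, steps), event = 0)
  j <- numeric(steps)
  j[1] <- sample(1:N, 1)
  time.step$event[1] <- j[1]
  for (i in 1:(steps - 1)) {
    if (j[i] %in% n.sites) {
      if (rbinom(1, 1, P.stay) == 1) {
        time.step$event[i + 1] <- j[i + 1] <- j[i]
      } else {
        time.step$event[i + 1] <- 0
      }
    }
    time.step$event[i + 1] <- j[i + 1] <- sample(1:N, 1)
  }
  time.step  # the sequence of events for this scenario
}

results <- lapply(c(1, 5, 10), simulate.n.sites)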

Stopping computation in R; will I lose results up to that point?

I am running some matrix algebra on a large data set. Each iteration of the outermost loop populates one row of two different vectors that are allocated to 64,797 rows. I am printing a counter to screen for the outer loop to check progress, which may not be ideal.

R is still working, according to the task manager, and is using a good bit of memory and processor time. However, the R console is not responding, and I can only tell that I have reached at least row 31,000 or so (there is scroll space, but I cannot scroll down to see the last number printed). I do not know whether the program is hung (no longer iterating the outer loop) and I am wasting my time waiting, or whether I should stick it out. The machine has been running for a few days.

Given the program's structure, I can end the process and restart from the last row populated. However, if I end the process, will I lose the data previously assigned to the vectors I am populating? That would be bad, as I'd have to start all over. The code is below; the end goals are the vectors save.trace and save.trace2.
for (i in 1:nrow(coor.cal)) {
  print(i)
  for (j in 1:nrow(coor.cal)) {
    # finding distances between observations
    dist <- ((coor.cal[i, 1] - coor.cal[j, 1])^2 + (coor.cal[i, 2] - coor.cal[j, 2])^2)^0.5
    w[j] <- exp(-0.5 * ((dist / bw)^2))  # computing weight matrix for observation i
    if (dist > bw) { w[j] <- 0 }
  }
  for (k in 1:27) {
    xv <- xmat[, k]
    xtw[k, ] <- xv * w
  }
  xtwx <- xtw %*% xmat
  xtwx.inv <- ginv(xtwx)
  xtwx.inv.xtw <- xtwx.inv %*% xtw
  xrow <- xmat[i, ]
  temp <- xrow %*% xtwx.inv.xtw
  save.trace[i] <- temp[i]
  save.trace2[i] <- sum(temp * temp)
}
Here's a better example.
saved <- 0
for (i in 1:100) {
  saved <- i
  Sys.sleep(0.1)
}
Run this code, and press escape sometime in the next 10 seconds (before the loop completes).
Take a look at the value of saved. It should be more than 0, indicating that your progress has been stored.
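Another way to guard against losing work, if the session itself dies rather than being interrupted, is to checkpoint to disk periodically; a minimal sketch, assuming save.trace and save.trace2 are being filled row by row as in the question:

for (i in 1:nrow(coor.cal)) {
  # ... the matrix algebra from the question fills save.trace[i] and save.trace2[i] ...
  if (i %% 1000 == 0) {
    # every 1000 rows, write a checkpoint that readRDS() can restore later
    saveRDS(list(row = i, save.trace = save.trace, save.trace2 = save.trace2),
            "checkpoint.rds")
  }
}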
I did not have the memory to risk an experiment to answer my question, so I borrowed another machine, tried it, and indeed you CAN end a process and still retain the previously stored information. I had not run into this problem before. I attempted to delete my question but could not, so I'll leave this here in case it helps someone else.
