I am trying to pull out individual subject data in R using a for loop and a while loop. I would like the loop to pull each subject's data and save it as its own data file. The issue is that the for loop does not step through Subjects properly, so b never gets the right value. The while loop itself works: if I set b manually and run it, it produces the correct data file.
Subjects = (1:2)
r = (1);
ST = (d$onReadyTime) #need to get ST to read not just the first number in onReadyTime for each trial
ST=strsplit(ST,split = "a")
for (b in (1:length(Subjects))){
b <- Subjects[r]
while (r == Subjects[b])
{STSubject = (ST[[b]])
ST2=(STSubject)
#ST2=`colnames<-`(ST2,Subjects[i])
write.table(ST2,file = paste("ST_Subject_",b,".csv",sep=""),row.names = FALSE, col.names= TRUE)
r = r+1
}
}
Without a minimum working example, only general guidance can be offered.
n <- 0
for (i in seq_along(models)) {
for (j in seq_along(meters)) {
n <- n + 1
make_line(i, j) -> glances[[n]]
}
}
The key point is to initialize the counter n once, before the outer loop, so that it keeps incrementing across every (i, j) pair instead of being reset.
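Applied to the per-subject files in the question, the same idea might look like the sketch below. The ST data here is a made-up stand-in for the split onReadyTime column, and the files go to a temporary directory; both are assumptions for illustration only.

```r
# Hypothetical stand-in for strsplit(d$onReadyTime, split = "a"):
# one list element per subject
ST <- strsplit(c("1a2a3", "4a5a6"), split = "a")
Subjects <- seq_along(ST)

out_dir <- tempdir()                      # assumption: write to a temp directory
files <- character(length(Subjects))

# The for loop alone provides the subject index, so no separate
# counter or while loop is needed
for (b in Subjects) {
  files[b] <- file.path(out_dir, paste0("ST_Subject_", b, ".csv"))
  write.table(ST[[b]], file = files[b], row.names = FALSE, col.names = TRUE)
}
```

Because b comes straight from the for loop, there is no second variable that can drift out of step with it.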
I am trying to convert a for loop which I am currently using to run a process across a large matrix. The current for loop finds the maximum value within a 30 x 30 section and creates a new matrix with the maximum value.
The current code for the for loop looks like this:
mat <- as.matrix(CHM) # CHM is the original raster image
maxm <- matrix(nrow=nrow(mat)/30, ncol=ncol(mat)/30) # create new matrix with new dimensions
for(i in 1:dim(maxm)[1]) {
for(j in 1:dim(maxm)[2]) {
row <- 30 * (i - 1) + 1
col <- 30 * (j - 1) + 1
maxm[i,j] <- max(CHM[row:(row + 29), col:(col + 29)])
}
}
I want to convert this to a foreach loop to use parallel processing. I've got as far as the following code, but it doesn't work. I'm not sure how to produce the new matrix within the foreach loop:
ro<-nrow(mat)/30
co<-ncol(mat)/30
maxm <- matrix(nrow=nrow(mat)/30, ncol=ncol(mat)/30)
foreach(i=ro, .combine='cbind') %:%
foreach(j=co, .combine='c') %dopar% {
row <- 30 * (i - 1) + 1
col <- 30 * (j - 1) + 1
maxm[i,j]<-(max(CHM[row:(row + 29), col:(col + 29)]))
}
Any suggestions please!
Before doing anything in parallel, check whether the work can be vectorized. Once that is done, ask whether parallelization is still worthwhile.
In this specific example, parallelization is unlikely to be as fast as you expect, because each iteration writes its output into a common object. R generally does not support this in parallel code; stick to so-called 'embarrassingly parallel' problems until you have a better understanding of how parallel execution works. In short: don't modify shared data in parallel in R unless you know what you're doing. It is unlikely to be faster.
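As a rough illustration of trying a non-parallel route first, the block maximum can be computed without explicit nested loops by reshaping the matrix into a 4-dimensional array. This is only a sketch: block_max is a hypothetical helper written for this example (not part of any package), and it assumes a plain numeric matrix whose trailing rows and columns beyond a multiple of k can be dropped.

```r
# Hypothetical helper: maximum of each k x k block of a numeric matrix
block_max <- function(mat, k) {
  nr <- nrow(mat) %/% k                   # number of block rows
  nc <- ncol(mat) %/% k                   # number of block columns
  m <- mat[seq_len(nr * k), seq_len(nc * k), drop = FALSE]
  # Column-major storage lets us view the matrix as
  # (row within block, block row, column within block, block column)
  dim(m) <- c(k, nr, k, nc)
  apply(m, c(2, 4), max)                  # one maximum per (block row, block column)
}

set.seed(1)
dat_small <- matrix(runif(60 * 90), nrow = 60)   # made-up test data
maxm_small <- block_max(dat_small, 30)           # 2 x 3 matrix of block maxima
```

Whether this is fast enough depends on the size of the raster, so it is worth timing against the loop before setting up a cluster.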
That said, your case is actually quite tricky. You seem to be computing a windowed maximum (one maximum per 30 x 30 block), and the output should end up in a single combined matrix. An alternative to writing directly into the matrix is to return a matrix with 3 columns x, i, j, where the latter two are indices that indicate which row and column the value x belongs in.
For this to work, as Dmitriy noted in his answer, the data needs to be exported to each cluster worker (parallel session) so that it can be used there. The following example shows how to perform the parallelization.
First: Create a cluster and export the dataset
set.seed(1)
#Generate test example
n <- 3000
dat <- matrix(runif(n^2), ncol = n)
library(foreach)
library(doParallel)
#Create cluster
cl <- parallel::makeCluster(parallel::detectCores())
#Register it for the foreach loop
doParallel::registerDoParallel(cl)
#Export the dataset (could be done directly in the foreach, but this is more explicit)
parallel::clusterExport(cl, "dat")
Next we come to the foreach loop. Note that according to the documentation, nested foreach loops should be separated using the %:% operator, as shown in my example below:
output <- foreach(i = 1:(nrow(dat)/30), .combine = rbind, .inorder = FALSE) %:%
foreach(j = 1:(ncol(dat)/30), .combine = rbind, .inorder = FALSE) %dopar%{
row <- 30 * (i - 1) + 1
col <- 30 * (j - 1) + 1
c(x = max(dat[row:(row + 29), col:(col + 29)]), i = i, j = j)
}
Note the .inorder = FALSE. Because I return the indices, I don't care about order, only about speed.
Last but not least, we need to create the matrix. The Matrix package function Matrix::sparseMatrix allows for specifying values and indices.
output <- Matrix::sparseMatrix(output[,"i"], output[,"j"], x = output[,"x"])
This is still rather slow. For n = 3000 it took roughly 6 seconds to perform calculations + a not-insignificant overhead from exporting the data. But it is likely faster than the same method using sequential loops.
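If an ordinary dense matrix is preferred over a sparse one, the (x, i, j) triplets can also be placed with base R matrix indexing. The sketch below uses a small made-up triplet matrix in place of the real foreach result, so the numbers are purely illustrative.

```r
# Made-up triplets standing in for the foreach result:
# value x belongs at row i, column j (deliberately out of order)
output <- cbind(x = c(0.9, 0.7, 0.8, 0.6),
                i = c(2, 1, 2, 1),
                j = c(1, 1, 2, 2))

maxm <- matrix(NA_real_,
               nrow = max(output[, "i"]),
               ncol = max(output[, "j"]))

# Indexing with a two-column matrix assigns each x to its (i, j) cell
maxm[output[, c("i", "j")]] <- output[, "x"]
```

Since every value carries its own position, the order in which the workers return results does not matter here either.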
Let me try to answer this.
As far as I know, R's parallel backends run each worker in its own environment. foreach with %dopar% first copies the variables the loop body needs to each worker and then runs the body there. Nothing is copied back after the code has executed; you only get what the loop returns, as in result = foreach(...) { }. So the code maxm[i,j]<-(max(CHM[row:(row + 29), col:(col + 29)])) changes only the worker's local copy of your matrix, nothing more.
So the "correct" code will probably look like this:
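mat <- as.matrix(CHM)
ro <- nrow(mat) / 30
co <- ncol(mat) / 30
# %:% must join two foreach() calls directly; no braces in between.
# The inner loop returns one row of maxima, so the outer loop stacks rows.
maxm <- foreach(i = 1:ro, .combine = 'rbind') %:%
  foreach(j = 1:co, .combine = 'c') %dopar% {
    row <- 30 * (i - 1) + 1
    col <- 30 * (j - 1) + 1
    # return the value instead of assigning into a shared matrix
    max(mat[row:(row + 29), col:(col + 29)])
  }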
You may also need to call as.matrix on maxm.
I am generating a huge amount of data and would like to store only selected values during my run, but only the last result is ever saved. For example, the following sample code always stores just the last result that satisfies my condition. Keep in mind that I have a huge amount of data and don't want to hold it in a vector or list; I would like to write it straight to a file. I need your help.
Thanks.
f<-function(x) (x-1)*(x-5)*(x-10)
fileE<-file("E.txt")
for (i in seq(1,100,0.1)){
if (f(i) > 0 && f(i) < 10)
writeLines(paste0(i," ",f(i)), fileE)
}
close(fileE)
Maybe use write with append:
unlink("E.txt")
for (i in seq(1, 100, 0.1)){
res <- f(i)
if (res > 0 & res < 10)
write(x = paste0(i, " ", res), file = "E.txt", append = TRUE)
}
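Another option, sketched below, is to keep the original writeLines approach but open the connection in write mode once before the loop. When file() is created without an open mode, each writeLines call opens the file, writes, and closes it again, which is why only the last line survived. The temporary path used here is an assumption for the demo.

```r
f <- function(x) (x - 1) * (x - 5) * (x - 10)

path <- file.path(tempdir(), "E.txt")   # assumption: temp location for the demo
con <- file(path, open = "w")           # opened once, stays open across iterations

for (i in seq(1, 100, 0.1)) {
  res <- f(i)
  if (res > 0 && res < 10) {
    writeLines(paste0(i, " ", res), con)  # each call adds a line after the previous one
  }
}

close(con)
```

This avoids reopening the file on every write, which can matter when the loop produces many lines.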
I am looking at the DIGRE model code in R, and there is a loop as follows:
idx <- 1
for (i in 1:length(drugName)) {
  if (drugName[i] != "Neg_control") {
    cat(idx, ". ", drugName[i], "\n", sep = "")
    idx <- idx + 1
  }
}
My question: is there a particular reason for using separate variables (i and idx) for the loop and the counter? Wouldn't this loop work fine with just one variable? I am new to R and therefore curious.
The variable idx only gets incremented if drugName[i] isn't "Neg_control". So i indexes all the observations of drugName, while idx counts the occurrences that are not controls. Depending on what the data look like and what the goal of the function is, this could be done without a loop.
How about this?
controlTF = drugName != "Neg_control"
idx <- sum(controlTF)
paste0(1:idx, ". ", drugName[controlTF])
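To see the two roles side by side, here is a small sketch with a made-up drugName vector (the names are hypothetical). It collects the numbered lines from the loop and from the vectorized version so they can be compared.

```r
drugName <- c("DrugA", "Neg_control", "DrugB", "DrugC")  # hypothetical data

# Loop version: i walks over every element, idx numbers only the non-controls
loop_lines <- character(0)
idx <- 1
for (i in seq_along(drugName)) {
  if (drugName[i] != "Neg_control") {
    loop_lines <- c(loop_lines, paste0(idx, ". ", drugName[i]))
    idx <- idx + 1
  }
}

# Vectorized version: filter first, then number what is left
keep <- drugName != "Neg_control"
vec_lines <- paste0(seq_len(sum(keep)), ". ", drugName[keep])
```

With a single variable, "DrugB" would be labelled 3 rather than 2, because the skipped control would still consume a number.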
I'm trying to improve the speed of my code, which is trying to optimise a value using 3 variables which have large ranges. The most likely output uses values in the middle of the ranges, so it is wasting time starting from the lowest possible value of each variable. I want to start from the middle value and iterate out! The actual problem has thousands of lines with numbers from 150-650. C,H and O limits will be defined somewhat based on the starting number, but will always be more likely at a central value in the defined range. Is there a way to define the for loop to work outwards like I want? The only, quite shabby, way I can think of is to simply redefine the value within the loop from a vector (e.g. 1=20, 2=21, 3=19, etc). See current code below:
set_error<-2.5
ct<-c(325.00214,325.00952,325.02004,325.02762,325.03535,325.03831,325.04588, 325.05641,325.06402,325.06766,325.07167,325.07454,325.10396)
FormFun<-function(x){
for(C in 1:40){
for(H in 1:80){
for(O in 1:40){
test_mass=C*12+H*1.007825+O*15.9949146-1.0072765
error<-1000000*abs(test_mass-x)/x
if(error<set_error){
result<-paste("C",C,"H",H,"O",O,sep ="")
return(result)
break;break;break;break
}
}
}
}
}
old_t <- Sys.time()
ct2<-lapply(ct,FormFun)
new_t <- Sys.time() - old_t # calculate difference
print(new_t)
Use vectorization and create a closure:
FormFun1_fac <- function(gr) {
force(gr) # evaluate gr now so the returned function captures the grid
function(x, set_error){
test_mass <- with(gr, C*12+H*1.007825+O*15.9949146-1.0072765)
error <- 1000000 * abs(test_mass - x) / x
ind <- which(error < set_error)[1]
if (is.na(ind)) return(NULL)
paste0("C", gr[ind, "C"],"H", gr[ind, "H"],"O", gr[ind, "O"])
}
}
FormFun1 <- FormFun1_fac(expand.grid(C = 1:40, H = 1:80, O = 1:40))
ct21 <- lapply(ct, FormFun1, set_error = set_error)
all.equal(ct2, ct21)
#[1] TRUE
This saves a grid of all combinations of C, H, O in the function environment and calculates the error for all combinations (which is fast in vectorized code). The first combination that passes the test is returned.
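If the search really should start in the middle of each range and work outwards, as the question asks, one possible approach is to reorder each range by its distance from the centre. The helper name mid_out below is hypothetical, and this is only a sketch of the ordering, not a drop-in replacement for the code above.

```r
# Hypothetical helper: integers lo..hi ordered from the centre outwards
mid_out <- function(lo, hi) {
  v <- lo:hi
  # order() keeps ties in their original order, so the lower value comes first
  v[order(abs(v - (lo + hi) / 2))]
}

centre_first <- mid_out(1, 5)    # 3 2 4 1 5
c_order <- mid_out(1, 40)        # starts 20 21 19 22 18 ...
```

The reordered vectors can be used directly in the for loops, for example for (C in mid_out(1, 40)), or passed to expand.grid in place of 1:40.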
I want to combine the results from a for loop into 1 txt file and I have written my code based on suggestion from this link
combine results from a loop in one file
There is one problem. I am supposed to get 8 results (rows) but I ended up with only 5. Somehow the other results did not get into the file. I think the problem is with the if statement but I don't know how to fix it.
Here is my code
prob <- c(0.10, 0.20)
for (j in seq(prob)) {
range <- c(2,3)
for (i in seq(range)) {
sample <- c(10,20)
for (k in seq(sample)) {
data <- Simulation(X =1,Y =range[i], Z=sample[k] ,p = prob[j])
filename <- paste('file',i,'txt')
if (j == 1) {
write.table(data, "Desktop/file2.txt", col.names= TRUE)
} else {
write.table(data,"Desktop/file2.txt", append = TRUE, col.names = FALSE)
}
}
}
}
That's because the if ( j == 1 ) bit is meant to check whether this is the first time you've written to the file or not.
If it is the first time, then it will write the column names (i.e. X, Y, Z, p) into the file (see the col.names=TRUE?).
If it isn't the first time, then it won't write the column names, but will just append the data.
Since you have multiple nested loops, that condition doesn't work for you: while j==1 (i.e. for prob=0.1) the inner loops run 4 times, and because j==1 on every one of them, the file is overwritten each time.
I'd recommend initialising a variable count that counts how many times you've performed Simulation, and then changing that line to if ( count == 1 ):
count <- 1
prob <- c(0.10,0.20)
# .... code as before
data <- Simulation(X =1,Y =range[i], Z=sample[k] ,p = prob[j])
if ( count == 1 ) {
write.table(data, "Desktop/file2.txt", col.names=TRUE)
} else {
write.table(data, "Desktop/file2.txt", append=TRUE, col.names=FALSE)
}
# increment count
count <- count + 1
}}}
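Put together, a self-contained version of this idea could look like the sketch below. The Simulation function here is a hypothetical stand-in that just returns its inputs as a one-row data frame, the output goes to a temporary file, and row.names = FALSE is an added assumption to keep the file easy to read back.

```r
# Hypothetical stand-in for the real Simulation(): returns one row per call
Simulation <- function(X, Y, Z, p) data.frame(X = X, Y = Y, Z = Z, p = p)

out <- file.path(tempdir(), "file2.txt")  # assumption: temp file for the demo
prob <- c(0.10, 0.20)
range <- c(2, 3)
sample <- c(10, 20)

count <- 1                                # counts calls across all three loops
for (j in seq_along(prob)) {
  for (i in seq_along(range)) {
    for (k in seq_along(sample)) {
      data <- Simulation(X = 1, Y = range[i], Z = sample[k], p = prob[j])
      if (count == 1) {
        # first write: create the file and include the header
        write.table(data, out, col.names = TRUE, row.names = FALSE)
      } else {
        # later writes: append rows without repeating the header
        write.table(data, out, append = TRUE, col.names = FALSE, row.names = FALSE)
      }
      count <- count + 1
    }
  }
}
```

Reading the file back with read.table(out, header = TRUE) should then give all 8 rows.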