Save loop results as csv table - r

I have a simple loop that generates a value at each step, and I want to save all the results as a single table. The problem is that each step overwrites the previous one.
for(i in 1:5){
x = 3*i
print(c(i,x))
}
This gives
[1] 1 3
[1] 2 6
[1] 3 9
[1] 4 12
[1] 5 15
Now I create a matrix that I will then save as a csv file, but it only shows the final step of the loop.
results = matrix(c(i,x), ncol = 2)
[,1] [,2]
[1,] 5 15
write.table(results, file = "Results.csv", col.names=NA, append = T)
How to show the entire list of results? Thanks in advance!
(ps.- I know that a similar question has been posted previously, e.g. Write output of R loop to file, but the problem was quite specific and I did not manage to adapt the answers to my case).

Your loop only prints the results to the console. The matrix you create afterwards uses only the last value of i (and x). There are many ways to do this, but if you really want to write a matrix, you need to store the result of every iteration somewhere before exporting. You can try something like:
results <- matrix(NA, nrow=5, ncol=2)
for(i in 1:5){
results[i, ] <- c(i, 3*i)
}
write.table(results, file = "Results.csv", col.names=NA, append = T)
And by the way you don't really need a loop here:
i <- 1:5
m <- matrix(c(i, 3*i), nrow=5)
would do the job.
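A minimal sketch of the same vectorised idea that also writes a conventional comma-separated file. The column names i and x and the use of tempfile() are choices made here for illustration; write.csv is used instead of write.table so that the separator is a comma.

```r
# Build the whole table at once; no loop is needed for this calculation.
i <- 1:5
results <- data.frame(i = i, x = 3 * i)

# tempfile() is used so the sketch does not overwrite anything;
# substitute "Results.csv" (or any other path) in real use.
out_file <- tempfile(fileext = ".csv")
write.csv(results, file = out_file, row.names = FALSE)

# Read the file back to confirm that all five rows were saved.
saved <- read.csv(out_file)
print(saved)
```

Setting row.names = FALSE keeps the automatic row numbers out of the file, which is usually what you want for a plain results table.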

You can usually use sapply instead of for-loops:
results <- t(sapply(1:5, function(x) c(x, 3*x)))
write.table(results, file="Results.csv", col.names=NA, append=T)

Assuming you really want/need a for-loop
1) You store all the result into a matrix and then you write the whole matrix to file
n = 5;
results = matrix(NA, ncol=2, nrow=n);
for(i in 1:n) {
x = 3*i;
results[i, ] = c(i, x);
}
write.table(results, file = "Results.csv", col.names=NA, append = T);
This is a "good" solution if you don't have many results and you want to access the file just once.
2) You store current result only into a matrix and you write to file at each iteration
n = 5;
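for(i in 1:n) {
x = 3*i;
results = matrix(c(i,x), ncol = 2)
# no header or row names, so every iteration appends exactly one data row
write.table(results, file = "Results.csv", sep = ",", col.names = FALSE, row.names = FALSE, append = T);
}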
This is a "good" solution if you have a lot of data and limited memory. It may be slower than the previous one because the file is opened again on every iteration.
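If reopening the file on every iteration is a concern, one possible variation is to open a file connection once and write a line per iteration. This is only a sketch: the header "i,x" and the temporary file path are assumptions made for illustration.

```r
# Open the output file once, write one line per iteration, then close it.
out_file <- tempfile(fileext = ".csv")  # replace with "Results.csv" in real use
con <- file(out_file, open = "w")

writeLines("i,x", con)                  # header row (names chosen for this sketch)
for (i in 1:5) {
  x <- 3 * i
  writeLines(paste(i, x, sep = ","), con)
}
close(con)

# Check the result by reading the file back.
saved <- read.csv(out_file)
print(saved)
```

Only the current row is held in memory, and the file is opened a single time.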

To append using a matrix you could use:
exampleMatrix <- matrix(ncol = 2)
for(i in 1:5){
x = 3*i
if(i == 1){
exampleMatrix <- rbind(c(x,i))
}else{
exampleMatrix <- rbind(exampleMatrix, c(x,i))
}
}
To append to a dataframe using a loop you could use the following:
exampleDF <- data.frame()
for(i in 1:5){
x = 3*i
exampleDF <- rbind(exampleDF,c(x,i))
}
write.csv(exampleDF, "C:\\path")

So when you want to store your values while using a loop, it's important to index. Below, I wrote some code where a (the iteration) and x (the value 3 * i) are each stored in a vector.
After the loop has finished, I combine the two vectors into one data frame with the cbind() function.
a <- vector()
x <- vector()
for(i in 1:5){
a[i] = i
x[i] = 3*i
}
df <- as.data.frame(cbind(a, x))
There are other ways to do this without loops. Once you start raising the number of iterations, or nesting loops, the processing time can get really high. Other options are the apply family of functions (apply, sapply, lapply) in base R.
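Another common way to index inside a loop, sketched here under the assumption that each iteration produces one row: store each row in a pre-allocated list and bind the rows together once at the end. The names rows and results are illustrative.

```r
n <- 5
rows <- vector("list", n)          # one slot per iteration

for (i in seq_len(n)) {
  # each iteration stores a one-row data frame in its own slot
  rows[[i]] <- data.frame(i = i, x = 3 * i)
}

# bind all rows together in a single step after the loop
results <- do.call(rbind, rows)
print(results)
```

Binding once at the end avoids growing the data frame on every iteration.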
Hope this helped!

Related

R loop to create data frames with 2 counters

What I want is to create 60 data frames with 500 rows in each. I tried the below code and, while I get no errors, I am not getting the data frames. However, when I do a View on the as.data.frame, I get the view, but no data frame in my environment. I've been trying for three days with various versions of this code:
getDS <- function(x){
for(i in 1:3){
for(j in 1:30000){
ID_i <- data.table(x$ID[j: (j+500)])
}
}
as.data.frame(ID_i)
}
getDS(DATASETNAME)
We can use outer (on a small example)
out1 <- c(outer(1:3, 1:3, Vectorize(function(i, j) list(x$ID[j:(j + 5)]))))
lapply(out1, as.data.table)
--
The issue in the OP's function is that inside the loop, ID_i gets overwritten on every iteration, i.e. the earlier results are not stored. In order to keep them, we can initialize a list and store each result in it
getDS <- function(x) {
ID_i <- vector('list', 3)
for(i in 1:3) {
for(j in 1:3) {
ID_i[[i]][[j]] <- data.table(x$ID[j:(j + 5)])
}
}
ID_i
}
do.call(c, getDS(x))
data
x <- data.table(ID = 1:50)
I'm not sure the description matches the code, so I'm a little unsure what the desired result is. That said, it is usually not helpful to split a data.table because the built-in by-processing makes it unnecessary. If for some reason you do want to split into a list of data.tables you might consider something along the lines of
getDS <- function(x, n=5, size = nrow(x)/n, column = "ID", reps = 3) {
x <- x[1:(n*size), ..column]
index <- rep(1:n, each = size)
replicate(reps, split(x, index),
simplify = FALSE)
}
getDS(data.table(ID = 1:20), n = 5)
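If the goal is simply 60 consecutive, non-overlapping blocks of 500 rows (an assumption; the question's code suggests overlapping windows instead), a base R sketch with split() could look like this. The example data and the names chunk_size and chunk_id are made up for illustration.

```r
# Example data: 30000 rows, so 500-row blocks give 60 pieces.
x <- data.frame(ID = 1:30000)
chunk_size <- 500

# Rows 1-500 get id 1, rows 501-1000 get id 2, and so on.
chunk_id <- ceiling(seq_len(nrow(x)) / chunk_size)

# split() returns a named list with one data frame per id.
chunks <- split(x, chunk_id)

length(chunks)      # number of data frames in the list
nrow(chunks[[1]])   # rows in the first one
```

Keeping the pieces in a list means they can be processed with lapply() rather than as 60 separate objects in the environment.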

How to iteratively perform combinations on larger datasets?

Background - I want to try and exhaustively search a set of all possible combinations of 250 rows taken 10 at a time. In order to iteratively get this, I use the following code
## Function definition
gen.next.cbn <- function(cbn, n){
## Generates the combination that follows the one provided as input
cbn.bin <- rep(0, n)
cbn.bin[cbn] <- 1
if (tail(cbn.bin, 1) == 0){
ind <- tail(which(cbn.bin == 1), 1)
cbn.bin[c(ind, ind+1)] <- c(0, 1)
}else{
ind <- 1 + tail(which(diff(cbn.bin) == -1), 1)
nb <- sum(cbn.bin[-c(1:ind)] == 1)
cbn.bin[c(ind-1, (n-nb+1):n)] <- 0
cbn.bin[ind:(ind+nb)] <- 1
}
cbn <- which(cbn.bin == 1)
}
## Example parameters
n <- 40
k <- 10
## Iteration example
for (i in 1:choose(n, k)){
if (i == 1){
cbn <- 1:k
}else{
cbn <- gen.next.cbn(cbn, n)
}
print(cbn)
}
I get the error "cannot allocate vector of size n GB" when I go beyond 40 rows.
Ideal Solution:
a) If the combinations can be dumped and memory can be flushed iteratively after every run in the loop (where I can check the further conditions)
b) If the combinations can be dumped to a csv file such that it does not cause a memory hog.
Thanks for your support.
As I said in the comments, iterpc is the way to go for such a task. You first need to initialize an iterator via the iterpc function. Next we can generate the next n combinations via getnext. After this, we simply append our results to a csv (or any file type you like).
library(iterpc)
getComboChunks <- function(n, k, chunkSize, totalCombos, myFile) {
myIter <- iterpc(n, k)
## initialize myFile with the first chunk
myCombs <- getnext(myIter, chunkSize)
write.table(myCombs, file = myFile, sep = ",", col.names = FALSE)
maxIteration <- (totalCombos - chunkSize) %/% chunkSize
## seq_len() skips the loop when no further chunks are needed
for (i in seq_len(maxIteration)) {
## get the next "chunkSize" of combinations
myCombs <- getnext(myIter, chunkSize)
## append the above combinations to your file
write.table(myCombs, file = myFile, sep = ",",
col.names = FALSE , append = TRUE)
}
}
For example, getComboChunks(250, 10, 100, 1000, "myCombos.csv") will write out 1000 combinations of 250 choose 10 to the file myCombos.csv 100 combinations at a time. Doing this in chunks will be more efficient than one at a time.
This library is written in C/C++ so it should be fairly efficient, but as @Florian points out in the comments, it won't produce all gmp::chooseZ(250, 10) = 219005316087032475 combinations any time soon. I haven't tested it, but if you settle for 200 choose 5, I think you will be able to produce it in under a day (it is just over 2.5 billion results).
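For readers who want to stay in base R, here is a sketch of a "next combination" step that works directly on the index vector and keeps only one combination in memory at a time. It is not the iterpc implementation; the function name next_combination is hypothetical, and the small n and k are chosen only so the example finishes instantly.

```r
# Return the combination that follows `cbn` in lexicographic order,
# or NULL when `cbn` is already the last one.
# `cbn` must be a strictly increasing vector of indices from 1..n.
next_combination <- function(cbn, n) {
  k <- length(cbn)
  i <- k
  # walk left past the positions that already hold their maximum value
  while (i >= 1 && cbn[i] == n - k + i) i <- i - 1
  if (i < 1) return(NULL)
  cbn[i] <- cbn[i] + 1
  # reset everything to the right to the smallest valid values
  if (i < k) cbn[(i + 1):k] <- cbn[i] + seq_len(k - i)
  cbn
}

n <- 5
k <- 3
cbn <- seq_len(k)
count <- 0
while (!is.null(cbn)) {
  count <- count + 1   # check conditions on `cbn` or write it to a file here
  cbn <- next_combination(cbn, n)
}
count
```

The while loop avoids building a sequence of length choose(n, k) up front, but the total number of combinations for 250 choose 10 is still far too large to enumerate.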

How to store data from for loop inside of for loop? (rolling correlation in r)

require(quantmod)
require(TTR)
iris2 <- iris[1:4]
b=NULL
for (i in 1:ncol(iris2)){
for (j in 1:ncol(iris2)){
a<- runCor(iris2[,i],iris2[,j],n=21)
b<-cbind(b,a)}}
I want to calculate a rolling correlation between different columns of a dataframe and store the results separately for each column. Although the code above stores the data in variable b, it is not very useful because it just dumps all the results together. What I would like is to create a different dataframe for each i.
In this case, as I have 4 columns, what I would ultimately want are 4 dataframes, each containing 4 columns showing rolling correlations, i.e. df1 = corr of col 1 vs col 1,2,3,4, df2 = corr of col 2 vs col 1,2,3,4...etc)
I thought of using lapply or rollapply, but ran into the same problem.
d=NULL
for (i in 1:ncol(iris2))
for (j in 1:ncol(iris2))
{c<-rollapply(iris2, 21 ,function(x) cor(x[,i],x[,j]), by.column=FALSE)
d<-cbind(d,c)}
Would really appreciate any inputs.
If you want to keep the expanded loop, how about a list of dataframes?
# pre-allocate a list with one slot per column
e <- vector("list", ncol(iris2))
for (i in 1:ncol(iris2)) {
d <- matrix(0, nrow = nrow(iris2), ncol = ncol(iris2))
for (j in 1:ncol(iris2)) {
d[,j]<- runCor(iris2[,i],iris2[,j],n=21)
}
e[[i]] <- d
}
It's also a good idea to allocate the amount of space you want with placeholders and put items into that space rather than use rbind or cbind.
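The same list-of-results idea can be sketched without TTR. The helper roll_cor below is a hypothetical base R stand-in intended to mirror a trailing-window rolling correlation (NA for the first n - 1 rows); it assumes the series has at least n observations.

```r
# Hypothetical helper: correlation of x and y over a trailing window of n rows.
roll_cor <- function(x, y, n) {
  out <- rep(NA_real_, length(x))
  for (end in n:length(x)) {
    idx <- (end - n + 1):end
    out[end] <- cor(x[idx], y[idx])
  }
  out
}

iris2 <- iris[1:4]
n <- 21

# One data frame per column i; column j holds the rolling cor(col i, col j).
results <- lapply(seq_along(iris2), function(i) {
  m <- sapply(seq_along(iris2), function(j) roll_cor(iris2[[i]], iris2[[j]], n))
  colnames(m) <- names(iris2)
  as.data.frame(m)
})
names(results) <- names(iris2)

str(results, max.level = 1)
```

Each element, for example results[["Sepal.Length"]], is then one of the four dataframes described in the question.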
Although it is not a good practice to create dataframes on the fly in R (you should prefer putting them in a list as in other answer), the way to do so is to use the assign and get functions.
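for (i in 1:ncol(iris2)) {
# collect the rolling correlations of column i against every column j
d <- NULL
for (j in 1:ncol(iris2)){
d <- cbind(d, runCor(iris2[,i],iris2[,j],n=21))
}
# Assign the 4-column result to the name df1, df2...
assign(paste0("df", i), as.data.frame(d))
}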
# to have access to the dataframe:
get("df1")
# or inside a loop
get(paste0("df", i))
Since you stated your computation was slow, I wanted to provide you with a parallel solution. If you have a modern computer, it probably has 2 cores, if not 4 (or more!). You can easily check this via:
require(parallel) # for parallelization
detectCores()
Now the code:
require(quantmod)
require(TTR)
iris2 <- iris[,1:4]
Parallelization requires the functions and variables be placed into a special environment that is created and destroyed with each process. That means a wrapper function must be created to define the variables and functions.
wrapper <- function(data, n) {
# variables placed into environment
force(data)
force(n)
# functions placed into environment
# same inner loop written in earlier answer
runcor <- function(data, n, i) {
d <- matrix(0, nrow = length(data[,1]), ncol = length(data[1,]))
for (j in 1:ncol(data)) {
d[,j] <- TTR::runCor(data[,i], data[,j], n = n)
}
return(d)
}
# call function to loop over iterator i
worker <- function(i) {
runcor(data, n, i)
}
return(worker)
}
Now create a cluster on your local computer. This allows the multiple cores to run separately.
parallelcluster <- makeCluster(parallel::detectCores())
models <- parallel::parLapply(parallelcluster, 1:ncol(iris2),
wrapper(data = iris2, n = 21))
Stop and close the cluster when finished:
stopCluster(parallelcluster)

Efficient way to generate permutations of 0 and 1?

What I am trying to do is generate all possible permutations of 1 and 0 given a particular sample size. For instance with a sample of n=8 I would like the m = 2^8 = 256 possible permutations, i.e:
I've written a function in R to do this, but after n=11 it takes a very long time to run. I would prefer a solution in R, but if it's in another programming language I can probably figure it out. Thanks!
PermBinary <- function(n){
n.perms <- 2^n
array <- matrix(0,nrow=n,ncol=n.perms)
# array <- big.matrix(n, n.perms, type='integer', init=-5)
for(i in 1:n){
div.length <- ncol(array)/(2^i)
div.num <- ncol(array)/div.length
end <- 0
while(end!=ncol(array)){
end <- end +1
start <- end + div.length
end <- start + div.length -1
array[i,start:end] <- 1
}
}
return(array)
}
expand.grid is probably the best vehicle to get what you want.
For example if you wanted a sample size of 3 we could do something like
expand.grid(0:1, 0:1, 0:1)
For a sample size of 4
expand.grid(0:1, 0:1, 0:1, 0:1)
So what we want to do is find a way to automate that call.
If we had a list of the inputs we want to give to expand.grid we could use do.call to construct the call for us. For example
vals <- 0:1
tmp <- list(vals, vals, vals)
do.call(expand.grid, tmp)
So now the challenge is to automatically make the "tmp" list above in a fashion that we can dictate how many copies of "vals" we want. There are lots of ways to do this but one way is to use replicate. Since we want a list we'll need to tell it to not simplify the result or else we will get a matrix/array as the result.
vals <- 0:1
tmp <- replicate(4, vals, simplify = FALSE)
do.call(expand.grid, tmp)
Alternatively we can use rep on a list input (which I believe is faster because it doesn't have as much overhead as replicate but I haven't tested it)
tmp <- rep(list(vals), 4)
do.call(expand.grid, tmp)
Now wrap that up into a function to get:
binarypermutations <- function(n, vals = 0:1){
tmp <- rep(list(vals), n)
do.call(expand.grid, tmp)
}
Then call with the sample size like so binarypermutations(5).
This gives a data.frame of dimensions 2^n x n as a result - transpose and convert to a different data type if you'd like.
The answer above may be better since it uses base - my first thought was to use data.table's CJ function:
library(data.table)
do.call(CJ, replicate(8, c(0, 1), FALSE))
It will be slightly faster (~15%) than expand.grid, so it will only be more valuable for extreme cases.
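As a further base R alternative (a sketch, not a benchmarked replacement for the answers above), the rows can be computed arithmetically: row number v + 1 holds the binary digits of v. The function name binary_rows is made up, and n is assumed to be at least 1 and small enough that 2^n rows fit in memory.

```r
# Build a 2^n x n matrix whose rows are the binary digits of 0, 1, ..., 2^n - 1.
binary_rows <- function(n) {
  values <- 0:(2^n - 1)      # one integer per row
  shifts <- (n - 1):0        # most significant bit first
  # digit s of v is (v %/% 2^s) %% 2; outer() evaluates it for every pair
  outer(values, shifts, function(v, s) (v %/% 2^s) %% 2)
}

m <- binary_rows(3)
print(m)
dim(m)
```

The result is a numeric matrix rather than a data.frame, and the row order differs from expand.grid, which varies its first column fastest.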

Combining different matrices in a for loop

I want to create different matrices in a loop and then combine (either cbind or rbind) them. But the following code doesn't work. Why not? And how to fix it?
dependent = matrix(c(30,184,6,106), 2, 2, byrow=T)
independent = c(160,166)
expected = numeric()
{for(i in 1:length(independent))
a = dependent*independent[i]/sum(independent)
expected = cbind(expected,a)}
This gives:
expected
[,1] [,2]
[1,] 15.276074 93.69325
[2,] 3.055215 53.97546
This is the result of only using the final iteration of the for loop. So the result is like only 166 is used, but 160 isn't.
A few comments:
Your for loop brackets are in the wrong place. You have:
R> {for(i in 1:3)
+ cat(i, "\n")
+ cat(i, "\n")
+ }
1
2
3
3
instead you should have:
R> for(i in 1:3) {
+ cat(i, "\n")
+ cat(i, "\n")
+ }
1
1
2
2
3
3
When you construct a for loop and omit the brackets, only the first statement after the for statement is part of the loop.
You can make your for loop more efficient by saving the result of sum(independent) since that doesn't change with each iteration, i.e.
for(i in 1:length(independent)){
a = dependent*independent[i]
expected = cbind(expected,a)
}
expected = expected/sum(independent)
In fact you can vectorise the whole calculation
y = sapply(independent, '*', dependent)
matrix(y, ncol=4,nrow=2)/sum(independent)
You could forgo the for loop altogether and use:
X <- lapply(independent, function(x) (dependent*x)/sum(independent))
do.call("cbind", X)
EDIT: I edited my response as the order was not correct.
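For completeness, a sketch of the loop from the question with the braces placed correctly, storing each matrix in a pre-allocated list and combining once at the end. The names total and pieces are introduced here for illustration only.

```r
dependent   <- matrix(c(30, 184, 6, 106), 2, 2, byrow = TRUE)
independent <- c(160, 166)

total  <- sum(independent)                    # computed once, outside the loop
pieces <- vector("list", length(independent)) # one slot per iteration

for (i in seq_along(independent)) {
  # both statements belong to the loop body because of the braces
  a <- dependent * independent[i] / total
  pieces[[i]] <- a
}

# combine the stored matrices side by side
expected <- do.call(cbind, pieces)
print(expected)
```

The result has four columns: the first two come from 160 and the last two from 166.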
