How to list results for several calculations in R

I have loaded two source files and performed some iterative calculations, and now I need to display/export the results. There are hundreds of iterative calculations, hence hundreds of results. However, only the result of the final calculation is displayed.
In this example, I have shortened the list of calculations to only 3. Please refer to line 7 (k in 1:3). How do I get R to display the results of all calculations?
Many thanks in advance to those who can offer help. If this question has been asked before, a link would be great. I could not find one, probably because I do not know the right terms to search for.
# Load files
d1<-read.csv('testhourly.csv',sep=",",header=F)
names(d1)<-c("elapsedtime","units")
d2<-read.csv('testevent.csv',sep=",",header=F)
names(d2)<-c("eventno","starttime","endtime","starttemp","endtemp")
# Perform for calculations 1 to 3
for(k in 1:3){
a<-d2[k,2]
b<-d2[k,3]
x<-d1[a:b,]$q
a2<-d2[k,2]-1
b2<-d2[k,3]-1
y<-d1[a2:b2,]$q
z <- (x-y)}
results <- sum(z)
# Export results
write.csv(results, file = "results.csv")

You are not saving the output of each iteration inside the loop, so only the value from the last iteration is left once the loop finishes.
temp=vector("list",3)
for(k in 1:3) {
a<-d2[k,2]
b<-d2[k,3]
x<-d1[a:b,]$q
a2<-d2[k,2]-1
b2<-d2[k,3]-1
y<-d1[a2:b2,]$q
temp[[k]] <- (x-y)
}
results <- sum(unlist(temp))
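If one result per calculation is wanted rather than a single grand total, the same idea can be sketched as follows. This is only an illustration under assumptions: d1 and d2 are small made-up data frames standing in for the two CSV files, and the column units is assumed to be the one the calculation is meant to use.

```r
# Made-up stand-ins for testhourly.csv and testevent.csv (assumed data)
d1 <- data.frame(elapsedtime = 1:20, units = (1:20)^2)
d2 <- data.frame(eventno = 1:3, starttime = c(2, 6, 11), endtime = c(4, 9, 15))

n <- nrow(d2)
results <- numeric(n)                    # one slot per calculation

for (k in seq_len(n)) {
  a <- d2$starttime[k]
  b <- d2$endtime[k]
  x <- d1$units[a:b]                     # values inside the event window
  y <- d1$units[(a - 1):(b - 1)]         # same window shifted back by one row
  results[k] <- sum(x - y)               # store this iteration's result
}

# One row per calculation, ready for export
out <- data.frame(eventno = d2$eventno, result = results)
write.csv(out, file.path(tempdir(), "results.csv"), row.names = FALSE)
```

Because results is indexed by k, every iteration keeps its own value instead of overwriting the previous one.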

Related

Storing matrix after every iteration

I have following code.
for(i in 1:100)
{
for(j in 1:100)
R[i,j]=gcm(i,j)
}
gcm() is some function which returns a number based on the values of i and j, so R ends up holding all the values. But this calculation takes a lot of time. My machine lost power several times, and each time I had to start over. Can somebody please tell me how I can save R somewhere after every iteration, so as to be safe? Any help is highly appreciated.
You can use the saveRDS() function to save the result of each calculation in a file.
To understand the difference between save and saveRDS, here is a link I found useful. http://www.fromthebottomoftheheap.net/2012/04/01/saving-and-loading-r-objects/
If you want to save the R-workspace have a look at ?save or ?save.image (use the first to save a subset of your objects, the second one to save your workspace in toto).
Your edited code should look like
for(i in 1:100)
{
for(j in 1:100)
R[i,j]=gcm(i,j)
save.image(file="path/to/your/file.RData")
}
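A minimal sketch of the saveRDS() route, which also resumes after an interruption. The gcm() defined here is only a cheap stand-in for the real function, and the checkpoint path and the small matrix size are assumptions chosen for illustration.

```r
# Hypothetical stand-in for the expensive gcm() from the question
gcm <- function(i, j) i * j

n <- 5                                                  # small size for illustration
checkpoint <- file.path(tempdir(), "R_checkpoint.rds")  # assumed file location

# Resume from the checkpoint if one exists, otherwise start with an empty matrix
if (file.exists(checkpoint)) {
  R <- readRDS(checkpoint)
} else {
  R <- matrix(NA_real_, nrow = n, ncol = n)
}

for (i in seq_len(n)) {
  if (!anyNA(R[i, ])) next          # row already computed in an earlier run
  for (j in seq_len(n)) {
    R[i, j] <- gcm(i, j)
  }
  saveRDS(R, checkpoint)            # write the matrix after each completed row
}
```

Saving once per row rather than once per cell keeps the disk overhead small; if the run is interrupted, rerunning the script skips the rows that are already filled.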
About your code taking a lot of time I would advise trying the ?apply function, which
Returns a vector or array or list of values obtained by applying a function to margins of an array or matrix
You want gcm to be run for each cell, which means you want to apply it to each combination of row and column coordinates
nR = 100; # number of rows
nC = 100; # number of columns
M = expand.grid(1:nR, 1:nC); # Cartesian product of the coordinates
# each row of M contains the indexes of one of R's cells
# head(M); # just to see it
# apply passes each row of M to the function as a single vector (see ?apply for the details)
# so I create a wrapper that takes one row of M and tells gcm that the first element is the row index and the second is the column index
gcmWrapper = function(x) { return(gcm(x[1], x[2])); }
# run apply, which will return a vector containing *all* the evaluated expressions
R = apply(M, 1, gcmWrapper);
# re-shape R into a matrix (nR and nC keep the dimensions separate from the result R)
R = matrix(R, nrow=nR, ncol=nC);
If the apply approach is still slow, consider the snowfall package, which lets you follow the same apply approach using parallel computing. An introduction to snowfall usage can be found in this pdf; look at pages 5 and 6 in particular.
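For comparison, the same grid evaluation can be sketched in base R with outer() and Vectorize(). This is an alternative to the apply() code above, not something the answer itself proposes, and the gcm() here is a made-up stand-in that only accepts scalar arguments.

```r
# Hypothetical stand-in for gcm(); assumed to accept scalar i and j only
gcm <- function(i, j) {
  if (i > j) i - j else i + j
}

nR <- 4   # number of rows (small for illustration)
nC <- 3   # number of columns

# Vectorize() wraps gcm so that outer() can call it on whole index vectors;
# the result is an nR x nC matrix with R[i, j] == gcm(i, j)
R <- outer(seq_len(nR), seq_len(nC), Vectorize(gcm))
```

This mainly tidies the code; gcm() is still called once per cell, so the running time is dominated by gcm() itself.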

How Can I Avoid This For Loop? (R)

I currently have a for loop as below and it does not run as fast as I would like it to.
library(dplyr)
DF<-data.frame(Name=c('Bob','Joe','Sally')) #etc
PrimaryResult <- Function1(DF)
ResultsDF<-Function2(PrimaryResult)
for(i in 1:9)
{
Filtered<-filter(DF,Name!=PrimaryResult[i,2])
NextResult <- Function1(Filtered)
ResultsDF<-rbind(ResultsDF,Function2(NextResult))
}
The code takes an initial result of Function1 (which is a list of names) and tries it again with each name in the initial result being excluded individually to provide alternative results. These are returned as a one row data frame via Function2 and appended to the Results data frame.
How can I make this faster?
It seems like your main problem is appending the results from Function2 with rbind at each iteration. This is classically slow because R has to copy everything accumulated so far at each step, and it does not know in advance how large the final object will be.
Try storing your results in a list instead. I don't really know what your functions do, so I can't assist with that part.
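results_list <- vector("list", 10)
results_list[[1]] <- Function2(PrimaryResult)
for(i in 1:9){
Filtered<-filter(DF,Name!=PrimaryResult[i,2])
NextResult <- Function1(Filtered)
# store each one-row result in its own slot; no rbind inside the loop
results_list[[i+1]]<-Function2(NextResult)
}
# combine all rows once, after the loop
ResultsDF <- do.call(rbind, results_list)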
This is not perfect, but it should speed things up a bit.
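The pattern of collecting rows in a preallocated list and binding them once at the end can be sketched in a self-contained way. make_row() is a hypothetical stand-in for Function2(), assumed to return a one-row data frame.

```r
# Hypothetical stand-in for Function2(): returns a one-row data frame
make_row <- function(i) data.frame(run = i, value = i^2)

n <- 10
results_list <- vector("list", n)           # preallocate one slot per result

for (i in seq_len(n)) {
  results_list[[i]] <- make_row(i)          # store only; nothing is copied or grown
}

ResultsDF <- do.call(rbind, results_list)   # combine all rows in a single call
```

With only ten iterations the difference is small; the gain grows with the number of rows being accumulated.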

How to efficiently iterate through a complicated function that outputs a dataframe?

I essentially need to iterate through a set of values for parameters A,B,C to generate a table of results that will help me analyze the importance of such parameters. This is for a program in R.
Let's say that:
A goes from rangeA = 1:10
B goes from rangeB = 11:20
C goes from rangeC = 21:30
The simplest (not most efficient) solution that I currently use goes something like this:
### here I create this empty dataframe because I add on each tmp calc later
res <- data.frame()
### here i just create a random dataframe for replicative purposes
dataset <- data.frame(replicate(10,sample(0:1,1000,rep=TRUE)))
ParameterAdjustment <- function(){
for(a in rangeA){
for(b in rangeB){
for(c in rangeC){
### this is a complicated calculation that is much more
### difficult than the replicable example below
tmp <- CalculateSomething(dataset,a,b,c)
### an example calculation
### EDIT NEW EXAMPLE CALCULATION
tmp <- colMeans(dataset+a*b*c)
tmp <- data.frame(t(tmp),sd(tmp))
res <- rbind(res,tmp)
}
}
}
return(res)
}
My problem is that this works fine with my original dataset that runs calculations on a 7000x500 dataframe. However, my new datasets are much larger and performance has become a significant issue. Can anyone suggest or help with a more efficient solution? Thank you.
Not sure what language the above is, so not sure how relevant this is, but here goes: are you outputting/sending the data as you go, or collecting all the results in memory and then outputting them in one go at the end? I've encountered similar problems with large datasets, and outputting as you go has helped me out a few times. For example, when sending tens of thousands of data points back to the client for a graph, rather than generating an array of all those points and sending that, I output each point as it is computed and then free up the memory. It still takes a while, but that's unavoidable. The important bit is that it doesn't crash.
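In R, the output-as-you-go idea from this answer could look roughly like the sketch below, which appends each result row to a CSV file instead of growing a data frame in memory. The small parameter ranges, the reduced dataset, and the temporary output path are assumptions for illustration; the calculation is the example one from the question.

```r
set.seed(1)
dataset <- data.frame(replicate(10, sample(0:1, 100, replace = TRUE)))
out_file <- tempfile(fileext = ".csv")   # hypothetical output path

# All parameter combinations in one table instead of three nested loops
grid <- expand.grid(a = 1:2, b = 11:12, c = 21:22)

for (r in seq_len(nrow(grid))) {
  p <- grid[r, ]
  means <- colMeans(dataset + p$a * p$b * p$c)
  row <- data.frame(a = p$a, b = p$b, c = p$c, t(means), sd = sd(means))
  first <- !file.exists(out_file)        # write the header only once
  write.table(row, out_file, sep = ",", append = !first,
              col.names = first, row.names = FALSE)
}

res <- read.csv(out_file)                # read everything back when needed
```

Each row leaves memory as soon as it is written, so the footprint stays flat no matter how many parameter combinations are evaluated.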

Running the same function multiple times and saving results with different names in workspace

So, I built a function called sort.song.
My goal with this function is to randomly sample the rows of a data.frame (DATA) and then filter it (DATA.NEW) for analysis. I want to do this multiple times (let's say 10 times). In the end, I want each object (mantel.something) produced by this function to be saved in my workspace with a name that I can relate to each cycle (mantel.something1, mantel.something2...mantel.something10).
I have the following code, so far:
sort.song<-function(DATA){
require(ade4)
for(i in 1:10){ # Am I using for correctly here?
DATA.NEW <- DATA[sample(1:nrow(DATA),replace=FALSE),]
DATA.NEW <- DATA.NEW[!duplicated(DATA.NEW$Point),]
coord.dist<-dist(DATA.NEW[,4:5],method="euclidean")
num.notes.dist<-dist(DATA.NEW$Num_Notes,method="euclidean")
songdur.dist<-dist(DATA.NEW$Song_Dur,method="euclidean")
hfreq.dist<-dist(DATA.NEW$High_Freq,method="euclidean")
lfreq.dist<-dist(DATA.NEW$Low_Freq,method="euclidean")
bwidth.dist<-dist(DATA.NEW$Bwidth_Song,method="euclidean")
hfreqlnote.dist<-dist(DATA.NEW$HighFreq_LastNote,method="euclidean")
mantel.numnotes[i]<<-mantel.rtest(coord.dist,num.notes.dist,nrepet=1000)
mantel.songdur[i]<<-mantel.rtest(coord.dist,songdur.dist,nrepet=1000)
mantel.hfreq[i]<<-mantel.rtest(coord.dist,hfreq.dist,nrepet=1000)
mantel.lfreq[i]<<-mantel.rtest(coord.dist,lfreq.dist,nrepet=1000)
mantel.bwidth[i]<<-mantel.rtest(coord.dist,bwidth.dist,nrepet=1000)
mantel.hfreqlnote[i]<<-mantel.rtest(coord.dist,hfreqlnote.dist,nrepet=1000)
}
}
Could someone please help me to do it the right way?
I think I'm not assigning the cycles correctly for each mantel.something object.
Many thanks in advance!
The best way to implement what you are trying to do is through a list. You can even make it take two indices, the first for the iterations, the second for the type of analysis.
mantellist <- as.list(1:10) ## initiate list with some values
for (i in 1:10){
...
mantellist[[i]] <- list(numnotes=mantel.rtest(coord.dist,num.notes.dist,nrepet=1000),
songdur=mantel.rtest(coord.dist,songdur.dist,nrepet=1000),
hfreq=mantel.rtest(coord.dist,hfreq.dist,nrepet=1000),
...)
}
return(mantellist)
In this way you can index your specific analysis for each iteration in an intuitive way:
mantellist[[2]][['hfreq']]
mantellist[[2]]$hfreq ## alternative
EDIT by Mohr:
Just for clarification...
So, according to your suggestion the code should be something like this:
sort.song<-function(DATA){
require(ade4)
mantellist <- as.list(1:10)
for(i in 1:10){
DATA.NEW <- DATA[sample(1:nrow(DATA),replace=FALSE),]
DATA.NEW <- DATA.NEW[!duplicated(DATA.NEW$Point),]
coord.dist<-dist(DATA.NEW[,4:5],method="euclidean")
num.notes.dist<-dist(DATA.NEW$Num_Notes,method="euclidean")
songdur.dist<-dist(DATA.NEW$Song_Dur,method="euclidean")
hfreq.dist<-dist(DATA.NEW$High_Freq,method="euclidean")
lfreq.dist<-dist(DATA.NEW$Low_Freq,method="euclidean")
bwidth.dist<-dist(DATA.NEW$Bwidth_Song,method="euclidean")
hfreqlnote.dist<-dist(DATA.NEW$HighFreq_LastNote,method="euclidean")
mantellist[[i]] <- list(numnotes=mantel.rtest(coord.dist,num.notes.dist,nrepet=1000),
songdur=mantel.rtest(coord.dist,songdur.dist,nrepet=1000),
hfreq=mantel.rtest(coord.dist,hfreq.dist,nrepet=1000),
lfreq=mantel.rtest(coord.dist,lfreq.dist,nrepet=1000),
bwidth=mantel.rtest(coord.dist,bwidth.dist,nrepet=1000),
hfreqlnote=mantel.rtest(coord.dist,hfreqlnote.dist,nrepet=1000)
)
}
return(mantellist)
}
You can achieve your objective of repeating this exercise 10 (or more) times without using an explicit for-loop. Rather than have the function run the loop, write the sort.song function to run one iteration of the process; then you can use replicate to repeat that process however many times you desire.
It is generally good practice not to create a bunch of named objects in your global environment. Instead, you can hold the results of each iteration of this process in a single object. replicate will return an array (if possible), otherwise a list (in the example below, a list of lists). So, the list will have 10 elements (one for each iteration), and each element will itself be a list containing named elements corresponding to each result of mantel.rtest.
sort.song<-function(DATA){
DATA.NEW <- DATA[sample(1:nrow(DATA),replace=FALSE),]
DATA.NEW <- DATA.NEW[!duplicated(DATA.NEW$Point),]
coord.dist <- dist(DATA.NEW[,4:5],method="euclidean")
num.notes.dist <- dist(DATA.NEW$Num_Notes,method="euclidean")
songdur.dist <- dist(DATA.NEW$Song_Dur,method="euclidean")
hfreq.dist <- dist(DATA.NEW$High_Freq,method="euclidean")
lfreq.dist <- dist(DATA.NEW$Low_Freq,method="euclidean")
bwidth.dist <- dist(DATA.NEW$Bwidth_Song,method="euclidean")
hfreqlnote.dist <- dist(DATA.NEW$HighFreq_LastNote,method="euclidean")
return(list(
numnotes = mantel.rtest(coord.dist,num.notes.dist,nrepet=1000),
songdur = mantel.rtest(coord.dist,songdur.dist,nrepet=1000),
hfreq = mantel.rtest(coord.dist,hfreq.dist,nrepet=1000),
lfreq = mantel.rtest(coord.dist,lfreq.dist,nrepet=1000),
bwidth = mantel.rtest(coord.dist,bwidth.dist,nrepet=1000),
hfreqlnote = mantel.rtest(coord.dist,hfreqlnote.dist,nrepet=1000)
))
}
require(ade4)
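# simplify = FALSE keeps the result as a list of 10 lists; by default replicate
# would simplify equal-length lists into a matrix of list elements
results <- replicate(10, sort.song(DATA), simplify = FALSE)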
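A small self-contained sketch of the replicate pattern, including how the results are indexed afterwards. one_run() and the toy DATA are hypothetical stand-ins for sort.song() and the real data, so that the example runs without the ade4 package.

```r
# Hypothetical stand-in for one iteration of sort.song(): shuffle the rows,
# then return several named results in a list
one_run <- function(DATA) {
  shuffled <- DATA[sample(nrow(DATA)), , drop = FALSE]
  list(first_id = shuffled$id[1], mean_value = mean(shuffled$value))
}

set.seed(42)
DATA <- data.frame(id = 1:6, value = c(2, 4, 6, 8, 10, 12))

# simplify = FALSE returns a plain list with one element per repetition
runs <- replicate(10, one_run(DATA), simplify = FALSE)

runs[[2]]$first_id                       # one named result from the second run

# Pull the same named result out of every run
means <- vapply(runs, function(r) r$mean_value, numeric(1))
```

Everything lives in the single object runs, so nothing has to be written to the global environment with generated names.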

printing objects from a double for loop in R

I have code for nested for loops here. The output I would like to receive is a matrix of the means of the columns of the matrix produced by the nested loop. So, the interior loop should run 1000 simulations of a randomized vector, and run a function each time. This works fine on its own, and spits the output into R. But I want to save the output from the nested loop to an object (a matrix of 1000 rows and 11 columns), and then print only the colMeans of that matrix, to be performed by the outer loop.
I think the problem lies in the step where I assign the results of the inner loop to the obj matrix. I have tried every variation on obj[i,],obj[i],obj[[i]], etc. with no success. R tells me that it is an object of only one dimension.
x=ACexp
obj=matrix(nrow=1000,ncol=11,byrow=T) #create an empty matrix to dump results into
for(i in 1:ncol(x)){ #nested for loops
a=rep(1,times=i) #repeat 1 for 1:# columns in x
b=rep(0,times=(ncol(x)-length(a))) #have the rest of the vector be 0
Inv=append(a,b) #append these two for the Inv vector
for (i in 1:1000){ #run this vector through the simulations
Inv2=sample(Inv,replace=FALSE) #randomize interactions
temp2=rbind(x,Inv2)
obj[i]<-property(temp2) #print results to obj matrix
}
print.table(colMeans(obj)) #get colMeans and print to excel file
}
Any ideas how this can be fixed?
You're repeatedly printing the whole matrix to the screen as it gets modified, but your comment says "print to excel file". I'm guessing you actually want to save your data out to a file. Remove the print.table command altogether and, after your loops are completed, use write.table()
write.table(colMeans(obj), 'myNewMatrixFile.csv', quote = FALSE, sep = ',', row.names = FALSE)
(my preferred options... see ?write.table to select the ones you like)
Since your code isn't reproducible, we can't quite tell what you want. However, I guess that property is returning a single number that you want to place in the right row/column place of the obj matrix, which you would refer to as obj[row,col]. But you'll have trouble with that as is, because both your loops are using the same index i. Maybe something like this will work for you.
obj <- matrix(nrow=1000,ncol=11,byrow=T) #create an empty matrix to dump results into
for(i in 1:ncol(x)){ #nested for loops
Inv <- rep(c(1,0), times=c(i, ncol(x)-i)) #repeat 1 for 1:# columns in x, then 0's
for (j in 1:nrow(obj)){ #run this vector through the simulations
Inv2 <- sample(Inv,replace=FALSE) #randomize interactions
temp2 <- rbind(x,Inv2)
obj[j,i] <- property(temp2) #save results in obj matrix
}
}
write.csv(colMeans(obj), 'myFile.csv') #get colMeans and print to csv file
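The two-index fix can be tried on toy data with the sketch below. property() and x are made-up stand-ins (the real ones are not shown in the question), and the number of simulations is reduced to keep the example quick.

```r
# Hypothetical stand-in for property(): returns a single number per matrix
property <- function(m) sum(m[nrow(m), ] * m[1, ])

set.seed(1)
x <- matrix(runif(15), nrow = 3, ncol = 5)   # made-up input matrix
n_sim <- 200                                 # fewer simulations than the 1000 in the question

obj <- matrix(NA_real_, nrow = n_sim, ncol = ncol(x))

for (i in seq_len(ncol(x))) {                # outer loop: one column of obj per i
  Inv <- rep(c(1, 0), times = c(i, ncol(x) - i))
  for (j in seq_len(n_sim)) {                # inner loop uses its own index j
    Inv2 <- sample(Inv)                      # randomize the positions of the 1s
    obj[j, i] <- property(rbind(x, Inv2))    # row j, column i
  }
}

col_means <- colMeans(obj)                   # one mean per column, computed after the loops
```

Using distinct indices i and j is what lets each simulation land in its own cell of obj.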
