I have a task that needs to be performed on a list of 30 objects. I used to code each one as an individual object and run them one at a time, but after some StackExchange feedback I moved to lists. Now, when I run the code over the list, I have no idea what the progress is. I only see the red stop sign in the RStudio GUI, so I can't tell whether the computer is hung or which object in the list it is currently working on.
Has anyone written some sort of feedback code chunk that reports when an object in the list is completed?
Edit, with more details:
I have this list
sizes <- list(
n1.6 = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1),
n7.8 = c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,1,1),
n9.10 = c(2,1,1,1,1,1,1,1,1,1,1,1,1,2,1,1)
)
The real list has 30 objects; the 3 above are an example.
and I have this function
strata_eq <- function(n_vec){
sample_means <-matrix(rep(0,80000), nrow=16, ncol=5000)
for(i in 1:5000){
sample <- strata(data[,c(1,23)], "stream", n_vec, method="srswor")
sample <- data[sample$ID_unit,]
stream_means <- with(sample, tapply(avg.RPD, stream, mean))
sample_means[,i] <- c(unlist(stream_means))
}
return(sample_means)
}
I then would like to pass this function to the list, so that the function is applied to every object in the list.
rates <- lapply(sizes, function(x) strata_eq(x))
However, this takes really long, and I was hoping there was some sort of code that would help provide some progress on the execution. Maybe just something that tells me what object is completed in the list, or which one it is working on...?
You could add a print call that reports the elapsed time. That should be enough to estimate the remaining time, provided the function takes roughly the same time on every object.
At the beginning of your code:
ptm <- proc.time()
In your function:
strata_eq <- function(n_vec){
sample_means <-matrix(rep(0,80000), nrow=16, ncol=5000)
for(i in 1:5000){
sample <- strata(data[,c(1,23)], "stream", n_vec, method="srswor")
sample <- data[sample$ID_unit,]
stream_means <- with(sample, tapply(avg.RPD, stream, mean))
sample_means[,i] <- c(unlist(stream_means))
}
print(proc.time() - ptm)
return(sample_means)
}
Then apply the function to every object in the list as before:
rates <- lapply(sizes, strata_eq)
May I also suggest a few tricks to tighten your function:
Create a table equal to data[,c(1,23)] once, before the function, instead of subsetting it on every iteration.
Replace the for loop with a *ply* function and convert the result into a matrix afterwards. This is easy to do with rbindlist, for example.
Which leads to: use the data.table package.
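To see which element is being processed, one option is to loop over the indices of the list and emit a message after each element. This is only a minimal sketch: `slow_task` is a hypothetical stand-in for `strata_eq`, and `sizes` is shortened to two elements.

```r
# Hypothetical stand-in for strata_eq(); replace with the real function.
slow_task <- function(n_vec) {
  Sys.sleep(0.05)   # pretend to do some work
  sum(n_vec)
}

sizes <- list(n1.6 = rep(1, 16), n7.8 = c(rep(1, 13), 2, 1, 1))

start <- proc.time()
rates <- lapply(seq_along(sizes), function(k) {
  res <- slow_task(sizes[[k]])
  elapsed <- (proc.time() - start)[["elapsed"]]
  # message() writes to stderr rather than stdout
  message(sprintf("finished %d of %d (%s), %.1f s elapsed",
                  k, length(sizes), names(sizes)[k], elapsed))
  res
})
names(rates) <- names(sizes)   # lapply over indices drops the names
```

Looping over `seq_along(sizes)` rather than `sizes` itself is what makes the position and name available inside the function. `utils::txtProgressBar()` with `setTxtProgressBar()` is another base R option if a bar is preferred over one line per element.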
I am currently experiencing perpetual issues with object selection within loops in R. I am fairly convinced that this is a common problem but I cannot seem to find the answer so here I am...
Here's a practical example of a problem I have:
I have a source data frame with a series of sequentially named variables (X1, X2, X3, X4, and so on). I am looking to create a function that takes this source data and matches it to another dataset to create a new, combined dataset.
The number of variables will vary. I want to pass my function a parameter which tells it how many variables I have, and the function needs to adjust the number of times it will run the code accordingly. This seems like a task for a for loop, but again there doesn't appear to be an easy way for that selection and recreation of variables within a loop.
Here's the code I need to repeat:
new1$X1 <- data$X1[match(new1$matf1, data$rowID)]
new1$X2 <- data$X2[match(new1$matf1, data$rowID)]
new1$X3 <- data$X3[match(new1$matf1, data$rowID)]
new1$X4 <- data$X4[match(new1$matf1, data$rowID)]
new1$X5 <- data$X5[match(new1$matf1, data$rowID)]
(...)
return(new1)
I've attempted something like this:
for(i in 1:5) {
new1$Xi <- assign(paste0("X", i)), as.vector(paste0("data$X",i)[match(new1$matf1, data$rowID)])
}
without success.
Thank you for your help!
You can try this simple way, however a join would be more efficient:
vals <- paste0('X',1:5)
for(i in vals){
new1[[i]] <- data[[i]][match(new1$matf1, data$rowID)]
}
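Wrapped in a function that takes the number of variables as a parameter, the loop above might look like the sketch below. The toy values are made up, and `add_matched` is a hypothetical name; the `merge()` call illustrates the join mentioned above.

```r
# Toy data; column names follow the question, values are invented.
data <- data.frame(rowID = 1:4,
                   X1 = c(10, 20, 30, 40),
                   X2 = c(1.5, 2.5, 3.5, 4.5))
new1 <- data.frame(matf1 = c(3, 1, 3))

add_matched <- function(new1, data, n_vars) {
  idx <- match(new1$matf1, data$rowID)   # compute the row lookup once
  for (v in paste0("X", seq_len(n_vars))) {
    new1[[v]] <- data[[v]][idx]          # [[ ]] accepts a name held in a variable
  }
  new1
}

looped <- add_matched(new1, data, 2)

# Join-based alternative; note that merge() sorts by the key by default,
# so the row order can differ from new1.
joined <- merge(new1, data, by.x = "matf1", by.y = "rowID", all.x = TRUE)
```

The key point is that `new1$Xi` looks for a column literally called `Xi`, whereas `new1[[v]]` uses the string stored in `v`.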
Hi everyone.
I am programming a simulation app in R Shiny and I am stuck at the for loops.
Basically, in a reactive I am calling a function that loops through a couple of other functions, like this:
In the server.R:
output.addiction <- reactive ({
SimulateMultiple(input$no.simulations, vectors(), parameters(), input$S.plus, input$q,
input$weeks, input$d, list.output)
})
The function:
SimulateMultiple <- function (no.simulations, vectors, parameters, S.plus, q, weeks, d, list.output) {
for (i in 1:no.simulations) {
thisi <- i
simulation <- SimulateAddictionComponents(vectors, parameters, S.plus, q, weeks, d) # returns list "simulation"
df.output <- BuildOutputDataframe(weeks, simulation, vectors) # returns "df.output"
output.addiction <- BuildOutputList(df.output, simulation, list.output) # returns "output.addiction"
}
return(output.addiction)
}
And, finally, the last function, which creates the output list:
BuildOutputList <- function (df.output, simulation, list.output) {
addiction <- simulation$addiction
output.w.success <- list(df.output, addiction) # includes success data
output.addition <- c(list.output, list(output.w.success)) # adds the new data to the list
return(output.addition)
}
I have read a lot about the issue online and tried to isolate some parts, introduce a local({}), and so on, but it never works. In the end, I get a list of length 1.
I would be forever grateful, if you could help me - I have been on this for two days now.
The problem solved itself when I edited the code in the function from
output.addition <- c(list.output, list(output.w.success)) # adds the new data to the list
return(output.addition)
to
list.output <- c(list.output, list(output.w.success)) # adds the new data to the list
return(list.output)
so as not to overwrite the object on every pass through the loop. In the end it was a very simple, silly problem, but hard to spot.
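For readers hitting the same symptom, the general pattern is that the grown list has to be assigned back to the same name on each pass, otherwise every iteration starts again from the original list. A minimal sketch, assuming a hypothetical `append_result` helper in place of `BuildOutputList` and made-up simulation output:

```r
# Hypothetical helper: returns the accumulator with one more element appended.
append_result <- function(acc, new_item) {
  c(acc, list(new_item))
}

run_many <- function(n) {
  acc <- list()
  for (i in seq_len(n)) {
    item <- list(id = i, value = i^2)   # stand-in for one simulation's output
    acc <- append_result(acc, item)     # reassign, so the list keeps growing
  }
  acc
}

out <- run_many(3)
```

Here `out` has one element per iteration because `acc` is both the input to and the result of each call.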
I currently have a for loop as below and it does not run as fast as I would like it to.
library(dplyr)
DF<-data.frame(Name=c('Bob','Joe','Sally')) #etc
PrimaryResult <- Function1(DF)
ResultsDF<-Function2(PrimaryResult)
for(i in 1:9)
{
Filtered<-filter(DF,Name!=PrimaryResult[i,2])
NextResult <- Function1(Filtered)
ResultsDF<-rbind(ResultsDF,Function2(NextResult))
}
The code takes an initial result of Function1 (which is a list of names) and tries it again with each name in the initial result being excluded individually to provide alternative results. These are returned as a one row data frame via Function2 and appended to the Results data frame.
How can I make this faster?
It seems like your main problem is appending the results of Function2 with rbind on each iteration. This is classically slow because R has to copy the whole accumulated object at every step, and it has no way of knowing how large the final result will be.
Try storing your results in a preallocated list instead. I don't know what your functions do, so I can't really help with that part.
results_list <- vector("list", 10)
results_list[[1]] <- Function2(PrimaryResult)
for(i in 1:9){
Filtered<-filter(DF,Name!=PrimaryResult[i,2])
NextResult <- Function1(Filtered)
results_list[[i+1]] <- Function2(NextResult) # store the row; no rbind inside the loop
}
ResultsDF <- do.call(rbind, results_list) # combine once at the end
This is not perfect, but it should speed things up a bit.
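A self-contained illustration of the preallocate-then-bind pattern, with a hypothetical `one_row` function standing in for `Function2`:

```r
# Hypothetical stand-in for Function2(): returns a one-row data frame.
one_row <- function(i) data.frame(run = i, score = i * 2)

n <- 10
results_list <- vector("list", n)          # preallocate the slots
for (i in seq_len(n)) {
  results_list[[i]] <- one_row(i)          # store each piece; no rbind here
}
ResultsDF <- do.call(rbind, results_list)  # combine once at the end
```

Since dplyr is already loaded in the question, `bind_rows(results_list)` would be an alternative to `do.call(rbind, ...)` for the final step.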
So, I built a function called sort.song.
My goal with this function is to randomly sample the rows of a data.frame (DATA) and then filter it (DATA.NEW) for analysis. I want to do this multiple times (let's say 10 times). In the end, I want each object (mantel.something) produced by this function to be saved in my workspace under a name that I can relate to each cycle (mantel.something1, mantel.something2 ... mantel.something10).
I have the following code, so far:
sort.song<-function(DATA){
require(ade4)
for(i in 1:10){ # Am I using for correctly here?
DATA.NEW <- DATA[sample(1:nrow(DATA),replace=FALSE),]
DATA.NEW <- DATA.NEW[!duplicated(DATA.NEW$Point),]
coord.dist<-dist(DATA.NEW[,4:5],method="euclidean")
num.notes.dist<-dist(DATA.NEW$Num_Notes,method="euclidean")
songdur.dist<-dist(DATA.NEW$Song_Dur,method="euclidean")
hfreq.dist<-dist(DATA.NEW$High_Freq,method="euclidean")
lfreq.dist<-dist(DATA.NEW$Low_Freq,method="euclidean")
bwidth.dist<-dist(DATA.NEW$Bwidth_Song,method="euclidean")
hfreqlnote.dist<-dist(DATA.NEW$HighFreq_LastNote,method="euclidean")
mantel.numnotes[i]<<-mantel.rtest(coord.dist,num.notes.dist,nrepet=1000)
mantel.songdur[i]<<-mantel.rtest(coord.dist,songdur.dist,nrepet=1000)
mantel.hfreq[i]<<-mantel.rtest(coord.dist,hfreq.dist,nrepet=1000)
mantel.lfreq[i]<<-mantel.rtest(coord.dist,lfreq.dist,nrepet=1000)
mantel.bwidth[i]<<-mantel.rtest(coord.dist,bwidth.dist,nrepet=1000)
mantel.hfreqlnote[i]<<-mantel.rtest(coord.dist,hfreqlnote.dist,nrepet=1000)
}
}
Could someone please help me to do it the right way?
I think I'm not assigning the cycles correctly for each mantel.something object.
Many thanks in advance!
The best way to implement what you are trying to do is through a list. You can even make it take two indices, the first for the iterations, the second for the type of analysis.
mantellist <- as.list(1:10) ## initiate list with some values
for (i in 1:10){
...
mantellist[[i]] <- list(numnotes=mantel.rtest(coord.dist,num.notes.dist,nrepet=1000),
songdur=mantel.rtest(coord.dist,songdur.dist,nrepet=1000),
hfreq=mantel.rtest(coord.dist,hfreq.dist,nrepet=1000),
...)
}
return(mantellist)
In this way you can index your specific analysis for each iteration in an intuitive way:
mantellist[[2]][['hfreq']]
mantellist[[2]]$hfreq ## alternative
EDIT by Mohr:
Just for clarification...
So, according to your suggestion the code should be something like this:
sort.song<-function(DATA){
require(ade4)
mantellist <- as.list(1:10)
for(i in 1:10){
DATA.NEW <- DATA[sample(1:nrow(DATA),replace=FALSE),]
DATA.NEW <- DATA.NEW[!duplicated(DATA.NEW$Point),]
coord.dist<-dist(DATA.NEW[,4:5],method="euclidean")
num.notes.dist<-dist(DATA.NEW$Num_Notes,method="euclidean")
songdur.dist<-dist(DATA.NEW$Song_Dur,method="euclidean")
hfreq.dist<-dist(DATA.NEW$High_Freq,method="euclidean")
lfreq.dist<-dist(DATA.NEW$Low_Freq,method="euclidean")
bwidth.dist<-dist(DATA.NEW$Bwidth_Song,method="euclidean")
hfreqlnote.dist<-dist(DATA.NEW$HighFreq_LastNote,method="euclidean")
mantellist[[i]] <- list(numnotes=mantel.rtest(coord.dist,num.notes.dist,nrepet=1000),
songdur=mantel.rtest(coord.dist,songdur.dist,nrepet=1000),
hfreq=mantel.rtest(coord.dist,hfreq.dist,nrepet=1000),
lfreq=mantel.rtest(coord.dist,lfreq.dist,nrepet=1000),
bwidth=mantel.rtest(coord.dist,bwidth.dist,nrepet=1000),
hfreqlnote=mantel.rtest(coord.dist,hfreqlnote.dist,nrepet=1000)
)
}
return(mantellist)
}
You can repeat this exercise 10 (or more) times without an explicit for loop. Rather than have the function run the loop, write the sort.song function to run one iteration of the process; you can then use replicate to repeat that process as many times as you like.
It is generally good practice not to create a bunch of named objects in your global environment. Instead, you can hold all of the results in a single object. By default, replicate simplifies its results to an array when it can; pass simplify = FALSE to be sure of getting a plain list. That list then has 10 elements (one for each iteration), and each element is itself a list containing named elements corresponding to each result of mantel.rtest.
sort.song<-function(DATA){
DATA.NEW <- DATA[sample(1:nrow(DATA),replace=FALSE),]
DATA.NEW <- DATA.NEW[!duplicated(DATA.NEW$Point),]
coord.dist <- dist(DATA.NEW[,4:5],method="euclidean")
num.notes.dist <- dist(DATA.NEW$Num_Notes,method="euclidean")
songdur.dist <- dist(DATA.NEW$Song_Dur,method="euclidean")
hfreq.dist <- dist(DATA.NEW$High_Freq,method="euclidean")
lfreq.dist <- dist(DATA.NEW$Low_Freq,method="euclidean")
bwidth.dist <- dist(DATA.NEW$Bwidth_Song,method="euclidean")
hfreqlnote.dist <- dist(DATA.NEW$HighFreq_LastNote,method="euclidean")
return(list(
numnotes = mantel.rtest(coord.dist,num.notes.dist,nrepet=1000),
songdur = mantel.rtest(coord.dist,songdur.dist,nrepet=1000),
hfreq = mantel.rtest(coord.dist,hfreq.dist,nrepet=1000),
lfreq = mantel.rtest(coord.dist,lfreq.dist,nrepet=1000),
bwidth = mantel.rtest(coord.dist,bwidth.dist,nrepet=1000),
hfreqlnote = mantel.rtest(coord.dist,hfreqlnote.dist,nrepet=1000)
))
}
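library(ade4)
# simplify = FALSE keeps the result as a list of lists rather than a list-matrix
results <- replicate(10, sort.song(DATA), simplify = FALSE)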
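The shape of the result can be checked without ade4. In this sketch `one_pass` is a hypothetical stand-in for a single call to `sort.song`, returning a named list of made-up statistics:

```r
set.seed(1)

# Hypothetical stand-in for one pass of sort.song(): returns a named list.
one_pass <- function() {
  x <- sample(1:100, 5)
  list(mean = mean(x), max = max(x))
}

# simplify = FALSE guarantees a plain list with one element per repetition
runs <- replicate(4, one_pass(), simplify = FALSE)

# Pull one statistic out of every run
run_means <- vapply(runs, function(r) r$mean, numeric(1))
```

Individual results are then reached with `runs[[2]]$max` or `runs[[2]][["max"]]`, and `vapply` collects one named element across all repetitions.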
f1 <- function(x) {
zx1 <- sample(1:nrow(zone4[[x]]), nrow(zone4[[x]]), replace=F)
zone4[[x]]$randnums <- zx1
}
f1(1)
## DOESN'T UPDATE zone4[[1]]
zx2 <- sample(1:nrow(zone4[[1]]), nrow(zone4[[1]]), replace=F)
zone4[[1]]$randnums <- zx2
## DOES UPDATE zone4[[1]]
If I define a function f1() as shown above, the object 'zone4[[x]]' is not updated. However, if I run the same commands outside the function with 'x' stated explicitly, also shown above, then 'zone4[[x]]' is updated. Why could this be? I want to know because I want to run iterations of the code. If I write "names(zone4[[x]])" inside the definition of f1(), the output tells me that the function did what it was supposed to, but when I query it again afterwards, zone4[[x]] appears to be unchanged.
The idea is to generate random numbers for each subset of a data set, defined by year and another variable, zone. The data set was originally a single data frame, but I used the split() function to separate the data according to year and zone, of which there are 4. Maybe there is a better way to assign random numbers to specific subsets of data without using the split() function? Thank you for your help.
R functions don't usually have side effects (i.e. they don't change global objects). Inside f1, the assignment modifies a local copy of zone4, which is discarded when the function returns.
This is a good thing most of the time, as we don't want unintended consequences.
The idiomatic approach is to return the result and assign it to an object (it can have the same name, to overwrite the original):
f1 <- function(x) {
zx1 <- sample(1:nrow(zone4[[x]]), nrow(zone4[[x]]), replace=F)
zone4[[x]]$randnums <- zx1
# usually a good idea to return the complete object
# especially when a replacement function (in your case `[[<-`)
# is the last one called
return(zone4)
}
zone4 <- f1(1)
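The same idea extends to every element at once. A minimal sketch, assuming a toy list in place of `zone4` and a hypothetical `add_randnums` function that takes a data frame rather than an index:

```r
# Toy list of data frames standing in for zone4.
zone4 <- list(data.frame(id = 1:3), data.frame(id = 1:5))

add_randnums <- function(df) {
  df$randnums <- sample(nrow(df))   # modifies the function's local copy
  df                                # return it so the caller can keep it
}

add_randnums(zone4[[1]])            # result discarded: zone4 is unchanged
unchanged <- names(zone4[[1]])

zone4 <- lapply(zone4, add_randnums)   # reassign: every element gets the column
```

Passing the data frame in and getting the modified copy back avoids relying on a global `zone4` inside the function, and `lapply` handles the iteration.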
An alternative would be to use data.table, whose := operator updates a table by reference:
library(data.table)
zone4 <- lapply(zone4, as.data.table)
f1 <- function(x) {
zone4[[x]][,randnums := sample(.N)]
invisible(NULL)
}