After using R for a little while, I have moved away from using for loops for everything, but I still don't know how to cycle through names without one. Whenever I am processing multiple things, I use a for loop as a way to cover all my bases in one go. Here is a mock example of something I would do. Is there a simpler way to go about doing this?
names <- c("John_Doe","Jane_Doe")
employee <- vector(length = length(names))
for(i in 1:length(names)){
filename <- paste0(names[i],".csv")
employee[i] <- read.csv(filename,header = FALSE)
}
Not sure if it's simpler, but you could try this:
dfs <- lapply(seq_along(names), function(i) read.csv(paste0(names[i], ".csv"), header = FALSE))
Instead of looping explicitly, lapply applies the same function to each index from 1 through the length of your names vector and returns the results as a list.
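A possible variation, as a minimal sketch: since paste0 is vectorised, you can build all the file names first and hand them straight to lapply, then name the list so each data frame can be looked up by person. The temporary directory, the file contents, and the names employee_names and files are made up here so that the example runs on its own.

```r
# Create two small CSV files in a temporary directory
# (the contents are invented purely for illustration).
tmp_dir <- tempdir()
employee_names <- c("John_Doe", "Jane_Doe")
for (nm in employee_names) {
  write.table(data.frame(1:3, c("a", "b", "c")),
              file = file.path(tmp_dir, paste0(nm, ".csv")),
              sep = ",", row.names = FALSE, col.names = FALSE)
}

# Build all file paths at once; paste0() is vectorised over employee_names.
files <- file.path(tmp_dir, paste0(employee_names, ".csv"))

# Read every file into one list and label each element with its name.
employee <- lapply(files, read.csv, header = FALSE)
names(employee) <- employee_names

str(employee$Jane_Doe)
```

Each element of employee is a full data frame, which a plain vector (as in the question's loop) cannot hold.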
I keep running into situations where I want to dynamically create variables using a for loop (or similar / more efficient construct using dplyr perhaps). However, it's unclear to me how to do it right now.
For example, the below shows a construct that I would intuitively expect to generate 10 variables assigned numbers 1:10, but it doesn't work.
for (i in 1:10) {paste("variable",i,sep = "") = i}
The error
Error in paste("variable", i, sep = "") = i :
target of assignment expands to non-language object
Any thoughts on what method I should use to do this? I assume there are multiple approaches (including a more efficient dplyr method). Full disclosure: I'm relatively new to R and really appreciate the help. Thanks!
I've run into this problem myself many times. The solution is the assign command.
for(i in 1:10){
assign(paste("variable", i, sep = ""), i)
}
If you wanted to get everything into one vector, you could use sapply. The following code gives you a vector from 1 to 10 whose elements are named "variable1" through "variable10". This may not be the prettiest or most elegant way to use the apply family for this, but I think it ought to work well enough.
var.names <- function(x){
a <- x
names(a) <- paste0("variable", x)
return(a)
}
variables <- sapply(X = 1:10, FUN = var.names)
This sort of approach seems to be favored because it keeps all of those variables tucked away in one object, rather than scattered all over the global environment. This could make calling them easier in the future, preventing the need to use get to scrounge up variables you'd saved.
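For what it's worth, the same "one object" idea can be sketched without a helper function by using setNames. The object names variables and variable_list below are just for illustration.

```r
# Named vector: one object holding all ten values.
variables <- setNames(1:10, paste0("variable", 1:10))
variables[["variable3"]]   # look up a single value by name

# Named list: same idea, but elements may be of any type or size.
variable_list <- as.list(variables)
variable_list$variable7
```

A named list is usually the better fit once the values stop being single numbers (data frames, model fits, and so on).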
No need to use a loop: you can build a character expression with paste0, turn it into an unevaluated expression with parse, and finally evaluate it with eval.
eval(parse(text = paste0("variable", 1:10, "=",1:10, collapse = ";") ))
The code you have is really no more useful than a vector of elements:
x<-1
for(i in 2:10){
x<-c(x,i)
}
(Obviously, this example is trivial; you could just use x<-1:10 and be done. I assume there's a reason you need to do non-vectorized calculations on each variable.)
I currently have a for loop as below and it does not run as fast as I would like it to.
library(dplyr)
DF<-data.frame(Name=c('Bob','Joe','Sally')) #etc
PrimaryResult <- Function1(DF)
ResultsDF<-Function2(PrimaryResult)
for(i in 1:9)
{
Filtered<-filter(DF,Name!=PrimaryResult[i,2])
NextResult <- Function1(Filtered)
ResultsDF<-rbind(ResultsDF,Function2(NextResult))
}
The code takes an initial result of Function1 (which is a list of names) and tries it again with each name in the initial result being excluded individually to provide alternative results. These are returned as a one row data frame via Function2 and appended to the Results data frame.
How can I make this faster?
It seems like your main problem is the appending results from function 2 each iteration with rbind. This is classically slow because you are telling R to rewrite a bunch of information at each time step and R does not really know how large of a vector you are going to end up with.
Try making your results into a list vector. I don't really know what your functions do so I can't really assist with that part.
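results_list <- vector("list", 10)
results_list[[1]] <- Function2(PrimaryResult)
for(i in 1:9){
  Filtered <- filter(DF, Name != PrimaryResult[i,2])
  NextResult <- Function1(Filtered)
  # store each one-row result in its own slot; no rbind inside the loop
  results_list[[i+1]] <- Function2(NextResult)
}
# combine all rows once, after the loop
ResultsDF <- do.call(rbind, results_list)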
This is not perfect, but it should speed things up a bit.
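Here is a runnable sketch of the same pattern (collect the pieces in a list, bind once at the end) written with lapply. Function1 and Function2 below are hypothetical stand-ins, since the real ones were not posted, and base subsetting is used instead of dplyr::filter so that the example has no dependencies.

```r
DF <- data.frame(Name = c("Bob", "Joe", "Sally", "Ann"))

# Hypothetical stand-ins for the asker's functions.
Function1 <- function(df) data.frame(Rank = seq_len(nrow(df)), Name = df$Name)
Function2 <- function(res) data.frame(N = nrow(res), Top = res$Name[1])

PrimaryResult <- Function1(DF)

# One result per excluded name, collected in a list.
alternatives <- lapply(PrimaryResult$Name, function(excluded) {
  Filtered <- DF[DF$Name != excluded, , drop = FALSE]
  Function2(Function1(Filtered))
})

# Bind everything once, after all the pieces exist.
ResultsDF <- do.call(rbind, c(list(Function2(PrimaryResult)), alternatives))
ResultsDF
```

If Function1 itself is the slow part, this will not help much; it only removes the cost of growing the data frame row by row.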
I am having trouble optimising a piece of R code. The following example code should illustrate my optimisation problem:
Some initialisations and a function definition:
a <- c(10,20,30,40,50,60,70,80)
b <- c("a","b","c","d","z","g","h","r")
c <- c(1,2,3,4,5,6,7,8)
myframe <- data.frame(a,b,c)
columns <- 6  # defined here because it is used on the next two lines
values <- vector(length=columns)
solution <- matrix(nrow=nrow(myframe),ncol=columns+3)
myfunction <- function(frame,columns){
  athing = 0
  if(columns == 5){
    athing = 100
  } else {
    athing = 1000
  }
  values[columns+1] = athing
  return(values)
}
The problematic for-loop looks like this:
columns = 6
for(i in 1:nrow(myframe)){
values <- myfunction(as.matrix(myframe[i,]), columns)
values[columns+2] = i
values[columns+3] = myframe[i,3]
#more columns added with simple operations (i.e. sum)
solution <- rbind(solution,values)
#solution is a large matrix from outside the for-loop
}
The problem seems to be the rbind function. I frequently get error messages regarding the size of solution, which seems to become too large after a while (more than 50 MB).
I want to replace this loop and the rbind with a list and lapply and/or foreach. I have started by converting myframe to a list.
myframe_list <- lapply(seq_len(nrow(myframe)), function(i) myframe[i,])
I have not really come further than this, although I tried applying this very good introduction to parallel processing.
How do I have to reconstruct the for-loop without having to change myfunction? Obviously I am open to different solutions...
Edit: This problem seems to be straight from the 2nd circle of hell from the R Inferno. Any suggestions?
The reason that using rbind in a loop like this is bad practice is that each iteration enlarges your solution object and copies it to a new object, which is very slow and can also lead to memory problems. One way around this is to create a list whose ith component stores the output of the ith loop iteration. The final step is to call rbind on that list (just once, at the end). This will look something like
my.list <- vector("list", nrow(myframe))
for(i in 1:nrow(myframe)){
# Call all necessary commands to create values
my.list[[i]] <- values
}
solution <- rbind(solution, do.call(rbind, my.list))
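To make the skeleton above concrete, here is a small self-contained sketch. The data frame is shortened, and myfunction is a simplified stand-in for the asker's function (it builds its own vector instead of relying on a global one), so treat it as an illustration rather than a drop-in replacement.

```r
myframe <- data.frame(a = c(10, 20, 30, 40),
                      b = c("a", "b", "c", "d"),
                      c = c(1, 2, 3, 4))
columns <- 6

# Simplified stand-in: returns a numeric vector of length columns + 1
# whose last element depends on `columns`.
myfunction <- function(frame, columns) {
  value <- numeric(columns + 1)
  value[columns + 1] <- if (columns == 5) 100 else 1000
  value
}

# Pre-allocate the list, then fill one slot per row.
my.list <- vector("list", nrow(myframe))
for (i in seq_len(nrow(myframe))) {
  values <- myfunction(as.matrix(myframe[i, ]), columns)
  values[columns + 2] <- i
  values[columns + 3] <- myframe[i, 3]
  my.list[[i]] <- values
}

# A single rbind at the end builds the whole matrix.
solution <- do.call(rbind, my.list)
dim(solution)
```

Here solution starts from the list alone; if you already have an existing solution matrix with matching columns, one extra rbind after the loop appends the new rows to it.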
A bit too long for a comment, so I put it here:
If columns is known in advance:
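myfunction <- function(frame){
  athing = 0
  if(columns == 5){
    athing = 100
  } else {
    athing = 1000
  }
  values[columns+1] = athing
  return(values)
}
apply(myframe, 1, myfunction)
MARGIN = 1 applies the function to each row, as the original loop does. apply returns one column per row, so wrap the call in t() if you need one row per input row.
If columns is not given via environment, you can use:
apply(myframe, 1, myfunction, columns) with your original myfunction definition.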
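One detail worth checking when replacing a row-wise loop with apply is the orientation of the result: MARGIN = 1 iterates over rows, and when the function returns a vector, each result becomes a column. A tiny sketch with a made-up matrix m and a hypothetical row_summary function:

```r
m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("x", "y")))

# Hypothetical per-row function returning two values.
row_summary <- function(row) c(total = sum(row), largest = max(row))

by_row <- apply(m, 1, row_summary)   # 2 x 3: one column per input row
result <- t(by_row)                  # 3 x 2: one row per input row
result
```

Also note that apply converts a data frame to a matrix first, so a data frame with a character column reaches the function as character strings.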
I am using the extract function in a loop. See below.
for (i in 1:length(list_shp_Tanzania)){
LU_Mod2000<- extract(x=rc_Mod2000_LC, y=list_shp_Tanzania[[i]], fun=maj)
}
Where maj function is:
maj <- function(x){
y <- as.numeric(names(which.max(table(x))))
return(y)
}
I was expecting to get i outputs, but I get only one output once the loop is done. Does anybody know what I am doing wrong? Thanks.
One solution in this kind of situation is to create a list and then assign the result of each iteration to the corresponding element of the list:
LU_Mod2000 <- vector("list", length(list_shp_Tanzania))
for (i in 1:length(list_shp_Tanzania)){
LU_Mod2000[[i]] <- extract(x=rc_Mod2000_LC, y=list_shp_Tanzania[[i]], fun=maj)
}
Do not do
LU_Mod2000 <- c(LU_Mod2000, extract(x=rc_Mod2000_LC, y=list_shp_Tanzania[[i]], fun=maj))
inside the loop. This creates unnecessary copies and takes a long time to run. Use the list method and, after the loop, convert the list of results to the desired format (usually with do.call(<some function>, LU_Mod2000)).
Alternatively, you could replace the for loop with lapply, which many people seem to prefer:
LU_Mod2000 <- lapply(list_shp_Tanzania, function(z) extract(x=rc_Mod2000_LC, y=z, fun=maj))
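As a sketch of the "collect, then convert" step that does not need the raster data: below, cell_values is a made-up stand-in for the values that would be extracted under each polygon, and maj is the function from the question.

```r
maj <- function(x) {
  as.numeric(names(which.max(table(x))))
}

# Made-up stand-in for the values found under each of three polygons.
cell_values <- list(c(1, 2, 2, 3), c(5, 5, 4), c(7, 8, 8, 8))

majority_list <- lapply(cell_values, maj)               # list of length 3
majority_vec  <- vapply(cell_values, maj, numeric(1))   # plain numeric vector
majority_vec
```

vapply is handy when every iteration returns a single number, because it checks that assumption and gives back a vector directly.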
So, I built a function called sort.song.
My goal with this function is to randomly sample the rows of a data.frame (DATA) and then filter it (DATA.NEW) for analysis. I want to do this multiple times (let's say 10 times). In the end, I want each object (mantel.something) produced by this function to be saved in my workspace under a name that I can relate to each cycle (mantel.something1, mantel.something2 ... mantel.something10).
I have the following code, so far:
sort.song<-function(DATA){
require(ade4)
for(i in 1:10){ # Am I using for correctly here?
DATA.NEW <- DATA[sample(1:nrow(DATA),replace=FALSE),]
DATA.NEW <- DATA.NEW[!duplicated(DATA.NEW$Point),]
coord.dist<-dist(DATA.NEW[,4:5],method="euclidean")
num.notes.dist<-dist(DATA.NEW$Num_Notes,method="euclidean")
songdur.dist<-dist(DATA.NEW$Song_Dur,method="euclidean")
hfreq.dist<-dist(DATA.NEW$High_Freq,method="euclidean")
lfreq.dist<-dist(DATA.NEW$Low_Freq,method="euclidean")
bwidth.dist<-dist(DATA.NEW$Bwidth_Song,method="euclidean")
hfreqlnote.dist<-dist(DATA.NEW$HighFreq_LastNote,method="euclidean")
mantel.numnotes[i]<<-mantel.rtest(coord.dist,num.notes.dist,nrepet=1000)
mantel.songdur[i]<<-mantel.rtest(coord.dist,songdur.dist,nrepet=1000)
mantel.hfreq[i]<<-mantel.rtest(coord.dist,hfreq.dist,nrepet=1000)
mantel.lfreq[i]<<-mantel.rtest(coord.dist,lfreq.dist,nrepet=1000)
mantel.bwidth[i]<<-mantel.rtest(coord.dist,bwidth.dist,nrepet=1000)
mantel.hfreqlnote[i]<<-mantel.rtest(coord.dist,hfreqlnote.dist,nrepet=1000)
}
}
Could someone please help me to do it the right way?
I think I'm not assigning the cycles correctly for each mantel.something object.
Many thanks in advance!
The best way to implement what you are trying to do is through a list. You can even make it take two indices, the first for the iterations, the second for the type of analysis.
mantellist <- as.list(1:10) ## initiate list with some values
for (i in 1:10){
...
mantellist[[i]] <- list(numnotes=mantel.rtest(coord.dist,num.notes.dist,nrepet=1000),
songdur=mantel.rtest(coord.dist,songdur.dist,nrepet=1000),
hfreq=mantel.rtest(coord.dist,hfreq.dist,nrepet=1000),
...)
}
return(mantellist)
In this way you can index your specific analysis for each iteration in an intuitive way:
mantellist[[2]][['hfreq']]
mantellist[[2]]$hfreq ## alternative
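A nested list like this is also easy to summarise across iterations. The sketch below uses a fake stand-in for each test result (a list with obs and pvalue fields, which are assumed names here, not real mantel.rtest output) so that it runs without ade4 or the song data.

```r
set.seed(1)

# Stand-in for one test result: made-up obs and pvalue fields.
fake_test <- function() list(obs = runif(1, -1, 1), pvalue = runif(1))

# Stand-in for the list built in the loop: 10 iterations, 3 analyses each.
mantellist <- lapply(1:10, function(i) {
  list(numnotes = fake_test(), songdur = fake_test(), hfreq = fake_test())
})

# p-values of the hfreq analysis across all ten iterations.
hfreq_p <- vapply(mantellist, function(iter) iter$hfreq$pvalue, numeric(1))

# All analyses at once: a 3 x 10 matrix, one row per analysis.
all_p <- sapply(mantellist, function(iter) {
  vapply(iter, function(test) test$pvalue, numeric(1))
})
dim(all_p)
```

With real results you would replace the field access with whatever component of the test object you want to compare.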
EDIT by Mohr:
Just for clarification...
So, according to your suggestion the code should be something like this:
sort.song<-function(DATA){
require(ade4)
mantellist <- as.list(1:10)
for(i in 1:10){
DATA.NEW <- DATA[sample(1:nrow(DATA),replace=FALSE),]
DATA.NEW <- DATA.NEW[!duplicated(DATA.NEW$Point),]
coord.dist<-dist(DATA.NEW[,4:5],method="euclidean")
num.notes.dist<-dist(DATA.NEW$Num_Notes,method="euclidean")
songdur.dist<-dist(DATA.NEW$Song_Dur,method="euclidean")
hfreq.dist<-dist(DATA.NEW$High_Freq,method="euclidean")
lfreq.dist<-dist(DATA.NEW$Low_Freq,method="euclidean")
bwidth.dist<-dist(DATA.NEW$Bwidth_Song,method="euclidean")
hfreqlnote.dist<-dist(DATA.NEW$HighFreq_LastNote,method="euclidean")
mantellist[[i]] <- list(numnotes=mantel.rtest(coord.dist,num.notes.dist,nrepet=1000),
songdur=mantel.rtest(coord.dist,songdur.dist,nrepet=1000),
hfreq=mantel.rtest(coord.dist,hfreq.dist,nrepet=1000),
lfreq=mantel.rtest(coord.dist,lfreq.dist,nrepet=1000),
bwidth=mantel.rtest(coord.dist,bwidth.dist,nrepet=1000),
hfreqlnote=mantel.rtest(coord.dist,hfreqlnote.dist,nrepet=1000)
)
}
return(mantellist)
}
You can achieve your objective of repeating this exercise 10 (or more) times without using an explicit for loop. Rather than have the function run the loop, write the sort.song function to run one iteration of the process; then you can use replicate to repeat that process as many times as you like.
It is generally good practice not to create a bunch of named objects in your global environment. Instead, you can hold the results of each iteration of this process in a single object. By default replicate simplifies its result to an array where possible; with simplify = FALSE it returns a list (here, a list of lists). That list will have 10 elements (one for each iteration), and each element will itself be a list containing named elements corresponding to each result of mantel.rtest.
sort.song<-function(DATA){
DATA.NEW <- DATA[sample(1:nrow(DATA),replace=FALSE),]
DATA.NEW <- DATA.NEW[!duplicated(DATA.NEW$Point),]
coord.dist <- dist(DATA.NEW[,4:5],method="euclidean")
num.notes.dist <- dist(DATA.NEW$Num_Notes,method="euclidean")
songdur.dist <- dist(DATA.NEW$Song_Dur,method="euclidean")
hfreq.dist <- dist(DATA.NEW$High_Freq,method="euclidean")
lfreq.dist <- dist(DATA.NEW$Low_Freq,method="euclidean")
bwidth.dist <- dist(DATA.NEW$Bwidth_Song,method="euclidean")
hfreqlnote.dist <- dist(DATA.NEW$HighFreq_LastNote,method="euclidean")
return(list(
numnotes = mantel.rtest(coord.dist,num.notes.dist,nrepet=1000),
songdur = mantel.rtest(coord.dist,songdur.dist,nrepet=1000),
hfreq = mantel.rtest(coord.dist,hfreq.dist,nrepet=1000),
lfreq = mantel.rtest(coord.dist,lfreq.dist,nrepet=1000),
bwidth = mantel.rtest(coord.dist,bwidth.dist,nrepet=1000),
hfreqlnote = mantel.rtest(coord.dist,hfreqlnote.dist,nrepet=1000)
))
}
require(ade4)
# simplify = FALSE keeps the result as a list of 10 lists
replicate(10, sort.song(DATA), simplify = FALSE)
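To see how the simplify argument changes the shape of what replicate returns, here is a small sketch with a hypothetical one_run function that returns a list of two made-up values in place of the mantel.rtest results.

```r
# Hypothetical single iteration returning a named list of two results.
one_run <- function() list(numnotes = runif(1), songdur = runif(1))

simplified <- replicate(3, one_run())                    # 2 x 3 matrix of mode list
as_list    <- replicate(3, one_run(), simplify = FALSE)  # list of 3 lists

dim(simplified)
as_list[[2]]$songdur
```

With the list form, as_list[[2]]$songdur reads as "iteration 2, songdur analysis", which matches the indexing described above.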