access indexed data in a loop using R

access indexed data in a loop using R - r

Lets say I have 5 databases, named data1-data5. I basically want to create a loop that prints the first 10 rows of the data. In my naïve mind, the code should look something like this:
for (i in 1:5){
print(head(data[i]))
}
That does not work. What's the proper way to do this? How do I define [i] as the "indexing" variable for the different databases?

Another way would be to use get function:
for (i in 1:5){
tmp <- get(paste0("data", i))
## Assigns the data to the variable tmp - just like tmp <- data1/data2/data3 etc
print(head(tmp))
}

It would be better to put these objects in a list and use [[ to reference them. But if you must use separate names for the objects, then you need to parse them and evaluate the resulting expressions.
Here's an example you can emulate. For brevity, it prints the values of numerical objects rather than the heads of "databases."
data1 <- 1; data2 <- 2; data3 <- 3
for (i in 1:3) {
print(eval(parse(text=paste0("data", i))))
}

Related

how to create a routine in R for 4 tables with same name changing just the last character

The idea is to do something like this:
for(i in 1:4){
paste0("tabDummy",i) <- data.frame(data[,c(1,i+1)],colnames(data)[i+1])
}
But I know that paste0 would return a character of the way:
"tabDummy1" or "tabDummy2" ...
instead of just:
tabDummy1 or tabDummy2 ...
which are in fact the names of the tables I would like to work with in my rountine.
Is there a function would do what I am thinking?

Maybe something like this....
for(i in 1:4){
assign(paste0("tabDummy",i), data.frame(data[,c(1,i+1)],colnames(data)[i+1])
}

The R way would be to make a list with four elements that could then be manipulated however you'd like, instead of four distinctly named elements.
tabDummy <- list()
for(i in 1:4){
tabDummy[[i]] <- data.frame(data[, c(1, i+1)], colnames(data)[i+1])
}
Or if you wanted to use the lapply function, which is built for working with lists...
tabDummy <- lapply(1:4, function(i) {
data.frame(data[, c(1, i+1)], colnames(data)[i+1])
})
Or, it seems you're actually converting the data set from wide to long format; there are a multitude of questions here about that, one method would be as follows; this first creates a single long data frame, and then splits it.
data_long <- reshape2::melt(id.vars=1, measure.vars=2:5)
tabDummy <- split(data_long, data_long$variable)

update a vector using assign in R

I am implementing k-means in R.
In a loop, I am initiating several vectors that will be used to store values that belong to a particular cluster, as seen here:
for(i in 1:k){
assign(paste("cluster",i,sep=""),vector())
}
I then want to add to a particular "cluster" vector, depending on the value I get for the variable getIndex. So if getIndex is equal to 2 I want to add the variable minimumDistance to the vector called cluster2. This is what I am attempting to do:
minimumDistance <- min(distanceList)
getIndex <- match(minimumDistance,distanceList)
clusterName <- paste("cluster",getIndex,sep="")
name <- c(name, minimumDistance)
But obviously the above code does not work because in order to append to a vector that I'm naming I need to use assign as I do when I instantiate the vectors. But I do not know how to use assign, when using paste, when also appending to a vector.
I cannot use the index such as vector[i] because I don't know what index of that particular vector I want to add to.
I need to use the vector <- c(vector,newItem) format but I do not know how to do this in R. Or if there is any other option I would greatly, greatly appreciate it. If I were using Python I would simply use paste and then use append but I can't do that in R. Thank you in advance for your help!

You can do something like this:
out <- list()
for (i in 1:nclust) {
# assign some data (in this case a list) to a cluster
assign(paste0("N_", i), list(...))
# here I put all the clusters data in a list
# but you could use a similar statement to do further data manipulation
# ie if you've used a common syntax (here "N_" <index>) to refer to your elements
# you can use get to retrieve them using the same syntax
out[[i]] <- get(paste0("N_", i))
}
If you want a more complete code example, this link sounds like a similar problem emclustr::em_clust_mvn

How to use extract function in a for loop?

I am using the extract function in a loop. See below.
for (i in 1:length(list_shp_Tanzania)){
LU_Mod2000<- extract(x=rc_Mod2000_LC, y=list_shp_Tanzania[[i]], fun=maj)
}
Where maj function is:
maj <- function(x){
y <- as.numeric(names(which.max(table(x))))
return(y)
}
I was expecting to get i outputs, but I get only one output once the loop is done. Somebody knows what I am doing wrong. Thanks.

One solution in this kind of situation is to create a list and then assign the result of each iteration to the corresponding element of the list:
LU_Mod2000 <- vector("list", length(list_shp_Tanzania))
for (i in 1:length(list_shp_Tanzania)){
LU_Mod2000[[i]] <- extract(x=rc_Mod2000_LC, y=list_shp_Tanzania[[i]], fun=maj)
}
Do not do
LU_Mod2000 <- c(LU_Mod2000, extract(x=rc_Mod2000_LC, y=list_shp_Tanzania[[i]], fun=maj))
inside the loop. This will create unnecessary copies and will take long to run. Use the list method, and after the loop, convert the list of results to the desired format (usually using do.call(LU_Mod2000, <some function>))
Alternatively, you could substitute the for loop with lapply, which is what many people seem to prefer
LU_Mod2000 <- lapply(list_shp_Tanzania, function(z) extract(x=rc_Mod2000_LC, y=z, fun=maj))

Loop over a string variable in R

I would like to loop over a string variable. For example:
clist <- c("BMI", "trig", "hdl")
for (i in clist) {
data_FK_i<-subset(data_FK, subset= !is.na(FK) & (!is.na(i)))
}
The "i" should receive a different name from the list.
What am I doing wrong? It's not working? Adding "" doesn't seem to help.
Thank,
Einat
Thanks, the "assign" answer did the work!!!!!!!!!!

I agree with #Thomas. You should use a list. However, let me demonstrate how to modify your code to create multiple objects. You can use the function assign to create objects based on strings.
clist <- c("BMI", "trig", "hdl")
for (i in clist) {
assign(paste0("data_FK_", i), complete.cases(data[c("FK", i)]))
}

Try something like this instead, which will give you a list containing the three subsetted dataframes:
lapply(clist, function(x) data_FK[ !is.na(data_FK$FK) & !is.na(data_FK[,x]) ,])
The problem in your code is that i is a character string, specifically one of the values from clist in each iteration of the for-loop. So, when R reads !is.na(i) you're saying !is.na("BMI"), etc.
Various places on Stack Overflow advise against using subset at all in favor of extraction indices (i.e., [) like in the example code above because subset relies on non-standard evaluation that is confusing and sometimes leads you down bad rabbit holes.

Is this what you want?
You need to give the loop something to store the data into.
Also you need to tell the loop how long you want it to run.
clist <- c("BMI", "trig", "hdl")
#empty vector
data_FK<-c()
#I want a loop and it will 'loop' 3 times (1 to 3), which is the length of my list
for (i in 1:length(clist)) {
#each loop stores the corresponding item from the list into the vector
data_FK<-c(data_FK,clist[i])
}
## or if you want to store the values in a data frame
## there are other ways to create this, but here is a simple solution
data_FK<-data.frame(placer=1:length(clist))
for(i in 1:length(clist)){
data_FK$items[i]<-clist[i]
}
## or maybe you just want to print the names
for (i in 1:length(clist)){
print(clist[i])
}

Running the same function multiple times and saving results with different names in workspace

So, I built a function called sort.song.
My goal with this function is to randomly sample the rows of a data.frame (DATA) and then filter it out (DATA.NEW) to analyse it. I want to do it multiple times (let's say 10 times). By the end, I want that each object (mantel.something) resulted from this function to be saved in my workspace with a name that I can relate to each cycle (mantel.something1, mantel.somenthing2...mantel.something10).
I have the following code, so far:
sort.song<-function(DATA){
require(ade4)
for(i in 1:10){ # Am I using for correctly here?
DATA.NEW <- DATA[sample(1:nrow(DATA),replace=FALSE),]
DATA.NEW <- DATA.NEW[!duplicated(DATA.NEW$Point),]
coord.dist<-dist(DATA.NEW[,4:5],method="euclidean")
num.notes.dist<-dist(DATA.NEW$Num_Notes,method="euclidean")
songdur.dist<-dist(DATA.NEW$Song_Dur,method="euclidean")
hfreq.dist<-dist(DATA.NEW$High_Freq,method="euclidean")
lfreq.dist<-dist(DATA.NEW$Low_Freq,method="euclidean")
bwidth.dist<-dist(DATA.NEW$Bwidth_Song,method="euclidean")
hfreqlnote.dist<-dist(DATA.NEW$HighFreq_LastNote,method="euclidean")
mantel.numnotes[i]<<-mantel.rtest(coord.dist,num.notes.dist,nrepet=1000)
mantel.songdur[i]<<-mantel.rtest(coord.dist,songdur.dist,nrepet=1000)
mantel.hfreq[i]<<-mantel.rtest(coord.dist,hfreq.dist,nrepet=1000)
mantel.lfreq[i]<<-mantel.rtest(coord.dist,lfreq.dist,nrepet=1000)
mantel.bwidth[i]<<-mantel.rtest(coord.dist,bwidth.dist,nrepet=1000)
mantel.hfreqlnote[i]<<-mantel.rtest(coord.dist,hfreqlnote.dist,nrepet=1000)
}
}
Could someone please help me to do it the right way?
I think I'm not assigning the cycles correctly for each mantel.somenthing object.
Many thanks in advance!

The best way to implement what you are trying to do is through a list. You can even make it take two indices, the first for the iterations, the second for the type of analysis.
mantellist <- as.list(1:10) ## initiate list with some values
for (i in 1:10){
...
mantellist[[i]] <- list(numnotes=mantel.rtest(coord.dist,num.notes.dist,nrepet=1000),
songdur=mantel.rtest(coord.dist,songdur.dist,nrepet=1000),
hfreq=mantel.rtest(coord.dist,hfreq.dist,nrepet=1000),
...)
}
return(mantellist)
In this way you can index your specific analysis for each iteration in an intuitive way:
mantellist[[2]][['hfreq']]
mantellist[[2]]$hfreq ## alternative
EDIT by Mohr:
Just for clarification...
So, according to your suggestion the code should be something like this:
sort.song<-function(DATA){
require(ade4)
mantellist <- as.list(1:10)
for(i in 1:10){
DATA.NEW <- DATA[sample(1:nrow(DATA),replace=FALSE),]
DATA.NEW <- DATA.NEW[!duplicated(DATA.NEW$Point),]
coord.dist<-dist(DATA.NEW[,4:5],method="euclidean")
num.notes.dist<-dist(DATA.NEW$Num_Notes,method="euclidean")
songdur.dist<-dist(DATA.NEW$Song_Dur,method="euclidean")
hfreq.dist<-dist(DATA.NEW$High_Freq,method="euclidean")
lfreq.dist<-dist(DATA.NEW$Low_Freq,method="euclidean")
bwidth.dist<-dist(DATA.NEW$Bwidth_Song,method="euclidean")
hfreqlnote.dist<-dist(DATA.NEW$HighFreq_LastNote,method="euclidean")
mantellist[[i]] <- list(numnotes=mantel.rtest(coord.dist,num.notes.dist,nrepet=1000),
songdur=mantel.rtest(coord.dist,songdur.dist,nrepet=1000),
hfreq=mantel.rtest(coord.dist,hfreq.dist,nrepet=1000),
lfreq=mantel.rtest(coord.dist,lfreq.dist,nrepet=1000),
bwidth=mantel.rtest(coord.dist,bwidth.dist,nrepet=1000),
hfreqlnote=mantel.rtest(coord.dist,hfreqlnote.dist,nrepet=1000)
)
}
return(mantellist)
}

You can achieve your objective of repeating this exercise 10 (or more times) without using an explicit for-loop. Rather than have the function run the loop, write the sort.song function to run one iteration of the process, then you can use replicate to repeat that process however many times you desire.
It is generally good practice not to create a bunch of named objects in your global environment. Instead, you can hold of the results of each iteration of this process in a single object. replicate will return an array (if possible) otherwise a list (in the example below, a list of lists). So, the list will have 10 elements (one for each iteration) and each element will itself be a list containing named elements corresponding to each result of mantel.rtest.
sort.song<-function(DATA){
DATA.NEW <- DATA[sample(1:nrow(DATA),replace=FALSE),]
DATA.NEW <- DATA.NEW[!duplicated(DATA.NEW$Point),]
coord.dist <- dist(DATA.NEW[,4:5],method="euclidean")
num.notes.dist <- dist(DATA.NEW$Num_Notes,method="euclidean")
songdur.dist <- dist(DATA.NEW$Song_Dur,method="euclidean")
hfreq.dist <- dist(DATA.NEW$High_Freq,method="euclidean")
lfreq.dist <- dist(DATA.NEW$Low_Freq,method="euclidean")
bwidth.dist <- dist(DATA.NEW$Bwidth_Song,method="euclidean")
hfreqlnote.dist <- dist(DATA.NEW$HighFreq_LastNote,method="euclidean")
return(list(
numnotes = mantel.rtest(coord.dist,num.notes.dist,nrepet=1000),
songdur = mantel.rtest(coord.dist,songdur.dist,nrepet=1000),
hfreq = mantel.rtest(coord.dist,hfreq.dist,nrepet=1000),
lfreq = mantel.rtest(coord.dist,lfreq.dist,nrepet=1000),
bwidth = mantel.rtest(coord.dist,bwidth.dist,nrepet=1000),
hfreqlnote = mantel.rtest(coord.dist,hfreqlnote.dist,nrepet=1000)
))
}
require(ade4)
replicate(10, sort.song(DATA))

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

access indexed data in a loop using R - r

Another way would be to use get function: for (i in 1:5){ tmp <- get(paste0("data", i)) ## Assigns the data to the variable tmp - just like tmp <- data1/data2/data3 etc print(head(tmp)) }

Related

how to create a routine in R for 4 tables with same name changing just the last character

update a vector using assign in R

How to use extract function in a for loop?

Loop over a string variable in R

Running the same function multiple times and saving results with different names in workspace

Categories

Resources