Merge multiple lists in a list with loop function - r

I have 4 datasets that contain the same variable, "siteid_public". My goal is to find how many unique "siteid_public" values there are across these four datasets. I combine them and then use length(unique()) to get the number.
I currently do this in a very clumsy way, with code like this:
site1<-dflist[[1]]%>%
select(siteid_public)
site2<-dflist[[2]]%>%
select(siteid_public)
site3<-dflist[[3]]%>%
select(siteid_public)
site4<-dflist[[4]]%>%
select(siteid_public)
site<-c(site1$siteid_public, site2$siteid_public,site3$siteid_public,site4$siteid_public)
length(unique(site))
But now I want to make this more efficient.
So, first, I use this code to create a list called "sitelist", which contains 4 elements coming from the four datasets. (dflist[[i]] in the code is where I store these 4 datasets.) After I run the code below, each element has the same single variable, siteid_public. The code is here:
sitelist<-list()
for (i in 1:4){
sitelist[[i]]<-dflist[[i]]%>%
select(siteid_public)
}
Now I want to combine all 4 elements of sitelist into one, and then use unique to see how many unique siteid_public values the combined result has. Could someone help me continue this code to achieve that? Thanks a lot!

You can use lapply to iterate on a list of frames, either on the whole list or just as easily a subset (including one or zero).
Your site1 through site4 can be created as a list with
sites <- lapply(dflist[1:4], function(z) select(z, siteid_public))
and, since unlist flattens the one-column frames into a single vector, you can count the unique values with
length(unique(unlist(sites)))
This works as well with
sites <- lapply(dflist, ...) # all of it
sites <- lapply(dflist[3], ...) # singleton, note not the `[[` index operator
indices <- ... # integer or logical of indices to include
sites <- lapply(dflist[indices], ...)
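A minimal self-contained sketch of the approach above, assuming toy data in place of the real dflist (the data frames and their values are made up for illustration). It uses base `[[` extraction rather than dplyr's select, so no packages are needed:

```r
# Toy stand-in for the real dflist (hypothetical values)
dflist <- list(
  data.frame(siteid_public = c("a", "b"), x = 1:2, stringsAsFactors = FALSE),
  data.frame(siteid_public = c("b", "c"), x = 3:4, stringsAsFactors = FALSE),
  data.frame(siteid_public = c("c", "d"), x = 5:6, stringsAsFactors = FALSE),
  data.frame(siteid_public = c("a", "e"), x = 7:8, stringsAsFactors = FALSE)
)

# Pull the column out of every frame as a plain vector
sites <- lapply(dflist, function(z) z[["siteid_public"]])

# Flatten to one vector and count the distinct ids
n_unique <- length(unique(unlist(sites)))
```

Extracting vectors instead of one-column data frames keeps the unlist step simple, and the same lapply call works unchanged if dflist grows beyond four frames.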

Related

Loop for including a list instead of an individual in ggenealogy package in R

I am trying to trace back a pedigree. I have a package that does it for specific individuals, but I need to use a list of 2000 animals. What I need is all the ancestors of each individual, 5 generations back.
Here is an example:
library(ggenealogy)
data(sbGeneal)
getAncestors("5601T", sbGeneal, 5)
I need to use a list of individuals instead of writing one by one the name of the animals.
Would it be possible?
Have you tried something like this?
library(ggenealogy)
data(sbGeneal)
lst <- sapply(sbGeneal[,1], function(x) getAncestors(x, sbGeneal, 5))
This computes all the results and stores them in a list, lst. It is just a rough idea; you may need to adjust the code.
To retrieve those values:
lst$`5601T`
lst$Adams
would be the same as
getAncestors("5601T", sbGeneal, 5)
getAncestors("Adams", sbGeneal, 5)

update a vector using assign in R

I am implementing k-means in R.
In a loop, I am initiating several vectors that will be used to store values that belong to a particular cluster, as seen here:
for(i in 1:k){
assign(paste("cluster",i,sep=""),vector())
}
I then want to add to a particular "cluster" vector, depending on the value I get for the variable getIndex. So if getIndex is equal to 2 I want to add the variable minimumDistance to the vector called cluster2. This is what I am attempting to do:
minimumDistance <- min(distanceList)
getIndex <- match(minimumDistance,distanceList)
clusterName <- paste("cluster",getIndex,sep="")
name <- c(name, minimumDistance)
But obviously the above code does not work because in order to append to a vector that I'm naming I need to use assign as I do when I instantiate the vectors. But I do not know how to use assign, when using paste, when also appending to a vector.
I cannot use the index such as vector[i] because I don't know what index of that particular vector I want to add to.
I need to use the vector <- c(vector,newItem) format but I do not know how to do this in R. Or if there is any other option I would greatly, greatly appreciate it. If I were using Python I would simply use paste and then use append but I can't do that in R. Thank you in advance for your help!
You can do something like this:
out <- list()
for (i in 1:nclust) {
# assign some data (in this case a list) to a cluster
assign(paste0("N_", i), list(...))
# here I put all the clusters data in a list
# but you could use a similar statement to do further data manipulation
# ie if you've used a common syntax (here "N_" <index>) to refer to your elements
# you can use get to retrieve them using the same syntax
out[[i]] <- get(paste0("N_", i))
}
If you want a more complete code example, emclustr::em_clust_mvn appears to tackle a similar problem.
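As a rough sketch of how the append step from the question could look, assuming made-up distances and k = 3 (both hypothetical), the clusters can be kept in one list and indexed directly. The last lines show the assign/get pairing for the case where separate cluster1, cluster2, ... variables really are required:

```r
k <- 3
# One empty numeric vector per cluster, held in a single list
clusters <- replicate(k, numeric(0), simplify = FALSE)

# Hypothetical distances from one point to each of the k centroids
distanceList <- c(4.2, 1.5, 3.3)

minimumDistance <- min(distanceList)
getIndex <- which.min(distanceList)

# Append to the chosen cluster by index; no variable names need to be built
clusters[[getIndex]] <- c(clusters[[getIndex]], minimumDistance)

# If separate variables must stay, get() reads the current value by name
# and assign() writes the extended vector back under the same name
for (i in 1:k) assign(paste0("cluster", i), numeric(0))
clusterName <- paste0("cluster", getIndex)
assign(clusterName, c(get(clusterName), minimumDistance))
```

The list version is usually easier to loop over, since clusters[[i]] replaces every paste/get/assign round trip.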

R - create new dataframes by random selection of consisting dataframes

I'm an R beginner and I have a little problem. I want to create new dataframes by randomly selecting from existing dataframes.
I have 4 categories, each divided into 10 dataframes, and I want to create 10 new dataframes, each containing 1 dataframe from each category.
For example, these are my dataframes:
Cat_1_Data_1 Cat_2_Data_1 Cat_3_Data_1 Cat_4_Data_1
Cat_1_Data_2 Cat_2_Data_2 Cat_3_Data_2 Cat_4_Data_2
Cat_1_Data_3 Cat_2_Data_3 Cat_3_Data_3 Cat_4_Data_3
Cat_1_Data_4 Cat_2_Data_4 Cat_3_Data_4 Cat_4_Data_4
Cat_1_Data_5 Cat_2_Data_5 Cat_3_Data_5 Cat_4_Data_5
Cat_1_Data_6 Cat_2_Data_6 Cat_3_Data_6 Cat_4_Data_6
Cat_1_Data_7 Cat_2_Data_7 Cat_3_Data_7 Cat_4_Data_7
Cat_1_Data_8 Cat_2_Data_8 Cat_3_Data_8 Cat_4_Data_8
Cat_1_Data_9 Cat_2_Data_9 Cat_3_Data_9 Cat_4_Data_9
Cat_1_Data_10 Cat_2_Data_10 Cat_3_Data_10 Cat_4_Data_10
Creating new dataframes (this is how I do it now):
new_data_1 <- rbind(Cat_1_Data_1,Cat_2_Data_1,Cat_3_Data_1,Cat_4_Data_1)
...
new_data_10 <- rbind(Cat_1_Data_10,Cat_2_Data_10,Cat_3_Data_10,Cat_4_Data_10)
But I want a random pick of the datasets, like:
new_data_1 <- rbind(Cat_1_Data_[Random 1-10],Cat_2_Data_[Random 1-10]... and so on)
...
new_data_10 <- rbind(Cat_1_Data_[Random 1-10],Cat_2_Data_[Random 1-10]...and so on)
Is there any way to solve this problem? I don't really know how to approach it :(
Here is one sampling strategy that would work.
Create lists of your data.frames, one per category, shuffling them as you go:
dflist.cat1 <- sample(list(Cat_1_Data_1, Cat_1_Data_2, ...))
dflist.cat2 <- sample(list(Cat_2_Data_1, Cat_2_Data_2, ...))
...
Run lapply to rbind the corresponding element of each list. This will result in a list of length 10:
dflist.new <- lapply(1:10, function(i){
rbind(dflist.cat1[[i]],
dflist.cat2[[i]],
dflist.cat3[[i]],
dflist.cat4[[i]])
})
You can access your data.frames using dflist.new[[1]] for the first one, and so on.
I am sure there is a more elegant way to do this with 2-dimensional list indices, but this works well for a small number of categories.
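One possible way to generalize this with a nested list is sketched below. The stand-in data frames (and their category and dataset columns) are hypothetical, built only so the example runs on its own:

```r
set.seed(1)  # only so the sketch is reproducible

# Hypothetical stand-in data: 4 categories x 10 small data frames each
cats <- lapply(1:4, function(ct) {
  lapply(1:10, function(d) data.frame(category = ct, dataset = d))
})

# Shuffle each category independently
shuffled <- lapply(cats, sample)

# New frame i takes the i-th shuffled frame of every category
new_data <- lapply(1:10, function(i) {
  do.call(rbind, lapply(shuffled, `[[`, i))
})
```

As in the answer above, this samples without replacement, so every original data frame is used exactly once. If each new frame should instead draw independently (allowing repeats), picking an index with sample(10, 1) per category would be the variation to try.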

Creating a new nested list element that is a combination of two existing nested list elements (in R)

I am looking for a hint about how to create a new nested list element from two existing nested list elements. In the current form of the script I am working on, I create a list called tardis that is n elements long, based on the number of elements in an input list. In the example below, that input list, dataLayers, is 2 elements long.
After creating tardis, the script populates it by reading in data from 1200 netCDF files. Each of the 12 elements in 'mean' and 'sd' in tardis are matrices of geographic data, tardis[['data']][[decade]][['mean']][[month]], for example, for the 12 calendar months. When the list is fully populated I would like to create some derived variables. For example, in the snippet below, I would like to create a variable TOTALPRECIP by adding SNOW and RAIN. In doing this, I would like to create TOTALPRECIP from SNOW + RAIN as a third list element in tardis with the exact nested structure as the other two elements (adding them together and preserving the structure).
Is this possible with apply or its related functions?
begin <- 1901
end <- 1991
dataLayers <- c("SNOW","RAIN")
tardis<-list()
for (i in 1:length(dataLayers)){
tardis[[dataLayers[i]]]<-list('longName'='timeLord','units'='theDr','data'=list())
for (j in seq(begin,end,10)){
tardis[[dataLayers[[i]]]][['data']][[as.character(j)]]<-list('mean'=vector(mode='list',length=12),'sd'=vector(mode='list',length=12))
}
}
#add SNOW AND RAIN
print(names(tardis))
>[1] "SNOW" "RAIN" "TOTALPRECIP"
Here are your for loops using replicate (Note that the expression value for each replicate is the same expression you have in the assignment portion of your for loop)
## This is your inner for-loop, using replicate
inds <- seq(begin, end, 10)
datas <- replicate(length(inds), list('mean'=vector(mode='list',length=12),'sd'=vector(mode='list',length=12))
, simplify=FALSE)
names(datas) <- inds
# This is your outer loop
tardis2 <- replicate(length(dataLayers), list('longName'='timeLord','units'='theDr','data'=datas)
, simplify=FALSE)
names(tardis2) <- dataLayers
# Compare Results
identical(tardis2, tardis)
# [1] TRUE
However, I'm not sure if lists are really the best structure for this. Have you considered data.frames?
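For the derived TOTALPRECIP element itself, one option is Map, which walks two lists in parallel. The sketch below assumes a small populated stand-in for tardis (two decades, 2x2 matrices, made-up values; make_layer is a hypothetical helper) and only sums the monthly means. The sd matrices are left empty because the standard deviation of a sum is not the sum of the standard deviations:

```r
# Small stand-in for the populated tardis: 2 decades, 12 monthly 2x2 matrices
# (hypothetical values; the real matrices come from the netCDF files)
decades <- c("1901", "1911")
make_layer <- function(value) {
  data <- lapply(decades, function(d) {
    list(mean = replicate(12, matrix(value, 2, 2), simplify = FALSE),
         sd   = replicate(12, matrix(0, 2, 2), simplify = FALSE))
  })
  names(data) <- decades
  list(longName = "timeLord", units = "theDr", data = data)
}
tardis <- list(SNOW = make_layer(1), RAIN = make_layer(2))

# Copy one layer to keep the nesting, then overwrite the data element
tardis$TOTALPRECIP <- tardis$SNOW
tardis$TOTALPRECIP$data <- Map(function(snow, rain) {
  list(mean = Map(`+`, snow$mean, rain$mean),    # month-by-month matrix sum
       sd   = vector(mode = "list", length = 12)) # intentionally left empty
}, tardis$SNOW$data, tardis$RAIN$data)
```

The outer Map pairs up decades and the inner Map pairs up months, so the result keeps the data / decade / mean / month nesting of the two inputs.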

Loop within a function and automatically create objects in R

I am trying to calculate the column means for different groups in R. There are several methods for assigning groups, so two columns were created that contain different groupings.
# create a test df
df.abcd.2<-data.frame(Grouping1=c("a","f","a","d","d","f","a"),Grouping2=c("y","y","z","z","x","x","q"),Var1=sample(1:7),Var2=sample(1:7),Var3=rnorm(1:7))
df.abcd.2
Now I created a loop with assign, lapply, split and colMeans to get my results and store them in different dfs. The loop works fine.
#Loop to create the colmeans and store them in dataframes
for (i in 1:2){
nam <- paste("RRRRRR",deparse(i), sep=".")
assign(nam, as.data.frame(
lapply(
split(df.abcd.2[,3:5], df.abcd.2[,i]), colMeans)
)
)
}
So now I would like to create a function to apply this method to different dataframes. My attempt looked like this:
# 1. function to calculate colMeans for different groups
# df = desired dataframe
# a = starting column: beginning of the columns that contain the groups, b = end of the columns that contain the groups
# c = starting column: beginning of the columns to be analyzed, d = end of the columns to be analyzed
function.split.colMeans<-function(df,a,b,c,d)
{for (i in a:b){
nam <- paste("OOOOO",deparse(i), sep=".")
assign(nam, as.data.frame(
lapply(
split(df[,c:d], df[,i]), colMeans)
)
)
}
}
#test the function
function.split.colMeans(df.abcd.2,1,2,3,5)
So when I test this function I get neither an error message nor results... Can anyone help me out, please?
It's working perfectly. Read the help for assign. Learn about frames and environments.
In other words, it's creating the variables inside your function, but they don't leak out into the environment you see when you do ls() at the command line. If you put print(ls()) inside your function's loop you'll see them, but when the function ends, they disappear.
Normally, the only way functions interact with their calling environment is through their return value. Any other method is entering a whole world of pain.
DON'T use assign to create things with sequential or informative names. Ever. Unless you know what you are doing, which you don't... Stick them in lists, then you can index the parts for looping and so on.
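A minimal sketch of the list-returning version, assuming a hypothetical function name (split_col_means) and fixed example values in place of the random ones from the question:

```r
# Same grouping logic as the question, but the results are returned in a
# named list instead of being created with assign()
split_col_means <- function(df, group_cols, value_cols) {
  out <- lapply(group_cols, function(i) {
    as.data.frame(lapply(split(df[, value_cols], df[, i]), colMeans))
  })
  names(out) <- names(df)[group_cols]
  out
}

# Fixed example values (hypothetical) so the result is reproducible
df <- data.frame(
  Grouping1 = c("a", "f", "a", "d", "d", "f", "a"),
  Grouping2 = c("y", "y", "z", "z", "x", "x", "q"),
  Var1 = 1:7, Var2 = 7:1, Var3 = c(0.5, 1, 1.5, 2, 2.5, 3, 3.5)
)
res <- split_col_means(df, 1:2, 3:5)
```

The caller then picks out a result with res$Grouping1 or res[[2]], and nothing has to be created in the calling environment.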
