refer to a data frame that was dynamically created - r

I need to create data frames dynamically and refer to them. So far I can create data frames dynamically like this:
master <- c("bob", "ed", "frank")
d <- seq(1:10)
for (i in 1:length(master)) {
  assign(master[i], d)
}
ed[6]
Now if I do
ls()
I can see there is an "ed" object. I want to refer to and manipulate the data in it WITHOUT referring to the name,
i.e. instead of doing "ed[6]"
I want to have "ed" in a variable like:
master[2][6] # BUT THIS DOES NOT WORK
or
df<-"ed" #this does not work either
df[6]
The point of me naming the data frames dynamically was so I can refer to them dynamically. How can I do this?
Thank you!

You can use get as Atilla suggests, but for a case like this you may be better off creating a list and then referring to objects by list index instead. It's tidier to create one object than a whole bunch, and referencing the contents is simple.
# create empty list
my_list <- list()
# put stuff in the list
for (i in 1:length(master)) {
  my_list[[i]] <- d
}
# get the 6th element from the 2nd object in my_list
my_list[[2]][6]
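A variation on the list approach, as a minimal sketch: if the list elements are named after the entries of master, an element can be looked up through a name held in a variable, which is what the question asks for. The use of setNames and lapply is my own choice here, not something taken from the question.

```r
master <- c("bob", "ed", "frank")
d <- 1:10

# one list element per name in master, each holding a copy of d
my_list <- setNames(lapply(master, function(nm) d), master)

# read element 6 of the object whose name is stored in master[2] ("ed")
my_list[[master[2]]][6]

# modify a single element in place; no get()/assign() round trip is needed
my_list[[master[2]]][6] <- 20L
my_list$ed[6]
```

Only the "ed" element changes; my_list$bob and my_list$frank keep their original values.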

Use get.
get(master[2])[5]
get(master[2])[2]
If you want to set values, you need to use assign. But be careful: it assigns values as a whole. That is, you cannot set a single element of a vector this way; you need to assign the whole vector.
master <- c("bob", "ed", "frank")
d <- seq(1:10)
for (i in 1:length(master)) {
  assign(master[i], d)
}
ed[6]
get(master[2])[5]
get(master[2])[2]
temp <- get(master[2])   # keep a copy of the current value of "ed"
assign("ed", 20)         # replaces the whole vector with the single value 20
assign("ed[6]", 20)      # creates a variable literally named "ed[6]", not what you want
ls(pattern = "^ed.*$")   # shows both "ed" and "ed[6]"
# to set element 6 to 20, modify the copy and assign it back as a whole
temp[6] <- 20
assign("ed", temp)
rm(temp)                 # remove temp if you do not need it
get("ed")[6]
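The get, modify, assign round trip above can be wrapped in a small helper. This is only a sketch: set_element is a hypothetical name of my own, not a base R function, and it assumes the target object lives in the environment the helper is called from.

```r
# hypothetical helper: set one element of an object that is referred to by name
set_element <- function(name, index, value, envir = parent.frame()) {
  tmp <- get(name, envir = envir)    # fetch the whole object
  tmp[index] <- value                # change a single element of the copy
  assign(name, tmp, envir = envir)   # write the whole object back
  invisible(tmp)
}

master <- c("bob", "ed", "frank")
ed <- 1:10

set_element(master[2], 6, 20)
ed[6]
```

The other elements of ed are left untouched, since the whole modified copy is assigned back.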

Related

Dynamically change part of variable name in R

I am trying to automatise some post-hoc analysis, but I will try to explain myself with a metaphor that I believe will illustrate what I am trying to do.
Suppose I have two lists of strings: the first holds names and the second holds adjectives:
list1 <- c("apt", "farm", "basement", "lodge")
list2 <- c("tiny", "noisy")
Let's also suppose I have a data frame with a bunch of data, whose columns are named like this because they are the results of a previous linear analysis.
> head(df)
  qt[apt_tiny,Intercept] qt[apt_noisy,Intercept] qt[farm_tiny,Intercept]
1               4.196321              -0.4477012              -1.0822793
2               3.231220              -0.4237787              -1.1433449
3               2.304687              -0.3149331              -0.9245896
4               2.768691              -0.1537728              -0.9925387
5               3.771648              -0.1109647              -0.9298861
6               3.370368              -0.2579591              -1.0849262
and so on...
Now, what I am trying to do is make some automatic operations where the strings in the previous lists dynamically change as they go in a for loop. I have made a list with all the distinct combinations and called it distinct. Now I am trying to do something like this:
for (i in 1:nrow(distinct)){
var1[[i]] <- list1[[i]]
var2[[i]] <- list2[[i]]
#this being the insertable name part for the rest of the variables and parts of variable,
#i'll put it inside %var[[i]]% for the sake of the explanation.
%var1[[i]]%_%var2[[i]]%_INT <- df$`qt[%var1[[i]]%_%var2[[i]]%,Intercept]`+ df$`qt[%var1[[i]]%,Intercept]`
}
The difficult thing for me here is that %var1[[i]]% appears both as part of a variable name and as part of the name of a column inside a data frame.
Any help would be much appreciated.
You cannot use $ to extract column values with a character variable, so df$`qt[%var1[[i]]%_%var2[[i]]%,Intercept]` will not work.
Create the name of the column using sprintf and use [[ to extract it. For example, to construct "qt[apt_tiny,Intercept]" as the column name you can do:
i <- 1
sprintf('qt[%s_%s,Intercept]', list1[i], list2[i])
#[1] "qt[apt_tiny,Intercept]"
Now use [[ to subset that column from df
df[[sprintf('qt[%s_%s,Intercept]', list1[i], list2[i])]]
You can do the same for other columns.
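Putting the sprintf and [[ steps into the loop from the question might look like the sketch below. The data frame is a toy stand-in with made-up values, the existence of columns such as qt[apt,Intercept] is assumed from the question's pseudocode, and combos and result are names I introduced for illustration. The results go into a named list instead of dynamically named variables.

```r
list1 <- c("apt", "farm")
list2 <- c("tiny", "noisy")

# toy data frame; the column names mimic the question, the values are made up
df <- data.frame(
  `qt[apt,Intercept]`        = c(1, 2),
  `qt[farm,Intercept]`       = c(3, 4),
  `qt[apt_tiny,Intercept]`   = c(10, 20),
  `qt[apt_noisy,Intercept]`  = c(30, 40),
  `qt[farm_tiny,Intercept]`  = c(50, 60),
  `qt[farm_noisy,Intercept]` = c(70, 80),
  check.names = FALSE
)

# every combination of a name and an adjective
combos <- expand.grid(var1 = list1, var2 = list2, stringsAsFactors = FALSE)

result <- list()
for (i in seq_len(nrow(combos))) {
  main_col  <- sprintf("qt[%s,Intercept]", combos$var1[i])
  inter_col <- sprintf("qt[%s_%s,Intercept]", combos$var1[i], combos$var2[i])
  out_name  <- sprintf("%s_%s_INT", combos$var1[i], combos$var2[i])
  # [[ accepts the column name as a character string
  result[[out_name]] <- df[[inter_col]] + df[[main_col]]
}

result[["apt_tiny_INT"]]
```

Each entry of result can then be retrieved with the same sprintf pattern that created its name.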

How can I split any table/df dynamically depending on its number of elements and store each element as a list?

I am trying to split a table based on the value of one column, as follows.
Creating a new table that contains information organised by chromosome number as lists:
t2_s=split(tbl2, tbl2$chr)
This creates a list of lists. Each list contains up to 10,000 rows. I now want to extract each list and assign it a name dynamically, so that my program can handle a table of any size and any number of lists after splitting on a column.
I tried the following, but I think I am trying to apply Java logic to R:
counter = 1
for (j in t2_s){
paste(c("chrList", counter), collapse = " ") <- (t2_s[[counter]])
counter = counter + 1
}
I also need something that would not choke R performance-wise, as each generated list will be huge.
I am an amateur coder so any help would be much appreciated.
You can use the function assign() for this task.
for (j in 1:length(t2_s)) {
  tmp <- paste("chrList", j, sep = "_")
  assign(tmp, t2_s[[j]])
}
This will create one object for each element of your list. If you then want to collect all the objects you just created into a list, you could do so with this.
# this will get a character vector of all objects in your global environment
all_env_objects <- ls()
# this will extract the names of the newly created objects from the above code
all_new_objects <- all_env_objects[grep("^chrList_", all_env_objects)]
# this will create a named list containing all the objects you created (mget already returns a list)
your_list <- mget(all_new_objects)
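As an alternative sketch, the list returned by split() is already named after the values of the splitting column, so it can be used directly without creating separate objects. The tbl2 below is a small made-up stand-in for the real table, and rows_per_chr is a name introduced for illustration.

```r
# made-up stand-in for the real table
tbl2 <- data.frame(
  chr = c("chr1", "chr1", "chr2", "chr3"),
  pos = c(100, 200, 150, 300)
)

t2_s <- split(tbl2, tbl2$chr)

# the pieces are already named after the chromosome values
names(t2_s)

# access one piece by a name held in a variable
wanted <- "chr2"
t2_s[[wanted]]

# apply the same operation to every piece, whatever their number
rows_per_chr <- vapply(t2_s, nrow, integer(1))
rows_per_chr
```

This handles any number of groups without a counter.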

Using Loop variable to access and write specific data.frames

I wrote a script that reads CSV data with the help of user input. For example, when the user enters "20 40 160", the CSV files 1, 2 and 3 are read and saved as the data.frames d20, d40 and d160 in my global environment/workspace. The variable vel holds the values from the user input.
Now for the actual question:
I'm trying to manipulate the data I read in a loop with the vel variable. For example:
for (i in vel)
{
newVariable"i" <- d"i"[6]
}
I know that's not the correct syntax, but what I'm trying to do is to write a newVariable with a specific row from a specific data frame d.
The result should be:
newVariable20 = d20[20]
newVariable40 = d40[20]
newVariable160 = d160[20]
So I think the actual question is, how do I use the Loop Variable for calling out the names of the created data frames and for writing new variables.
There are a couple of ways to do this. One is to store all of your data frames in a list from the start: begin with an empty list and then put each data frame into the next position. Note that you have to wrap each one in list(), as in list(df1), because a data frame is itself a list, and without the wrapper the assignment would treat it as a list of columns rather than as a single element.
list_of_df <- list()
list_of_df[1] <- list(df1)
list_of_df["df20"] <- list(df2)
This makes it easy to loop through the data frames. If you want column 4 of data frame 2, you just write
list_of_df[[2]][,4]
# Same thing different code
list_of_df[["df20"]][,4]
The double brackets [[2]] give you the element stored at position 2 of the list (whereas [2] gives you a list of length one containing that element). The next [,4] says that from the data frame we just retrieved, we now want every row of the 4th column. Note that this will output a vector and not a data frame.
Or in a loop:
for (df in list_of_df) {
  print(df)
}
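Applied to the question's setup, a list keyed by the values in vel might look like the sketch below. The data frames are made-up stand-ins for the ones read from CSV, data_by_vel and new_variables are names I introduced, and taking column 2 is an arbitrary choice because the question does not pin down which row or column is wanted.

```r
vel <- c(20, 40, 160)

# stand-in data frames; in the question these come from the CSV files
data_by_vel <- lapply(vel, function(v) data.frame(id = 1:3, speed = v * (1:3)))
names(data_by_vel) <- paste0("d", vel)

# build one result per data frame, named after the same velocity
new_variables <- lapply(data_by_vel, function(d) d[[2]])
names(new_variables) <- paste0("newVariable", vel)

# look up a result through a name built from the loop value
new_variables[[paste0("newVariable", 40)]]
```

The names play the role that d20, d40 and d160 played as separate objects.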

Convert A List Object into a Useable Matrix Name (R)

I want to be able to use a loop to perform the same function on a group of data sets without having to refer to each data set by name individually. For example, say I have the following matrices:
a<-matrix(1:5,nrow=5,ncol=2)
b<-matrix(6:10,nrow=5,ncol=2)
c<-matrix(11:15,nrow=5,ncol=2)
I define a vector of set names:
SetNames<- c("a","b","c")
Then I want to sum the second column of all of the matrices without having to call each matrix name. Basically, I would like to be able to call SetNames[1], have the program return 'a' as USEABLE text which can be used to call apply(a[2],2,sum).
If apply(SetNames[1][2],2,sum) worked, that would be the basic syntax I was looking for, however I would replace the 1 with a variable I can increase in a loop.
sapply can do that.
sapply(SetNames, function(z) {
  dfz <- get(z)
  sum(dfz[, 2])
})
#  a  b  c
# 15 40 65
Notice that get() is used here to dynamically access a variable by name.
A less compact way of writing this would be
sumRowTwo <- function(z) {
  dfz <- get(z)
  sum(dfz[, 2])  # despite the function's name, this sums column 2
}
sapply(SetNames, sumRowTwo)
and now you can play around with sumRowTwo and see what, for example,
sumRowTwo("a")
returns.
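A related sketch, in case it helps: mget() fetches several objects by name at once and returns them as a named list, after which ordinary list tools apply. The names mats and col2_sums are mine, introduced only for this illustration.

```r
a <- matrix(1:5, nrow = 5, ncol = 2)
b <- matrix(6:10, nrow = 5, ncol = 2)
c <- matrix(11:15, nrow = 5, ncol = 2)
SetNames <- c("a", "b", "c")

# fetch all three matrices by name into one named list
mats <- mget(SetNames)

# sum the second column of each matrix
col2_sums <- sapply(mats, function(m) sum(m[, 2]))
col2_sums
```

The list mats can also be indexed with a loop counter, for example mats[[SetNames[1]]][, 2].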

update a vector using assign in R

I am implementing k-means in R.
In a loop, I am initiating several vectors that will be used to store values that belong to a particular cluster, as seen here:
for(i in 1:k){
assign(paste("cluster",i,sep=""),vector())
}
I then want to add to a particular "cluster" vector, depending on the value I get for the variable getIndex. So if getIndex is equal to 2 I want to add the variable minimumDistance to the vector called cluster2. This is what I am attempting to do:
minimumDistance <- min(distanceList)
getIndex <- match(minimumDistance,distanceList)
clusterName <- paste("cluster",getIndex,sep="")
name <- c(name, minimumDistance)
But obviously the above code does not work because in order to append to a vector that I'm naming I need to use assign as I do when I instantiate the vectors. But I do not know how to use assign, when using paste, when also appending to a vector.
I cannot use the index such as vector[i] because I don't know what index of that particular vector I want to add to.
I need to use the vector <- c(vector,newItem) format but I do not know how to do this in R. Or if there is any other option I would greatly, greatly appreciate it. If I were using Python I would simply use paste and then use append but I can't do that in R. Thank you in advance for your help!
You can do something like this:
out <- list()
for (i in 1:nclust) {
  # assign some data (in this case a list) to a cluster
  assign(paste0("N_", i), list(...))
  # here I put all the clusters data in a list
  # but you could use a similar statement to do further data manipulation
  # ie if you've used a common syntax (here "N_" <index>) to refer to your elements
  # you can use get to retrieve them using the same syntax
  out[[i]] <- get(paste0("N_", i))
}
If you want a more complete code example, emclustr::em_clust_mvn sounds like it addresses a similar problem.
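For the specific "append to the vector whose name I built with paste" step in the question, here is a minimal sketch of two options. The values in distanceList are made up, and the list called clusters is my own construction, not part of the question.

```r
k <- 3
distanceList <- c(4.2, 1.5, 3.7)   # made-up distances for illustration

minimumDistance <- min(distanceList)
getIndex <- match(minimumDistance, distanceList)

# Option 1: keep the question's separate vectors and combine get() with assign()
for (i in 1:k) {
  assign(paste("cluster", i, sep = ""), vector())
}
clusterName <- paste("cluster", getIndex, sep = "")
assign(clusterName, c(get(clusterName), minimumDistance))
cluster2

# Option 2: hold all clusters in one list and index it directly
clusters <- vector("list", k)
names(clusters) <- paste0("cluster", seq_len(k))
clusters[[getIndex]] <- c(clusters[[getIndex]], minimumDistance)
clusters$cluster2
```

Option 2 avoids building names altogether, since getIndex already identifies the cluster.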
