Using Loop variable to access and write specific data.frames - r

I wrote a script that reads CSV data based on user input. For example, when the user enters "20 40 160", CSV files 1, 2 and 3 are read and saved as the data.frames d20, d40 and d160 in my global environment/workspace. The variable vel holds the values from the user input.
Now for the actual question:
I'm trying to manipulate the data I read in a loop over the vel variable. For example:
for (i in vel)
{
newVariable"i" <- d"i"[6]
}
I know that's not correct syntax, but what I'm trying to do is create a new variable holding a specific row from a specific data frame d.
The result should be:
newVariable20 = d20[20]
newVariable40 = d40[20]
newVariable160 = d160[20]
So I think the actual question is: how do I use the loop variable to refer to the names of the created data frames and to create new variables?

There are a couple of ways to do this. One is to store all of your data frames in a list from the start: begin with an empty list and put each data frame into the next position. Note that you have to wrap each one as list(df), because a data frame is itself a list; if you assign it directly with single brackets, R treats its columns as separate elements and only part of it ends up in the list.
list_of_df <- list();
list_of_df[1] <- list(df1);
list_of_df["df20"] <- list(df2)
This makes it easy to loop through the dataframes. If you want column 4 of dataframe 2 you just put in
list_of_df[[2]][,4]
# Same thing different code
list_of_df[["df20"]][,4]
The double brackets [[2]] give you the value that is stored in the list at position 2 (instead of [2], which gives you a one-element list that still wraps the value). The next [,4] says that from the data frame we just extracted, we now want every row of the 4th column. Note that this will output a vector and not a data frame.
Or in a loop:
for(df in list_of_df) {
print(df)
}
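Applied to the original question, the list approach could look roughly like the sketch below. It assumes vel holds the user's numbers and uses small made-up data frames in place of the CSV files; the column names id and x are hypothetical, and in the real script each element would come from read.csv().

```r
# Hypothetical stand-in for the user input from the question
vel <- c(20, 40, 160)

# Build one named list instead of separate objects d20, d40, d160.
# Toy data frames are used here; in the real script each element
# would come from read.csv() on the matching file.
d <- lapply(vel, function(v) data.frame(id = 1:3, x = v * (1:3)))
names(d) <- paste0("d", vel)

# Pull the same column out of every data frame, keeping the names
newVariable <- lapply(d, function(df) df[["x"]])

newVariable[["d40"]]   # the column taken from d40
```

The results stay together under the names d20, d40 and d160, so no variable names have to be pasted together.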

Related

Dynamically change part of variable name in R

I am trying to automatise some post-hoc analysis, but I will try to explain myself with a metaphor that I believe will illustrate what I am trying to do.
Suppose I have two lists of strings: the first contains names and the second contains adjectives:
list1 <- c("apt", "farm", "basement", "lodge")
list2 <- c("tiny", "noisy")
Let's suppose also I have a data frame with a bunch of data that I have named something like this as they are the results of some previous linear analysis.
> head(df)
qt[apt_tiny,Intercept] qt[apt_noisy,Intercept] qt[farm_tiny,Intercept]
1 4.196321 -0.4477012 -1.0822793
2 3.231220 -0.4237787 -1.1433449
3 2.304687 -0.3149331 -0.9245896
4 2.768691 -0.1537728 -0.9925387
5 3.771648 -0.1109647 -0.9298861
6 3.370368 -0.2579591 -1.0849262
and so on...
Now, what I am trying to do is make some automatic operations where the strings in the previous lists dynamically change as they go in a for loop. I have made a list with all the distinct combinations and called it distinct. Now I am trying to do something like this:
for (i in 1:nrow(distinct)){
var1[[i]] <- list1[[i]]
var2[[i]] <- list2[[i]]
#this being the insertable name part for the rest of the variables and parts of variable,
#i'll put it inside %var[[i]]% for the sake of the explanation.
%var1[[i]]%_%var2[[i]]%_INT <- df$`qt[%var1[[i]]%_%var2[[i]]%,Intercept]`+ df$`qt[%var1[[i]]%,Intercept]`
}
The difficult thing for me here is %var1[[i]]% is at the same time inside a variable and as the name of a column inside a data frame.
Any help would be much appreciated.
You cannot use $ to extract a column whose name is stored in a character variable, so df$`qt[%var1[[i]]%_%var2[[i]]%,Intercept]` will not work.
Create the name of the column using sprintf and use [[ to extract it. For example to construct "qt[apt_tiny,Intercept]" as column name you can do :
i <- 1
sprintf('qt[%s_%s,Intercept]', list1[i], list2[i])
#[1] "qt[apt_tiny,Intercept]"
Now use [[ to subset that column from df
df[[sprintf('qt[%s_%s,Intercept]', list1[i], list2[i])]]
You can do the same for other columns.
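Put together in a loop, this might look like the sketch below. The data frame is a made-up stand-in, the column qt[apt,Intercept] is assumed to exist because the question adds it, and building distinct with expand.grid is only one guess at how the combinations were created. The results go into a list (here called results, a hypothetical name) rather than into separately named variables.

```r
# Toy data with column names in the style of the question (values are made up)
df <- data.frame(
  `qt[apt_tiny,Intercept]`  = c(4.2, 3.2),
  `qt[apt_noisy,Intercept]` = c(-0.4, -0.4),
  `qt[apt,Intercept]`       = c(1.0, 2.0),
  check.names = FALSE
)
list1 <- c("apt")
list2 <- c("tiny", "noisy")

# All name/adjective combinations, one per row
distinct <- expand.grid(var1 = list1, var2 = list2, stringsAsFactors = FALSE)

results <- list()
for (i in seq_len(nrow(distinct))) {
  v1 <- distinct$var1[i]
  v2 <- distinct$var2[i]
  combined_col <- sprintf("qt[%s_%s,Intercept]", v1, v2)
  main_col     <- sprintf("qt[%s,Intercept]", v1)
  # Store the sum under a constructed name inside the list
  results[[sprintf("%s_%s_INT", v1, v2)]] <- df[[combined_col]] + df[[main_col]]
}

results[["apt_tiny_INT"]]
```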

How can I use for loop for these process in R

I have a data frame that includes 43 different countries.
To summarize my data frame: the row names look like (AUS1, AUS2, AUS3, ... BRA1, BRA2, ... GER1, GER2...GER56), and there is a variable called Country which contains the country codes.
I need to find their export values. I can compute them separately, but that takes a lot of time because I have 14 different years. So I want to use a for loop, but I cannot find a way to write one for the process below.
This is my code to find the export of a single country.
##AUT
AUT <- filter(wiot, wiot$Country == "AUT")
exportAUT <- sum(AUT$TOT) - sum(select(AUT, starts_with("AUT")))
##BEL
BEL <- filter(wiot, wiot$Country == "BEL")
exportBEL <- sum(BEL$TOT) - sum(select(BEL, starts_with("BEL")))
Trying to create individually named objects for this set of results is the path to madness in R. Instead create a list with a more generic name and then put results in the "leaves" (individual element) inside the list:
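export <- list()
# loop over each country code once, not over every row
for (i in unique(wiot$Country)) {
  sub <- wiot[wiot$Country == i, ]   # rows belonging to country i
  export[[i]] <- sum(sub$TOT) - sum(select(sub, starts_with(i)))
  # or, without dplyr: export[[i]] <- sum(sub$TOT) - sum(sub[startsWith(names(sub), i)])
}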
This is a guess, since I'm not able to figure out how the rows and columns are referenced in your data.frame object. It would be much easier to debug this if you provided a less ambiguous description of the data object named wiot. Use either the output of str(wiot) or show the output of dput(head(wiot)).
Consider base R's by to build a named collection of export calculations, one per country:
export_list <- by(wiot, wiot$Country, function(sub)
  sum(sub$TOT) - sum(select(sub, starts_with(sub$Country[1])))
)
export_list[["AUT"]]
export_list[["BEL"]]
export_list[["GER"]]
...
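Since the real structure of wiot is not shown, here is a minimal base R sketch on a made-up table. It assumes one row per country sector, one column per destination sector whose name starts with the country code, and a TOT column; all values and column names other than Country and TOT are hypothetical.

```r
# Toy stand-in for wiot (all values made up)
wiot <- data.frame(
  Country = c("AUT", "AUT", "BEL"),
  AUT1 = c(1, 2, 3),
  AUT2 = c(1, 1, 1),
  BEL1 = c(4, 5, 6),
  TOT  = c(6, 8, 10)
)

# One export value per country, returned as a named numeric vector
export <- vapply(
  split(wiot, wiot$Country),
  function(sub) {
    code <- as.character(sub$Country[1])
    own  <- startsWith(names(sub), code)   # columns of the same country
    sum(sub$TOT) - sum(sub[own])
  },
  numeric(1)
)

export[["AUT"]]
```

For the 14 years, the same function could be applied to a list of yearly tables with lapply, again avoiding one object per result.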

How can I split any table/df dynamically depending on its number of elements and store each element as a list?

I am trying to split a table by the values of one column, as follows:
# create a new table that contains the information organised by chromosome number, as lists
t2_s=split(tbl2, tbl2$chr)
This creates a list of lists, each containing up to 10,000 rows. I now want to extract each list and assign it a name dynamically, so that my program can handle a table of any size and any number of lists after splitting on a column.
I tried the following but I think I am trying to apply Java logic to R:
counter = 1
for (j in t2_s){
paste(c("chrList", counter), collapse = " ") <- (t2_s[[counter]])
counter = counter + 1
}
I need something that would not choke R performance wise as well, as the size of each generated list will be huge as well.
I am an amateur coder so any help would be much appreciated.
You can use the function assign() for this task.
for (j in 1:length(t2_s)) {
tmp <- paste("chrList", j, sep = "_")
assign(tmp, t2_s[[j]])
}
This will create an object for each element of your list. If you then want to collect all the objects you just created into a list, you could do so like this.
# this will get a character vector of all objects in your global environment
all_env_objects <- ls()
# this will extract the newly created objects from the above code
all_new_objects <- all_env_objects[grep("chrList_", all_env_objects)]
# this will create a list containing all the objects you created
your_list <- do.call("list", mget(all_new_objects))
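It is also worth noting that split() already returns a named list, so the pieces can be used directly without creating separate objects first. A small sketch on a made-up table (the pos column and all values are hypothetical):

```r
# Toy table standing in for tbl2
tbl2 <- data.frame(
  chr = c("chr1", "chr2", "chr1", "chrX"),
  pos = c(10, 20, 30, 40)
)

t2_s <- split(tbl2, tbl2$chr)   # named list: one data frame per chromosome

names(t2_s)                     # the names come from the chr column

# Work on every piece, however many there are
rows_per_chr <- vapply(t2_s, nrow, integer(1))

t2_s[["chr1"]]                  # look up one piece by name
```

This handles any number of groups, because nothing in the code depends on how many elements the split produces.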

Dynamically assign variable names for vectors in R?

I'm new to R and I am trying to create variables referencing vectors within a for loop, where the index of the loop will be appended to the variable name. However, the following code below, where I'm trying to insert the new vectors into the appropriate place in the larger data frame, is not working and I've tried many variations of get(), as.vector(), eval() etc. in the data frame construction function.
I want num_incorrect.8 and num_incorrect.9 to be vectors with a value of 0 and then be inserted into mytable.
cols_to_update <- c(8,9)
for (i in cols_to_update)
{
#column name of insertion point
insertion_point <- paste("num_correct",".",i,sep="")
#create the num_incorrect col -- as a vector of 0s
assign(paste("num_incorrect",".",i,sep=""), c(0))
#index of insertion point
thespot <- which(names(mytable)==insertion_point)
#insert the num_incorrect vector and rebuild mytable
mytable <- data.frame(mytable[1:thespot], as.vector(paste("num_incorrect",".",i,sep="")), mytable[(thespot+1):ncol(mytable)])
#update values
mytable[paste("num_incorrect",".",i,sep="")] <- mytable[paste("num_tries",".",i,sep="")] - mytable[paste("num_correct",".",i,sep="")]
}
When I look at how the column insertion went, it looks like this:
[626] "num_correct.8"
[627] "as.vector.paste..num_incorrect........i..sep........2"
...
[734] "num_correct.9"
[735] "as.vector.paste..num_incorrect........i..sep........3"
Basically, it looks like it's taking my commands as literal text. The last line of code works as expected and creates new columns at the end of the data frame (since the line before it didn't insert the column into the proper place):
[1224] "num_incorrect.8"
[1225] "num_incorrect.9"
I am kind of out of ideas, so if someone could please give me an explanation of what's wrong and why, and how to fix it, I would appreciate it. Thanks!
The mistake is in the second-to-last line of code (not counting comments), where you build the vector and add it to your data frame. as.vector(paste(...)) is just a character string, so data.frame() inserts that string as a column and derives the column name from the expression itself.
You just need to add a numeric vector and then update its name. You can also remove the assign() call: it does not create a column, it only assigns the value 0 to a separate variable.
Replace the second-to-last line of your code with the code below and it should work.
#insert the vector at the desired location
mytable <- data.frame(mytable[1:thespot], newCol = vector(mode='numeric',length = nrow(mytable)), mytable[(thespot+1):ncol(mytable)])
#update the name of new location
names(mytable)[thespot + 1] = paste("num_incorrect",".",i,sep="")
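An alternative sketch, not the approach above: compute each new column first, addressing it by a name built as a string, and then reorder the columns with append(). The table here is a made-up miniature of mytable.

```r
# Toy table (hypothetical values, column names in the style of the question)
mytable <- data.frame(
  num_tries.8 = c(5, 4), num_correct.8 = c(3, 4),
  num_tries.9 = c(2, 6), num_correct.9 = c(1, 2)
)

for (i in c(8, 9)) {
  correct   <- paste0("num_correct.", i)
  incorrect <- paste0("num_incorrect.", i)
  # Add the new column at the end, addressed by its name as a string
  mytable[[incorrect]] <- mytable[[paste0("num_tries.", i)]] - mytable[[correct]]
  # Move it directly behind the matching num_correct column
  others  <- setdiff(names(mytable), incorrect)
  mytable <- mytable[append(others, incorrect, after = which(others == correct))]
}

names(mytable)
```

Because the column is created with [[ and a string, no assign() or get() is needed, and the reordering also works when the insertion point is the last column.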

refer to a data frame that was dynamically created

I need to create data frames dynamically and refer to them. So far I can create data frame dynamically like:
master<-c("bob","ed","frank")
d<-seq(1:10)
for (i in 1:length(master)){
assign(master[i], d )
}
ed[6]
now if I do
ls()
I can see there is an "ed" object. I want to refer to and manipulate the data in it WITHOUT typing the name directly.
i.e. instead of doing "ed[6]"
I want to have "ed" in a variable like:
master[2][6] # BUT THIS DOES NOT WORK
or
df<-"ed" #this does not work either
df[6]
The point of me naming the data frames dynamically was so I can refer to them dynamically. How can I do this?
Thank you!
You can use get as Atilla suggests, but for a case like this you may be better off creating a list and then referring to objects by list index instead. It's tidier to create one object than a whole bunch, and referencing the contents is simple.
# create empty list
my_list <- list()
# put stuff in the list
for (i in 1:length(master)) {
my_list[[i]] <- d
}
# get the 6th element from the 2nd object in my_list
my_list[[2]][6]
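If the list elements are named, they can also be looked up through a name held in a variable, which is what the question asks for. A minimal sketch using the question's own master and d (the variable who is a hypothetical name):

```r
master <- c("bob", "ed", "frank")
d <- 1:10

# One named list instead of three separate objects
my_list <- setNames(lapply(master, function(nm) d), master)

who <- master[2]            # the name "ed", held in a variable
my_list[[who]][6]           # read element 6 of "ed"
my_list[[who]][6] <- 20L    # change just that element in place
my_list[["ed"]]
```

Unlike assign(), the [[ ]] form can be used on the left-hand side to change a single element.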
Use get.
get(master[2])[5]
get(master[2])[2]
If you want to set values, you need to use assign. But be careful: it assigns a value as a whole. That is, you cannot set a single element of a vector through assign; you have to assign the whole vector.
master<-c("bob","ed","frank")
d<-seq(1:10)
for (i in 1:length(master)){
assign(master[i], d )
}
ed[6]
get(master[2])[5]
get(master[2])[2]
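temp <- get(master[2]) # copy of the vector stored under the name "ed"
assign("ed",20) # replaces the whole vector: ed is now just 20
assign("ed[6]",20) # it creates a variable named "ed[6]", not what you want
ls(pattern = "^ed.*$")
# to set index 6 to 20: modify the copy, then assign it back as a whole
temp[6] = 20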
assign("ed",temp)
rm(temp) # remove temp if you do not need it
get("ed")[6]
