Dynamically change part of variable name in R - r

I am trying to automatise some post-hoc analysis, but I will try to explain myself with a metaphor that I believe will illustrate what I am trying to do.
Suppose I have two character vectors: the first holds a list of names and the second a list of adjectives:
list1 <- c("apt", "farm", "basement", "lodge")
list2 <- c("tiny", "noisy")
Suppose I also have a data frame whose columns are named as shown below, because they are the results of a previous linear analysis.
> head(df)
qt[apt_tiny,Intercept] qt[apt_noisy,Intercept] qt[farm_tiny,Intercept]
1 4.196321 -0.4477012 -1.0822793
2 3.231220 -0.4237787 -1.1433449
3 2.304687 -0.3149331 -0.9245896
4 2.768691 -0.1537728 -0.9925387
5 3.771648 -0.1109647 -0.9298861
6 3.370368 -0.2579591 -1.0849262
and so on...
Now, what I am trying to do is make some automatic operations where the strings in the previous lists dynamically change as they go in a for loop. I have made a list with all the distinct combinations and called it distinct. Now I am trying to do something like this:
for (i in 1:nrow(distinct)){
var1[[i]] <- list1[[i]]
var2[[i]] <- list2[[i]]
#this being the insertable name part for the rest of the variables and parts of variable,
#i'll put it inside %var[[i]]% for the sake of the explanation.
%var1[[i]]%_%var2[[i]]%_INT <- df$`qt[%var1[[i]]%_%var2[[i]]%,Intercept]`+ df$`qt[%var1[[i]]%,Intercept]`
}
The difficult part for me is that %var1[[i]]% appears both in the name of a variable and in the name of a column inside a data frame.
Any help would be much appreciated.

You cannot use $ to extract a column whose name is stored in a character variable, so df$`qt[%var1[[i]]%_%var2[[i]]%,Intercept]` will not work.
Create the name of the column using sprintf and use [[ to extract it. For example, to construct "qt[apt_tiny,Intercept]" as the column name you can do:
i <- 1
sprintf('qt[%s_%s,Intercept]', list1[i], list2[i])
#[1] "qt[apt_tiny,Intercept]"
Now use [[ to subset that column from df
df[[sprintf('qt[%s_%s,Intercept]', list1[i], list2[i])]]
You can do the same for other columns.
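Putting the pieces together, the whole loop could be sketched as below. This is only an illustration under assumptions: the toy df, the layout of distinct (one row per name/adjective pair, built here with expand.grid) and the results list are all hypothetical, and the outputs are stored in a named list rather than in separately named variables.

```r
# Hypothetical stand-in for the real data; check.names = FALSE keeps the
# bracketed column names exactly as written.
df <- data.frame(
  "qt[apt,Intercept]"        = c(1, 2),
  "qt[farm,Intercept]"       = c(3, 4),
  "qt[apt_tiny,Intercept]"   = c(0.5, 0.25),
  "qt[apt_noisy,Intercept]"  = c(-0.5, -0.25),
  "qt[farm_tiny,Intercept]"  = c(1, 1),
  "qt[farm_noisy,Intercept]" = c(2, 2),
  check.names = FALSE
)

list1 <- c("apt", "farm")
list2 <- c("tiny", "noisy")

# Assumed shape of `distinct`: one row per (name, adjective) combination.
distinct <- expand.grid(var1 = list1, var2 = list2, stringsAsFactors = FALSE)

results <- list()
for (i in seq_len(nrow(distinct))) {
  v1 <- distinct$var1[i]
  v2 <- distinct$var2[i]
  combined_col <- sprintf("qt[%s_%s,Intercept]", v1, v2)
  base_col     <- sprintf("qt[%s,Intercept]", v1)
  # Store each result under a constructed name inside the list.
  results[[sprintf("%s_%s_INT", v1, v2)]] <- df[[combined_col]] + df[[base_col]]
}

results$apt_tiny_INT
```

Each result is then available as results$apt_tiny_INT, results[["farm_noisy_INT"]] and so on, which avoids creating variables with generated names.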

Related

How can I specify data structure name from the string output in r programming?

I am coding this in R. I worked around it by turning the vector into a list and assigning a value to each element of the list, but is there a more direct, simple approach?
for(i in 1:5){
paste('var',i,sep='')=i
}
I want output where the values 1:5 are assigned like this:
var1=1
var2=2
var3=3
var4=4
var5=5
Don’t do this. Use a vector or list instead:
var = 1 : 5
Now you can use var[1] (instead of var1) etc.
Your code doesn’t work because paste creates a character vector, not a variable name.
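If the names themselves matter, a named list keeps them without creating separate variables. A minimal sketch (the name vars is just an illustrative choice):

```r
# Build the names with paste0 and attach them to the list elements.
vars <- setNames(as.list(1:5), paste0("var", 1:5))

vars[["var3"]]   # look up one element by its constructed name
names(vars)      # all constructed names
```

Here paste0 still only produces character strings, but they are used as element names inside one object, which is what [[ can look up.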

How can I use for loop for these process in R

I have a data frame that includes 43 different countries.
To summarize my data frame: the row names look like AUS1, AUS2, AUS3, ... BRA1, BRA2, ... GER1, GER2 ... GER56, and there is a variable called Country that holds the country codes.
I need to find each country's export value. I can compute them one by one, but that takes a lot of time because I have 14 different years, so I want to use a for loop. However, I cannot find a way to write the process below as a for loop.
This is my code to find export for single country.
##AUT
AUT <- filter(wiot, wiot$Country == "AUT")
exportAUT <- sum(AUT$TOT) - sum(select(AUT, starts_with("AUT")))
##BEL
BEL <- filter(wiot, wiot$Country == "BEL")
exportBEL <- sum(BEL$TOT) - sum(select(BEL, starts_with("BEL")))
Trying to create individually named objects for this set of results is the path to madness in R. Instead create a list with a more generic name and then put results in the "leaves" (individual element) inside the list:
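export <- list()
# loop over each country code once, not over every row
for (i in unique(as.character(wiot$Country))) {
sub <- wiot[wiot$Country == i, ]   # rows belonging to country i
export[[i]] <- sum(sub$TOT) - sum(select(sub, starts_with(i)))
#or maybe: export[[i]] <- sum(sub$TOT) - sum(sub[ grepl(i, names(sub)) ])
}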
This is a guess, since I'm not able to figure out how the rows and columns are referenced in your data.frame object. It would be much easier to debug this if you provided a less ambiguous description of the data object named wiot. Use either the output of str(wiot) or show output of dput(head(wiot))
Consider base R's by to build a named list of export calculations:
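export_list <- by(wiot, wiot$Country, function(sub)
sum(sub$TOT) - sum(select(sub, starts_with(as.character(sub$Country[1]))))
)
# [[ works whether or not by() simplifies the result to an atomic array
export_list[["AUT"]]
export_list[["BEL"]]
export_list[["GER"]]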
...
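For comparison, the same calculation can be sketched in base R only. The toy wiot below is an assumption about the layout (a Country column, a TOT column, and columns whose names start with a country code); the real object may differ.

```r
# Hypothetical miniature of wiot; the real data has many more rows and columns.
wiot <- data.frame(
  Country = c("AUT", "AUT", "BEL", "BEL"),
  AUT1    = c(1, 2, 3, 4),
  BEL1    = c(5, 6, 7, 8),
  TOT     = c(10, 20, 30, 40)
)

# One export value per country: total output minus the country's own columns.
export <- vapply(split(wiot, wiot$Country), function(sub) {
  code <- as.character(sub$Country[1])
  own  <- startsWith(names(sub), code)   # columns named after this country
  sum(sub$TOT) - sum(sub[own])
}, numeric(1))

export[["AUT"]]
```

The result is a named numeric vector, so each country's value is retrieved by its code instead of through a separate exportAUT, exportBEL, ... variable.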

Using Loop variable to access and write specific data.frames

I wrote a script that reads CSV data based on user input. For example, when the user enters "20 40 160", the CSV files 1, 2 and 3 are read and saved as the data.frames d20, d40 and d160 in my global environment/workspace. The variable vel holds the values from the user input.
Now for the actual question:
I'm trying to manipulate the data that was read in a loop over the vel variable. For example:
for (i in vel)
{
newVariable"i" <- d"i"[6]
}
I know that's not correct syntax, but what I'm trying to do is create a newVariable holding a specific row from a specific data frame d.
The result should be:
newVariable20 = d20[20]
newVariable40 = d40[20]
newVariable160 = d160[20]
So I think the actual question is: how do I use the loop variable to refer to the created data frames by name and to write new variables?
There are a couple of ways to do this. One is to store all of your data frames in a list from the start: begin with an empty list and put each data frame into the next position. Note that you have to wrap it as list(df), because a data frame is itself a list and would not be stored as a single element otherwise.
list_of_df <- list();
list_of_df[1] <- list(df1);
list_of_df["df20"] <- list(df2)
This makes it easy to loop through the dataframes. If you want column 4 of dataframe 2 you just put in
list_of_df[[2]][,4]
# Same thing different code
list_of_df[["df20"]][,4]
The double brackets [[2]] give you the value that is stored in the list at position 2 (instead of [2], which gives you a list of length one containing that value). The next [,4] says that from the data frame we just retrieved, we now want every row of the 4th column. Note that this will output a vector and not a data frame.
Or in a loop:
for(df in list_of_df) {
print(df)
}
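Applied to the question, the idea might look like the sketch below. The data frames are generated in place of the CSV files, and the names frames and new_values are hypothetical; only vel and the d20/d40/d160 naming come from the question.

```r
vel <- c(20, 40, 160)

# Stand-ins for the data frames read from CSV (assumed contents).
frames <- lapply(vel, function(v) data.frame(x = v * 1:3, y = v + 1:3))
names(frames) <- paste0("d", vel)   # "d20", "d40", "d160"

# Extract the same column from every data frame, keeping the names.
new_values <- lapply(frames, function(d) d[[2]])

new_values[["d40"]]
```

If the data frames already exist as separate objects in the workspace, mget(paste0("d", vel)) should collect them into such a named list.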

Iterate over Factors in a dataframe in R

I am rather new to R and struggling at the moment with a specific issue. I need to iterate over a dataframe with 1 variable returned from a SQL database so that I can ultimately issue additional SQL queries using the information in the 1 variable. I need help understanding how to do this.
Here is what I have
> dt
Col
1 5D2D3F03-286E-4643-8F5B-10565608E5F8
2 582771BE-811E-4E45-B770-42A98EB5D7FB
3 4EB4D553-C680-4576-A854-54ED817226B0
4 80D53D5D-80D1-4A60-BD86-C85F6D53390D
5 9EF6CABF-0A4F-4FA9-9FD9-132589CAAC31
When I try to access it using dt[1], it prints the entire column just as above:
> dt[1]
Col
1 5D2D3F03-286E-4643-8F5B-10565608E5F8
2 582771BE-811E-4E45-B770-42A98EB5D7FB
3 4EB4D553-C680-4576-A854-54ED817226B0
4 80D53D5D-80D1-4A60-BD86-C85F6D53390D
5 9EF6CABF-0A4F-4FA9-9FD9-132589CAAC31
When I try to access it with dt[1,], it brings additional unwanted information:
> a<-dt[1,]
> a
[1] 5D2D3F03-286E-4643-8F5B-10565608E5F8
5 Levels: 4EB4D553-C680-4576-A854-54ED817226B0 ... 9EF6CABF-0A4F-4FA9-9FD9-132589CAAC31
I need to isolate just the '5D2D3F03-286E-4643-8F5B-10565608E5F8' information and not the '5 levels......'.
I am sure this is simple, I just can't find it. any help is appreciated!
thanks!
There are two issues you need to address. One is that you want character data, not a factor variable (a factor is essentially a category variable). The other is that you want a simple vector of the values, not a data.frame.
1) To get the first column as a vector, use double-brackets or the $ notation:
a <- dt[[1]]
a <- dt[['Col']]
a <- dt$Col
Your notation dt[1,] does actually return a vector too (the first row, which here is a single value), relying on the somewhat obscure fact that the [ method for data.frame objects silently "drops" its result to a vector when using the two-index form dt[i,j], but not when using the one-index form dt[i]:
When [ and [[ are used with a single vector index (x[i] or x[[i]]), they index the data frame as if it were a list. In this usage a drop argument is ignored, with a warning.
Think of "dropping" like unboxing the data - instead of getting a data.frame with a single column, you're just getting the column data itself.
2) To convert to character data, use one of the suggestions in the comments from @akrun or @Vlo:
a <- as.character(dt[[1]])
a <- as.character(dt[['Col']])
a <- as.character(dt$Col)
Alternatively, use the API of whatever you're using to make the SQL query (or to read in the results of the query) so that the strings are not converted to factors in the first place.
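As a small sketch of the iteration itself: once the column is a character vector, each value can be looped over or pasted into a query string. The table name some_table and the query text are made up for illustration, and stringsAsFactors = TRUE is set only to reproduce the factor column from the question.

```r
# Two of the IDs from the question, stored as a factor column on purpose.
dt <- data.frame(
  Col = c("5D2D3F03-286E-4643-8F5B-10565608E5F8",
          "582771BE-811E-4E45-B770-42A98EB5D7FB"),
  stringsAsFactors = TRUE
)

ids <- as.character(dt$Col)   # plain strings, no "Levels:" attached

# Hypothetical follow-up queries, one per ID (sprintf is vectorised).
queries <- sprintf("SELECT * FROM some_table WHERE id = '%s'", ids)

for (q in queries) {
  print(q)   # replace with the actual call that sends the query
}
```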

R equivalent to the MATLAB structure?

Is there an R type equivalent to the Matlab structure type?
I have a few named vectors and I try to store them in a data frame. Ideally, I would simply access one element of an object and it would return the named vectors (like a structure in Matlab). I feel that using a data frame is not the right thing to do since it can store the values of the named vectors but not the names when they differ from one vector to the other.
More generally, is it possible to store a bunch of different objects in a single one in R?
Edit: As Joran said I think that list does the job.
l = list()
l$vec1 = namedVector1
l$vec2 = namedVector2
...
If I have a list of names
name1 = 'vec1'
name2 = 'vec2'
is there any way for the interpreter to understand that when I use a variable name like name1, I am not referring to the variable name but to its content? I have tried get(name1) but it does not work.
I could still be wrong about what you're trying to do, but I think this is the best you're going to get in terms of accessing each list element by name:
> l <- list(a = 1:3, b = 1:10)
> ind <- "a"
> l[[ind]]
[1] 1 2 3
Namely, you're going to have to use [[ explicitly.
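For the named vectors from the question, this might look like the sketch below; the contents of namedVector1 and namedVector2 are invented for illustration.

```r
# Hypothetical named vectors with different element names.
namedVector1 <- c(alpha = 1, beta = 2)
namedVector2 <- c(x = 10, y = 20, z = 30)

l <- list()
l[["vec1"]] <- namedVector1
l[["vec2"]] <- namedVector2

name1 <- "vec1"
name2 <- "vec2"

l[[name1]]          # the content of name1 selects the element
names(l[[name2]])   # each vector keeps its own names
```

Unlike get(name1), which looks for a variable called vec1 in the environment, l[[name1]] looks up the element named "vec1" inside the list.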
