Assign rownames within a loop to constructed variable names - r

I'm creating data frames within a loop. Each data frame's name should be the combination of a name and a number (the loop's iteration). I use the assign function for this and it works. I also want to assign names to the data frames' rows. I tried two ways, but I'm getting the error messages "target of assignment expands to non-language object" and "only the first element is used as variable name". Below is a reproducible example of what I'm trying to do.
rows<-c("a","b")
df<-data.frame(var1=c(1,2),var2=c(10,20))
for (n in 1:2){
assign (paste("data",n,sep="_"),df)
rownames(get(paste("data",n,sep="_")))<-rows # it doesn't work
assign(rownames(get(paste("data",n,sep="_"))),rows) # it doesn't work
}
I'd like to know why it doesn't work and how to solve it. I found similar threads like this and this, but I was not able to solve my case. Thank you.

Based on Roland's comment, I came up with this solution:
rows<-c("a","b")
df<-data.frame(var1=c(1,2),var2=c(10,20))
dfs<-list()
for (n in 1:2){
dfs[[n]]<-df
rownames(dfs[[n]])<-rows
}
A list is the key!
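If the constructed names (data_1, data_2) are still wanted, they can be kept as the names of the list elements rather than as separate variables. The following is only a minimal sketch of that idea; the name df_names and the use of lapply are illustrative choices, not part of the original solution.

```r
rows <- c("a", "b")
df <- data.frame(var1 = c(1, 2), var2 = c(10, 20))

# Build the element names once: "data_1", "data_2"
df_names <- paste("data", 1:2, sep = "_")

# One copy of df per name, each with its row names set
dfs <- lapply(df_names, function(nm) {
  out <- df
  rownames(out) <- rows
  out
})
names(dfs) <- df_names

# Access an element by its constructed name
dfs[["data_2"]]
rownames(dfs[[paste("data", 1, sep = "_")]])
```

Each data frame is then reachable through a string, which is what the assign/get combination was trying to achieve.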

Or without list, you just need a dummy variable:
rows<-c("a","b")
df<-data.frame(var1=c(1,2),var2=c(10,20))
for (n in 1:2){
assign (paste("data",n,sep="_"),df)
labelling <- get(paste("data",n,sep="_"))
labels <- rows
rownames(labelling)<-labels
assign(paste("data",n,sep="_"),labelling)
}
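As for why the original attempts fail: rownames(x) <- value is shorthand for x <- `rownames<-`(x, value), so the assignment target has to be a plain variable name that R can assign the modified object back to, and a get(paste(...)) call is not one. The second attempt passes the existing row names (a character vector of length two) to assign as the variable name, which is where "only the first element is used as variable name" comes from. A hedged sketch of the same fix without the temporary variable, calling the replacement function directly (the name nm is introduced here for illustration):

```r
rows <- c("a", "b")
df <- data.frame(var1 = c(1, 2), var2 = c(10, 20))

for (n in 1:2) {
  nm <- paste("data", n, sep = "_")
  # `rownames<-`(x, value) returns a modified copy of x;
  # assign() then stores that copy under the constructed name
  assign(nm, `rownames<-`(df, rows))
}

rownames(data_1)
rownames(data_2)
```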

Related

Renaming Columns with index with a For Loop in R

I am writing this post to ask for some advice for looping code to rename columns by index.
I have a data set that has scale item columns positioned next to each other. Unfortunately, they are oddly named.
I want to re-name each column in this format: SimRac1, SimRac2, SimRac3.... and so on. I know the location of the columns (Columns number 30 to 37). I know these scale items are ordered in such a way that they can be named and numbered in increased order from left to right.
The code I currently have works, but is not efficient. There are other scales, in different locations, that also need to be renamed in a similar fashion. This would result in dozens of code rows.
See below code.
names(Total)[30] <- "SimRac1"
names(Total)[31] <- "SimRac2"
names(Total)[32] <- "SimRac3"
names(Total)[33] <- "SimRac4"
names(Total)[34] <- "SimRac5"
names(Total)[35] <- "SimRac6"
names(Total)[36] <- "SimRac7"
names(Total)[37] <- "SimRac8"
I want to loop this code so that I only have a chunk of code that does the work.
I was thinking perhaps a "for loop" would help.
Hence, the below code
for (i in Total[,30:37]){
names(Total)[i] <- "SimRac(1:8)"
}
Unfortunately, this does not work. The chunk of code runs without error, but it doesn't do anything.
Please advise.
In the OP's code, "SimRac(1:8)" is a constant. To have dynamic names, use paste0.
We do not need a loop here. We can use a vectorized function to create the names and then assign them to a subset of names(Total):
names(Total)[30:37]<-paste0('SimRac', 1:8)
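Because the question mentions other scales in other locations, the same one-liner could be wrapped in a small helper. This is a sketch only: the function rename_block, the prefix "Other", and the toy six-column stand-in for Total are all hypothetical.

```r
# Hypothetical helper: rename the columns at positions `cols`
# to prefix1, prefix2, ... in left-to-right order
rename_block <- function(data, cols, prefix) {
  names(data)[cols] <- paste0(prefix, seq_along(cols))
  data
}

# Toy stand-in for the real data set (six oddly named columns)
Total <- data.frame(id = 1:2, q_a = 0, q_b = 0, q_c = 0, x9 = 0, x10 = 0)

Total <- rename_block(Total, 2:4, "SimRac")
Total <- rename_block(Total, 5:6, "Other")
names(Total)
```

With the real data, the call for the first scale would use columns 30:37 and the prefix "SimRac".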

For loop to create multiple empty data frames gives error

I wrote a for loop to create multiple empty data frames from a vector of names. It seemed really easy at first, but I got this error message: Error in ID_names[i] <- data.frame() : replacement has length zero
To be more specific, here is a reproducible example:
ID_names <- c("Athens","Rome","Barcelona","London","Paris","Madrid")
for(i in 1:length(ID_names)){
ID_names[i] <- data.frame()
}
Do you have any idea why this is wrong? Please don't just provide a solution, but also explain why this for loop is wrong, so that I can avoid this kind of mistake in the future.
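You are trying to store a data frame in a single element of a character vector (ID_names[i]), which is not possible: a vector element can hold only one atomic value, and an empty data frame has length zero, hence the "replacement has length zero" error. You probably want a list of empty data frames with names assigned to it, which can be built with replicate.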
ID_names <- c("Athens","Rome","Barcelona","London","Paris","Madrid")
list_data <- setNames(replicate(length(ID_names), data.frame()), ID_names)
However, initialising empty data frames like this is rarely useful and tends to create confusion down the road. Depending on your actual use case, there may be better ways to handle this.
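One alternative that often avoids the empty placeholders is to build the named list directly from whatever each data frame should contain. The sketch below assumes made-up contents (a city column and a value column) purely for illustration.

```r
ID_names <- c("Athens", "Rome", "Barcelona", "London", "Paris", "Madrid")

# sapply(..., simplify = FALSE) returns a list and, for character input,
# uses the input values as the element names
list_data <- sapply(ID_names, function(city) {
  data.frame(city = city, value = NA_real_)  # placeholder content
}, simplify = FALSE)

names(list_data)
list_data[["Rome"]]
```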

How to reference a dynamically assigned dataframe name

I have successfully allocated dataframe names and populated them (see code), but I do not know how to reference them afterwards. I loop through to assign df.test1, populate it with some data, and so on. I know that each df has been created, and I can View or summary it in the console, but not from code.
I am pretty new to R so am not sure if some of the solutions I have looked at apply to me.
num.clusters <- 5
for (i in 1:num.clusters) {
assign(paste("df.test",i,sep=""), paste("somedata", i))
}
This works, but then I want to do something like:
View(df.test,i)
to view whatever iteration from 1 to 5.
I want to be able to use the assigned dataframes like any other dataframe. I could hard code this as View(df.test1) but that would defeat the point. I also want to do other things with the dataframe, e.g. subsetting.
I know this doesn't work. Would love to know what does.
Many thanks...
Your question is the proof that the approach is problematic: avoid using assign in general because it makes accessing the variables afterwards awkward (among other issues).
A cleaner way is to just put your "data frames" (copying from your example) in a list:
num.clusters <- 5
df.test <- list()
for (i in 1:num.clusters) {
df.test[[i]] <- paste("somedata", i)
}
Then you would just access them like this:
View(df.test[[i]])
If what you put in there was an actual data.frame (and not the strings you were using), you could then access its columns like any other data.frame:
df.test[[i]]$Name
Or
df.test[[i]][, "Name"]
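To make the list approach concrete, here is a sketch with actual data frames in the list. The columns Name and cluster are invented for the example; the real per-cluster data would go in their place.

```r
num.clusters <- 5
df.test <- list()
for (i in 1:num.clusters) {
  # Made-up contents; replace with the real per-cluster data
  df.test[[i]] <- data.frame(Name = paste("somedata", i), cluster = i)
}

# Pick one element by its index, then use it like any data frame
i <- 3
df.test[[i]]$Name
subset(df.test[[i]], cluster == 3)

# Apply the same operation to every data frame in the list
row_counts <- sapply(df.test, nrow)
row_counts
```

The index i can come from a loop or any other computation, which is the part that was awkward with separately assigned variables.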

Name new dataframes from character vectors - loop

I think this one is easy but I still can't figure it out and I really need help with this. I've looked everywhere but still couldn't find it.
Let's say I have this vector:
filenames <- c("fn1", "fn2", "fn3")
And I want to associate each of them with a dataframe that is created by a function at that time
df|name from filenames[i]| <- df
so it would return these dataframes
dffn1
dffn2
dffn3
I hope I made myself clear. My problem is how to create a new data frame and name it according to a list or whatever, in a for loop.
You can use assign to achieve what you want.
for(nms in filenames){
assign(paste('df',nms,sep=''), df)
}
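Objects created this way have to be read back by name as well, typically with get or mget. A minimal sketch, assuming a stand-in data frame df in place of the one generated in each iteration:

```r
filenames <- c("fn1", "fn2", "fn3")
df <- data.frame(x = 1:3)  # stand-in for the data frame built in each iteration

for (nms in filenames) {
  assign(paste0("df", nms), df)
}

# Retrieve one object by its constructed name
one <- get(paste0("df", filenames[2]))

# Or collect all of them into a named list in one call
all_dfs <- mget(paste0("df", filenames))
names(all_dfs)
```

Collecting the objects with mget ends up with a named list anyway, so building the list directly is usually the shorter route.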

Looping in R to create transformed variables

I have a dataset of 80 variables, and I want to loop through a subset of 50 of them and construct returns. I have a list of the names of the variables for which I want to construct returns, and am attempting to use the dplyr command mutate to construct the variables in a loop. Specifically my code is:
for (i in returnvars) {
alldta <- mutate(alldta,paste("r",i,sep="") = (i - lag(i,1))/lag(i,1))}
where returnvars is my list, and alldta is my dataset. When I run this code outside the loop with just one of the i values, it works fine. The code for that looks like this:
alldta <- mutate(alldta,rVar = (Var- lag(Var,1))/lag(Var,1))
However, when I run it in the loop (e.g., attempting to do the previous line of code 50 times for 50 different variables), I get the following error:
Error: unexpected '=' in:
"for (i in returnvars) {
alldta <- mutate(alldta,paste("r",i,sep="") ="
I am unsure why this issue is coming up. I have looked into a number of ways to try and do this, and have attempted solutions that use lapply as well, without success.
Any help would be much appreciated! If there is an easy way to do this with one of the apply commands as well, that would be great. I did not provide a dataset because my question is not data specific, I'm simply trying to understand, as a relative R beginner, how to construct many transformed variables at once and add them to my data frame.
EDIT: As per Frank's comment, I updated the code to the following:
for (i in returnvars) {
varname <- paste("r",i,sep="")
alldta <- mutate(alldta,varname = (i - lag(i,1))/lag(i,1))}
This fixes the previous error, but I am still not referencing the variable correctly, so I get the error
Error in "Var" - lag("Var", 1) :
non-numeric argument to binary operator
Which I assume is because R sees my variable name Var as a string, rather than as a variable. How would I correctly reference the variable in my dataset alldta? I tried get(i) and alldta$get(i), both without success.
I'm also still open to (and actively curious about), more R-style ways to do this entire process, as opposed to using a loop.
Using mutate inside a loop might not be a good idea either. I am not sure whether mutate makes a copy of the data frame, but it's generally not good practice to grow a data frame inside a loop. Instead, create a separate data frame with the output and then name the columns based on your logic.
# cbind (not rbind) so that each variable becomes a column
result = as.data.frame(do.call(cbind,lapply(returnvars,function(i) {...})))
names(result) = paste("r",returnvars,sep="")
After playing around with this more, I discovered (thanks to Frank's suggestion), that the following works:
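extended <- alldta # Make a copy of my dataset
for (i in returnvars) {
varname <- paste("r",i,sep="")
# [[ ]] looks the column up by the name stored in i; lag is dplyr's lag, as in the question
extended[[varname]] <- (extended[[i]] - dplyr::lag(extended[[i]],1))/dplyr::lag(extended[[i]],1)
}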
This is still not very R-styled in that I am using a loop, but for a task that is only repeating about 50 times, this shouldn't be a large issue.
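For a loop-free variant, one option is to compute all return columns with lapply and add them in a single assignment. This is a sketch under stated assumptions: the toy alldta and returnvars stand in for the real ones, pct_return is a hypothetical helper, and the previous value is built with c(NA, head(x, -1)) so that the example does not depend on dplyr.

```r
# Toy data; the real alldta and returnvars come from the question
alldta <- data.frame(Var1 = c(100, 110, 121), Var2 = c(50, 25, 50), other = 1:3)
returnvars <- c("Var1", "Var2")

# Hypothetical helper: simple return relative to the previous row
pct_return <- function(x) {
  prev <- c(NA, head(x, -1))  # previous value, NA for the first row
  (x - prev) / prev
}

extended <- alldta
# Compute all return columns at once and add them as rVar1, rVar2, ...
extended[paste0("r", returnvars)] <- lapply(alldta[returnvars], pct_return)
extended
```

The first row of each new column is NA because there is no previous observation to compare against.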
