Multiple Recodes in R

I am looking to recode a large number of variables, and figure I can probably use some sort of loop to do so. What throws me is how to programmatically name each variable (I just want to keep the var name and append ".rc").
Here is an example. Let's say I have a set of variables, var.1 to var.5. I am looking to create a new variable in my dataframe that is var.1.rc <- var.1 / sum(var.1 to var.5). I'll do the same for the next variable, and so on.
I am new to R but this would be a HUGE step forward for me.
Is it possible? What are the best ways to do it? Any help will be much appreciated!
Regards,
Brock

If I understand you correctly, there is actually a pretty easy way to do this. Assuming your original data frame is called dat, you can do this:
dat.rc <- dat/rowSums(dat)
names(dat.rc) <- paste(names(dat), ".rc", sep="")
dat <- data.frame(dat,dat.rc)
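As a quick check of the approach above, here is a minimal sketch on a toy data frame. The values are made up for illustration, and it assumes every column of dat is one of the variables to recode.

```r
# Toy data: five variables whose rows each sum to 10 (illustrative values only)
dat <- data.frame(var.1 = c(1, 2), var.2 = c(1, 2), var.3 = c(2, 0),
                  var.4 = c(3, 4), var.5 = c(3, 2))

# Dividing a data frame by a vector of row totals divides each column
# element-wise, so every value becomes its share of the row sum
dat.rc <- dat / rowSums(dat)
names(dat.rc) <- paste(names(dat), ".rc", sep = "")
dat <- data.frame(dat, dat.rc)

dat$var.1.rc   # 0.1 0.2
```

If the data frame also holds columns that should not be part of the total, the same idea should work on a subset such as dat[paste("var.", 1:5, sep = "")].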

You could try the following loop.
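Here the eval(parse(text="")) allows you to evaluate a pasted-together string containing the various static and dynamic portions of the expression to create each new variable. Note that this creates var.1.rc to var.5.rc as standalone variables rather than as columns of a data frame, and it assumes var.1 to var.5 are directly visible (for example, after attach()).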
for (i in 1:5) {
X<-paste("var.",i,".rc<-var.",i,"/(var.1+var.2+var.3+var.4+var.5)",sep="")
eval(parse(text=X))
}
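For comparison, a loop can also build the new column names as strings and write them straight into the data frame with [[ ]], which avoids eval(parse()). This is only a sketch on made-up data; the names dat, vars and total are assumptions, not part of the original answer.

```r
# Toy data (illustrative values only)
dat <- data.frame(var.1 = c(1, 2), var.2 = c(1, 2), var.3 = c(2, 0),
                  var.4 = c(3, 4), var.5 = c(3, 2))

vars  <- paste("var.", 1:5, sep = "")
total <- rowSums(dat[vars])   # computed once, before any new column is added

for (v in vars) {
  # dat[["var.1.rc"]] <- dat[["var.1"]] / total, and so on for each variable
  dat[[paste(v, ".rc", sep = "")]] <- dat[[v]] / total
}

names(dat)
```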

Related

How to reference a dynamically assigned dataframe name

I have successfully allocated dataframe names and populated them (see code) but I do not know how to subsequently reference them. So I loop through to assign df.test1 and populate it with some data 1 and so on. I know that the df has been created, and I can view or summarize it in the console, but not in the code.
I am pretty new to R so am not sure if some of the solutions I have looked at apply to me.
num.clusters <- 5
for (i in 1:num.clusters) {
assign(paste("df.test",i,sep=""), paste("somedata", i))
}
This works, but I then want to do something like:
View(df.test,i)
to view whatever iteration from 1 to 5.
I want to be able to use the assigned dataframes like any other dataframe. I could hard code this as View(df.test1) but that would defeat the point. I also want to do other things with the dataframe, e.g. subsetting.
I know this doesn't work. Would love to know what does.
Many thanks...
Your question is the proof that the approach is problematic: avoid using assign in general because it makes accessing the variables afterwards awkward (among other issues).
A cleaner way is to just put your "data frames" (copying from your example) in a list:
num.clusters <- 5
df.test <- list()
for (i in 1:num.clusters) {
df.test[[i]] <- paste("somedata", i)
}
Then you would just access them like this:
View(df.test[[i]])
If what you put in there was an actual data.frame (and not the strings you were using), you could then access its columns like any other data.frame:
df.test[[i]]$Name
Or
df.test[[i]][, "Name"]
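To make the list approach concrete, here is a small sketch that stores real data frames instead of strings. The column names Name and cluster are invented for the example. The last two lines show get(), which can look up an object created with assign() by its pasted-together name if you are stuck with that approach.

```r
num.clusters <- 5

# Build a list of small data frames, one per cluster
df.test <- lapply(1:num.clusters, function(i) {
  data.frame(Name = paste("somedata", i), cluster = i)
})
names(df.test) <- paste("df.test", 1:num.clusters, sep = "")

i <- 3
df.test[[i]]$Name                        # access by position
df.test[["df.test3"]][, "cluster"]       # access by name
picked <- subset(df.test[[i]], cluster == 3)   # subsetting works as usual

# For objects that were created with assign(), get() retrieves them by name
assign(paste("df.old", i, sep = ""), df.test[[i]])
same <- get(paste("df.old", i, sep = ""))
```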

R loop to create multiple objects from equally as many existing objects

I've tried searching for the answer to this but am having trouble because I'm not sure I'm even searching for the right thing. Basically I would like to create a loop in R that creates multiple objects, each from a different object. For example, let's say I have 50 existing objects (matrix, data frame, graph etc.) and they are all named similarly (table1, table2...table50). Now I would like to create 50 new objects, let's say graph1...graph50. I'm having trouble with a loop because I don't know how to work with the names being strings. I've tried the assign function, but it isn't dynamic enough in the assignment argument. I would basically like something like this:
for (i in list(table names)){
graph "i" <- as.network(table "i")
}
I would also like to have this hold for objects assigned as some function of themselves, i.e. graph "i" <- somefunction(graph "i") etc.
Additionally if there is a more efficient way by all means I'm open to it. It seems like an easy task but I can't figure it out. Right now I'm literally just concatenating the statements in excel and pasting to R so it doesn't take too long, but it is a pain. Thank you.
I think you could use a nested loop to do what you're looking for: apply whatever transformations you want to each object in the input list and store the results in a new list whose names are derived from the input names (here "table" is replaced by "graph").
in_list <- list(table1 = iris,
table2 = EuStockMarkets)
out_list <- list()
for(i in 1:length(in_list)){
for(j in colnames(in_list[[i]])){
out_list[[ gsub("table", "graph", names(in_list)[i]) ]][[j]] <- summary(in_list[[i]][,j])
}
}
Hope this helps!
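If the objects already exist under names like table1, table2, ..., one possible alternative is to collect them into a named list with mget() and then use lapply(). In this sketch summary() and head() are stand-ins for as.network() or whatever function is actually needed, and only two tables are used to keep it short.

```r
# Two existing objects standing in for table1 ... table50
table1 <- iris
table2 <- as.data.frame(EuStockMarkets)

# Collect the objects into a named list by building their names as strings
tbl.names <- paste("table", 1:2, sep = "")
tables <- mget(tbl.names)

# An object as a function of itself: overwrite each element with a transformed copy
tables <- lapply(tables, head, n = 10)

# New objects from existing ones: apply a function to every element
graphs <- lapply(tables, summary)
names(graphs) <- sub("table", "graph", names(graphs))

names(graphs)   # "graph1" "graph2"
```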

Name new dataframes from character vectors - loop

I think this one is easy but I still can't figure it out and I really need help with this. I've looked everywhere but still couldn't find it.
Let's say I have this vector:
filenames <- c("fn1", "fn2", "fn3")
And I want to associate each of them with a dataframe that is created by a function at that time:
df|name from filenames[i]| <- df
so it would return these dataframes
dffn1
dffn2
dffn3
I hope I made myself clear. My problem is creating a new data frame and naming it according to a list or whatever, in a for loop.
You can use assign to achieve what you want.
for(nms in filenames){
assign(paste('df',nms,sep=''), df)
}
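A named list is a common alternative to assign() that keeps the data frames together and easy to loop over later. In this sketch make_df is a hypothetical stand-in for whatever function generates each data frame.

```r
filenames <- c("fn1", "fn2", "fn3")

# Hypothetical generator: replace with the function that really builds each data frame
make_df <- function(nm) data.frame(source = nm, value = 1:3)

# One data frame per name, stored in a list and named dffn1, dffn2, dffn3
dfs <- lapply(filenames, make_df)
names(dfs) <- paste("df", filenames, sep = "")

names(dfs)    # "dffn1" "dffn2" "dffn3"
dfs$dffn2     # use it like any other data frame
```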

Looping in R to create transformed variables

I have a dataset of 80 variables, and I want to loop through a subset of 50 of them and construct returns. I have a list of the names of the variables for which I want to construct returns, and am attempting to use the dplyr command mutate to construct the variables in a loop. Specifically my code is:
for (i in returnvars) {
alldta <- mutate(alldta,paste("r",i,sep="") = (i - lag(i,1))/lag(i,1))}
where returnvars is my list, and alldta is my dataset. When I run this code outside the loop with just one of the `i' values, it works fine. The code for that looks like this:
alldta <- mutate(alldta,rVar = (Var- lag(Var,1))/lag(Var,1))
However, when I run it in the loop (e.g., attempting to do the previous line of code 50 times for 50 different variables), I get the following error:
Error: unexpected '=' in:
"for (i in returnvars) {
alldta <- mutate(alldta,paste("r",i,sep="") ="
I am unsure why this issue is coming up. I have looked into a number of ways to try and do this, and have attempted solutions that use lapply as well, without success.
Any help would be much appreciated! If there is an easy way to do this with one of the apply commands as well, that would be great. I did not provide a dataset because my question is not data specific, I'm simply trying to understand, as a relative R beginner, how to construct many transformed variables at once and add them to my data frame.
EDIT: As per Frank's comment, I updated the code to the following:
for (i in returnvars) {
varname <- paste("r",i,sep="")
alldta <- mutate(alldta,varname = (i - lag(i,1))/lag(i,1))}
This fixes the previous error, but I am still not referencing the variable correctly, so I get the error
Error in "Var" - lag("Var", 1) :
non-numeric argument to binary operator
Which I assume is because R sees my variable name Var as a string, rather than as a variable. How would I correctly reference the variable in my dataset alldta? I tried get(i) and alldta$get(i), both without success.
I'm also still open to (and actively curious about), more R-style ways to do this entire process, as opposed to using a loop.
Using mutate inside a loop might not be a good idea either. I am not sure if mutate makes a copy of the data frame, but it's generally not good practice to grow a data frame inside a loop. Instead, create a separate data frame with the output and then name the columns based on your logic.
result = do.call(rbind,lapply(returnvars,function(i) {...}))
names(result) = paste("r",returnvars,sep="")
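A filled-in sketch of this idea in base R follows. It uses cbind rather than rbind so that each variable becomes a column, which is an assumption about what was intended. The toy data and the helper lag1 are hypothetical; lag1 is defined here because base R's stats::lag() does not shift a plain vector the way dplyr::lag() does.

```r
# Toy data; alldta and returnvars mirror the names used in the question
alldta <- data.frame(date = 1:4,
                     A = c(100, 110, 121, 121),
                     B = c(50, 25, 50, 100))
returnvars <- c("A", "B")

# Hypothetical helper: shift a vector down by one position, padding with NA
lag1 <- function(x) c(NA, x[-length(x)])

# Compute each return series from the column named by the string i
result <- as.data.frame(do.call(cbind, lapply(returnvars, function(i) {
  x <- alldta[[i]]
  (x - lag1(x)) / lag1(x)
})))
names(result) <- paste("r", returnvars, sep = "")

# Attach all new columns in one step instead of growing alldta in a loop
extended <- cbind(alldta, result)
extended$rA   # NA 0.1 0.1 0.0
```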
After playing around with this more, I discovered (thanks to Frank's suggestion), that the following works:
extended <- alldta # Make a copy of my dataset
for (i in returnvars) {
varname <- paste("r",i,sep="")
extended[[varname]] = (extended[[i]] - lag(extended[[i]],1))/lag(extended[[i]],1)}
This is still not very R-styled in that I am using a loop, but for a task that is only repeating about 50 times, this shouldn't be a large issue.

Executing for loop in R

I am pretty new to R and have a couple of questions about a loop I am attempting to execute. I will try to explain myself as best as possible regarding what I wish the loop to do.
for(i in c(1988:1999,2000:2006)){
yearerrors=NULL
binding=do.call("rbind.fill",x[grep(names(x), pattern ="1988.* 4._ data=")])
cmeans=lapply(binding[,2:ncol(binding)],mean)
datcmeans=as.data.frame(cmeans)
finvec=datcmeans[1,]
kk=0
result=RMSE2(yields[(kk+1):(kk+ncol(binding))],finvec)
kk=kk+ncol(binding)
yearerrors=c(result)
}
yearerrors
First I wish for the loop to iterate over file names of data.
Specifically over the years 1988-2006 in the place where 1988 is
placed right now in the binding statement. x is a list of data files
inputted into R and the 1988 is part of the file name. So, I have
file names starting with 1988,1989,...,2006.
yields is a numeric vector and I would like to input the indices of
the vector into the function RMSE2 as indicated in the loop. For
example, over the first iteration I wish for the indices 1 to the
number of columns in binding to be used. Then for the next iteration
I want the first index to be 1 more than what the previous iteration
ended with and continue to a number equal to the number of columns in the next binding
statement. I just don't know if what I have written will accomplish
this.
Finally, I wish to store each of these results in the vector
yearerrors and then access this vector afterwards.
Thanks so much in advance!
OK, there's a heck of a lot of guesswork here because the structure of your data is extremely unclear, and I have no idea what the RMSE2 function is (you've given no detail). Based on your question the other day, I'm going to assume that your data is in .csv files. I'm going to have a stab at your problem.
I would start by building the combined dataframe while reading the files in, not doing one then the other. Like so:
library(plyr) #mdply comes from the plyr package
#Set your working directory to the folder containing the .csv files
#I'm assuming they're all in the form "YEAR.something.csv" based on your pattern matching
filenames <- list.files(".", pattern="\\.csv$") #pattern is a regular expression, not a glob; if you only want to match a specific year then add it to the pattern
years <- gsub("([0-9]+).*", "\\1", filenames)
df <- mdply(filenames, read.csv)
df$year <- as.numeric(years[df$X1]) #Adds the year
#Your column mean dataframe didn't work for me
cmeans <- as.data.frame(t(colMeans(df[,2:ncol(df)])))
It then gets difficult to know what you're trying to achieve. Since your datcmeans is a one-row data.frame, datcmeans[1,] doesn't change anything. So if one row from a data frame (or a numeric vector) is the argument required by your RMSE2 function, you can just pass it datcmeans (cmeans in my example).
Your code from then on is pretty much indecipherable to me. Without knowing what yields looks like, or how RMSE2 works, it's pretty much impossible to help more.
If you're going to do a loop here, I'll say that setting kk=kk+ncol(binding) at the end of each iteration is not going to help you: since kk=0 is set inside the loop, kk is reset on every iteration and never carries the offset forward, which is, I'm guessing, not what you want. The same goes for yearerrors=NULL inside the loop, which throws away the earlier results. Here's my guess at what you need here (assuming looping is required).
yearerrors=vector("numeric", ncol(df)) #Create empty vector ahead of loop
for(i in 1:ncol(df)) {
yearerrors[i] <- RMSE2(yields[i:ncol(df)], finvec)
}
yearerrors
I honestly can't imagine a function that would work like this, but it seems the most logical adaptation of your code.
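For the running-index behaviour described in the question, one possible shape for the loop is sketched below. Everything in it is a stand-in: RMSE2 is assumed to be a root-mean-square error, col.means plays the role of the per-year column means (finvec), and yields is made-up data. The point is only that kk and yearerrors are created once, before the loop, so the offset and the results carry over between iterations.

```r
# Hypothetical stand-ins; the real RMSE2, yields and per-year data are not shown
RMSE2 <- function(obs, pred) sqrt(mean((obs - pred)^2))
col.means <- list("1988" = c(1, 2, 3), "1989" = c(4, 5), "1990" = c(6, 7, 8))
yields <- c(1, 2, 4, 4, 5, 6, 7, 10)

yearerrors <- numeric(length(col.means))   # created once, before the loop
names(yearerrors) <- names(col.means)
kk <- 0                                    # initialised once, before the loop

for (yr in names(col.means)) {
  finvec <- col.means[[yr]]
  idx <- (kk + 1):(kk + length(finvec))    # next block of indices into yields
  yearerrors[yr] <- RMSE2(yields[idx], finvec)
  kk <- kk + length(finvec)                # carry the offset to the next year
}

yearerrors
```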
