How do I change column names in list of data frames inside a function? - r

I know that the answer to "how to change names in a list of data frames" has been answered multiple times. However, I'm stuck trying to generate a function that can take any list as an argument and change all of the column names of all of the data frames in the list. I am working with a large number of .csv files, all of which will have the same 3 column names. I'm importing the files in groups as follows:
# Get a group of drying data data files, remove 1st column
files <- list.files('Mang_Run1', pattern = '*.csv', full = TRUE)
mr1 <- lapply(files, read.csv, skip = 1, header = TRUE, colClasses = c("NULL", NA, NA, NA))
I will have 6 such file groups. If I run the following code on a single list, the names of the columns in each data frame within the specified list will be changed correctly.
for (i in seq_along(mr1)) {
names(mr1[[i]]) <- c('Date_Time', 'Temp_F', 'RH')
}
However, if I try to generalize the function (see code below) to take any list as an argument, it does not work correctly.
nameChange <- function(ls) {
for (i in seq_along(ls)) {
names(ls[[i]]) <- c('Date_Time', 'Temp_F', 'RH')
}
return(ls)
}
When I call nameChange on mr1 (list generated from above), it prints the entire contents of the list to the console and does not change the names of the columns in the data frames within the list. I'm clearly missing something fundamental about the inner workings of R here. I've tried the above function with and without return, and have made several modifications to the code, none of which have proven successful. I'd greatly appreciate any help, and would really like to understand the 'why' behind the problem as well. I've had considerable trouble in the past handling functions that take lists as arguments.
Thanks very much in advance for any constructive input.

I think this might be a very simple fix:
First, generalize the function you are using to rename the columns. This only needs to work on one dataframe at a time.
renameFunction<-function(x,someNames){
names(x) <- someNames
return(x)
}
Now we need to define the names we want to change each column name to.
someNames <- c('Date_Time', 'Temp_F', 'RH')
Then we call the new function and apply it to every element of the "mr1" list.
lapply(mr1, renameFunction, someNames)
I may have gotten some of the details wrong with regards to your exact sitiuation, but I've used this method before to solve similar issues. Since you were able to get it to work on the specific case, I'm pretty sure this will generalize readily using lapply

Related

Renaming Columns with index with a For Loop in R

I am writing this post to ask for some advice for looping code to rename columns by index.
I have a data set that has scale item columns positioned next to each other. Unfortunately, they are oddly named.
I want to re-name each column in this format: SimRac1, SimRac2, SimRac3.... and so on. I know the location of the columns (Columns number 30 to 37). I know these scale items are ordered in such a way that they can be named and numbered in increased order from left to right.
The code I currently have works, but is not efficient. There are other scales, in different locations, that also need to be renamed in a similar fashion. This would result in dozens of code rows.
See below code.
names(Total)[30] <- "SimRac1"
names(Total)[31] <- "SimRac2"
names(Total)[32] <- "SimRac3"
names(Total)[33] <- "SimRac4"
names(Total)[34] <- "SimRac5"
names(Total)[35] <- "SimRac6"
names(Total)[36] <- "SimRac7"
names(Total)[37] <- "SimRac8"
I want to loop this code so that I only have a chunk of code that does the work.
I was thinking perhaps a "for loop" would help.
Hence, the below code
for (i in Total[,30:37]){
names(Total)[i] <- "SimRac(1:8)"
}
This, unfortunately does not work. This chunk of code runs without error, but it doesn't do anything.
Do advice.
In the OP's code, "SimRac(1:8)" is a constant. To have dynamic names, use paste0.
We do not need a loop here. We can use a vectorized function to create the names, then assign the names to a subset of names(Total)
names(Total)[30:37]<-paste0('SimRac', 1:8)

Name new dataframes from character vectors - loop

I think this one is easy but I still can't figure it out and I really need help with this. I've looked everywhere but still couldn't find it.
Let's say I have this vector:
filenames <- c("fn1", "fn2", "fn3")
And I want to associate them with an dataframe that is created according to a function, that is generated at that time
df|name from filenames[i]| <- df
so it would return these dataframes
dffn1
dffn2
dffn3
I hope I made myself clear. My problem is create a new data frame and name it according to a list or whatever, in a for loop.
You can use assign to achieve what you want.
for(nms in filenames){
assign(paste('df',nms,sep=''), df) }

Executing for loop in R

I am pretty new to R and have a couple of questions about a loop I am attemping to execute. I will try explain myself as best as possible reguarding what I wish the loop to do.
for(i in (1988:1999,2000:2006)){
yearerrors=NULL
binding=do.call("rbind.fill",x[grep(names(x), pattern ="1988.* 4._ data=")])
cmeans=lapply(binding[,2:ncol(binding)],mean)
datcmeans=as.data.frame(cmeans)
finvec=datcmeans[1,]
kk=0
result=RMSE2(yields[(kk+1):(kk+ncol(binding))],finvec)
kk=kk+ncol(binding)
yearerrors=c(result)
}
yearerrors
First I wish for the loop to iterate over file names of data.
Specifically over the years 1988-2006 in the place where 1988 is
placed right now in the binding statement. x is a list of data files
inputted into R and the 1988 is part of the file name. So, I have
file names starting with 1988,1989,...,2006.
yields is a numeric vector and I would like to input the indices of
the vector into the function RMSE2 as indicated in the loop. For
example, over the first iteration I wish for the indices 1 to the
number of columns in binding to be used. Then for the next iteration
I want the first index to be 1 more than what the previous iteration
ended with and continue to a number equal to the number of columns in the next binding
statement. I just don't know if what I have written will accomplish
this.
Finally, I wish to store each of these results in the vector
yearerrors and then access this vector afterwards.
Thanks so much in advance!
OK, there's a heck of a lot of guesswork here because the structure of your data is extremely unclear, I have no idea what the RMSE2 function is (and you've given no detail). Based on your question the other day, I'm going to assume that your data is in .csv files. I'm going to have a stab at your problem.
I would start by building the combined dataframe while reading the files in, not doing one then the other. Like so:
#Set your working directory to the folder containing the .csv files
#I'm assuming they're all in the form "YEAR.something.csv" based on your pattern matching
filenames <- list.files(".", pattern="*.csv") #if you only want to match a specific year then add it to the pattern match
years <- gsub("([0-9]+).*", "\\1", filenames)
df <- mdply(filenames, read.csv)
df$year <- as.numeric(years[df$X1]) #Adds the year
#Your column mean dataframe didn't work for me
cmeans <- as.data.frame(t(colMeans(df[,2:ncol(df)])))
It then gets difficult to know what you're trying to achieve. Since your datcmeans is a one row data.frame, datcmeans[1,] doesn't change anything. So if a one row from a dataframe (or a numeric vector) is an argument required for your RMSE2 function, you can just pass it datcmeans (cmeans in my example).
Your code from then is pretty much indecipherable to me. Without know what yields looks like, or how RMSE2 works, it's pretty much impossible to help more.
If you're going to do a loop here, I'll say that setting kk=kk+ncol(binding) at the end of the first iteration is not going to help you, since you've set kk=0, kk is not going to be equal to ncol(binding), which is, I'm guessing, not what you want. Here's my guess at what you need here (assuming looping is required).
yearerrors=vector("numeric", ncol(df)) #Create empty vector ahead of loop
for(i in 1:ncol(df)) {
yearerrors[i] <- RMSE2(yields[i:ncol(df)], finvec)
}
yearerrors
I honestly can't imagine a function that would work like this, but it seems the most logical adaption of your code.

returning different data frames in a function - R

Is it possible to return 4 different data frames from one function?
Scenario:
I am trying to read a file, parse it, and return some parts of the file.
My function looks something like this:
parseFile <- function(file){
carFile <- read.table(file, header=TRUE, sep="\t")
carNames <- carFile[1,]
carYear <- colnames(carFile)
return(list(carFile,carNames,carYear))
}
I don't want to have to use list(carFile,carNames,carYear). Is there a way return the 3 data frames without returning them in a list first?
R does not support multiple return values. You want to do something like:
foo = function(x,y){return(x+y,x-y)}
plus,minus = foo(10,4)
yeah? Well, you can't. You get an error that R cannot return multiple values.
You've already found the solution - put them in a list and then get the data frames from the list. This is efficient - there is no conversion or copying of the data frames from one block of memory to another.
This is also logical, the return from a function should conceptually be a single entity with some meaning that is transferred to whatever function is calling it. This meaning is also better conveyed if you name the returned values of the list.
You could use a technique to create multiple objects in the calling environment, but when you do that, kittens die.
Note in your example carYear isn't a data frame - its a character vector of column names.
There are other ways you could do that, if you really really want, in R.
assign('carFile',carFile,envir=parent.frame())
If you use that, then carFile will be created in the calling environment. As Spacedman indicated you can only return one thing from your function and the clean solution is to go for the list.
In addition, my personal opinion is that if you find yourself in such a situation, where you feel like you need to return multiple dataframes with one function, or do something that no one has ever done before, you should really revisit your approach. In most cases you could find a cleaner solution with an additional function perhaps, or with the recommended (i.e. list).
In other words the
envir=parent.frame()
will do the job, but as SpacedMan mentioned
when you do that, kittens die
The zeallot package does what you need in a similar that Python can unpack variables from a function. Reproducible example below.
parseFile <- function(){
carMPG <- mtcars$mpg
carName <- rownames(mtcars)
carCYL <- mtcars$cyl
return(list(carMPG,carName,carCYL))
}
library(zeallot)
c(myFile, myName, myYear) %<-% parseFile()

How to use a value that is specified in a function call as a "variable"

I am wondering if it is possible in R to use a value that is declared in a function call as a "variable" part of the function itself, similar to the functionality that is available in SAS IML.
Given something like this:
put.together <- function(suffix, numbers) {
new.suffix <<- as.data.frame(numbers)
return(new.suffix)
}
x <- c(seq(1000,1012, 1))
put.together(part.a, x)
new.part.a ##### does not exist!!
new.suffix ##### does exist
As it is written, the function returns a dataframe called new.suffix, as it should because that is what I'm asking it to do.
I would like to get a dataframe returned that is called new.part.a.
EDIT: Additional information was requested regarding the purpose of the analysis
The purpose of the question is to produce dataframes that will be sent to another function for analysis.
There exists a data bank where elements are organized into groups by number, and other people organize the groups
into a meaningful set.
Each group has an id number. I use the information supplied by others to put the groups together as they are specified.
For example, I would be given a set of id numbers like: part-1 = 102263, 102338, 202236, 302342, 902273, 102337, 402233.
So, part-1 has seven groups, each group having several elements.
I use the id numbers in a merge so that only the groups of interest are extracted from the large data bank.
The following is what I have for one set:
### all.possible.elements.bank <- .csv file from large database ###
id.part.1 <- as.data.frame(c(102263, 102338, 202236, 302342, 902273, 102337, 402233))
bank.names <- c("bank.id")
colnames(id.part.1) <- bank.names
part.sort <- matrix(seq(1,nrow(id.part.1),1))
sort.part.1 <- cbind(id.part.1, part.sort)
final.part.1 <- as.data.frame(merge(sort.part.1, all.possible.elements.bank,
by="bank.id", all.x=TRUE))
The process above is repeated many, many times.
I know that I could do this for all of the collections that I would pull together, but I thought I would be able to wrap the selection process into a function. The only things that would change would be the part numbers (part-1, part-2, etc..) and the groups that are selected out.
It is possible using the assign function (and possibly deparse and substitute), but it is strongly discouraged to do things like this. Why can't you just return the data frame and call the function like:
new.part.a <- put.together(x)
Which is the generally better approach.
If you really want to change things in the global environment then you may want a macro, see the defmacro function in the gtools package and most importantly read the document in the refrences section on the help page.
This is rarely something you should want to do... assigning to things out of the function environment can get you into all sorts of trouble.
However, you can do it using assign:
put.together <- function(suffix, numbers) {
assign(paste('new',
deparse(substitute(suffix)),
sep='.'),
as.data.frame(numbers),
envir=parent.env(environment()))
}
put.together(part.a, 1:20)
But like Greg said, its usually not necessary, and always dangerous if used incorrectly.

Resources