Accessing individual dataframes from a split function in R - r

I'm new to R and am trying to reorganise my data based on the sampleID.
I've used the split() function in R, which does exactly what I wanted and stores my information in new data.frames.
My question is: now that they are in separate data.frames, how do I access them individually for further processing?
My code is as follows:
splitList.list = list()
for (i in 1:31)
{
splitList.list[[i]] = split(chromList.list[[i]], chromList.list[[i]]$sampleID)
}
splitList.list[[1]]
I take the files I have (31 files), split them and store them in a list. This much works. I get an output that looks like this
This can be repeated with any of the list elements and works. I now want to do some processing separately on each data.frame, but I don't know how to access just one of them. Please help.
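A minimal sketch of how such a nested list can be indexed, assuming toy data in place of the 31 files. The names `chromList.list`, `splitList.list` and `sampleID` come from the question; the sample IDs `S1`/`S2` and the `value` column are made up for illustration. `split()` returns a named list of data frames, so each element of `splitList.list` is itself a list that can be indexed by position or by sample ID:

```r
# Toy stand-in for the real input: two "files", each with a sampleID column.
chromList.list <- list(
  data.frame(sampleID = c("S1", "S2", "S1"), value = c(1, 2, 3)),
  data.frame(sampleID = c("S2", "S2", "S1"), value = c(4, 5, 6))
)

# Same split as in the question, written with lapply instead of a for loop.
splitList.list <- lapply(chromList.list, function(df) split(df, df$sampleID))

# One data frame: file 1, sample "S1" (by name) or its first sample (by position).
one_df <- splitList.list[[1]][["S1"]]
same_df <- splitList.list[[1]][[1]]

# Process every data frame: outer lapply over files, inner lapply over samples.
sample_means <- lapply(splitList.list, function(per_file) {
  lapply(per_file, function(df) mean(df$value))
})
```

The first `[[ ]]` picks the file and the second picks the sample within that file. `names(splitList.list[[1]])` lists the sample IDs available for the first file.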

Related

Looping through list of dataframes in R

I am relatively new to R, and have the below problem:
I have a list of data frames in R which was generated through the lapply and cbind functions (the code below works fine):
res<-lapply(1:35,function(i){cbind(df1[i],df2[i],df3[i])})
This has generated a list of 35 data frames, each shown as list[72*3](S3: data.frame).
Next, what I want to do is save each of these data frames under a separate name. The names would be specific dates retrieved from an already stored list. The code for this is below:
for (i in 1:length(res)) {
a<-res[[i]]
for (j in as.list(Date.table)){
newname<-paste(j)
d<-data.frame(a)
names(d)<-c("RIC","MV","BVMV")
assign(newname,d)
}
}
While 35 data frames are generated with different dates as names, the data in all of them is the same, namely that of the last data frame.
Could somebody please point out the error in the code so I can resolve this? It is not saving each data frame, only the last one.
Many thanks!!!
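One likely cause is that the inner loop runs over every date for each value of i, so every date name is reassigned on each pass and all of them end up holding the last data frame. A minimal sketch of one way to pair data frame i with date i, assuming toy values for `res` and `Date.table` (the dates and numbers below are made up):

```r
# Toy stand-ins: three single-row data frames and three date labels (hypothetical values).
res <- lapply(1:3, function(i) data.frame(a = i, b = i * 10, c = i * 100))
Date.table <- c("2015-01-31", "2015-02-28", "2015-03-31")

# Rename the columns of every data frame, then label each one with its own date.
res_named <- lapply(res, function(d) setNames(d, c("RIC", "MV", "BVMV")))
names(res_named) <- as.character(Date.table)

# Each date now maps to a different data frame.
feb <- res_named[["2015-02-28"]]

# If separate workspace objects are really needed, pair index i with date i.
for (i in seq_along(res_named)) {
  assign(names(res_named)[i], res_named[[i]])
}
```

Keeping the data frames in one named list is usually easier to work with afterwards than 35 separate objects, since the list can still be passed to lapply.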

How do I change column names in list of data frames inside a function?

I know that the question "how to change names in a list of data frames" has been answered multiple times. However, I'm stuck trying to write a function that can take any list as an argument and change all of the column names of all of the data frames in the list. I am working with a large number of .csv files, all of which will have the same 3 column names. I'm importing the files in groups as follows:
# Get a group of drying data data files, remove 1st column
files <- list.files('Mang_Run1', pattern = '*.csv', full = TRUE)
mr1 <- lapply(files, read.csv, skip = 1, header = TRUE, colClasses = c("NULL", NA, NA, NA))
I will have 6 such file groups. If I run the following code on a single list, the names of the columns in each data frame within the specified list will be changed correctly.
for (i in seq_along(mr1)) {
names(mr1[[i]]) <- c('Date_Time', 'Temp_F', 'RH')
}
However, if I try to generalize the function (see code below) to take any list as an argument, it does not work correctly.
nameChange <- function(ls) {
for (i in seq_along(ls)) {
names(ls[[i]]) <- c('Date_Time', 'Temp_F', 'RH')
}
return(ls)
}
When I call nameChange on mr1 (list generated from above), it prints the entire contents of the list to the console and does not change the names of the columns in the data frames within the list. I'm clearly missing something fundamental about the inner workings of R here. I've tried the above function with and without return, and have made several modifications to the code, none of which have proven successful. I'd greatly appreciate any help, and would really like to understand the 'why' behind the problem as well. I've had considerable trouble in the past handling functions that take lists as arguments.
Thanks very much in advance for any constructive input.
I think this might be a very simple fix:
First, generalize the function you are using to rename the columns. This only needs to work on one dataframe at a time.
renameFunction<-function(x,someNames){
names(x) <- someNames
return(x)
}
Now we need to define the names we want to change each column name to.
someNames <- c('Date_Time', 'Temp_F', 'RH')
Then we call the new function and apply it to every element of the "mr1" list.
mr1 <- lapply(mr1, renameFunction, someNames)
I may have gotten some of the details wrong with regard to your exact situation, but I've used this method before to solve similar issues. Note that the result of lapply has to be assigned back to mr1, otherwise the renamed list is only printed and then discarded. Since you were able to get it to work on the specific case, I'm pretty sure this will generalize readily using lapply.
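As for the "why": a function in R works on its own copy of the argument and returns a value, so the caller's list only changes if the return value is assigned back. A minimal sketch using the `nameChange` function from the question, with a toy `mr1` (the `V1`..`V3` columns are made up for illustration):

```r
# Toy list standing in for mr1 (column names V1..V3 are hypothetical).
mr1 <- list(
  data.frame(V1 = 1:2, V2 = 3:4, V3 = 5:6),
  data.frame(V1 = 7:8, V2 = 9:10, V3 = 11:12)
)

nameChange <- function(ls) {
  for (i in seq_along(ls)) {
    names(ls[[i]]) <- c("Date_Time", "Temp_F", "RH")
  }
  ls  # the modified copy is returned; the caller's list is untouched
}

# Result discarded: mr1 keeps its old names.
invisible(nameChange(mr1))
after_discard <- names(mr1[[1]])

# Result assigned back: mr1 now holds the renamed data frames.
mr1 <- nameChange(mr1)
after_assign <- names(mr1[[1]])
```

Calling `nameChange(mr1)` on its own at the console prints the returned copy, which is why the whole list appears on screen while `mr1` itself stays the same.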

Assign names to existing data frames with a For loop

I have the following data frames. These already exist and have the same structure, but with different contents and row counts. I want to assign data frame FX_nyear to Astar in each iteration.
FX_3year
FX_4year
FX_5year
...
and I want to run some complex analysis etc. I do not want to use lapply. Just a simple For loop as shown below:
for(n in 3:n)
{ Astar = assign(paste("FX_",n,"year",sep="")) }
While I can get Astar named to "FX_3year" using only paste, I am having trouble setting Astar to the actual pre-existing data frame FX_3year.
I know this is a very basic question and variants of this have been asked in the past, but I cannot get it to work.
You can use get()
for(i in 3:n){
assign("Astar",get(paste("FX_",i,"year",sep="")))
}
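Since Astar is overwritten on every pass, any analysis has to happen inside the loop body. A minimal sketch, assuming toy versions of the FX data frames (the `rate` column and its values are made up) and a hypothetical `row_counts` vector that just records something per data frame:

```r
# Toy stand-ins for the pre-existing data frames (contents are made up).
FX_3year <- data.frame(rate = c(1.0, 1.2))
FX_4year <- data.frame(rate = c(1.1, 1.3, 1.5))
FX_5year <- data.frame(rate = c(0.9))

n <- 5
row_counts <- integer(0)
for (i in 3:n) {
  # get() looks up the object whose name is built by paste0().
  Astar <- get(paste0("FX_", i, "year"))
  # Any per-data-frame analysis goes here, while Astar holds this iteration's data.
  row_counts[paste0("FX_", i, "year")] <- nrow(Astar)
}
```

A plain assignment `Astar <- get(...)` does the same job as `assign("Astar", get(...))` when the target name is fixed.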

How to make loops in R that operate on and return multiple objects

This is my first post, and I think I have looked thoroughly for my answer with no luck, but I might not be typing in the right search terms, since I am relatively new to R. I apologize if this has been answered before and if it has a link would be greatly appreciated.
In essence, I am trying to make a loop that will operate on a set of data frames that I have read into R from .txt files using read.table. I am working with simulated vegetation data organized into many species by site matrices, so it would be best for me if I could create loops that will just operate on the objects I have read in using some functions I have made and then put out new objects into my workspace with a specific naming pattern (e.g. put "_av" on the end of the name of the object operated on when creating a new object).
For convenience's sake, let's say I have only four matrices I want to work with, all of which contain the phrase "mod" for model. I have read that I can put these data frames into a list of data frames with the following code:
list.mods=lapply(ls(pattern="mod"),get)
This does create a list, but I have been having trouble getting my functions to actually operate on it. From what I read, this is the best way to make a list of objects you want to operate on.
So let's say that list.mods is now my list of operable matrices: mod1, mod2, mod3, and mod4. Also, let's say I have a function that simply calculates Bray-Curtis dissimilarity as follows:
bc=function(x){
vegdist(x,method="bray")
}
I can use this by typing in:
mod1.bc=bc(mod1)
That works. But it seems like I should be able to apply my list of models to the function bc and have it output the models with a pattern mod1.bc, mod2.bc, mod3.bc, and mod4.bc. I cannot get my list of files to work in the function much less save each operation as a new object with a patterned name.
What am I doing wrong? In the end I might have as many as a hundred models or more and would really appreciate being able to create a list of items that I can run through loops.
Thanks in advance.
You can use lapply again:
new.list.mods <- lapply(list.mods, bc)
This will return a new list in which each element is the result of applying bc to the corresponding element of list.mods.
The 'apply' family of functions in R basically allows you to save typing. If that's easier for you to understand, you can use a 'for loop' instead. Of course you will need to know how to access elements in a list for that. There is a question about that.
How about collecting the names of the models/objects you want into a list:
mod_list <- sapply(ls(pattern = "mod"), as.name)
and then looping over them with your function:
output_list <- lapply(mod_list, function(nm) bc(eval(nm)))
With this approach you avoid creating the potentially large and redundant list.mods object in your example. Also, I think this will result in conveniently named lists.
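To get results carrying the patterned names from the question (mod1.bc, mod2.bc, ...), one option is mget(), which returns a named list. A minimal sketch under stated assumptions: the matrices are toy data, and `bc` here uses `stats::dist()` with Manhattan distance purely as a stand-in so the example runs without the vegan package; swap `vegdist(x, method = "bray")` back in for real use.

```r
# Toy site-by-species matrices (values are made up).
mod1 <- matrix(c(1, 2, 3, 4), nrow = 2)
mod2 <- matrix(c(2, 0, 1, 5), nrow = 2)

# Stand-in for the question's bc(): stats::dist() replaces vegan::vegdist()
# so the sketch runs without extra packages.
bc <- function(x) dist(x, method = "manhattan")

# mget() returns a named list, so the object names travel with the data.
mod_names <- ls(pattern = "^mod[0-9]+$")
list.mods <- mget(mod_names)

# Apply bc to every model and rename the results mod1.bc, mod2.bc, ...
bc_list <- lapply(list.mods, bc)
names(bc_list) <- paste0(names(bc_list), ".bc")
```

Individual results are then available as `bc_list[["mod1.bc"]]` or `bc_list$mod1.bc`, and the stricter pattern `"^mod[0-9]+$"` avoids picking up other objects whose names merely contain "mod".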

r create and address variable in for loop

I have multiple csv-files in one folder. I want to load each csv-file in this folder into one separate data frame. Next, I want to extract certain elements from this data frame into a matrix and calculate the mean of all these matrixes.
setwd("D:\\data")
group_1<-list.files()
a<-length(group_1)
mferg_mean<-data.frame
for(i in 1:a)
{
assign(paste0("mferg_",i),read.csv(group_1[i],header=FALSE,sep=";",quote="",dec=",",col.names=1:90))
}
As there are 11 csv-files in the folder I now have the data frames
mferg_1
to
mferg_11
How can I address each data frame in this loop? As mentioned, I want to extract certain elements from each data frame to a matrix. I would imagine it something like this:
assign(paste0("mferg_matrix_",i),mferg_i[1:5,1:10])
But this obviously does not work because R does not recognize mferg_i in the loop. How can I address this data frame?
This is probably not something you should be using assign for in the first place. Working with a bunch of different data.frames in R is a mess, but working with a list of data.frames is much easier. Try reading your data with
group_1<-list.files()
mferg <- lapply(group_1, function(filename) {
read.csv(filename,header=FALSE,sep=";",quote="",dec=",",col.names=1:90)
})
and you get each value with mferg[[1]], mferg[[2]], etc. And then you can create a list of extractions with
mferg_matrix <- lapply(mferg, function(x) x[1:5, 1:10])
This is the more R-like way to do things.
But technically you can use get to retrieve values like you use assign to create them. For example
assign(paste0("mferg_matrix_",i),get(paste0("mferg_",i))[1:5,1:10])
but again, this is probably not a smart strategy in the long run.
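For the last step in the question, the mean of all the extracted matrices, the list form makes an element-wise mean straightforward. A minimal sketch, assuming toy 2x2 data in place of the 11 CSV files (the values and the `1:2, 1:2` block are made up; the question uses `1:5, 1:10`):

```r
# Toy stand-ins for the list of imported data frames (values are made up).
mferg <- list(
  data.frame(V1 = c(1, 2), V2 = c(3, 4)),
  data.frame(V1 = c(3, 4), V2 = c(5, 6))
)

# Extract the same block from every data frame as a numeric matrix.
mferg_matrix <- lapply(mferg, function(x) as.matrix(x[1:2, 1:2]))

# Element-wise mean: add the matrices together, then divide by how many there are.
mferg_mean <- Reduce(`+`, mferg_matrix) / length(mferg_matrix)
```

This assumes every extracted block has the same dimensions and contains only numeric columns.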
