Looping through list of dataframes in R - r

I am relatively new to R and have the following problem:
I have a list of dataframes in R that was generated with lapply and the cbind function (code below; it works fine):
res<-lapply(1:35,function(i){cbind(df1[i],df2[i],df3[i])})
This generated a list of 35 dataframes, each shown as list[72*3](S3: data.frame), i.e. 72 rows and 3 columns.
Next, I want to save each of these dataframes under a separate name. The names are specific dates retrieved from an already stored list. Below is the code for it:
for (i in 1:length(res)) {
a<-res[[i]]
for (j in as.list(Date.table)){
newname<-paste(j)
d<-data.frame(a)
names(d)<-c("RIC","MV","BVMV")
assign(newname,d)
}
}
While 35 dataframes are being generated with different dates, the data in all of them is the same, i.e. that of the last dataframe.
Could somebody please point out the error in the code? It is essentially not saving each dataframe but saving only the last one.
Many thanks!!!
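The inner loop explains the symptom: for every i it runs over all dates in Date.table, so each pass of the outer loop overwrites every date-named variable with the current res[[i]], and after the last pass they all hold the last dataframe. Below is a minimal sketch of one way around this, assuming Date.table has one entry per element of res, in the same order. The toy data and the three example dates are made up for illustration.

```r
# Toy stand-ins for the asker's objects (hypothetical data, 3 frames instead of 35)
res <- lapply(1:3, function(i) data.frame(a = i, b = i * 10, c = i * 100))
Date.table <- c("2016-01-31", "2016-02-29", "2016-03-31")

# Rename the columns of every data frame, then name the list elements by date
res <- lapply(res, function(d) setNames(d, c("RIC", "MV", "BVMV")))
names(res) <- as.character(Date.table)

# Access one data frame by its date
res[["2016-02-29"]]

# If separate variables are really needed, pair res[[i]] with date i
# in a single loop instead of nesting a loop over all dates
for (i in seq_along(res)) {
  assign(names(res)[i], res[[i]])
}
```

Keeping the data frames in a named list is usually easier to work with than 35 separate variables, since the whole collection can still be processed with lapply.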

Related

Accessing individual dataframes from a split function in R

I'm new to R and am trying to reorganise my data based on the sampleID.
I've used the split() function in R, which does exactly what I wanted and stored my information in new data.frames.
My question is: now that they are in separate data.frames, how do I access them individually for further processing?
My code goes as follows
splitList.list = list()
for (i in 1:31)
{
splitList.list[[i]] = split(chromList.list[[i]], chromList.list[[i]]$sampleID)
}
splitList.list[[1]]
I take the files I have (31 files), split them and store them in a list. This much works. I get an output that looks like this
This can be repeated with any of the list elements and works. I now want to do some processing separately on each data.frame but don't know how to access just one of them. Please help.
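Since split() returns a list named by the values of sampleID, splitList.list is a list of lists, and a single data.frame can be reached with two levels of [[ ]]. A minimal sketch, using a small made-up chromList.list (the sample names s1, s2, s3 and the value column are hypothetical):

```r
# Hypothetical stand-in for chromList.list: two files, each with a sampleID column
chromList.list <- list(
  data.frame(sampleID = c("s1", "s2", "s1"), value = c(1, 2, 3)),
  data.frame(sampleID = c("s2", "s3"), value = c(4, 5))
)

# Same split as in the question, written with lapply
splitList.list <- lapply(chromList.list, function(df) split(df, df$sampleID))

# One data.frame: file 1, sample "s1"
one <- splitList.list[[1]][["s1"]]

# Process every data.frame of every file, e.g. the mean of value per sample
means <- lapply(splitList.list, function(per_file) {
  sapply(per_file, function(df) mean(df$value))
})
```

names(splitList.list[[1]]) lists the sample IDs available for the first file.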

Losing data frame cells in foreach loop

Similar questions have been posted, but I can't find one that actually addresses the problem I'm having, so sorry if this is not distinct enough.
I'm processing a for loop in parallel using doParallel and foreach. The core of my code is:
combinedOut <- foreach(i = 1:48, .combine=rbind) %dopar%
{
##function that builds a data frame row with 6 columns, adding different columns separately
##data frame is called out18
out18[i,]
}
When I run this as a for loop my output (out18) is correct, and in this form.
However, when I run it as a foreach, only the first and last columns contain the right values (referring to combinedOut here). I have no idea why it's only the middle four columns that are empty.
Essentially I want to copy the entire ith row of every foreach iteration and combine them all into one data frame at the end.
Thanks for any responses.

Change a date column in multiple data frames with one function

I know there are several questions regarding the "apply one function to multiple data frames" issue. However, I couldn't find a solution to my problem, but I think I got close to it using a solution from this question:
Same function over multiple data frames in R
I have 12 data frames with 4 columns each. The second column contains the date as an integer (e.g. 20161014, so %Y%m%d).
To get it into 2016-10-14 I used
TX_SOUID100758.txt[,2]<-as.Date(as.character(TX_SOUID100758.txt[,2]), "%Y%m%d")
Since I want to apply this function on all 15 data frames I tried
zch_filelist <- list.files(path=path, pattern="*.txt")
for (file in zch_filelist){
assign(file, read.csv(paste(path, file, sep=''),na.strings = -9999))
}
lapply(zch_filelist, function(x) (as.Date(as.character(x[2]), "%Y%m%d")))
I used the previously created list of file names when I imported the files into R.
However, it is not working. I guess the mistake is the indexing in the as.Date function.
Any help is greatly appreciated.
Thanks!
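Note that zch_filelist holds file names (character strings), not the data frames themselves, so inside the lapply x[2] is not a date column. One way to handle this is to keep the data frames in a list and convert the second column of each. The sketch below uses two small made-up data frames in place of the imported files; with real files, the list could be built with something like lapply(zch_filelist, function(f) read.csv(paste0(path, f), na.strings = -9999)). The helper name fix_date is hypothetical.

```r
# Hypothetical stand-ins for the imported files (two instead of twelve)
frames <- list(
  a.txt = data.frame(id = 1:2, date = c(20161014L, 20161015L)),
  b.txt = data.frame(id = 3:4, date = c(20170101L, NA))
)

# Convert column 2 of one data frame from integer yyyymmdd to Date
# and return the modified data frame
fix_date <- function(df) {
  df[[2]] <- as.Date(as.character(df[[2]]), format = "%Y%m%d")
  df
}

# Apply the conversion to every data frame in the list
frames <- lapply(frames, fix_date)
```

The function has to return the whole data frame; returning only the converted column would replace each data frame with a bare Date vector.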

r create and address variable in for loop

I have multiple csv-files in one folder. I want to load each csv-file in this folder into one separate data frame. Next, I want to extract certain elements from this data frame into a matrix and calculate the mean of all these matrixes.
setwd("D:\\data")
group_1<-list.files()
a<-length(group_1)
mferg_mean<-data.frame
for(i in 1:a)
{
assign(paste0("mferg_",i),read.csv(group_1[i],header=FALSE,sep=";",quote="",dec=",",col.names=1:90))
}
As there are 11 csv-files in the folder I now have the data frames
mferg_1
to
mferg_11
How can I address each data frame in this loop? As mentioned, I want to extract certain elements from each data frame to a matrix. I would imagine it something like this:
assign(paste0("mferg_matrix_",i),mferg_i[1:5,1:10])
But this obviously does not work because R does not recognize mferg_i in the loop. How can I address this data frame?
You probably should not be using assign for this in the first place. Working with a bunch of separate data.frames in R is a mess, but working with a list of data.frames is much easier. Try reading your data with
group_1<-list.files()
mferg <- lapply(group_1, function(filename) {
read.csv(filename,header=FALSE,sep=";",quote="",dec=",",col.names=1:90)
})
and you get each value with mferg[[1]], mferg[[2]], etc. And then you can create a list of extractions with
mferg_matrix <- lapply(mferg, function(x) x[1:5, 1:10])
This is the more R-like way to do things.
But technically you can use get to retrieve values like you use assign to create them. For example
assign(paste0("mferg_matrix_",i),get(paste0("mferg_",i))[1:5,1:10])
but again, this is probably not a smart strategy in the long run.
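For the last step in the question, the mean of all the extracted matrices, the list form also helps. A minimal sketch, assuming every extracted block is numeric and has the same dimensions; the toy mferg below is made up and stands in for the 11 imported files:

```r
# Hypothetical stand-in for mferg: three small data frames of numbers
mferg <- lapply(1:3, function(i) as.data.frame(matrix(i, nrow = 6, ncol = 12)))

# Extract the block of interest from each data frame as a numeric matrix
mferg_matrix <- lapply(mferg, function(x) as.matrix(x[1:5, 1:10]))

# Element-wise mean across all matrices: sum them, then divide by the count
mferg_mean <- Reduce(`+`, mferg_matrix) / length(mferg_matrix)
```

The result is a single 5 x 10 matrix whose cells are the averages of the corresponding cells across files.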

nested for loops in R to parse csv files?

Edit: I've corrected the typo in the code (copy and paste error). I can't add an example of the csv files, as it's too complex to model in a simple example (I tried..)
I've spent hours looking through similarly titled questions to solve a for loop problem in R, and have tried a lot of different approaches, but I'm having no luck.
I have many different csv files, each of which has a set of 10 separate strings (variables) identifying a specific row (e.g., names = c("Delta values", "Scream factor", "nightmare mode")). Two rows below such a string, I need the max value of that row of data. I can create a loop that scans each csv file for one such value using the following
test files-
test1.csv, test2.csv, test3.csv test4.csv
names<-list.files(pattern=".csv")
DF <- NULL
for (i in names){
dat <- read.csv(i, header=FALSE, stringsAsFactors=FALSE)
index <- which(dat=="Delta values", arr.ind=TRUE)
row=as.numeric(rownames(dat)[index[1]])
aver=dat[row+2,]
p=max(na.omit(as.numeric(aver)))
DF=rbind(DF, p)
colnames(DF)=dat[index]}
However, my problem comes in trying to generalize it. I want the returned data frame to identify each row by the file the value was retrieved from (not "p"), and I want to loop over the files to retrieve the next several variables while appending to the same data frame. The end result should be a data frame with one row per filename and each variable in a separate column.
I'm pretty sure I need a nested loop listing the values I want to retrieve as calculated by "p" but I can't find any good examples describing how to iteratively loop using such an approach, and append the new variables to the growing data frame while staying consistent with the row numbering by file.
please help!
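A nested loop can be written as two nested apply calls: the outer one over files, the inner one over the labels to look up. The sketch below is only an illustration under stated assumptions: each label occurs at least once per file, the first match is used, and the small data frames stand in for what read.csv(file, header=FALSE, stringsAsFactors=FALSE) would return. The helper name max_below and the toy contents are hypothetical.

```r
# Return the max of the row two below the first cell equal to `label`
max_below <- function(dat, label) {
  index <- which(dat == label, arr.ind = TRUE)
  if (nrow(index) == 0) return(NA_real_)
  values <- suppressWarnings(as.numeric(unlist(dat[index[1, "row"] + 2, ])))
  max(values, na.rm = TRUE)
}

# Hypothetical stand-ins for the parsed csv files
files <- list(
  test1.csv = data.frame(V1 = c("Delta values", "x", "1", "Scream factor", "y", "7"),
                         V2 = c("", "", "5", "", "", "2"),
                         stringsAsFactors = FALSE),
  test2.csv = data.frame(V1 = c("Scream factor", "y", "3", "Delta values", "x", "9"),
                         V2 = c("", "", "4", "", "", "8"),
                         stringsAsFactors = FALSE)
)
labels <- c("Delta values", "Scream factor")

# Outer loop over files, inner loop over labels; one named vector per file
rows <- lapply(files, function(dat) sapply(labels, function(l) max_below(dat, l)))

# Bind the per-file vectors into one data frame with a file column
DF <- data.frame(file = names(files), do.call(rbind, rows),
                 row.names = NULL, check.names = FALSE)
```

With real data, files could be built with lapply over list.files(), using the file names as the list names so they end up in the file column.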
