I have a script that takes the raw csv files in a folder, transforms the data as described in a function(filename) called "analyze", and prints the resulting values to the console. When I try to write.csv these values, I only get the last value the function returned. If there were a fixed number of files per folder, I would just run each specific csv file through the program, say [1:5], and lapply/set a matrix into write.csv. However, the directory could hold any number of files, so this will not work (I think?). How would I export an arbitrary number of function outputs to a csv file? My final steps after the function definition are listed below. They list all the files in the folder and apply the function "analyze" to each of them.
filename <- list.files(path = "VCDATA", pattern = ".csv", full.names = TRUE)
for (f in filename) {
print(f)
analyze(f)
}
Best,
Evan
It's hard to tell without a reproducible example, but I think you have to assign the output of analyze to a vector or a data frame (instead of printing it to the console).
Something along these lines:
filename <- list.files(path = "VCDATA", pattern = ".csv", full.names = TRUE)
results <- vector() #empty vector
for (f in filename) {
print(f)
results[which(filename==f)] <- analyze(f) #assign output vector
}
write.csv(results, file=xxx) #write csv file when loop is finished
I hope this answers your question, but it really depends on the format of the output of the analyze function.
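As a minimal sketch of the same idea for any number of files, assuming analyze() can return one row of values per file: the stand-in analyze() below, the column x, the demo folder, and the output name results.csv are all hypothetical and only there to make the example runnable.

```r
# Hypothetical stand-in for the asker's analyze(): one row of summary values per file
analyze <- function(f) {
  dat <- read.csv(f)
  data.frame(file = basename(f), n_rows = nrow(dat), mean_x = mean(dat$x))
}

# Made-up demo data in a temporary folder so the sketch runs on its own
demo_dir <- file.path(tempdir(), "VCDATA_demo")
unlink(demo_dir, recursive = TRUE)
dir.create(demo_dir)
write.csv(data.frame(x = 1:3), file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 4:7), file.path(demo_dir, "b.csv"), row.names = FALSE)

filename <- list.files(path = demo_dir, pattern = "\\.csv$", full.names = TRUE)

# lapply returns one list element per file, however many files there are;
# rbind then stacks the rows into a single data frame
results <- do.call(rbind, lapply(filename, analyze))

# Write everything once, after all files have been processed
out_file <- file.path(tempdir(), "results.csv")
write.csv(results, file = out_file, row.names = FALSE)
```

If analyze() returns something other than a one-row data frame, the rbind step would need to be adapted to that format.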
I've used a lot of posts to get me this far (such as here R list files with multiple conditions and here How can I read multiple files from multiple directories into R for processing?) but can't accomplish what I need in R.
I have many .csv files distributed in multiple subdirectories that I want to read in and then save as separate objects to the corresponding basename. The end result will be to rbind each of those files together. Here's sample dir structure and some of what I've tried:
./DATA/Cat_Animal/animal1.csv
./DATA/Dog_Animal/animal2.csv
./DATA/Dog_Animal/animal3.csv
./DATA/Dog_Animal/animal3.1.csv
#read in all csv files
files <- list.files(path="./DATA", pattern="*.csv", full.names=TRUE, recursive=TRUE)
But this returns every file in every subdirectory. I want to match specific files (animalX.csv) in specific subdirectories matching the pattern (X_Animal) such as this:
files <- dir(path=paste0("./DATA/", pattern="*+_Animal"), recursive=TRUE, full.names=TRUE, pattern="animal+.*csv")
Once I get my list of files, I want to read each of them in and save each to the corresponding file's basename. So the file named animal1.csv would be saved to animal1. I think I need to use the function basename() somewhere in a loop but I'm not sure how.
Help is very much appreciated. I've spent a lot of time trying out various options with little progress.
This question is really two questions; consider splitting them up. For the last part of your question, how to rbind a list full of data.frames together, try:
finalDf = do.call(rbind, result)
You'll likely need to use str_split() from the stringr package to extract the parts of the file path you need. You could also use str_extract() regular expressions.
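As a base-R alternative to the stringr route, here is a sketch that filters on the parent folder name and names each data frame after its file. The folder layout mirrors the question, but the temporary root folder, the extra "Other" folder, and the file contents are made up for the demo.

```r
# Recreate the question's layout in a temporary folder (demo data is made up)
root <- file.path(tempdir(), "DATA_demo")
unlink(root, recursive = TRUE)
for (p in c("Cat_Animal/animal1.csv", "Dog_Animal/animal2.csv",
            "Dog_Animal/animal3.csv", "Other/animal9.csv")) {
  dir.create(file.path(root, dirname(p)), recursive = TRUE, showWarnings = FALSE)
  write.csv(data.frame(id = 1:2, src = p), file.path(root, p), row.names = FALSE)
}

# All animal*.csv files, then keep only those whose parent folder ends in _Animal
files <- list.files(root, pattern = "^animal.*\\.csv$", recursive = TRUE, full.names = TRUE)
files <- files[grepl("_Animal$", basename(dirname(files)))]

# Read each file and name the list element after the basename without its extension
result <- lapply(files, read.csv)
names(result) <- tools::file_path_sans_ext(basename(files))

# Stack all data frames into one
finalDf <- do.call(rbind, result)
```

Keeping the data frames in a named list (result$animal1, result$animal2, ...) avoids creating one global object per file and feeds directly into do.call(rbind, result).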
I think I found a work-around for the short term because luckily I only have a few subdirectories currently.
myFiles1 <- list.files(path = "./DATA/Cat_Animal/", pattern="animal+.*csv")
processFile <- function(f) {
df <- read.csv(file = paste0("./DATA/Cat_Animal/", f ))
}
result1 <- lapply(myFiles1, processFile) # lapply keeps a list of data frames; sapply may simplify it
#then do it again for the next subdir:
myFiles2 <- list.files(path = "./DATA/Dog_Animal/", pattern="animal+.*csv")
processFile <- function(f) {
df <- read.csv(file = paste0("./DATA/Dog_Animal/", f ))
}
result2 <- lapply(myFiles2, processFile)
finalDf = do.call(rbind, c(result1, result2)) # do.call takes one list of arguments
I know there is a better way, but I can't figure out the pattern matching for the subdirectories! It's so easy in unix, for example.
You can simply do it in two steps.
a <- list.files(path="./DATA", pattern="*_Animal", full.names=T, recursive=F)
a
#[1] "./DATA/Cat_Animal" "./DATA/Dog_Animal"
files <- list.files(path=a, pattern="*animal*", full.names=T)
files
#[1] "./DATA/Cat_Animal/animal1.txt" "./DATA/Dog_Animal/animal2.txt" "./DATA/Dog_Animal/animal3.txt"
#[4] "./DATA/Dog_Animal/animal4.txt"
In the first step, make sure to use full.names = T and recursive = F. You need full.names = T to get the file path, not just the file name; otherwise you lose the path to animal*.csv in the second step. And recursive = T would return nothing, since Dog_Animal and Cat_Animal are folders, not files.
I would like to run an R script that I wrote on multiple folders, each of which contains a csv file and a text file.
The function I wrote takes the csv file and the text file and calculates a vector.
So basically the code I need would open every folder, take the csv file and the text file, and calculate the corresponding vector.
I thought about using list.files to get a list with the names of all folders and then using lapply to apply the function to every folder, but I don't know how to define the read.csv and read.table calls.
setwd("C:\\WD")
ptf = "C:\\PathtoFiles"
temp = list.files(path = ptf)
lapply(temp, exfunction)
exfunction = function() {
csvfile = read.csv("nameofile.csv")
textfile = read.table("nameoffile.txt", header=TRUE)
calcvec = vector(mode = "numeric", length = length(textfile))
#Code that calculates the vector
return(calcvec)
}
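One way this could be wired up, sketched under assumptions: the function takes the folder path as its argument, each folder holds exactly one .csv and one .txt file, and the calculation is a placeholder. The demo folder, file names, columns a and b, and the element-wise sum are all hypothetical.

```r
# Made-up demo: two folders, each with one csv file and one txt file
ptf <- file.path(tempdir(), "PathtoFiles_demo")
unlink(ptf, recursive = TRUE)
for (d in c("run1", "run2")) {
  dir.create(file.path(ptf, d), recursive = TRUE)
  write.csv(data.frame(a = 1:3), file.path(ptf, d, "values.csv"), row.names = FALSE)
  write.table(data.frame(b = c(10, 20, 30)), file.path(ptf, d, "notes.txt"), row.names = FALSE)
}

# The function receives one folder path, so lapply can pass each folder in turn
exfunction <- function(folder) {
  csvfile  <- read.csv(list.files(folder, pattern = "\\.csv$", full.names = TRUE)[1])
  textfile <- read.table(list.files(folder, pattern = "\\.txt$", full.names = TRUE)[1],
                         header = TRUE)
  # Placeholder calculation (hypothetical): element-wise sum of the two columns
  csvfile$a + textfile$b
}

folders <- list.dirs(ptf, recursive = FALSE)  # full paths of the immediate subfolders
vectors <- lapply(folders, exfunction)
names(vectors) <- basename(folders)
```

Looking the files up by extension inside the function means the file names do not have to be identical across folders.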
I am trying to loop through all the subfolders of my wd, list their names, open 'data.csv' in each of them and extract the second and last value from that csv file.
The df would look like this :
Name_folder_1 2nd value Last value
Name_folder_2 2nd value Last value
Name_folder_3 2nd value Last value
For now, I have managed to list the subfolders and each of the files (thanks to this thread: read multiple text files from multiple folders) but I struggle to implement (what I'm guessing should be) a nested loop to read and extract data from the csv files.
parent.folder <- "C:/Users/Desktop/test"
setwd(parent.folder)
sub.folders1 <- list.dirs(parent.folder, recursive = FALSE)
r.scripts <- file.path(sub.folders1)
files.v <- list()
for (j in seq_along(r.scripts)) {
files.v[j] <- dir(r.scripts[j],"data$")
}
Any hints would be greatly appreciated !
EDIT :
I'm trying the solution detailed below but there must be something I'm missing as it runs smoothly but does not produce anything. It might be something very silly, I'm new to R and the learning curve is making me dizzy :p
lapply(files, function(f) {
dat <- fread(f) # faster
dat2 <- c(basename(dirname(f)), head(dat$time, 1), tail(dat$time, 1))
write.csv(dat2, file = "test.csv")
})
Not easy to reproduce but here is my suggestion:
library(data.table)
files <- list.files("PARENTDIR", full.names = T, recursive = T, pattern = ".*.csv")
lapply(files, function(f) {
dat <- fread(f) # faster
# Do whatever, get the subfolder name for example
basename(dirname(f))
})
You can simply search recursively for all CSV files in your parent directory and still get each file's parent folder.
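To get from that loop to the table in the question, a sketch could collect one row per file and write the file once at the end. It assumes every data.csv has a column named time (taken from the asker's edit) and uses base read.csv instead of fread to avoid the extra dependency; the demo folder and values are made up.

```r
# Made-up demo data: three subfolders, each holding a data.csv with a 'time' column
parent <- file.path(tempdir(), "test_demo")
unlink(parent, recursive = TRUE)
for (d in c("Name_folder_1", "Name_folder_2", "Name_folder_3")) {
  dir.create(file.path(parent, d), recursive = TRUE)
  write.csv(data.frame(time = c(5, 6, 7, 8)), file.path(parent, d, "data.csv"),
            row.names = FALSE)
}

files <- list.files(parent, pattern = "^data\\.csv$", recursive = TRUE, full.names = TRUE)

# One single-row data frame per file: folder name, second value, last value
rows <- lapply(files, function(f) {
  dat <- read.csv(f)
  data.frame(folder = basename(dirname(f)),
             second = dat$time[2],
             last   = tail(dat$time, 1))
})
summary_df <- do.call(rbind, rows)

# Write once after the loop; writing inside the loop overwrites the file on every iteration
write.csv(summary_df, file.path(parent, "test.csv"), row.names = FALSE)
```

The value returned by lapply has to be assigned (here to rows); otherwise the results are only printed at the console and then lost.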
I have multiple csv files stored as:
m1.csv, m2.csv, ..., m50.csv. I would like to load each csv into R, analyze the data in the i-th file, and store the result as a variable: m'i'. I am trying to use a for loop but I'm not sure whether I can use one in this way. For example:
for (i in 1:100){
A<-as.matrix(read.csv("c:/Users/Desktop/m"i".csv))
...
#some analysis on A
...
m"i"<- #result of analysis on A
}
V<-cbind(m1,m2, .... ,m100)
Try this
filenames = list.files(getwd())
filenames = filenames[grepl("\\.csv$", filenames)] # keep only names ending in .csv
files = lapply(filenames, read.csv)
files = do.call(rbind,files)
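The snippet above stacks all files with rbind. For the per-file loop the question describes, a sketch could build each file name with paste0 and keep the results in a list instead of separate variables m1, m2, and so on. The demo folder, the three demo files, and the colMeans analysis are placeholders, not the asker's actual data or analysis.

```r
# Made-up demo files m1.csv ... m3.csv in a temporary folder
demo_dir <- file.path(tempdir(), "m_demo")
unlink(demo_dir, recursive = TRUE)
dir.create(demo_dir)
for (i in 1:3) {
  write.csv(data.frame(x = i * (1:4), y = 1:4),
            file.path(demo_dir, paste0("m", i, ".csv")), row.names = FALSE)
}

n <- 3
m <- vector("list", n)  # one slot per file instead of variables m1, m2, ...
for (i in seq_len(n)) {
  # paste0 glues the loop index into the file name: m1.csv, m2.csv, ...
  A <- as.matrix(read.csv(file.path(demo_dir, paste0("m", i, ".csv"))))
  m[[i]] <- colMeans(A)  # placeholder analysis (hypothetical)
}
names(m) <- paste0("m", seq_len(n))

# Equivalent to cbind(m[[1]], m[[2]], m[[3]]), but works for any n
V <- do.call(cbind, m)
```

Individual results remain accessible as m$m1 or m[["m2"]], and do.call(cbind, m) avoids typing out every name.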
I would like to execute anova on multiple datasets stored in my working directory. I have come up so far with:
files <- list.files(pattern = ".csv")
for (i in seq_along(files)) {
mydataset.i <- files[i]
AnovaModel.1 <- aov(DES ~ DOSE, data=mydataset.i)
summary(AnovaModel.1)
}
As you can see, I am very new to loops and cannot make this work. I also understand that I need to add code to append all summary outputs to one file. I would appreciate any help toward a working loop that runs an ANOVA on each of the .csv files in the directory (same headers) and produces outputs for the record.
You might want to use list.files with full.names = TRUE in case the files are not in your working directory.
files <- list.files("path_to_my_dir", pattern="*.csv", full.names = T)
# use lapply to loop over all files
out <- lapply(1:length(files), function(idx) {
# read the file
this.data <- read.csv(files[idx], header = TRUE) # choose TRUE/FALSE accordingly
aov.mod <- aov(DES ~ DOSE, data = this.data)
# if you want just the summary as object of summary.aov class
summary(aov.mod)
# if you require it as a matrix, comment the previous line and uncomment the one below
# as.matrix(summary(aov.mod)[[1]])
})
head(out)
This should give you a list with each entry of the list having a summary matrix in the same order as the input file list.
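If the summaries should end up in a single csv file, one possible sketch is to turn each ANOVA table into a data frame, tag it with the file name, and stack the results. The demo folder, the random demo data, and the output name anova_summaries.csv are assumptions; DES and DOSE are the column names from the question.

```r
# Made-up demo data: two csv files with the question's DES and DOSE columns
demo_dir <- file.path(tempdir(), "anova_demo")
unlink(demo_dir, recursive = TRUE)
dir.create(demo_dir)
set.seed(1)
for (nm in c("set1.csv", "set2.csv")) {
  d <- data.frame(DOSE = rep(c("low", "mid", "high"), each = 5), DES = rnorm(15))
  write.csv(d, file.path(demo_dir, nm), row.names = FALSE)
}

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

tables <- lapply(files, function(f) {
  this.data <- read.csv(f)
  # summary() of an aov fit is a list; its first element is the ANOVA table
  tab <- as.data.frame(summary(aov(DES ~ DOSE, data = this.data))[[1]])
  # Record which file and which model term each row belongs to
  data.frame(file = basename(f), term = trimws(rownames(tab)), tab,
             check.names = FALSE, row.names = NULL)
})
all_tables <- do.call(rbind, tables)

write.csv(all_tables, file.path(demo_dir, "anova_summaries.csv"), row.names = FALSE)
```

Each input file contributes two rows here (DOSE and Residuals), so the combined file can be filtered by the file column afterwards.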
Your error is that your loop never loads your data. Your list of file names is in "files"; you then move through that list and set mydataset.i to the name of the file that matches your iterator i... but then you try to run aov on the file name stored in mydataset.i!
The command you are looking for to redirect your output to a file is sink. Consider the following:
sink("FileOfResults.txt") #starting the redirect to the file
files <- list.files("path_to_my_dir", pattern="*.csv", full.names = T) #using the fuller code from Arun
for (i in seq_along(files)){
mydataset.i <- files[i]
mydataset.d <- read.csv(mydataset.i) #this line is new
AnovaModel.1 <- aov(DES ~ DOSE, data=mydataset.d) #this line is modified
print(summary(AnovaModel.1))
}
sink() #ending the redirect to the file
I prefer this approach to Arun's because the results are stored directly to the file without jumping through a list and then having to figure out how to store the list to a file in a readable fashion.