Storing files in CSV format in an R loop

I have many datasets in the same format in different directories, and I have a function for processing them.
I want to load all of my data, process it with my function, and then store the results in CSV files.
When I process a single dataset, the code looks like this:
ENFP_0719 <- f_preprocessing2("D:/DATA/output/ENFP_0719")
write.csv(ENFP_0719, "D:/DATA/output2/ENFP_0719.csv")
Everything is OK: the file ENFP_0719.csv is created correctly.
But when I try to use a loop, the code looks like this:
setwd("D:/DATA/output")
file_list <- list.files()
for (file in file_list){
file <- f_preprocessing2(print(eval(sprintf("D:/DATA/output/%s",file))))
print("Storing data to csv....")
setwd("D:/DATA/output2")
write.csv(file, sprintf("%s.csv",file))
}
I get this error:
[1] "D:/DATA/output/ENFP_0719"
[1] "Storing data to csv...."
Error in file(file, ifelse(append, "a", "w")) :
invalid 'description' argument
I have also tried using paste, i.e. paste('data', file, 'csv', sep = '.'), but I get the same error. I am confused by this error because there is nothing wrong with my function: as shown above, everything works when I process a single dataset.
So what's wrong with my code? Is the problem in my loop, or in the parameters I pass to write.csv?
I look forward to your help.
Thank you

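The error happens because file <- f_preprocessing2(...) overwrites the loop variable with the processed data, so sprintf("%s.csv", file) no longer produces a single file name for write.csv. I think you could make it a lot simpler by using the full.names argument to list.files and making a few other changes like this: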
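path <- 'data/output'
out_path <- 'data/output2'
dir.create(out_path, showWarnings = FALSE)         # write.csv will not create the folder
file_list <- list.files(path, full.names = TRUE)   # full paths, so no setwd() is needed
for (file in file_list) {
  file_proc <- f_preprocessing2(file)              # result kept separate from the loop variable
  new_path <- file.path(out_path, paste0(basename(file), '.csv'))
  write.csv(file_proc, new_path)
}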
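A self-contained sketch of the same pattern, runnable without the original data: it builds a throwaway copy of the D:/DATA layout under tempdir() and uses a hypothetical stand-in for f_preprocessing2() (the real function is not shown in the question), which here just counts the files in each dataset directory. What it illustrates is that the loop variable keeps the input path, the processed result goes into a separate variable, and the output name is derived with basename() and file.path().

```r
# Throwaway copy of the question's layout: one sub-directory per dataset
base    <- file.path(tempdir(), "DATA")
in_dir  <- file.path(base, "output")
out_dir <- file.path(base, "output2")
for (d in c("ENFP_0719", "ENFP_0720")) {
  dir.create(file.path(in_dir, d), recursive = TRUE, showWarnings = FALSE)
  writeLines("x", file.path(in_dir, d, "part1.txt"))
}
dir.create(out_dir, showWarnings = FALSE)

# Hypothetical stand-in for f_preprocessing2(): the real function is not shown
f_preprocessing2 <- function(path) {
  data.frame(dataset = basename(path), n_files = length(list.files(path)))
}

for (path in list.files(in_dir, full.names = TRUE)) {
  result <- f_preprocessing2(path)   # `path` still holds the input path afterwards
  out_file <- file.path(out_dir, paste0(basename(path), ".csv"))
  write.csv(result, out_file, row.names = FALSE)
}
```

After the loop, out_dir contains ENFP_0719.csv and ENFP_0720.csv, mirroring what the single-dataset version produced by hand.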

Related

How to convert and export a list of xlsx files to csv files in R

I'm attempting to convert a large number of .xlsx files to .csv, while also specifying a new folder or directory for them to be placed in. Specifically, I want to create a new folder in my working directory to house the newly-converted .csv files.
Based on previous examples, I have managed to complete the conversion part using the following code:
library(xlsx)  # provides read.xlsx()
setwd("~/Myfolder")
files.to.read = list.files(pattern="xlsx")
lapply(files.to.read, function(f) {
df = read.xlsx(f, sheetIndex=1)
write.csv(df, gsub("xlsx", "csv", f), row.names=FALSE)})
This successfully converts all .xlsx files to .csv in my original working directory. However, what I want is to create a new subfolder within that directory and place the .csv files in it. I know the answer likely involves adding either dir.create() or file.path() to the write.csv() command. However, when I use either of them, I get the following error:
Error in file(file, ifelse(append, "a", "w")) : invalid 'open' argument
It's hard to know without a reproducible example. What happens if you try to do read.xlsx(files.to.read[1], sheetIndex=1)?
If that works, you are quite close.
dir.create("your_folder_name")
files.to.read = list.files(pattern="xlsx")
lapply(files.to.read, function(f) {
df = read.xlsx(f, sheetIndex=1)
# Make the new filename here
new_filename = file.path(getwd(), "your_folder_name", gsub("xlsx", "csv", f))
write.csv(df, new_filename , row.names=FALSE)
# provide some feedback
print(paste("Writing", new_filename))
}
)
It might be that your list.files() command is having trouble.
If that fails, try:
# Mind the full.names=TRUE to get the full path
files.to.read = list.files(pattern="xlsx", full.names=TRUE)
Then get rid of the new_filename line. You won't need to create it via file.path; just use the gsub command as you were doing.
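Whichever way the file list is built, the output path can be derived the same way: take basename() of the input so any directory part (including the ./ prefix that full.names = TRUE adds) is dropped, swap only a trailing .xlsx for .csv, and join the result to the new folder with file.path(). A minimal sketch; csv_path_for() is a hypothetical helper name, and reading the workbook itself (read.xlsx() from the xlsx package) is left out so the example runs without that package:

```r
# Hypothetical helper: map an .xlsx path to "<out_dir>/<name>.csv"
csv_path_for <- function(xlsx_path, out_dir) {
  csv_name <- sub("\\.xlsx$", ".csv", basename(xlsx_path), ignore.case = TRUE)
  file.path(out_dir, csv_name)
}

out_dir <- file.path(tempdir(), "your_folder_name")
dir.create(out_dir, showWarnings = FALSE)   # no warning if it already exists

csv_path_for("./sales_xlsx_2020.xlsx", out_dir)

# Inside the question's lapply() this would become, for example:
# write.csv(read.xlsx(f, sheetIndex = 1), csv_path_for(f, out_dir), row.names = FALSE)
```

Anchoring the pattern as \\.xlsx$ matters: gsub("xlsx", "csv", ...) would also rewrite a name such as sales_xlsx_2020.xlsx to sales_csv_2020.csv.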

Looping over a set of standardized files to collect information and save it in different files

I have several files in a folder. They all have the same layout, and I have extracted the information I want from them.
Now, for each file, I want to write a .csv file named after the original input file, with "_output" added to the name.
However, I don't want to repeat this process manually for each file; I want to loop over them. I looked for help online and found lots of great tips, including many here.
Here's what I tried:
#Set directory
dir = setwd("D:/FRhData/elb") #set directory
filelist = list.files(dir) #save file names into filelist
myfile = matrix()
#Read files into R
for ( i in 1:length(filelist)){
myfile[i] = readLines(filelist[i])
*code with all calculations*
write.csv(x = finalDF, file = paste (filename[i] ,"_output. csv")
}
Unfortunately, it didn't work out. Here's the error message I get:
Error in as.character(x) :
cannot coerce type 'closure' to vector of type 'character'
In addition: Warning message:
In myfile[i] <- readLines(filelist[i]) :
number of items to replace is not a multiple of replacement length
And 'report2016-03.txt' is the name of the first file the code should be executed on.
Does anyone know how to correct this mistake, or any other mistakes you can foresee?
Thanks a lot.
======================================================================
Here are some of the resources I used:
https://www.r-bloggers.com/looping-through-files/
How to iterate over file names in a R script?
Looping through files in R
Loop in R loading files
How to loop through a folder of CSV files in R
This worked for me. I used a vector instead of a matrix, took out the readLines() call, and used paste0 since no separator is wanted (paste() inserts a space between its arguments by default).
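dir <- "C:/R_projects"                # folder holding the input files
setwd(dir)                            # setwd() returns the *previous* directory, so don't store its result
filelist <- list.files(dir)           # save file names into filelist
myfile <- vector()
finalDF <- data.frame(a = 3, b = 2)   # placeholder for the real results
# Write one output file per input file name
for (i in seq_along(filelist)) {
  myfile[i] <- filelist[i]
  write.csv(x = finalDF, file = paste0(myfile[i], "_output.csv"))
}
list.files(dir)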
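Because myfile[i] is the full file name, paste0() produces names such as report2016-03.txt_output.csv. If the goal is report2016-03_output.csv, the extension can be stripped with tools::file_path_sans_ext(). The sketch below also keeps readLines() output in its own variable: readLines() returns one element per line, which is why assigning it to the single slot myfile[i] triggered the "number of items to replace" warning. The folder, the two report files and the finalDF placeholder are made up for the demo:

```r
# Throwaway input folder with two report files, standing in for D:/FRhData/elb
in_dir  <- file.path(tempdir(), "elb")
out_dir <- file.path(in_dir, "output")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
writeLines(c("line 1", "line 2"), file.path(in_dir, "report2016-03.txt"))
writeLines("line 1", file.path(in_dir, "report2016-04.txt"))

for (path in list.files(in_dir, pattern = "\\.txt$", full.names = TRUE)) {
  lines <- readLines(path)   # one element per line, so keep it in its own variable
  # Placeholder for "*code with all calculations*"
  finalDF <- data.frame(file = basename(path), n_lines = length(lines))
  stem <- tools::file_path_sans_ext(basename(path))   # e.g. "report2016-03"
  write.csv(finalDF, file.path(out_dir, paste0(stem, "_output.csv")), row.names = FALSE)
}
```

Writing into a separate output subfolder also keeps the new .csv files out of the next list.files() call on the input folder.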

R: Exception handling with try()

I have a list of 15 files stored in an object FILELIST. The task is to read all the files in FILELIST from a particular directory and append them one below another.
In the code below, the object called 'dataset' holds the final appended data. The issue I am facing is that if one or more files in FILELIST are not present in the directory, I get the error below. What I need is for the code to carry on appending the rest of the files if one or more of the 15 files are missing.
I have tried exception handling with try, but I still get the error below and the code doesn't process the rest of the files.
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'PREDICTION_2016_Q4_Wk13.csv': No such file or directory
Code:
for (file in FILELIST) {
try(
if (!exists("dataset")) {
dataset <- read.table(file, header=TRUE, sep=",")
}
if (exists("dataset")) {
temp_dataset <-read.table(file, header=TRUE, sep=",")
dataset<-rbind(dataset, temp_dataset)
rm(temp_dataset)
},
silent = T
)
}
I would not use exception handling for this. Instead, loop only over the files that actually exist, starting the loop like this:
for (file in intersect(FILELIST, list.files())) {
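Completing that loop is a matter of putting the read-and-append step inside it. A minimal sketch, assuming the files live in a folder data_dir rather than the working directory (so file.path() is used to open them); the folder and the file names a.csv, b.csv and missing.csv are made up for the demo:

```r
# Throwaway folder standing in for the real data directory
data_dir <- file.path(tempdir(), "predictions")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(data_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:4), file.path(data_dir, "b.csv"), row.names = FALSE)
FILELIST <- c("a.csv", "missing.csv", "b.csv")   # one entry does not exist

dataset <- NULL
for (file in intersect(FILELIST, list.files(data_dir))) {
  temp_dataset <- read.table(file.path(data_dir, file), header = TRUE, sep = ",")
  dataset <- rbind(dataset, temp_dataset)   # rbind(NULL, df) just returns df
}
```

Starting from dataset <- NULL removes the need for the two exists() checks: the first file is appended to nothing, and each later file is appended once.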
A combination of the two other answers, using readr and dplyr for speed:
library(dplyr)
library(readr)
# existing files
f <- intersect(FILELIST, list.files())
# or identically:
# f <- intersect(FILELIST, dir())
# f <- FILELIST[ file.exists(FILELIST) ]
# combine in a single dataset
d <- bind_rows(lapply(f, read_csv))
First use file.exists and Filter to reduce FILELIST to the files that exist, then read each one and rbind them together at the end.
Note that this works both when FILELIST contains file names from the current directory and when the files are located elsewhere and FILELIST holds their paths.
No packages are used.
do.call("rbind", lapply(Filter(file.exists, FILELIST), read.csv))
Update: Improved code.
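If exception handling is still wanted, as the question attempted, one option is to wrap only the read in tryCatch(), return NULL when it fails, and drop the NULLs before combining. A minimal sketch with a throwaway directory under tempdir() and made-up file names; the read_or_null() helper is hypothetical, and because only the error is handled, the "cannot open file" warning from file() may still be printed:

```r
data_dir <- file.path(tempdir(), "try_demo")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(data_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:4), file.path(data_dir, "b.csv"), row.names = FALSE)
FILELIST <- c("a.csv", "missing.csv", "b.csv")

# Return the file's data, or NULL (with a message) if it cannot be read
read_or_null <- function(file, dir) {
  tryCatch(
    read.table(file.path(dir, file), header = TRUE, sep = ","),
    error = function(e) {
      message("Skipping ", file, ": ", conditionMessage(e))
      NULL
    }
  )
}

pieces  <- lapply(FILELIST, read_or_null, dir = data_dir)
dataset <- do.call(rbind, Filter(Negate(is.null), pieces))
```

Compared with filtering on file.exists() first, this version also survives files that exist but cannot be parsed, at the cost of hiding why they failed unless the messages are read.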

import multiple txt files into R

I am working with MODIS 8-day data and am trying to import all the txt files of one MODIS product into R, not as one single data.frame but as individual txt files, so that I can later apply the same functions to them. The main objective is to extract specific elements from each txt file. I succeeded in extracting the desired elements from one txt file with the following commands:
# selecting the element within the table
idxs <- gsub("\\]",")", gsub("\\[", "c(", "[24,175], [47,977], [159,520], [163,530]
,[165,721], [168,56], [217,820],[243,397],[252,991],[284,277],[292,673]
,[322,775], [369,832], [396,872], [434,986],[521,563],[522,717],[604,554]
,[608,50],[614,69],[752,213],[780,535],[786,898],[788,1008],[853,1159],[1014,785],[1078,1070]") )
lst <- rbind( c(24,175), c(47,977), c(159,520), c(163,530) ,c(165,721), c(168,56), c(217,820),c(243,397),c(252,991),c(284,277),c(292,673),c(322,775), c(369,832), c(396,872), c(434,986),c(521,563),c(522,717),c(604,554),c(608,50),c(614,69),c(752,213),c(780,535),c(786,898),c(788,1008),c(853,1159),c(1014,785),c(1078,1070))
mat <- matrix(scan("lst.txt",skip = 6),nrow=1200)
Clist <- as.data.frame(mat[lst])
But I need these elements from all of the txt files, and honestly I do not want to run this manually 871 times. So I tried to read all the txt files and then apply this code to them, but unfortunately it does not work. Here is my approach:
folder <- "C:/Users/Documents/R/MODIS/txt/"
txt_files <- list.files(path=folder, pattern=".txt")
df= c(rep(data.frame(), length(txt_files)))
for(i in 1:length(txt_files)) {df[[i]]<- as.list(read.table(txt_files[i]))}
This is the error I encounter:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'rastert_a2001361.txt': No such file or directory
Additional information: each txt file has 1200 rows and 1200 columns, and 20-30 elements need to be extracted from each table.
I look forward to your answers and appreciate any help or recommendations.
The issue is that list.files returns only the file names within the folder, not the full paths to the files. If your working directory is not "C:/Users/Documents/R/MODIS/txt/", read.table cannot find the files. Change your code to
for(i in 1:length(txt_files)) {df[[i]]<- as.list(read.table(file.path(folder, txt_files[i])))}
Now it should work.
file.path joins your folder and file name with R's path separator (.Platform$file.sep, which is "/" on all supported platforms, including Windows).
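Putting the pieces together, the per-file extraction can be wrapped in a function and applied with lapply() over full paths, giving one result per txt file. A minimal sketch on small synthetic 5 x 5 grids instead of the real 1200 x 1200 files; extract_elements() is a hypothetical name, and its skip argument stands in for the skip = 6 header lines of the real files. byrow = TRUE rebuilds these synthetic grids as written; drop it if your existing single-file code already gives the values you expect.

```r
# Throwaway 5 x 5 grids standing in for the 1200 x 1200 MODIS txt files
folder <- file.path(tempdir(), "modis_txt")
dir.create(folder, showWarnings = FALSE)
for (k in 1:2) {
  grid <- matrix(1:25 + 100 * k, nrow = 5)
  write.table(grid, file.path(folder, sprintf("raster_%d.txt", k)),
              row.names = FALSE, col.names = FALSE)
}

lst <- rbind(c(1, 2), c(4, 5))   # (row, column) pairs to pull out

# Read one grid and return the requested elements; `skip` covers header lines
extract_elements <- function(path, idx, nrow, skip = 0) {
  values <- scan(path, skip = skip, quiet = TRUE)
  mat <- matrix(values, nrow = nrow, byrow = TRUE)   # these files are written row by row
  mat[idx]                                           # matrix indexing by (row, col) pairs
}

txt_files <- list.files(folder, pattern = "\\.txt$", full.names = TRUE)
results <- lapply(txt_files, extract_elements, idx = lst, nrow = 5)
names(results) <- basename(txt_files)
```

For the real data the call would look like lapply(txt_files, extract_elements, idx = lst, nrow = 1200, skip = 6).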

Passing directory path as parameter in R

I have a simple function in R that runs summary() via lapply() on many CSVs from one directory that I specify. The function is shown below:
# id -- the file name (i.e. 001.csv) so ID == 001.
# directory -- location of the CSV files (not my working directory)
# summarize -- boolean val if summary of the CSV to be output to console.
getMonitor <- function(id, dir, summarize = FALSE)
{
fl <- list.files(dir, pattern = "*.csv", full.names = FALSE)
fdl <- lapply(fl, read.csv)
dataSummary <- lapply(fdl, summary)
if(summarize == TRUE)
{ dataSummary[[id]] }
}
When I try to specify the directory and then pass it as a parameter to the function like so:
dir <- "C:\\Users\\ST\\My Documents\\R\\specdata"
funcVar <- getMonitor("001", dir, FALSE)
I receive the error:
Error in file(file, "rt") : cannot open the connection. In addition: Warning message:
In file(file, "rt") : cannot open file '001.csv': No such file or directory
Yet when I run the code below on its own:
fl <- list.files("C:\\Users\\ST\\My Documents\\R\\specdata",
pattern = "*.csv",
full.names = FALSE)
fl[1]
It finds the directory I'm pointing to, and fl[1] correctly outputs [1] "001.csv", which is the first file listed.
My question is what am I doing wrong when trying to pass this path variable as a parameter to my function. Is R incapable of handling a parameter this way? Is there something I'm just completely missing? I've tried searching around and am familiar with other programming languages so, frankly, I feel kind of stupid/defeated for getting stuck on this right now.
You're passing fl[1] directly to read.csv without the qualifying path, so read.csv looks for 001.csv in the working directory. If, instead, you use full.names=TRUE, you'll get the full path and your read.csv step will work properly. However, you'll have to do a little munging to make your if statement work again.
You could also expand on your lapply function to paste the directory and file name together:
fdl <- lapply(fl, function(x) read.csv(paste(dir, x, sep='\\')))
Or create this pasted full path in a separate line:
fl.qualified <- paste(dir, fl, sep='\\')
fdl <- lapply(fl.qualified, read.csv)
When you do the paste step, if you want to be really explicit, I would encourage a regex to make sure you don't have someone passing a directory with a trailing slash:
fl.qualified <- paste(gsub('\\\\$', '', dir), fl, sep = '\\')
or something along those lines.
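One way to do that munging is to switch to full.names = TRUE and name each list element after its file stem, so that dataSummary[["001"]] finds 001.csv. A minimal sketch under a few assumptions: the names come from tools::file_path_sans_ext(), the pattern is the regex "\\.csv$" rather than the glob-style "*.csv", and the function always returns the summary invisibly (printing it when summarize is TRUE), which differs slightly from the original's NULL return. The specdata folder and its two small CSVs are made up for the demo:

```r
getMonitor <- function(id, dir, summarize = FALSE) {
  # full.names = TRUE returns paths that read.csv can open from any working directory
  fl <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  fdl <- lapply(fl, read.csv)
  # Name each element by file stem ("001.csv" -> "001") so [[id]] can find it
  names(fdl) <- tools::file_path_sans_ext(basename(fl))
  dataSummary <- lapply(fdl, summary)
  if (summarize) print(dataSummary[[id]])
  invisible(dataSummary[[id]])
}

# Usage with a throwaway directory standing in for specdata
specdata <- file.path(tempdir(), "specdata")
dir.create(specdata, showWarnings = FALSE)
write.csv(data.frame(sulfate = c(1.5, 2.5), nitrate = c(0.1, 0.3)),
          file.path(specdata, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = 9, nitrate = 1),
          file.path(specdata, "002.csv"), row.names = FALSE)
funcVar <- getMonitor("001", specdata)
```

Letting list.files() build the paths also sidesteps the trailing-separator question, since no manual paste() with '\\' is involved.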

Resources