Prompting user for multiple input files in R

I'm trying to do something that I think should be straightforward, but so far I've been unable to figure it out (not surprisingly, I'm a noob)...
I would like to be able to prompt a user for input file(s) in R. I've successfully used file.choose() to get a single file, but I would like to have the option of selecting more than one file at a time.
I'm trying to write a program that reads in daily data files that all share the same header and appends them into one large monthly file. I can do it in the console by importing the files individually and then using rbind(file1, file2,...), but I need a script to automate the process. The number of files to append will not necessarily be constant between runs.
Thanks
Update: Here is the code I came up with that works for me; maybe it will be helpful to someone else as well.
library(tcltk)
File.names <- tk_choose.files() # Prompts user for files to be combined
Num.Files <- length(File.names) # Gets number of files selected by user
# Create one large file by combining all files
Combined.file <- read.delim(File.names[1], header=TRUE, skip=2) # read in first file of list selected by user
for(i in seq_len(Num.Files)[-1]){ # files 2..n; does nothing if only one file was selected
  temp <- read.delim(File.names[i], header=TRUE, skip=2) # temporary file reads in next file
  Combined.file <- rbind(Combined.file, temp) # appends Combined file with the last file read in
}
output.dir <- dirname(File.names[1]) # Finds directory of the files that were selected
setwd(output.dir) # Changes directory so output file is in same directory as input files
output <- readline(prompt = "Output Filename: ") # Prompts user for output file name
outfile.name <- paste(output, ".txt", sep="")
write.table(Combined.file, file=outfile.name, sep="\t", col.names=TRUE, row.names=FALSE) # write tab delimited text file in same dir that original files are in

Have you tried ?choose.files
Use a Windows file dialog to choose a list of zero or more files interactively.
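Since choose.files() only exists on Windows, a script that has to run elsewhere needs a fallback. Here is a minimal sketch of one way to wrap that; pick_files is a hypothetical helper name and the caption text is made up, while choose.files() (utils, Windows only) and tk_choose.files() (tcltk) are the two dialogs already mentioned in this thread.

```r
# Hypothetical helper: open whichever multi-file dialog is available.
pick_files <- function() {
  if (.Platform$OS.type == "windows") {
    # Native Windows dialog; returns a character vector (possibly empty)
    utils::choose.files(caption = "Select daily files")
  } else {
    # Tcl/Tk dialog; needs R built with Tcl/Tk support
    tcltk::tk_choose.files(caption = "Select daily files")
  }
}

# Only prompt when a user is actually present
if (interactive()) {
  File.names <- pick_files()
  print(File.names)
}
```

Both dialogs return a character vector of paths, so the rest of the script should not need to care which one was used.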

If you are willing to type each file name, why not just loop over all the files like this:
filenames <- c("file1", "file2", "file3")
filecontents <- lapply(filenames, function(fname) read.delim(fname)) # read.delim is only an example; use whichever reader fits your files
bigfile <- do.call(rbind, filecontents)
If your code must be interactive, you can use the readline function in a loop that will stop asking for more files when the user inputs an empty line:
getFilenames <- function() {
  filenames <- character(0) # character vector rather than a list
  x <- readline("Filename: ")
  while (x != "") {
    filenames <- c(filenames, x)
    x <- readline("Filename: ")
  }
  filenames
}
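To tie the two pieces of this answer together, here is a rough sketch of the lapply/do.call pattern applied to a vector of file names. combine_files is a hypothetical name, and read.delim is assumed as the reader because that is what the question uses; the demo builds two small temporary files instead of prompting, so it can be run as is.

```r
# Hypothetical helper: read every file and stack the results row-wise.
# Extra arguments (e.g. skip = 2) are passed through to read.delim().
combine_files <- function(paths, ...) {
  paths <- unlist(paths)                  # accept a list or a character vector
  tables <- lapply(paths, read.delim, ...)
  do.call(rbind, tables)                  # requires identical column names
}

# Demo with two small tab-delimited files in the temp directory
day1 <- tempfile(fileext = ".txt")
day2 <- tempfile(fileext = ".txt")
write.table(data.frame(id = 1:2, value = c(0.5, 1.5)), day1,
            sep = "\t", row.names = FALSE, quote = FALSE)
write.table(data.frame(id = 3:4, value = c(2.5, 3.5)), day2,
            sep = "\t", row.names = FALSE, quote = FALSE)

combined <- combine_files(c(day1, day2))
print(combined)
```

In an interactive session the same call would be combine_files(getFilenames()), and it works for any number of files, including just one.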

Related

Skip empty files inside zip files

I am reading a lot of .csv files inside a .zip file with the following code
for (i in unzip("data.zip", list = TRUE)) {
read.csv(unz("data.zip", i))
}
The problem is that some of the .csv files are empty, which leads to a "no lines available in input" error that interrupts the loop. How can I skip those empty files?
Try this
flist <- unzip("data.zip", list=TRUE)
Now flist$Length gives you the uncompressed size of each file in bytes, so e.g.
keep <- flist$Length > 100 # or some other value that indicates the file has no data
Now you can read the nonempty ones straight from the archive and save them to a list:
AllFiles <- lapply(flist$Name[keep], function(f) read.csv(unz("data.zip", f)))
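If picking a size threshold feels fragile, another option (not from the answer above, just a sketch) is to let read.csv fail on the empty entries and catch the error. read_csv_or_null and read_zip_csvs are hypothetical names; the demo uses plain temporary files rather than a zip so that it does not depend on an external zip program.

```r
# Hypothetical helper: return NULL instead of stopping when a file is unreadable.
# Note that this swallows every read error, not only "no lines available in input".
read_csv_or_null <- function(con) {
  tryCatch(read.csv(con), error = function(e) NULL)
}

# Hypothetical helper: read every entry of a zip archive, dropping the failures.
read_zip_csvs <- function(zipfile) {
  entries <- unzip(zipfile, list = TRUE)$Name
  tables <- lapply(entries, function(entry) read_csv_or_null(unz(zipfile, entry)))
  names(tables) <- entries
  Filter(Negate(is.null), tables)
}

# Demo on plain files: one empty, one with data
empty_csv <- tempfile(fileext = ".csv")
file.create(empty_csv)
full_csv <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,2"), full_csv)

print(is.null(read_csv_or_null(empty_csv)))
print(read_csv_or_null(full_csv))
```

With the archive from the question, the call would be read_zip_csvs("data.zip").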

Rstudio Automatically use input file name in read.csv when using write.csv & two headers

Currently I use a script to edit columns of a dataset.
I click run on Rmarkdown, and my first line of code is
Data <- read.csv(file.choose(), sep = "," ,header = T , skip = 2)
This skips the first 2 lines and uses the third line as the header of the file that I select after clicking run. When the script finishes, the last line of code is
write.csv(Data, "FileName.csv", row.names=FALSE)
This drops the numeric row names on the left and saves FileName.csv in my working directory.
My question: if I use read.csv on a file that I pick, for example "FileName.csv", is there a way to reuse that same name in write.csv, so that the output is saved as FileName.csv in my working directory without me typing the name manually? Also, is there a way to add back the first 2 lines that I skipped when doing write.csv?
You can capture the filename from file.choose and save the skipped lines to write out later.
## Capture file name
FileName = file.choose()
# Capture skipped header lines
IN=file(FileName, open="r")
Header=readLines(IN, 2)
Input <- read.csv(IN) # No need to skip lines now
close(IN)
## Whatever processing
## Output
OUT = file(FileName ,open="w")
writeLines(Header, OUT)
write.csv(Input, OUT, row.names=FALSE)
close(OUT)
You can save the filepath/filename into a variable and use that variable in both read.csv and write.csv:
myfile <- file.choose()
data <- read.csv(file=myfile) # plus any other arguments you need
# ... lots of code ...
write.csv(data, file=myfile)
I'd comment if I could as I am not sure this is a full answer, but I guess you could just use:
FilePath <- file.choose() # ask once, then reuse the path
Data <- read.csv(FilePath, skip=2)
FileName <- basename(FilePath)
write.csv(Data, FileName, row.names=FALSE)
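Putting the answers above together, here is a rough sketch that keeps the skipped lines and derives the output name from the input name. rewrite_with_header, its arguments, and the "copy_" prefix are all made up for illustration, and the demo writes to the temp directory instead of calling file.choose() so it can run unattended.

```r
# Hypothetical helper: copy a CSV, preserving its first n_skip free-text lines.
# `process` stands in for whatever column editing the script does.
rewrite_with_header <- function(in_path, out_path, n_skip = 2, process = identity) {
  lines  <- readLines(in_path)
  header <- head(lines, n_skip)                      # the lines read.csv would skip
  body   <- lines[seq_along(lines) > n_skip]         # real header row plus data
  data   <- process(read.csv(text = body))
  con <- file(out_path, open = "w")
  on.exit(close(con))
  writeLines(header, con)                            # put the skipped lines back
  write.csv(data, con, row.names = FALSE)
  invisible(out_path)
}

# Demo: two free-text lines, then a normal CSV
in_path <- tempfile(fileext = ".csv")
writeLines(c("Instrument: demo", "Run: 1", "x,y", "1,2", "3,4"), in_path)

# Reuse the input's own file name for the output
out_path <- file.path(tempdir(), paste0("copy_", basename(in_path)))
rewrite_with_header(in_path, out_path)
print(readLines(out_path))
```

Interactively, in_path would come from file.choose(), and file.path(getwd(), basename(in_path)) would give the same file name in the working directory.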

Looping over a set of standardized files to collect information and save it in a different files

I have several files in a folder. They all have same layout and I have extracted the information I want from them.
So now, for each file, I want to write a .csv file and name it after the original input file and add "_output" to it.
However, I don't want to repeat this process manually for each file. I want to loop over them. I looked for help online and found lots of great tips, including many in here.
Here's what I tried:
#Set directory
dir = setwd("D:/FRhData/elb") #set directory
filelist = list.files(dir) #save file names into filelist
myfile = matrix()
#Read files into R
for ( i in 1:length(filelist)){
myfile[i] = readLines(filelist[i])
*code with all calculations*
write.csv(x = finalDF, file = paste (filename[i] ,"_output. csv")
}
Unfortunately, it didn't work out. Here's the error message I get:
Error in as.character(x) :
cannot coerce type 'closure' to vector of type 'character'
In addition: Warning message:
In myfile[i] <- readLines(filelist[i]) :
number of items to replace is not a multiple of replacement length
And 'report2016-03.txt' is the name of the first file the code should be executed on.
Does anyone know what I should do to correct this mistake - or any other possible mistakes you can foresee?
Thanks a lot.
======================================================================
Here's some of the resources I used:
https://www.r-bloggers.com/looping-through-files/
How to iterate over file names in a R script?
Looping through files in R
Loop in R loading files
How to loop through a folder of CSV files in R
This worked for me. I used a vector instead of a matrix, took out the readLines() call and used paste0 since there was no separator.
dir = "C:/R_projects"
setwd(dir) #set directory (setwd() returns the previous directory, so keep the path in its own variable)
filelist = list.files(dir) #save file names into filelist
myfile = vector()
finalDF <- data.frame(a=3, b=2)
#Read files into R
for ( i in seq_along(filelist)){
  myfile[i] = filelist[i]
  write.csv(x = finalDF, file = paste0(myfile[i] ,"_output.csv"))
}
list.files(dir)
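One thing the loop above leaves in place is the original extension, so report2016-03.txt would presumably become report2016-03.txt_output.csv. A small sketch of one way around that, using tools::file_path_sans_ext(); output_name, demo_dir and the contents of finalDF are made up for the example, and the demo runs in the temp directory rather than D:/FRhData/elb.

```r
# Hypothetical helper: "report2016-03.txt" -> "report2016-03_output.csv"
output_name <- function(path, suffix = "_output.csv") {
  paste0(tools::file_path_sans_ext(basename(path)), suffix)
}

# Demo folder with two small input files
demo_dir <- file.path(tempdir(), "loop_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("some text", file.path(demo_dir, "report2016-03.txt"))
writeLines(c("more", "text"), file.path(demo_dir, "report2016-04.txt"))

# Loop over the inputs only (the pattern keeps earlier outputs out of the list)
for (path in list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)) {
  # Placeholder for the real calculations
  finalDF <- data.frame(source = basename(path), n_lines = length(readLines(path)))
  write.csv(finalDF, file.path(demo_dir, output_name(path)), row.names = FALSE)
}

print(list.files(demo_dir))
```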

load new files in directory

I have a R script to load multiple text files in a directory and save the data as compressed .rda. It looks like this,
#!/usr/bin/Rscript --vanilla
args <- commandArgs(TRUE)
## arg[1] is the folder name
outname <- paste(args[1], ".rda", sep="")
files <- list.files(path=args[1], pattern=".txt", full=TRUE)
tmp <- list()
if(file.exists(outname)){
message("found ", outname)
load(outname)
tmp <- get(args[1]) # previously read stuff
files <- setdiff(files, names(tmp))
}
if(length(files) == 0) # setdiff() returns character(0), never NULL
message("no new files") else {
## read the files into a list of matrices
results <- plyr::llply(files, read.table, .progress="text")
names(results) <- files
assign(args[1], c(tmp, results))
message("now saving... ", args[1])
save(list=args[1], file=outname)
}
message("all done!")
The files are quite large (15Mb each, 50 of them typically), so running this script takes up to a few minutes typically, a substantial part of which is taken writing the .rda results.
I often update the directory with new data files, therefore I would like to append them to the previously saved and compressed data. This is what I do above by checking if there's already an output file with that name. The last step is still pretty slow, saving the .rda file.
Is there a smarter way to go about this in some package, keeping a trace of which files have been read, and saving this faster?
I saw that knitr uses tools:::makeLazyLoadDB to save its cached computations, but this function is not documented so I'm not sure where it makes sense to use it.
For intermediate files that I need to read (or write) often, I use
save (..., compress = FALSE)
which speeds up things considerably.
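For the "only read what is new" part, here is a rough sketch of the same idea as the script in the question, with the uncompressed save from this answer. update_cache is a hypothetical name, and I have swapped save()/load() for saveRDS()/readRDS(), which also accept compress = FALSE; that swap is my assumption, not something the answer proposes.

```r
# Hypothetical helper: read only files that are not in the cache yet.
# The cache is a named list keyed by file path, stored uncompressed.
update_cache <- function(data_dir, cache_file, reader = read.table) {
  cache <- if (file.exists(cache_file)) readRDS(cache_file) else list()
  files <- list.files(data_dir, pattern = "\\.txt$", full.names = TRUE)
  new_files <- setdiff(files, names(cache))
  if (length(new_files) > 0) {
    fresh <- lapply(new_files, reader)
    names(fresh) <- new_files
    cache <- c(cache, fresh)
    saveRDS(cache, cache_file, compress = FALSE)   # faster to write, larger on disk
  }
  invisible(cache)
}

# Demo: two files first, then a third one added later
data_dir <- file.path(tempdir(), "cache_demo")
dir.create(data_dir, showWarnings = FALSE)
cache_file <- tempfile(fileext = ".rds")
write.table(data.frame(x = 1:3), file.path(data_dir, "a.txt"))
write.table(data.frame(x = 4:6), file.path(data_dir, "b.txt"))
first_pass <- update_cache(data_dir, cache_file)

write.table(data.frame(x = 7:9), file.path(data_dir, "c.txt"))
second_pass <- update_cache(data_dir, cache_file)   # only c.txt is read here
print(basename(names(second_pass)))
```

This still rewrites the whole cache on every update, so for very large collections one cache file per input file may be worth considering.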

How to get rid of file extensions (.CSV) in a file name that is generated by R

I have a series of .csv files in my working folder and I wrote a code to get them all, do everything I want to do with them and, in the end, write the result in another file adding "_pp" to the original file name:
random <- grep(".csv",list.files(), fixed=TRUE)
files <- list.files()[random]
for (igau in 1:length(files))
{
(.......)
file <- paste("H:/METEO_data/AEMET_2/",files[igau],"_pp.csv",sep="")
write.table(d,file,row.names=TRUE, col.names=NA, sep=" ")
}
the problem is that I get "3059.csv_pp.csv" when what I wanted was "3059_pp.csv". Is there a way of taking the first .csv out?
thanks
Your first two lines can be simplified to one list.files call that uses the pattern argument. Then you can change the output file name using gsub.
files <- list.files(pattern="\\.csv$")
for(igau in seq_along(files)) {
  outFile <- file.path("H:/METEO_data/AEMET_2",
                       gsub("\\.csv$", "_pp.csv", files[igau]))
  write.table(d, outFile, row.names=TRUE, col.names=NA, sep=" ")
}
You could also loop over the elements in files, but that assumes you don't need the igau index for anything else. And in order to avoid confusing yourself in the future, you may want to avoid using file as a variable name, because it's a base package function that opens a connection to a file.
for(File in files) {
  outFile <- file.path("H:/METEO_data/AEMET_2",
                       gsub("\\.csv$", "_pp.csv", File))
  write.table(d, outFile, row.names=TRUE, col.names=NA, sep=" ")
}
The problem is that files[igau] contains the .csv extension. You'll have to do something like this:
basefile <- strsplit(files[igau], ".", fixed=TRUE)[[1]]
file <- paste("H:/METEO_data/AEMET_2/",basefile[1],"_pp.csv",sep="")
basefile[1] will contain everything before the first dot. This means that this code will break if you have filenames with dots in them (e.g. 3059.2.csv). If this is the case, then you'll have to paste() together everything in basefile except for the last element, which will be the csv that you're trying to get rid of.
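For file names that contain extra dots, two ways of stripping only the final extension are sketched here. pp_name and pp_name_alt are hypothetical names; sub() and tools::file_path_sans_ext() are standard R.

```r
# Replace only a trailing ".csv" (the dot is escaped and anchored to the end)
pp_name <- function(filename) {
  sub("\\.csv$", "_pp.csv", filename)
}

# Same idea via the tools package: drop the last extension, then append
pp_name_alt <- function(filename) {
  paste0(tools::file_path_sans_ext(filename), "_pp.csv")
}

examples <- c("3059.csv", "3059.2.csv")
print(pp_name(examples))
print(pp_name_alt(examples))
```

Either result can then be passed to file.path("H:/METEO_data/AEMET_2", ...) as in the answers above.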
