I have a list of 15 files stored in an object FILELIST. The task is to read all the files in FILELIST from a particular directory and append them one below the other.
In the code below, the object 'dataset' holds the final appended data. The issue I am facing is that if one or more files in FILELIST are not present in the directory, I get the error below. What I need is that if one or more of the 15 files are missing from the directory, the code should proceed to append the rest of the files.
I have tried exception handling with try(), but I still get the error below and the code doesn't process the rest of the files.
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'PREDICTION_2016_Q4_Wk13.csv': No such file or directory
Code:
for (file in FILELIST) {
  try(
    if (!exists("dataset")) {
      dataset <- read.table(file, header=TRUE, sep=",")
    }
    if (exists("dataset")) {
      temp_dataset <-read.table(file, header=TRUE, sep=",")
      dataset<-rbind(dataset, temp_dataset)
      rm(temp_dataset)
    },
    silent = T
  )
}
I would not use exception handling for this. Instead do something like this:
for (file in intersect(FILELIST, list.files())) {
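The loop header above is shown without its body. A minimal sketch of how it might be completed, assuming the body simply reads each file and appends it as the question's code does. The function name append_existing and its dir argument are hypothetical additions, and starting from NULL replaces the exists("dataset") checks:

```r
# Hypothetical helper: read every file in `filelist` that exists in `dir`
# and append them one below the other. Missing files are skipped because
# intersect() keeps only the names that list.files() actually returns.
append_existing <- function(filelist, dir = ".") {
  dataset <- NULL  # rbind(NULL, df) simply returns df
  for (file in intersect(filelist, list.files(dir))) {
    temp_dataset <- read.table(file.path(dir, file), header = TRUE, sep = ",")
    dataset <- rbind(dataset, temp_dataset)
  }
  dataset
}
```

With the files in the working directory, dataset <- append_existing(FILELIST) would then return the appended data, or NULL if none of the files exist.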
Combination of the two other answers, using readr + dplyr for speed:
library(dplyr)
library(readr)
# existing files
f <- intersect(FILELIST, list.files())
# or, equivalently for bare file names in the working directory:
# f <- intersect(FILELIST, dir())
# f <- FILELIST[ file.exists(FILELIST) ]
# combine in a single dataset
d <- bind_rows(lapply(f, read_csv))
First use file.exists and Filter to reduce FILELIST to the ones that exist and then read each one and rbind them together at the end.
Note that this works both when FILELIST contains file names from the current directory and when the files are located elsewhere and FILELIST holds their paths/file names.
No packages are used.
do.call("rbind", lapply(Filter(file.exists, FILELIST), read.csv))
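Since the question tried try(), here is, for comparison, a minimal sketch of the exception-handling route using tryCatch(); read_all_or_skip is a hypothetical name. Each file that fails to read is replaced by NULL, which rbind() ignores, so the remaining files are still appended. The Filter(file.exists, ...) one-liner is simpler when missing files are the only expected problem; tryCatch() additionally covers files that exist but cannot be read.

```r
# Hypothetical helper: read each file, turning read errors into NULL
# (with a warning) so that the remaining files are still combined.
read_all_or_skip <- function(filelist) {
  pieces <- lapply(filelist, function(f) {
    tryCatch(
      read.csv(f),
      error = function(e) {
        warning("Skipping ", f, ": ", conditionMessage(e), call. = FALSE)
        NULL
      }
    )
  })
  # NULL entries are dropped by rbind(); if every file failed, this is NULL.
  do.call("rbind", pieces)
}
```

Note that read.csv() on a missing file also emits its own "cannot open file" warning before the error; only the error is caught here.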
I'm new to RStudio and was not well aware of this site's terms and conditions, so I was blocked from asking questions for 5 days.
I have code for importing multiple files from a directory into R.
The problem is that this code sometimes runs and sometimes fails with the error below.
I have searched for a solution but have not found one yet.
library(data.table)
t = setwd("/home/dp/vishan/olp_data/19164/1/")
files <- file.info(list.files(path = t,pattern = "", full.names=TRUE))
files = rownames(files)[files$size > 0]
temp <- lapply(files, fread, sep=",")
Error:
Error in FUN(X[[i]], ...) :
'input' can not be a directory name, but must be a single character string containing a file name, a command, full path to a file, a URL starting 'http[s]://', 'ftp[s]://' or 'file://', or the input data itself.
Thanks in advance!
Try using:
files <- file.info(list.files(path = t,pattern = "", full.names=TRUE))
files <- subset(files, !isdir & size > 0)
temp <- lapply(rownames(files), fread, sep=',')
This works because list.files also returns directories. The data frame created by file.info in files can easily be subset on its isdir column, which indicates whether each entry is a directory or a file.
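One more thing worth checking in the question's code: setwd() invisibly returns the working directory that was in effect before the call, so t holds the old directory rather than "/home/dp/vishan/olp_data/19164/1/", and list.files(path = t) lists that old directory. This may explain why the code sometimes works and sometimes fails. A minimal sketch that passes the path directly and keeps only non-empty regular files, using utils::file_test() instead of file.info() (list_data_files is a hypothetical name):

```r
# Hypothetical helper: full paths of the non-empty regular files in `path`.
# file_test("-f", x) is TRUE only for existing files that are not directories.
list_data_files <- function(path) {
  files <- list.files(path, full.names = TRUE)
  files[utils::file_test("-f", files) & file.size(files) > 0]
}
```

Usage would then be temp <- lapply(list_data_files("/home/dp/vishan/olp_data/19164/1/"), data.table::fread, sep = ","), with no setwd() needed.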
I am working with MODIS 8-day data and am trying to import all the txt files of one MODIS product into R, not as one single data.frame but as individual txt files, so that I can later apply the same functions to them. The main objective is to export specific elements from each txt file. I was successful in extracting the desired elements from one txt file with the following commands:
# selecting the element within the table
idxs <- gsub("\\]",")", gsub("\\[", "c(", "[24,175], [47,977], [159,520], [163,530]
,[165,721], [168,56], [217,820],[243,397],[252,991],[284,277],[292,673]
,[322,775], [369,832], [396,872], [434,986],[521,563],[522,717],[604,554]
,[608,50],[614,69],[752,213],[780,535],[786,898],[788,1008],[853,1159],[1014,785],[1078,1070]") )
lst <- rbind( c(24,175), c(47,977), c(159,520), c(163,530) ,c(165,721), c(168,56), c(217,820),c(243,397),c(252,991),c(284,277),c(292,673),c(322,775), c(369,832), c(396,872), c(434,986),c(521,563),c(522,717),c(604,554),c(608,50),c(614,69),c(752,213),c(780,535),c(786,898),c(788,1008),c(853,1159),c(1014,785),c(1078,1070))
mat <- matrix(scan("lst.txt",skip = 6),nrow=1200)
Clist <- as.data.frame(mat[lst])
But I need these elements from all of the txt files, and honestly I do not want to run this manually 871 times. So I tried to read all the txt files and then apply this code to them, but unfortunately it does not work. Here is my approach:
folder <- "C:/Users/Documents/R/MODIS/txt/"
txt_files <- list.files(path=folder, pattern=".txt")
df= c(rep(data.frame(), length(txt_files)))
for(i in 1:length(txt_files)) {df[[i]]<- as.list(read.table(txt_files[i]))}
and this is the error I encounter:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'rastert_a2001361.txt': No such file or directory
Additional information: each txt file includes 1200 rows and 1200 columns, and 20-30 elements need to be extracted from the table.
I am very much looking forward to your answers and appreciate any help or recommendations with this matter.
The issue is that list.files returns only the file names within the folder, not the full paths to the files. If your working directory is not "C:/Users/Documents/R/MODIS/txt/", your code cannot work. Change your code to
for(i in 1:length(txt_files)) {df[[i]]<- as.list(read.table(file.path(folder, txt_files[i])))}
Now it should work.
file.path joins your folder and file name with R's path separator (.Platform$file.sep, which is "/" on Windows as well as on Unix-alikes).
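To combine the path fix with the extraction step from the question, a minimal sketch could apply the question's scan()/matrix-indexing code to every file and keep one result per file. It assumes, as the question's own code does, that each file has 6 header lines followed by the grid values, and that lst is the two-column row/column matrix from the question. extract_cells and extract_from_folder are hypothetical names, and the pattern "\\.txt$" is a stricter regular expression than ".txt":

```r
# Hypothetical helper: read one grid file into a matrix (as in the question)
# and return the cells listed in `idx`, a two-column (row, column) matrix.
extract_cells <- function(path, idx, nrow = 1200, skip = 6) {
  mat <- matrix(scan(path, skip = skip, quiet = TRUE), nrow = nrow)
  mat[idx]
}

# Apply extract_cells() to every .txt file in `folder`. full.names = TRUE
# gives scan() the complete path instead of the bare file name.
extract_from_folder <- function(folder, idx, nrow = 1200, skip = 6) {
  txt_files <- list.files(folder, pattern = "\\.txt$", full.names = TRUE)
  res <- lapply(txt_files, extract_cells, idx = idx, nrow = nrow, skip = skip)
  names(res) <- basename(txt_files)
  res
}
```

For example, res <- extract_from_folder("C:/Users/Documents/R/MODIS/txt/", lst) would give a list with one vector of extracted values per file, named by file name.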
Using Mac OS 10.10.3
RStudio Version 0.98.1103
My working directory contains 332 .csv files, and I set it correctly. Here's the code:
pollutantmean <- function(directory, pollutant, id = 1:332) {
  all_files <- list.files(directory, full.names = T)
  dat <- data.frame()
  for(i in id) {
    dat <- rbind(dat, read.csv(all_files[i]))
  }
  mean(dat[, pollutant], na.rm = TRUE)
}
Part of the assignment is to get the mean of the first 10 numeric values of a pollutant. To do this, I used the following call (where "specdata" is the directory with 332 .csv files):
pollutantmean(specdata, "Nitrate", 1:10)
The error messages I get are:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : cannot open file 'NA': No such file or directory
Like many students who have posed questions here, I’m new to programming and to R, and I am still far from getting any results when calling my function. There are many questions and answers about this Coursera assignment on Stack Overflow, but my review of these exchanges hasn't revealed the bug in my code.
Anyone have a suggestion how to fix the bug?
In addition to the other answers, you can try this:
all_files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
to avoid selecting any other kind of file.
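or even this, which builds the file names directly from id (file.path supplies the separator, whereas a hard-coded "\\" only works on Windows):
all_files <- file.path(directory, sprintf("%03d.csv", id))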
I take the time to answer since the question comes back at every Coursera session.
First, watch out for the typo: call pollutantmean("specdata", "Nitrate", 1:10)
instead of pollutantmean(specdata, "Nitrate", 1:10). Without the quotes, R treats specdata as a variable name rather than as the directory name.
Then your working directory should be the parent directory of "specdata" (for example, if your path is /dev/specdata, your working directory should be /dev).
You can get the current working directory with getwd() and set a new one with setwd() (careful: a relative path given to setwd() is resolved against the current working directory).
Add a line after all_files <- list.files(directory, full.names = TRUE) (it's a bad habit to use T instead of TRUE):
print(all_files)
Then call your function again, so you will see the content of that object. Then check where you are working with getwd().
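Modify your line no. 5 to dat <- rbind(dat, read.csv(all_files[i], comment.char = "")), keeping the all_files[i] index (read.csv needs a file name, not the bare number i).
This will bind the data of all the selected csv files into the 'dat' data frame.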
Based upon the information provided, it can be assumed that the directory you specify does not contain as many files as the ids you request: indexing a vector beyond its length returns NA, hence the error "cannot open file 'NA'". This suggests that the path you are using (which is not provided) points to a directory that does not contain the csv files (presuming there truly are 332 files in that directory). Some suggestions:
Check that the directory you are providing is accurate. Simply call list.files to see which files exist in the directory you are using.
Use the pattern argument of list.files to be sure you are only going to read the csv files
Loop over the files using the length of the vector returned from list.files, rather than having to code this manually
You can add a sanity check to be sure you are reading all files by printing out each file, or returning a list containing the results and file names
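Putting those suggestions together, a minimal sketch of the function might look like the following. It assumes, as the assignment's zero-padded file names (001.csv, 002.csv, ...) suggest, that the sorted file list lines up with id, so that all_files[i] is monitor i; the error messages are illustrative.

```r
pollutantmean <- function(directory, pollutant, id = 1:332) {
  # Only pick up .csv files, and keep the full path to each one.
  all_files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  # Sanity checks: an out-of-range index would otherwise give NA file names.
  if (length(all_files) == 0) {
    stop("No .csv files found in '", directory, "'; check getwd() and the path")
  }
  if (any(id < 1 | id > length(all_files))) {
    stop("id must be between 1 and ", length(all_files))
  }
  # Read the selected files and stack them into one data frame.
  dat <- do.call(rbind, lapply(all_files[id], read.csv))
  mean(dat[[pollutant]], na.rm = TRUE)
}
```

Reading all selected files with lapply() and combining them once avoids growing dat inside the loop, and the explicit checks replace the cryptic "cannot open file 'NA'" error with a message that points at the directory.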
I have many datasets in the same format in different directories, and I have a function for processing them.
I want to load all of my data, process it with my function, and then store the results in CSV files.
When I process a single dataset, the code looks like this:
ENFP_0719 <- f_preprocessing2("D:/DATA/output/ENFP_0719")
write.csv(ENFP_0719, "D:/DATA/output2/ENFP_0719.csv")
Everything works, and the file ENFP_0719.csv is created correctly.
But when I use a loop, the code looks like this:
setwd("D:/DATA/output")
file_list <- list.files()
for (file in file_list){
  file <- f_preprocessing2(print(eval(sprintf("D:/DATA/output/%s",file))))
  print("Storing data to csv....")
  setwd("D:/DATA/output2")
  write.csv(file, sprintf("%s.csv",file))
}
I get the following error:
[1] "D:/DATA/output/ENFP_0719"
[1] "Storing data to csv...."
Error in file(file, ifelse(append, "a", "w")) :
invalid 'description' argument
I have also tried using paste('data', file, 'csv', sep = '.'),
but I get the same error. I am confused by this error because there is nothing wrong with my function: as shown above, everything works when I process a single dataset.
So what is wrong with my code? Is the problem in my loop, or in the arguments I pass to write.csv?
I look forward to your help.
Thank you
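The error most likely arises because the loop overwrites file with the data frame returned by f_preprocessing2, so sprintf("%s.csv", file) no longer produces a single file name for write.csv. I think you could also make it a lot simpler by using the full.names argument to list.files, keeping the processed data in a separate variable, and making a few other changes like this: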
path <- 'data/output'
out_dir <- 'data/output2'
file_list <- list.files(path, full.names=TRUE)
for (file in file_list) {
  file_proc <- f_preprocessing2(file)   # keep `file` (the input path) intact
  new_path <- file.path(out_dir, paste0(basename(file), '.csv'))
  write.csv(file_proc, new_path)
}
I have a simple function in R that runs summary() via lapply() on many CSVs from one directory that I specify. The function is shown below:
# id -- the file name (i.e. 001.csv) so ID == 001.
# directory -- location of the CSV files (not my working directory)
# summarize -- boolean val if summary of the CSV to be output to console.
getMonitor <- function(id, dir, summarize = FALSE)
{
  fl <- list.files(dir, pattern = "*.csv", full.names = FALSE)
  fdl <- lapply(fl, read.csv)
  dataSummary <- lapply(fdl, summary)
  if(summarize == TRUE)
  { dataSummary[[id]] }
}
When I try to specify the directory and then pass it as a parameter to the function like so:
dir <- "C:\\Users\\ST\\My Documents\\R\\specdata"
funcVar <- getMonitor("001", dir, FALSE)
I receive the error:
Error in file(file, "rt") : cannot open the connection. In addition: Warning message:
In file(file, "rt") : cannot open file '001.csv': No such file or directory
Yet when I run the code below on its own:
fl <- list.files("C:\\Users\\ST\\My Documents\\R\\specdata",
pattern = "*.csv",
full.names = FALSE)
fl[1]
It finds the directory I'm pointing to, and fl[1] correctly outputs [1] "001.csv", which is the first file listed.
My question is what am I doing wrong when trying to pass this path variable as a parameter to my function. Is R incapable of handling a parameter this way? Is there something I'm just completely missing? I've tried searching around and am familiar with other programming languages so, frankly, I feel kind of stupid/defeated for getting stuck on this right now.
You're passing the bare file names in fl (such as fl[1], "001.csv") to read.csv without the qualifying path, so read.csv looks for them in your working directory instead of in dir. If, instead, you use full.names=TRUE, you'll get the full path and your read.csv step will work properly. However, you'll have to do a little munge to make your if statement function again.
You could also expand on your lapply function to paste the directory and file name together:
fdl <- lapply(fl, function(x) read.csv(paste(dir, x, sep='\\')))
Or create this pasted full path in a separate line:
fl.qualified <- paste(dir, fl, sep='\\')
fdl <- lapply(fl.qualified, read.csv)
When you do the paste step, if you want to be really explicit, I would use a regex to strip any trailing backslash in case someone passes a directory that ends with one:
fl.qualified <- paste(gsub('\\\\$', '', dir), fl, sep='\\')
or something along those lines.
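The extra step needed for the if statement could, for example, be naming the list by file name, so that dataSummary[[id]] can look up "001" directly. A minimal sketch, assuming file names of the form 001.csv; it uses full.names = TRUE (one of the options above) rather than paste():

```r
# Sketch of getMonitor() using full paths and a named list, so that
# dataSummary[[id]] can find a monitor by its file name, e.g. "001".
getMonitor <- function(id, dir, summarize = FALSE) {
  fl <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  # Name each path by its file name without the extension: "001", "002", ...
  names(fl) <- sub("\\.csv$", "", basename(fl))
  fdl <- lapply(fl, read.csv)  # lapply() keeps the names of fl
  dataSummary <- lapply(fdl, summary)
  if (summarize) {
    dataSummary[[id]]
  }
}
```

As in the question, the function returns NULL when summarize is FALSE; getMonitor("001", dir, TRUE) returns the summary of 001.csv.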