Passing directory path as parameter in R

I have a simple function in R that runs summary() via lapply() on many CSVs from one directory that I specify. The function is shown below:
# id -- the file name (i.e. 001.csv), so id == "001".
# dir -- location of the CSV files (not my working directory).
# summarize -- boolean; if TRUE, the summary of the CSV is output to the console.
getMonitor <- function(id, dir, summarize = FALSE)
{
    fl  <- list.files(dir, pattern = "*.csv", full.names = FALSE)
    fdl <- lapply(fl, read.csv)
    dataSummary <- lapply(fdl, summary)
    if (summarize == TRUE)
    { dataSummary[[id]] }
}
When I try to specify the directory and then pass it as a parameter to the function like so:
dir <- "C:\\Users\\ST\\My Documents\\R\\specdata"
funcVar <- getMonitor("001", dir, FALSE)
I receive the error:
Error in file(file, "rt") : cannot open the connection. In addition: Warning message:
In file(file, "rt") : cannot open file '001.csv': No such file or directory
Yet when I run the code below on its own:
fl <- list.files("C:\\Users\\ST\\My Documents\\R\\specdata",
pattern = "*.csv",
full.names = FALSE)
fl[1]
It finds the directory I'm pointing to, and fl[1] correctly outputs [1] "001.csv", which is the first file listed.
My question is: what am I doing wrong when trying to pass this path variable as a parameter to my function? Is R incapable of handling a parameter this way? Is there something I'm just completely missing? I've searched around, and I'm familiar with other programming languages, so, frankly, I feel kind of stupid/defeated for getting stuck on this right now.

You're passing fl[1] directly to read.csv without the qualifying path. If, instead, you use full.names=TRUE you'll get the full path and your read.csv step will work properly. However, you'll have to do a little munging to make your if statement work again.
You could also expand on your lapply function to paste the directory and file name together:
fdl <- lapply(fl, function(x) read.csv(paste(dir, x, sep='\\')))
Or create this pasted full path in a separate line:
fl.qualified <- paste(dir, fl, sep='\\')
fdl <- lapply(fl.qualified, read.csv)
When you do the paste step, if you want to be really explicit, I would encourage a regex to make sure you don't have someone passing a directory with a trailing slash:
fl.qualified <- paste(gsub('\\\\$', '', dir), fl, sep='\\')
or something along those lines.
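If you'd rather sidestep separator handling entirely, base R's file.path() builds paths with the right separator, and list.files(..., full.names = TRUE) returns qualified names directly. Here is a minimal sketch of the function along those lines (the original argument names are kept; naming the summary list by bare file name so the [[id]] lookup works is my own addition):
getMonitor <- function(id, dir, summarize = FALSE)
{
    # full.names = TRUE returns "dir/001.csv" rather than just "001.csv"
    fl  <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
    fdl <- lapply(fl, read.csv)
    dataSummary <- lapply(fdl, summary)
    # name each summary by the file name without extension, e.g. "001"
    names(dataSummary) <- tools::file_path_sans_ext(basename(fl))
    if (summarize)
        dataSummary[[id]]
}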

Related

Error in file(file, "rt") : cannot open the connection in r even after setting right path

# load required libraries
install.packages("tidyverse")
install.packages("dplyr")
library("tidyverse")
library("dplyr")

# vector of file names
files <- list.files(path = "C:/Users/91932/Downloads/archive (2)/Fitabase Data 4.12.16-5.12.16", pattern = ".csv")

# concatenate the directory to the file names
files <- str_c("C:/Users/91932/Downloads/archive (2)/Fitabase Data 4.12.16-5.12.16", files)

# apply the function to each element of the vector
map_df(.x = files, .f = read.csv)
In the map_df() call above I am getting an error in file(file, "rt"), even though I used setwd() to set the path and checked it with getwd(); the path to the .csv files seems to be correct.
Why does this error occur? How can I avoid such errors?
The error is because the path is wrong: you forgot the trailing slash in the path prefix in str_c. However, rather than using str_c, you can instruct list.files to give you full paths from the get-go:
files <- list.files(
  path = "C:/Users/91932/Downloads/archive (2)/Fitabase Data 4.12.16-5.12.16",
  pattern = ".csv",
  full.names = TRUE
)
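With full paths in hand, they can be passed straight to map_df(). A short sketch, assuming the directory from the question exists and the tidyverse is attached; anchoring the pattern with "\\.csv$" and the name all_data are my own choices:
library(tidyverse)  # provides purrr::map_df()

files <- list.files(
  path = "C:/Users/91932/Downloads/archive (2)/Fitabase Data 4.12.16-5.12.16",
  pattern = "\\.csv$",   # anchored so only files ending in .csv match
  full.names = TRUE
)

# read every CSV and row-bind the results into one data frame
all_data <- map_df(.x = files, .f = read.csv)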

Error comes while importing files by data.table

I'm new to RStudio and wasn't well aware of this portal's T&C, so I was blocked from asking questions for 5 days.
I have code for importing multiple files from a directory into R.
The problem is that the code sometimes runs and sometimes fails with the error below.
I've tried to find a solution but haven't found one yet.
library(data.table)
t = setwd("/home/dp/vishan/olp_data/19164/1/")
files <- file.info(list.files(path = t,pattern = "", full.names=TRUE))
files = rownames(files)[files$size > 0]
temp <- lapply(files, fread, sep=",")
Error:
Error in FUN(X[[i]], ...) :
'input' can not be a directory name, but must be a single character string containing a file name, a command, full path to a file, a URL starting 'http[s]://', 'ftp[s]://' or 'file://', or the input data itself.
Thanks in advance!
try using
files <- file.info(list.files(path = t,pattern = "", full.names=TRUE))
files <- subset(files, !isdir & size > 0)
temp <- lapply(rownames(files), fread, sep=',')
since list.files also lists directories. The data.frame you create in files can easily be subset on the isdir column, which indicates whether each entry is a directory or a file.
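Putting the pieces together, here is a complete sketch of the corrected import (the directory path is taken from the question; stacking the tables afterwards with rbindlist() is my own assumption about the end goal):
library(data.table)

dir_path <- "/home/dp/vishan/olp_data/19164/1/"

# file.info() on the full listing lets us drop directories and empty files
info <- file.info(list.files(path = dir_path, full.names = TRUE))
info <- subset(info, !isdir & size > 0)

# fread() each remaining file; the row names of info are the full paths
temp <- lapply(rownames(info), fread, sep = ",")

# optional: stack everything into a single data.table
combined <- rbindlist(temp, fill = TRUE)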

Looping over a set of standardized files to collect information and save it in a different files

I have several files in a folder. They all have the same layout, and I have extracted the information I want from them.
So now, for each file, I want to write a .csv file and name it after the original input file and add "_output" to it.
However, I don't want to repeat this process manually for each file. I want to loop over them. I looked for help online and found lots of great tips, including many in here.
Here's what I tried:
#Set directory
dir = setwd("D:/FRhData/elb") #set directory
filelist = list.files(dir) #save file names into filelist
myfile = matrix()
#Read files into R
for (i in 1:length(filelist)){
    myfile[i] = readLines(filelist[i])
    *code with all calculations*
    write.csv(x = finalDF, file = paste(filename[i], "_output. csv"))
}
Unfortunately, it didn't work out. Here's the error message I get:
Error in as.character(x) :
cannot coerce type 'closure' to vector of type 'character'
In addition: Warning message:
In myfile[i] <- readLines(filelist[i]) :
number of items to replace is not a multiple of replacement length
And 'report2016-03.txt' is the name of the first file the code should be executed on.
Does anyone know what I should do to correct this mistake - or any other possible mistakes you can foresee?
Thanks a lot.
======================================================================
Here's some of the resources I used:
https://www.r-bloggers.com/looping-through-files/
How to iterate over file names in a R script?
Looping through files in R
Loop in R loading files
How to loop through a folder of CSV files in R
This worked for me. I used a vector instead of a matrix, took out the readLines() call and used paste0 since there was no separator.
dir = setwd("C:/R_projects") #set directory
filelist = list.files(dir) #save file names into filelist
myfile = vector()
finalDF <- data.frame(a=3, b=2)
#Read files into R
for (i in 1:length(filelist)){
    myfile[i] = filelist[i]
    write.csv(x = finalDF, file = paste0(myfile[i], "_output.csv"))
}
list.files(dir)
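A small refinement you may want (my assumption, since the question mentions input names like 'report2016-03.txt'): strip the original extension before appending "_output.csv", so the output is 'report2016-03_output.csv' rather than 'report2016-03.txt_output.csv'. A sketch using base R's tools::file_path_sans_ext():
# build output names like "report2016-03_output.csv"
for (i in seq_along(filelist)) {
    stem <- tools::file_path_sans_ext(filelist[i])   # drops ".txt"
    write.csv(x = finalDF, file = paste0(stem, "_output.csv"))
}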

coursera air pollution assignment

Using Mac OS 10.10.3
RStudio Version 0.98.1103
My working directory contains 332 .csv files, and I set it correctly. Here's the code:
pollutantmean <- function(directory, pollutant, id = 1:332) {
    all_files <- list.files(directory, full.names = T)
    dat <- data.frame()
    for (i in id) {
        dat <- rbind(dat, read.csv(all_files[i]))
    }
    ds <- dat[, pollutant]
    mean(ds, na.rm = TRUE)
}
Part of the assignment is to get the mean of a pollutant across the first 10 files. To do this, I used the following function call (where "specdata" is the directory with the 332 .csv files):
pollutantmean(specdata, "Nitrate", 1:10)
The error messages I get are:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message: In file(file, "rt") : cannot open file 'NA': No such file or directory
Like many students who have posted questions here, I'm new to programming and to R, and I'm still far from getting any results when calling my function. There are many questions and answers about this Coursera assignment on Stack Overflow, but my review of those exchanges hasn't turned up the bug in my code.
Anyone have a suggestion how to fix the bug?
In addition to the other answers, you can try this:
all_files <- list.files(directory, pattern="*.csv", full.names = TRUE)
to avoid selecting any other kind of file.
or even this strange one
all_files <- paste(directory, "/", sprintf("%03d", id), ".csv", sep="")
I'll take the time to answer, since this question comes back every Coursera session.
First, be careful with the typo: call pollutantmean("specdata", "Nitrate", 1:10)
instead of pollutantmean(specdata, "Nitrate", 1:10); the directory name must be a quoted string.
Then your working directory should be the parent directory of "specdata" (for example, if your path were /dev/specdata, your working directory should be /dev).
You can get the current working directory with getwd() and set a new one with setwd() (careful there: the path is relative to the current working directory).
Add a line after all_files <- list.files(directory, full.names = TRUE) (it's a bad habit to use T instead of TRUE):
print(all_files)
Then call your function again so you can see the contents of that object, and check where you are working with getwd().
Modify your line no. 5 to dat <- rbind(dat, read.csv(all_files[i], comment.char = ""))
This will bind the data from all CSV files into the 'dat' data frame.
Based on the information provided, it can be assumed there are not 332 files in the directory you specify (if you access an index of a vector that is out of bounds, an NA is returned, hence the error "cannot open file 'NA'"). This suggests that the path you are using (which is not provided) points to a directory that does not contain the CSV files (presuming there truly are 332 of them). Some suggestions:
Check that the directory you are providing is accurate. Simply do a list.files() to see what files exist in the directory you are using.
Use the pattern argument of list.files to be sure you only read the CSV files.
Loop over the files using the length of the vector returned from list.files, rather than hard-coding it.
You can add a sanity check to be sure you are reading all the files by printing out each file name, or by returning a list containing the results and file names (see the sketch below).
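Putting those suggestions together, a minimal sketch of the function (the argument names are kept from the question; restricting the listing to .csv files and subsetting the full-name vector by id are the essential changes):
pollutantmean <- function(directory, pollutant, id = 1:332) {
    # list only the CSV files, with full paths so read.csv can find them
    all_files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)

    # read the requested subset of files and stack them into one data frame
    dat <- do.call(rbind, lapply(all_files[id], read.csv))

    # mean of the requested pollutant column, ignoring missing values
    mean(dat[[pollutant]], na.rm = TRUE)
}

# example call -- note the quoted directory name
# pollutantmean("specdata", "Nitrate", 1:10)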

Storing file in CSV format in R looping

I have many data files in the same format in different directories, and I have a function for processing them.
I want to load all of my data, process it with my function, and then store the results in CSV files.
When I use a single data set, the code looks like this:
ENFP_0719 <- f_preprocessing2("D:/DATA/output/ENFP_0719")
write.csv(ENFP_0719, "D:/DATA/output2/ENFP_0719.csv")
Everything is OK: the file ENFP_0719.csv is created correctly.
But when I try to use a loop, the code looks like this:
setwd("D:/DATA/output")
file_list <- list.files()
for (file in file_list){
    file <- f_preprocessing2(print(eval(sprintf("D:/DATA/output/%s", file))))
    print("Storing data to csv....")
    setwd("D:/DATA/output2")
    write.csv(file, sprintf("%s.csv", file))
}
I got an error like this:
[1] "D:/DATA/output/ENFP_0719"
[1] "Storing data to csv...."
Error in file(file, ifelse(append, "a", "w")) :
invalid 'description' argument
I've also tried to use paste('data', file, 'csv', sep = '.'), but I got the same error.
I'm confused by this error because nothing is wrong with my function; as I showed above, everything works when I process a single file.
So, what's wrong with my code? Is the mistake in my loop, or in the parameters I pass to write.csv?
I'm looking forward to your insight.
Thank you
I think you could make it a lot simpler by using the full.names argument to list.files and making a few other changes like this:
path <- 'data/output'
file_list <- list.files(path, full.names = TRUE)
for (file in file_list) {
    file_proc <- f_preprocessing2(file)
    new_path  <- gsub('output', 'output2', file)
    write.csv(file_proc, new_path)
}
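One caveat (my assumption, based on the single-file example in the question): gsub('output', 'output2', file) keeps the input file name as-is, so the output won't end in .csv unless the inputs already do. If you want names like ENFP_0719.csv under D:/DATA/output2, a sketch along these lines should work:
in_dir  <- "D:/DATA/output"
out_dir <- "D:/DATA/output2"

for (file in list.files(in_dir, full.names = TRUE)) {
    file_proc <- f_preprocessing2(file)
    # build "D:/DATA/output2/<name>.csv" from the input file name
    out_path <- file.path(out_dir, paste0(basename(file), ".csv"))
    write.csv(file_proc, out_path)
}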
