coursera air pollution assignment - r

Using Mac OS 10.10.3
RStudio Version 0.98.1103
My working directory is a list of 332 .csv files and I set it correctly. Here's the code:
pollutantmean <- function(directory, pollutant, id = 1:332) {
all_files <- list.files(directory, full.names = T)
dat <- data.frame()
for(i in id) {
dat <- rbind(dat, read.csv(all_files[i]))
}
ds <- (dat[, pollutant], na.rm = TRUE)
mean(ds[, pollutant])
}
Part of the assignment is to get the mean of the first 10 numeric values of a pollutant. To do this, I used the call function (where "spectata" is the directory with 332 .csv files):
pollutantmean(specdata, "Nitrate", 1:10)
The error messages I get are:
**Error in file(file, "rt") : cannot open the connection
** In addition: Warning message: In file(file, "rt") : cannot open file 'NA': No such file or directory
Like many students that have posed questions here, I’m new to programming and to R and still distant from getting any results when calling my function. There are many questions and answers about this coursera assignment in stack overflow but my review of these exchanges hasn't addressed the bug in my code.
Anyone have a suggestion how to fix the bug?

In addition to the other answers is you can try this:
all_files <- list.files(directory, pattern="*.csv", full.names = TRUE)
to avoid select any other kind of file.
or even this strange one
all_files <- paste(directory, "\\", sprintf("%03d", id), ".csv", sep="")

I take the time to answer since the question comes back at every Coursera session.
First, be careful with the typo : Do call pollutantmean("specdata", "Nitrate", 1:10)
instead of pollutantmean(specdata, "Nitrate", 1:10.
Then your working directory should be the parent directory of "specdata" (for exemple, if your path was /dev/specdata, your working directory should have been /dev).
You can get the current working directory with getwd() and set the new one with setwd() (careful there, the path would be relative to the current working directory).

Add a line after all_files <- list.files(directory, full.names = TRUE) (it's a bad habit to use T instead of TRUE):
print(all_files)
Then call your function again, so you will see the content of that object. Then, check where are you working with getwd().

Modify your line no. 5 to dat <- rbind(dat, read.csv(i, comment.char = ""))
This will bind the data of all csv files to 'dat' dataframe.

Based upon the information provided, it can be assumed there are not 332 files in the directory you specify (if one attempts to access an index of a vector that is out of bounds, an NA is returned - hence the error "cannot open file 'NA'"). This is suggestive that the path you are using (which is not provided) points to a directory which does not contain the csv files (presuming there truly are 332 files in that directory). Some suggestions:
Check that the directory you are providing is accurate. Simply do a list.files to see what files exist in the directory you are using.
Use the pattern argument of list.files to be sure you are only going to read the csv files
Loop over the files using the length of the vector returned from list.files, rather than having to code this manually
You can add a sanity check to be sure you are reading all files by printing out each file, or returning a list containing the results and file names

Related

Parsing issue, unexpected character when loading a folder

I am using this answer to load in a folder of Excel Files:
# Get the list of files
#----------------------------#
folder <- "path/to/files"
fileList <- dir(folder, recursive=TRUE) # grep through these, if you are not loading them all
# use platform appropriate separator
files <- paste(folder, fileList, sep=.Platform$file.sep)
So far, so good.
# Load them in
#----------------------------#
# Method 1:
invisible(sapply(files, source, local=TRUE))
#-- OR --#
# Method 2:
sapply(files, function(f) eval(parse(text=f)))
But the source function (Method 1) gives me the error:
Error in source("C:/Users/Username/filename.xlsx") :
C:/Users/filename :1:3: unexpected input
1: PK
^
For method 2 get the error:
Error in parse(text = f) : <text>:1:3: unexpected '/'
1: C:/
^
EDIT: I tried circumventing the issue by setting the working directory to the directory of the folder, but that did not help.
Any ideas why this happens?
EDIT 2: It works when doing the following:
How can I read multiple (excel) files into R?
setwd("...")
library(readxl)
file.list <- list.files(pattern='*.xlsx')
df.list <- lapply(file.list, read_excel)
just to provide a proper answer outside of the comment section...
If your target is to read many Excel files, you shouldn't use source.
source is dedicated to run external R code.
If you need to read many Excel files you can use the following code and the support of one of these libraries: readxl, openxlsx, tidyxl (with unpivotr).
filelist <- dir(folder, recursive = TRUE, full.names = TRUE, pattern = ".xlsx$|.xls$", ignore.case = TRUE)
l_df <- lapply(filelist, readxl::read_excel)
Note that we are using dir to list the full paths (full.names = TRUE) of all the files that ends with .xlsx, .xls (pattern = ".xlsx$|.xls$"), .XLSX, .XLS (ignore.case = TRUE) in the folder folder and all its subfolders (recursive = TRUE).
readxl is integrated with tidyverse. It is pretty easy to use. It is most likely what you're looking for.
Personally, I advice to use openxlsx if you need to write (rather than read) customized Excel files with many specific features.
tidyxl is the best package I've seen to read Excel files, but it may be rather complicated to use. However, it's really careful in the types preservation.
With the support of unpivotr it allows you to handle complicated Excel structures.
For example, when you find multiple headers and multiple left index columns.

Error comes while importing files by data.table

I'm new to R studio and was not well aware of this portal T&C, so was blocked for questing for 5 days.
I have a code for importing multiple files from any directory to R.
Using this code for doing so, but the problem is this code runs sometime and sometime it gets failed with mentioned error.
I tried to found the solution of this but yet not found any solution.
library(data.table)
t = setwd("/home/dp/vishan/olp_data/19164/1/")
files <- file.info(list.files(path = t,pattern = "", full.names=TRUE))
files = rownames(files)[files$size > 0]
temp <- lapply(files, fread, sep=",")
Error:
Error in FUN(X[[i]], ...) :
'input' can not be a directory name, but must be a single character string containing a file name, a command, full path to a file, a URL starting 'http[s]://', 'ftp[s]://' or 'file://', or the input data itself.
Thanks in advance!
try using
files <- file.info(list.files(path = t,pattern = "", full.names=TRUE))
files <- subset(files, !isdir & size > 0)
temp <- lapply(rownames(files), fread, sep=',')
since list.files also shows directories. The data.frame you create in files can be easily subset on the isdir column which indicates if this is a directory or a file.

Looping over a set of standardized files to collect information and save it in a different files

I have several files in a folder. They all have same layout and I have extracted the information I want from them.
So now, for each file, I want to write a .csv file and name it after the original input file and add "_output" to it.
However, I don't want to repeat this process manually for each file. I want to loop over them. I looked for help online and found lots of great tips, including many in here.
Here's what I tried:
#Set directory
dir = setwd("D:/FRhData/elb") #set directory
filelist = list.files(dir) #save file names into filelist
myfile = matrix()
#Read files into R
for ( i in 1:length(filelist)){
myfile[i] = readLines(filelist[i])
*code with all calculations*
write.csv(x = finalDF, file = paste (filename[i] ,"_output. csv")
}
Unfortunately, it didn't work out. Here's the error message I get:
Error in as.character(x) :
cannot coerce type 'closure' to vector of type 'character'
In addition: Warning message:
In myfile[i] <- readLines(filelist[i]) :
number of items to replace is not a multiple of replacement length
And 'report2016-03.txt' is the name of the first file the code should be executed on.
Does anyone know what I should do to correct this mistake - or any other possible mistakes you can foresee?
Thanks a lot.
======================================================================
Here's some of the resources I used:
https://www.r-bloggers.com/looping-through-files/
How to iterate over file names in a R script?
Looping through files in R
Loop in R loading files
How to loop through a folder of CSV files in R
This worked for me. I used a vector instead of a matrix, took out the readLines() call and used paste0 since there was no separator.
dir = setwd("C:/R_projects") #set directory
filelist = list.files(dir) #save file names into filelist
myfile = vector()
finalDF <- data.frame(a=3, b=2)
#Read files into R
for ( i in 1:length(filelist)){
myfile[i] = filelist[i]
write.csv(x = finalDF, file = paste0(myfile[i] ,"_output.csv"))
}
list.files(dir)

import multiple txt files into R

I am working with MODIS 8-day data and am trying to import all the txt files of one MODIS product into the R, but not as one single data.frame, as individual txt files. So I can later apply same functions on them. The main objective is to export specific elements within each txt file. I was successful in excluding the desired elements from one txt file with the following command:
# selecting the element within the table
idxs <- gsub("\\]",")", gsub("\\[", "c(", "[24,175], [47,977], [159,520], [163,530]
,[165,721], [168,56], [217,820],[243,397],[252,991],[284,277],[292,673]
,[322,775], [369,832], [396,872], [434,986],[521,563],[522,717],[604,554]
,[608,50],[614,69],[752,213],[780,535],[786,898],[788,1008],[853,1159],[1014,785],[1078,1070]") )
lst <- rbind( c(24,175), c(47,977), c(159,520), c(163,530) ,c(165,721), c(168,56), c(217,820),c(243,397),c(252,991),c(284,277),c(292,673),c(322,775), c(369,832), c(396,872), c(434,986),c(521,563),c(522,717),c(604,554),c(608,50),c(614,69),c(752,213),c(780,535),c(786,898),c(788,1008),c(853,1159),c(1014,785),c(1078,1070))
mat <- matrix(scan("lst.txt",skip = 6),nrow=1200)
Clist <- as.data.frame(mat[lst])
But I need these element from all of the txt files and honestly I do not want to run it manually for 871 times. So I try to read all the txt files and then apply this function to them. but unfortunately it does not work. here is my approach:
folder <- "C:/Users/Documents/R/MODIS/txt/"
txt_files <- list.files(path=folder, pattern=".txt")
df= c(rep(data.frame(), length(txt_files)))
for(i in 1:length(txt_files)) {df[[i]]<- as.list(read.table(txt_files[i]))}
and this is the error I encounter:
**Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'rastert_a2001361.txt': No such file or directory**
additional information: each txt file includes 1200rows and 1200columns and 20-30 elements need to be extracted from the table.
I am very much looking forward for your answers and appreciate any helps or recommendations with this matter.
The issue is that list.files returns only the file name within the folder, not the full path to the file. If you working direction is not "C:/Users/Documents/R/MODIS/txt/" your code could not work. Change your code to
for(i in 1:length(txt_files)) {df[[i]]<- as.list(read.table(file.path(folder, txt_files[i])))}
Now it should be working.
file.path combines your path and your file with correct, OS specific, path seperator.

Passing directory path as parameter in R

I have a simple function in R that runs summary() via lapply() on many CSVs from one directory that I specify. The function is shown below:
# id -- the file name (i.e. 001.csv) so ID == 001.
# directory -- location of the CSV files (not my working directory)
# summarize -- boolean val if summary of the CSV to be output to console.
getMonitor <- function(id, dir, summarize = FALSE)
{
fl <- list.files(dir, pattern = "*.csv", full.names = FALSE)
fdl <- lapply(fl, read.csv)
dataSummary <- lapply(fdl, summary)
if(summarize == TRUE)
{ dataSummary[[id]] }
}
When I try to specify the directory and then pass it as a parameter to the function like so:
dir <- "C:\\Users\\ST\\My Documents\\R\\specdata"
funcVar <- getMonitor("001", dir, FALSE)
I receive the error:
Error in file(file, "rt") : cannot open the connection. In addition: Warning message:
In file(file, "rt") : cannot open file '001.csv': No such file or directory
Yet when I run the code below on its own:
fl <- list.files("C:\\Users\\ST\\My Documents\\R\\specdata",
pattern = "*.csv",
full.names = FALSE)
fl[1]
It find the directory I'm pointing to and fl[1] correctly outputs [1] "001.csv" which is the first file listed.
My question is what am I doing wrong when trying to pass this path variable as a parameter to my function. Is R incapable of handling a parameter this way? Is there something I'm just completely missing? I've tried searching around and am familiar with other programming languages so, frankly, I feel kind of stupid/defeated for getting stuck on this right now.
You're passing fl[1] directly to read.csv with the qualifying path. If, instead, you use full.names=TRUE you'll get the full path and your read.csv step will work properly. However, you'll have to do a little munge to make your if statement function again.
You could also expand on your lapply function to paste the directory and file name together:
fdl <- lapply(fl, function(x) read.csv(paste(dir, x, sep='\\')))
Or create this pasted full path in a separate line:
fl.qualified <- paste(dir, fl, sep='\\')
fdl <- lapply(fl.qualified, read.csv)
When you do the paste step, if you want to be really explicit, I would encourage a regex to make sure you don't have someone passing a directory with a trailing slash:
fl.qualified <- paste(gsub('\\\\$', '', dir), f1, sep='\')
or something along those lines.

Resources