Error: Invalid 'path' argument to function - r

I have recently downloaded R (so it should be the latest version). I am trying to create a function (corr) that reads multiple csv files from a directory containing data on pollutants, and uses the complete cases in each file to return the correlation between the "sulfate" and "nitrate" columns. A threshold for the minimum number of complete cases is also used.
The function corr is created without any errors, but when I try to use it (by running the bottom line of code) I get the error:
Error in list.files(directory, pattern = ".csv", full.names = TRUE) :
invalid 'path' argument
Below is the code I am trying:
corr <- function(directory, threshold = 0) {
  filenames3 <- list.files(directory, pattern = ".csv", full.names = TRUE)
  loop_length <- length(filenames3)
  correlation_values <- numeric()
  for (i in loop_length) {
    read_in_data3 <- read.csv(filenames3[i])
    complete_boolean <- complete.cases(read_in_data3)
    nobs2 <- sum(complete_boolean)
    data_rmNA <- read_in_data3[complete_boolean, ]
    if (nobs2 > threshold) {
      correlation_values <- c(correlation_values,
                              cor(data_rmNA[["sulfate"]],
                                  data_rmNA[["nitrate"]]))
    }
  }
  correlation_values
}
corr("C:/Users/Danie/OneDrive/Documents/R/specdata")
I am new to R so it may be a basic mistake. The working directory is the same as in the last line of code, and contains all the csv data files. If I put path = directory in the first line of code, then the error changes to:
Error in list.files(directory, pattern = ".csv", full.names = TRUE) :
object 'directory' not found
I don't understand this as the directory is specified as an argument in corr.
Really stuck on this and don't seem to be making any progress. Thanks in advance for any help!
Ps. First post, let me know if there's any etiquette rules that I've missed.
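One thing worth flagging independently of the path error: for(i in loop_length) evaluates only the single value loop_length, so the loop body runs at most once, on the last file. A sketch of the loop written over all indices (same column names as in the question) would be:

```r
corr <- function(directory, threshold = 0) {
  filenames <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  correlation_values <- numeric()
  for (i in seq_along(filenames)) {   # visit every file, not just the last index
    dat <- read.csv(filenames[i])
    complete_rows <- complete.cases(dat)
    if (sum(complete_rows) > threshold) {
      dat_complete <- dat[complete_rows, ]
      correlation_values <- c(correlation_values,
                              cor(dat_complete[["sulfate"]],
                                  dat_complete[["nitrate"]]))
    }
  }
  correlation_values
}
```

seq_along(filenames) produces 1, 2, ..., length(filenames), which is what the original loop intended.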

Related

How to read a range in read_excel within a loop (where length(x) == 1L)

I'm reading multiple Excel files and sheets within those files in loops. A certain range should be read from those sheets and added to a dataframe, corresponding to each file.
With the code I have written so far I can read the files and sheets and put them into a dataframe. However, it gives me the following error when specifying the range:
Error in as.cell_limits.character(range) : length(x) == 1L is not TRUE
path <- "my file path"
files_list <- list.files(path, pattern = "*.xlsx", full.names = TRUE)
files_list_names <- str_extract(list.files(path, pattern = "*.xlsx"), "[^.]+")  # extract filename without file extension
count_files <- length(files_list_names)
for (i in 1:count_files) {
  current_file <- files_list_names[i]
  current_file_data <- read_excel(files_list[i], range = "B22:B30")
  this_file_sheets <- excel_sheets(files_list[i])
  count_sheets <- length(this_file_sheets)
  for (j in 1:count_sheets) {
    current_sheet_data <- read_excel(files_list[i], sheet = this_file_sheets[j], range("F22:F30"))
    bind_cols(current_file_data, current_sheet_data)
  }
  assign(paste0(current_file), current_file_data, envir = .GlobalEnv)
}
I have absolutely no clue what that error means and I can't find anything on the web.
As always, your help is much appreciated!
Be sure to use range=, rather than range(), when you call readxl::read_excel(). The latter invokes the base::range() function. When you pass a string to base::range(), you get a vector of length 2, like this:
base::range("B22:B30")
[1] "B22:B30" "B22:B30"
If that vector of length 2 is passed to the range parameter of the read_excel() function, you will get the above error.
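To see the difference concretely, here is a minimal sketch (the read_excel() call is shown commented out, since it needs an actual workbook on disk):

```r
# base::range() treats the string as data and returns its min and max;
# for a one-element vector both are the same value, giving length 2:
base::range("F22:F30")
# [1] "F22:F30" "F22:F30"

# What read_excel() needs is the string bound to its range parameter:
# read_excel(files_list[i], sheet = this_file_sheets[j], range = "F22:F30")
```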

Error in file(file, "rt") : cannot open the connection - Unsure of what to do

I am currently working through Coursera's R Programming course and have hit a bit of a snag with this assignment. I have been getting various errors (that I'm not totally sure I've nailed down), but this is a new one and no matter what I do I can't seem to shake it.
Whenever I run the below code it comes back with
Error in file(file, "rt") : cannot open the connection
pollutantmean <- function(directory, pollutant, id) {
  files <- list.files(path = directory, "/", full.names = TRUE)
  dat <- data.frame()
  dat <- sapply(file = directory, "/", read.csv)
  mean(dat["pollutant"], na.rm = TRUE)
}
I have tried numerous different solutions posted here on SO for this issue, but none of them has worked. I made sure that I am running the code after setting the working directory to the folder with all of the CSV files, and I can see all of the files in the file pane. I have also moved that working directory around a few times, since some of the suggestions were to put it on the desktop, etc., but none of that has worked. I am currently running RStudio as an admin, but that does not seem to have done anything, and I have also modified the permissions on the specdata folder to ensure there are no weird restrictions there. Any help is appreciated.
Here are two possible implementations:
# list all files in "directory", read them, combine, then take the mean of the "pollutant" column
pollutantmean_1 <- function(directory, pollutant) {
  files <- list.files(path = directory, full.names = TRUE)
  dat <- lapply(files, read.csv)
  dat <- data.table::rbindlist(dat) |> as.data.frame()
  mean(dat[, pollutant], na.rm = TRUE)
}

# list all files in "directory", read them, take the mean of the "pollutant" column for each file and return them
pollutantmean_2 <- function(directory, pollutant) {
  files <- list.files(path = directory, full.names = TRUE)
  dat <- lapply(files, read.csv)
  pollutant_means <- sapply(dat, function(x) mean(x[, pollutant], na.rm = TRUE))
  names(pollutant_means) <- basename(files)
  pollutant_means
}

R rename files keeping part of original name

I'm trying to rename all files in a folder (about 7,000 files) with just a portion of their original name.
The initial fips code is a 4- or 5-digit code that identifies counties, and is different for every file in the folder. The rest of the name in the original files is the state_county_lat_lon of every file.
For example:
Original name:
"5081_Illinois_Jefferson_-88.9255_38.3024_-88.75_38.25.wth"
"7083_Illinois_Jersey_-90.3424_39.0953_-90.25_39.25.wth"
"11085_Illinois_Jo_Daviess_-90.196_42.3686_-90.25_42.25.wth"
"13087_Illinois_Johnson_-88.8788_37.4559_-88.75_37.25.wth"
"17089_Illinois_Kane_-88.4342_41.9418_-88.25_41.75.wth"
And I need them renamed with just the initial code (fips):
"5081.wth"
"7083.wth"
"11085.wth"
"13087.wth"
"17089.wth"
I've tried using the list.files and file.rename functions, but I do not know how to pick the code out of the full name. Some kind of "wildcard" could work, but I don't know how to apply one properly because the names all have the same pattern but differ in content.
This is what I've tried this far:
setwd("C:/Users/xxx")
Files <- list.files(path = "C:/Users/xxx", pattern = "fips_*.wth" all.files = TRUE)
newName <- paste("fips", ".wth", sep = "")
for (x in length(Files)) {
  file.rename(nFiles, newName)
}
I've also tried with the "sub" function as follows:
setwd("C:/Users/xxxx")
Files <- list.files(path = "C:/Users/xxxx", all.files = TRUE)
for (x in length(Files)) {
  sub("_*", ".wth", Files)
}
but I get:
Error in as.character(x) :
cannot coerce type 'closure' to vector of type 'character'
OR
setwd("C:/Users/xxxx")
Files <- list.files(path = "C:/Users/xxxx", all.files = TRUE)
for (x in length(Files)) {
  sub("^(\\d+)_.*", "\\1.wth", file)
}
which runs without errors but does nothing to the names of the files.
I could use any help.
Thanks
Here is my example.
First, prepare some files to work with:
dir.create("test_dir")
data_sets <- c("5081_Illinois_Jefferson_-88.9255_38.3024_-88.75_38.25.wth",
               "7083_Illinois_Jersey_-90.3424_39.0953_-90.25_39.25.wth",
               "11085_Illinois_Jo_Daviess_-90.196_42.3686_-90.25_42.25.wth",
               "13087_Illinois_Johnson_-88.8788_37.4559_-88.75_37.25.wth",
               "17089_Illinois_Kane_-88.4342_41.9418_-88.25_41.75.wth")
setwd("test_dir")
file.create(data_sets)
Then rename the files:
Files <- list.files(all.files = TRUE, pattern = "\\.wth$")
newName <- sub("^(\\d+)_.*", "\\1.wth", Files)
file.rename(Files, newName)
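The capture group in that sub() call can be checked directly on a couple of the posted names, without touching any files:

```r
old <- c("5081_Illinois_Jefferson_-88.9255_38.3024_-88.75_38.25.wth",
         "11085_Illinois_Jo_Daviess_-90.196_42.3686_-90.25_42.25.wth")

# ^(\d+)_ captures the leading digits up to the first underscore;
# \1.wth keeps only that capture and appends the extension
new <- sub("^(\\d+)_.*", "\\1.wth", old)
new
# [1] "5081.wth"  "11085.wth"
```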

How to bypass errors in for loops in r?

I created a for loop to merge several csv files in a directory together into one table. In some cases the files that are indicated in the loop have not been created. When the files do not exist the loop produces an error and no files are merged. I am trying to adjust the code so the loop inserts "NULL" or "error" in the parts of the matrix reserved for the files.
Here is the original code:
COMP_raw <- cbind(m, matrix(-9999, ncol = length(dirnames), nrow = 169))
setwd()  # actual wd is removed for posting
for (i in length(dirnames)) {
  j <- dirnames[1]                      # take the directory folder name
  id <- gsub("_.*$", "", dirnames[1])   # take the numeric identifier of the indicator
  fpath <- file.path(paste(j, "/", id, "_2016", ".csv", sep = ""))  # merge the folder name and desired csv into a file path
  data <- read.csv(fpath, header = TRUE, as.is = TRUE)
  last <- max(ncol(data))
  COMP_raw[, (1 + 1)] <- data[, last]
  colnames(COMP_raw)[(1 + 1)] <- names(data[last])
}
This above code works for every loop where the "fpath" actually exists in my directory. When the csv does not exist the following message occurs.
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file '2.1_PermitIndirectCosts/2.1_2016.csv': No such file or directory
I looked at a few other posts to see how to solve the issue and tried the following
COMP_raw <- cbind(m, matrix(-9999, ncol = length(dirnames), nrow = 169))
for (i in length(dirnames)) {
  j <- dirnames[1]
  id <- gsub("_.*$", "", dirnames[1])
  fpath <- file.path(paste(j, "/", id, "_2016", ".csv", sep = ""))
  possibleerror <- tryCatch(data <- read.csv(fpath, header = TRUE, as.is = TRUE), silent = TRUE),
    error = function(e) e
  )
  if (!inherits(possibleerror, "error")) {
    last <- max(ncol(data))
    COMP_raw[, (1 + 3)] <- data[, last]
    colnames(COMP_raw)[(1 + 3)] <- names(data[last])
  }
}
But that is still generating an error
What about using file.exists()?
file.exists() returns a logical vector indicating whether the files named by its argument exist.
COMP_raw <- cbind(m, matrix(-9999, ncol = length(dirnames), nrow = 169))
setwd()  # actual wd is removed for posting
for (i in length(dirnames)) {
  j <- dirnames[1]                      # take the directory folder name
  id <- gsub("_.*$", "", dirnames[1])   # take the numeric identifier of the indicator
  fpath <- file.path(paste(j, "/", id, "_2016", ".csv", sep = ""))  # build the csv file path
  # check whether the file exists; if not, assign NA
  if (file.exists(fpath)) {
    data <- read.csv(fpath, header = TRUE, as.is = TRUE)
    last <- max(ncol(data))
    COMP_raw[, (1 + 1)] <- data[, last]
    colnames(COMP_raw)[(1 + 1)] <- names(data[last])
  } else {
    colnames(COMP_raw)[(1 + 1)] <- NA  # assigning NULL to a single column name is an error
  }
}
Not specific to your example (I'm on a mobile) but it should help:
var <- try(some_function())
if (is(var, "try-error")) {
  some_other_function()
  next
}
If try() fails, the variable is assigned an object of class "try-error", which you can handle accordingly. next moves on to the next item in the loop.
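For completeness, the tryCatch() call in the question can be written so that the error handler is an argument of tryCatch() itself rather than dangling outside the call. A minimal sketch (the path here is a stand-in for a missing file):

```r
fpath <- "does_not_exist/2.1_2016.csv"   # hypothetical path to a file that is absent

data <- tryCatch(
  read.csv(fpath, header = TRUE, as.is = TRUE),
  error = function(e) NULL               # on failure, return NULL instead of stopping
)

if (is.null(data)) {
  message("skipping ", fpath)            # inside a loop this would be followed by next
}
```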

Calculate the mean of one column from several CSV files

I have over 300 CSV files in a folder (named 001.csv, 002.csv and so on). Each contains a data frame with a header. I am writing a function that will take three arguments: the location of the files, the name of the column you want to calculate the mean (inside the data frames), and the files to use in the calculation.
Here is my function:
pollutantmean2 <- function(directory = getwd(), pollutant, id = 1:332) {
  # add one or two zeros to the ID so that it matches the csv file names
  filenames <- sprintf("%03d.csv", id)
  # path to the specdata folder; if no path is provided, default is the working directory
  filedir <- file.path(directory, filenames)
  # get the data for the selected ID or IDs from the specified path
  dataset <- read.csv(filedir, header = TRUE)
  # calculate the mean, removing all NAs
  polmean <- mean(dataset$pollutant, na.rm = TRUE)
  # return the mean
  polmean
}
It appears there are two things wrong with my code. To break it down, I separated the function into two separate functions to handle the two tasks: 1) get the required files and 2) calculate the mean of the desired column (aka pollutant).
Task 1: Getting the appropriate files - It works as long as I only want one file. If I select a range of files, such as 1:25, I get an error message that says Error in file(file, "rt") : invalid 'description' argument. I have Googled this error but still have no clue how to fix it.
# function that obtains csv files and stores them
getfile <- function(directory = getwd(), id) {
  filenames <- sprintf("%03d.csv", id)
  filedir <- file.path(directory, filenames)
  dataset <- read.csv(filedir, header = TRUE)
  dataset
}
If I run getfile("specdata", 1) it works fine, but if I run getfile("specdata", 1:10) I get the following error: Error in file(file, "rt") : invalid 'description' argument.
Task 2: Calculating mean of specified named column - Assuming I have a usable data frame, I then try to calculate the mean with the following function:
calcMean <- function(dataset, pollutant) {
  polmean <- mean(dataset$pollutant, na.rm = TRUE)
  polmean
}
But if I run calcMean(mydata, "sulfate") (where mydata is a data frame I loaded manually) I get an error message:
Warning message:
In mean.default(dataset$pollutant, na.rm = TRUE) :
argument is not numeric or logical: returning NA
The odd thing is that if I run mean(mydata$sulfate, na.rm = TRUE) in the console, it works fine.
I have researched this for several days and after endless tweaking, I have run out of ideas.
You do not need more functions. From my understanding, the solution can be simpler; it fits in six lines:
pollutantmean <- function(directory, pollutant, id = 1:10) {
  filenames <- sprintf("%03d.csv", id)
  filenames <- paste(directory, filenames, sep = "/")
  ldf <- lapply(filenames, read.csv)   # ldf is your list of data.frames
  df <- plyr::ldply(ldf)               # combine them into one data.frame (needs the plyr package)
  mean(df[, pollutant], na.rm = TRUE)
}
I think your major problem is listing the files in your working directory and reading them into R. Try the list.files function in R. Example code that may work for you:
files <- list.files(pattern = ".csv")  # creates a vector with all file names in your folder
polmean <- rep(0, length(files))
for (i in 1:length(files)) {
  data <- read.csv(files[i], header = TRUE)
  polmean[i] <- mean(data$pollutant, na.rm = TRUE)
}
result <- cbind(files, polmean)
write.csv(result, "result_polmeans.csv")
This program gives you a table with the name of each file in the first column and the corresponding mean in the second column.
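Tying the two tasks together: read.csv() accepts a single file path, so a vector of ids has to be looped over (for example with lapply), and a column named by a character variable is selected with [[ or [ , not with $ (dataset$pollutant looks for a column literally named "pollutant"). A sketch using temporary files:

```r
tmp <- tempfile("specdata")
dir.create(tmp)
write.csv(data.frame(sulfate = c(1, 3)),  file.path(tmp, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(5, NA)), file.path(tmp, "002.csv"), row.names = FALSE)

id <- 1:2
filenames <- file.path(tmp, sprintf("%03d.csv", id))   # "001.csv", "002.csv"
dataset <- do.call(rbind, lapply(filenames, read.csv)) # read each file, stack the results

pollutant <- "sulfate"
mean(dataset[[pollutant]], na.rm = TRUE)
# [1] 3
```

Passing the whole filenames vector straight to read.csv() is what produces the invalid 'description' argument error in Task 1.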