How to bypass errors in for loops in R?

I created a for loop to merge several csv files in a directory together into one table. In some cases the files that are indicated in the loop have not been created. When the files do not exist the loop produces an error and no files are merged. I am trying to adjust the code so the loop inserts "NULL" or "error" in the parts of the matrix reserved for the files.
Here is the original code:
COMP_raw <- cbind(m, matrix(-9999, ncol = length(dirnames), nrow = 169))
setwd() #actual wd is removed for posting
for(i in length(dirnames)){
j<-dirnames[1] #Take the directory folder name
id<-gsub("_.*$","",dirnames[1]) #Take the numeric identifier of the indicator
fpath <- file.path(paste(j,"/",id,"_2016",".csv", sep = "")) #Merge the directory folder name and desired csv to a file path format
data<-read.csv(fpath,header = TRUE, as.is = TRUE)
last <- max(ncol(data))
COMP_raw[,(1+1)] <- data[,last]
colnames(COMP_raw)[(1+1)] <- names(data[last])
}
The above code works for every iteration where the "fpath" actually exists in my directory. When the csv does not exist, the following message occurs:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file '2.1_PermitIndirectCosts/2.1_2016.csv': No such file or directory
I looked at a few other posts to see how to solve the issue and tried the following
COMP_raw <- cbind(m, matrix(-9999, ncol = length(dirnames), nrow = 169))
for(i in length(dirnames)){
j<-dirnames[1] #Take the directory folder name
id<-gsub("_.*$","",dirnames[1]) #Take the numeric identifier of the indicator
fpath <- file.path(paste(j,"/",id,"_2016",".csv", sep = "")) #Merge the directory folder name and desired csv to a file path format
possibleerror<- tryCatch(data<-read.csv(fpath,header = TRUE, as.is = TRUE),silent = TRUE),
error=function(e) e
)
if(!inherits(possibleerror,"error"))
{last <- max(ncol(data))
COMP_raw[,(1+3)] <- data[,last]
colnames(COMP_raw)[(1+3)] <- names(data[last])}
}
But that is still generating an error

What about using file.exists()?
file.exists returns a logical vector indicating whether the files named by its argument exist, so you can test each path before trying to read it.
COMP_raw <- cbind(m, matrix(-9999, ncol = length(dirnames), nrow = 169))
setwd() #actual wd is removed for posting
for(i in seq_along(dirnames)){ #Visit every directory, not only the last one
j <- dirnames[i] #Take the directory folder name
id <- gsub("_.*$","",j) #Take the numeric identifier of the indicator
fpath <- file.path(j, paste0(id,"_2016.csv")) #Merge the directory folder name and desired csv to a file path format
col <- ncol(COMP_raw) - length(dirnames) + i #Column of COMP_raw reserved for this file
#Checks if file exists; if not, fill the reserved column with NA
if(file.exists(fpath)){
data <- read.csv(fpath,header = TRUE, as.is = TRUE)
last <- ncol(data)
COMP_raw[,col] <- data[,last]
colnames(COMP_raw)[col] <- names(data)[last]
} else{
COMP_raw[,col] <- NA
}
}

Not specific to your example (I'm on a mobile) but it should help:
var <- try(some_function(), silent = TRUE) # some_function() is a placeholder
if(inherits(var, "try-error")){
some_other_function() # placeholder for your fallback
next
}
If the call fails, try returns an object of class "try-error" instead of stopping the loop, which you can detect and handle accordingly. next skips to the next iteration of the loop.
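The same idea can be written with tryCatch, which is what the second attempt in the question was aiming for. The following is only a minimal sketch under assumptions: the temporary folder, the file names and the 3-row matrix are made up for the demonstration and stand in for the real directories and COMP_raw.

```r
# Hypothetical stand-in for the real folder: two CSV files exist, one does not.
tmp <- tempfile("demo_")
dir.create(tmp)
write.csv(data.frame(x = 1:3, value = c(10, 20, 30)),
          file.path(tmp, "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 1:3, value = c(1, 2, 3)),
          file.path(tmp, "b.csv"), row.names = FALSE)
paths <- file.path(tmp, c("a.csv", "missing.csv", "b.csv"))

# One column per file, pre-filled with NA so skipped files stay visibly empty.
out <- matrix(NA_real_, nrow = 3, ncol = length(paths))

for (i in seq_along(paths)) {
  data <- tryCatch(
    suppressWarnings(read.csv(paths[i], header = TRUE, as.is = TRUE)),
    error = function(e) NULL   # return NULL instead of stopping the loop
  )
  if (is.null(data)) next      # leave the NA column and move on
  out[, i] <- data[, ncol(data)]  # copy the last column of the file
}
```

Note that the error handler is an argument of tryCatch itself, so it has to sit inside the same pair of parentheses as the expression being tried.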

Related

Extract column from multiple csv files and merge into new data frame in R

I want to extract column called X1 out of 168 different .csv files, called table3_2, table3_3, table3_4, table3_5..., table3_168, all held in one folder (folder1). Then, merge into one new df. Contents of the column is factor.
Trying this code but can't get it to work.
folder1 <- "folder1"
folder2 <- "folder2" # destination folder
write_to <- function(file.name) {
file.name <- paste0(tools::file_path_sans_ext(basename(file.name)), ".csv")
df <- read.csv(paste(folder1, file.name, sep = "/"), header = FALSE, sep = "/")[X1]
write.csv(df, file = past(folder2, file.name, sep= "/"))
}
files <- list.files(path = folder1, pattern = "*.csv")
lapply(X = paste(folder1, files, sep= "/"), write_to)
This comes up with the error:
Error in file(file, "rt") : cannot open the connection
In addition: warning message:
In file(file, "rt") :
cannot open file folder1/folder1.csv: No such file or directory
So, I am not calling in the correct names of the table, and maybe not directing R to the correct folder (I've set the wd to folder1).
Any suggestions would be greatly appreciated.
Many thanks
There are a few minor issues that stand out, e.g. you have a typo in file = past(folder2, file.name, sep= "/") (should be paste() not past()) but perhaps a simpler approach would suit, e.g. using vroom:
library(vroom)
files <- fs::dir_ls(glob = "table3_*csv")
data <- vroom(files, id = "ID", col_select = c(ID, X1))
data
vroom_write(data, file = "~/xx/folder2/new_df.csv", delim = ",")
# replace "xx" with your path; delim = "," is needed because vroom_write defaults to tab-separated output
Does this approach solve your problem?
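If installing extra packages is not an option, roughly the same result can be had in base R. This is a sketch under assumptions: the temporary folder and the three small files are invented for the demonstration (the real files would be table3_2.csv to table3_168.csv in folder1), and it assumes every file has a header row containing X1.

```r
# Hypothetical demo folder standing in for "folder1".
folder1 <- tempfile("folder1_")
dir.create(folder1)
for (k in 2:4) {
  write.csv(data.frame(X1 = letters[k + 0:1], X2 = k),
            file.path(folder1, sprintf("table3_%d.csv", k)),
            row.names = FALSE)
}

# full.names = TRUE returns paths that can be passed straight to read.csv().
files <- list.files(folder1, pattern = "^table3_.*\\.csv$", full.names = TRUE)

# Read each file and keep only column X1, tagged with the file it came from.
pieces <- lapply(files, function(f) {
  data.frame(ID = basename(f), X1 = read.csv(f)[["X1"]])
})

# Stack the per-file pieces into one data frame.
new_df <- do.call(rbind, pieces)
```

Because list.files already returns complete paths here, there is no need to paste the folder name onto them again, which is one plausible source of the "folder1/folder1.csv" path in the error message.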

R: Importing Entire Folder of Files

I am using the R programming language (in R Studio). I am trying to import an entire folder of ".txt" files (notepad files) into R and "consistently" name them.
I know how to do this process manually:
#find working directory:
getwd()
[1] "C:/Users/Documents"
#import files manually and name them "consistently":
df_1 <- read.table("3rd_file.txt")
df_2 <- read.table("file_1.txt")
df_3 <- read.table("second_file.txt")
Of course, this will take a long time to do if there are 100 files.
Right now, suppose these files are in a folder : "C:/Users/Documents/files_i_want"
Is there a way to import all these files at once and name them as "df_1", "df_2", "df_3", etc.?
I found another stackoverflow post that talks about a similar problem: How to import folder which contains csv file in R Studio?
setwd("where is your folder")
#
#List file subdirectories
folders<- list.files(path = "C:/Users/Documents/files_i_want")
#
#Get all files...
files <- rep(NA,0)
for(i in c(1:length(folders)))
{
files.i <- list.files(path = noquote(paste("C:/Users/Documents/files_i_want/",folders[i], "/", sep = "")))
n <- length(files.i)
files.i <- paste(folders[i], files.i, sep = "/")
files <- c(files, files.i)
}
#
#
#Read first data file (& add file name as separate column)
T1 <- read.delim(paste("C:/Users/Documents/files_i_want", files[1], sep = ""), sep = "", header=TRUE)
T1 <- cbind(T1, "FileName" = files[1])
But this produces the following error:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
Is this because there is a problem in the naming convention?
Thanks
You can try the following :
#Get the path of filenames
filenames <- list.files("C:/Users/Documents/files_i_want", full.names = TRUE)
#Read them in a list
list_data <- lapply(filenames, read.table)
#Name them as per your choice (df_1, df_2 etc)
names(list_data) <- paste('df', seq_along(filenames), sep = '_')
#Create objects in global environment.
list2env(list_data, .GlobalEnv)
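As a variation, the tables can stay in the named list rather than being copied into the global environment. The sketch below is only illustrative: the temporary folder and the tiny files are made up so that the code runs anywhere, and the pattern argument is an added assumption that only ".txt" files should be read.

```r
# Hypothetical demo folder standing in for "C:/Users/Documents/files_i_want".
folder <- tempfile("files_i_want_")
dir.create(folder)
for (nm in c("3rd_file.txt", "file_1.txt", "second_file.txt")) {
  write.table(data.frame(a = 1:2, b = c("x", "y")), file.path(folder, nm))
}

# Only .txt files, with full paths so read.table() finds them
# regardless of the current working directory.
filenames <- list.files(folder, pattern = "\\.txt$", full.names = TRUE)

list_data <- lapply(filenames, read.table)
names(list_data) <- paste("df", seq_along(filenames), sep = "_")

# Access a table by name without creating global variables.
list_data[["df_2"]]
```

Keeping the data in a list makes it easy to apply the same step to every table later with lapply.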

How do I run a for loop on multiple word files in R

I have 44 doc files. From each file, I need to extract the customer name and amount. I am able to do this for one file using the read_document command and grep to extract the amount and customer name. When I do this for all 44 files, I get an error. I am not sure where I am going wrong:
ls()
rm(list = ls())
files <- list.files("~/experiment", ".doc")
files
length(files)
for (i in length(files)){
library(textreadr)
read_document(files[i])
}
Here is the full code that I run on one file:
file <- "~/customer_full_file.docx"
library(textreadr)
full_customer_file <- read_document(file, skip = 0, remove.empty = TRUE, trim = TRUE)
#checking file is read correctly
head(full_customer_file)
tail(full_customer_file)
# Extracting Name
full_customer_file <- full_customer_file[c(1,4)]
amount_extract <- grep("Amount", full_customer_file, value = T)
library(tm)
require(stringr)
amount_extract_2 <- lapply(amount_extract, stripWhitespace)
amount_extract_2 <- str_remove(amount_extract_2, "Amount")
name_extract <- grep("Customer Name and ID: ", full_customer_file, value = T)
name_extract
name_extract_2 <- lapply(name_extract, stripWhitespace)
name_extract_2 <- str_remove(name_extract_2, "Customer Name and ID: ")
name_extract_2 <- as.data.frame(name_extract_2)
names(name_extract_2)[1] <- paste("customer_full_name")
amount_extract_2 <- as.data.frame(amount_extract_2)
names(amount_extract_2)[1] <- paste("amount")
amount_extract_2
customer_final_file <- cbind(name_extract_2, amount_extract_2)
write.table(customer_final_file, "~/customer_amount.csv", sep = ",", col.names = T, append = T)
Here is the code that I run on all 44 files:
ls()
rm(list = ls())
files <- list.files("~/experiment", ".doc")
files
length(files)
library(textreadr)
for (i in 1:length(files)){
read_document(files[i])
}
Here is the error that I am getting:
> library(textreadr)
> for (i in 1:length(files)){
+ read_document(files[i])
+ }
Warning messages:
1: In utils::unzip(file, exdir = tmp) :
error 1 in extracting from zip file
2: In utils::unzip(file, exdir = tmp) :
error 1 in extracting from zip file
3: In utils::unzip(file, exdir = tmp) :
error 1 in extracting from zip file
4: In utils::unzip(file, exdir = tmp) :
error 1 in extracting from zip file
5: In utils::unzip(file, exdir = tmp) :
error 1 in extracting from zip file
Here is the code I used to analyze several Word files with the sentimentr package in R. I think you can keep the same structure and just change the body of the for loop so that it runs your extraction on every docx file.
And this is the code:
library(sentimentr)
library(officer) # provides read_docx() and docx_summary() used below
folder_path <- "C:\\Users\\yourname\\Documents\\R\\"
# Get a list of all the docx files in the folder
docx_files <- list.files(path = folder_path, pattern = "\\.docx$", full.names = TRUE)
# Create an empty data frame to store the results
results <- data.frame(file = character(0), sentiment = numeric(0))
# Loop over the list of files
for (file in docx_files) {
# Read the docx file
sample_data <- read_docx(file)
# Extract the content and create a summary
content <- docx_summary(sample_data)
law <- content[sapply(strsplit(as.character(content$text),""),length)>5,]
# Calculate the sentiment of the summary (or in your case extraction)
sentiment <- sentiment_by(as.character(law$text))
# Add a row to the data frame with the results for this file
results <- rbind(results, data.frame(file = file, sentiment = sentiment$ave_sentiment))
}
# View the results data frame
View(results)
I hope that is near enough to your problem to solve it
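Two details in the question's loops may also matter, independently of which reader package is used. The sketch below is a base R illustration under assumptions: the temporary folder and empty files are invented, and the assignment inside the loop is only a placeholder for a call such as read_document(full[i]).

```r
# Hypothetical demo folder standing in for "~/experiment".
dir_demo <- tempfile("experiment_")
dir.create(dir_demo)
file.create(file.path(dir_demo, c("one.docx", "two.docx", "old.doc")))

# Without full.names the result is relative to dir_demo, so the files are
# only found if dir_demo happens to be the working directory.
bare <- list.files(dir_demo, pattern = "\\.docx$")
full <- list.files(dir_demo, pattern = "\\.docx$", full.names = TRUE)

# for (i in length(x)) runs once, with i equal to the last index only.
visited_last <- character(0)
for (i in length(full)) visited_last <- c(visited_last, basename(full[i]))

# seq_along(x) visits every index, and the results are stored in a list
# instead of being discarded at the end of each iteration.
results <- vector("list", length(full))
for (i in seq_along(full)) {
  results[[i]] <- basename(full[i])  # placeholder for read_document(full[i])
}
```

The escaped pattern "\\.docx$" also excludes legacy ".doc" files, which a plain ".doc" pattern would match; such files are not zip archives, so they are one possible source of the unzip warnings.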

Calculate the mean of one column from several CSV files

I have over 300 CSV files in a folder (named 001.csv, 002.csv and so on). Each contains a data frame with a header. I am writing a function that will take three arguments: the location of the files, the name of the column you want to calculate the mean (inside the data frames), and the files to use in the calculation.
Here is my function:
pollutantmean2 <- function(directory = getwd(), pollutant, id = 1:332) {
# add one or two zeros to ID so that they match the CSV file names
filenames <- sprintf("%03d.csv", id)
# path to specdata folder
# if no path is provided, default is working directory
filedir <- file.path(directory, filenames)
# get the data from selected ID or IDs from the specified path
dataset <- read.csv(filedir, header = TRUE)
# calculate mean removing all NAs
polmean <- mean(dataset$pollutant, na.rm = TRUE)
# return mean
polmean
}
It appears there are two things wrong with my code. To break it down, I separated the function into two separate function to handle the two tasks: 1) get the required files and 2) calculate the mean of the desired column (aka pollutant).
Task 1: Getting the appropriate files - It works as long as I only want one file. If I select a range of files, such as 1:25 I get an error message that says Error in file(file, "rt") : invalid 'description' argument. I have Googled this error but still have no clue how to fix it.
# function that obtains csv files and stores them
getfile <- function(directory = getwd(), id) {
filenames <- sprintf("%03d.csv", id)
filedir <- file.path(directory, filenames)
dataset <- read.csv(filedir, header = TRUE)
dataset
}
If I run getfile("specdata", 1) it works fine, but if I run getfile("specdata", 1:10) I get the following error: Error in file(file, "rt") : invalid 'description' argument.
Task 2: Calculating mean of specified named column - Assuming I have a usable data frame, I then try to calculate the mean with the following function:
calcMean <- function(dataset, pollutant) {
polmean <- mean(dataset$pollutant, na.rm = TRUE)
polmean
}
But if I run calcMean(mydata, "sulfate") (where mydata is a data frame I loaded manually) I get an error message:
Warning message:
In mean.default(dataset$pollutant, na.rm = TRUE) :
argument is not numeric or logical: returning NA
The odd thing is that if I run mean(mydata$sulfate, na.rm = TRUE) in the console, it works fine.
I have researched this for several days and after endless tweaking, I have run out of ideas.
You do not need more functions. As I understand it, the solution can be written more simply, in six lines:
pollutantmean <- function(directory, pollutant, id = 1:10) {
filenames <- sprintf("%03d.csv", id)
filenames <- paste(directory, filenames, sep="/")
ldf <- lapply(filenames, read.csv)
df <- plyr::ldply(ldf) # requires the plyr package
# ldf is the list of data.frames; df is all of them stacked into one data.frame
mean(df[, pollutant], na.rm = TRUE)
}
I think your main problem is listing the files in your working directory and reading them into R. Try the list.files function. Example code that may work for you (using sulfate, the column from your question, as the pollutant):
pollutant <- "sulfate" ## name of the column to average
files <- list.files(pattern = "\\.csv$") ## creates a vector with all csv file names in your folder
polmean <- rep(0,length(files))
for(i in seq_along(files)){
data <- read.csv(files[i],header=T)
polmean[i] <- mean(data[[pollutant]], na.rm = TRUE) ## [[ ]] looks up the column by the name stored in pollutant
}
result <- cbind(files,polmean)
write.csv(result,"result_polmeans.csv")
This program gives you a table with the file name in the first column and the corresponding mean in the second column.
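Both errors in the question have a short explanation: read.csv accepts one file at a time, so a vector of paths has to be read element by element, and dataset$pollutant looks for a column literally named "pollutant" rather than the column whose name is stored in the variable. The sketch below illustrates both points in base R; the temporary folder and the three small files are invented for the demonstration.

```r
# Hypothetical demo folder standing in for "specdata".
specdata <- tempfile("specdata_")
dir.create(specdata)
for (k in 1:3) {
  write.csv(data.frame(sulfate = c(k, NA, k + 1), nitrate = c(0.5, 1, NA)),
            file.path(specdata, sprintf("%03d.csv", k)), row.names = FALSE)
}

pollutantmean <- function(directory, pollutant, id = 1:332) {
  paths <- file.path(directory, sprintf("%03d.csv", id))
  # read.csv() takes one file at a time, so apply it to each path in turn
  # and stack the resulting data frames.
  combined <- do.call(rbind, lapply(paths, read.csv))
  # [[ ]] looks up the column whose name is stored in `pollutant`.
  mean(combined[[pollutant]], na.rm = TRUE)
}

pollutantmean(specdata, "sulfate", id = 1:3)
```

This also explains why mean(mydata$sulfate, na.rm = TRUE) works in the console: there the column name is written out literally.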

Converting twitteR results to data frame

I have a simple for loop to write the past 100 tweets of a few usernames to .csv files:
library(twitteR)
mclist <- read.table('usernames.txt')
for (mc in mclist)
{
tweets <- userTimeline(mc, n = 100)
df <- do.call("rbind", lapply(tweets, as.data.frame))
write.csv(df, file=paste("Desktop/", mc, ".csv", sep = ""), row.names = F)
}
I mostly followed what I've read on StackOverflow but I continue to get this error message:
Error in file(file, ifelse(append, "a", "w")) :
invalid 'description' argument
In addition: Warning message:
In if (file == "") file <- stdout() else if (is.character(file)) { :
the condition has length > 1 and only the first element will be used
Where did I go wrong?
I just cleaned up the code a bit, and everything started working.
Step 1: Let's set the working directory and load the 'twitteR' package.
library(twitteR)
setwd("C:/Users/Dinre/Desktop") # Replace with your desired directory
Step 2: First, we need to load a list of user names from a flat text file. I'm assuming that each line in the text file has one username, like so:
[contents of usernames.txt]
edclef
notch
dkanaga
Let's load it using the 'scan' function to read each line into a character vector:
mclist <- scan("usernames.txt", what="", sep="\n")
Step 3: We'll loop through the usernames, just like you did before, but we're not going to refer to the directory, since we're going to use the same directory for output as input. The original code had a problem when attempting to refer to the desktop directory, and we're just going to sidestep that.
for (mc in mclist){
tweets <- userTimeline(mc, n = 100)
df <- do.call("rbind", lapply(tweets, as.data.frame))
write.csv(df, file=paste(mc, ".csv", sep = ""), row.names = F)
}
I end up with three files on the desktop, and all the data seems to be correct.
edclef.csv
notch.csv
dkanaga.csv
Update: If you really want to refer to different directories within your code, use the '.' character to refer to the current working directory ('..' is the parent directory). For instance, if your working directory is your Windows user profile, you would refer to the 'Desktop' folder like so:
setwd("C:/Users/Dinre")
...
write.csv(df, file=paste("./Desktop/", mc, ".csv", sep = ""), row.names = F)
The twitteR package has a convenience function, twListToDF, which handles the conversion of the list of tweets to a data.frame.
Since your mclist is a data.frame, you can replace your for by apply
apply( mclist, 1,function(mc){
tweets <- userTimeline(mc, n = 100)
df <- do.call("rbind", lapply(tweets, as.data.frame))
write.csv(df, file=paste("Desktop/", mc, ".csv", sep = ""), ##!! Change Desktop to
## something like Desktop/tweets/
row.names = F)
})
PS :
The userTimeline function will only work if the user requested has a
public timeline, or you have previously registered a OAuth object
using registerTwitterOAuth
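The error message in the question is consistent with the loop receiving all usernames at once: a for loop over a data.frame iterates over its columns, not its rows. The base R sketch below illustrates that behaviour without calling the Twitter API; the temporary file is a made-up stand-in for usernames.txt.

```r
# Hypothetical usernames file, one name per line.
path <- tempfile(fileext = ".txt")
writeLines(c("edclef", "notch", "dkanaga"), path)

# read.table() returns a data frame with a single column (V1).
mclist_df <- read.table(path)
for (mc in mclist_df) {
  n_df <- length(mc)                       # the whole column arrives at once
  files_df <- paste(mc, ".csv", sep = "")  # several file names, not one
}

# scan() returns a character vector, so the loop sees one name at a time.
mclist_vec <- scan(path, what = "", sep = "\n", quiet = TRUE)
lengths_vec <- integer(0)
for (mc in mclist_vec) {
  lengths_vec <- c(lengths_vec, length(mc))
}
```

Passing a vector of several file names to write.csv is what triggers "invalid 'description' argument", which is why both reading with scan and iterating over rows with apply avoid the problem.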
