I am trying to read a number of Excel files into R using read.xlsx from the xlsx package, but when I do so I get the following error:
Error in loadWorkbook(file) : Cannot find id100.xlsx
First I list the files in the directory:
> files <- list.files(datDir, pattern = ".xlsx")
Then I use read.xlsx to read them all in:
for (i in seq_along(files)) {
assign(paste("id", i, sep = "."), read.xlsx(files[i],1,as.data.frame=TRUE,
header=FALSE, stringsAsFactors=FALSE, na.strings=" "))
}
I checked to see if the file was even in the list and it is:
> files
[1] "id100.xlsx" "id101.xlsx" etc...
> files[1]
[1] "id100.xlsx"
I have used this code many times before today and for some reason it is just not working. I keep getting that error. Does anyone have any suggestions?
Thanks!
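If your working directory is different from datDir, list.files returns bare file names that read.xlsx then looks for in the working directory. Use full.names = TRUE so that full paths are returned:
files <- list.files(datDir, pattern = "\\.xlsx$", full.names = TRUE)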
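To illustrate the difference, here is a minimal sketch using a throwaway directory. The directory and its files are created only for this example and are not the asker's data.

```r
# Hypothetical data directory, created only for this illustration
datDir <- file.path(tempdir(), "xlsx_demo")
dir.create(datDir, showWarnings = FALSE)
file.create(file.path(datDir, c("id100.xlsx", "id101.xlsx")))

# Bare names: these are resolved against getwd(), not against datDir
bare <- list.files(datDir, pattern = "\\.xlsx$")

# Full paths: usable from any working directory
full <- list.files(datDir, pattern = "\\.xlsx$", full.names = TRUE)

print(bare)
print(file.exists(full))
```

Building the paths yourself with file.path(datDir, files) should be an equivalent fix if you prefer to keep the bare names.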
I have a tab delimited file that is saved as a .txt with " " around the string variables. The file can be found here.
I am trying to read it into Spark-R (version 3.1.2), but cannot successfully bring it into the environment. I've tried variations of the read.df code, like this:
df <- read.df(path = "FILE.txt", header="True", inferSchema="True", delimiter = "\t", encoding="ISO-8859-15")
df <- read.df(path = "FILE.txt", source = "txt", header="True", inferSchema="True", delimiter = "\t", encoding="ISO-8859-15")
I have had success bringing in CSVs with read.csv, but many of the files I have are over 10GB, and it is not practical to convert them to CSV before bringing them into Spark-R.
EDIT: When I run read.df I get a laundry list of errors, starting with this:
I am able to bring in csv files used in a previous project with both read.df and read.csv, so I don't think it's a java issue.
If you don't need to specifically use Spark R, then base R read.table should work just fine for the .txt you provided. Note that it is tab-delimited, and so this should be specified.
Something like this should work:
dat <- read.table("FILE.txt",
sep="\t",
header=TRUE)
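As a small self-contained check of the base R approach, the sketch below writes a tiny tab-delimited file with quoted strings to a temporary location and reads it back. The data are made up, and quote is set explicitly only to make the handling of the " " around strings visible.

```r
# Write a tiny tab-delimited file with quoted strings (made-up data)
tmp <- tempfile(fileext = ".txt")
writeLines(c('"name"\t"value"',
             '"alpha"\t1',
             '"beta gamma"\t2'), tmp)

# sep = "\t" splits on tabs only, so the space inside "beta gamma" is kept
dat <- read.table(tmp, sep = "\t", header = TRUE,
                  quote = "\"", stringsAsFactors = FALSE)
str(dat)
```

For the Spark side, an untested assumption: Spark's delimited-text reader is the "csv" source, so read.df with source = "csv" and a tab separator may be worth trying instead of source = "txt".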
I am using this answer to load in a folder of Excel Files:
# Get the list of files
#----------------------------#
folder <- "path/to/files"
fileList <- dir(folder, recursive=TRUE) # grep through these, if you are not loading them all
# use platform appropriate separator
files <- paste(folder, fileList, sep=.Platform$file.sep)
So far, so good.
# Load them in
#----------------------------#
# Method 1:
invisible(sapply(files, source, local=TRUE))
#-- OR --#
# Method 2:
sapply(files, function(f) eval(parse(text=f)))
But the source function (Method 1) gives me the error:
Error in source("C:/Users/Username/filename.xlsx") :
C:/Users/filename :1:3: unexpected input
1: PK
^
For Method 2 I get the error:
Error in parse(text = f) : <text>:1:3: unexpected '/'
1: C:/
^
EDIT: I tried circumventing the issue by setting the working directory to the directory of the folder, but that did not help.
Any ideas why this happens?
EDIT 2: It works when doing the following:
How can I read multiple (excel) files into R?
setwd("...")
library(readxl)
file.list <- list.files(pattern='*.xlsx')
df.list <- lapply(file.list, read_excel)
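Just to provide a proper answer outside of the comment section:
If your goal is to read many Excel files, you shouldn't use source.
source is meant for running external R code. An .xlsx file is a zip archive whose first bytes are "PK", which is why the parser stops with "unexpected input" at 1: PK. Method 2 fails for a similar reason: parse(text = f) tries to parse the path string itself as R code.
If you need to read many Excel files, you can use the following code with the support of one of these libraries: readxl, openxlsx, tidyxl (with unpivotr).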
filelist <- dir(folder, recursive = TRUE, full.names = TRUE, pattern = "\\.xlsx?$", ignore.case = TRUE)
l_df <- lapply(filelist, readxl::read_excel)
Note that we are using dir to list the full paths (full.names = TRUE) of all the files that end with .xlsx or .xls (pattern = "\\.xlsx?$"), in any letter case such as .XLSX or .XLS (ignore.case = TRUE), in the folder folder and all its subfolders (recursive = TRUE). The dot is escaped because pattern is a regular expression, in which a bare . matches any character.
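As a small check of how pattern behaves, the sketch below builds a throwaway folder (all file names are made up) and compares an escaped pattern with an unescaped one. It only lists files; the read_excel step is left out so that no extra package is needed.

```r
# Hypothetical folder with mixed files, created only for this illustration
folder <- file.path(tempdir(), "excel_demo")
dir.create(file.path(folder, "sub"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(folder, c("a.xlsx", "b.XLS", "notes.txt", "myxls")))
file.create(file.path(folder, "sub", "c.xls"))

# Escaped dot: matches only a literal ".xls" or ".xlsx" extension
filelist <- dir(folder, recursive = TRUE, full.names = TRUE,
                pattern = "\\.xlsx?$", ignore.case = TRUE)
print(basename(filelist))

# Unescaped dot: "." matches any character, so "myxls" is picked up too
loose <- dir(folder, recursive = TRUE, pattern = ".xls$", ignore.case = TRUE)
print(loose)
```

Naming the result, for example with setNames(lapply(filelist, readxl::read_excel), basename(filelist)), may make it easier to tell afterwards which data frame came from which file.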
readxl is part of the tidyverse and is easy to use. It is most likely what you're looking for.
Personally, I advise using openxlsx if you need to write (rather than read) customized Excel files with many specific features.
tidyxl is the best package I've seen for reading Excel files, but it can be rather complicated to use. However, it is very careful about preserving types.
With the support of unpivotr, it allows you to handle complicated Excel structures,
for example sheets with multiple header rows and multiple left index columns.
I'm new to RStudio and was not well aware of this site's T&C, so I was blocked from asking questions for 5 days.
I have code for importing multiple files from a directory into R.
The problem is that this code sometimes runs and sometimes fails with the error shown below.
I have tried to find a solution but have not found one yet.
library(data.table)
t = setwd("/home/dp/vishan/olp_data/19164/1/")
files <- file.info(list.files(path = t,pattern = "", full.names=TRUE))
files = rownames(files)[files$size > 0]
temp <- lapply(files, fread, sep=",")
Error:
Error in FUN(X[[i]], ...) :
'input' can not be a directory name, but must be a single character string containing a file name, a command, full path to a file, a URL starting 'http[s]://', 'ftp[s]://' or 'file://', or the input data itself.
Thanks in advance!
Try using:
files <- file.info(list.files(path = t,pattern = "", full.names=TRUE))
files <- subset(files, !isdir & size > 0)
temp <- lapply(rownames(files), fread, sep=',')
This is needed because list.files also returns directories. The data frame stored in files can easily be subset on its isdir column, which indicates whether each entry is a directory or a file.
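The filter can be tried out on a throwaway folder, as in the sketch below. The folder layout is made up, and read.csv stands in for data.table::fread so that the example needs no extra package. Filtering on size alone is not reliable, since directories typically report a non-zero size on Linux.

```r
# Hypothetical folder: one CSV, one empty file, one subdirectory
d <- file.path(tempdir(), "fread_demo")
dir.create(file.path(d, "subdir"), recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(d, "data.csv"), row.names = FALSE)
file.create(file.path(d, "empty.csv"))

# Non-recursive list.files() includes the subdirectory in its result
info <- file.info(list.files(path = d, full.names = TRUE))
print(info[, c("size", "isdir")])

# Keep only regular, non-empty files
keep <- rownames(info)[!info$isdir & info$size > 0]
temp <- lapply(keep, read.csv)  # read.csv stands in for data.table::fread
```

Separately, setwd() returns the previous working directory rather than the new one, so t in the question holds whatever directory was current before the call. That may be why the code only fails some of the time.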
I have a list of 15 file names stored in an object FILELIST. The task is to read all the files in FILELIST from a particular directory and append them one below the other.
In the code below, the object called 'dataset' will hold the final appended data. The issue I am facing is that if one or more files in FILELIST are not present in the directory, I get the error below. What I need is that if one or more of the 15 files are missing from the directory, the code proceeds to append the rest of the files.
I have tried exception handling with try, but I still get the error below and the code doesn't process the rest of the files.
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'PREDICTION_2016_Q4_Wk13.csv': No such file or directory
Code:
for (file in FILELIST) {
try(
if (!exists("dataset")) {
dataset <- read.table(file, header=TRUE, sep=",")
}
if (exists("dataset")) {
temp_dataset <-read.table(file, header=TRUE, sep=",")
dataset<-rbind(dataset, temp_dataset)
rm(temp_dataset)
},
silent = T
)
}
I would not use exception handling for this. Instead do something like this:
for (file in intersect(FILELIST, list.files())) {
Combination of the two other answers, using readr + dplyr for speed:
library(dplyr)
library(readr)
# existing files
f <- intersect(FILELIST, list.files())
# or identically:
# f <- intersect(FILELIST, dir())
# f <- FILELIST[ file.exists(FILELIST) ]
# combine in a single dataset
d <- bind_rows(lapply(f, read_csv))
First use file.exists and Filter to reduce FILELIST to the ones that exist and then read each one and rbind them together at the end.
Note that this works both in the situation that FILELIST contains file names from the current directory and also works if the files are located elsewhere and path/filenames are specified in FILELIST.
No packages are used.
do.call("rbind", lapply(Filter(file.exists, FILELIST), read.csv))
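Here is a minimal sketch of that one-liner on made-up files, where one entry in FILELIST deliberately does not exist. The file names and the message about skipped files are additions for illustration only.

```r
# Made-up files: two exist, one entry in FILELIST is missing
demo_dir <- file.path(tempdir(), "append_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, y = c(0.1, 0.2)),
          file.path(demo_dir, "wk11.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, y = c(0.3, 0.4)),
          file.path(demo_dir, "wk12.csv"), row.names = FALSE)
FILELIST <- file.path(demo_dir, c("wk11.csv", "wk12.csv", "wk13.csv"))

# Keep only the files that exist and report the ones being skipped
existing <- Filter(file.exists, FILELIST)
missing  <- setdiff(FILELIST, existing)
if (length(missing) > 0) {
  message("Skipping: ", paste(basename(missing), collapse = ", "))
}

# Read each remaining file and stack the rows
dataset <- do.call("rbind", lapply(existing, read.csv))
print(dataset)
```

Reading everything into a list and binding once at the end also avoids growing dataset inside a loop.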
Update: Improved code.
I have many data sets in the same format in different directories, and I have a function for processing them.
I want to load all of my data, process it with my function, and then store the results in CSV files.
When I process a single data set, the code looks like this:
ENFP_0719 <- f_preprocessing2("D:/DATA/output/ENFP_0719")
write.csv(ENFP_0719, "D:/DATA/output2/ENFP_0719.csv")
And everything is OK, file ENFP_0719.csv was created correctly.
But when I try to use a loop, the code looks like this:
setwd("D:/DATA/output")
file_list <- list.files()
for (file in file_list){
file <- f_preprocessing2(print(eval(sprintf("D:/DATA/output/%s",file))))
print("Storing data to csv....")
setwd("D:/DATA/output2")
write.csv(file, sprintf("%s.csv",file))
}
I get an error like this:
[1] "D:/DATA/output/ENFP_0719"
[1] "Storing data to csv...."
Error in file(file, ifelse(append, "a", "w")) :
invalid 'description' argument
I have also tried using paste, as in paste('data', file, 'csv', sep = '.'), but I got the same error. I am confused by it because nothing is wrong with my function: as shown above, everything works when I process a single data set.
So what is wrong with my code? Is the problem in my loop, or in the arguments I pass to write.csv?
I look forward to your help.
Thank you
I think you could make it a lot simpler by using the full.names argument to list.files and making a few other changes like this:
path <- 'data/output'
file_list <- list.files(path, full.names=TRUE)
for (file in file_list) {
file_proc <- f_preprocessing2(file)
# swap the folder name once and add the .csv extension
new_path <- paste0(sub('output', 'output2', file), '.csv')
write.csv(file_proc, new_path)
}
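The original error most likely comes from reassigning file to the processed data inside the loop, so that sprintf("%s.csv", file) no longer receives a single file name. The sketch below keeps the input path, the data, and the output path in separate variables. The f_preprocessing2 defined here is only a hypothetical stand-in for the asker's function, and the folders are temporary ones created for the example.

```r
# Hypothetical stand-in for the asker's f_preprocessing2()
f_preprocessing2 <- function(path) read.csv(path)

# Made-up input and output folders for this illustration
in_dir  <- file.path(tempdir(), "output")
out_dir <- file.path(tempdir(), "output2")
dir.create(in_dir,  showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)
write.csv(data.frame(a = 1:2), file.path(in_dir, "ENFP_0719"), row.names = FALSE)

for (in_path in list.files(in_dir, full.names = TRUE)) {
  # Keep the data and the paths in separate variables
  processed <- f_preprocessing2(in_path)
  out_path  <- file.path(out_dir, paste0(basename(in_path), ".csv"))
  write.csv(processed, out_path, row.names = FALSE)
}
print(list.files(out_dir))
```

Using file.path and basename avoids both setwd() calls and any string substitution on the path. row.names = FALSE is a choice made for this sketch and can be dropped if the row names matter.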