Parsing issue, unexpected character when loading a folder - r

I am using this answer to load a folder of Excel files:
# Get the list of files
#----------------------------#
folder <- "path/to/files"
fileList <- dir(folder, recursive=TRUE) # grep through these, if you are not loading them all
# use platform appropriate separator
files <- paste(folder, fileList, sep=.Platform$file.sep)
So far, so good.
# Load them in
#----------------------------#
# Method 1:
invisible(sapply(files, source, local=TRUE))
#-- OR --#
# Method 2:
sapply(files, function(f) eval(parse(text=f)))
But the source function (Method 1) gives me the error:
Error in source("C:/Users/Username/filename.xlsx") :
C:/Users/filename :1:3: unexpected input
1: PK
^
For method 2 get the error:
Error in parse(text = f) : <text>:1:3: unexpected '/'
1: C:/
^
EDIT: I tried circumventing the issue by setting the working directory to the directory of the folder, but that did not help.
Any ideas why this happens?
EDIT 2: It works when doing the following:
How can I read multiple (excel) files into R?
setwd("...")
library(readxl)
file.list <- list.files(pattern='*.xlsx')
df.list <- lapply(file.list, read_excel)

Just to provide a proper answer outside of the comment section...
If your goal is to read many Excel files, you shouldn't use source.
source is meant to run external R code. That is exactly why Method 1 fails: source tries to parse the binary contents of the .xlsx file, and the PK it chokes on is the ZIP signature that every .xlsx file starts with. Method 2 fails because parse(text = f) parses the file path itself as R code, not the file's contents, hence the unexpected '/'.
If you need to read many Excel files, you can use the following code together with one of these libraries: readxl, openxlsx, or tidyxl (with unpivotr).
filelist <- dir(folder, recursive = TRUE, full.names = TRUE, pattern = "\\.xlsx$|\\.xls$", ignore.case = TRUE)
l_df <- lapply(filelist, readxl::read_excel)
Note that we are using dir to list the full paths (full.names = TRUE) of all the files that end with .xlsx or .xls (pattern = "\\.xlsx$|\\.xls$"), in upper or lower case (ignore.case = TRUE), in folder and all its subfolders (recursive = TRUE).
readxl is integrated with the tidyverse and is pretty easy to use; it is most likely what you're looking for.
Personally, I advise using openxlsx if you need to write (rather than read) customized Excel files with many specific features.
tidyxl is the best package I've seen for reading Excel files, though it can be rather complicated to use. However, it is very careful about preserving types.
With the support of unpivotr, it lets you handle complicated Excel structures,
for example sheets with multiple header rows and multiple left index columns.
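As a taste of that workflow, here is a minimal sketch; the file name messy.xlsx and its layout (one header row on top, one index column on the left) are assumptions for illustration:
library(tidyxl)
library(unpivotr)
# Read every cell of the (hypothetical) messy.xlsx as one row of a tibble
cells <- xlsx_cells("messy.xlsx", sheets = 1)
# Peel the header row and the left index column off into ordinary columns
tidy <- behead(cells, "up", "header")
tidy <- behead(tidy, "left", "rowname")
For sheets with several stacked header rows, you chain one behead() call per header row.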

Related

Creating objects from all .xlsx documents in working directory

I am trying to create objects from all the files in my working directory, each named after the original file. I tried the following approach, but couldn't solve the problems that appeared.
# - SETTING WD
getwd()
setwd("PATH TO THE FILE")
library(readxl)
# - CREATING OBJECTS
file_objects <- list.files()
xlsx_objects <- unlist(grep(".xlsx",file_objects,value = T))
for (i in xlsx_objects) {
xlsx_objects[i] <- read_xlsx(xlsx_objects[i], header = T)
}
I tried pasting the [i] item from xlsx_objects together with the path to the working directory, but that only created a list of the file names of the documents in the working directory.
I also found information that read.csv can read only one file at a time, but I guess that should be fine with a for loop, right? It reads only one file at a time.
Using lapply (as described on this forum) I was able to get the data into the environment, but the header argument didn't work, and I lost the names of my documents in the resulting object, which does not have the desired structure. I am looking for a way to have these files in separate objects without calling every document explicitly.
IIUC, you could do something like:
library(purrr) # map() comes from purrr
files = list.files("PATH TO THE FILE", full.names = T, pattern = 'xlsx$')
list_files = map(files, readxl::read_excel)
(You can't use read.csv to read excel files)
Also, I recommend reading about R Projects so you don't have to use setwd() ever again; setwd() makes your code harder to reproduce down the line.
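Since you wanted one object per file, named after the file, here is a hedged sketch building on the code above; tools::file_path_sans_ext and list2env are base-R helpers, and the path is a placeholder:
names(list_files) <- tools::file_path_sans_ext(basename(files))
# Create one data frame per file in the global environment
list2env(list_files, envir = .GlobalEnv)
Keeping everything in a single named list is usually easier to work with, though.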

R - read_csv without using paste

Use the read_csv function to read each of the files you got in the files object with code below:
path <- system.file("extdata", package = "dslabs")
files <- list.files(path)
files
I tried the code below, but I get a "vroom_" error. Please help.
for (f in files){
read_csv(f)
}
First, in list.files you should set full.names=TRUE to include the whole path. Next, if you look into files, there are also .xls and .pdf files included. You may want to filter just for .csv files, which can easily be done using grep.
files <- list.files(path, full.names=TRUE)
files <- grep('.csv$', files, value=TRUE)
However, even then readr::read_csv complains about column issues.
lst <- readr::read_csv(files)
# Error: Files must all have 2 columns:
# * File 2 has 57 columns
To avoid editing the columns by hand, I recommend using rio::import_list instead, which gives just a warning that a column name was guessed and can be changed if needed. You may even include .xls files in the grep.
files <- grep('.csv$|.xls', files, value=TRUE)
lst <- rio::import_list(files)
Note that rio::import_list (as well as readr::read_csv) is vectorized, so you won't need a loop.
Data:
path <- system.file("extdata", package="dslabs")

Error comes while importing files by data.table

I'm new to RStudio and was not well aware of this portal's T&C, so I was blocked from asking questions for 5 days.
I have code for importing multiple files from any directory into R.
The problem is that the code sometimes runs fine and sometimes fails with the error below.
I tried to find a solution for this but have not found one yet.
library(data.table)
t = setwd("/home/dp/vishan/olp_data/19164/1/")
files <- file.info(list.files(path = t,pattern = "", full.names=TRUE))
files = rownames(files)[files$size > 0]
temp <- lapply(files, fread, sep=",")
Error:
Error in FUN(X[[i]], ...) :
'input' can not be a directory name, but must be a single character string containing a file name, a command, full path to a file, a URL starting 'http[s]://', 'ftp[s]://' or 'file://', or the input data itself.
Thanks in advance!
try using
files <- file.info(list.files(path = t,pattern = "", full.names=TRUE))
files <- subset(files, !isdir & size > 0)
temp <- lapply(rownames(files), fread, sep=',')
since list.files also shows directories. The data.frame you create in files can easily be subset on the isdir column, which indicates whether each entry is a directory or a file.
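If the files all share the same columns (an assumption about your data), you can then stack the per-file tables into one data.table; idcol records which list element each row came from (by position, since the list is unnamed):
# Stack the list of tables read by fread into a single data.table
dt <- data.table::rbindlist(temp, idcol = "source")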

Searching for a code snippet a list of .R files

Suppose I've discovered that a small piece of code in my package needs to be changed, and I cannot recall all the files where that code may appear.
Is there a package development tool that can identify all the files that contain the problem code given the list of files in the R folder?
Right now, for 14 files in the R directory I'm using
> c(sapply(list.files("R", full.names = TRUE), function(x){
grep("data/", readLines(x, warn = FALSE), value = TRUE)
}), recursive = TRUE)
# R/load-event.R
# " on.exit(file.remove(paste0(\"data/\", list.files(\"data\"))))"
But this could be time-consuming if the file list is long, and the files themselves are big.
It sounds like you are looking for grep. The following command will list all files which contain the string data/.
grep -l 'data/' R/*
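If you would rather stay inside R (for example on Windows without grep on the PATH), here is a pure-R sketch of the same search; find_in_files is a made-up helper name:
# Return the files in a directory whose contents contain a fixed string
find_in_files <- function(pattern, dir = "R") {
  files <- list.files(dir, full.names = TRUE)
  hits <- vapply(files, function(f) {
    any(grepl(pattern, readLines(f, warn = FALSE), fixed = TRUE))
  }, logical(1))
  files[hits]
}
find_in_files("data/")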

Proper phrasing for a loop to convert all .dta files to .csv in a directory

So I have a single instance of .dta to .csv conversion, and I need to repeat it for all files in a directory. There's great help on SO, but I'm still not quite there. Here's the single instance:
#Load Foreign Library
library(foreign)
## Set working directory in which .dta files can be found
setwd("~/Desktop")
## Single File Convert
write.csv(read.dta("example.dta"), file = "example.csv")
From here, I figure I use something like:
## Get list of all the files
file_list <- dir(pattern = "\\.dta$", recursive = F, ignore.case = T)
## Get the number of files
n <- length(file_list)
## Loop through each file
for(i in 1:n) file_list[[i]]
But I'm not sure of the proper syntax, expressions, etc. After reviewing the great solutions below, I'm just confused (not necessarily getting errors) and about to do it manually. Any quick tips for an elegant way to go through each file in a directory and convert it?
Answers reviewed include:
Convert Stata .dta file to CSV without Stata software
applying R script prepared for single file to multiple files in the directory
Reading multiple files from a directory, R
THANKS!!
Got the answer: Here's the final code:
## CONVERT ALL FILES IN A DIRECTORY
## Load Foreign Library
library(foreign)
## Set working directory in which .dta files can be found
setwd("~/Desktop")
## Convert all files in wd from DTA to CSV
### Note: alter the write/read functions for different file types. dta->csv used in this specific example
for (f in Sys.glob('*.dta'))
write.csv(read.dta(f), file = gsub('dta$', 'csv', f))
If the files are in your current working directory, one way would be to use Sys.glob to get the names, then loop over this vector.
for (f in Sys.glob('*.dta'))
write.csv(read.dta(f), file = gsub('dta$', 'csv', f))
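One caveat: foreign::read.dta only reads Stata files up to version 12. For files written by newer Stata releases, haven::read_dta is a drop-in alternative in the same loop (a sketch, assuming the haven package is installed):
library(haven)
for (f in Sys.glob('*.dta'))
  write.csv(haven::read_dta(f), file = gsub('dta$', 'csv', f), row.names = FALSE)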
