Read files by folder in R - r

I am trying to read all files in a folder using R, but I always get an error like this:
>folder<-"/Volumes/cphg/projects/PROVIDE/freeze" #working directory
>filelist<-list.files(folder) #all files in the directory
>data<-vector("list", length(filelist)) #empty list
>names(data)<-filelist
>for (name in filelist) {
+ data[[name]]<-read.table(paste(folder, name, sep="/"), header=T)
+}
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
no lines available in input
Does anybody know what's wrong here and how to fix it?

You can use tryCatch and return NULL if reading the file fails. Then you can Filter the results to exclude the NULLs
L <- setNames(lapply(filelist, function(x) {
tryCatch(read.table(file.path(folder, x)), error=function(e) NULL)
}), filelist)
data <- Filter(NROW, L)

Just to make it clear... and to close the question properly
The problem is that at least one file is empty. Check which file name is being read when the error is thrown.
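One way to find the empty files before reading anything is to check their sizes. This is a minimal sketch, not the original poster's code: the temporary folder and the two demo files (ok.txt, empty.txt) are made up for illustration, and it assumes that "empty" means a size of zero bytes.

```r
# Hypothetical demo folder holding one non-empty and one empty file
folder <- tempfile("freeze_demo")
dir.create(folder)
writeLines(c("a b", "1 2"), file.path(folder, "ok.txt"))
file.create(file.path(folder, "empty.txt"))

paths <- list.files(folder, full.names = TRUE)
sizes <- file.size(paths)            # size in bytes, one value per file

# Report the files that would trigger "no lines available in input"
empty <- paths[sizes == 0]
message("Skipping empty files: ", paste(basename(empty), collapse = ", "))

# Read only the files that actually contain data
nonempty <- paths[sizes > 0]
data <- lapply(nonempty, read.table, header = TRUE)
names(data) <- basename(nonempty)
```

A file that contains only blank lines has a non-zero size and would still fail in read.table, so the tryCatch approach remains the more general safeguard.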

Related

What can be the problem when "eem_import_dir" is working properly?

eem_import_dir is supposed to "Reads Rdata and RDa files with one eemlist each. The eemlists are combined into one and returned." However, in my case, only one file is read at a time, so nothing gets combined.
I don't expect any error in the function which is part of the staRdom package. I guess my limited R knowledge limits my understanding of the function and what could be wrong.
All files are of the same class (eemlist) and in the same format. I tried changing the folder, the filenames, etc. Can someone please help me understand the requirements of the function? Why is only one file read at a time instead of all of them being combined?
function (dir)
{
eem_files <- dir(dir, pattern = ".RData$|.RDa$", ignore.case = TRUE) %>%
paste0(dir, "/", .)
for (file in eem_files) {
file <- load(file)
if (get(file) %>% class() == "eemlist") {
if (exists("eem_list"))
eem_list <- eem_bind(eem_list, get(file))
else eem_list <- get(file)
}
else {
warning(paste0(file, " is no object of class eemlist!"))
}
NULL
}
eem_list
}

Importing a password protected xlsx file into R

I found an old thread (How do you read a password protected excel file into r?) that recommended that I use the following code to read in a password protected file:
install.packages("excel.link")
library("excel.link")
dat <- xl.read.file("TestWorkbook.xlsx", password = "pass", write.res.password="pass")
dat
However, when I try to do this my R immediately crashes. I've tried removing the write.res.password argument, and that doesn't seem to be the issue. I have a hunch that excel.link might not work with the newest version of R, so if you know of any other ways to do this I'd appreciate the advice.
EDIT: Using read.xlsx generates this error:
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "newInstance", .jfindClass(class), :
org.apache.poi.poifs.filesystem.OfficeXmlFileException:
The supplied data appears to be in the Office 2007+ XML.
You are calling the part of POI that deals with OLE2 Office Documents.
You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
You can remove the sheet and workbook protection of the Excel file without knowing the password with the following function (adapted version of code available at https://www.r-bloggers.com/2018/05/remove-password-protection-from-excel-sheets-using-r/)
remove_Password_Protection_From_Excel_File <- function(dir, file, bool_XLSXM = FALSE)
{
initial_Dir <- getwd()
setwd(dir)
# file name and path after removing protection
if(bool_XLSXM == TRUE)
{
file_unlocked <- stringr::str_replace(basename(file), "\\.xlsm$", "_unlocked.xlsm")
}else
{
file_unlocked <- stringr::str_replace(basename(file), "\\.xlsx$", "_unlocked.xlsx")
}
file_unlocked_path <- file.path(dir, file_unlocked)
# create temporary directory in project folder
# so we see what is going on
temp_dir <- "_tmp"
# remove and recreate _tmp folder in case it already exists
unlink(temp_dir, recursive = TRUE)
dir.create(temp_dir)
# unzip Excel file into temp folder
unzip(file, exdir = temp_dir)
# get full path to XML files for all worksheets
worksheet_paths <- list.files(paste0(temp_dir, "/xl/worksheets"), full.name = TRUE, pattern = ".xml")
# remove the XML node which contains the sheet protection
# We might of course use e.g. xml2 to parse the XML file, but this simple approach will suffice here
for(ws in worksheet_paths)
{
file_Content <- readLines(ws, encoding = "windows1")
# the "sheetProtection" node contains the hashed password "<sheetProtection SOME INFO />"
# we simply remove the whole node
out <- stringr::str_replace(file_Content, "<sheetProtection.*?/>", "")
writeLines(out, ws)
}
worksheet_Protection_Paths <- paste0(temp_dir, "/xl/workbook.xml")
file_Content <- readLines(worksheet_Protection_Paths , encoding = "windows1")
out <- stringr::str_replace(file_Content, "<workbookProtection.*?/>", "")
writeLines(out, worksheet_Protection_Paths)
# create a new zip, i.e. Excel file, containing the modified XML files
old_wd <- setwd(temp_dir)
files <- list.files(recursive = T, full.names = F, all.files = T, no.. = T)
# as the Excel file is a zip file, we can directly replace the .zip extension by .xlsx
zip::zip(file_unlocked_path, files = files) # utils::zip does not work for some reason
setwd(old_wd)
# clean up and remove temporary directory
unlink(temp_dir, recursive = T)
setwd(initial_Dir)
}
Once the password is removed, you can read the Excel file. This approach works for me.
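To see in isolation what the node-removal step does, here is a small sketch on an in-memory string. The XML snippet is invented for illustration, and base R's sub() with perl = TRUE stands in for stringr::str_replace so the example needs no extra packages.

```r
# Made-up one-line worksheet XML containing a sheetProtection node
xml <- '<worksheet><sheetData/><sheetProtection password="CC1A" sheet="1"/><pageMargins left="0.7"/></worksheet>'

# The lazy quantifier .*? stops at the first "/>" after "<sheetProtection",
# so only that one node is removed and the rest of the line is kept
unprotected <- sub("<sheetProtection.*?/>", "", xml, perl = TRUE)
print(unprotected)
```

Like the function above, this only replaces the first match on each line, which is enough as long as a worksheet carries a single sheetProtection node.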

source file specified as string in R

I want to, programmatically, source all .R files contained within a given array retrieved with the Sys.glob() function.
This is the code I wrote:
# fetch the different ETL parts
parts <- Sys.glob("scratch/*.R")
if (length(parts) > 0) {
for (part in parts) {
# source the ETL part
source(part)
# rest of code goes here
# ...
}
} else {
stop("no ETL parts found (no data to process)")
}
The problem I have is I cannot do this or, at least, I get the following error:
simpleError in source(part): scratch/foo.bar.com-https.R:4:151: unexpected string constant
I've tried different combinations for the source() function like the following:
source(sprintf("./%s", part))
source(toString(part))
source(file = part)
source(file = sprintf("./%s", part))
source(file = toString(part))
No luck. As I'm globbing the contents of a directory I need to tell R to source those files. As it's a custom-tailored ETL (extract, transform and load) script, I can manually write:
source("scratch/foo.bar.com-https.R")
source("scratch/bar.bar.com-https.R")
source("scratch/baz.bar.com-https.R")
But that's dirty and right now there are 3 different extraction patterns. They could be 8, 80 or even 2000 different patterns so writing it by hand is not an option.
How can I do this?
Try getting the list of files with dir and then using lapply:
For example, if your files are of the form t1.R, t2.R, etc., and are inside the path "StackOverflow" do:
d = dir(pattern = "^t\\d\\.R$", path = "StackOverflow/", recursive = TRUE, full.names = TRUE)
m = lapply(d, source)
The option recursive = TRUE will search all subdirectories, and full.names = TRUE will add the path to the filenames.
If you still want to use Sys.glob(), this works too:
d = Sys.glob(paths = "StackOverflow/t*.R")
m = lapply(d, source)
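The error message in the question ("unexpected string constant" at line 4, column 151) is a parse error raised from inside one of the sourced files, so it can help to wrap each source() call in tryCatch and record which file fails. This is a rough sketch under that assumption: the temporary directory and the two demo scripts (good.R, bad.R) are hypothetical.

```r
# Hypothetical scratch directory with one valid and one broken script
scratch <- tempfile("scratch")
dir.create(scratch)
writeLines("x_good <- 1", file.path(scratch, "good.R"))
writeLines("y <- 'a' 'b'", file.path(scratch, "bad.R"))  # two adjacent strings: a syntax error

parts <- Sys.glob(file.path(scratch, "*.R"))

# Source every part; keep "ok" or the error message for each file
results <- vapply(parts, function(part) {
  tryCatch({
    source(part, local = TRUE)
    "ok"
  }, error = function(e) conditionMessage(e))
}, character(1))
names(results) <- basename(parts)

print(results)
```

Any entry other than "ok" names the file, line and column that R could not parse, which points at the script to fix rather than at the loop.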

Conditionally process (bgzip, tabix) files using loop and if else statement

I have some .vcf files. I have selected those files from my directory and want to convert them to two other formats.
I am a bit confused using if and else if here. I want to do it like this: if there isn't .bgz file for [i]th .vcf file, I want to convert it to .bgz file keeping the original file.
If there is already .bgz file, but not .bgz.tbi file for [i] th .bgz file, then I want to convert .bgz file to .bgz.tbi file keeping the original .bgz that I get from .vcf file.
Can someone please help me finish this loop? It works for if condition, but don't know how to proceed from there.
path.file<-"/mypath/for/files/"
all.files <- list.files("/mypath/for/files")
all.files <- all.files[grepl(".vcf$",all.files)]
for (i in 1:length(all.files)){
if(!exists(paste0(all.files[i],".bgz"))){
bgzip(paste0(path.file,all.files[i]), overwrite=FALSE)
}else{(!exists(paste0(all.files[i],".bgz",".tbi"))){
#if(!exists(paste0(all.files[i],".bgz",".tbi"))){
indexTabix(paste0(paste0(path.file,all.files[i]),".bgz"), format="vcf")
}
}
Try this (not tested):
library(Rsamtools) # provides bgzip() and indexTabix()
# get VCF files with full paths; pattern is a regular expression, not a glob
all.files <- list.files("/mypath/for/files", pattern = "\\.vcf$",
full.names = TRUE)
for (i in all.files) {
# build the output names once, so we don't repeat paste0() calls
file_bgz <- paste0(i, ".bgz")
file_bgz_tbi <- paste0(i, ".bgz.tbi")
# compress only if the .bgz file is not on disk yet
# (file.exists() checks the file system; exists() checks for R objects)
if (!file.exists(file_bgz))
bgzip(i, file_bgz)
# index only if the .bgz.tbi file is not on disk yet
if (!file.exists(file_bgz_tbi))
indexTabix(file_bgz, format = "vcf")
}
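Note that exists() asks whether an R object with the given name is defined, while file.exists() asks whether a path is present on disk, and only the latter is suitable for a "skip if the output is already there" check. A small sketch with a made-up temporary file shows the difference; it does not call bgzip or indexTabix.

```r
# Hypothetical paths: an empty .vcf placeholder and its expected .bgz name
vcf <- tempfile(fileext = ".vcf")
file.create(vcf)
file_bgz <- paste0(vcf, ".bgz")

before_obj  <- exists(file_bgz)       # no R object has this name
before_file <- file.exists(file_bgz)  # no such file on disk yet

file.create(file_bgz)                 # stand-in for the compression step

after_obj  <- exists(file_bgz)        # still no R object with this name
after_file <- file.exists(file_bgz)   # the file is now on disk

print(c(before_obj, before_file, after_obj, after_file))
```

Because exists() keeps returning FALSE, a condition like !exists(path) is always TRUE and the compression or indexing step would run on every pass.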

file.copy in R not working

I am using the command file.copy in R and it throws an error, but I can't spot the reason.
file.copy(from="Z:/Ongoing/Test", to = "C:/Users/Darius/Desktop", overwrite = TRUE, recursive = TRUE)
Warning message:
In file.copy(from = "Z:/Ongoing/Test",:
problem copying Z:/Ongoing/Test to C:/Users/Darius/Desktop/Test: No such file or directory
Can anyone see the problem? The command line doesn't work even though it only gives you a warning message.
Actually, I don't think there is any straightforward way to copy a directory. I have written a function which might help you.
This function takes two arguments:
from: The complete path of directory to be copied
to: The location to which the directory is to be copied
Assumption: from and to are paths of only one directory.
dir.copy <- function(from, to){
## check if from and to directories are valid
if (!dir.exists(from)){
cat('from: No such Directory\n')
return (FALSE)
}
else if (!dir.exists(to)){
cat('to: No such Directory\n')
return (FALSE)
}
## extract the directory name from 'from'
split_ans <- unlist(strsplit(from,'/'))
dir_name <- split_ans[length(split_ans)]
new_to <- paste(to,dir_name,sep='/')
## create the directory in 'to'
dir.create(new_to)
## copy all files in 'to'
file_inside <- list.files(from,full.names = T)
file.copy(from = file_inside,to=new_to)
## copy all subdirectories
dir_inside <- list.dirs(path=from,recursive = F)
if (length(dir_inside) > 0){
for (dir_name in dir_inside)
dir.copy(dir_name,new_to)
}
return (TRUE)
}
file.copy() doesn't create directories, so it only works if you're copying to folders that already exist.
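A minimal sketch of that point, using made-up temporary folders in place of the Z: and C: paths from the question: create the destination first with dir.create(), then let file.copy() copy the directory recursively.

```r
# Hypothetical source directory "Test" with one file and one subdirectory
src <- file.path(tempfile("src"), "Test")
dir.create(file.path(src, "sub"), recursive = TRUE)
writeLines("hello", file.path(src, "a.txt"))
writeLines("world", file.path(src, "sub", "b.txt"))

# Hypothetical destination that does not exist yet
dest <- file.path(tempfile("dest"), "Desktop")

# file.copy() will not create the destination, so create it explicitly
if (!dir.exists(dest)) dir.create(dest, recursive = TRUE)

# With recursive = TRUE and an existing directory as 'to',
# the whole "Test" directory is copied into it
ok <- file.copy(from = src, to = dest, recursive = TRUE, overwrite = TRUE)
print(list.files(dest, recursive = TRUE))
```

If the destination already exists and the warning still appears, the source path itself (for example an unmapped network drive) is worth checking with dir.exists().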
I had a similar issue.
This blog was helpful. I slightly modified the code by adding full.names = T and overwrite = T.
current.folder <- "E:/ProjectDirectory/Data/"
new.folder <- "E:/ProjectDirectory/NewData/"
list.of.files <- list.files(current.folder, full.names = T)
# copy the files to the new folder
file.copy(list.of.files, new.folder, overwrite = T)
