fread() in R unable to open a file

I am trying to open a file in R as shown below:
data0 <- filename_a %>% map_df(~fread(., sep=",", skip=1))
Let us assume that fread fails to read this file for some reason, such as the file being in use by another program or not existing. In that case I would like to read filename_b instead.
At the moment, as soon as the step above fails, the code stops executing. How can I read filename_b when filename_a cannot be read?

You can try using tryCatch as follows:
library(data.table)
# If reading filename_a throws an error, fall back to filename_b
data <- tryCatch(fread(filename_a, sep = ",", skip = 1),
                 error = function(e) fread(filename_b, sep = ",", skip = 1))
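If the file names come from a vector piped into map_df as in the question, the same fallback can be wrapped around each read. This is only a sketch: read_with_fallback is a hypothetical helper, and filename_a / filename_b stand for the paths in the question.
library(data.table)
library(purrr)
# Hypothetical helper: try the primary file first, fall back to the backup on error
read_with_fallback <- function(primary, backup) {
  tryCatch(fread(primary, sep = ",", skip = 1),
           error = function(e) fread(backup, sep = ",", skip = 1))
}
data0 <- filename_a %>% map_df(~ read_with_fallback(.x, filename_b))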

Related

NetCDF: HDF error only inside a loop in R

I have a script that loops through a selection of NetCDF files. The files are opened, data is extracted, and then they are closed again. I have used this many times before and it works with no issue. I was recently sent a new selection of files to run through the same code. I can check the files individually using the ncdf4 package and the nc_open() function. The files look fine and are not corrupt. However, when I run the loop the function will not open the files and I get this error:
Error in R_nc4_open: NetCDF: HDF error
When I step through the loop manually to check, all is fine and the file opens. It just cannot be opened inside the loop. There is no issue with the code.
Has anyone come across this before, with non-corrupt NetCDF files giving this error only occasionally? Even outside the loop I can run the code and get the error the first time, then run it again without changing anything and the connection works.
I'm not sure how to troubleshoot this one, so I'm just looking for advice as to why this might be happening.
Code snippet:
library(ncdf4)

targetYear <- '2005-2019'
variables <- c('CHL','SSH')
ncNam <- list.files(folderdir, '.nc', recursive = TRUE)
for (v in 1:length(variables)) {
  varNam <- unlist(unique(variables))[v]
  # Get names corresponding to variable
  varLs <- ncNam[grep(varNam, basename(ncNam))]
  varLs <- varLs[grep(targetYear, varLs)]
  varLs <- varLs[1]
  export <- paste0(exportdir, varNam, '/')
  dir.create(export, recursive = TRUE)
  if (varNam == 'Proximity1km' | varNam == 'Proximity200m' |
      varNam == 'ProximityCoast' | varNam == 'Bathymetry') {
    fileNam <- varLs
    ncfilename <- paste0(folderdir, fileNam)
    print(ncfilename)
    # Read ncfile
    ncfile <- nc_open(ncfilename)
    nc_close(ncfile)
    gc()
  } else {
    fileNam <- varLs
    ncfilename <- paste0(folderdir, fileNam)
    print(ncfilename)
    # Read ncfile
    ncfile <- nc_open(ncfilename)
    nc_close(ncfile)
    gc()
  }
}
I figured out the issue. It was to do with the error detection filter in the .nc files.
I removed the filter and the files work fine inside the loop. Still a bit strange.
Perhaps the ncdf4 package is not up to date with this filtering.
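If removing the filter is not an option, one possible workaround is to retry the open inside the loop, since the failure was intermittent. This is only a sketch, not the accepted fix, and open_nc_with_retry is a hypothetical helper:
library(ncdf4)
# Hypothetical helper: retry nc_open a few times before giving up
open_nc_with_retry <- function(path, attempts = 3, wait = 1) {
  for (i in seq_len(attempts)) {
    nc <- tryCatch(nc_open(path), error = function(e) NULL)
    if (!is.null(nc)) return(nc)
    Sys.sleep(wait)  # short pause before the next attempt
  }
  stop("Could not open ", path, " after ", attempts, " attempts")
}
ncfile <- open_nc_with_retry(ncfilename)
nc_close(ncfile)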

How do I load a remote .Rdata file in R?

I am attempting to load a .Rdata file without having to download the actual file to my computer.
Here is what I have so far:
primate_URL <- "https://ftp.ncbi.nlm.nih.gov/geo/series/GSE118nnn/GSE118546/suppl/GSE118546_macaque_fovea_all_10X_Jan2018.Rdata.gz"
con <- gzcon(url(primate_URL))
load(con)
When I run the script, it returns this error:
Error in load(con) :
the input does not start with a magic number compatible with loading from a connection
Any tips on what I might be doing wrong?
Try the following. I tried it with a small file of the same type from the same source.
primate_URL <- "https://ftp.ncbi.nlm.nih.gov/geo/series/GSE118nnn/GSE118992/suppl/GSE118992_supplementaryData.Rdata.gz"
# Download the gzipped file locally first
download.file(primate_URL, "GSE118992_supplementaryData.Rdata.gz")
# Open it as a gzip text connection and read the contents
file.x <- gzfile("GSE118992_supplementaryData.Rdata.gz", open = "r")
file.x.data <- readLines(file.x, encoding = "UTF-8")
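If the downloaded file really is a saved R workspace, loading it from a binary connection may also work; whether it does depends on how the .Rdata.gz was produced, so treat this only as a sketch to experiment with:
download.file(primate_URL, "GSE118992_supplementaryData.Rdata.gz")
# Open the file as a binary gzip connection and let load() restore the objects;
# load() returns the names of the objects it creates in the workspace
con <- gzfile("GSE118992_supplementaryData.Rdata.gz", open = "rb")
loaded_names <- load(con)
close(con)
loaded_names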

Error while importing files with data.table

I'm new to RStudio and was not well aware of this site's terms and conditions, so I was blocked from asking questions for 5 days.
I have some code for importing multiple files from a directory into R.
The problem is that the code sometimes runs fine and sometimes fails with the error mentioned below.
I tried to find a solution but have not found one yet.
library(data.table)
t = setwd("/home/dp/vishan/olp_data/19164/1/")
files <- file.info(list.files(path = t,pattern = "", full.names=TRUE))
files = rownames(files)[files$size > 0]
temp <- lapply(files, fread, sep=",")
Error:
Error in FUN(X[[i]], ...) :
'input' can not be a directory name, but must be a single character string containing a file name, a command, full path to a file, a URL starting 'http[s]://', 'ftp[s]://' or 'file://', or the input data itself.
Thanks in advance!
Try using:
files <- file.info(list.files(path = t, pattern = "", full.names = TRUE))
files <- subset(files, !isdir & size > 0)
temp <- lapply(rownames(files), fread, sep = ',')
since list.files also returns directories. The data.frame you create in files can easily be subset on the isdir column, which indicates whether each entry is a directory or a file.
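An alternative sketch of the same idea, without building the file.info data frame (the path is the one from the question):
library(data.table)
paths <- list.files("/home/dp/vishan/olp_data/19164/1/", full.names = TRUE)
# Drop directories and empty files before reading
paths <- paths[!dir.exists(paths) & file.size(paths) > 0]
temp <- lapply(paths, fread, sep = ",")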

Reading a CSV file from an AWS data lake

I am trying to read a CSV file from an AWS data lake using R.
I am using the code below to read the data, but unfortunately I am getting this error:
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
  no lines available in input
Here is the code:
aws.signature::use_credentials()
c<- get_object("s3://datalake-1/x-data/")
cobj<- rawToChar(c)
con<- textConnection(cobj)
data <- read.csv(con)
close(con)
data
It looks like the file is not present at the address/URI provided. I'm unable to reproduce this error, so check that you are pointing at the CSV's exact location (the key above ends in a slash, which usually denotes a folder/prefix rather than a file).
Apart from that, I'd also put the read statement inside tryCatch, as referenced in an existing answer (see the sketch below).
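A sketch of that tryCatch approach follows; the object key my-file.csv is a placeholder, since the original key ends in a slash and likely points at a folder rather than a single object:
library(aws.s3)
data <- tryCatch({
  # get_object returns the raw bytes of the object; decode and parse them
  obj <- get_object("s3://datalake-1/x-data/my-file.csv")  # placeholder key
  read.csv(text = rawToChar(obj))
}, error = function(e) {
  message("Could not read the CSV from S3: ", conditionMessage(e))
  NULL
})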

Storing files in CSV format in an R loop

I have many data files in the same format in different directories, and I have a function for processing them.
I want to load all of my data, process it with my function, and then store the results in CSV files.
When I process a single data set, the code looks like this:
ENFP_0719 <- f_preprocessing2("D:/DATA/output/ENFP_0719")
write.csv(ENFP_0719, "D:/DATA/output2/ENFP_0719.csv")
Everything is OK; the file ENFP_0719.csv is created correctly.
But when I try to use a loop, the code looks like this:
setwd("D:/DATA/output")
file_list <- list.files()
for (file in file_list){
  file <- f_preprocessing2(print(eval(sprintf("D:/DATA/output/%s", file))))
  print("Storing data to csv....")
  setwd("D:/DATA/output2")
  write.csv(file, sprintf("%s.csv", file))
}
I got an error like this:
[1] "D:/DATA/output/ENFP_0719"
[1] "Storing data to csv...."
Error in file(file, ifelse(append, "a", "w")) :
invalid 'description' argument
I've also tried using paste('data', file, 'csv', sep = '.'), but I got the same error. I am confused, because there is nothing wrong with my function; as I showed above, everything works when I process a single data set.
So, what is wrong with my code? Is the problem in my loop, or in the parameters I pass to write.csv?
I look forward to your insight.
Thank you
I think you could make it a lot simpler by using the full.names argument to list.files and making a few other changes like this:
path <- 'data/output'
file_list <- list.files(path, full.names = TRUE)
for (file in file_list) {
  file_proc <- f_preprocessing2(file)
  # Build the output path in the parallel output2 folder and add a .csv extension
  new_path <- paste0(gsub('output', 'output2', file), '.csv')
  write.csv(file_proc, new_path)
}
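The same idea with the original Windows paths from the question, using basename() to build the output file name (a sketch; f_preprocessing2 is the asker's own function and is assumed to return a data frame):
in_dir  <- "D:/DATA/output"
out_dir <- "D:/DATA/output2"
for (f in list.files(in_dir, full.names = TRUE)) {
  processed <- f_preprocessing2(f)                              # asker's preprocessing function
  out_file  <- file.path(out_dir, paste0(basename(f), ".csv"))  # e.g. D:/DATA/output2/ENFP_0719.csv
  write.csv(processed, out_file)
}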
