Read a zip file in r

I want to read a zip file from the web. My code is as follows:
temp<-tempfile()
download<-download.file("http://depts.washington.edu/control/LARRY/TE/IDVs/idv1.zip",temp)
data<-read.table(unz(temp,"r.dat"),head=FALSE)
unlink(temp)
But it shows an error:
Error in open.connection(file, "rt") : cannot open the connection
In addition: Warning message:
In open.connection(file, "rt") :
cannot locate file 'r.dat' in zip file 'C:\Users\CHENGF~2\AppData\Local\Temp\RtmpgtJShr\file361c5d0a55eb'
I don't know why it can't read the data, hope someone can help me!

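This works for all the idv files except idv1, which appears to be corrupted. You would need to unzip idv1.zip using another tool and read it in separately. Note that r.dat is apparently not at the top level of each archive: it sits in a folder named after the zip file, which is why the code passes idvN/r.dat to unz().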
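readrdat <- function(n) {
  fname <- paste0("idv", n)
  zipname <- paste0(fname, ".zip")
  weblink <- paste0("http://depts.washington.edu/control/LARRY/TE/IDVs/", zipname)
  # zip archives are binary, so request a binary transfer explicitly
  download.file(weblink, zipname, mode = "wb")
  # r.dat is stored inside a folder named after the archive, e.g. idv2/r.dat
  data <- read.table(unz(zipname, paste0(fname, "/r.dat")), header = FALSE)
  unlink(zipname)
  return(data)
} # readrdat
# try every archive; a failed read yields NULL instead of stopping the loop
lsdata <- lapply(1:15, function(n) {
  tryCatch(readrdat(n), error = function(e) NULL)
})
# TRUE marks the archives that could not be read
lapply(lsdata, is.null)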
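
If you don't want to guess the path inside the archive, you can list its contents first and look the file up by name. This is only a rough sketch, assuming the archive itself is intact; find_zip_member() and zip_member_names() are helper names I made up for illustration, and the entry names in the example are invented.

```r
# Return the archive entries whose file name (ignoring folders) equals `target`.
# `entries` is a character vector of paths as stored in the zip.
find_zip_member <- function(entries, target) {
  entries[basename(entries) == target]
}

# List the entries of a zip file without extracting anything.
zip_member_names <- function(zipfile) {
  utils::unzip(zipfile, list = TRUE)$Name
}

# Illustrative entry names only (hypothetical archive layout)
entries <- c("idv2/", "idv2/r.dat", "idv2/notes.txt")
member <- find_zip_member(entries, "r.dat")
print(member)
```

With a real download you would combine the two, along the lines of read.table(unz(zipname, find_zip_member(zip_member_names(zipname), "r.dat"))).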

Related

Reading a vcf.bgz file in R

I downloaded some data from gnomAD (https://gnomad.broadinstitute.org/downloads). It comes as a VCF.bgz file, and I would like to read it as a VCF file.
I wrote some code:
install.packages("R.utils")
library("R.utils")
df=gunzip("gnomad.exomes.r2.1.1.sites.4.vcf.bgz", "gnomad.exomes.r2.1.1.sites.4.vcf")
install.packages("vcfR")
library("vcfR")
vc=read.vcfR("gnomad.exomes.r2.1.1.sites.4.vcf.bgz")
But it doesn't work: the file is not converted to a proper VCF file.
Warning message:
In file.remove(filename) :
cannot remove file 'gnomad.exomes.r2.1.1.sites.4.vcf.bgz', reason 'Permission denied'
Error in read.vcfR(df) :
File: gnomad.exomes.r2.1.1.sites.4.vcf does not appear to be a VCF file.
First line of file:
gnomad.exomes.r2.1.1.sites.4.vcf
Should begin with:
##fileformat=VCFv
In addition: Warning message:
In scan(file = file, what = character(), nmax = 1, sep = "\n", quiet = TRUE, :
embedded nul(s) found in input
I would appreciate any help, thank you :)
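
One quick sanity check is to look at the first line of the compressed file directly, since the error says it should begin with ##fileformat=VCFv. A minimal sketch, assuming the .bgz file can be read as ordinary gzip (BGZF is designed to be gzip-compatible); first_line_gz() and looks_like_vcf() are hypothetical helper names, and the demo uses a tiny file written on the spot rather than the gnomAD data.

```r
# Read the first line of a gzip-compressed text file without unpacking it.
first_line_gz <- function(path) {
  con <- gzfile(path, "rt")
  on.exit(close(con))
  readLines(con, n = 1, warn = FALSE)
}

# TRUE if the first line looks like a VCF header line.
looks_like_vcf <- function(path) {
  first <- first_line_gz(path)
  length(first) == 1 && startsWith(first, "##fileformat=VCFv")
}

# Self-contained demo on a small gzip file in a temporary location
demo <- tempfile(fileext = ".vcf.gz")
out <- gzfile(demo, "wt")
writeLines(c("##fileformat=VCFv4.2", "#CHROM\tPOS\tID"), out)
close(out)

print(looks_like_vcf(demo))
```

If the check fails on the real file, the problem is in the download or decompression step rather than in the VCF reader.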

Loading rds from Dropbox links using readRDS( )

I have been using readRDS(gzcon(url("my dropbox links"))) for a long time to load my saved .rds files from Dropbox without any issues. But I have always wondered why readRDS("my dropbox links") does not do the same thing. I get an error like
Error in gzfile(file, "rb") : cannot open the connection
It seems like a fair, simple question, but I couldn't figure it out myself. Many thanks in advance!

Another alternative would be to first download the file and then read the RDS:
download.file("https://www.somesite.com/somefile.rds",
              "data.rds",
              method = "curl")
data <- readRDS("data.rds")
The readRDS function checks whether its argument is a connection, but it does not create a URL connection itself. A character string is passed straight to gzfile(), which only opens local files:
> readRDS
function (file, refhook = NULL)
{
    if (is.character(file)) {
        con <- gzfile(file, "rb")
        on.exit(close(con))
    }
    else if (inherits(file, "connection"))
        con <- if (inherits(file, "url"))
            gzcon(file)
        else file
    else stop("bad 'file' argument")
    .Internal(unserializeFromConn(con, refhook))
}
<bytecode: 0x5648012c7c50>
<environment: namespace:base>
Therefore the url() function is needed. A plain string is not a connection, while the result of url() is:
> link<-"https://www.google.com"
> inherits(link,"connection")
[1] FALSE
> link2<-url("https://www.google.com")
> inherits(link2,"connection")
[1] TRUE
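
If you want a single call that accepts either form, you could wrap the check yourself. A minimal sketch, not something base R provides: read_rds_any() is a made-up name, and the URL test is a simple pattern match that only recognises http, https and ftp links.

```r
# Read an .rds file from either a local path or a URL.
read_rds_any <- function(x) {
  if (grepl("^(https?|ftp)://", x)) {
    con <- gzcon(url(x, "rb"))  # remote: open a connection and decompress it
    on.exit(close(con))
    readRDS(con)
  } else {
    readRDS(x)                  # local path: readRDS() opens it with gzfile()
  }
}

# Local round trip, so the example runs without network access
path <- tempfile(fileext = ".rds")
saveRDS(list(a = 1, b = "x"), path)
obj <- read_rds_any(path)
str(obj)
```

For a remote file the call would look like read_rds_any("https://www.somesite.com/somefile.rds"), which takes the same route as the gzcon(url(...)) form from the question.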

Download files from FTP folder using Loop

I am trying to download all the files inside an FTP folder:
library(RCurl)  # provides getURL()
temp <- tempfile()
destination <- "D:/test"
url <- "ftp://XX.XX.net/"
userpwd <- "USER:Password"
filenames <- getURL(url, userpwd = userpwd, ftp.use.epsv = FALSE, dirlistonly = TRUE)
filenames <- strsplit(filenames, "\r*\n")[[1]]
When I print filenames, I get all the file names inside the FTP folder, so the output is correct up to this point:
[1] "2018-08-28-00.gz" "2018-08-28-01.gz"
[3] "2018-08-28-02.gz" "2018-08-28-03.gz"
[5] "2018-08-28-04.gz" "2018-08-28-05.gz"
[7] "2018-08-28-08.gz" "2018-08-28-09.gz"
[9] "2018-08-28-10.gz" "2018-08-28-11.gz"
[11] "2018-08-28-12.gz" "2018-08-28-13.gz"
[13] "2018-08-28-14.gz" "2018-08-28-15.gz"
[15] "2018-08-28-16.gz" "2018-08-28-17.gz"
[17] "2018-08-28-18.gz" "2018-08-28-23.gz"
for ( i in filenames ) {
download.file(paste0(url,i), paste0(destination,i), mode="w")
}
I got this error
trying URL 'ftp://XXX.net/2018-08-28-00.gz'
Error in download.file(paste0(url, i), paste0(destination, i), mode = "w") :
cannot open URL 'ftp://XXX.net/2018-08-28-00.gz'
In addition: Warning message:
In download.file(paste0(url, i), paste0(destination, i), mode = "w") :
InternetOpenUrl failed: 'The login request was denied'
I modified the code to
for ( i in filenames )
{
#download.file(paste0(url,i), paste0(destination,i), mode="w")
download.file(getURL(paste(url,filenames[i],sep=""), userpwd =
"USER:PASSWORD"), paste0(destination,i), mode="w")
}
After that, I got this error
Error in function (type, msg, asError = TRUE) : RETR response: 550

Without a minimal, complete, and verifiable example it is a challenge to directly replicate your problem. Assuming the file names don't include the URL, you'll need to combine them with the URL to access the files.
download.file() requires the URL of the file to read, a destination file, and a mode flag that says whether the download is text or binary.
For example, I have data from Alberto Barradas' Pokémon Stats kaggle.com data set stored on my GitHub site. To download some of the files to the test subdirectory of my R working directory, I can use the following code:
filenames <- c("gen01.csv","gen02.csv","gen03.csv")
fileLocation <- "https://raw.githubusercontent.com/lgreski/pokemonData/master/"
# use ./ for subdirectory of current directory, end with / to work with paste0()
destination <- "./test/"
# note that these are character files, so use mode="w"
for (i in filenames){
download.file(paste0(fileLocation,i),
paste0(destination,i),
mode="w")
}
The paste0() function concatenates text without spaces, which allows the code to generate a fully qualified path name for the url of each source file, as well as the subdirectory where the destination file will be stored.
To illustrate what's happening with paste0() in the for() loop, we can use message() to print to the R console.
> # illustrate what paste0() does
> for (i in filenames){
+ message(paste("Source is: ",paste0(fileLocation,i)))
+ message(paste("Destination is:",paste0(destination,i)))
+ }
Source is: https://raw.githubusercontent.com/lgreski/pokemonData/master/gen01.csv
Destination is: ./test/gen01.csv
Source is: https://raw.githubusercontent.com/lgreski/pokemonData/master/gen02.csv
Destination is: ./test/gen02.csv
Source is: https://raw.githubusercontent.com/lgreski/pokemonData/master/gen03.csv
Destination is: ./test/gen03.csv
>
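
Applied to the question, two details stand out: destination is "D:/test" without a trailing slash, so paste0(destination, i) gives "D:/test2018-08-28-00.gz", and in the modified loop filenames[i] indexes an unnamed vector by a file name, which returns NA. A rough sketch of building the paths safely follows; build_transfers() is a hypothetical helper, and it does not deal with the FTP login, which still has to be supplied separately.

```r
# Build source URLs and destination paths for a set of file names.
# file.path() inserts the "/" separator, so `dest_dir` needs no trailing slash.
build_transfers <- function(base_url, dest_dir, filenames) {
  data.frame(
    src  = paste0(sub("/+$", "", base_url), "/", filenames),
    dest = file.path(dest_dir, filenames),
    stringsAsFactors = FALSE
  )
}

transfers <- build_transfers("ftp://XX.XX.net/", "D:/test",
                             c("2018-08-28-00.gz", "2018-08-28-01.gz"))
print(transfers)

# Download loop, not run here; .gz files are binary, so mode = "wb".
if (FALSE) {
  for (k in seq_len(nrow(transfers))) {
    download.file(transfers$src[k], transfers$dest[k], mode = "wb")
  }
}
```

Printing the table before downloading makes it easy to spot a malformed URL or destination path.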

R: Error in file(con, "r") : cannot open the connection

I'm trying to run an API request for a number of parameters with the lapply function in R.
However, when I run this function, I get the error "Error in file(con, "r") : cannot open the connection".
Google suggests using setInternet2(TRUE) to fix this issue, but then I get the error: Error: 'setInternet2' is defunct.
See help("Defunct")
localisedDestinationNameForGivenLang <- function (LocationId) {
gaiaURL <- paste0("https://URL/",LocationId, "?geoVersion=rwg&lcid=", "1036",
"&cid=geo&apk=explorer")
print(LocationId)
localisation <- fromJSON(gaiaURL)
}
lapply(uniqueLocationId, localisedDestinationNameForGivenLang)
Can someone suggest a fix please?

Here's a sample of how you could identify which sites are throwing errors while still getting response from the ones that don't:
urls = c("http://citibikenyc.com/stations/test", "http://citibikenyc.com/stations/json")
grab_data <- function(url) {
  out <- tryCatch(
    {fromJSON(url)},
    error = function(x) {
      message(paste(url, x))
      error_msg = paste(url, "threw an error")
      return(error_msg)
    })
  return(out)
}
result <- lapply(urls, grab_data)
result will be a list that contains the API response for the URLs that work and an error message for those that don't.
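
If mixing responses and error strings in one list gets awkward, another option is to keep the value and the error in separate slots. This is just a sketch of that idea in base R; capture_errors() is a name I made up, and fake_request() is a stand-in for the real API call so the example runs offline.

```r
# Wrap a function so it never stops the loop: each call returns
# list(value = ..., error = ...), with exactly one of the two set to NULL.
capture_errors <- function(f) {
  function(...) {
    tryCatch(
      list(value = f(...), error = NULL),
      error = function(e) list(value = NULL, error = conditionMessage(e))
    )
  }
}

# Stand-in for an API call (hypothetical): fails on negative input
fake_request <- function(id) {
  if (id < 0) stop("cannot open the connection")
  id * 10
}

results <- lapply(c(1, -1, 3), capture_errors(fake_request))
# Logical vector marking which calls failed
failed <- vapply(results, function(r) !is.null(r$error), logical(1))
print(failed)
```

You can then retry or inspect only the inputs where failed is TRUE.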

Error / Exception Handling with empty files in R

I first heard about exception handling in Python two days ago, and now I want to apply it in R. I have looked at a number of questions posted here on Stack Overflow and on other online Q&A sites, but I am still confused about how to use it.
I would really appreciate it if someone could answer with this simple example so that I can later apply it to my own problems.
For example, I have 3 data files with the file names shown below, and the first file is an empty 0-byte file. What can I do so that the loop keeps running over all files and the values extracted from the empty file are recorded as NA?
> output_names_hdf5_list[1:5]
[1] "simulation-results fL=0.1,fks=1,fno=0.1,fnc=0.1,fr=0.1,fs=0.1.hdf5"
[2] "simulation-results fL=0.1,fks=1,fno=0.1,fnc=0.1,fr=0.1,fs=1.05.hdf5"
[3] "simulation-results fL=0.1,fks=1,fno=0.1,fnc=0.1,fr=0.1,fs=2.hdf5"
for (i in 1:5){
channelflow_outlet[,i]=h5read(paste(outputdir, output_names_hdf5_list[i], sep=""),"Channel")$Qc_out[460,][2:100]
}
With the try function I can run the program without getting stuck on an error message, but when I replace the argument with channelflow_outlet[,i]= h5read(....) inside the try function, it just returns an error.
for (i in 1:5){
try(h5read(paste(outputdir, output_names_hdf5_list[i], sep=""),"Channel")$Qc_out[460,][2:100])
}
Without error handling, I get an error message like this:
> h5read(paste(outputdir, output_names_hdf5_list[1], sep=""),"Channel")$Qc_out[460,][2:100]
HDF5: unable to open file
Error in h5checktypeOrOpenLoc(file, readonly = TRUE) :
Error in h5checktypeOrOpenLoc(). File 'D:/Data/Mleonard/pytopkapi.staged.makefile/RunModel/Output/3x6-729-04072014/simulation-results fL=0.1,fks=1,fno=0.1,fnc=0.1,fr=0.1,fs=0.1.hdf5' is not a valid HDF5 file.
>

I hope my code helps. For those messages in the code, you can delete them if you want. They are here purely to help you see where it shows warning or error.
library(rhdf5)  # provides h5read()
setwd("D:/Dropbox/Test/"); outputdir = "D:/Dropbox/Test/"
output_names_hdf5_list = c("simulation-results fL=0.1,fks=1,fno=1.05,fnc=1.05,fr=1.05,fs=1.05.hdf5",
                           "simulation-results fL=0.1,fks=1,fno=1.05,fnc=2,fr=1.05,fs=1.05.hdf5",
                           "simulation-results fL=0.1,fks=1,fno=2,fnc=1.05,fr=0.1,fs=1.05.hdf5",
                           "simulation-results fL=0.1,fks=1,fno=2,fnc=1.05,fr=2,fs=2.hdf5",
                           "simulation-results fL=0.5,fks=1,fno=2,fnc=2,fr=0.1,fs=1.05.hdf5")
# Qc_out[460, ][2:100] yields 99 values, so the NA fallback must also have length 99
channelflow_outlet = matrix(NA, nrow = 99, ncol = 5)
hdf5_list_reading_tool = function(output_names_hdf5_list) {
  out = tryCatch(
    {
      message("This is the 'try' part")
      h5read(paste(outputdir, output_names_hdf5_list, sep = ""), "Channel")$Qc_out[460, ][2:100]
    },
    error = function(cond) {
      message("Here's the original error message:")
      message(cond)
      return(rep(NA, 99))
    },
    warning = function(cond) {
      message("Here's the original warning message:")
      message(cond)
      return(rep(NA, 99))
    },
    finally = {
      message(paste("Processed file:", output_names_hdf5_list))
      message("Some other message at the end")
    }
  )
  return(out)
}
# one column per file; columns for unreadable files are all NA
channelflow_outlet = sapply(output_names_hdf5_list, hdf5_list_reading_tool)
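
Since you already know the problem file is 0 bytes, you could also skip it before calling the reader at all. A minimal sketch of that check; read_or_na() is a hypothetical helper, and the demo uses small text files in place of the HDF5 files so it runs without rhdf5.

```r
# Call `reader(path)` unless the file is missing or empty; on any failure
# return `n` NA values so every column has the same length.
read_or_na <- function(path, reader, n) {
  if (!file.exists(path) || file.size(path) == 0) {
    return(rep(NA_real_, n))
  }
  tryCatch(reader(path), error = function(e) rep(NA_real_, n))
}

# Demo with plain text files standing in for the HDF5 files
empty <- tempfile()
invisible(file.create(empty))            # 0-byte file
full <- tempfile()
writeLines(as.character(1:3), full)      # file with three numbers
read_numbers <- function(p) as.numeric(readLines(p))

res <- sapply(c(empty, full), read_or_na, reader = read_numbers, n = 3)
print(unname(res))
```

In your case the reader would be a small function wrapping the h5read(...)$Qc_out[460,][2:100] call, with n = 99.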