Error downloading GPL file with getGEO - r

Using OS X 10.11 and R 3.3.0, I get this error using the GEOquery package:
library(GEOquery)
GSE56045 <- getGEO("GSE56045")
It downloads the GSE file but not the GPL:
Error in download.file(myurl, destfile, mode = mode, quiet = TRUE, method = getOption("download.file.method.GEOquery")) :
cannot open URL 'http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?targ=self&acc=GPL10558&form=text&view=full'

It looks like the GPL file was redirected, and the default download method GEOquery sets, options('download.file.method.GEOquery' = 'auto'), fails to follow the redirect.
I was able to get it working by running this in R: options('download.file.method.GEOquery' = 'libcurl')
I also had to delete the previously downloaded GPL file, which contained only the redirect message. Rather than hunting for it in the temp directory, it's easier to set a download directory with the destdir = argument of getGEO().
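A minimal sketch of that workaround (the accession is the one from the question; the download directory name is just an example):
library(GEOquery)
# force libcurl so the redirected GPL URL is followed
options('download.file.method.GEOquery' = 'libcurl')
# use a fixed download directory so a stale GPL file is easy to find and delete
destdir <- "geo_cache"
if (!dir.exists(destdir)) dir.create(destdir)
GSE56045 <- getGEO("GSE56045", destdir = destdir)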

Related

Error in openxlsx::read.xlsx/download.file after updating R to 4.2.1

After updating my Rgui to version 4.2.1, I found an encoding issue that was apparently solved by "Problem with opening scripts in updated version of R". Briefly: my R files were opening empty, and the solution was simply to change the default encoding to UTF-8.
But another issue came with it (or was just there from the beginning). When I tried to read an xlsx file directly from a URL, I got this error message:
In download.file(url = xlsxFile, destfile = tmpFile, cacheOK = FALSE, :
URL 'https://www.rug.nl/ggdc/historicaldevelopment/maddison/data/mpd2020.xlsx': Timeout of 100000 seconds was reached
When I went back to R 4.0.5, the problem disappeared!
Reproducible example
library(openxlsx)
url_maddison = "https://www.rug.nl/ggdc/historicaldevelopment/maddison/data/mpd2020.xlsx"
read.xlsx(url_maddison, sheet = 3)
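One possible workaround (not from the original thread; a sketch assuming the failure is in the download step rather than in read.xlsx() itself) is to raise R's download timeout and fetch the file explicitly in binary mode before reading it:
library(openxlsx)
options(timeout = 600)                                     # allow a slow download to finish
tmp <- tempfile(fileext = ".xlsx")
download.file(url_maddison, destfile = tmp, mode = "wb")   # binary mode, since xlsx is a binary format
read.xlsx(tmp, sheet = 3)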

downloading csv file works via libcurl yet does not via curl method

OS: Win 7 64 bit
RStudio Version 1.1.463
As per the Getting and Cleaning Data course, I attempted to download a csv file with method = "curl":
fileUrl <- "https://data.baltimorecity.gov/api/views/dz54-2aru/rows.csv?accessType=DOWNLOAD"
download.file(fileUrl, destfile = "./cameras.csv", method = "curl")
Error in download.file(fileUrl, destfile = "./cameras.csv", method =
"curl") : 'curl' call had nonzero exit status
However, method = "libcurl" resulted in a successful download:
download.file(fileUrl, destfile = "./cameras.csv", method = "libcurl")
trying URL
'https://data.baltimorecity.gov/api/views/dz54-2aru/rows.csv?accessType=DOWNLOAD'
downloaded 9443 bytes
Changing from https to http produced exactly the same results for curl and libcurl, respectively.
Is there any way to make this download work via method = "curl", as per the course?
Thanks
As you can see from ?download.file:
For methods "wget" and "curl" a system call is made to the tool given
by method, and the respective program must be installed on your system
and be in the search path for executables. They will block all other
activity on the R process until they complete: this may make a GUI
unresponsive.
Therefore, you should install curl first.
See "How do I install and use curl on Windows?" to learn how.
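Once curl is installed, a quick sanity check from R (not part of the original answer) confirms it is on the search path:
Sys.which("curl")          # should return the path to curl.exe, not ""
system("curl --version")   # should print the installed curl version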
Best!
I believe there were a few issues here:
I followed the steps in the link quoted by @JonnyCrunch:
a) Reinstalled Git for windows;
b) added C:\Program Files\Git\mingw64\bin\ to the 'PATH' variable;
c) Disabled Use Internet Explorer library/proxy for HTTP in RStudio in: Tools > Options > Packages
d) Attempted steps in 'e)' below and added data.baltimorecity.gov
website to exclusions as per Kaspersky anti-virus' prompt;
e) Then in RStudio:
options(download.file.method = "curl")
download.file(fileUrl, destfile="./data/cameras.csv")
Success!
Thank you

Extracting file from LZMA archive with R

I am trying to extract a file from an LZMA archive downloaded from an API containing JSON files, using R. On my computer I can extract the file manually in Windows Explorer with no problems.
Here's my code currently (API details removed):
tempFile <- tempfile()
destDir <- "extracted-files"
if (!dir.exists(destDir)) dir.create(destDir)
download.file("api_url.tar.xz", destfile = tempFile)
untar(tempFile, exdir = destDir)
When I attempt to extract the file, I receive the following error messages:
/usr/bin/tar: This does not look like a tar archive
/usr/bin/tar: Skipping to next header
/usr/bin/tar: Exiting with failure status due to previous errors
Warning messages:
1: running command 'tar.exe -xf "C:\Users\XXX\AppData\Local\Temp\RtmpMncPWp\file2eec75e23a15" -C "extracted-files"' had status 2
2: In untar(tempFile, exdir = destDir) :
‘tar.exe -xf "C:\Users\XXX\AppData\Local\Temp\RtmpMncPWp\file2eec75e23a15" -C "extracted-files"’ returned error code 2
I am using Windows 10 with R version 3.3.1 (2016-06-21).
Using library(archive), one can also read a particular csv file within an archive without having to unzip it first:
library(archive)
library(readr)
read_csv(archive_read("api_url.tar.xz", file = 1), col_types = cols()) # adjust file=XX as appropriate
This is quite a bit faster.
To unzip everything one can use
archive_extract("api_url.tar.xz", dir=XXX)
That worked very well for me and is faster than the built-in untar(). It also works on all platforms. It supports 'tar', 'ZIP', '7-zip', 'RAR', 'CAB', 'gzip', 'bzip2', 'compress', 'lzma' and 'xz' formats.
SOLVED:
While it seemed to work perfectly on Mac, for it to work on Windows you need to open the compressed .xz file connection for reading in binary mode, before passing it to untar():
download.file(url, tmp)
zz <- xzfile(tmp, open = "rb")
untar(zz, exdir = destDir)
An alternative, and even simpler, solution is to specify the mode parameter of download.file() as follows:
download.file(url, destfile = tmp, mode = "wb")
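Putting both points together, a minimal sketch for Windows (the URL is a placeholder, as in the question):
destDir <- "extracted-files"
if (!dir.exists(destDir)) dir.create(destDir)
tmp <- tempfile(fileext = ".tar.xz")
download.file("api_url.tar.xz", destfile = tmp, mode = "wb")  # binary mode so the archive is not corrupted
untar(tmp, exdir = destDir)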

Unable to download file to load data in R

OS- Windows 7
R version 3.0.3
What I typed in the R console:
if(!file.exists("data")){dir.create("data")}
fileUrl <- "https://data.baltimorecity.gov/api/views/dz54-2aru/rows.xlsx?accessType=DOWNLOAD"
download.file(fileUrl,destfile="./data/cameras.xlsx",method="curl")
dateDownloaded <- date()
What I got in the R console:
Warning messages:
1: running command 'curl "https://data.baltimorecity.gov/api/views/dz54-2aru/rows.xlsx?accessType=DOWNLOAD" -o "./data/cameras.xlsx"' had status 127
2: In download.file(fileUrl, destfile = "./data/cameras.xlsx", method = "curl") :
download had nonzero exit status
How do I make it right?
Your code works for me. You probably don't have cURL installed on your system. See ?download.file:
For methods "wget", "curl" and "lynx" a system call is made to the tool given by method, and the respective program must be installed on your system and be in the search path for executables.
Install cURL and make sure it is in your path.
After removing the method="curl" parameter, it works well on Windows.
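For reference, a minimal sketch of that approach, letting download.file() pick its default method (the added mode = "wb" is my assumption, since xlsx is a binary format):
if (!file.exists("data")) dir.create("data")
fileUrl <- "https://data.baltimorecity.gov/api/views/dz54-2aru/rows.xlsx?accessType=DOWNLOAD"
download.file(fileUrl, destfile = "./data/cameras.xlsx", mode = "wb")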

Error when using getGEO() in package GEOquery

I'm running the following code in R:
library(GEOquery)
mypath <- "C:/Users/Farzin/Desktop/BIOC"
GDS1 <- getGEO('GDS1',destdir=mypath)
But I'm getting the following error:
Using locally cached version of GDS1 found here:
C:/Users/Farzin/Desktop/BIOC/GDS1.soft.gz
Error in read.table(con, sep = "\t", header = FALSE, nrows = nseries) :
invalid 'nlines' argument
Could anyone please tell me how I could get rid of this error?
I have had the same error using GEOquery (version 2.23.5) with R and Bioconductor on Ubuntu (12.04), whatever GDS file I queried. Could it be that the GEOquery package is faulty?
In my experience, getGEO is extremely finicky. I commonly experience issues connecting to the GEO server. If this happens during download, getGEO leaves a partial file. But since the partial file is there, when you try to re-download, it will reuse this cached, partially downloaded file and run into the error you see (which you want to know about, since it's not the full file).
To solve this, delete the cached SOFT file and retry the download.
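For the example above, that fix looks like this (the path and file name are taken from the question and the error message):
mypath <- "C:/Users/Farzin/Desktop/BIOC"
file.remove(file.path(mypath, "GDS1.soft.gz"))   # drop the partial, cached download
GDS1 <- getGEO("GDS1", destdir = mypath)         # retry the download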
