I want to access files on the internet, but I get the following error message:
Error in function (type, msg, asError = TRUE) : Access denied: 530
In addition: Warning messages:
1: In strsplit(str, "\\\r\\\n") : input string 1 is invalid in this locale
This is my code from this post:
library(RCurl)
url <- 'ftp://ftp.address'
userpwd <- "user:password"
filenames <- getURL(url, userpwd = userpwd,
ftp.use.epsv=FALSE, dirlistonly = TRUE)
Any idea how to solve this?
Thanks a lot for your help!
Try passing the credentials in the URL itself (note the @ between the password and the host, not #):
library(RCurl)
filenames <- getURL(url = "ftp://user:password@ftp.address", ftp.use.epsv = FALSE, dirlistonly = FALSE)
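If that gets you past the 530, a minimal follow-up sketch for actually fetching the files could look like this (ftp.address and the credentials are placeholders from the post; it assumes you request the listing with dirlistonly = TRUE, so it comes back as bare names separated by \r\n):
listing <- getURL("ftp://user:password@ftp.address/",
                  ftp.use.epsv = FALSE, dirlistonly = TRUE)  # bare names this time
files <- unlist(strsplit(listing, "\r\n"))                   # split the raw listing
for (f in files) {
  # mode = "wb" writes each file byte-for-byte, which matters on Windows
  download.file(paste0("ftp://user:password@ftp.address/", f),
                destfile = f, mode = "wb")
}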
I am getting this error when I run the code below to download data:
Error in function (type, msg, asError = TRUE) : Failed to connect to course1.winona.edu port 80: Timed out
Any help would be appreciated.
My code:
library(RCurl)
urlfile <-'http://course1.winona.edu/bdeppa/Stat%20425/Data/Boston_Housing.csv'
downloaded <- getURL(urlfile, ssl.verifypeer=FALSE)
connection <- textConnection(downloaded)
dataset <- read.csv(connection, header=FALSE)
I don't think you need anything beyond read.csv. This works for me:
urlfile <-'http://course1.winona.edu/bdeppa/Stat%20425/Data/Boston_Housing.csv'
dataset <- read.csv(urlfile)
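Your error is a connection timeout, so the server may be slow or unreachable from your network. If it is merely slow, a hedged variant is to raise R's download timeout (60 seconds by default) and fetch the file to disk first; this is a sketch, not a fix for a blocked connection:
options(timeout = 300)              # allow up to 5 minutes for the download
temp <- tempfile(fileext = ".csv")  # temporary file to hold the CSV
download.file(urlfile, temp)
dataset <- read.csv(temp)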
From the IMF, I want to read a .xls file from a URL directly into R, but all attempts have failed so far. Oddly, I can download the file manually, or via download.file(), and open it without problems in Microsoft Excel or in a text editor. Even then, however, I can't read the data into R.
In each attempt I tried both https and http.
myUrl <- "https://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls"
myUrl2 <- "http://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls"
1. Classic approach – fails.
imf <- read.table(file=myUrl, sep="\t", header=TRUE)
# Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
# line 51 did not have 55 elements
imf <- read.table(file=url(myUrl), sep="\t", header=TRUE)
# Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
# line 51 did not have 55 elements
2. Several packages – fails.
imf <- readxl::read_xls(myUrl)
# Error: `path` does not exist: ‘https://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls’
imf <- readxl::read_xls(myUrl2)
# Error: `path` does not exist: ‘http://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls’
imf <- gdata::read.xls(myUrl)
# Error in xls2sep(xls, sheet, verbose = verbose, ..., method = method, :
# Intermediate file 'C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f873be18e0.csv' missing!
# In addition: Warning message:
# In system(cmd, intern = !verbose) :
# running command '"C:\STRAWB~1\perl\bin\perl.exe"
# "C:/Program Files/R/R-3.6.1rc/library/gdata/perl/xls2csv.pl"
# "https://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls"
# "C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f873be18e0.csv" "1"' had status 2
# Error in file.exists(tfn) : invalid 'file' argument
imf <- gdata::read.xls(myUrl2) # <---------------------------------------------- THIS DOWNLOADS SOMETHING AT LEAST!
# trying URL 'http://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls'
# Content type 'application/vnd.ms-excel' length unknown
# downloaded 8.9 MB
#
# Error in xls2sep(xls, sheet, verbose = verbose, ..., method = method, :
# Intermediate file 'C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f87ded406b.csv' missing!
# In addition: Warning message:
# In system(cmd, intern = !verbose) :
# running command '"C:\STRAWB~1\perl\bin\perl.exe"
# "C:/Program Files/R/R-3.6.1rc/library/gdata/perl/xls2csv.pl"
# "C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f87f532cb3.xls"
# "C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f87ded406b.csv" "1"' had status 255
# Error in file.exists(tfn) : invalid 'file' argument
3. Tempfile approach – fails.
temp <- tempfile()
download.file(myUrl, temp) # THIS WORKS...
## BUT...
imf <- gdata::read.xls(temp)
# Error in xls2sep(xls, sheet, verbose = verbose, ..., method = method, :
# Intermediate file 'C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f870f55e04.csv' missing!
# In addition: Warning message:
# In system(cmd, intern = !verbose) :
# running command '"C:\STRAWB~1\perl\bin\perl.exe"
# "C:/Program Files/R/R-3.6.1rc/library/gdata/perl/xls2csv.pl"
# "C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f8746a46db"
# "C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f870f55e04.csv" "1"' had status 255
# Error in file.exists(tfn) : invalid 'file' argument
# and not even readLines() works cleanly...
tmp1 <- readLines(temp)
# Warning message:
# In readLines(temp) :
# incomplete final line found on
# 'C:\Users\jay\AppData\Local\Temp\Rtmp00GPlq\file2334435c2905'
str(tmp1)
# chr [1:8733] "WEO Country Code\tISO\tWEO Subject Code\tCountry\tSubject
# Descriptor\tSubject Notes\tUnits\tScale\tCountry/Seri"| __truncated__ ...
4. SDMX – fails.
I also tried the SDMX files the IMF offers, but without success either. That would probably be the more sophisticated approach, but I have never used SDMX.
link <- "https://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019_SDMXData.zip"
temp <- tempfile()
download.file(link, temp, quiet=TRUE)
imf <- rsdmx::readSDMX(temp)
# Error in function (type, msg, asError = TRUE) :
# Could not resolve host: C
# imf <- rsdmx::readSDMX(unzip(temp)) # runs forever and crashes R
unlink(temp)
Now... does anybody know what's going on, and how I may load the data into R?
Why not just use fill=TRUE?
imf <- read.table(file=myUrl, sep="\t", header=TRUE, fill = TRUE)
As your readLines() output shows, the file is really tab-separated text despite its .xls extension (which is also why the Excel readers choke on it); read.table() only failed because some rows have fewer fields than others.
From ?read.table:
fill
logical. If TRUE then in case the rows have unequal length, blank fields are implicitly added. See ‘Details’.
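If you prefer to separate the download from the parse, a hedged equivalent building on the tempfile approach from the question is:
temp <- tempfile()
download.file(myUrl, temp)            # this part already worked for you
imf <- read.delim(temp, fill = TRUE)  # read.delim() defaults to sep = "\t"
str(imf)                              # inspect what came back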
I am trying to download all .gz files from this link:
ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/BED/
So far I have tried this, but I am not getting any results:
require(RCurl)
url <- "ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/BED/"
filenames <- getURL(url, ftp.use.epsv = FALSE, dirlistonly = TRUE)
filenames <- strsplit(filenames, "\r\n")
filenames <- unlist(filenames)
I am getting this error:
Error in function (type, msg, asError = TRUE) :
Operation timed out after 300552 milliseconds with 0 out of 0 bytes received
Can someone please help with this?
Thanks
EDIT:
I tried to run it with the filenames provided to me below, so in my R script I have:
require(RCurl)
my_url <-"ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/BED/"
my_filenames <- c("bed_chr_11.bed.gz", ..., "bed_chr_9.bed.gz.md5")
my_filenames <- strsplit(my_filenames, "\r\n")
my_filenames <- unlist(my_filenames)
for(my_file in my_filenames){
download.file(paste0(my_url, my_file), destfile = file.path('/mydir', my_file))
}
And when I run the script I get the following error:
trying URL 'ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/BED/bed_chr_11.bed.gz'
Error in download.file(paste0(my_url, my_file), destfile = file.path("/mydir", :
cannot open URL 'ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/BED/bed_chr_11.bed.gz'
In addition: Warning message:
In download.file(paste0(my_url, my_file), destfile = file.path("/mydir", :
URL 'ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/BED/bed_chr_11.bed.gz': status was 'Timeout was reached'
Execution halted
The file names you're trying to access are:
filenames <- c("bed_chr_11.bed.gz", "bed_chr_11.bed.gz.md5", "bed_chr_12.bed.gz",
"bed_chr_12.bed.gz.md5", "bed_chr_13.bed.gz", "bed_chr_13.bed.gz.md5",
"bed_chr_14.bed.gz", "bed_chr_14.bed.gz.md5", "bed_chr_15.bed.gz",
"bed_chr_15.bed.gz.md5", "bed_chr_16.bed.gz", "bed_chr_16.bed.gz.md5",
"bed_chr_17.bed.gz", "bed_chr_17.bed.gz.md5", "bed_chr_18.bed.gz",
"bed_chr_18.bed.gz.md5", "bed_chr_19.bed.gz", "bed_chr_19.bed.gz.md5",
"bed_chr_20.bed.gz", "bed_chr_20.bed.gz.md5", "bed_chr_21.bed.gz",
"bed_chr_21.bed.gz.md5", "bed_chr_22.bed.gz", "bed_chr_22.bed.gz.md5",
"bed_chr_AltOnly.bed.gz", "bed_chr_AltOnly.bed.gz.md5", "bed_chr_MT.bed.gz",
"bed_chr_MT.bed.gz.md5", "bed_chr_Multi.bed.gz", "bed_chr_Multi.bed.gz.md5",
"bed_chr_NotOn.bed.gz", "bed_chr_NotOn.bed.gz.md5", "bed_chr_PAR.bed.gz",
"bed_chr_PAR.bed.gz.md5", "bed_chr_Un.bed.gz", "bed_chr_Un.bed.gz.md5",
"bed_chr_X.bed.gz", "bed_chr_X.bed.gz.md5", "bed_chr_Y.bed.gz",
"bed_chr_Y.bed.gz.md5", "bed_chr_1.bed.gz", "bed_chr_1.bed.gz.md5",
"bed_chr_10.bed.gz", "bed_chr_10.bed.gz.md5", "bed_chr_2.bed.gz",
"bed_chr_2.bed.gz.md5", "bed_chr_3.bed.gz", "bed_chr_3.bed.gz.md5",
"bed_chr_4.bed.gz", "bed_chr_4.bed.gz.md5", "bed_chr_5.bed.gz",
"bed_chr_5.bed.gz.md5", "bed_chr_6.bed.gz", "bed_chr_6.bed.gz.md5",
"bed_chr_7.bed.gz", "bed_chr_7.bed.gz.md5", "bed_chr_8.bed.gz",
"bed_chr_8.bed.gz.md5", "bed_chr_9.bed.gz", "bed_chr_9.bed.gz.md5"
)
The files are big, so I didn't test the whole loop, but it worked at least for the first file. Add this to the end of your code.
my_url <- 'ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/BED/'
for(my_file in filenames){ # loop over the files
  # download each file, saving in a directory that you need to create on your own computer;
  # mode = "wb" keeps the binary .gz files from being corrupted on Windows
  download.file(paste0(my_url, my_file),
                destfile = file.path('c:/users/josep/Documents/', my_file),
                mode = "wb")
}
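Since your error was 'Timeout was reached', it may also help to raise R's download timeout (60 seconds by default) before running the loop; the value here is an arbitrary guess:
options(timeout = 600)  # the .gz files are large, so allow up to 10 minutes each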
When I want to use the getGEO function to download a GSE dataset with this code:
series <- "GSE85358"
gset <- getGEO(series , GSEMatrix =TRUE, AnnotGPL=TRUE, destdir = "....." )
I encounter this error:
https://ftp.ncbi.nlm.nih.gov/geo/series/GSE2nnn/GSE2553/matrix/
Error in function (type, msg, asError = TRUE) :
  error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
How can I fix it?
The following R code works fine from my Windows 8 laptop:
> inst<- "https://www.sec.gov/Archives/edgar/data/51143/000104746916010329/ibm-20151231.xml"
> options(stringsAsFactors = FALSE)
> xbrl.vars <- xbrlDoAll(inst, cache.dir = "XBRLcache", prefix.out = NULL, verbose=TRUE)
However, when I attempt to run it from my Ubuntu 16.04 machine, I receive the following output:
Error in fileFromCache(file) :
Error in download.file(file, cached.file, method = "auto", quiet = !verbose) :
cannot download all files
In addition: Warning message:
In download.file(file, cached.file, method = "auto", quiet = !verbose) :
URL 'https://www.sec.gov/Archives/edgar/data/51143/000104746916010329/ibm-20151231.xsd': status was '404 Not Found'
It's finding the initial xml file but then cannot find the referenced schemas. Any help would be appreciated. Thanks in advance.