I am trying to fetch some data in R from a GitHub repo, but the connection fails.
remote.file = function(URL) {
  temporaryFile <- tempfile()
  download.file(URL, destfile = temporaryFile, method = "curl")
  return(temporaryFile)
}
URL = "https://raw.githubusercontent.com/sidooms/MovieTweetings/master/latest/ratings.dat"
Ratings = read.table(remote.file(URL), sep = ":", header = FALSE)[, c(1,3,5,7)]
I get an error when running this code.
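One possibility worth checking (an assumption on my part, since the error text was not included): method = "curl" shells out to an external curl executable and fails if it is not installed or not on the PATH. A minimal sketch using R's built-in libcurl method instead (available since R 3.2.0):
# Sketch: download over HTTPS without relying on an external curl binary
URL <- "https://raw.githubusercontent.com/sidooms/MovieTweetings/master/latest/ratings.dat"
tf <- tempfile()
download.file(URL, destfile = tf, method = "libcurl")
# ratings.dat uses "::" as separator, so sep = ":" leaves columns 2, 4, 6 empty
Ratings <- read.table(tf, sep = ":", header = FALSE)[, c(1, 3, 5, 7)]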
I am writing a function to download a bunch of CSV files from a "click and download" webpage. It was working wonderfully:
mydownloadBCA <- function(start_date, end_date) {
  start_date <- as.Date(start_date)
  end_date <- as.Date(end_date)
  dates <- seq(start_date, end_date, by = "day")  # daily sequence of dates
  for (i in seq_along(dates)) {
    string_date <- as.character(dates[i])
    myfile <- paste0("./BCA/BCA", string_date, ".csv")
    myurl <- paste("https://www.cenace.gob.mx/DocsMEM/OpeMdo/CantidAsig/MDA/ImportacionExportacion/Resultados_ImpExp%20BCA%20MDA%20Dia%20", string_date, "%20v2017%2003%2022_09%2033%2019.csv", sep = "")
    download.file(url = myurl, destfile = myfile, quiet = TRUE)
  }
}
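For example, a hypothetical call (the dates are taken from the URLs below, and the ./BCA destination directory is assumed to exist):
dir.create("BCA", showWarnings = FALSE)  # destination folder used by myfile
mydownloadBCA("2016-01-29", "2016-10-31")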
For a first "chunk" of files, the URL varies only by date:
[2016-01-29] https://www.cenace.gob.mx/DocsMEM/OpeMdo/CantidAsig/MDA/ImportacionExportacion/Resultados_ImpExp%20SIN%20MDA%20Dia%202016-01-29%20v2017%2003%2022_10%2033%2019.csv
[2016-10-31] https://www.cenace.gob.mx/DocsMEM/OpeMdo/CantidAsig/MDA/ImportacionExportacion/Resultados_ImpExp%20SIN%20MDA%20Dia%202016-10-31%20v2017%2003%2022_10%2033%2019.csv
Afterwards, the webpage was updated on a daily basis, generating URLs that change without an obvious pattern. After 2017-03-30 the URL no longer varies only by date but also by an apparently arbitrary numeric suffix. The problem is the last part, "%XXXX%XXXX_XX%XXXX%XXXX.csv",
for example:
<<url>> = https://www.cenace.gob.mx/DocsMEM/OpeMdo/CantidAsig/MDA/ImportacionExportacion/Resultados_ImpExp%20BCA%20MDA
2017-03-30 <<url>>%20Dia%202017-03-30%20v2017%2003%2029_13%2029%2051.csv
2017-04-01 <<url>>%20Dia%202017-04-01%20v2017%2003%2031_13%2044%2042.csv
2017-04-02 <<url>>%20Dia%202017-04-02%20v2017%2004%2001_12%2057%2041.csv
## Problems here ^^^^^^^^^^^^^^^^^^^^^^
I tried to account for it with a loop, but so far it has not worked:
mydownloadSIN <- function(start_date, end_date) {
  start_date <- as.Date(start_date)
  end_date <- as.Date(end_date)
  dates <- seq(start_date, end_date, by = "day")
  digits <- 0:9  # note: the original as.numeric(factor(0:9)) yields 1:10, not 0:9
  for (i in seq_along(dates)) {
    for (j in seq_along(digits)) {  # the original loop from 0 produced the empty "%%_%%" URL in the error below
      string_date <- as.character(dates[i])
      X <- as.character(digits[j])  # one digit repeated; cannot match suffixes whose digits differ
      myfile <- paste0("./SIN/SIN", string_date, ".csv")  # note: the same destfile is reused for every j
      myurl <- paste0("https://www.cenace.gob.mx/DocsMEM/OpeMdo/CantidAsig/MDA/ImportacionExportacion/Resultados_ImpExp%20SIN%20MDA%20Dia%20", string_date, "%20v2017%", X, X, X, X, "%", X, X, X, X, "_", X, X, "%", X, X, X, X, "%", X, X, X, X, ".csv")
      download.file(url = myurl, destfile = myfile, quiet = TRUE)
    }
  }
}
When trying to use the function I get the following error:
Error in download.file(url = myurl, destfile = myfile, quiet = TRUE) :
cannot open URL
'https://www.cenace.gob.mx/DocsMEM/OpeMdo/CantidAsig/MDA/ImportacionExportacion/Resultados_ImpExp%20SIN%20MDA%20Dia%202017-03-24%20v2017%%_%%.csv'
In addition: Warning message: In download.file(url = myurl, destfile =
myfile, quiet = TRUE) : cannot open URL
'https://www.cenace.gob.mx/DocsMEM/OpeMdo/CantidAsig/MDA/ImportacionExportacion/Resultados_ImpExp%20SIN%20MDA%20Dia%202017-03-24%20v2017%%_%%.csv':
HTTP status was '400 Bad Request'
This is the "general" webpage where users select the years they need and then click on the CSV archive:
https://www.cenace.gob.mx/SIM/VISTA/REPORTES/H_RepCantAsignadas.aspx?N=135&opc=divCssCantAsig&site=Cantidades%20asignadas/MDA/De%20Importación%20y%20Exportación&tipoArch=C&tipoUni=BCN&tipo=De%20Importación%20y%20Exportación&nombrenodop=MDA
Is there a way I can account for this change in the URL in my function?
Thanks
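One approach, offered as a sketch rather than a confirmed fix: the suffix looks like a publication timestamp, so enumerating a single repeated digit will never match it. Instead of guessing blindly, each candidate URL can be probed with a HEAD request and downloaded only if it exists, so one bad guess does not abort the whole loop. A minimal sketch with httr (the helper name url_ok is made up for illustration):
library(httr)
# Hypothetical helper: TRUE if the URL responds with a non-error status
url_ok <- function(u) {
  resp <- tryCatch(HEAD(u, timeout(10)), error = function(e) NULL)
  !is.null(resp) && status_code(resp) < 400
}
# Usage inside the loop: skip candidate URLs that do not exist
if (url_ok(myurl)) {
  download.file(url = myurl, destfile = myfile, quiet = TRUE)
}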
I tried to retrieve data from an SFTP server with the code below:
library(RCurl)
protocol <- "sftp"
server <- "xxxx#sftp.xxxx.com"
userpwd <- "xxx:yyy"
tsfrFilename <- "cccccc.tsv"
ouptFilename <- "out.csv"
opts <- list(
  # ssh.public.keyfile = "true", # file name
  ssh.private.keyfile = "xxxxx.ppk",
  keypasswd = "userpwd"
)
# Run #
## Download Data
url <- paste0(protocol, "://", server, "/", tsfrFilename)  # "/" needed between host and file path
data <- getURL(url = url, .opts = opts, userpwd=userpwd)
and I received an error message:
Error in function (type, msg, asError = TRUE) : Authentication failure
What am I doing wrong?
Thanks
With a private key you do not need a password with your username. So your getURL statement will be:
data <- getURL(url = url, .opts = opts, username="username")
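Put together, the call would look something like this (a sketch; the host, key path, and username are placeholders carried over from the question):
library(RCurl)
opts <- list(ssh.private.keyfile = "xxxxx.ppk")       # same key file as above
url <- "sftp://sftp.xxxx.com/cccccc.tsv"              # placeholder host and file
data <- getURL(url = url, .opts = opts, username = "username")  # username only, no password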
I had exactly the same problem and have just spent an hour trying different things out. What worked for me was changing the format of the private key to OpenSSH.
To do this, I used the key generator PuTTYgen. Go to the menu item "Conversions" to import the original private key and export it to the OpenSSH format. I exported the converted key to the same folder my original key was in, with a new filename, and kept the *.ppk extension.
Then I used the following commands:
opts <- list(
  ssh.private.keyfile = "<path to my new OpenSSH Key>.ppk"
)
data <- getURL(url = URL, .opts = opts, username = username, verbose = TRUE)
This seemed to work fine.
I am working with the AIMS model developed by the APEC Climate Center. The model downloads data from an FTP server and then calls the LoadCmip5DataFromAdss function from datasource.R to load the data into the model.
#do.call("LoadCmip5DataFromAdss", parameters)
On GitHub I found the source code for LoadCmip5DataFromAdss, which gives the path of an FTP server to download data from:
LoadCmip5DataFromAdss <- function(dbdir, NtlCode) {
  fname <- paste("cmip5_daily_", NtlCode, ".zip", sep = "")
  if (nchar(NtlCode) == 4 && substr(NtlCode, 1, 2) == "US") {
    adss <- "ftp://cis.apcc21.org/CMIP5DB/US/"
  } else {
    adss <- "ftp://cis.apcc21.org/CMIP5DB/"
  }
I want to get the data from a local directory instead of downloading it, because downloading takes a lot of time. How do I do that?
Also, where do I find the file containing LoadCmip5DataFromAdss on my PC, given that only datasource.R is provided in the setup?
All that function does is download the ZIP file (cmip5_daily_ + whatever you specified for NtlCode + .zip) into the directory you specified for dbdir, unzip it there, and remove the ZIP file. Here's the whole function from rSQM:
LoadCmip5DataFromAdss <- function(dbdir, NtlCode) {
  fname <- paste("cmip5_daily_", NtlCode, ".zip", sep = "")
  if (nchar(NtlCode) == 4 && substr(NtlCode, 1, 2) == "US") {
    adss <- "ftp://cis.apcc21.org/CMIP5DB/US/"
  } else {
    adss <- "ftp://cis.apcc21.org/CMIP5DB/"
  }
  srcfname <- paste(adss, fname, sep = "")
  dstfname <- paste(dbdir, "/", fname, sep = "")
  download.file(srcfname, dstfname, mode = "wb")
  unzip(dstfname, exdir = dbdir)
  unlink(dstfname, force = T)
  cat("CMIP5 scenario data at", NtlCode, "is successfully loaded.\n")
}
You can just do something like:
unzip(YOUR_LOCAL_NtlCode_ZIP_FILE, exdir = WHERE_YOUR_dbdir_IS)
instead of using that function.
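Concretely, something like this (a sketch; the paths and the "KR" country code are hypothetical):
local_zip <- "C:/data/cmip5_daily_KR.zip"  # hypothetical local copy of the archive
dbdir <- "C:/rsqm/db"                      # hypothetical model database directory
unzip(local_zip, exdir = dbdir)            # same effect as the function, minus the FTP download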
I'm trying to get lyrics from the chartlyrics API. I wrote an R function that works, but not inside a loop. My script is:
library(httr)
library(RCurl)
library(XML)
df <- data.frame(artist = c('Led Zeppellin', 'Adele'), song = c('Rock´n roll', 'Hello'), stringsAsFactors = F)
make.querye <- function(xx) {
  names_ok <- gsub(" ", "&", xx)
  names_ok2 <- paste("\'", names_ok, "\'", sep = '')  # note: never used below
  querye <- paste("http://api.chartlyrics.com/apiv1.asmx/SearchLyricDirect?artist=", names_ok[1], "&song=", names_ok[2], sep = '')
  data <- GET(querye)
  aa <- content(data, "text")
  doc <- htmlParse(aa, asText = TRUE)
  plain.text <- xpathSApply(doc, "//lyric//text()[not(ancestor::script)][not(ancestor::style)][not(ancestor::noscript)][not(ancestor::form)]", xmlValue)
  if (length(plain.text) == 0) {
    plain.text2 <- 'Lyrics not found'
  } else {
    plain.text2 <- iconv(plain.text, from = "UTF-8", to = "latin1", sub = NA, mark = TRUE, toRaw = FALSE)
  }
  return(plain.text2)
}
names <- c(df$artist[1], df$song[1])
make.querye(names) #- it works
names <- c(df$artist[2], df$song[2])
make.querye(names) #- it also works
But my function doesn't work inside a loop:
for (ii in 1:2) {
  names <- c(df$artist[ii], df$song[ii])
  print(names)
  make.querye(names)
}
I get the following error:
Error in curl::curl_fetch_memory(url, handle = handle) : Failure when receiving data from the peer
The RETRY function was introduced in httr in June 2016 and allows you to retry a request multiple times until it succeeds.
Use it with the parameter verb = "GET" instead of calling GET directly, i.e.,
data <- RETRY("GET", querye)
You can also define the maximum number of attempts with the times parameter.
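Inside the function above, that would look something like this (a sketch; the retry count of 5 is arbitrary):
data <- RETRY("GET", querye, times = 5)  # retries with backoff instead of failing on the first hiccup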
I tried to download several thousand SEC files via the command:
download.file(link, folder, method = "internal", quiet = FALSE,
              mode = "wb", cacheOK = TRUE,
              extra = getOption("download.file.extra"))
After a while I get the following message (screenshot), which I cannot interpret:
https://dl.dropboxusercontent.com/u/4149177/Capture.PNG
It seems that the files are downloaded successfully; however, I want to know what the message means.
Can you tell me what R is trying to tell me?
Full code:
setInternet2(use = FALSE)  # Windows-only; defunct in current versions of R
destinationfolder <- getwd()
startyear <- 2000
stopyear <- 2000
startquarter <- 1
stopquarter <- 2
filetype <- "10-Q"
func.getsecindexfile <- function(year, quarter) {
  #### download the zipped index file from the SEC website
  tf <- tempfile()
  result <- try(download.file(url = paste("http://www.sec.gov/Archives/edgar/full-index/", year, "/QTR", quarter, "/company.zip", sep = ""), destfile = tf))
  #### if we didn't encounter an error downloading the file, parse it and return it as an R data frame
  if (!inherits(result, "try-error")) {
    #### small function to remove leading and trailing spaces
    trim <- function(string) {
      string <- enc2native(string)
      gsub("^\\s*(.*?)\\s*$", "\\1", string, perl = TRUE)
    }
    #### read the downloaded file
    raw.data <- readLines(con = (zz <- unz(description = tf, filename = "company.idx")))
    close(zz)
    #### remove the first 10 rows
    raw.data <- raw.data[11:length(raw.data)]
    #### parse the downloaded file and return the extracted data as a data frame
    company_name <- trim(substr(raw.data, 1, 62))
    form_type <- trim(substr(raw.data, 63, 74))
    cik <- trim(substr(raw.data, 75, 86))
    date_filed <- as.Date(substr(raw.data, 87, 98))
    file_name <- trim(substr(raw.data, 99, 150))
    rm(raw.data)
    return(data.frame(company_name, form_type, cik, date_filed, file_name))
  } else {
    return(NULL)
  }
}
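#### NOTE (assumption): 'sqlite' below is a DBI connection that this script never creates;
#### it would need to be opened beforehand, e.g.:
#### library(RSQLite)
#### sqlite <- dbConnect(SQLite(), "filings.sqlite")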
#### add index files to database
func.addindexfiletodatabase <- function(data) {
  if (is.null(data)) return(NULL)
  rs <- dbWriteTable(sqlite, "filings", data, append = TRUE)
  return(rs)
}
dbGetQuery(sqlite, "DROP TABLE IF EXISTS filings")
for (year in startyear:stopyear) {
  for (quarter in startquarter:stopquarter) {
    func.addindexfiletodatabase(func.getsecindexfile(year, quarter))
  }
}
selection <- paste("SELECT * FROM filings WHERE form_type IN ('", filetype, "')", sep = "")
index <- dbGetQuery(sqlite, selection)
pre <- c("ftp://ftp.sec.gov/")
index <- cbind(index,pre)
temp <- paste(index$pre, index$file_name, sep = "")
index <- cbind(index,temp)
index$name_new <- index$temp
index$name_new <- gsub("ftp://ftp.sec.gov/edgar/data/","",index$name_new)
index$name_new <- gsub("/","-",index$name_new)
name <- paste(index$name_new)
link <- paste(index$temp, sep = "")
index$pre <- NULL
index$temp <- NULL
#### define download function
func.download_files <- function(link, name) {
  folder <- paste(destinationfolder, "\\", name, sep = "")
  download.file(link, folder, method = "internal", quiet = FALSE, mode = "wb", cacheOK = TRUE, extra = getOption("download.file.extra"))
}
#### download the files
mapply(FUN = func.download_files,link=link,name=name)
The "error" was a notification that the files was successfully downloaded. Thank your for your help.