I am trying to import an RDS file into RStudio on Windows. I tried following this example, which is for RData files, and I tried both methods:
Method 1:
githubURL <- ("https://github.com/derek-corcoran-barrios/LastBat/blob/master/best2.My.Lu2.rds")
BestMyyu <- readRDS(url(githubURL))
Method 2:
githubURL <- ("https://github.com/derek-corcoran-barrios/LastBat/blob/master/best2.My.Lu2.rds")
download.file(githubURL,"best2.My.Lu2.rds")
BestMyyu <- readRDS("best2.My.Lu2.rds")
I've looked through other threads and have not found any other example.
In the second method you just need to add method = "curl" and change the URL to point to the raw file (the Download link on the page):
githubURL <- ("https://raw.githubusercontent.com/derek-corcoran-barrios/LastBat/master/best2.My.Lu2.rds")
download.file(githubURL,"best2.My.Lu2.rds", method="curl")
BestMyyu <- readRDS("best2.My.Lu2.rds")
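If you would rather read the object straight into R without saving it first, a minimal sketch follows. It assumes the raw.githubusercontent.com URL above serves the file bytes (the /blob/ URL returns GitHub's HTML page, which is why Method 1 cannot work as written). read_rds_url() is a hypothetical helper name; gzcon() decompresses the gzip stream that saveRDS() writes by default.

```r
# read_rds_url() is a hypothetical helper, not part of any package.
read_rds_url <- function(rds_url) {
  # gzcon() decompresses the gzip stream that saveRDS() produces by default
  con <- gzcon(url(rds_url, open = "rb"))
  on.exit(close(con))  # also closes the underlying url() connection
  readRDS(con)
}

# Assumption: the raw URL returns the .rds bytes, not an HTML page
githubURL <- "https://raw.githubusercontent.com/derek-corcoran-barrios/LastBat/master/best2.My.Lu2.rds"
# BestMyyu <- read_rds_url(githubURL)
```

Because nothing is written to disk, this also sidesteps the Windows text-mode issue with download.file(), which otherwise needs mode = "wb" for binary files.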
If you don't have curl installed, you can get it from here
I want to use the GET() function from the httr package because this is just an example file; for the original file I need to supply a username and password, i.e.:
library(httr)
filename<-"filename_in_url.xls"
URL <- "originalurl"
GET(URL, authenticate("usr", "pwd"), write_disk(paste0("C:/Temp/temp/",filename), overwrite = TRUE))
As a test, I tried to import one of the files from https://www.nordpoolgroup.com/historical-market-data/ without saving it to disk, loading it into the environment instead so that I can see the data. However, this does not work either.
library(XML)
library(RCurl)
excel <- readHTMLTable(htmlTreeParse(getURL(paste("https://www.nordpoolgroup.com/4a4c6b/globalassets/marketdata-excel-files/elspot-prices_2021_hourly_eur.xls")), useInternalNodes=TRUE))[[1]]
Alternatively, if there are other ways to import data (functions that accept login information as input), it would be great to see them.
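One way to combine login details with loading the data into the environment is sketched below. It assumes the file is a genuine Excel workbook that readxl can open; if the server actually returns an HTML table with an .xls extension, read_excel() will fail and an HTML parser is needed instead. read_excel() needs a file path, so the bytes go to a temporary file that is deleted when the function returns. read_remote_excel() is a hypothetical helper, and "usr"/"pwd" are placeholders.

```r
# read_remote_excel() is a hypothetical helper; it needs httr and readxl installed.
read_remote_excel <- function(file_url, user, password) {
  # Keep the original extension so readxl picks the right reader (.xls/.xlsx)
  tmp <- tempfile(fileext = paste0(".", tools::file_ext(file_url)))
  on.exit(unlink(tmp))  # remove the temporary copy afterwards
  resp <- httr::GET(file_url,
                    httr::authenticate(user, password),
                    httr::write_disk(tmp, overwrite = TRUE))
  httr::stop_for_status(resp)  # error out on 401/404 etc. instead of parsing junk
  readxl::read_excel(tmp)
}

# Usage (placeholder credentials):
# excel <- read_remote_excel(
#   "https://www.nordpoolgroup.com/4a4c6b/globalassets/marketdata-excel-files/elspot-prices_2021_hourly_eur.xls",
#   "usr", "pwd")
```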
I need to download a file and save it in a folder while keeping the original filename from the website.
url <- "http://www.seg-social.es/prdi00/idcplg?IdcService=GET_FILE&dID=187112&dDocName=197533&allowInterrupt=1"
From a web browser, if you click on that link, you get to download an excel file with this filename:
AfiliadosMuni-02-2015.xlsx
I know I can easily download it with the command download.file in R like this:
download.file(url, "test.xlsx", method = "curl")
But what I really need for my script is to download it keeping the original filename intact. I also know I can do this with curl from my console like this.
curl -O -J $"http://www.seg-social.es/prdi00/idcplg?IdcService=GET_FILE&dID=187112&dDocName=197533&allowInterrupt=1"
But, again, I need this within an R script. Is there a way similar to the one above but in R? I have looked into the RCurl package but I couldn't find a solution.
You could always do something like:
library(httr)
library(stringr)
# alternate way to "download.file"
fil <- GET("http://www.seg-social.es/prdi00/idcplg?IdcService=GET_FILE&dID=187112&dDocName=197533&allowInterrupt=1",
write_disk("tmp.fil"))
# get the file name the site suggests it should have
fname <- str_match(headers(fil)$`content-disposition`, "\"(.*)\"")[2]
# rename
file.rename("tmp.fil", fname)
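The regular expression above assumes the Content-Disposition header is present and that the file name is in double quotes. A slightly more defensive sketch in base R also accepts unquoted names and strips any directory part the server might send; filename_from_disposition() and the "download.bin" fallback are hypothetical choices, not httr functions.

```r
# filename_from_disposition() is a hypothetical helper, not an httr function.
filename_from_disposition <- function(disposition, fallback = "download.bin") {
  if (is.null(disposition) || is.na(disposition)) return(fallback)
  # Capture the value after filename=, with or without surrounding quotes
  m <- regmatches(disposition,
                  regexec('filename="?([^";]+)"?', disposition))[[1]]
  if (length(m) < 2) return(fallback)  # no filename= parameter found
  basename(m[2])  # drop any directory part, keep only the file name
}

# Usage with the response from the GET() call above:
# fname <- filename_from_disposition(headers(fil)$`content-disposition`)
# file.rename("tmp.fil", fname)
```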
I think basename() would be the simplest option (https://www.rdocumentation.org/packages/base/versions/3.4.3/topics/basename), e.g.:
download.file(url, basename(url), mode = "wb")
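basename() only looks at the text after the last /, so it recovers the original name when the URL path ends in the file name, but not for query-string URLs like the seg-social.es link in the question. A quick check, with url_file_name() as a hypothetical helper that drops the query string first:

```r
# url_file_name() is a hypothetical helper: last path segment, query string removed.
url_file_name <- function(file_url) {
  basename(sub("[?#].*$", "", file_url))
}

rds_url <- "https://raw.githubusercontent.com/derek-corcoran-barrios/LastBat/master/best2.My.Lu2.rds"
seg_url <- "http://www.seg-social.es/prdi00/idcplg?IdcService=GET_FILE&dID=187112&dDocName=197533&allowInterrupt=1"

url_file_name(rds_url)  # "best2.My.Lu2.rds": the real file name
basename(seg_url)       # the whole "idcplg?IdcService=..." segment
url_file_name(seg_url)  # "idcplg": a script name, not AfiliadosMuni-02-2015.xlsx
```

For URLs like the second one, the Content-Disposition header is where the server states the file name.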
I am getting an error from fread:
Internal error: ch>eof when detecting eol
when trying to read a csv file downloaded from an https server, using R 3.2.0. I found something related on GitHub, https://github.com/Rdatatable/data.table/blob/master/src/fread.c, but don't know how I could use it, if at all. Thanks for any help.
Added info: the data was downloaded from here:
fileURL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv"
then I used
download.file(fileURL, "Idaho2006.csv", method = "Internal")
The problem is that download.file doesn't work with https when method = "internal", unless you're on Windows and set an option. Since fread uses download.file when you pass it a URL rather than a local file, it fails. You have to download the file manually and then read it from the local copy.
If you're on Linux, or already have wget or curl installed, use method = "wget" or method = "curl" instead.
If you're on Windows, have neither tool, and don't want to install one, call setInternet2(use = TRUE) before download.file:
http://www.inside-r.org/r-doc/utils/setInternet2
For example:
fileURL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv"
tempf <- tempfile()
download.file(fileURL, tempf, method = "curl")
DT <- fread(tempf)
unlink(tempf)
Or
fileURL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv"
tempf <- tempfile()
setInternet2(use = TRUE)  # Windows only: a function call, not an assignment
download.file(fileURL, tempf)
DT <- fread(tempf)
unlink(tempf)
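Both variants follow the same download, read, clean up pattern. A sketch that wraps it in a hypothetical helper, with the reader passed in so it works with fread() or base read.csv(). mode = "wb" avoids a text-mode transfer on Windows, and on.exit() removes the temporary file even if reading fails; the method argument is passed straight to download.file().

```r
# download_then_read() is a hypothetical helper, not part of data.table.
download_then_read <- function(file_url, reader = utils::read.csv,
                               method = "auto", ...) {
  tmp <- tempfile()
  on.exit(unlink(tmp), add = TRUE)  # always delete the temporary copy
  status <- utils::download.file(file_url, tmp, method = method,
                                 mode = "wb", quiet = TRUE)
  if (status != 0) stop("download failed with status ", status)
  reader(tmp, ...)  # extra arguments go to the reader
}

# Usage with the question's file:
# DT <- download_then_read(
#   "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv",
#   reader = data.table::fread, method = "curl")
```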
fread() now uses the curl package for downloading files, and this seems to work fine at the moment:
require(data.table) # v1.9.6+
fileURL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv"
fread(fileURL, showProgress = FALSE)
In my experience, the easiest way to fix this problem is to remove the s from https, and to drop the method argument, which you don't need. I'm on Windows, and the following code works for me:
fileURL <- "http://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv"
download.file(fileURL, "Idaho2006.csv")
I run an automated script to download 3 .xls files from 3 websites every hour. When I later try to read in the .xls files in R to further work with them, R produces the following error message:
"Error: IOException (Java): block[ 2 ] already removed - does your POIFS have circular or duplicate block references?"
When I manually open and save the .xls files, the problem no longer appears and everything works normally, but since the number of files grows by 72 every day, this is not a practical workaround.
The script I use to download and save the files:
library(httr)
setwd("WORKDIRECTION")
orig_wd <- getwd()
FOLDERS <- c("NAME1","NAME2","NAME3") #representing folder names
LINKS <- c("WEBSITE_1", #the urls from which I download
           "WEBSITE_2",
           "WEBSITE_3")
NO <- length(FOLDERS)
for(i in 1:NO){
  today <- as.character(Sys.Date())
  if (!file.exists(paste(FOLDERS[i],today,sep="/"))){
    dir.create(paste(FOLDERS[i],today,sep="/"))
  }
  setwd(paste(orig_wd,FOLDERS[i],today,sep="/"))
  dat<-GET(LINKS[i])
  bin <- content(dat,"raw")
  now <- as.character(format(Sys.time(),"%X"))
  now <- gsub(":",".",now)
  writeBin(bin,paste(now,".xls",sep=""))
  setwd(orig_wd)
}
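Two details of this loop can be made more robust while keeping the same folder/date/time.xls layout: format(Sys.time(), "%X") is locale-dependent (it may include AM/PM or other separators), and setwd() inside a loop leaves the working directory changed if a download errors. A sketch that builds the path explicitly, with hourly_xls_path() as a hypothetical helper and "%H.%M.%S" as an assumed replacement for the gsub() on %X. httr::write_disk() writes the response body straight to that path; whether this changes anything about the POIFS error is untested.

```r
# hourly_xls_path() is a hypothetical helper: FOLDER/YYYY-MM-DD/HH.MM.SS.xls
hourly_xls_path <- function(folder, time = Sys.time()) {
  day_dir <- file.path(folder, format(time, "%Y-%m-%d"))
  dir.create(day_dir, recursive = TRUE, showWarnings = FALSE)
  # Fixed, locale-independent time format instead of "%X" plus gsub()
  file.path(day_dir, paste0(format(time, "%H.%M.%S"), ".xls"))
}

# Usage inside the loop, with no setwd() calls:
# for (i in seq_along(FOLDERS)) {
#   dest <- hourly_xls_path(file.path(orig_wd, FOLDERS[i]))
#   resp <- httr::GET(LINKS[i], httr::write_disk(dest, overwrite = TRUE))
#   httr::stop_for_status(resp)
# }
```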
I then read in the files with the following script:
require(gdata)
require(XLConnect)
require(xlsReadWrite)
wb = loadWorkbook("FILEPATH")
df = readWorksheet(wb, "Favourite List" , header = FALSE)
Does anybody have experience with this type of error, and knows a solution or workaround?
The problem is partly resolved by using the readxl package, available on CRAN. Once it is installed, the files can be read with:
library(readxl)
read_excel("PathToFile")
The only problem is that the last column is omitted when reading. If I find a solution for this, I'll update the answer.
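The question's XLConnect call reads the "Favourite List" sheet without a header row; in readxl the equivalent arguments are sheet and col_names = FALSE. A sketch that reads every hourly file saved in one day's folder, where read_day_folder() is a hypothetical helper and the .xls pattern matches the names written by the download script. It does not address the missing last column.

```r
# read_day_folder() is a hypothetical helper; it needs readxl installed.
read_day_folder <- function(day_dir, sheet = "Favourite List") {
  paths <- list.files(day_dir, pattern = "\\.xls$", full.names = TRUE)
  sheets <- lapply(paths, function(p) {
    # col_names = FALSE mirrors header = FALSE in readWorksheet()
    readxl::read_excel(p, sheet = sheet, col_names = FALSE)
  })
  names(sheets) <- basename(paths)  # e.g. "14.05.09.xls"
  sheets
}

# Usage:
# day <- read_day_folder(file.path("NAME1", as.character(Sys.Date())))
```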
I'm trying to adopt the Reproducible Research paradigm while meeting halfway the people who prefer looking at Excel rather than text data files: I use Dropbox to host Excel files, which I can then access using the xlsx package.
Rather like downloading and unpacking a zipped file, I assumed something like the following would work:
# Prerequisites
require("xlsx")
require("ggplot2")
require("repmis")
require("devtools")
require("RCurl")
# Downloading data from Dropbox location
link <- paste0(
"https://www.dropbox.com/s/",
"{THE SHA-1 KEY}",
"{THE FILE NAME}"
)
url <- getURL(link)
temp <- tempfile()
download.file(url, temp)
However, I get Error in download.file(url, temp) : unsupported URL scheme
Is there an alternative to download.file that will accept this URL scheme?
Thanks,
Jon
You have the wrong URL: the one you are using just goes to the landing page. I think the actual download URL is different; I managed to get it sort of working using the code below.
I actually don't think you need RCurl or the getURL() function, and I think you were leaving out some important / separators in your previous formulation.
Try the following:
link <- paste("https://dl.dropboxusercontent.com/s",
"{THE SHA-1 KEY}",
"{THE FILE NAME}",
sep="/")
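# mode = "wb" forces a binary transfer; without it, Windows can corrupt .xlsx files
download.file(url = link, destfile = "your.destination.xlsx", mode = "wb")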
closeAllConnections()
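Another option, assuming the link comes from Dropbox's Share button and ends in ?dl=0: Dropbox's help pages describe changing that to dl=1 so the link returns the file itself instead of the preview page. dropbox_direct_link() below is a hypothetical helper that only rewrites the URL, and the key and file name in the example are made up.

```r
# dropbox_direct_link() is a hypothetical helper; it only rewrites the URL.
dropbox_direct_link <- function(share_link) {
  if (grepl("[?&]dl=0", share_link)) {
    return(sub("([?&])dl=0", "\\1dl=1", share_link))  # preview -> download
  }
  if (grepl("[?&]dl=1", share_link)) return(share_link)  # already direct
  sep <- if (grepl("?", share_link, fixed = TRUE)) "&" else "?"
  paste0(share_link, sep, "dl=1")
}

# Made-up example link:
direct <- dropbox_direct_link("https://www.dropbox.com/s/abc123/data.xlsx?dl=0")
# download.file(direct, destfile = "your.destination.xlsx", mode = "wb")
```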
UPDATE:
I just realised there is a source_XlsxData function in the repmis package, which in theory should do the job perfectly.
Also, the function below works some of the time but not always, and appears to get stuck at the GET line, so a better solution would be very welcome.
I decided to try taking a step back and figure out how to download a raw file from a secure (https) url. I adapted (butchered?) the source_url function in devtools to produce the following:
download_file_url <- function(url, outfile, ...) {
  stopifnot(is.character(url), length(url) == 1)
  # Only httr is needed; extra arguments (e.g. authenticate()) go to GET()
  request <- httr::GET(url, ...)
  httr::stop_for_status(request)
  # Open the output file only after the request succeeded, so a failed
  # download does not leave an empty file behind
  filetag <- file(outfile, "wb")
  on.exit(close(filetag))
  writeBin(httr::content(request, as = "raw"), filetag)
  invisible(outfile)
}
This seems to work for producing local versions of binary files - Excel included. Nicer, neater, smarter improvements in this gratefully received.
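On the GET() call that sometimes hangs: httr can put an upper bound on each attempt with timeout() and repeat failed attempts with RETRY(). A sketch, assuming the hangs are transient network problems; download_with_retry() is a hypothetical name and the 60-second, 3-try defaults are arbitrary.

```r
# download_with_retry() is a hypothetical helper built on httr::RETRY().
download_with_retry <- function(file_url, outfile, ..., seconds = 60, tries = 3) {
  resp <- httr::RETRY(
    "GET", file_url, ...,
    httr::timeout(seconds),                       # give up on an attempt after `seconds`
    httr::write_disk(outfile, overwrite = TRUE),  # stream the body to disk
    times = tries                                 # total number of attempts
  )
  httr::stop_for_status(resp)
  invisible(outfile)
}

# Usage (placeholder link):
# download_with_retry(link, "your.destination.xlsx")
```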