Downloading a PDF using R generates a corrupted file

I'd like to download the PDF file from this website using R. The file downloads, but I can't open it because the viewer says the PDF is corrupted. Here is my code:
url <- "https://www.bchousing.org/research-centre/housing-data/new-homes-data"
download.file(url, 'New Homes Registry Report - June 2020.pdf', mode="wb")
Can you please let me know the issue?
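The URL in the question appears to be the web page that links to the report rather than the PDF itself, so download.file() saves that page's HTML under a .pdf name and the viewer then reports it as corrupted. A minimal sketch, assuming the report is linked from that page with an href ending in .pdf (the helper name find_pdf_links is hypothetical, and the page layout is not confirmed), is to pull the PDF links out of the page source and download one of those with mode = "wb":

```r
# Hypothetical helper: return the .pdf links found in a page's HTML source.
# Assumes links appear as href="..." attributes ending in ".pdf" (no query string).
find_pdf_links <- function(html, base_url) {
  matches <- regmatches(html, gregexpr('href="[^"]+\\.pdf"', html, ignore.case = TRUE))[[1]]
  # Strip the leading href=" and the trailing quote
  links <- sub('"$', "", sub('^href="', "", matches, ignore.case = TRUE))
  # Resolve root-relative links ("/path/file.pdf") against the page's scheme and host
  origin <- sub("^(https?://[^/]+).*$", "\\1", base_url)
  relative <- startsWith(links, "/") & !startsWith(links, "//")
  links[relative] <- paste0(origin, links[relative])
  unique(links)
}

# Usage (needs network access):
# page_url <- "https://www.bchousing.org/research-centre/housing-data/new-homes-data"
# html <- paste(readLines(page_url, warn = FALSE), collapse = "\n")
# pdf_links <- find_pdf_links(html, page_url)
# download.file(pdf_links[1], "New Homes Registry Report - June 2020.pdf", mode = "wb")
```

A regular expression is enough for a quick check; for pages whose links are built by JavaScript or carry query strings, an HTML parser would be more robust.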

Related

How to read csv from a zip file?

I am trying to automate fetching chat transcripts using an API provided by the vendor. On a successful request, the API response contains a link from which the chat transcripts can be downloaded as a zip containing one CSV file with the required data.
Following the steps in the link here, I was able to download the zip from the link in R and store it in the temp folder. However, I wasn't able to extract the CSV from the zip file:
temp = tempfile(pattern = "", fileext = ".zip")
download.file(download_link,temp, mode = "wb")
file_name <- as.character(unzip(temp, list = TRUE)$Name)
con <- unz(temp,file_name)
chatsData <- read.csv(con, header = T)
I received the following error on the last line:
Error in open.connection(file, "rt") : cannot open the connection
In addition: Warning message:
In open.connection(file, "rt") :
cannot open zip file 'C:\Users\Public\Documents\Wondershare\CreatorTemp\RtmpqWLYGf\file4a5435b13659:2021-04-05T10:00_2021-04-06T10'
When I checked the temp location, I was able to locate and unzip the file and read its contents using WinRAR. I'm just clueless as to why this can't be replicated in R code.
You can download a sample of the zip file that I am trying to extract the CSV from at the following link
There is a special package on CRAN that brings everything necessary to zip and unzip archives:
https://cran.r-project.org/package=zip
If you are sure you downloaded your zip file to a local directory, you might be able to use the unzip() function this package provides to extract your desired CSV.
You could download your file, unzip it, read the contained CSV, and then delete both the original zip and the CSV if you want to keep your hard drive "clean".
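Judging from the error message, the CSV inside the archive appears to have colons in its name (it contains 10:00). unz() describes its connection as "zipfile:member" and splits that description at the last colon, so a colon in the member name would leave R trying to open a truncated zip path, which matches the path shown in the warning. A minimal sketch of the download, unzip, read and delete workflow described above follows; it uses base R's utils::unzip() rather than the zip package, and read_csv_from_zip is a hypothetical name. On Windows, extraction to disk may still fail if the member name contains characters that are invalid in file names, such as colons.

```r
# Hypothetical helper: extract a zip to a temporary folder, read the first CSV,
# and (optionally) delete both the extracted files and the original zip.
read_csv_from_zip <- function(zip_path, cleanup = TRUE) {
  exdir <- tempfile("unzipped_")
  dir.create(exdir)
  # Remove the extraction folder and the zip itself when the function exits
  if (cleanup) on.exit(unlink(c(exdir, zip_path), recursive = TRUE), add = TRUE)
  # unzip() returns the paths of the extracted files (invisibly)
  extracted <- utils::unzip(zip_path, exdir = exdir, junkpaths = TRUE)
  csv_files <- extracted[grepl("\\.csv$", extracted, ignore.case = TRUE)]
  if (length(csv_files) == 0) stop("no .csv file found in ", zip_path)
  utils::read.csv(csv_files[1], header = TRUE)
}

# Usage with the question's objects:
# temp <- tempfile(fileext = ".zip")
# download.file(download_link, temp, mode = "wb")
# chatsData <- read_csv_from_zip(temp)
```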

Download NASA satellite data using RCurl in R

I am trying to download an ncdf file using RCurl. Can anyone advise why this is not working?
require(RCurl)
require(ncdf4)
url <- "https://oceandata.sci.gsfc.nasa.gov/MODIS-Aqua/Mapped/Seasonal_Climatology/4km/sst/"
filename <-"A20021722014263.L3m_SCSU_NSST_sst_4km.nc"
download.file(paste0(url, filename),destfile = paste0("~/Desktop/", filename), method="curl")
setwd("~/Desktop/")
files <- dir(pattern = "\\.nc$")
f<-nc_open(files[1])
Error in R_nc4_open: NetCDF: Unknown file format
Error in nc_open(files[1]) :
Error in nc_open trying to open file A20021722014263.L3m_SCSU_NSST_sst_4km.nc
It appears that the downloaded file is an error page (in XML/HTML format) rather than NetCDF data. If you open it in Notepad, you'll see it contains text like:
Sorry, an error has occurred. Use the back button to return to the previous page or go to the Ocean Color Home Page
Are you sure that the filename you're wanting to download actually exists in that URL?
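Before calling nc_open(), it can help to check whether the download really is a NetCDF file rather than an error page. A minimal sketch (netcdf_signature is a hypothetical helper) compares the first bytes with the classic NetCDF magic number, "CDF" followed by a version byte of 1, 2 or 5, and with the HDF5 signature used by NetCDF-4 files. It assumes the HDF5 signature sits at the very start of the file, which is the usual case but not the only one the HDF5 format allows.

```r
# Hypothetical helper: report whether a file starts like a NetCDF file.
netcdf_signature <- function(path) {
  bytes <- readBin(path, what = "raw", n = 8L)
  cdf <- charToRaw("CDF")
  hdf5 <- as.raw(c(0x89, 0x48, 0x44, 0x46, 0x0D, 0x0A, 0x1A, 0x0A))  # "\x89HDF\r\n\x1a\n"
  # Classic (1), 64-bit offset (2) and CDF-5 (5) files start with "CDF" + version byte
  if (length(bytes) >= 4 && identical(bytes[1:3], cdf) &&
      as.integer(bytes[4]) %in% c(1L, 2L, 5L)) {
    return("netcdf-classic")
  }
  # NetCDF-4 files are HDF5 files
  if (length(bytes) == 8 && identical(bytes, hdf5)) return("netcdf4-hdf5")
  "not netcdf"
}

# Usage after the download in the question:
# path <- "~/Desktop/A20021722014263.L3m_SCSU_NSST_sst_4km.nc"
# netcdf_signature(path)
# If the result is "not netcdf", inspect the text: readLines(path, n = 5)
```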

Download .pdf file to R, getting error message

I'm having trouble downloading a .pdf from the internet into RStudio. I would like to analyse the .pdf using the pdftools package. I have a directory called files that I want the .pdf to go to. I'm using this code:
download.file('https://www2.gov.scot/Resource/Doc/352649/0118638.pdf', 'files')
I get this error:
Warning messages:
1: In download.file("https://www2.gov.scot/Resource/Doc/352649/0118638.pdf", :
URL https://www2.gov.scot/Resource/Doc/352649/0118638.pdf: cannot open destfile 'files', reason 'Is a directory'
2: In download.file("https://www2.gov.scot/Resource/Doc/352649/0118638.pdf", :
download had nonzero exit status
Is there way to get around this message?
The destfile has to be the filename (not the directory name) for the downloaded file.
For example, if we were to download the file above and save it as "Commission.pdf" in the files folder we would do the following:
download.file(url='https://www2.gov.scot/Resource/Doc/352649/0118638.pdf',
destfile="files/Commission.pdf")
You're passing files as the destfile, but files is an existing directory, so R warns that it cannot open it as the destination file.
You've misread the function signature. It is:
download.file(url, destfile, ...)
Therefore, when you're using download.file('https://www2.gov.scot/Resource/Doc/352649/0118638.pdf', 'files'), you are downloading the file https://www2.gov.scot/Resource/Doc/352649/0118638.pdf and saving it with the name files.
What you need to do is modify the second argument to include the complete file path. It can be something like this:
download.file('https://www2.gov.scot/Resource/Doc/352649/0118638.pdf', 'files/0118638.pdf')
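A small wrapper can build the destfile from the target directory and the last part of the URL, so the directory and the file name can't be confused. This is a minimal sketch with a hypothetical name (download_to_dir). It assumes the URL's last path component is a usable file name, which is not true for URLs ending in a query string, and it passes mode = "wb" so binary files such as PDFs are not altered on Windows.

```r
# Hypothetical helper: download `url` into directory `dir`, keeping the URL's file name.
download_to_dir <- function(url, dir, filename = basename(url), mode = "wb") {
  # Create the target directory if it does not exist yet
  if (!dir.exists(dir)) dir.create(dir, recursive = TRUE)
  destfile <- file.path(dir, filename)  # a file path, not the directory itself
  status <- utils::download.file(url, destfile = destfile, mode = mode, quiet = TRUE)
  if (status != 0) stop("download failed with status ", status)
  invisible(destfile)
}

# Usage:
# pdf_path <- download_to_dir("https://www2.gov.scot/Resource/Doc/352649/0118638.pdf", "files")
# pdftools::pdf_text(pdf_path)
```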

r unable to download file from server

I am trying to download a PDF file that is stored on an internal server.
The URL for this file looks like this:
file_location <- "file://dory.lisa.org/research/data/test.pdf"
I tried downloading this file using download.file():
download.file(file_location, "test.pdf",method='curl')
and I am getting an error.
curl: (37) Couldn't open file /research/data/test.pdf
Warning message:
In download.file(file_location, "test.pdf", method = "curl") :
download had nonzero exit status
I tried
url <- ('http://cran.r-project.org/doc/manuals/R-intro.pdf')
download.file(url, 'introductionToR.pdf')
I have no problem downloading this file, but I get an error when I use the same approach to download a file from my server.
I'm guessing that the file does not exist at that location on your local drive. When I ran the two lines that download from CRAN, I got a PDF file in my user directory. I then succeeded with this code:
url <- ('file://~/introductionToR.pdf')
download.file(url, 'NewintroductionToR.pdf')
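curl's message shows that it looked for /research/data/test.pdf on the local machine rather than on dory.lisa.org; a file:// URL is read from the local file system. If the server's share is reachable as a file path (mounted on Linux/macOS, or as a UNC path on Windows; the exact paths below are assumptions about your setup), one option is to skip download.file() and copy the file directly. A minimal sketch with a hypothetical helper name:

```r
# Hypothetical helper: copy a file that is reachable as a local or mounted path.
copy_from_share <- function(src, dest) {
  # Give a clearer error than curl's when the path is not visible from this machine
  if (!file.exists(src)) stop("cannot see ", src, " from this machine; is the share mounted?")
  ok <- file.copy(src, dest, overwrite = TRUE)
  if (!ok) stop("file.copy() failed for ", src)
  invisible(dest)
}

# Usage (the mount point below is an assumption):
# copy_from_share("/mnt/dory/research/data/test.pdf", "test.pdf")
# On Windows, a UNC path may work instead:
# copy_from_share("//dory.lisa.org/research/data/test.pdf", "test.pdf")
```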

R download.file, downloading excel file does not work

I am trying to download an Excel file using download.file().
If I go directly to the link in the browser, I can download the file without problems.
However, download.file only downloads a broken file, and Excel shows the error: "The file you are trying to open is in a different format than specified by the file extension."
Here is my code:
url <- "http://obieebr.banrep.gov.co/analytics/saw.dll?Download&Format=excel2007&Extension=.xlsx&BypassCache=true&path=%2Fshared%2fSeries%20Estad%c3%adsticas%2F1.%20Tasa%20Interbancaria%20%28TIB%29%2F1.1.TIB_Serie%20hist%C3%B3rica%20IQY&lang=es&NQUser=publico&NQPassword=publico&SyncOperation=1"
download.file(url, destfile = paste0(base_dir, "test.xls"), mode = "wb", method="libcurl")
Any ideas how to download this file?
Many thanks for your help!
Try this, it works for me:
download.file(url,destfile = "./second.xlsx",mode = "wb")
The file you are trying to download is simply not an Excel file. What you actually get is an HTML file (try changing the file extension to .html and opening it in your browser). So your code is not the problem.
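A quick way to confirm what the server actually sent, before opening the file in Excel, is to look at its first bytes: .xlsx files are zip archives starting with "PK\003\004", legacy .xls files start with the OLE2 signature D0 CF 11 E0 A1 B1 1A E1, and an HTML page usually starts with <!DOCTYPE html> or <html>. A minimal sketch (sniff_download is a hypothetical name; the HTML check only looks at the first 512 bytes):

```r
# Hypothetical helper: guess what a downloaded "Excel" file really contains.
sniff_download <- function(path, n = 512L) {
  bytes <- readBin(path, what = "raw", n = n)
  xlsx_sig <- as.raw(c(0x50, 0x4B, 0x03, 0x04))                          # zip ("PK\003\004")
  xls_sig  <- as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1))  # OLE2 compound file
  if (length(bytes) >= 4 && identical(bytes[1:4], xlsx_sig)) return("xlsx")
  if (length(bytes) >= 8 && identical(bytes[1:8], xls_sig)) return("xls")
  # rawToChar() refuses embedded NULs, so blank them out before text matching
  bytes[bytes == as.raw(0)] <- as.raw(0x20)
  head_text <- rawToChar(bytes)
  if (grepl("<!doctype html|<html", head_text, ignore.case = TRUE, useBytes = TRUE)) return("html")
  "unknown"
}

# Usage after the download in the question:
# sniff_download("./second.xlsx")
```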
