Download .pdf file to R, getting error message - r

I'm having trouble download a .pdf from the internet into Rstudio. I would like to analyse the .pdf using the pdftools package. I have a directory called files that I want the .pdf to go to. I'm using this code.
download.file('https://www2.gov.scot/Resource/Doc/352649/0118638.pdf', 'files')
I get this error:
Warning messages:
1: In download.file("https://www2.gov.scot/Resource/Doc/352649/0118638.pdf", :
URL https://www2.gov.scot/Resource/Doc/352649/0118638.pdf: cannot open destfile 'files', reason 'Is a directory'
2: In download.file("https://www2.gov.scot/Resource/Doc/352649/0118638.pdf", :
download had nonzero exit status
Is there way to get around this message?

The destfile has to be the filename (not the directory name) for the downloaded file.
For example, if we were to download the file above and save it as "Commission.pdf" in the files folder we would do the following:
download.file(url='https://www2.gov.scot/Resource/Doc/352649/0118638.pdf',
destfile="files/Commission.pdf")
You're passing in file to the destfile, which prompts R to throw the error warning that the argument you specified is a directory.

You miss the function assignature. It is
download.file(url, destfile, ...)
Therefore, when you're using download.file('https://www2.gov.scot/Resource/Doc/352649/0118638.pdf', 'files'), you are downloading the file https://www2.gov.scot/Resource/Doc/352649/0118638.pdf and saving it with the name files.
What you need to do is to modify the second argument to encopass the complete file path. It can be something like this:
download.file('https://www2.gov.scot/Resource/Doc/352649/0118638.pdf', 'files/0118638.pdf')

Related

Downloading and unzipping GitHub zipped files directly in R

I am trying to download and unzip a folder of files from GitHub into R. I can manually download the file at https://github.com/dylangomes/SO/blob/main/Shape.zip and then extract all files in working directory, but I'd like to work directly from R.
utils::unzip("https://github.com/dylangomes/SO/blob/main/Shape.zip")
# Warning message:
# In utils::unzip("https://github.com/dylangomes/SO/blob/main/Shape.zip", :
# error 1 in extracting from zip file
It says it is a warning message, although nothing has been downloaded or unzipped into my wd.
I can download the file to my machine:
utils::download.file("https://github.com/dylangomes/SO/blob/main/Shape.zip")
But I get the same message with the unzip function:
utils::unzip("Shape.zip")
And the downloaded file cannot manually be extracted. Here, I get the error that the compressed folder is empty. The unzip line works on the manually downloaded .zip file, which tells me something is wrong with the download.file line.
So if I add raw=TRUE to the end (which can make a difference in downloading data from GitHub):
utils::download.file("https://github.com/dylangomes/SO/blob/main/Shape.zip?raw=TRUE","Shape.zip")
utils::unzip("Shape.zip")
I get a different warning with, similarly, nothing being executed:
Warning message:
In utils::unzip("Shape.zip") : internal error in 'unz' code
I have tried most of the answers at Using R to download zipped data file, extract, and import data, but they appear to be for single files that are zipped and aren't helping here. I've tried the answers at r function unzip error 1 in extracting from zip file, which mentions the same warning message I am getting, but none of the solutions work in this case.
Any idea of what I am doing wrong?
You need to use:
download.file(
"https://github.com/dylangomes/SO/blob/main/Shape.zip?raw=TRUE",
"Shape.zip",
mode = "wb"
)
Without the query string ?raw=TRUE you are downloading the webpage and not the file.
(For Windows) R will use mode = "wb" by default when it detects from the end of the URL that certain file formats, including .zip, are being downloaded. However, the URL finishing with a query string instead of a file format means the check fails so you need to set the mode explicitly.

Download NASA satellite data using RCurl in R

I am trying to download a ncdf file using rCurl. Can anyone provide any advice on why this is not working?
require(RCurl)
require(ncdf4)
url <- "https://oceandata.sci.gsfc.nasa.gov/MODIS-Aqua/Mapped/Seasonal_Climatology/4km/sst/"
filename <-"A20021722014263.L3m_SCSU_NSST_sst_4km.nc"
download.file(paste0(url, filename),destfile = paste0("~/Desktop/", filename), method="curl")
setwd("~/Desktop/")
files<-dir(pattern="*.nc")
f<-nc_open(files[1])
Error in R_nc4_open: NetCDF: Unknown file format
Error in nc_open(files[1]) :
Error in nc_open trying to open file A20021722014263.L3m_SCSU_NSST_sst_4km.nc
It appears that the file downloaded is an error file in XML format? If you open it in Notepad, you'll see it contains stuff like
Sorry, an error has occurred. Use the back button to return to the previous page or go to the Ocean Color Home Page
Are you sure that the filename you're wanting to download actually exists in that URL?

Do you need Microsoft Office (Excel) to read .csv files with R?

I just downloaded R onto my computer, but when I try to read a .csv file I get the following error:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file '[filename].csv': No such file or directory
This happens both when I command the program to read the .csv or when I try to import it manually (although that error code is slightly different).
Could this be because I don't have Microsoft Office (and subsequently, Excel) on my computer? The .csv file exists and is not corrupt because I have no problem uploading the file to Google sheets and displaying the data there. Either that, or I didn't install R correctly. Any suggestions?
No, you don't need any special packages to read plain text CSV files in R.
Looking at your error message:
Warning message: In file(file, "rt") : cannot open file '[filename].csv': No such file or directory
I would guess that the problem is your path to the file in question. As a quick fix, you can try using the fully qualified path to your file. For example, on Windows you might try this:
data <- read.csv(file="c:/path/to/filename.csv", header=TRUE, sep=",")
If this works, then the issue was the location of your CSV file. Then, if you want to continue using a relative path, you'll have to figure out what that would be given your current R setup.

r unable to download file from server

I am trying to download a pdf file that is stored in an internal server.
The url for this file is like this below
file_location <- "file://dory.lisa.org/research/data/test.pdf"
I tried downloading this file using the download.file option
download.file(file_location, "test.pdf",method='curl')
and i am getting an error.
curl: (37) Couldn't open file /research/data/test.pdf
Warning message:
In download.file(file_location, "test.pdf", method = "curl") :
download had nonzero exit status
I tried
url <- ('http://cran.r-project.org/doc/manuals/R-intro.pdf')
download.file(url, 'introductionToR.pdf')
And i have no problem downloading this file, but somehow it shows an error when I try to use the same approach to download a file on my server.
I'm guessing that the file does not exist at that location on your local drive. When I executed the couple of lines that downloaded from CRAN I get a pdf file in my User directory/folder. I then get success with this code:
url <- ('file://~/introductionToR.pdf')
download.file(url, 'NewintroductionToR.pdf')

Permission Denied Error when downloading a file

I am trying to download an excel zip file into a R Project folder that I have started. I'm new to R so I'm a little puzzled at the error message that I'm receiving.
The file is an excel file and so first I created a variable for the file:
excel2file="http://op.nsf.data/dataFiles/Housing2013EXCEL.zip"
Then I used the coding:
download.file(excel2file, destfile= "~/Home/Documents/Data")
I receive this error message:
Error in download.file(excel2file, destfile = "~/Home/Documents/Data") :
cannot open destfile '~/Home/Documents/Data', reason 'Permission denied'
I tried looking at other examples of permission denied and I think it may be my destination file but I am not sure why or the steps to trouble shoot it.
destfile should be a filename, not a directory. For example:
download.file(excel2file, destfile= "~/Home/Documents/Data/Housing2013EXCEL.zip")
Also, that URL doesn't seem to be valid, but that's a different (non-R) problem.
Make sure you don't have the file already open in the destfile, as the Permission denied error also comes from being unable to overwrite an open file.
This tripped me up for a while.
You could use the package readr and its function read_csv. It allows you to download the file, but it has to be a downloaded files such as a CSV file, something like this worked for me
library(readr)
newz <- read_csv("https://www.stats.govt.nz/assets/Uploads/Annual-enterprise-survey/Annual-enterprise-survey-2020-financial-year-provisional/Download-data/annual-enterprise-survey-2020-financial-year-provisional-csv.csv")
View(newz)
about the "Permission Denied" I still doesn`t solved that problem
You can also use basename with paste, which would be useful if downloading a bunch of files.
For example:
(excel2file="http://op.nsf.data/dataFiles/Housing2013EXCEL.zip")
(file_name <- basename(excel2file))
download.file(excel2file, destfile= paste("~/Home/Documents/Data",file_name, sep="/"))
add a name in destfile like /downloadedfile.csv"

Resources