R Markdown cannot open URL when using download.file

Note: this problem only occurs on Windows.
I have the following code that runs properly out of a normal script or the console:
tdir <- tempdir()
stateurl <- "https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_state_500k.zip"
if (!file.exists(file.path(tdir, "cb_2018_us_state_500k.shp"))) {
  download.file(stateurl, destfile = file.path(tdir, "States.zip"))
  unzip(file.path(tdir, "States.zip"), exdir = tdir)
}
But when I place the same code in a chunk and try to knit to HTML in R Markdown, I get the warning "could not open URL connection."
I am lost as to why something as simple as downloading a file would run in the console but not in R Markdown.

I could reproduce the error about 50% of the time with the provided code, with no obvious pattern (i.e. repeatedly running "Knit to HTML" from the same session will randomly fail or work).
For me, the problem goes away if I explicitly pass method = "libcurl" to download.file (instead of the default method = "auto", which uses "wininet" on Windows):
tdir <- tempdir()
stateurl <- "https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_state_500k.zip"
if (!file.exists(file.path(tdir, "cb_2018_us_state_500k.shp"))) {
  download.file(stateurl, destfile = file.path(tdir, "States.zip"), method = "libcurl")
  unzip(file.path(tdir, "States.zip"), exdir = tdir)
}
With this "Knit to HTML" is working consistently (at least for my 10+ tests).

Related

Problem with saving kable table (install_phantomjs)

Let's consider a very simple table created by kable:
library(knitr)
library(kableExtra)
x <- data.frame(1:3, 2:4, 3:5)
x <- kable(x, format = "pipe", col.names = c("X_1", "X_2", "X_3"), caption = "My_table")
I want to save this table in .pdf format:
x %>% save_kable("My_table.pdf")
But I get an error:
PhantomJS not found. You can install it with webshot::install_phantomjs(). If it is installed, please make sure the phantomjs executable can be found via the PATH variable.
However, when trying to install it with the proposed command:
webshot::install_phantomjs()
I get an error:
Error in utils::download.file(url, method = method, ...) :
cannot open URL 'https://github.com/wch/webshot/releases/download/v0.3.1/phantomjs-2.1.1-windows.zip'
So my question is: is there any possibility to save a kable table without using phantomjs?
The command works for me and the URL is also available.
I suspect that the file (it's a .zip file) is being blocked by your firewall or anti-virus software.
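If the installer keeps failing even with the firewall ruled out, a hedged workaround (not from the original answer; the temporary paths and the layout of the release zip are assumptions) is to download the PhantomJS zip yourself, reusing the libcurl fix from above, and put the executable on the PATH:
# Hedged sketch: fetch PhantomJS manually and expose it on the PATH for this session.
url  <- "https://github.com/wch/webshot/releases/download/v0.3.1/phantomjs-2.1.1-windows.zip"
zipf <- file.path(tempdir(), "phantomjs.zip")
download.file(url, destfile = zipf, mode = "wb", method = "libcurl")
unzip(zipf, exdir = file.path(tempdir(), "phantomjs"))
# The bin/ path below assumes the usual folder layout inside the release zip.
Sys.setenv(PATH = paste(file.path(tempdir(), "phantomjs", "phantomjs-2.1.1-windows", "bin"),
                        Sys.getenv("PATH"), sep = .Platform$path.sep))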

How to avoid an Rmd file executing the entire R script every time I knit it to PDF?

Here is the question:
In file.R I ran an extensive analysis based on a huge dataset.
Every time I open the file I just need to load the libraries and everything is ready.
I don't need to download anymore any of the dataset inputs I need.
Now I have created an R Markdown file (file.Rmd) with the same code as file.R to present its findings.
I'm trying to get a preview of how the PDF will look.
The problem is that when I click "Knit to PDF", it starts to download all the packages and datasets again, and I have to wait hours to see the effect of small changes in the code.
And there is more:
Some objects created in the R file simply do not work in the Rmd file.
For example, in the R file I coded:
edx2 <- edx2 %>% mutate(timeRr = yearRating - release)
When I try to run the same code in the Rmd file I get the message:
Error in Func(x[[i]], ...) : object 'timeRr' not found
Calls: f -> scales_add_defaults -> lapply -> fun
The same libraries are loaded in both files (.R and .Rmd).
What am I doing wrong?
1) At the end of the data analysis (file.R), save the data you need for the Notebook in a .RDS file.
For example, if you generated 3 results: res1, res2 and res3:
results <- list(res1 = res1, res2 = res2, res3 = res3)
saveRDS(file = 'results.RDS', results)
2) Instead of sourcing the analysis script, just read the results into the Notebook (.Rmd):
data <- readRDS('results.RDS')
# Results available for further use in the Notebook
data$res1
data$res2
data$res3
The error you get with edx2 is probably due to the fact that a new session is opened during generation of a notebook : are you sure that file.R really generates edx2, or is it only available in your current session?
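Applied to the edx2 example from the question (a hedged sketch; edx2, yearRating and release come from the question's own code, and dplyr is assumed to be loaded in both files):
# In file.R, at the end of the analysis: save the object with the derived column.
edx2 <- edx2 %>% mutate(timeRr = yearRating - release)
saveRDS(edx2, file = 'edx2.RDS')

# In file.Rmd: load the saved object instead of re-running the whole analysis.
edx2 <- readRDS('edx2.RDS')   # timeRr is now available for plotting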

R: what doesn't curl_download like about a filename?

I want to download some files (NetCDF although I don't think that matters) from a website
and write them to a specified data directory on my hard drive. Some code that illustrates my problems follows
library(curl)
baseURL <- "http://gsweb1vh2.umd.edu/LUH2/LUH2_v2f/"
fileChoice <- "IMAGE_SSP1_RCP19/multiple-states_input4MIPs_landState_ScenarioMIP_UofMD-IMAGE-ssp119-2-1-f_gn_2015-2100.nc"
destDir <- paste0(getwd(), "/data-raw/")
url <- paste0(baseURL, fileChoice)
destfile <- paste0(destDir, "test.nc")
curl_download(url, destfile) # this one works
destfile <- paste0(destDir, fileChoice)
curl_download(url, destfile) # this one fails
The error message is
Error in curl_download(url, destfile) :
Failed to open file /Users/gcn/Documents/workspace/landuse/data-raw/IMAGE_SSP1_RCP19/multiple-states_input4MIPs_landState_ScenarioMIP_UofMD-IMAGE-ssp119-2-1-f_gn_2015-2100.nc.curltmp.
It turns out that curl_download internally appends .curltmp to destfile and then removes it. I can't figure out what curl_download doesn't like about the filename.
It turns out that the problem is that the fileChoice variable includes a new directory, IMAGE_SSP1_RCP19, which did not exist yet. Once I created the directory, the process worked fine. I'm posting this because someone else might make the same mistake I did.
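A hedged sketch of that fix, reusing the variables from the question: create the missing sub-directory before calling curl_download, so the temporary .curltmp file can be opened inside it.
destfile <- paste0(destDir, fileChoice)
# dir.create() does nothing (and stays quiet) if the directory already exists.
dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
curl_download(url, destfile)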

Difference between "Compile PDF" and knit2pdf

I have a .Rnw file that I am able to compile into a PDF using the "Compile PDF" button in RStudio (or Command+Shift+K). However, when I use knit2pdf, the graphics are not created and so the complete PDF is not produced. Why would this happen? How do you specify where the images will be stored so that pdflatex can find them?
Here is an example. I am aware that this question that I posted a few days ago has a similar example, but in my mind these are two different questions.
This file will run just fine and produce a PDF if I hit "Compile". I don't get any errors, the figure is produced in the /figure directory, and all is well.
%test.Rnw
\documentclass{article}
\usepackage[margin=.5in, landscape]{geometry}
\begin{document}
This is some test text!
<<setup, include=FALSE, results='hide', cache=FALSE>>=
opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE,
               cache = FALSE, error = FALSE)
library(ggplot2)
@
<<printplotscreen, results='asis'>>=
ggplot(diamonds) +
  geom_bar(aes(x = color, stat = "bin"))
@
\end{document}
However, when I run this script, which is intended to do exactly the same thing as hitting "Compile" (is it?), the figure is not created and I get the unsurprising error below about not being able to find it.
#test.R
library("knitr")
knit2pdf(input = "~/Desktop/thing/test.Rnw",
         output = paste0("~/Desktop/thing/test", ".tex"))
Error in texi2dvi(file = file, pdf = TRUE, clean = clean, quiet = quiet, :
Running 'texi2dvi' on 'test.tex' failed.
LaTeX errors:
! LaTeX Error: File `figure/printplotscreen-1' not found.
NOTE: If you are trying to reproduce this (and thanks!) then make sure you run the knit2pdf script FIRST to see that it doesn't create the figures. If you hit "Compile" first then the figures will be there for knit2pdf to use, but it will not accurately represent the situation.
The solution: Make sure to set the working directory to the project directory before using knit2pdf, then shorten the "input" path to just the .Rnw file. Thus...
#test.R
library("knitr")
library("ggplot2")  # needed for the diamonds dataset used on the next line
diamonds <- diamonds[diamonds$cut != "Very Good", ]
setwd("/Users/me/Desktop/thing")
knit2pdf(input = "test.Rnw", output = "test.tex")
Here are some references on this issue:
Changing working directory will impact the location of output (#38)
make sure the output dir is correct (#38)
It seems that knit2pdf() automatically writes its output files to the directory that contains the input file, and the author doesn't recommend changing the working directory in the middle of a project.
So the current solution for me is to save the current working directory (getwd()), change the working directory to where you want the output files, call knit2pdf(), and finally change the working directory back to the original one.
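A minimal sketch of that save-and-restore pattern (the output directory is just an example path):
library(knitr)
old_wd <- getwd()                                  # remember where we started
setwd("/Users/me/Desktop/thing")                   # directory where the output should land
knit2pdf(input = "test.Rnw", output = "test.tex")  # .tex/.pdf and figures are written here
setwd(old_wd)                                      # restore the original working directory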

R Markdown - How to prevent Knitr from repeatedly downloading a file?

When working on an R Markdown Rmd., can I prevent Knitr from downloading a file each time the Markdown is knitted?
My code chunk is:
download.file(url = paste0('https://d396qusza40orc.cloudfront.net/',
                           'repdata/data/StormData.csv.bz2'),
              destfile = './storm.csv.bz2',
              method = 'curl')
The system time of the chunk isn't that significant in and of itself:
user system elapsed
0.893 1.139 28.825
But perhaps there's a way to cache the download or something so I can review the HTML quicker.
You need to check if the file exists before attempting to download.
destfile <- './storm.csv.bz2'
if (!file.exists(destfile)) {
  # your download.file() call goes here
}
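For completeness, a hedged version of that guard with the download call from the question filled in:
destfile <- './storm.csv.bz2'
if (!file.exists(destfile)) {
  download.file(url = paste0('https://d396qusza40orc.cloudfront.net/',
                             'repdata/data/StormData.csv.bz2'),
                destfile = destfile,
                method = 'curl')
}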
Use httr with GET() and write_disk(): if destfile already exists, write_disk() will not let GET() perform the download (it acts like a mini-cache operation). GET() also uses RCurl under the covers.
library(httr)
try(GET(url, write_disk(destfile)))
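A self-contained sketch of that approach, reusing the URL and file name from the question (the caching behaviour relies on write_disk()'s default overwrite = FALSE):
library(httr)
url      <- paste0('https://d396qusza40orc.cloudfront.net/',
                   'repdata/data/StormData.csv.bz2')
destfile <- './storm.csv.bz2'
# If destfile already exists, GET() errors instead of re-downloading,
# so wrap the call in try() to keep knitting going.
try(GET(url, write_disk(destfile)))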
