Download png/jpg with R

I would like to download all of the images from this site, but after downloading, the photos are all corrupted. What should I do to download them successfully?
My code:
library(XML)
dir.create('c:/photos')
urls <- paste("http://thedevilsguard.tumblr.com/page/", 1:1870, sep = "")
doc <- htmlParse(urls[1])
links <- unique(unlist(xpathApply(doc, '//div[@class="timestamp"]/a', xmlGetAttr, 'href')))
for (i in seq_along(links)) {
doc2 <- htmlParse(links[i])
link <- xpathApply(doc2, '//div[@class="centre photopage"]//p//img', xmlGetAttr, 'src')[[1]][1]
download.file(link, paste0("C:/photos/", basename(link)))
}

So it looks like you are on Windows. When you download binary files, you have to set the mode to binary, e.g.
download.file(link, ..., mode = 'wb')
see ?download.file for details.
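For example, the loop from the question becomes (a sketch; the XPath and URLs are taken from the question and assumed to still work):
for (i in seq_along(links)) {
  doc2 <- htmlParse(links[i])
  link <- xpathApply(doc2, '//div[@class="centre photopage"]//p//img', xmlGetAttr, 'src')[[1]][1]
  # mode = 'wb' writes the bytes as-is instead of translating line endings,
  # which is what corrupts binary files on Windows
  download.file(link, paste0("C:/photos/", basename(link)), mode = 'wb')
}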

First, try and download one. Do this:
link = "http://29.media.tumblr.com/tumblr_m0q2g8mhGK1qk6uvyo1_500.png"
download.file(link,basename(link))
Does that work?
I notice it's a PNG and NOT a JPEG, so maybe you are trying to read it in as a JPEG.
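If you need to handle both, you can branch on the file extension before reading (a sketch using base R's tools::file_ext; it assumes the file was already downloaded with mode = 'wb' and that the png and jpeg packages are installed):
ext <- tolower(tools::file_ext(link))  # "png" or "jpg", taken from the url
if (ext == "png") {
  img <- png::readPNG(basename(link))
} else {
  img <- jpeg::readJPEG(basename(link))
}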

Related

Saving pptx as pdf in R

I have created PowerPoint files using the officer package and I would also like to save them as pdf from R (I don't want to manually open each file and save it as pdf). Is this possible?
You can convert the edited PowerPoint object using the code posted here: create pdf in addition to word docx using officer.
You will first need to install pdftools and LibreOffice.
library(pdftools)
office_shot <- function(file, wd = getwd()) {
  # ask headless LibreOffice to convert the file to pdf in the working directory
  cmd_ <- sprintf(
    "/Applications/LibreOffice.app/Contents/MacOS/soffice --headless --convert-to pdf --outdir %s %s",
    wd, file)
  system(cmd_)
  # the converted pdf keeps the source name with a .pdf extension
  pdf_file <- gsub("\\.(docx|pptx)$", ".pdf", basename(file))
  pdf_file
}
office_shot(file = "your_presentation.pptx")
Note that it was the author of the officer package who referred someone to this answer.
Note that the answer from Corey Pembleton has the LibreOffice macOS path (which I personally didn't initially notice). The Windows path would be something like "C:/Program Files/LibreOffice/program/soffice.exe".
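For example, the same helper adapted for Windows might look like this (a sketch; office_shot_win is a hypothetical name, and it assumes LibreOffice sits at that default path):
office_shot_win <- function(file, wd = getwd()) {
  # quote the exe path because "Program Files" contains a space
  cmd_ <- sprintf(
    '"C:/Program Files/LibreOffice/program/soffice.exe" --headless --convert-to pdf --outdir %s %s',
    wd, file)
  system(cmd_)
  gsub("\\.(docx|pptx)$", ".pdf", basename(file))
}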
Since Corey's initial answer, an example using docxtractr::convert_to_pdf can now be found here.
The package and function are the ones John M mentioned in a comment on Corey's initial answer.
An easy solution to this question is to use the convert_to_pdf function from the docxtractr package. Note: this solution requires downloading LibreOffice from here. I used the following steps.
First, set the path to LibreOffice's soffice.exe:
library(docxtractr)
set_libreoffice_path("C:/Program Files/LibreOffice/program/soffice.exe")
Second, I set the path of the PowerPoint document I want to convert to pdf.
pptx_path <- "G:/My Drive/Courses/Aysem/Certifications/September17_Part2.pptx"
Third, convert it using convert_to_pdf function.
pdf <- convert_to_pdf(pptx_path, pdf_file = tempfile(fileext = ".pdf"))
Be careful here. The converted pdf file is saved in a local temporary folder. Here is mine to give you an idea. Just go and copy it from the temporary folder.
"C:\\Users\\MEHMET~1\\AppData\\Local\\Temp\\RtmpqAaudc\\file3eec51d77d18.pdf"
EDIT: a quicker way to control where the converted pdf is saved: just replace the third step with the following line of code, which lets you set the output path yourself, so you don't need to dig through the weird local temp folder.
pdf <- convert_to_pdf(pptx_path, pdf_file = sub("[.]pptx", ".pdf", pptx_path))
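And since the goal was to avoid converting each file by hand, the same call can be wrapped in a loop (a sketch; the folder path is a placeholder):
library(docxtractr)
set_libreoffice_path("C:/Program Files/LibreOffice/program/soffice.exe")
# convert every pptx in a folder, saving each pdf next to its source file
pptx_files <- list.files("path/to/pptx/folder", pattern = "\\.pptx$", full.names = TRUE)
for (f in pptx_files) {
  convert_to_pdf(f, pdf_file = sub("[.]pptx$", ".pdf", f))
}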

Is there any way to download a csv file from a "website button click" using R

download.file(URL, destfile = "../data.csv", method = "curl") needs the exact url of the csv, but I need to download the CSV file from a website that only offers a "Download" button.
Link: http://apps.who.int/gho/data/view.main.MHSUICIDEASDRREGv?lang=en
You can hit F12 and see the code behind that page (and pretty much any page, except maybe Flash elements). Then do something like this:
getit <- read.csv("http://apps.who.int/gho/athena/data/GHO/MH_12?filter=COUNTRY:-;REGION:*&x-sideaxis=REGION;SEX&x-topaxis=GHO;YEAR&profile=crosstable&format=csv")
head(getit)
The same url also works with data.table's fread or readr's read_csv:
library(data.table)
getit <- fread("http://apps.who.int/gho/athena/data/GHO/MH_12?filter=COUNTRY:-;REGION:*&x-sideaxis=REGION;SEX&x-topaxis=GHO;YEAR&profile=crosstable&format=csv")
library(readr)
getit <- read_csv("http://apps.who.int/gho/athena/data/GHO/MH_12?filter=COUNTRY:-;REGION:*&x-sideaxis=REGION;SEX&x-topaxis=GHO;YEAR&profile=crosstable&format=csv")
You can find lots and lots of other ideas from the link below.
https://www.datacamp.com/community/tutorials/r-data-import-tutorial
Copy and paste this:
download.file(url = "http://apps.who.int/gho/athena/data/GHO/MH_12?filter=COUNTRY:-;REGION:*&x-sideaxis=REGION;SEX&x-topaxis=GHO;YEAR&profile=crosstable&format=csv", destfile = "H:/test.csv")
I have used "H:/test.csv" as the destination file path; you can save the file wherever you want.
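Once downloaded, the file reads back in like any other csv (a sketch using the path from above):
who_data <- read.csv("H:/test.csv")
head(who_data)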

Download URL links using R

I am new to R and would like to seek some advice.
I am trying to download multiple url links (pdf format, not html) and save them as pdf files using R.
The links I have are character strings (taken from the html code of the website).
I tried using the download.file() function, but it requires a specific url written into the R script, and therefore only downloads one file per call. However, I have many url links and would like some help doing this.
Thank you.
I believe what you are trying to do is download a list of URLs. You could try an approach like this:
Store all the links in a vector using c(), e.g.:
urls <- c("http://link1", "http://link2", "http://link3")
Iterate through the vector and download each file:
for (url in urls) {
  download.file(url, destfile = basename(url))
}
If you're using Linux/Mac and https you may need to specify method and extra attributes for download.file:
download.file(url, destfile = basename(url), method="curl", extra="-k")
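And if you are on Windows downloading binary formats such as pdf, the mode = 'wb' fix from elsewhere in this thread applies here too (a sketch):
download.file(url, destfile = basename(url), mode = "wb")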
If you want, you can test my proof of concept here: https://gist.github.com/erickthered/7664ec514b0e820a64c8
Hope it helps!
URLs:
url = c('https://cran.r-project.org/doc/manuals/r-release/R-data.pdf',
'https://cran.r-project.org/doc/manuals/r-release/R-exts.pdf',
'http://kenbenoit.net/pdfs/text_analysis_in_R.pdf')
Designated names:
names = c('manual1',
'manual2',
'manual3')
Iterate through the vectors and download each file with its corresponding name:
for (i in seq_along(url)) {
  download.file(url[i], destfile = names[i], mode = 'wb')
}
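One possible tweak: the names above have no file extension, so the pdfs may not open on double-click; appending ".pdf" fixes that (a sketch):
for (i in seq_along(url)) {
  download.file(url[i], destfile = paste0(names[i], ".pdf"), mode = 'wb')
}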

How to download and display an image from a URL in R?

My goal is to download an image from a URL and then display it in R.
I have a URL and figured out how to download the file, but the download can't be previewed because it is 'damaged, corrupted, or is too big'.
y = "http://upload.wikimedia.org/wikipedia/commons/5/5d/AaronEckhart10TIFF.jpg"
download.file(y, 'y.jpg')
I also tried
image('y.jpg')
in R, but I get an error message like:
Error in image.default("y.jpg") : argument must be matrix-like
Any suggestions?
If I try your code, the image does get downloaded; however, when opened with the Windows image viewer it also says it is corrupt.
The reason for this is that you haven't specified the mode in the download.file statement.
Try this:
download.file(y,'y.jpg', mode = 'wb')
For more info about the mode, see ?download.file.
This way, at least the file that you downloaded is intact.
To view the image in R, have a look at:
library(jpeg)
jj <- readJPEG("y.jpg", native = TRUE)
plot(0:1, 0:1, type = "n", ann = FALSE, axes = FALSE)
rasterImage(jj, 0, 0, 1, 1)
or how to read.jpeg in R 2.15
or Displaying images in R in version 3.1.0
This could work too:
library(RCurl)
library(jpeg)
library(png)
x <- "http://upload.wikimedia.org/wikipedia/commons/5/5d/AaronEckhart10TIFF.jpg"
image_name <- readJPEG(getURLContent(x)) # for jpg
image_name <- readPNG(getURLContent(x))  # for png
After downloading the image, you can use base R to open the file with your default viewer, e.g.:
browseURL(yourfilename)

Problems with Downloading pdf file using R

I would like to download a pdf file from the internet and save it on the local HD. After the download, the pdf output file has lots of empty pages. What can I do to fix it?
Example:
require(XML)
url <- ('http://cran.r-project.org/doc/manuals/R-intro.pdf')
download.file(url, 'introductionToR.pdf')
Thanks in advance.
Try binary mode ('wb') like this:
download.file(url, 'introductionToR.pdf', mode = "wb")
It works that way for me.
You can also download pdfs and extract their tables as data.frames using the tabulizer package:
https://ropensci.org/tutorials/tabulizer_tutorial.html
install.packages("ghit")
# on 64-bit Windows
ghit::install_github(c("ropenscilabs/tabulizerjars", "ropenscilabs/tabulizer"), INSTALL_opts = "--no-multiarch")
# elsewhere
ghit::install_github(c("ropenscilabs/tabulizerjars", "ropenscilabs/tabulizer"))
library(tabulizer)
f2 <- "https://github.com/leeper/tabulizer/raw/master/inst/examples/data.pdf"
extract_tables(f2, pages = 1, method = "data.frame")
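extract_tables() returns a list with one element per detected table, so you can inspect the result like this (a usage sketch):
tabs <- extract_tables(f2, pages = 1, method = "data.frame")
str(tabs[[1]])  # the first extracted table as a data.frame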
