I am using the very useful magick library to read and annotate PDF files and to overlay an image on the result. I can generate a PDF file that looks as I would expect. However, when I open the file, the header, which I would expect to read something like %PDF-1.7, reads ‰PNG instead.
It looks to me as if magick is looking at the most recent operation, which is image_composite for a PNG file, and using this for the header. If so, is this a bug? The PDF file that is output appears otherwise well-formed, so it doesn't seem to be causing problems, but I am curious. The following code should enable the issue to be reproduced.
require(magick)
require(pdftools)
pdf_file <- "https://web.archive.org/web/20140624182842/http://www.gnupdf.org/images/d/db/Hello.pdf"
image_file <- "https://upload.wikimedia.org/wikipedia/commons/thumb/8/87/PDF_file_icon.svg/200px-PDF_file_icon.svg.png"
my_image <- image_read(image_file,density = 300)
pdfimage <- image_read_pdf(pdf_file,density = 300)
pdfimage2 <- image_annotate(pdfimage, "test",
location = "+400+700", style = "normal", weight = 400,
size=42)
pdfimage3 <- image_composite(pdfimage2,my_image,operator="atop",
offset = "+100+100")
image_write(pdfimage3, path = "C:/temp/test.pdf", density = 300, flatten = TRUE)
I have held off from answering this because the solution is embarrassingly obvious. In retrospect, I just assumed that, because I used image_read_pdf it should and would save in PDF format. What I needed to do was specify it explicitly. Adding a format = "pdf" argument to the image_write call achieved that.
image_write(pdfimage3, path = "C:/temp/test.pdf", density = 300, format = "pdf", flatten = TRUE)
This results in a well-formed PDF. Problem solved. Lesson learned.
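To confirm which format was actually written, the first bytes of the file can be inspected from R. The sketch below is one way to do that with base R only. `file_signature` is a hypothetical helper, not part of magick, and the two temporary files are stand-ins rather than real images or documents. The PNG signature starts with byte 0x89, which a Windows-1252 viewer shows as ‰, followed by the letters PNG.

```r
# Hypothetical helper: report the format implied by a file's first bytes.
file_signature <- function(path) {
  bytes <- readBin(path, what = "raw", n = 8)
  png_magic <- as.raw(c(0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a))
  pdf_magic <- charToRaw("%PDF-")
  if (length(bytes) >= 8 && identical(bytes[1:8], png_magic)) {
    "png"
  } else if (length(bytes) >= 5 && identical(bytes[1:5], pdf_magic)) {
    "pdf"
  } else {
    "unknown"
  }
}

# Stand-in files: both use a .pdf extension, but only one has a PDF header.
fake_png <- tempfile(fileext = ".pdf")
writeBin(as.raw(c(0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a)), fake_png)
fake_pdf <- tempfile(fileext = ".pdf")
writeBin(charToRaw("%PDF-1.7\n"), fake_pdf)

sig_png <- file_signature(fake_png)
sig_pdf <- file_signature(fake_pdf)
```

Calling `file_signature("C:/temp/test.pdf")` before and after adding `format = "pdf"` should show the difference, assuming the file exists at that path.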
I am doing some cross tabulation with the sjPlot package, which produces wonderful tables in HTML.
library(sjPlot)
iris<-iris
tab_xtab(iris$Species,iris$Sepal.Width,show.row.prc = TRUE,show.col.prc = TRUE)
While the output looks great in the console, I would like to export the tables and put them in a report. Ideally, I would like to export them to a LaTeX file, but a .png or a .doc file would also be OK.
Does anyone know how I could do this?
thanks a lot for your help
Best
One option is to use webshot::webshot():
library(sjPlot)
library(webshot)
# location to write html version to
my_html <- tempfile(fileext = ".html")
# write html to temp file
tab_xtab(iris$Species,
iris$Sepal.Width,
show.row.prc = TRUE,
show.col.prc = TRUE,
file = my_html)
# location to write png version to
my_png <- tempfile(fileext = ".png")
# take a webshot of html and save to png
webshot::webshot(my_html, my_png, vheight = 300)
Note, you will likely have to install PhantomJS after installing the webshot package. This can be done with webshot::install_phantomjs().
You may also have to play around with the vwidth and vheight arguments to avoid extra white-space in your png.
(optional read) Greater Objective: PowerBI Web doesn't support a few R packages when published on the internet. It throws the below error ("Missing R Package"). Hence, I am working towards saving the output from R as an image (.jpeg) to a remote location (such as FTP) or cloud storage (secure and open source) and then import it to PowerBI. This workaround might resolve the package conflict (hoping).
Specific Objective: The code below illustrates a trivial way of saving an R output image (.jpeg) locally. However, is there a way to save the image directly to the FTP server, provided I have the username, password, etc.? (Unfortunately, I cannot share the server details.)
library(outbreaks)
library(incidence)
cases = subset(nipah_malaysia, select = c("perak", "negeri_sembilan", "selangor",
"singapore"))
i = as.incidence(cases, dates = nipah_malaysia$date, interval = 7L)
jpeg(file = "plot.jpeg")
plot(i)
dev.off()
I did come across this post on using the ftpUpload function from the RCurl package. However, to upload the image to FTP, I might still need to save it locally, which defeats my purpose in this use case.
Any suggestions would be helpful.
If saving a temporary file (as you suggested in a comment) is an option, then you can do that with the following code:
library(outbreaks)
library(incidence)
library(RCurl)
cases = subset(nipah_malaysia, select = c("perak", "negeri_sembilan", "selangor",
"singapore"))
i = as.incidence(cases, dates = nipah_malaysia$date, interval = 7L)
jpeg(file = filename <- tempfile())
plot(i)
dev.off()
ftpUpload(filename, "ftp://User:Password@FTPServer/destfile.jpeg")
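The FTP URL in the upload call embeds the user name and password, so characters such as @ or : inside the credentials would break it unless they are percent-encoded. A minimal sketch of building such a URL with base R follows. `build_ftp_url` and all the values passed to it are hypothetical placeholders, not taken from the question.

```r
# Hypothetical helper: assemble an FTP URL with percent-encoded credentials.
build_ftp_url <- function(user, password, host, path) {
  paste0(
    "ftp://",
    utils::URLencode(user, reserved = TRUE), ":",
    utils::URLencode(password, reserved = TRUE), "@",
    host, "/", path
  )
}

# Placeholder values for illustration only.
ftp_url <- build_ftp_url(
  user = "user",
  password = "p@ss:word",
  host = "ftp.example.com",
  path = "plots/destfile.jpeg"
)
```

The resulting string could then be passed as the second argument of `ftpUpload()`.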
If you're ok with having the output in PNG format (EDIT: I updated the code to show output to JPEG format) try the code below, with chunks borrowed from this answer that discusses how to save an image in memory:
EDIT: Updated to output to jpeg format
library(outbreaks)
library(incidence)
cases = subset(nipah_malaysia, select = c("perak", "negeri_sembilan", "selangor",
"singapore"))
orig_i = as.incidence(cases, dates = nipah_malaysia$date, interval = 7L)
plot(orig_i)
#### This section adapted from
#### https://stackoverflow.com/questions/7171523/in-r-how-to-plot-into-a-memory-buffer-instead-of-a-fileinstead-of-a-file
#### loads image data to memory rather than a file
library(Cairo)
library(png)
library(ggplot2)
Cairo(file='/dev/null')
plot(orig_i) #your plot
# hidden stuff in Cairo
i = Cairo:::.image(dev.cur())
r = Cairo:::.ptr.to.raw(i$ref, 0, i$width * i$height * 4)
dev.off()
dim(r) = c(4, i$width, i$height) # RGBA planes
# have to swap the red & blue components for some reason
r[c(1,3),,] = r[c(3,1),,]
# now use the jpeg library to write the raw vector
library(jpeg)
p = writeJPEG(r, raw()) # raw JPEG bytes
#DEBUGGING - check that this actually works
#Note: Windows 10 has an error that might report this as a file system error
#In windows, drag and drop the file into an open chrome window to see the image
writeBin(p, con= "yourpathhere/check_output.jpg")
#adapted code from @tfehring's example for the upload
library(RCurl)
ftpUpload(p, "ftp://User:Password@FTPServer/destfile.jpg")
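If relying on Cairo's unexported internals feels fragile, a middle ground is to render to a temporary file, read it back as a raw vector, and delete the file straight away. The sketch below assumes that a short-lived temporary file is acceptable. `plot_to_raw` is a hypothetical helper, and `plot(1:10)` stands in for the incidence plot.

```r
# Hypothetical helper: draw with `plot_fun` on a file-based device, then
# return the file's bytes as a raw vector and delete the temporary file.
plot_to_raw <- function(plot_fun, device = grDevices::jpeg, ext = ".jpeg") {
  tmp <- tempfile(fileext = ext)
  on.exit(unlink(tmp), add = TRUE)
  device(tmp)
  # Close the device even if the plotting code fails.
  tryCatch(plot_fun(), finally = grDevices::dev.off())
  readBin(tmp, what = "raw", n = file.size(tmp))
}

# Use the JPEG device when this R build supports it; otherwise fall back to
# the PDF device so that the sketch still runs.
if (capabilities("jpeg")) {
  img_bytes <- plot_to_raw(function() plot(1:10))
} else {
  img_bytes <- plot_to_raw(function() plot(1:10),
                           device = grDevices::pdf, ext = ".pdf")
}
```

The raw vector `img_bytes` could then be handed to `ftpUpload()` in the same way as `p` above.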
I'm using RStudio and knitr to create reproducible PDF reports at work. However, figures are not pulled into the document; instead, the text "figure/unnamed-chunk-" appears where the image should be.
Images are produced and saved to 'home/figure/'.
The code I use to create the PDF is:
library(knitr)
library(rmarkdown)
Rfile = "/Users/user/Documents/folder/file.R"
setwd(dirname(Rfile))
spin(Rfile, format = 'Rmd', report = FALSE)
render(paste(substring(Rfile, 0, nchar(Rfile) - 1), "md", sep = ""),
pdf_document(toc = TRUE, toc_depth = 6, number_sections = TRUE),
output_file = paste(substring(Rfile, 0, nchar(Rfile) - 2), ".pdf", sep = ""))
In the md file, there is a line for each figure that is
figure/unnamed-chunk-X-X.pdf
I've tried adding the lines below after reading the answers at https://groups.google.com/forum/#!topic/knitr/_sw4sAtLkoQ - but they don't make a difference.
opts_knit$set(base.dir = dirname(Rfile))
opts_knit$set(fig.path = '/figure/')
I'm sure there is a simple fix to this but I can't see what it might be.
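As a side note, the substring arithmetic used to derive the .md and .pdf names in the render call can be written more readably with `tools::file_path_sans_ext()`. This is only a sketch of the path handling, using the placeholder path from the question. It is not claimed to fix the missing figures.

```r
# Placeholder path taken from the question.
Rfile <- "/Users/user/Documents/folder/file.R"

# Drop the extension once, then append whichever extension is needed.
stem <- tools::file_path_sans_ext(Rfile)
md_file <- paste0(stem, ".md")
pdf_file <- paste0(stem, ".pdf")
```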
Hello, I am trying to read a PDF but have issues with the format.
The document and code are below.
The issue is that the output doesn't respect the lines of the original PDF: the last item from line 4 appears on line 5. Is that something I can correct?
I am asking because I need to read thousands of files like this, and most of them have this issue.
When I use a PDF-to-Excel converter on the web, I don't have this issue.
Thanks
library(tm) # readPDF() comes from the tm package
URL="http://www.arb.ca.gov/cc/capandtrade/offsets/issuance/cals5047-a-b.pdf"
destfile="filetoconvert.pdf"
download.file(URL,destfile)
doc=readPDF(control = list(text = "-layout"))(elem = list(uri = destfile),
language = "en",
id = "id1")
issuance2=NULL
issuance2delim=NULL
doc = c(as.character(doc))
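Once the text has been extracted with the layout preserved, each line can be split into fields wherever two or more spaces occur. The sketch below shows that step with base R. The two sample lines are invented for illustration and are not taken from the PDF in question; real output may need extra handling for items that wrap onto the next line.

```r
# Invented sample lines that mimic layout-preserved PDF text.
layout_lines <- c(
  "Project ID    Vintage    Offsets Issued",
  "CALS5047      2013       12,345"
)

# Split on runs of two or more spaces so that single spaces inside a
# field (such as "Project ID") are kept together.
fields <- strsplit(trimws(layout_lines), " {2,}")

# Bind the fields into a character matrix, one row per line.
field_matrix <- do.call(rbind, fields)
```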
I'm trying to practice making word clouds in R and I've seen the process nicely explained in sites like this (http://www.r-bloggers.com/building-wordclouds-in-r/) and in some videos on YouTube. So I thought I'd pick some random long document to practice myself.
I chose the script for Good Will Hunting. It is available here (https://finearts.uvic.ca/writing/websites/writ218/screenplays/award_winning/good_will_hunting.html). What I did is copy that into Notepad++ and start removing blank lines, names, etc. to try to clean up the data before saving. Saving as a .csv file doesn't seem to be an option so I saved it as a .txt file and R doesn't seem to want to read it in.
Both of the following lines return errors in R.
goodwillhunting <- read.csv("C:/Users/MyName/Desktop/goodwillhunting.txt", sep="", stringsAsFactors=FALSE)
goodwillhunting <- read.table("C:/Users/MyName/Desktop/goodwillhunting.txt", sep="", stringsAsFactors=FALSE)
My question is: starting from an HTML document, what is the best way to save it so that it can be read in and used for something like this? I know that the rvest package can read in webpages. The tutorials for word clouds have used .csv files, so I'm not sure whether that's what my end goal needs to be.
This might be a way to read in the data going that route?
test = read_html("https://finearts.uvic.ca/writing/websites/writ218/screenplays/award_winning/good_will_hunting.html")
text = html_text(test)
Any help is appreciated!
Here's one way:
library(rvest)
library(wordcloud)
test <- read_html("https://finearts.uvic.ca/writing/websites/writ218/screenplays/award_winning/good_will_hunting.html")
text <- html_text(test)
content <- stringi::stri_extract_all_words(text, simplify = TRUE)
wordcloud(content, min.freq = 10, colors = RColorBrewer::brewer.pal(5,"Spectral"))
Which gives:
Here is a simple example:
library(wordcloud)
text = scan("fulltext.txt", character(0), strip.white = TRUE)
frequency_table = as.data.frame(table(text))
wordcloud(frequency_table$text, frequency_table$Freq)
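Before building the frequency table, it often helps to lower-case the text, strip punctuation, and drop very common words so that they don't dominate the cloud. A minimal base R sketch of that cleaning step follows. The sample sentence and the tiny stop-word list are made up for illustration.

```r
# Made-up sample text standing in for the contents of the script file.
sample_text <- "The fox saw the dog. The dog saw the fox; the fox ran."

# Lower-case, replace anything that is not a letter, apostrophe or space,
# then split on runs of spaces.
words <- tolower(sample_text)
words <- gsub("[^a-z' ]", " ", words)
words <- strsplit(words, " +")[[1]]
words <- words[nzchar(words)]

# Tiny illustrative stop-word list; a real one would be much longer.
stop_words <- c("the", "a", "an", "and", "of")
words <- words[!words %in% stop_words]

# Count the remaining words, most frequent first.
freq <- sort(table(words), decreasing = TRUE)
frequency_table <- data.frame(
  word = names(freq),
  n = as.integer(freq),
  stringsAsFactors = FALSE
)
```

The `word` and `n` columns could then be passed to `wordcloud()` in place of `frequency_table$text` and `frequency_table$Freq`.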