R pdftools does not get all layers of pdf when converting to image

I have just started trying to use pdftools to extract images from pdfs. However I have found that not all layers are reproduced. For example in the code below the lines are reproduced in the png but not the points. Obviously in this example I could just save the png directly but I'm just using it to highlight the problem I am having for other data when I don't have the source code/data creating the pdf.
Warning: the code below creates files in the C:\temp directory.
library(tidyverse)
library(pdftools)
set.seed(5)
df <- data.frame(Date = rep(as.Date(1:50, origin = "1990-01-01"),2), value = c(1:50,1:50)+c(rnorm(50),rnorm(50,sd=5)), var = rep(c("a","b"),each = 50))
plt1 <- ggplot(df, aes(x = Date, y = value, colour = var)) +
  geom_line() +
  geom_point()
ggsave(plt1, filename = "C:/temp/testplot.pdf", width = 5, height = 4)
This creates a pdf with points and lines, as expected.
However, when I convert it I do not get the points, only the lines:
pdf_convert("C:/temp/testplot.pdf", format = "png", filenames = "C:/temp/testpng.png")
#> Converting page 1 to C:/temp/testpng.png...
#> PDF error: No display font for 'ArialUnicode'
#> done!
#> [1] "C:/temp/testpng.png"
Created on 2019-11-19 by the reprex package (v0.3.0)
I have also tried using pdftools::pdf_render_page and the image_read_pdf and image_convert from the magick package with the same results. However I understand that the magick functions are actually using pdftools, so the problem must be there

Suggested work-around:
Open pdf file in Adobe Acrobat.
Select "File" -> "Print" -> "Microsoft Print to PDF" -> "Advanced" -> check in front of "Print As Image" -> "OK" -> "Print"
Then, perform the "pdf_convert" on the new .pdf copy you just created.
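If you do control how the pdf is produced, one possible cause is worth ruling out. This is an assumption, not something confirmed in this thread: R's pdf() device can draw small circles as glyphs from the Dingbats font (its useDingbats argument), and a renderer that cannot find a substitute font may skip them, which would fit the font error in the output above. A minimal base-R sketch that writes the circles as vector shapes instead (the file names are placeholders):

```r
# Sketch only: assumes the missing points are Dingbats font glyphs.
out_pdf <- file.path(tempdir(), "testplot_nodingbats.pdf")

# useDingbats = FALSE makes pdf() draw circles as paths rather than font glyphs
pdf(out_pdf, width = 5, height = 4, useDingbats = FALSE)
plot(1:10, type = "b")  # lines and points, like the ggplot example
invisible(dev.off())

# Convert as before, but only if pdftools is installed
if (requireNamespace("pdftools", quietly = TRUE)) {
  pdftools::pdf_convert(out_pdf, format = "png",
                        filenames = file.path(tempdir(), "testpng.png"))
}
```

ggsave() passes extra arguments on to the graphics device, so ggsave(plt1, filename = ..., useDingbats = FALSE) should have the same effect when the pdf device is used. This does not help for pdfs you receive from elsewhere, where the print-as-image work-around above still applies.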

Related

webshot2::webshot is trimming the right side off of a huxtable in R

I am trying to convert an html table created in R via the huxtable package to a png file using webshot2::webshot. Unfortunately, the very right side of the output seems to get trimmed off and I can't figure out how to fix this generically.
I don't want to manually tweak the cliprect parameter because I'll need to do this for many tables, and it's not scalable if it's not generic. However, it is possible to achieve the right result with this parameter, so I wonder why it's failing with the other workflow.
Here's an example illustrating the problem:
library(magrittr)
library(huxtable)
set.seed(1337)
data <- matrix(rnorm(25), 35, 10)
my_hux <- as_hux(data) %>%
set_outer_borders(0.4) %>%
map_background_color(by_rows("grey95", "white")) %>%
map_text_color(by_quantiles(c(0.1, 0.9), c("red", "black", "green3")))
quick_html(my_hux, file = "ahuxtable.html", open = FALSE)
webshot2::webshot(url = "ahuxtable.html", file = "ahuxtable.png",
zoom = 5, selector = "table")
I tried this with webshot::webshot, however the webshot package seems to be webshot2's predecessor so I'd prefer a webshot2 solution if there is one.

Save ggplot object as image in the environment as object/value

I have a ggplot object. Let's call it plot. I would like to convert it to png format, but I don't want to save it to a file on my local drive. I'm trying to work with that png object but I want to keep everything in the environment. Everything I've found, including ggsave, appears to force one to save the image as a file on the local drive first. I know image files can be stored as values, but I can't seem to get over the "save as" image and "import" image steps.
Here's some code for reproducibility:
library(tidyverse)
df <- as.data.frame(Titanic)
gg <- ggplot(data = df, aes(x = Survived, y = Freq))
plot <- gg + geom_bar(stat = "identity")
Now, I'd like to convert plot to a png without having to save it to a file. Something like:
png <- save.png(plot)
Thanks for the help!
It looks like the goal here would be to convert plot (the ggplot object) directly to a Magick image that you can operate on with functions in the magick package. Something like this:
mplot = image_graph(width=400, height=500)
plot
dev.off()
image_graph opens a graphics device that produces a Magick image and assigns it to mplot so that you'll have the object available in your environment. Then, when you type mplot in the console, you'll see the following:
  format width height colorspace matte filesize density
1    PNG   400    500       sRGB  TRUE        0 +72x+72
However, when I try to display the mplot image (type mplot in the console), I see the following:
even though the original plot looks like this:
I'm not sure what's going wrong, but hopefully someone with greater familiarity with magick will drop by and provide a solution.
I was faced with a similar issue and followed @eipi12's approach of using magick. The code below should work:
library(ggplot2)
library(magrittr)
ggsave_to_variable <- function(p, width = 10, height = 10, dpi = 300){
  # width and height are in centimetres (2.54 cm per inch)
  pixel_width = (width * dpi) / 2.54
  pixel_height = (height * dpi) / 2.54
  img <- magick::image_graph(pixel_width, pixel_height, res = dpi)
  # close the device on exit and swallow the message dev.off() prints
  on.exit(utils::capture.output({
    grDevices::dev.off()}))
  plot(p)
  return(img)
}
p <- data.frame(x = 1:100, y = 1:100) %>%
ggplot(aes(x = x, y = y)) +
geom_line()
my_img <- ggsave_to_variable(p)
my_img %>%
magick::image_write("my_img.png")
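If the goal is the PNG data itself as an R value, rather than a magick image object, one possible follow-up is sketched below. It assumes that magick::image_write() returns the encoded image as a raw vector when path is NULL, which is how I read its documentation; the object names are made up for illustration.

```r
library(ggplot2)
library(magick)

p <- ggplot(data.frame(x = 1:100, y = 1:100), aes(x = x, y = y)) +
  geom_line()

# Draw into a magick graphics device instead of a file on disk
img <- image_graph(width = 400, height = 500, res = 72)
print(p)              # explicit print() so the plot is drawn on the device
invisible(dev.off())  # closing the device finalises the image

# Assumption: path = NULL returns the PNG bytes as a raw vector
png_bytes <- image_write(img, path = NULL, format = "png")
```

png_bytes can then be kept in the environment, base64-encoded, or passed to anything that accepts raw image data, without the plot ever being written to the local drive by your own code.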

How to store r ggplot graph as html code snippet

I am creating an html document by creating various objects with ggplotly() and htmltools functions like h3() and HTML(). Then I submit them as a list to htmltools::save_html() to create an html file.
I would like to add ggplot charts directly as images, rather than attaching all the plotly bells and whistles. In the end, I will create a self-contained html file (no dependencies), and the plotly stuff would make that file excessively large.
Is there some function that converts a ggplot object into some html-type object? Or do I have to save the ggplot as a .png file, then read the .png file into some object that I add to the list in the save_html() function?
My R code looks something like this:
library("tidyverse")
library("plotly")
library("htmltools")
HTMLOut <- "c:/Users/MrMagoo/My.html"
df <- data.frame(x=1:25, y=c(1:25*1:25))
g7 <- ggplot(df,aes(x=x, y=y)) + geom_point()
p7 <- ggplotly(g7) # I would like to use something other than ggplotly here. Just capturing the ggplot as an image would be fine.
# create other objects to add to the html file
t7 <- h2(id="graph7", "Title for graph #7")
d7 <- p("description of graph 7")
save_html(list(t7, p7, d7), HTMLOut)
# of course, the real code has many more objects in that list – more graphs, text, tables, etc.
I would like to replace the plotly object (p7) with something that just presents g7 in a way that would not cause an error in the save_html function.
I had hoped to find a function that could directly Base64 encode a ggplot object, but it seems that I first need to output the 'ggplot' object as a .png file (or SVG, per Teng L, below), then base64-encode it. I was hoping there was a more direct way, but I may end up doing that, as in https://stackoverflow.com/a/33410766/3799203 , ending it with
g7img <- "<img src=\"data:image/png;base64,(base64encode string)\">"
g7img <- htmltools::HTML(g7img)
If you want to save the plot as a dynamic plotly graph, you could use htmlwidgets::saveWidget. This will produce a stand-alone html file.
Here is a minimal example:
library(tidyverse);
library(plotly);
library(htmlwidgets);
df <- data.frame(x = 1:25, y = c(1:25 * 1:25))
gg <- ggplot(df,aes(x = x, y = y)) + geom_point()
# Save ggplotly as widget in file test.html
saveWidget(ggplotly(gg), file = "test.html");
I ended up generating a temporary image file, then base64 encoding it, within a function I called encodeGraphic() (borrowing code from LukeA's post):
library(ggplot2)
library(RCurl)
library(htmltools)
encodeGraphic <- function(g) {
png(tf1 <- tempfile(fileext = ".png")) # Get an unused filename in the session's temporary directory, and open that file for .png structured output.
print(g) # Output a graphic to the file
dev.off() # Close the file.
txt <- RCurl::base64Encode(readBin(tf1, "raw", file.info(tf1)[1, "size"]), "txt") # Convert the graphic image to a base 64 encoded string.
myImage <- htmltools::HTML(sprintf('<img src="data:image/png;base64,%s">', txt)) # Save the image as a markdown-friendly html object.
return(myImage)
}
HTMLOut <- "~/TEST.html" # Say where to save the html file.
g <- ggplot(mtcars, aes(x=gear,y=mpg,group=factor(am),color=factor(am))) + geom_line() # Create some ggplot graph object
hg <- encodeGraphic(g) # run the function that base64 encodes the graph
forHTML <- list(h1("My header"), p("Lead-in text about the graph"), hg)
save_html(forHTML, HTMLOut) # output it to the html file.
I think what you want may be close to one of the following:
It seems you are creating an HTML report but haven't checked out R Markdown. It comes with Base64 encoding. When you create an R Markdown report, pandoc automatically converts any plots into an HTML element within the document, so the report is self-contained.
SVG plots. This is less likely to be what you want, but SVG plots are markup-language based and may be easily portable. Specify the .svg extension when you use ggsave() and you should get an SVG image. Note that SVG is an as-is representation of the plot, so it can be huge in file size if you have thousands of shapes and lines.
This is an extension to the Maurits Evers post. In this answer I'm showing how to combine multiple plotly plots in the same html file in an organized fashion:
library("plotly")
library("htmltools")
# a small helper function to avoid repetition
my_func <- function(..., title){
## Description:
## Put one or more ggplot objects, converted with ggplotly(), under an HTML heading
##
## Arguments:
## ...: one or more ggplot objects
## title: a character string giving the heading text
# get the ... in list format
lst <- list(...)
# create the heading
tmp_title <- htmltools::h1(title)
# convert each ggplot to ggplotly and put them under the same div html tag
tmp_plot <- lapply(lst, ggplotly) |>
htmltools::div()
# return the final object as list
return(list(tmp_title, tmp_plot))
}
# a toy data
df <- data.frame(x = 1:25, y = c(1:25 * 1:25))
# the ggplot object using the toy data
gg <- ggplot(df,aes(x = x, y = y)) + geom_point()
# put everything in order
final_list <- list(my_func(gg, gg, gg, title = "The first heading"),
                   my_func(gg, gg, title = "The second heading"))
# write to disk as a unified HTML file
htmltools::save_html(html = final_list,
                     file = "index.html")
Disclaimer: I specifically did this to avoid using widgetframe R package and to be completely on par with the documentation of plotly-r. You can read the link if you are comfortable with adding extra dependency and extra abstraction layer. I prefer to use packages if and only if necessary. :)

Multiple timeseries zoo objects in one panel; blogdown serve_site() doesn't load plot

I have several zoo objects generated through a loop. I'd like to plot all objects in one panel. I suppose it can be done by first merging the zoo objects into a matrix-like zoo object and then supplying the plot.type = "multiple" and screens = ncol(merged-zoo-object) arguments to plot.zoo(), but I can't figure out how to merge.
library(zoo)
for (i in 1:3) {
value <- rnorm(n = 12, mean = i)
index <- seq(as.Date("2000/1/1"), by = "month", length.out = 12)
ts <- zoo(x = value, order.by = index)
plot.zoo(ts)
}
UPDATE
I've managed to create the plot (answered) and I want to create a blogpost with blogdown.
The problem you had with blogdown is that you were using an absolute local path /home/rsl/r-plots/sample.png. In general, it is a bad idea to use absolute paths, since they are not portable. In this specific case, when you publish your post to a web server, the meaning of /home/rsl/r-plots/sample.png will change. It indicates the file /home/rsl/r-plots/sample.png under the root directory of your website. For example, if your website is http://example.com, the file path means http://example.com/home/rsl/r-plots/sample.png, which is definitely not what you actually mean. The web server knows nothing about your local files on your computer, and certainly cannot find any files on your local disk, so the plot won't load on the web page.
In short, remove this:
ggsave(filename = "sample.png", path = "~/r-plots")
When you author a document using knitr, or any packages based on knitr, such as rmarkdown, bookdown, and blogdown, there is no need to manually save plots using ggsave() or R graphical devices. R plots will be automatically saved behind the scenes.
This sort of works, but code could've been cleaner.
require(zoo)
require(ggfortify)
merged.zoo <- zoo()
for (i in 1:3) {
value <- rnorm(n = 12, mean = i)
index <- seq(as.Date("2000/1/1"), by = "month", length.out = 12)
ts <- zoo(x = value, order.by = index)
merged.zoo <- merge.zoo(merged.zoo, ts)
}
autoplot.zoo(object = merged.zoo, geom = "line")
ggsave(filename = "sample.png", path = "~/r-plots")
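The loop above can be written a little more compactly by collecting the series in a list and merging them in one call. This is only a sketch of one way to do it: the names series and merged and the column labels are illustrative, and it uses base plot() on the zoo object rather than ggfortify.

```r
library(zoo)

set.seed(1)
index <- seq(as.Date("2000-01-01"), by = "month", length.out = 12)

# Build one zoo series per mean and collect them in a named list
series <- lapply(1:3, function(i) zoo(rnorm(n = 12, mean = i), order.by = index))
names(series) <- paste0("ts", 1:3)

# merge() accepts any number of zoo objects, so do.call() merges the whole list;
# the list names become the column names
merged <- do.call(merge, series)

# plot.type = "single" draws all columns in one panel
plot(merged, plot.type = "single", col = 1:3)
```

Building the list first avoids growing merged.zoo inside the loop and starting from an empty zoo() object.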
I now create a new post with blogdown::new_post(title = "title") and add the below text in *title.rmd file which is created by new_post command.
---
title: title
author: ~
date: '2017-10-05'
slug: title
categories: []
tags: []
---
![I want to see this plot](/home/rsl/r-plots/sample.png)
I expect to see the plot in the post named title when serve_site() is executed, followed by build_site() with default settings. But the plot doesn't load.

Why can't the pdf file created by gage (an R package) be opened

I am trying to use Gage package implemented in R to analyze my RNA-seq data. I followed the tutorial and got my data.kegg.p file and I used the following script to generate the heatmap for the top gene set
for (gs in rownames(data.kegg.p$greater)[1]) {
outname = gsub(" |:|/", "_", substr(gs, 10, 100))
geneData(genes = kegg.gs[[gs]], exprs = essData, ref = 1,
samp = 2, outname = outname, txt = T, heatmap = T,
Colv = F, Rowv = F, dendrogram = "none", limit = 3, scatterplot = T)
}
I did get a pdf file named "NOD-like_receptor_signaling_pathway.geneData.heatmap.pdf", but when I open this file with Acrobat Reader or Photoshop, I get an error saying that the file is damaged and cannot be recovered. Could anyone help check this file (https://www.dropbox.com/s/wrsml6n1pbrztnm/NOD-like_receptor_signaling_pathway.geneData.heatmap.pdf?dl=0) to see whether it is really damaged, and whether there is a way to recover it?
I also attached the R workspace file (https://www.dropbox.com/s/6n5m9x5hyk38ff1/A549.RData?dl=0). The object "a4" is the data in the format ready for gage analysis. It contains the data of the reference sample (nc) and the treated sample (a549). gage accepts it for analysis but generates the heatmap pdf file that cannot be opened (above). Would you mind helping me check whether these data can be properly used to generate the correct gage result?
Best regards.
I'm running into a similar problem myself. Not 100% sure but I think this problem occurs when there is no heatmap to plot. In my case, I was doing as.group comparison with ref and sample selections. I think the software treats this circumstance as a sample n of 1 and can't really show a differential heatmap. When I tried using 1ongroup setting, I was able to visualize the pdf file.
