The dilemma of plot saving formats - R/base plots

In my research work, when papers are to be communicated, the format can be either LaTeX or DOC/DOCX. This sends me into a dilemma.
I have generated PDFs for certain plots using the base plot method (they can easily be included in a LaTeX file). However, I would also like to have PNG versions of the same plots (since MS Word does not accept PDFs), and no, I do not want to rewrite the code! Further, ImageMagick's convert utility is not a preferable option either, as there is severe resolution degradation when one executes convert myFile.pdf myFile.png
What is the best way? Can we save a plot into a variable and then regenerate it as a PNG/JPG/TIFF file?

Save to EPS format (see HowTo here). It is a vector format, and it should be recognizable by MS Word (you will need to import it as a picture) as well as LaTeX.
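To the second question: yes, within a session base R can save a displayed plot in a variable and replay it onto other devices, so the plotting code runs only once. A minimal sketch using grDevices' recordPlot()/replayPlot() (file names and sizes are placeholders):
plot(cars)                 # draw the plot once on the screen device
p <- recordPlot()          # save the displayed plot in a variable
pdf("myFile.pdf")          # vector version for LaTeX
replayPlot(p)
dev.off()
png("myFile.png", width = 2000, height = 1500, res = 300)   # raster for Word
replayPlot(p)
dev.off()
As an aside, the resolution loss from ImageMagick usually comes from rasterizing at its default 72 dpi; rendering at a higher density (e.g. convert -density 300 myFile.pdf myFile.png) avoids most of it.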

Related

How to prepare publication-quality plots and use calligraphic fonts in Gnuplot?

I use Gnuplot for most of my plots and save them as PNG, but the resolution is not good enough for research papers. So I need help with the following two things:
How to prepare publication-quality plots (EPS) in Gnuplot?
How to use calligraphic fonts in the plot, like those written using \mathcal{} in LaTeX?
I searched on the internet regarding these two things, but could not get any ideas.
Thanks in advance.
Since you are stating LaTeX code in your question, I suppose a solution involving LaTeX is suitable for you. I use gnuplot for producing publication-quality plots (and even TOC figures!) too, and for me the most convenient method is to use the cairolatex standalone terminal, use LaTeX syntax (e.g. \mathcal{}) in the labels, plot titles and so on, and compile the figures with pdfLaTeX. Often enough, journals accept figures not only in .eps but also in .pdf format. If a journal were to refuse .pdf, I would simply convert the figure at the end (i.e. right before submission) to .eps, .png or whatever.

PDF image is 1.3 MB, but after removing text in Illustrator it becomes ~35 MB (still vector PDF)

I have created a figure for my scientific work using ggplot (an R package for plotting data). It's a scatterplot that contains ~25,000 data points in a normal x-y-style plot. Each data point has a border and a color fill. The output vector PDF is 1.3 MB in size. Now I would like to make some final adjustments regarding font size and text position and merge it with other panels into a bigger figure, which I normally do in Illustrator. So I add/embed the scatterplot into the rest of my figure, which loads all elements correctly. However, when I then simply save this file as .ai or .pdf, the output is more than ~30 MB. How is it possible that all elements are preserved in the original (small) PDF, but after Illustrator the file is inflated so much? It is critical for me to keep the file size small.
I tried many things, including different PDF exporting options in Illustrator and macOS Preview's PDF file compression, but nothing worked. I even tried merging all those ~25,000 overlapping dots into one or at least a few shapes, but either Illustrator crashes in the process (Illustrator > Pathfinder unite/merge) or the resulting PDF shows erratic behaviour, i.e. it turns black/white in Word (Illustrator > Flatten Transparency). What am I missing here?
Any help is appreciated!
When saving, make sure you're not enabling Illustrator editing capabilities. Leaving Illustrator editing capabilities enabled will essentially cause a copy of the Illustrator file (as an AI version) to be written into the PDF that's being saved. This often causes the PDF to increase dramatically in size, especially for files with many vector or path elements.
I had the same issue. What worked for me was this:
Export as EPS instead of PDF from ggplot. You may need to use device=cairo_ps as an option (I did); a short sketch follows after these steps.
In Adobe Illustrator, create a new document and select the web option
Combine all your figures into this new figure by dragging and dropping them there. Use "Embed" to embed those figures into the new one.
Make all changes you need
Save as PDF with default options (I used the preset "Smallest File Size (PDF 1.6)").
This preserved the small file size for me. I think the only thing that matters here is the use of eps instead of pdf when exporting from ggplot.
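For reference, the export step might look like this (a rough sketch; the data frame and aesthetic names are placeholders, and cairo_ps comes from the grDevices package):
library(ggplot2)
p <- ggplot(df, aes(x, y)) + geom_point(shape = 21)   # points with border and fill
ggsave("scatter.eps", plot = p, device = cairo_ps, width = 7, height = 5)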

Is it possible to import a raster of a PDF file?

Our office does scanning of data entry forms, and we lack any proprietary software that is able to do automated double-entry (primary entry is done by hand, of course). We are hoping to provide a tool for researchers to highlight regions on forms and use scanned versions to determine what participant entry was.
To do this, all I need for a very rough attempt is a way to read PDFs in as raster data, with coordinates as X and Y components and B&W "intensities" as a Z axis.
We use R mainly for statistical analysis and data management, so options in R would be great.
You could use the raster package from R. However, it doesn't support .pdf files, but it does support .tif, .jpg and .png (among many others).
Converting your PDFs into PNGs shouldn't be a big problem: look here for more information.
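If you want to stay inside R for the conversion step too, one option (assuming the pdftools package is installed; its pdf_convert() rasterizes pages via Poppler) is:
library(pdftools)
pdf_convert("scanned_form.pdf", format = "png", dpi = 300)   # one PNG per page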
Once you have your png files ready, you can do the following:
library(raster)
img <- raster("your/png/file.png")
and then use the extract() function to get the brightness value from the picture. E.g., say your PNG is 200x200 px and you want to extract the pixel value at row 100 and column 150:
value <- extract(img, cellFromRowCol(img, 100, 150))   # cell index for row 100, column 150

Extracting (approximate) data from a PostScript file containing plot generated by gnuplot

Suppose that I have a PostScript file containing a plot which was generated using gnuplot. However, I do not have the source data, nor do I have the gnuplot commands that were used to generate the plot.
Do you know of any way to somehow extract data from a graphic representation (i.e., a PostScript file)? Such code would have to literally "read off of the graph" (in particular, I have a smoothed line/scatter XY plot) from the pixel representation, and I know that the results would be approximate at best (but this would still be very highly desirable).
Do you have any experience with this? Thank you for your time!
PostScript is nothing but a programming language for describing pages. A PostScript file is a plain-text file containing a program that describes a page, and it gets interpreted by a printer or viewer.
As such it is amenable to programmatic manipulation, albeit in a low-level way.
I would approach this task in the following way.
Learn PostScript. (The reference manual will come in handy.)
Study the output from gnuplot. How does gnuplot output the graph? Is it systematic? Et cetera.
Parse and extract the needed information.
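As a very rough illustration of steps 2 and 3 in R: gnuplot's PostScript prolog typically defines shorthand operators such as M (moveto) and L (lineto), so path coordinates can often be pulled out with a regular expression. This is only a sketch under that assumption; inspect your own file and adapt the pattern:
ps <- readLines("plot.ps")
hits <- regmatches(ps, regexec("^([0-9.-]+) ([0-9.-]+) L$", ps))
pts <- do.call(rbind, lapply(hits[lengths(hits) == 3],
                             function(m) as.numeric(m[2:3])))
# pts now holds device coordinates; calibrate against two known axis
# tick positions to map them back to approximate data values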
g3data, available here, looks like a possibility. It runs on Linux.

Reduce pdf file size of plot in R

I am plotting some data in R using the following commands:
jj = ts(read.table("overlap.txt"))
pdf(file = "plot.pdf")
plot(jj, ylab="", main="")
dev.off()
The result looks like this:
The problem I have is that the PDF file I get is quite big (25 MB). Is there a way to reduce the file size? JPEG is not an option because I need a vector graphic.
Take a look at tools::compactPDF - you need to have either qpdf or ghostscript installed, but it can make a huge difference to pdf file size.
If reading a PDF file from disk, there are 3 options for Ghostscript quality (gs_quality), as indicated in the R help file:
printer (300dpi)
ebook (150dpi)
screen (72dpi)
The default is none. For example, to convert all PDFs in the folder mypdfs/ to ebook quality, use the command
tools::compactPDF('mypdfs/', gs_quality='ebook')
You're drawing a LOT of lines or points. Vector image formats such as PDF, PS, EPS and SVG keep logical information about every point, line or other element, so file size and drawing time grow with the number of points. Generally vector images are the best in a number of ways: most compact, best scaling and highest-quality reproduction. But if the number of graphical elements becomes very large, it's often best to switch to a raster image format such as PNG. When you switch to raster, it's best to have a good idea what size image you want, both in pixels and in print measurements, in order to produce the best image.
For information from the other direction, too large a raster image, see this answer.
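Switching the original example to a raster device is a small change; a minimal sketch (the sizes are placeholders):
png("plot.png", width = 7, height = 5, units = "in", res = 300)
plot(jj, ylab = "", main = "")
dev.off()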
One way of reducing the file size is to reduce the number of points you plot. Assuming you have a data frame called df:
# take a random sample of rows from the data frame
sampleNo <- 10000
sampleData <- df[sample(nrow(df), sampleNo), ]
I think the only other alternative within R is to produce a raster image instead of a vector one. Outside of R you could use Acrobat Professional (which is not free) to optimize the PDF. This can reduce the file size enormously.
Which version of R are you using? In R 2.14.0, pdf() has an argument compress to support compression. I'm not sure how much it can help you, but there are also other tools to compress PDF files, such as Pdftk and qpdf. I have wrappers for both in the animation package, but you may want to use the command line directly.
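For completeness, enabling the argument looks like this (in current R versions compress = TRUE is already the default):
pdf(file = "plot.pdf", compress = TRUE)
plot(jj, ylab = "", main = "")
dev.off()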
Hard to tell without seeing what the plot looks like - post a screenshot?
I suspect it's a lot of very detailed lines, and most of the information probably isn't visible: lots of things overlapping or very, very small detail. Try thinning your data in one dimension or another. I doubt you'll lose visible information.
