Extracting (approximate) data from a PostScript file containing a plot generated by gnuplot

Suppose that I have a PostScript file containing a plot which was generated using gnuplot. However, I do not have the source data, nor do I have the gnuplot commands that were used to generate the plot.
Do you know of any way to somehow extract data from a graphic representation (i.e., a PostScript file)? Such code would literally have to "read off of the graph" (in particular, I have a smoothed line/scatter XY plot) from the pixel representation, and I know that the results would be approximate at best (but this would still be highly desirable).
Do you have any experience with this? Thank you for your time!

PostScript is nothing but a programming language for describing pages. A PostScript file is a plain text file containing a program that describes a page and gets interpreted by a printer or viewer.
As such, it is amenable to programmatic manipulation, albeit in a low-level way.
I would approach this task in the following way.
Learn PostScript. (The reference manual will come in handy.)
Study the output from gnuplot. How does gnuplot lay out the graph? Is it systematic? And so on.
Parse and extract the needed information.
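As a rough illustration of the last step: gnuplot's PostScript prologue usually abbreviates moveto/lineto as M and L, so a curve tends to show up as lines of the form "x y L" in device units. Below is a hedged R sketch that pulls those pairs out of the file; the file name and the exact pattern are assumptions, so check them against the prologue of your own .ps file. The extracted coordinates are device units and still have to be mapped to data units, e.g. via the known axis tick positions.
lines_ps <- readLines("plot.ps")
# keep only lines that look like "x y L" (lineto in gnuplot's usual prologue)
hits <- regmatches(lines_ps,
                   regexpr("^\\s*(-?[0-9]+)\\s+(-?[0-9]+)\\s+L\\b", lines_ps, perl = TRUE))
# split each hit into its two numbers (device coordinates, not data units)
xy <- do.call(rbind, lapply(strsplit(trimws(hits), "\\s+"),
                            function(p) as.numeric(p[1:2])))
colnames(xy) <- c("x_dev", "y_dev")
head(xy)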

g3data, available here, looks like a possibility. It runs on Linux.

Related

How to prepare publication-quality plots and use calligraphic fonts in Gnuplot?

I use Gnuplot for most of my plots and save them as PNGs. But the resolution of the plots is not good enough for research papers. So I need help with the following two things:
How to prepare publication-quality plots (eps) in Gnuplot?
How to use calligraphic fonts in the plot, like those written using \mathcal{} in LaTeX?
I searched the internet for these two things, but could not find anything useful.
Thanks in advance.
Since you are using LaTeX code in your question, I suppose that a solution involving LaTeX is suitable for you. I am using gnuplot for producing publication-quality plots (and even TOC figures!) too, and for me the most convenient method is to use the cairolatex standalone terminal, use LaTeX syntax (e.g. \mathcal{}) in the labels, plot titles and so on, and to compile the figures with pdfLaTeX. Often enough, journals accept figures not only in .eps but also in .pdf format. If a journal were to refuse .pdf, I would simply convert the figure at the end (i.e. right before submission) to .eps, .png, or whatever is required.
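A minimal sketch of that workflow, assuming gnuplot 5 with the cairolatex terminal available; file names, sizes and labels here are illustrative, not taken from the question:
# gnuplot script: produces figure.tex, which is compiled with pdflatex into figure.pdf
set terminal cairolatex standalone pdf size 9cm,6cm
set output 'figure.tex'
set xlabel '$x$'
set ylabel '$\mathcal{F}(x)$'    # calligraphic label, typeset by LaTeX
plot sin(x) title '$\sin(x)$' with lines
set output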

Rasterize plot when using PDF output device

Hello everybody out there using R,
When putting multiple plots with thousands of data points into a single PDF file, this file can get huge and take a long time to open.
The following post describes exactly the same problem in Matplotlib, as well as a nice fix for it:
Matplotlib: multipage PDF with rasterized plots
What is particularly nice about it is that it rasterizes only the points, not the labels.
http://www.astrobetter.com/blog/2014/01/17/slim-down-your-bloated-graphics/ contains a nice example of it.
I am now looking for a similar solution in R.
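One way to get a similar effect in base R (a hedged sketch, not an established package solution; object and file names are made up): draw the dense points into a temporary PNG, then embed that bitmap in the PDF with rasterImage() while the axes and labels stay vector text.
library(png)                                     # for readPNG()

x <- rnorm(1e5); y <- rnorm(1e5)                 # stand-in for the real data
xlim <- range(x); ylim <- range(y)

# 1. render only the points, without margins or decorations, to a PNG
tmp <- tempfile(fileext = ".png")
png(tmp, width = 1600, height = 1600, res = 300)
par(mar = c(0, 0, 0, 0))
plot(x, y, xlim = xlim, ylim = ylim, axes = FALSE, xlab = "", ylab = "",
     pch = 16, cex = 0.3, xaxs = "i", yaxs = "i")
dev.off()

# 2. set up the same coordinate system in the PDF, drop the bitmap in,
#    and add vector axes and labels on top
pdf("rasterized_plot.pdf")
plot(0, 0, type = "n", xlim = xlim, ylim = ylim, xlab = "x", ylab = "y",
     xaxs = "i", yaxs = "i")
rasterImage(readPNG(tmp), xlim[1], ylim[1], xlim[2], ylim[2])
dev.off()
(For ggplot2 there are packages that wrap this same idea, but the base-graphics version keeps the example self-contained.)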

The dilemma of plot saving formats - R/base plots

In my research work, when papers are to be communicated, the format could be either LaTeX or DOC/DOCX. This sends me into a dilemma.
I have generated PDFs (they can easily be included in a LaTeX file) for certain plots using the base plot method. However, I would also like to have PNG versions of the same plots (since MS Word does not accept PDFs), and no, I do not want to rewrite the code! Further, ImageMagick's convert utility is not a preferable option either, as the resolution degrades severely when one executes convert myFile.pdf myFile.png
What is the best way? Can we save a plot into a variable and then regenerate the plot to a png / jpg / tiff file?
Save to EPS format (see HowTo here). It is a vector format, and it should be recognizable by MS Word (you will need to import it as a picture) as well as LaTeX.
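On the "save a plot into a variable" part: base R can do this with recordPlot() and replayPlot(), so the plotting code runs only once and the recorded plot can be replayed onto several devices. A hedged sketch (file names are illustrative; replaying should happen within the same R session):
plot(cars)                          # draw once on a screen device
p <- recordPlot()                   # capture the displayed plot

setEPS()                            # EPS for LaTeX / Word import
postscript("myplot.eps"); replayPlot(p); dev.off()

png("myplot.png", width = 2000, height = 1500, res = 300)   # high-res raster for Word
replayPlot(p); dev.off()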

TikZ takes more than max LaTeX memory for complex R plot

I have a very complex plot, containing about 56,000 data points. It doesn't look right if I downsample it, so I really need to keep all of them. I would additionally like to add LaTeX captions to the figure. (The expression syntax, IMO, does not produce satisfactory rendering.)
After doing some digging around, TikZ seemed like the way to do it. But I found that it ran out of memory trying to plot the figure. I followed all of the advice I could find for TikZ memory management: this amounted to (1) using externalize and (2) increasing the main_memory for LaTeX to the maximum value (~12M). (I am using MacTeX 2014.) Neither of these solutions seemed to work.
At this point, having looked over SO and some other message boards, I am aware of only two options:
Switch to an alternate TeX engine, such as LuaTeX, which will allow me to use more memory, or
Use the native R plot, and then manually superimpose the desired labels onto the figures.
I consider (1) an acceptable solution, but the fact that I would need a different engine makes me wonder whether I am missing something. Is there a way to render complex native R plots that happen to have TeX-style labels in them?
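One hedged possibility for keeping TeX-typeset labels without pushing ~56,000 points through TikZ: combine the tikzDevice package with the raster-embedding idea from the "Rasterize plot when using PDF output device" question above, so only the axes and labels end up as TikZ/LaTeX objects while the point cloud becomes a bitmap. This assumes a tikzDevice version that supports raster images; names, sizes and data are illustrative:
library(tikzDevice)
library(png)

x <- runif(56000); y <- sin(20 * x) + rnorm(56000, sd = 0.1)   # stand-in data

# dense points -> temporary bitmap
tmp <- tempfile(fileext = ".png")
png(tmp, width = 1600, height = 1200, res = 300)
par(mar = c(0, 0, 0, 0))
plot(x, y, axes = FALSE, xlab = "", ylab = "", pch = ".", xaxs = "i", yaxs = "i")
dev.off()

# TikZ output: only axes and LaTeX labels remain TeX objects
tikz("complex_plot.tex", standAlone = TRUE, width = 5, height = 4)
plot(0, 0, type = "n", xlim = range(x), ylim = range(y),
     xlab = "$x$", ylab = "$\\mathcal{F}(x)$", xaxs = "i", yaxs = "i")
rasterImage(readPNG(tmp), min(x), min(y), max(x), max(y))
dev.off()
# then compile: pdflatex complex_plot.tex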

Reduce pdf file size of plot in R

I am plotting some data in R using the following commands:
jj = ts(read.table("overlap.txt"))
pdf(file = "plot.pdf")
plot(jj, ylab="", main="")
dev.off()
The result looks like this:
The problem I have is that the PDF file I get is quite big (25 MB). Is there a way to reduce the file size? JPEG is not an option because I need a vector graphic.
Take a look at tools::compactPDF - you need to have either qpdf or Ghostscript installed, but it can make a huge difference to PDF file size.
If reading a PDF file from disk, there are three options for Ghostscript quality (gs_quality), as indicated in the R help file:
printer (300dpi)
ebook (150dpi)
screen (72dpi)
The default is none. For example, to convert all PDFs in the folder mypdfs/ to ebook quality, use the command
tools::compactPDF('mypdfs/', gs_quality='ebook')
You're drawing a LOT of lines or points. Vector formats such as PDF, PS, EPS, and SVG keep logical information about every point, line, and other graphical element, so file size and drawing time grow with the number of elements. Vector images are generally best in many respects: most compact, best scaling, highest-quality reproduction. But if the number of graphical elements becomes very large, it is often better to switch to a raster format such as PNG. When you switch to raster, it is best to know what size image you want, both in pixels and in print measurements, in order to produce the best image.
For information from the other direction, too large a raster image, see this answer.
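If you do go the raster route for the plot above, here is a hedged example of sizing it for print; the 7x5 inch size and 300 dpi are illustrative choices, not requirements:
png("plot.png", width = 7, height = 5, units = "in", res = 300)
plot(jj, ylab = "", main = "")
dev.off()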
One way of reducing the file size is to reduce the number of values that you have. Assuming you have a dataframe called df:
# take a sample of rows from the dataframe
sampleNo <- 10000
sampleData <- df[sample(nrow(df), sampleNo), ]
I think the only other alternative within R is to produce a non-vector (raster) format. Outside of R, you could use Acrobat Professional (which is not free) to optimize the PDF; this can reduce the file size enormously.
Which version of R are you using? Since R 2.14.0, pdf() has had a compress argument to support compression. I'm not sure how much it can help you, but there are also other tools to compress PDF files, such as Pdftk and qpdf. I have two wrappers for them in the animation package, but you may want to use the command line directly.
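For completeness, a hedged one-liner showing the compress argument mentioned above (in current R versions it is the default, so spelling it out mostly matters on older installations):
pdf(file = "plot.pdf", compress = TRUE)
plot(jj, ylab = "", main = "")
dev.off()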
Hard to tell without seeing what the plot looks like - post a screenshot?
I suspect it's a lot of very detailed lines and most of the information probably isn't visible - lots of things overlapping or very, very small detail. Try thinning your data in one dimension or another; I doubt you'll lose any visible information.
