How to save a PDF in R with a lot of points

So I have to save a PDF plot with a lot of points in it. That in itself is not a problem; the problem is that when I open the file, it takes forever to draw all those points. How can I save this PDF in such a way that it doesn't have to draw point by point when someone opens it? I'm OK if the quality of the picture goes down a bit.
Here's a sample. I don't think this would crash your computer, but be careful with the length parameter if you have an old machine. By the way, I am using many more points than this in my real problem.
pdf("lots of points.pdf")
x <- seq(0,100, length = 100000)
y <- 0.00001 * x
plot(x, y)
dev.off()

I had a similar problem and there is a sound solution. The drawback is that this solution is not generic and involves manual work rather than programming (which is always a drawback).
For draft purposes, PNG or any other bitmap format may be sufficient, but for presentation purposes this is often not the case. So the way to go is to combine vector graphics for the fonts, axes, etc. with a bitmap for your zillions of points:
1) Save as PDF (huge and nasty).
2) Load it into Illustrator or a similar editor (it must support layers).
3) Separate the points from everything else by dragging the other elements to a new layer; save this as file A.
4) Delete the other elements and export the points only as a bitmap (PNG, JPG); save this as file B.
5) Load B into A; scale and move B until it overlaps exactly; delete the vector points layer, and export as a slender PDF.
Done. Takes you 30 minutes.
As said, this has nothing to do with programming, but there is simply no way around the fact that in a vector graphic each and every point (even those that are not visible because they are covered by others) is a separate element, and it's a pain handling PDFs with thousands of elements.
So there is a need for post-processing. I know ImageMagick can do a lot, but AFAIK the above can't be done by an algorithm.
The only programmatic way to (partly) solve this is to eliminate the points that will not display because they are covered by others. But that's beyond me.
Only go this way if you really and desperately need extreme scalability; otherwise go with @Ben and @inform and use a bitmap, in whatever container you need it (png, pdf, bmp, jpg, tif, even eps).
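For completeness, the same vector/bitmap split can be approximated without leaving R: render only the points to a PNG with no margins, then place that bitmap inside an otherwise empty vector plot, so the axes and labels stay vector. This is only a sketch, assuming the png package is installed; the file names pts.png and hybrid.pdf are placeholders.
x <- seq(0, 100, length.out = 100000)
y <- 0.00001 * x

# 1) Render only the points to a bitmap, with no margins, axes or labels.
png("pts.png", width = 2000, height = 2000, res = 300)
par(mar = c(0, 0, 0, 0))
plot(x, y, axes = FALSE, xlab = "", ylab = "")
dev.off()

# 2) Draw an empty vector plot with the same data ranges and stretch the
#    bitmap over the plot region; only the points are rasterized.
library(png)                     # for readPNG()
img <- readPNG("pts.png")
pdf("hybrid.pdf")
plot(range(x), range(y), type = "n", xlab = "x", ylab = "y")
usr <- par("usr")
rasterImage(img, usr[1], usr[3], usr[2], usr[4])
dev.off()
Because both plots are given the same data ranges and default axis settings, the bitmap should line up with the vector axes; if it doesn't in your setup, adjust the plot region or axis settings until it does.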

Related

rectangles on Grace plots

I have been using Grace (xmgrace) plotting for many years. I recently had an important idea for my work, and it involves rectangles on my plots. Grace supports rectangles (called "boxes"), but when I use a filled "box" it blocks my data curves. I want the curves to show over the filled rectangles. This is driving me nuts. Does anyone know how to put the filled rectangles in the background so they don't block data curves? Thanks.
Unfortunately there is no option in the xmgrace graphical interface that allows you to modify the z-order of drawing objects such as boxes.
I also saved the graph as an .agr file and viewed it in a text editor. There doesn't seem to be any flag within the file format to modify z position, either.
Same story if you save a parameter file and check it in a text editor.
So it looks like it is really not possible in xmgrace.
One workaround would be to print to a postscript, EPS or SVG file and open it inside a vector graphics program such as Inkscape (results vary, you might need to experiment with filetypes to see which works best). Then you can easily alter the z order of objects.

Reduce PDF file size of plots by filtering hidden objects

While producing scatter plots of many points in R (using ggplot() for example), there may be many points that sit behind others and are not visible at all. For instance, see the plot below:
This is a scatter plot of several hundred thousand points, but most of them are hidden behind other points. The problem is that when casting the output to a vector file (a PDF file for example), the invisible points make the file size very big and increase memory and CPU usage while viewing the file.
A simple solution is to export the output as a bitmap picture (TIFF or PNG for example), but bitmaps lose the vector quality and can be even larger in size. I tried some online PDF compressors, but the result was the same size as my original file.
Is there any good solution? For example, some way to filter out the points that are not visible, either while generating the plot or afterwards by editing the PDF file?
As a start you can do something like this:
set.seed(42)
x <- runif(1e6)
DF <- data.frame(x = x, y = x + rnorm(1e6, sd = 0.1))
plot(y ~ x, data = DF, pch = ".", cex = 4)
PDF size: 6334 KB
DF2 <- data.frame(x=round(DF$x,3),y=round(DF$y,3))
DF2 <- DF[!duplicated(DF2),]
nrow(DF2)
#[1] 373429
plot(y~x,data=DF2,pch=".",cex=4)
PDF size: 2373 KB
With the rounding you can control how many values you want to remove. You only need to modify this to handle the different colours.
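To give an idea of how the colour handling could look (a rough sketch; the col column is invented here for illustration), include the colour in the de-duplication key so that a point is only dropped when it is hidden by a point of the same colour:
DF$col <- ifelse(DF$x > 0.5, "red", "blue")          # made-up colours for the example
key <- data.frame(x = round(DF$x, 3),
                  y = round(DF$y, 3),
                  col = DF$col)
DF2 <- DF[!duplicated(key), ]                        # one point per rounded (x, y) cell and colour
plot(y ~ x, data = DF2, pch = ".", cex = 4, col = DF2$col)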
Simply saving the plot as a high-resolution PNG file will drastically cut the size while keeping the quality more than good enough. At least I've never had journals complain about any of the PNGs I sent them; just make sure to use > 600 dpi.
I think it can be done with some post-processing of the PDF file. On Linux, if I have to reduce a PDF, I would do
pdf2ps input.pdf output.ps
ps2pdf output.ps output.pdf
which for some reason works quite efficiently.
You can see some discussion at https://askubuntu.com/questions/113544/how-to-reduce-pdf-filesize.

Plot two large Raster Data Sets in a Scatter Plot

I have a problem plotting two raster data sets in R.
I use two different IRS LISS III scenes (with the same extent), and what I want is to plot the pixel values of both scenes in one scatterplot (x = Layer 1 and y = Layer 2).
My problem is the handling of the large amount of data. Each scene has about 80,000,000 pixels; through reclassification and other processing I was able to scale this down to about 12,000,000 values in each raster. But when I try to import these values, e.g. into a data.frame, or load them from an ASCII file, I always run into memory problems.
Is it possible to plot such an amount of data? If so, it would be great if someone could help me; I have been trying for two days now and I'm getting desperate.
Many thanks,
Stefan
Use the raster package; there's a good chance it will work out of the box, since it has good "out-of-memory" handling. If it doesn't work with the ASCII grids, convert them to something more efficient (like an LZW-compressed and tiled GeoTIFF) with GDAL. And if they are still too big, resize them; that's all the graphics rendering process will do anyway. (You don't say how you resized them originally, or give any details on how you are trying to read them.)
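A minimal sketch of that approach with the raster package (the file names are placeholders; sampleRegular() pulls a regular grid of cells from each scene instead of reading all pixels into memory):
library(raster)

r1 <- raster("scene1.tif")    # first LISS III scene (placeholder file name)
r2 <- raster("scene2.tif")    # second scene, same extent and resolution

# Sample the same regular grid of cells from both rasters; with identical
# extent and resolution the two samples should line up cell by cell.
n  <- 100000
v1 <- sampleRegular(r1, size = n)
v2 <- sampleRegular(r2, size = n)

plot(v1, v2, pch = ".", xlab = "Layer 1", ylab = "Layer 2")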

Reduce pdf file size of plot in R

I am plotting some data in R using the following commands:
jj = ts(read.table("overlap.txt"))
pdf(file = "plot.pdf")
plot(jj, ylab="", main="")
dev.off()
The result looks like this:
The problem I have is that the PDF file that I get is quite big (25 MB). Is there a way to reduce the file size? JPEG is not an option because I need a vector graphic.
Take a look at tools::compactPDF - you need to have either qpdf or ghostscript installed, but it can make a huge difference to pdf file size.
If reading a PDF file from disk, there are 3 options for Ghostscript quality (gs_quality), as indicated in the R help file:
printer (300dpi)
ebook (150dpi)
screen (72dpi)
The default is none. For example to convert all PDFs in folder mypdfs/ to ebook quality, use the command
tools::compactPDF('mypdfs/', gs_quality='ebook')
You're drawing a LOT of lines or points. Vector image formats such as PDF, PS, EPS, SVG, etc. maintain logical information about all of those points, lines, or other items, and that complexity translates into file size and drawing time as the number of points increases. Generally vector images are the best in a number of ways: most compact, best scaling, and highest-quality reproduction. But if the number of graphical elements becomes very large, it's often best to go to a raster image format such as PNG. When you switch to raster, it's best to have a good idea what size image you want, both in pixels and in print measurements, in order to produce the best image.
For information from the other direction, too large a raster image, see this answer.
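To make the raster route concrete, this is roughly how the plot from the question could be written as a PNG with an explicit print size and resolution (a sketch reusing the jj object defined above):
# 7 x 5 inch PNG at 300 dpi instead of a PDF
png("plot.png", width = 7, height = 5, units = "in", res = 300)
plot(jj, ylab = "", main = "")
dev.off()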
One way of reducing the file size is to reduce the number of values that you have. Assuming you have a dataframe called df:
# take sample of data from dataframe
sampleNo = 10000
sampleData <- df[sample(nrow(df), sampleNo), ]
I think the only other alternative within R is to produce a non-vector. Outside of R you could use Acrobat Professional (which is not free) to optimize the pdf. This can reduce the file size enormously.
Which version of R are you using? In R 2.14.0, pdf() has an argument compress to support compression. I'm not sure how much it can help you, but there are also other tools to compress PDF files such as Pdftk and qpdf. I have two wrappers for them in the animation package, but you may want to use command line directly.
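A minimal sketch of passing that argument explicitly (as far as I know, compress = TRUE is already the default in current versions of R):
# re-create the plot with Flate-compressed PDF streams
pdf(file = "plot.pdf", compress = TRUE)
plot(jj, ylab = "", main = "")
dev.off()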
Hard to tell without seeing what the plot looks like - post a screenshot?
I suspect it's a lot of very detailed lines, and most of the information probably isn't visible: lots of things overlapping or very, very small detail. Try thinning your data in one dimension or another; I doubt you'll lose visible information.

Changing resolution of bitmaps

I am making some graphs with R and I am copying them to Word. I was copying them as metafiles, but Word doesn't seem to be able to cope with them. The other option in R for copying graphs is a bitmap, but when I use this the quality of the graphs in Word is terrible.
I saw some answers on this website about changing the resolution, but only when saving the graphs, which I would like to avoid. Is there a way of changing the resolution for copied graphs?
Thanks,
sbg
When the graphs are onscreen, they are drawn at screen resolution (i.e. 72 dpi). For print, you need to use at least 300 dpi, or switch to a vector format. Word can import graphs in Windows Metafile (.wmf) format, but your other option is to save the plot using, e.g.,
png("my plot.png", res = 300)
plot(1:5)
dev.off()
This saves to disk, which you said you wanted to avoid, but you can always delete the file again later (even programmatically, with file.remove).
I'd also like to make the case that when you copy and paste, your work isn't as easily reproducible as when you use code. There is no trace of what you have done, and when your data changes, you need to go through the rigmarole of clicking again, rather than just executing your updated script.
