Hello everybody out there using R,
When putting multiple plots with thousands of data points into a single PDF file, this file can get huge and take a long time to open.
The following post describes exactly the same problem in Matplotlib, as well as a nice fix for it:
Matplotlib: multipage PDF with rasterized plots
What is particularly nice is that it rasterizes only the points, not the labels.
http://www.astrobetter.com/blog/2014/01/17/slim-down-your-bloated-graphics/ contains a nice example of it.
I am now looking for a similar solution in R.
I use Gnuplot for most of my plots and save them as png files. But the resolution of these plots is not good enough for research papers. So, I need help with the following two things:
How to prepare publication-quality plots (eps) in Gnuplot?
How to use calligraphic fonts in the plot, like those written using \mathcal{} in latex?
I searched on the internet regarding these two things, but could not get any ideas.
Thanks in advance.
Since your question already uses LaTeX syntax, I assume a LaTeX-based solution suits you. I also use gnuplot to produce publication-quality plots (and even TOC figures!), and the most convenient method for me is to use the cairolatex terminal in standalone mode, write LaTeX syntax (e.g. \mathcal{}) in the labels, plot titles and so on, and compile the figures with pdfLaTeX. Journals often accept figures in .pdf as well as .eps. If a journal were to refuse .pdf, I would simply convert the figure right before submission to .eps, .png or whatever is required.
I have a huge scatter plot matrix to generate and save as a zoomable image. It takes several hours to draw, and then I get errors like:
"Server Error Unable to establish connection with R session".
Any ideas? The problem is obviously memory, but there must be a way to get around this.
I've managed to save the file as a 28.7 MB PDF, but it takes a long time to display and makes Inkscape crash. I know that people who generate fractals can produce images of effectively infinite resolution without consuming much memory, since the image is generated as you zoom into it. The problem is that fractals are self-similar and scatterplots are not, so I'm not sure whether there is a smart way around this issue.
A possible way to get around this "information overload" is to plot the variables in pairs using qplot() and then save each plot with ggsave(), for example as bmp or jpeg files.
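A minimal sketch of the pairwise approach, assuming the ggplot2 package is installed. It uses ggplot() rather than qplot(), because qplot() is deprecated in recent ggplot2 releases. The built-in mtcars data, the three chosen columns and the temporary output directory are placeholders for illustration only. PNG is used here as the raster format; changing the file extension to .jpeg or .bmp selects one of the formats mentioned above.

```r
library(ggplot2)

# Hypothetical subset of columns; replace with the variables of interest.
vars  <- c("mpg", "hp", "wt")
pairs <- combn(vars, 2, simplify = FALSE)

# Temporary output directory, used here only so the sketch is self-contained.
out_dir <- tempfile("pairs_")
dir.create(out_dir)

# One raster file per variable pair instead of one huge vector graphic.
files <- vapply(pairs, function(p) {
  plt <- ggplot(mtcars, aes(x = .data[[p[1]]], y = .data[[p[2]]])) +
    geom_point(alpha = 0.5)
  path <- file.path(out_dir, sprintf("%s_vs_%s.png", p[1], p[2]))
  ggsave(path, plot = plt, width = 5, height = 5, dpi = 150)
  path
}, character(1))
```

Because each file is a raster image, its size depends on the pixel dimensions rather than on the number of points plotted.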
One interesting feature of RStudio is that it keeps the multiple plots generated by a script. This, however, raises the problem of how to edit those plots. My issue at the moment is adding lines to histograms using the abline() function. This function was designed to work on the last plot generated in the session. One way, of course, would be to add the lines as soon as the plot is generated. However, I can only calculate the coordinates at the end of the algorithm, and by then I have transformed the data and generated multiple plots from it. So I was wondering whether there is a way to tell R to find a given plot and add the line to it. I read the abline() documentation but found nothing about this. One can always save the data needed to generate the plot and draw it at the end of the script, but I was wondering whether there is a less memory-consuming method.
One way to get around this issue is:
1. Save the histogram as a variable without drawing it, e.g. hist_1 <- hist(x, plot = FALSE)
2. Write whatever code you like, e.g. some very complicated code that returns a number y
3. plot(hist_1)
4. abline(v = y)
This gives a general idea of how to edit multiple plots without having to save multiple copies of datasets and without overloading the RStudio interface. Note that abline() takes only the line position, not the histogram object, and draws on whichever plot is currently active. It works well in the R terminal on Ubuntu too.
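The steps above can be sketched as a small runnable script. The random data, the use of median() as a stand-in for the "very complicated" computation, and the temporary PDF file are assumptions made only for illustration.

```r
set.seed(1)
x <- rnorm(1000)  # placeholder data

# 1. Compute the histogram without drawing it; only breaks and counts are kept.
hist_1 <- hist(x, plot = FALSE)

# 2. Stand-in for the later computation that yields the line position.
y <- median(x)

# 3. and 4. Draw the stored histogram, then add the vertical line to it.
# A temporary PDF device is opened so the sketch also runs without a display.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
plot(hist_1, main = "Histogram of x")
abline(v = y, col = "red", lwd = 2)
dev.off()
```

The stored object holds only the bin breaks and counts, so it is usually much smaller than the data it summarizes.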
I'd like to create a heat map in R that I want to use on a website. I stumbled upon the SVGAnnotation package, which seems very nice for processing SVG graphics in R to make them more interactive. First, I was planning to add tool tips for each cell in the heat map: if the user hovers over a cell, the value of that cell should pop up. However, I have been fighting with SVGAnnotation for more than 3 hours now, reading and trying things, and I can't get it to work.
I would appreciate any help on the SVGAnnotation tool tip function. But I would also very much appreciate alternatives to SVGAnnotation to add some activity to my R SVG heatmap.
So, what I have got so far looks like this:
library(SVGAnnotation)
data(mtcars)
cars <- as.matrix(mtcars)
map <- svgPlot(heatmap(cars))
addToolTips(map, ...) # problem
saveXML(map, "cars.svg")
My problem is the addToolTips function itself, I guess. Intuitively, I would simply pass in the data matrix, i.e., cars, but this does not work and R gets stuck (it keeps calculating but never returns anything; I waited 50 minutes).
EDIT:
After some more online research, I found a good example of what I want to achieve: http://online.wsj.com/article/SB125993225142676615.html#articleTabs=interactive
This heat map looks really great, and the interactive features (tool tips) work very well. I am wondering how they did that. To me, it looks like the graphic was done in R using the ggplot package.
If you are still interested in adding tool tips to your heat map, I wrote a command line tool that can do exactly that. It runs in Windows/Linux/macOS terminals. All you need as input is the heat map as an svg file and the data table/matrix that you used to create the heat map, as a csv or other text file.
Suppose that I have a PostScript file containing a plot which was generated using gnuplot. However, I do not have the source data, nor do I have the gnuplot commands that were used to generate the plot.
Do you know of any way to somehow extract data from a graphic representation (i.e., a PostScript file)? Such code would have to literally "read off of the graph" (in particular, I have a smoothed line/scatter XY plot) from the pixel representation, and I know that the results would be approximate at best (but this would still be very highly desirable).
Do you have any experience with this? Thank you for your time!
PostScript is nothing but a programming language for describing pages. A PostScript file is a plain text file containing a program that describes a page, which is interpreted by a printer or viewer.
As such it is amenable to programmatic manipulation, albeit in a low-level way.
I would approach this task in the following way.
Learn PostScript. (The reference manual will come in handy.)
Study the output from gnuplot. How does gnuplot output the graph? Is this systematic? Et cetera.
Parse and extract the needed information.
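As a rough sketch of the parsing step, the following assumes that the file uses the shorthand operators defined in gnuplot's PostScript prologue (M = moveto, L = lineto, R = rmoveto, V = rlineto) and that each operator is preceded by two numeric operands. The function name extract_ps_points and the sample lines are hypothetical. Check the prologue of the actual file before relying on this.

```r
# Collect the path vertices drawn with gnuplot's shorthand operators.
# Assumption: M/L take absolute coordinates, R/V take relative offsets.
extract_ps_points <- function(ps_lines) {
  tokens <- strsplit(paste(ps_lines, collapse = " "), "[[:space:]]+")[[1]]
  ops <- which(tokens %in% c("M", "L", "R", "V"))
  ops <- ops[ops > 2]  # each operator needs two operands before it

  px <- numeric(length(ops))
  py <- numeric(length(ops))
  pid <- integer(length(ops))
  n <- 0L
  x <- 0
  y <- 0
  path <- 0L

  for (i in ops) {
    a <- suppressWarnings(as.numeric(tokens[i - 2]))
    b <- suppressWarnings(as.numeric(tokens[i - 1]))
    if (is.na(a) || is.na(b)) next  # not a coordinate pair, skip it
    op <- tokens[i]
    if (op %in% c("M", "L")) {
      x <- a
      y <- b
    } else {
      x <- x + a
      y <- y + b
    }
    if (op %in% c("M", "R")) path <- path + 1L  # a move starts a new path
    n <- n + 1L
    px[n] <- x
    py[n] <- y
    pid[n] <- path
  }
  keep <- seq_len(n)
  data.frame(path = pid[keep], x = px[keep], y = py[keep])
}

# Hypothetical fragment in the style of gnuplot's output.
sample_ps <- c("LT0", "100 200 M", "10 5 V", "10 -5 V", "300 400 L")
pts <- extract_ps_points(sample_ps)
```

The result is in page coordinates, and it includes axes, tick marks and borders as well as the data curve. To recover data values, pick out the path that belongs to the curve and rescale it using the page positions of two tick marks with known values on each axis.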
g3data, available here, looks like a possibility. It runs on Linux.