Is it possible to import a raster of a PDF file? - r

Our office scans data-entry forms, and we lack any proprietary software that can do automated double entry (primary entry is done by hand, of course). We are hoping to provide a tool that lets researchers highlight regions on the forms and use the scanned versions to determine what each participant entered.
To do this, all I need for a very rough first attempt is a way to read PDFs into R as raster files, with coordinates as X and Y components and black-and-white "intensities" as a Z axis.
We use R mainly for statistical analysis and data management, so options in R would be great.

You could use the raster package in R. However, it doesn't support .pdf files, but it does support .tif, .jpg, and .png (among many others).
Converting your PDFs into PNGs shouldn't be a big problem: look here for more information.
Once you have your PNG files ready, you can do the following:
library(raster)
png <- raster("your/png/file.png")
and then use the extract() function to get your brightness value from the picture. E.g., say your PNG is 200x200 px and you want the pixel value at x = 150, y = 100 (note that extract() expects point coordinates as a two-column matrix; a bare numeric vector would be read as cell numbers):
value <- extract(png, cbind(150, 100))
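If you want the full X/Y/intensity table described in the question, here is a minimal sketch under a couple of assumptions: the pdftools and raster packages are installed, and the scanned form is a single-page file called scan.pdf (a hypothetical name):
library(pdftools)   # pdf_convert() renders PDF pages to image files
library(raster)

# render page 1 of the scanned PDF to a PNG at 300 dpi
png_file <- pdf_convert("scan.pdf", format = "png", pages = 1, dpi = 300)

# read the PNG as a raster and flatten it to an x/y/intensity data frame
r  <- raster(png_file)
df <- as.data.frame(r, xy = TRUE)   # columns: x, y, pixel intensity
From there, df can be subset by the coordinate ranges of the highlighted regions on the form.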

Related

PDF image is 1.3 MB, but after removing text in Illustrator becomes ~35 MB (still vector in PDF)

I have created a figure for my scientific work using ggplot (an R package for plotting data). It's a scatterplot that contains ~25,000 data points in a normal x-y-style plot. Each data point has a border and a color fill. The output vector PDF is 1.3 MB in size. Now I would like to make some final adjustments regarding font size and text position and merge it with other panels into a bigger figure, which I normally do in Illustrator. So I add/embed the scatterplot into the rest of my figures, which loads all elements correctly. However, when I then simply save this file as .ai or .pdf, the output is more than ~30 MB. How is it possible that all elements are preserved in the original (small) PDF, but after Illustrator the file is inflated so much? It is critical for me to keep the file size small.
I tried many things, including different PDF export options in Illustrator and macOS Preview's PDF file compression, but nothing worked. I even tried merging all those ~25,000 overlapping dots into one or at least a few shapes, but either Illustrator crashes in the process (Illustrator > Pathfinder unite/merge) or the resulting PDF shows erratic behaviour, e.g. it turns black/white in Word (Illustrator > Flatten Transparency). What am I missing here?
Any help is appreciated!
When saving, make sure you're not enabling Illustrator editing capabilities. Leaving Illustrator editing capabilities enabled will essentially cause a copy of the Illustrator file (as an AI version) to be written into the PDF that's being saved. This often causes the PDF to increase dramatically in size, especially for files with many vector or path elements.
I had the same issue. What worked for me was this:
1. Export as EPS instead of PDF from ggplot. You may need to use device=cairo_ps as an option (I did).
2. In Adobe Illustrator, create a new document and select the web option.
3. Combine all your figures into this new document by dragging and dropping them there. Use "Embed" to embed those figures into the new one.
4. Make all the changes you need.
5. Save as PDF with default options (I used the preset "Smallest File Size (PDF 1.6)").
This preserved the small file size for me. I think the only thing that really matters here is using EPS instead of PDF when exporting from ggplot.
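For reference, a minimal sketch of step 1 (assuming a scatterplot with bordered, filled points like the one described; cairo_ps comes from the base grDevices package):
library(ggplot2)

# bordered, filled points, as in the question
p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point(shape = 21, fill = "steelblue")

# export as EPS; cairo_ps handles features like semi-transparency
# that the default postscript device does not
ggsave("scatter.eps", plot = p, device = cairo_ps, width = 7, height = 5)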

The dilemma of plot saving formats - R/base plots

In my research work, when papers are to be communicated, the format could be either LaTeX or DOC/DOCX. This sends me into a dilemma.
I have generated PDFs (they can be easily included in a LaTeX file) for certain plots using the base plot method. However, I would also like to have PNG versions of the same plots (since MS Word does not accept PDFs), and no, I do not want to rewrite the code! Further, the convert utility of ImageMagick is not a preferable option either, as there is severe degradation in resolution when one executes convert myFile.pdf myFile.png
What is the best way? Can we save a plot into a variable and then regenerate the plot to a png / jpg / tiff file?
Save to EPS format (see the HowTo here). It is a vector format and should be recognizable by MS Word (you will need to import it as a picture) as well as LaTeX.
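As for saving a plot into a variable: base graphics can be recorded once and replayed onto several devices, so you do not have to rerun the plotting code for each format. A minimal sketch, using the built-in pressure data set as a stand-in for your plot:
# draw once on the screen device, then capture the displayed plot
plot(pressure, main = "Vapor pressure")
p <- recordPlot()

# regenerate as a vector PDF for LaTeX
pdf("pressure.pdf")
replayPlot(p)
dev.off()

# regenerate as a high-resolution PNG for Word
png("pressure.png", width = 2000, height = 1400, res = 300)
replayPlot(p)
dev.off()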

How to use R to read Excel, create tables, get formatted tables back into GIS with coordinate locations

I'm new to R - which will be obvious in a sec here... - and I was hoping someone could point me to some packages to attempt to solve a specific problem:
I get Excel tables from scientists with analytical data for specific GIS point sample locations. We usually copy/paste these tables into the layout of GIS map documents; however, quite often the line weights, fonts, etc. get messed up... and then the data gets updated/revised, etc. - tedious copy/paste again...
I'd like to try to read these Excel files, have R create multiple formatted tables (one for each sample location), and plot these tables with real-world coordinates for use in GIS (ESRI or QGIS, etc.), where the tables would ideally show up offset some distance from the point sample locations, in some sort of GIS file format.
I was thinking the export from R might be a .dwg, or even a raster GeoTIFF with a transparent background... some format that would preserve formatting and position - I'm not sure what the possibilities are. Has anyone ever tried anything like this? I see several Excel and geospatial packages, and I understand they can be used for regular geospatial data analysis, but in this case I'm trying to merge graphics (formatted tables from R) and GIS, which is something I'm having a hard time finding any info about.
Hopefully this question is not too vague...
edit - I have the sp package and am reading up on it. I guess I'm really stuck on the whole "make several tables with R, then get those tables all at once into a format that GIS can read" step. Try this: imagine a georeferenced aerial photo, then imagine a layer of floating boxes on top of the aerial image, where the boxes are placed with coordinates (i.e. lat/long, state plane feet, etc.) - can I make a layer like this with R and the geospatial packages? See the sketch below for the data-reading half.
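Not an answer to the floating-table part, but a minimal sketch of the first half of the pipeline - reading the Excel data and writing a point layer GIS can consume. It assumes the readxl and sf packages and a hypothetical workbook samples.xlsx with lon and lat columns:
library(readxl)   # read the scientists' Excel tables
library(sf)       # build and write GIS layers

tab <- read_excel("samples.xlsx")   # e.g. columns: site, lon, lat, analyte values

# turn the rows into a point layer with real-world coordinates (WGS84)
pts <- st_as_sf(tab, coords = c("lon", "lat"), crs = 4326)

# write to a GeoPackage that QGIS/ArcGIS can load directly
st_write(pts, "samples.gpkg", layer = "sample_points")
The formatted tables themselves would still need to be rendered separately (e.g. as graphics anchored at offset coordinates); this only gets the attribute data into a georeferenced layer.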

Successive pictures on R

I have a code to plot a world map with a meteorological field for one moment (or one measure).
Is it possible to successively plot the map for different moments (for i from 1 to 125) in order to view a sort of video when we run the code?
Yes, look at the animation package.
It can create an animated GIF for you (as well as other tricks). There are live examples you can look at, e.g. Buffon's needle, a CLT demo, and much more.
The package abstracts away some of the OS-dependent layers. If you know the basics, you can of course just call the corresponding tool from the ImageMagick project directly, which is likely to be available on your OS of choice too.
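A minimal sketch of the loop, assuming a hypothetical function plot_field(i) that holds your existing map-plotting code for moment i (saveGIF needs ImageMagick or GraphicsMagick installed):
library(animation)

saveGIF({
  for (i in 1:125) {
    plot_field(i)   # draw the world map with the field at moment i
  }
}, movie.name = "field.gif", interval = 0.2)   # 0.2 s between frames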

Reduce pdf file size of plot in R

I am plotting some data in R using the following commands:
jj <- ts(read.table("overlap.txt"))   # read the data as a time series
pdf(file = "plot.pdf")
plot(jj, ylab = "", main = "")
dev.off()
The problem I have is that the resulting PDF file is quite big (25 MB). Is there a way to reduce the file size? JPEG is not an option because I need a vector graphic.
Take a look at tools::compactPDF - you need to have either qpdf or ghostscript installed, but it can make a huge difference to pdf file size.
If reading a PDF file from disk, there are 3 options for Ghostscript quality (gs_quality), as indicated in the R help file:
printer (300dpi)
ebook (150dpi)
screen (72dpi)
The default is none. For example, to convert all PDFs in the folder mypdfs/ to ebook quality, use:
tools::compactPDF('mypdfs/', gs_quality='ebook')
You're drawing a LOT of lines or points. Vector image formats such as PDF, PS, EPS, and SVG maintain logical information about all of those points, lines, and other items, and that complexity translates into file size and drawing time as the number of elements increases. Generally, vector images are the best choice in a number of ways: most compact, best scaling, and highest-quality reproduction. But if the number of graphical elements becomes very large, it's often best to switch to a raster image format such as PNG. When you switch to raster, it's best to have a good idea what size image you want, both in pixels and in print measurements, in order to produce the best image.
For information from the other direction - a raster image that is too large - see this answer.
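A minimal sketch of that switch for the plot in the question (7x5 inches at 300 dpi is a reasonable guess for a printed figure):
jj <- ts(read.table("overlap.txt"))

# raster output: file size no longer grows with the number of points drawn
png("plot.png", width = 7, height = 5, units = "in", res = 300)
plot(jj, ylab = "", main = "")
dev.off()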
One way of reducing the file size is to reduce the number of values that you plot. Assuming you have a data frame called df:
# take a random sample of 10,000 rows from the data frame
sampleNo <- 10000
sampleData <- df[sample(nrow(df), sampleNo), ]
I think the only other alternative within R is to produce a non-vector image. Outside of R, you could use Acrobat Professional (which is not free) to optimize the PDF. This can reduce the file size enormously.
Which version of R are you using? In R 2.14.0, pdf() gained an argument compress to support compression. I'm not sure how much it will help you, but there are also other tools to compress PDF files, such as Pdftk and qpdf. I have wrappers for both in the animation package, but you may want to use the command line directly.
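For example, a minimal check that compression is on (it has been the default in recent R versions), reusing the jj series from the question:
pdf(file = "plot.pdf", compress = TRUE)   # flate-compress the page streams
plot(jj, ylab = "", main = "")
dev.off()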
Hard to tell without seeing what the plot looks like - post a screenshot?
I suspect it's a lot of very detailed lines, and most of the information probably isn't visible - lots of things overlapping or very, very small detail. Try thinning your data in one dimension or another; I doubt you'll lose visible information.
