convert jpg to greyscale csv using R

I have a folder of JPG images that I'm trying to classify for a Kaggle competition. I have seen some code in Python on the forums that I think will accomplish this, but I was wondering: is it possible in R? I'm trying to convert this folder of many JPG images into CSV files of numbers giving the grayscale value of each pixel, similar to the hand digit recognizer here: http://www.kaggle.com/c/digit-recognizer/
So basically jpg -> .csv in R, with numbers for the grayscale value of each pixel to use for classification. I'd like to fit a random forest or a linear model on it.

There are some standard formulas for how to do this (see the link in the code comments below). The raster package is one approach. This basically converts the three RGB bands to one black-and-white band (it makes the image smaller in size, which I am guessing is what you want).
library(raster)
color.image <- brick("yourjpg.jpg")
# Luminosity method for converting to greyscale
# Find more here http://www.johndcook.com/blog/2009/08/24/algorithms-convert-color-grayscale/
color.values <- getValues(color.image)
bw.values <- color.values[,1]*0.21 + color.values[,2]*0.72 + color.values[,3]*0.07
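To get from here to jpg -> .csv, you can reshape the grayscale vector back into the image's pixel grid and write it out. A minimal sketch continuing from the code above (the output file name is my own placeholder):
# getValues() returns cells row by row from the top-left, so rebuild the
# matrix with byrow = TRUE to preserve the pixel layout
bw.matrix <- matrix(bw.values, nrow = nrow(color.image), byrow = TRUE)
write.csv(bw.matrix, "yourjpg_grayscale.csv", row.names = FALSE)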
I think the EBImage package can also help with this problem (it is not on CRAN; install it from Bioconductor):
source("http://bioconductor.org/biocLite.R")
biocLite("EBImage")
library(EBImage)
color.image <- readImage("yourjpg.jpg")
bw.image <- channel(color.image,"gray")
writeImage(bw.image,file="bw.png")
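Since you have a whole folder of JPGs, you could loop the same conversion and write one CSV per image. A hedged sketch (the folder name and output naming scheme are my assumptions):
jpgs <- list.files("images", pattern = "\\.jpg$", full.names = TRUE)
for (f in jpgs) {
  gray <- channel(readImage(f), "gray")
  # imageData() returns the numeric pixel matrix behind the Image object;
  # EBImage stores it x-by-y, so transpose to get image rows as matrix rows
  write.csv(t(imageData(gray)), sub("\\.jpg$", ".csv", f), row.names = FALSE)
}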

Related

R - Import then export multiband RGB aerial image .tif

I want to import an aerial image in .tif format, and then export in the same colour format. I imagine this is a simple task but I can't find a way to do it. The original files are coloured when viewed in Windows Photo Viewer and Google Earth, but the exports are black and white.
Ultimately, I want to crop/merge multiple images to create a single aerial image map from approx 6 tiles. Of the 6 tiles, I am only interested in about 30% of the area (the bits over the water, plus a ~20 m buffer), so the idea is to cut down on image size by keeping the parts I want as a single file, rather than importing all 6 tiles. I can open one .tif in Google Earth as a coloured aerial image, which is what I want for my custom-boundary map/image.
At the moment, I am struggling to import then export a single .tif in the same Google Earth-readable coloured image format. I'm importing the files into R using raster, and have tried exporting using writeRaster, but the images are black and white when viewed in GE. I believe this might mean the image is rendering only a single (RGB) layer of the image? However, plotRGB() in R is able to plot it in colour.
You can download the file I'm working with from my Google Drive, or search for it on Elvis (satellite image data from inside the Australian Capital Territory, Australia, approx -35.467437, 148.824043). Thanks for any help or direction to some instructions.
This is where I'm at so far...
# import and plot the tif - plots nicely in colour
library(raster)
library(magrittr)  # for the %>% pipe
brick('ACT2017-RGB-10cm_6656073_55_0001_0001.tif') %>%
plotRGB
This is what I see from plotRGB(), and also when I open the original in Google Earth (this is the desired output colour).
# export
brick('ACT2017-RGB-10cm_6656073_55_0001_0001.tif') %>%
writeRaster('my_output.tif')
# then import the export
brick('my_output.tif') %>%
plotRGB
my_output.tif plots in colour in R, but black and white in Google Earth.
Here is how you can do that with terra (the replacement for raster). For this example to work well you need version 1.4-1 or higher; at the time of writing that is the development version, which you can install with install.packages('terra', repos='https://rspatial.r-universe.dev').
library(terra)
f <- 'ACT2017-RGB-10cm_6656073_55_0001_0001.tif'
r <- rast(f)
plot(r)
Since this is an RGB image, there is no need to call plotRGB explicitly.
I create two sub-images for this example:
ex <- ext(665000, 665500, 6073000, 6073500)
x <- crop(r, ex)
ey <- ext(665500, 666000, 6073500, 6074000)
y <- crop(r, ey)
They still look good with
plot(x)
plot(y)
Now merge them
m <- merge(x, y)
After merging, the RGB channels are lost and need to be redeclared:
RGB(m) <- 1:3
And you can write to disk
z <- writeRaster(m, "test.tif", overwrite=TRUE)
plot(z)
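For the area-of-interest step in the question (keep only the parts over the water plus a ~20 m buffer), one possible continuation is to buffer a polygon of the water and mask the merged tiles with it. A hedged sketch; water_area.shp is a hypothetical polygon file, and buffer() assumes a metre-based CRS:
water <- vect("water_area.shp")    # hypothetical area-of-interest polygon
aoi <- buffer(water, width = 20)   # ~20 m buffer
m2 <- mask(crop(m, aoi), aoi)
RGB(m2) <- 1:3                     # re-declare the channels after masking
writeRaster(m2, "aoi.tif", overwrite = TRUE)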

R grid arrange tiff microscopy RGB

I have RGB tiff files (from CellProfiler) which I want to import into R, label, and arrange, as part of a high-throughput analysis. The closest I get is using:
library(tiff)
library(raster)
imageTiff <- tiff::readTIFF(imagePath[i])
rasterTiff <- raster::as.raster(imageTiff)
raster::plot(rasterTiff)
raster::plot plots the image nicely, but I can't capture the output and use it with gridExtra or add labels.
In addition, I tried rasterVis with levelplot and multiple other ways of importing the tiff and then converting it to a grob or ggplot.
However, I can't get anything to work, and I would like to ask whether R is even suited to that task at all.
Thank you very much for your help!
Okay, I think this is the most straightforward way, and possibly also the most obvious one.
I import JPEG or TIFF files with jpeg::readJPEG or tiff::readTIFF respectively. Both read the image into an array that is compatible with rasterGrob(), and from there grid.arrange() etc. work.
library(jpeg)
library(tiff)
library(grid)
library(gridExtra)  # for grid.arrange()
imageJPEG <- grid::rasterGrob(jpeg::readJPEG("test.jpeg"))
imageTIFF <- grid::rasterGrob(tiff::readTIFF("test.tiff"))
grid.arrange(imageJPEG, imageJPEG, imageJPEG)
grid.arrange(imageTIFF, imageTIFF, imageTIFF)
For my purpose that is perfect, since rasterGrob does not alter the raster matrix values. Labeling might be a bit tricky, but overall it is a grid/grob problem from here on.
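For the labeling, one way that stays inside grid/gridExtra is to wrap each image grob in arrangeGrob() with a textGrob() on top. A hedged sketch continuing from the code above (the panel letters are made up):
labelled <- function(g, lab) arrangeGrob(g, top = textGrob(lab, x = 0.05, hjust = 0))
grid.arrange(labelled(imageTIFF, "A"), labelled(imageTIFF, "B"), labelled(imageTIFF, "C"), ncol = 3)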

Is it possible to import a raster of a PDF file?

Our office does scanning of data entry forms, and we lack any proprietary software that is able to do automated double-entry (primary entry is done by hand, of course). We are hoping to provide a tool for researchers to highlight regions on forms and use scanned versions to determine what participant entry was.
To do this, all I need for a very rough attempt is a way to read PDFs in as raster files, with coordinates as X, Y components and black-and-white "intensities" as a Z-axis.
We use R mainly for statistical analysis and data management, so options in R would be great.
You could use the raster package in R. However, it doesn't support .pdf files, but it reads .tif, .jpg, and .png (among many others).
But converting your PDFs into PNGs shouldn't be a big problem.
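For example, the pdftools package can rasterize each page straight from R. A minimal sketch (pdftools is my suggestion here, not part of raster; the file name is a placeholder):
library(pdftools)
pngs <- pdf_convert("scanned_form.pdf", format = "png", dpi = 300)  # one png per page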
Once you have your PNG files ready, you can do the following:
library(raster)
img <- raster("your/png/file.png")
and then index pixels directly to get your brightness value from the picture. I.e. let's say your PNG is 200x200 px and you want to extract the pixel value at row 100 and column 150:
value <- img[100, 150]
(extract() also works, but it expects x/y map coordinates as a two-column matrix, e.g. extract(img, cbind(150, 100)), rather than row/column numbers.)

how to import an xy plot in png format, and sample it

I have an xy plot (time vs. synthetic index price) in PNG format. I cannot find the raw data I used to generate this plot, so I want to sample the plot and get some estimates.
Unfortunately, I've never dealt with importing image files and processing the data. Would you please give me some hints - which package, which function, or any other useful suggestion?
basic info:
file format: png
img background: white
line color: black
Thanks for any hint
I strongly recommend you not try to do this in R. There are several good tools for extracting data from a plot. My favorites are DataThief (http://www.datathief.org/) and Engauge (http://digitizer.sourceforge.net/). Run those, then import the data (ASCII text) they generate into R.
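The import step back into R is then a one-liner. A sketch assuming the digitizer exported a two-column ASCII file (the file and column names are hypothetical):
xy <- read.table("extracted_points.txt", col.names = c("time", "price"))
plot(xy$time, xy$price, type = "l")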

Reduce pdf file size of plot in R

I am plotting some data in R using the following commands:
jj = ts(read.table("overlap.txt"))
pdf(file = "plot.pdf")
plot(jj, ylab="", main="")
dev.off()
The result looks like this:
The problem I have is that the PDF file I get is quite big (25 MB). Is there a way to reduce the file size? JPEG is not an option because I need a vector graphic.
Take a look at tools::compactPDF - you need to have either qpdf or Ghostscript installed, but it can make a huge difference to PDF file size.
When reading a PDF file from disk, there are 3 options for Ghostscript quality (gs_quality), as indicated in the R help file:
printer (300dpi)
ebook (150dpi)
screen (72dpi)
The default is none. For example to convert all PDFs in folder mypdfs/ to ebook quality, use the command
tools::compactPDF('mypdfs/', gs_quality='ebook')
You're drawing a LOT of lines or points. Vector formats such as pdf, ps, eps, and svg store every one of those points, lines, and other elements logically, so file size and drawing time grow with the number of elements. Vector images are generally best in a number of ways: most compact, best scaling, and highest-quality reproduction. But once the number of graphical elements becomes very large, it is often better to switch to a raster format such as png. When you switch to raster, it's best to know what size image you want, both in pixels and in print measurements, in order to produce the best image.
For information from the other direction, too large a raster image, see this answer.
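If you do go the raster route, set the size and resolution when you open the device. A minimal sketch reusing the question's jj (the dimensions are arbitrary):
png("plot.png", width = 8, height = 5, units = "in", res = 300)
plot(jj, ylab = "", main = "")
dev.off()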
One way of reducing the file size is to reduce the number of values that you have. Assuming you have a dataframe called df:
# take sample of data from dataframe
sampleNo <- 10000
sampleData <- df[sample(nrow(df), sampleNo), ]
I think the only other alternative within R is to produce a non-vector format. Outside of R, you could use Acrobat Professional (which is not free) to optimize the PDF. This can reduce the file size enormously.
Which version of R are you using? In R 2.14.0, pdf() gained an argument compress to support compression. I'm not sure how much it can help you, but there are also other tools to compress PDF files, such as Pdftk and qpdf. I have wrappers for both in the animation package, but you may want to use the command line directly.
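A minimal sketch of the compress argument (available from R 2.14.0 onwards), reusing the question's code:
pdf(file = "plot.pdf", compress = TRUE)
plot(jj, ylab = "", main = "")
dev.off()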
Hard to tell without seeing what the plot looks like - post a screenshot?
I suspect it's a lot of very detailed lines, and most of the information probably isn't visible - lots of things overlapping or very, very small detail. Try thinning your data in one dimension or another. I doubt you'll lose visible information.
