How to reduce the noise of an image? - r

I am extracting text from some images. With some of them I am having problems, for example with this type of image:
library(magick)
library(tesseract)
image_read("fichero.jpg") %>%
tesseract::ocr(engine = tesseract("eng")) %>%
cat()
Result
I am assuming (correct me if I'm wrong) that tesseract fails because of the low quality of the image (it is a scanned document), and I don't know if there is a way to improve the image.
I also tried some convolution methods with several kernels, trying to reduce the noise in the photo, but the result was worse.
Is there a way to handle this, or do I have to assume that it is not possible to extract the text from images of this quality?
Regards

Looking at this with the experience of a photographer rather than as a programmer, I would guess that the poor focus and camera jiggle make this image pretty well unreadable by most OCR options. I just used the OCR in Adobe Acrobat to play with it on my own PC; it recognized "FECHA", but not "NUMERO" and not any of the numbers.
I pulled it into a photo editor and messed around with the contrast, as sometimes it's possible to convert a grayscale image such as this to pure black-and-white and get rid of some of the fuzziness, but I couldn't produce a readable image in my quick-and-dirty experiment.
So realistically, you'll need images that are scanned/photographed with higher resolution and better contrast to get reliable OCR.
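If you want to experiment with that kind of clean-up in R itself, below is a minimal sketch that preprocesses the scan with magick before OCR. The resize geometry and threshold percentages are illustrative guesses that would need tuning per document, and on a badly blurred scan the result may still be unusable.
library(magick)
# Upscale, convert to grayscale, boost contrast and threshold to pure
# black-and-white before handing the image to the OCR engine.
img <- image_read("fichero.jpg") %>%
  image_resize("2000x") %>%                               # upscale so strokes span more pixels
  image_convert(type = "Grayscale") %>%                   # drop colour information
  image_contrast(sharpen = 1) %>%                         # increase local contrast
  image_threshold(type = "white", threshold = "60%") %>%  # push light pixels to white
  image_threshold(type = "black", threshold = "40%")      # push dark pixels to black
image_ocr(img, language = "eng") %>%
  cat()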

It looks like you are trying to create a cow from ground beef. The big problem is that JPEG is not suited for this type of non-photographic image. Your PNG looks fine because PNG is a lossless format.
If you don't want this problem, do not save the files as JPEG.

Related

How to use JuliaImages to create a smaller image given a starting image?

I just got done exploring the docs for JuliaImages found here. What I want to do is as follows:
I have an image. It is a map of sorts. It takes up a lot of space, so I want to index into the image and create a new, smaller image that is essentially a zoomed-in version of the original. I know I could do this manually, but I want to create a reusable script that I can apply to any number of images. How can I do this using JuliaImages?
If by "zoomed in" you mean focusing on a small portion of the image and making it look bigger, you can do this with ordinary array-indexing tools. For example, img[251:500,147:328] would extract a portion of the image.
If what you're really looking for is a thumbnail, my favorite approach is to use restrict. That is limited to 2-fold reductions. You can also use imfilter (best with the IIRGaussian filters of ImageFiltering.KernelFactors) and then call imresize. But there will be no beating the performance of restrict.

How to enlarge map/plot in R?

I plotted a map in R, but when I export it the size is very small. How can I enlarge the map and still save it as a picture? (I know that I can save it as a PDF, and then it's a vector graphic - but I need to copy it into PowerPoint and also need a transparent background - I don't think that is possible with a PDF, is it?)
As you can see here, the map is way too small to use in a PowerPoint slide:
If someone knows a good way to save it as a vector graphic that I can easily use in PowerPoint, that would be perfect as well.
The png() function lets you specify width and height, by default in pixels ("px"), with defaults of 480 by 480. You can also supply a res argument in ppi. If you have text, you probably ought to specify a point size >= 20 for legibility. I generally save my graphs as PDF and convert to PNG with an external program. However, the latest versions of PowerPoint will accept PDF format. It is also possible to save in .eps format.
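For example, a minimal sketch along those lines (the file name and dimensions are just placeholders):
# Larger, higher-resolution PNG with a transparent background,
# sized so it drops cleanly onto a PowerPoint slide.
png("map.png", width = 2000, height = 1600, res = 300,
    bg = "transparent", pointsize = 20)
plot(1:10, 1:10)   # replace with the actual map-plotting code
dev.off()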

Converting webGL html to SVG

I am using R, with the misc3d and rpanel libraries, to create a 3d image in webGL. I then need to embed the image into a PDF via Latex.
The 3d image renders fine and looks great - but I'm thinking I need to convert the webGL HTML file into an SVG or some other kind of vector graphics file which can be embedded in Latex.
Any suggestions on how to accomplish this?
Maybe I am wrong but this way doesn't make sense to me.
You have 3D coordinates
You render objects based on 3D coordinates to a 2D rasterized image using webGL
Then you want to extract 2D vector coordinates from the rendered image?
From the webGL framebuffer you can get the rasterized data (no vector information). So it is like converting a rasterized image (such as a PNG) to SVG. Since there is no way (that I know of) to get the vector information back from a rasterized image, chances are high that the image will just be embedded in the SVG file. This wouldn't be a real benefit compared to a rasterized image.
Maybe you can use the vector information (which you are using for drawing the webGL image) to draw to an SVG image directly?
Is there a reason you can't use the rgl package instead (I'm not really familiar with rpanel, but I'm pretty sure that misc3d was originally designed to work in conjunction with rgl) and use rgl.postscript(..., fmt="pdf") to export directly to PDF? rgl.postscript also offers an SVG option. The results are admittedly a little wonky sometimes (the underlying package it uses isn't completely reliable), but it's definitely the path of least resistance.
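A minimal sketch of that export path, with a toy scene standing in for the misc3d/rpanel rendering from the question:
library(rgl)
# Toy scene; in practice this would be the 3D object already drawn
# in an rgl device.
open3d()
spheres3d(rnorm(10), rnorm(10), rnorm(10), radius = 0.2)
# Export the current rgl scene as a vector file for inclusion in LaTeX.
rgl.postscript("scene.pdf", fmt = "pdf")
rgl.postscript("scene.svg", fmt = "svg")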
Also, I haven't tried it out myself, but I think the following article gives some information about embedding rgl images in their full, rotatable glory into PDFs: Levine, Richard A., Luke Tierney, Hadley Wickham, Eric Sampson, Dianne Cook, and David A. van Dyk. 2010. “Editorial: Publishing Animations, 3D Visualizations, and Movies in JCGS.” Journal of Computational and Graphical Statistics 19 (1) (January): 1–2. doi:10.1198/jcgs.2010.191ed. http://amstat.tandfonline.com/doi/abs/10.1198/jcgs.2010.191ed.

Reduce pdf file size of plot in R

I am plotting some data in R using the following commands:
jj = ts(read.table("overlap.txt"))
pdf(file = "plot.pdf")
plot(jj, ylab="", main="")
dev.off()
The result looks like this:
The problem I have is that the PDF file I get is quite big (25 MB). Is there a way to reduce the file size? JPEG is not an option because I need a vector graphic.
Take a look at tools::compactPDF - you need to have either qpdf or ghostscript installed, but it can make a huge difference to pdf file size.
If reading a PDF file from disk, there are 3 options for Ghostscript quality (gs_quality), as indicated in the R help file:
printer (300dpi)
ebook (150dpi)
screen (72dpi)
The default is none. For example, to convert all PDFs in the folder mypdfs/ to ebook quality, use the command
tools::compactPDF('mypdfs/', gs_quality='ebook')
You're drawing a LOT of lines or points. Vector image formats such as PDF, PS, EPS, and SVG maintain logical information about every point, line, or other element, so complexity, and with it file size and drawing time, grows as the number of elements increases. Vector images are generally the best choice in many ways: they are the most compact, they scale best, and they give the highest-quality reproduction. But if the number of graphical elements becomes very large, it's often better to switch to a raster format such as PNG. When you switch to raster, it's best to have a good idea what size image you want, both in pixels and in print measurements, in order to produce the best image.
For information from the other direction, too large a raster image, see this answer.
One way of reducing the file size is to reduce the number of values that you have. Assuming you have a dataframe called df:
# take sample of data from dataframe
sampleNo <- 10000
sampleData <- df[sample(nrow(df), sampleNo), ]
I think the only other alternative within R is to produce a non-vector. Outside of R you could use Acrobat Professional (which is not free) to optimize the pdf. This can reduce the file size enormously.
Which version of R are you using? In R 2.14.0, pdf() has an argument compress to support compression. I'm not sure how much it can help you, but there are also other tools to compress PDF files such as Pdftk and qpdf. I have two wrappers for them in the animation package, but you may want to use command line directly.
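For instance, a small sketch combining compression at write time with post-hoc compaction (the plotting call is a stand-in for the real data, and tools::compactPDF() needs qpdf or Ghostscript installed):
pdf("plot.pdf", compress = TRUE)             # compress is available since R 2.14.0
plot(ts(rnorm(1e5)), ylab = "", main = "")   # stand-in for the real time series
dev.off()
tools::compactPDF("plot.pdf", gs_quality = "ebook")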
Hard to tell without seeing what the plot looks like - post a screenshot?
I suspect it's a lot of very detailed lines, and most of the information probably isn't visible - lots of things overlapping or very, very small detail. Try thinning your data in one dimension or another. I doubt you'll lose visible information.

Create CSS sprites based on colour?

I have a large set of thumbnails I wish to display on a page (over 200). I'd like to use CSS sprites to load them, to minimise the number of HTTP requests. I think putting all of them in one massive file is a bad idea, but splitting them into about 6 files of 40-50 thumbnails each should work nicely.
All of the thumbnails are fairly low colour (can be reduced to 256 colours without quality drop), but in total all the thumbnails cover a lot more colours.
So, is there an easy way to group them based on their colour? Putting each group of files in a separate folder is fine, since I can stitch them together with ImageMagick or an online sprite tool later. But doing all of that in one go (with CSS) would be nice too.
Update: the reason for grouping by colour:
The idea is to save more bandwidth. If I have 10 mostly-blue thumbnails, 10 green and 10 red, I can combine them into 3 images, reducing each to 256 colours. If I mix the thumbnails, then reducing to 256 colours makes the images poorer quality.
Firstly, I would suggest not worrying too much and saving as a 24-bit PNG. It may seem that the image gets a lot bigger by doing this, but if the thumbnails are small you'll probably find that a large amount of bandwidth is currently being used on HTTP headers alone; that overhead will go away, and you can spend it on making your images look better.
However, if you want to automate the process, you could try working out the average colour (one way of doing something close to this is to resize each thumbnail to 1x1 and then look at the RGB colour of that pixel). Once you have a colour per image, convert it to HSV and sort by hue. You can then bundle the images based on that sort order. I've not actually tried this, but it may produce acceptable results.
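Since the surrounding thread is R-flavoured, here is a hypothetical sketch of that averaging idea using the magick package; the folder name and group size are placeholders, and I haven't verified how well hue-sorting actually groups the palettes.
library(magick)
# Approximate each thumbnail's dominant colour by shrinking it to a single
# averaged pixel, then keep the hue of that pixel.
average_hue <- function(path) {
  px <- image_read(path) %>%
    image_resize("1x1!") %>%          # "!" forces the exact 1x1 size
    image_data(channels = "rgb")      # raw RGB bytes for that single pixel
  vals <- as.integer(px[, 1, 1])      # r, g, b in 0..255
  grDevices::rgb2hsv(vals[1], vals[2], vals[3])["h", 1]
}
files  <- list.files("thumbs", full.names = TRUE)          # placeholder folder
sorted <- files[order(vapply(files, average_hue, numeric(1)))]
groups <- split(sorted, ceiling(seq_along(sorted) / 40))   # roughly 40 per sprite
# Each element of 'groups' can then be stitched into one sprite image,
# e.g. with ImageMagick's montage or magick::image_append().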
Adjusting the number of images that get bundled will also affect the output quality. If it sucks when you put 30 images per file, try 25 and see how much difference it makes. Actually, it might be smarter to think about the number of files:
1. Put them all into a single file.
2. Does it look bad because there aren't enough colours?
3. Add one extra file and split the images equally across all the files. Go to step 2.
Well, I did some testing by grabbing a sample by hand of one "tint" and comparing it to a montage created by just taking the first N images. There was only a difference of a few kilobytes, which was reduced to about 30 bytes after I found PNGcrush. Fantastic tool!
So in short, my crackpot idea has been disproven. :p
Now, this is nothing more than theoretical blabbering, but I understand that animated GIFs support a distinct color palette per frame. Theoretically, you could place each image on a separate frame of the animation (leaving most of that frame transparent) and set the pause time between frames to 1 ms. Set the animation to only go through once, and you could (potentially) have an effective CSS sprite with up to 256 colors per image.
Like I said, theoretical blabbering.
