Custom .traineddata file usage in the tesseract package in R

I have a bunch of .JPGs with some text at the bottom, which consists mostly (but not exclusively) of numbers. I want to use the tesseract package in R to read the text in those .JPGs. Unfortunately, the base tesseract language proved too inaccurate to be worth using. I then tried the magick package to adjust the pictures (crop, resize, convert, etc.), hoping to get a better reading from tesseract, but in my case the results were still unsatisfactory.
I eventually managed to follow the description at this link (https://towardsdatascience.com/simple-ocr-with-tesseract-a4341e4564b6) to create a new custom language in Tesseract 4.1.1 (as downloaded from https://github.com/tesseract-ocr/tesseract), which I named font_name.traineddata. The custom-made font_name.traineddata works perfectly in the Tesseract 4.1.1 console and gives significantly better results than the base language.
The question I have is: how do I get the font_name.traineddata file to be used by the ocr command in R? I have tried the simple solution of pasting the font_name.traineddata file into the appropriate tessdata folder of the tesseract package (the same folder that contains the standard English data file, eng.traineddata) and then running the following:
font_name <- tesseract("font_name")
ocr("C:/1.jpg", engine = font_name)
This does not work and gives the error :
Error in tesseract_engine_internal(datapath, language, configs, opt_names, :
Unable to find training data for: font_name. Please consult manual for: ?tesseract_download
tesseract_download seems to be of no use, as it is a helper function to download training data from the official tessdata repository. I have also tried renaming the file to a three character name, with the same error.
Does anybody have any suggestions on how to make custom .traineddata files work with ocr in R?
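One thing that may be worth checking is whether the R package is actually reading the folder the file was copied into. The sketch below is only a starting point, not a confirmed fix: it uses tesseract_info() to show the data path the package reports, and passes the datapath argument of tesseract() to point the engine at a folder explicitly. The folder C:/tessdata_custom and the helper has_traineddata() are assumptions made up for illustration.

```r
# Hypothetical helper: does <language>.traineddata exist in this folder?
has_traineddata <- function(language, datapath) {
  file.exists(file.path(datapath, paste0(language, ".traineddata")))
}

# Assumed location of the custom file; replace with your own folder.
custom_dir <- "C:/tessdata_custom"

if (requireNamespace("tesseract", quietly = TRUE) &&
    has_traineddata("font_name", custom_dir)) {
  # Show where the package looks by default and which languages it finds there.
  info <- tesseract::tesseract_info()
  print(info$datapath)
  print(info$available)

  # Point the engine at the custom folder explicitly instead of the default one.
  font_name <- tesseract::tesseract(language = "font_name", datapath = custom_dir)
  print(tesseract::ocr("C:/1.jpg", engine = font_name))
}
```

If info$datapath differs from the folder the file was pasted into, that mismatch would explain the "Unable to find training data" error.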

Related

Get variable data out of a group in a NetCDF file using RNetCDF or ncdf4

I am trying to access data from variables within a NetCDF file that contains hierarchical groups using R. For example:
I can't find anything about how to do this in the RNetCDF documentation - though this seems to be out of date online.
Latest version: https://www.rdocumentation.org/packages/RNetCDF/versions/2.6-1
Latest documented version: https://www.rdocumentation.org/packages/RNetCDF/versions/1.9-1
I am open to using ncdf4, though would rather do this in RNetCDF since I think the syntax is easier to read and I use R only for teaching purposes.
I can do this in Python using xarray - but need an R solution in this case. Thanks!
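A minimal sketch of one possible approach with RNetCDF 2.x, assuming its group functions grp.def.nc() and grp.inq.nc(): a group handle behaves like a file handle, so it can be passed to var.get.nc(). The file, the group name "forecast" and the variable name "temperature" are made up for illustration; the example builds its own small file so that it does not depend on any real data.

```r
if (requireNamespace("RNetCDF", quietly = TRUE)) {
  library(RNetCDF)

  # Build a small NetCDF-4 file with one group (all names are invented).
  path <- tempfile(fileext = ".nc")
  nc <- create.nc(path, format = "netcdf4")
  grp <- grp.def.nc(nc, "forecast")
  dim.def.nc(grp, "time", 3)
  var.def.nc(grp, "temperature", "NC_DOUBLE", "time")
  var.put.nc(grp, "temperature", c(280.5, 281.0, 279.8))
  close.nc(nc)

  # Reopen the file, look up the group, and read the variable from it.
  nc <- open.nc(path)
  grp <- grp.inq.nc(nc, "forecast")$self  # handle pointing at the group
  temperature <- var.get.nc(grp, "temperature")
  close.nc(nc)

  print(temperature)
}
```

For a real file, calling grp.inq.nc(nc) on the root handle should list the available groups and variables before you pick one.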

Edit the default PDF manual generated while building R package

I have successfully performed the steps below to create my own R package:
created the skeleton of the package and pasted in the .Rd, NAMESPACE and DESCRIPTION files.
executed R CMD check package_name: no errors, and it also generated two PDFs.
One of these contains the output from the .Rd file examples, and the second is the PDF manual that comprises the documentation itself.
My question is how to edit the generated manual, for example to change the font size or add an introductory page. I read that roxygen / devtools might help, but I could not find any resource on that. I also went through Writing R Extensions, but it didn't help me.
Would there be a way using Rd2pdf, but such that non-.Rd files are also included?

How to keep various-topic R script at hand?

I have written and collected R code on various topics that solve particular problems at hand. I stored the R script/code in .txt files. I have now 100s of them.
How do you keep your R code at hand efficiently?
@Manetheran has the right idea: write a package. It's easy to do (especially with RStudio). Read "Writing R Extensions" and then learn about roxygen2, which lets you document each function in-line and avoid writing .Rd files by hand.
Then you can use devtools to load your package locally. Once it's stable, and if you think other people could use the functions, you can submit your package to CRAN.
I prefer to keep it simple. I use Total Commander and when I need an example which uses some R function, I just do Alt-F7 and search for *.R files which contain the desired string.
I use RStudio and have created two or three basic scripts. I save my much-used functions in the basic script that is most appropriate. Then, at the start of an RStudio script for a project, I source one or more of the basic scripts as is appropriate.
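A minimal sketch of that source() workflow. The helper function standard_error() is a made-up example, and the script is written to a temporary file here only so the snippet runs on its own; in practice it would be a file you maintain yourself.

```r
# Write a small "basic script" to a temporary file (stand-in for your own file).
basic_script <- tempfile(fileext = ".R")
writeLines(c(
  "# Much-used helper functions",
  "standard_error <- function(x) sd(x) / sqrt(length(x))"
), basic_script)

# At the top of a project script, load the helpers...
source(basic_script)

# ...and use them as if they were defined in the project script itself.
standard_error(c(2, 4, 6, 8))
```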

How can I specify R code in the data files in an R package

In my R package, I want R to load some data from an external server into main memory. Looking around, I found the option of plain R files in the data section of a package - http://cran.r-project.org/doc/manuals/R-exts.html#Data-in-packages.
But I couldn't find any literature on how R code in the data section of a package works, or on how to invoke it.
It would be useful if someone can point me to a sample package too.

Where is the .R script file located on the PC?

I want to find the location of the .R script files that R uses for its computations.
I know that by typing the name of a function I will get the code that runs, and that I can then copy, edit and save it as a new script file and use that.
The reasons for asking where the foo.R file is:
Curiosity
To know which algorithm is used in the numerical computations
More immediately, the function from the stats package I am using produces results for two of the arguments but not for the others, and I have to figure out how to make it work.
The error shown by R implies that some modification might be required in the script file.
I am looking for a more general answer, if it's possible.
Edit: As per the comments so far, here is the code to compute the spectrum of a time series using autoregressive methods. The input data is a univariate series.
x = ts(data)
spec.ar(x, method = "yule-walker")  # command 1
spec.ar(x, method = "burg")         # command 2
Command 1 runs fine.
Command 2 gives the following error.
Error in ar.burg.default(x, aic = aic, order.max = order.max, na.action = na.action, :
Burg's algorithm only implemented for univariate series
I did try specifying all the arguments explicitly, like na.action = na.fail, order.max = NULL, etc., but the message is the same.
Kindly suggest possible solutions.
P.S. (This question is posted after searching the library folder where R is installed and zip files which come with packages, manuals, and opening .rdb, .rdx files)
See FAQ 7.40 How do I access the source code for a function?
In most cases, typing the name of the function will print its source
code. However, code is sometimes hidden in a namespace, or compiled.
For a complete overview on how to access source code, see Uwe Ligges
(2006), “Help Desk: Accessing the sources”, R News, 6/4, 43–45
(http://cran.r-project.org/doc/Rnews/Rnews_2006-4.pdf).
When R installs a package, it evaluates all the ".R" source files and re-saves them into a binary format for faster loading. Therefore you typically cannot easily find the source file.
As has been suggested elsewhere, you can simply type the function name and see the source code, or download the source package and find the source there.
library(plyr)
ddply # prints the source for ddply
# See the content of the R directory for plyr,
# but it's only binary files:
dir(file.path(find.package("plyr"), "R"))
# [1] "plyr" "plyr.rdb" "plyr.rdx"
# Get the source for the package:
download.packages("plyr", "~", type="source")
# ...then unpack and inspect the R directory...
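For functions that are hidden in a namespace, such as the ar.burg.default method named in the error message from the question, a rough sketch of how to reach the code from within R using only base tools (the variable names burg_src and src_lines are just for illustration):

```r
# Typing the generic's name only shows the dispatcher...
ar.burg

# ...so list its S3 methods to find the one that does the work.
methods(ar.burg)

# Non-exported methods can be printed with getAnywhere() or the ::: operator.
getAnywhere("ar.burg.default")
burg_src <- stats:::ar.burg.default

# deparse() returns the code as a character vector that can be searched.
src_lines <- deparse(burg_src)
head(src_lines)
```

This shows the code as R has stored it; for the original files with comments you still need the source package.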
.libPaths() should tell you all of your current library locations. It's possible to have more than one installation of a package if there are two libraries but only the one that is in the first library will be used. Unless you offer the code and the exact error message, it's not likely that anyone will be able to offer better advice.
I think you are asking how to see what I call the source code for a function in a package. If so, this is how I do it; it has worked for me the three times I have tried. I keep these instructions handy in a few places and just copied and pasted them here:
To see the source code for a function in R, download the package containing the function. Specifically, download the file that ends in "tar.gz". This is a compressed file. Expand the compressed file using, for example, "WinZip". Now you need to open the uncompressed file that ends in ".tar". Download the free software "7-Zip". Click on the file "7zFM.exe" and navigate to the directory containing the ".tar" file. You can extract the contents of that ".tar" file into a new folder. The contents include the R files showing the source code for the functions in the R package.
EDIT:
Today (July 8, 2012) I was able to open the 'tar.gz' file using the latest version of 'WinZIP' and could copy the contents (the source code) from there without having to use '7-Zip'.
EDIT:
Today (January 19, 2013) I viewed the source code for functions in base R by downloading the file
'R-2.15.2.tar.gz'
To download that file go to the http://cran.at.r-project.org/ webpage and click on that file in this line:
"The latest release (2012-10-26, Trick or Treat): R-2.15.2.tar.gz, read what's new in the latest version."
Unzip the file. WinZip will work, or it did for me. Then search your computer for readtable.r or another base R function.
agstudy noted here https://stackoverflow.com/questions/14417214/source-file-for-r-function that source code for read.csv is located in the file readtable.r, so do not expect every base R function to have its own file.
