Error in tesseract_engine_internal: Unable to find training data (R)

I have a problem (I am using Windows 10) running library(tesseract), which shows the warning message:
Unable to find English training data.
I have downloaded "eng.traineddata" from https://github.com/tesseract-ocr/tessdata
While trying to run
eng <- tesseract("eng")
it displays an error:
Error in tesseract_engine_internal(datapath, language, configs, opt_names, :
Unable to find training data for: eng. Please consult manual for: ?tesseract_download

You've probably used a legacy, incompatible traineddata file. You need the data from either tessdata_fast or tessdata_best.
https://github.com/tesseract-ocr

With R 4.1, I had to create the folder "C:\Program Files (x86)\Tesseract-OCR" and add to it the eng.traineddata file downloaded from https://github.com/tesseract-ocr/tessdata_best/blob/main/eng.traineddata.
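The error message itself points to ?tesseract_download. A minimal sketch, assuming a reasonably recent version of the R tesseract package, that shows which folder the package searches for *.traineddata files, downloads the English model into it if it is missing, and then creates the engine. The commented alternative reuses the folder from the answer above; treat that path as an example and adjust it to wherever eng.traineddata actually lives.

```r
library(tesseract)

# Folder the package searches for *.traineddata files,
# and the languages it currently finds there.
info <- tesseract_info()
print(info$datapath)
print(info$available)

# Download the English model into that default folder if it is missing.
if (!"eng" %in% info$available) {
  tesseract_download("eng")
}
eng <- tesseract("eng")

# Alternative: point tesseract() at a folder that holds a manually
# downloaded eng.traineddata (example path from the answer above).
# eng <- tesseract("eng", datapath = "C:/Program Files (x86)/Tesseract-OCR")
```

If tesseract("eng") still fails after this, the file in the data folder is the next suspect, as the first answer describes.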

Related

How to solve "bad restore file magic number" when trying to load data?

I tried to load data into my R working directory and received this error:
Error: bad restore file magic number (file may be corrupted) -- no data loaded
In addition: Warning message:
file ‘classize.RData’ has magic number 'RDX3'
Use of save versions prior to 2 is deprecated
I googled it and tried many options, unsuccessfully.
My RStudio version is 1.2.5033 (the error was happening before updating as well).
I created a new project and put the data file in the new directory.
The data file is "classize.RData".
I have an alternative file, "classize.RDS", with the suggestion to use readRDS(file = "classize.RDS"). When using this command, I receive this error:
cannot read workspace version 3 written by R 3.6.1; need R 3.5.0 or newer
This is for a statistics course at university, and my teaching assistant is unable to help me. Without resolving this issue, I cannot move forward with the required exercises. Could you please help me resolve this problem?
P.S. All the students have access to the same data, and it is only for me that it's not working, so the file should not be corrupted.
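Both messages point the same way: the 'RDX3' magic number marks R's serialization format 3, introduced in R 3.5.0, and readRDS() says outright that it needs R 3.5.0 or newer. That would explain why the file works for everyone else but not on this machine. A minimal sketch that checks the running version and, for whoever creates the data, writes files in the older format 2 that pre-3.5.0 R can still read. The toy data frame is a hypothetical stand-in for the real classize data.

```r
# Format-3 files (workspace magic "RDX3", or "workspace version 3" from
# readRDS()) can only be read by R 3.5.0 or newer.
can_read_v3 <- getRversion() >= "3.5.0"
if (!can_read_v3) {
  message("R ", getRversion(), " cannot read format-3 files; install R >= 3.5.0")
}

# Whoever creates the file can write the older format 2 instead.
toy <- data.frame(class = c("A", "B"), size = c(20L, 25L))

rds_v2 <- tempfile(fileext = ".RDS")
saveRDS(toy, rds_v2, version = 2)
restored <- readRDS(rds_v2)

# The equivalent for a whole .RData workspace file.
rdata_v2 <- tempfile(fileext = ".RData")
save(toy, file = rdata_v2, version = 2)
```

Upgrading R is usually the simpler route; the version = 2 option only helps if the person distributing the files is willing to re-save them.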

R installed.packages() randomly stopped working on Windows 7

The installed.packages() command in R lists your installed packages. Mine worked for almost a year, and then the command suddenly started throwing an error. As this is a built-in function, I am not even sure how to "reinstall" it or address this. Any ideas how to fix the error and get the command working again?
> installed.packages()
Error in gzfile(file, mode) : cannot open the connection
In addition: Warning message:
In gzfile(file, mode) :
cannot open compressed file 'C:\Users\Mitch\AppData\Local\Temp\Rtmp6Dawpa/libloc_190_4464fd2b.rds', probable reason 'No such file or directory'
One suggestion here was to combine these two commands:
.libPaths()
installed.packages(lib.loc = 'my path')
This produced yet another error, shown below. It still looks like an issue with the installed files, but how to address it is the question:
> installed.packages(lib.loc = 'C:/ProgramFilesCoders/R/R-3.3.2/library')
Error in gzfile(file, mode) : cannot open the connection
In addition: Warning message:
In gzfile(file, mode) :
cannot open compressed file 'C:\Users\Mitch\AppData\Local\Temp\Rtmp6Dawpa/libloc_190_4464fd2b.rds', probable reason 'No such file or directory'
That is odd.
What version of R are you running: standard R or Microsoft R? And did you update recently?
If you did update recently, perhaps your packages were not copied over, hence the 'No such file or directory' message.
If you haven't updated, I would install a newer version and see if it fixes the issue.
If you're uncertain, you can always use the updateR() function from the installr package to check whether you have the latest version and then choose whether to install it.
library(installr)
updateR()
Good luck,
I think the issue lies in where the function is looking for the package information. installed.packages() takes a lib.loc argument.
From the official documentation:
lib.loc: character vector describing the location of R library trees to search through
It looks like the function is, for some reason, looking in AppData\Local\Temp, which is a temporary location and not the installed location.
Without seeing your R_HOME and .libPaths() it is difficult to pin down where the problem is; however, running .libPaths() should give you one or more paths, as in the example below. None of these should be temp locations.
> .libPaths()
[1] "C:/Users/UserName/Documents/R/win-library/3.4"
[2] "C:/Program Files/R/R-3.4.0/library"
If not, you can set the path with .libPaths("your path"), or pass the library path to installed.packages(lib.loc = 'your path') and try again.
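The lib.loc suggestion can be applied to every library tree at once. A minimal sketch that queries each entry of .libPaths() separately, so a library that fails, or a path that unexpectedly points into a temporary folder, stands out. The "Temp|Rtmp" pattern is only a rough heuristic chosen for illustration.

```r
# Count the packages found in each library tree R currently searches.
lib_counts <- vapply(
  .libPaths(),
  function(lib) nrow(installed.packages(lib.loc = lib)),
  integer(1)
)
print(lib_counts)

# Rough heuristic: flag any library path that looks like a temp folder.
suspicious <- names(lib_counts)[grepl("Temp|Rtmp", names(lib_counts))]
print(suspicious)
```

If one library triggers the gzfile() error, vapply() stops at that entry, which identifies the problematic path.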
Sometimes the simplest, most obvious solution is what works:
I closed my RStudio session, saving the workspace to .RData
I re-opened RStudio and tried the command again
It worked.
Some good ideas were posted here before I thought to try the above, so here are the other suggestions in case anyone runs into this problem in the future and the above does not work:
Use .libPaths() to find the proper path where the packages are installed, and then re-run the command with that path: installed.packages(lib.loc = 'your path')
Try debugging it with debug(installed.packages). The expectation is that stepping through it will reveal something wrong in .readPkgDesc(lib, fields). This has not been tried yet, so you may encounter things not covered here.
Try updating R in case it is out of date, with library(installr) and updateR().
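The restart fix is consistent with how installed.packages() works: unless noCache = TRUE, it caches each library's listing as a libloc_*.rds file in the session temp directory, tempdir(), which is the kind of path shown in the error. If something deletes that folder while R is running (for example a cleanup tool emptying %TEMP%), writing the cache fails, and a new session simply gets a fresh temp folder. A minimal sketch for checking this without restarting; note that tempdir(check = TRUE) needs R 3.5.0 or newer, so it would not be available on the R 3.3.2 mentioned in the question.

```r
# installed.packages() keeps per-library cache files (libloc_*.rds)
# in tempdir(); the file named in the error message is one of them.
if (!dir.exists(tempdir())) {
  # Recreate the vanished session temp dir (check = TRUE needs R >= 3.5.0).
  tempdir(check = TRUE)
}

# noCache = TRUE reads the packages' DESCRIPTION files directly and
# neither reads nor writes the libloc_*.rds cache.
pkgs <- installed.packages(noCache = TRUE)
print(head(pkgs[, c("Package", "Version")]))
```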

Error reading a .nc4 file in R (ncdf4 package)

I am trying to use a data set of .nc4 files downloaded from NASA.
The format NCDF4 is confirmed by this source.
I used download.file in R to get the database and then a simple nc_open (ncdf4 package) to test the file. Unfortunately, the result is an "Unknown file format" error.
Here are my replication file and my script:
download.file(url = "http://hydro1.gesdisc.eosdis.nasa.gov/.../url", destfile = "destination_folder/file.nc4")
All fine up to this point, but when testing the files:
library(ncdf4)
setwd('destination_folder')
data <- nc_open('file.nc4')
Error in R_nc4_open: NetCDF: Unknown file format
Error in nc_open("file.nc4") :
Error in nc_open trying to open file file.nc4
Am I missing something?
Thank you.
I do not know what is wrong, but I can add that the problem resides in the Windows implementation of the ncdf4 package. With the following statement:
catlg<-nc_open("http://opendap.deltares.nl/thredds/dodsC/opendap/rijkswaterstaat/waterbase/concentration_of_suspended_matter_in_water/catalog.nc")
I have the same problem as described in the question; however, it works perfectly in R under Linux.
The file server is an OPeNDAP server strictly following netCDF-4 conventions, but maybe some features are not correctly implemented in the ncdf4 package under Windows.
For some reason I get the same error using [64-bit] C:\Program Files\R\R-3.4.2, but with [64-bit] C:\Program Files\R\R-3.3.3 the ncdf4 package works fine.
Not that this solves the problem, but it provides an easy workaround for the time being.
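One more thing worth ruling out before blaming ncdf4 is the downloaded file itself. As far as I know, R's download.file() on Windows has used text mode unless mode = "wb" is given or the URL ends in one of a few known binary extensions, and text mode can corrupt binary data; a server that requires a login may also return an HTML page instead of the data. A minimal sketch that inspects the first bytes of a local file: netCDF-4 files are HDF5 files and begin with the 8-byte HDF5 signature, while classic netCDF files begin with "CDF". sniff_netcdf() is a hypothetical helper, and the URL stays elided as in the question.

```r
# Hypothetical helper: classify a local file by its leading "magic" bytes.
sniff_netcdf <- function(path) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  magic <- readBin(con, what = "raw", n = 8L)
  # HDF5 signature \x89 H D F \r \n \x1a \n, used by netCDF-4 files
  hdf5_sig <- as.raw(c(0x89, 0x48, 0x44, 0x46, 0x0d, 0x0a, 0x1a, 0x0a))
  if (identical(magic, hdf5_sig)) return("netcdf4/hdf5")
  # Classic netCDF files start with the ASCII bytes "CDF"
  if (length(magic) >= 3L && identical(magic[1:3], charToRaw("CDF"))) {
    return("netcdf-classic")
  }
  "unknown (possibly an HTML page or a corrupted download)"
}

# Re-download in binary mode, then check the result:
# download.file(url = "http://hydro1.gesdisc.eosdis.nasa.gov/.../url",
#               destfile = "destination_folder/file.nc4", mode = "wb")
# sniff_netcdf("destination_folder/file.nc4")
```

If the check reports "unknown", opening the file in a text editor usually shows whether it is an HTML error or login page.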

Cannot find function readDICOM, though oro.dicom is installed

I am struggling with reading DICOM files in R. I have installed the oro.dicom package using:
install.packages("oro.dicom", repos="https://cran.cnr.berkeley.edu/")
I have set the working directory to where the files are located.
When trying to read a DICOM file using...
slice=readDICOM("IM-0001-0011.dcm")
... I get the following error message:
Error: could not find function "readDICOM"
Can someone help?
Thank you,
Lena
You should load the package with library(oro.dicom) first, before you use functions from that package.
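The answer as a short script: installing a package only puts it on disk, and library() attaches it so that readDICOM() can be called by its bare name. A minimal sketch using the file name from the question; the file.exists() guard is only there so the script runs even when that file is not in the working directory.

```r
# Install once if needed:
# install.packages("oro.dicom")

# Attach the package so its exported functions, including readDICOM(),
# are found by their bare names.
library(oro.dicom)

# File name from the question; it must be in the working directory.
dcm_file <- "IM-0001-0011.dcm"
if (file.exists(dcm_file)) {
  slice <- readDICOM(dcm_file)
}

# Without attaching, the namespace-qualified call also works:
# slice <- oro.dicom::readDICOM(dcm_file)
```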

Error when using getGEO() in package GEOquery

I'm running the following code in R:
library(GEOquery)
mypath <- "C:/Users/Farzin/Desktop/BIOC"
GDS1 <- getGEO('GDS1',destdir=mypath)
But I'm getting the following error:
Using locally cached version of GDS1 found here:
C:/Users/Farzin/Desktop/BIOC/GDS1.soft.gz
Error in read.table(con, sep = "\t", header = FALSE, nrows = nseries) :
invalid 'nlines' argument
Could anyone please tell me how I could get rid of this error?
I have had the same error using GEOquery (version 2.23.5) with R and Bioconductor on Ubuntu (12.04), whatever GDS file I queried. Could it be that the GEOquery package is faulty?
In my experience, getGEO is extremely finicky. I commonly experience issues connecting to the GEO server. If this happens during a download, getGEO leaves a partial file. Since the partial file is there, when you try to re-download, it uses this cached, partially downloaded file and runs into the error you see (which is what you want, because it's not the full file).
To solve this, delete the cached SOFT file and retry the download.
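The cache cleanup from the last answer can be scripted. A minimal sketch, assuming the cached file is named after the accession, as GDS1.soft.gz is in the question's output; clear_geo_cache() is a hypothetical helper, not part of GEOquery.

```r
# Hypothetical helper (not part of GEOquery): delete cached SOFT files for
# one GEO accession so that getGEO() downloads a fresh copy.
clear_geo_cache <- function(geo_id, destdir) {
  # "^GDS1\\.soft" matches GDS1.soft and GDS1.soft.gz but not GDS10.soft.gz
  pattern <- paste0("^", geo_id, "\\.soft")
  cached <- list.files(destdir, pattern = pattern, full.names = TRUE)
  file.remove(cached)
  invisible(cached)
}

# Usage with the paths from the question:
# library(GEOquery)
# mypath <- "C:/Users/Farzin/Desktop/BIOC"
# clear_geo_cache("GDS1", mypath)
# GDS1 <- getGEO("GDS1", destdir = mypath)
```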

Resources