I am following https://github.com/keensoft/alfresco-simple-ocr to perform OCR on TIFF and JPEG files, but it reports "Couldn't find trailer dictionary", "Couldn't read xref table" and " exception Failure("Error: pdfinfo could not determine number of pages. Check the pdf input file.\n")". The transformation from JPEG or TIFF files to PDF works properly and the PDF file is visible on the Alfresco Share page, but no OCR is applied to those TIFF and JPEG files.
There are many tools that can perform OCR on PDF files, and the result depends on the tool as well. There is also a bug in Alfresco, which is a library issue. Details are below.
Create a file called transformation.sh and, before adding your command to it, add the line below. If you are using Windows, you need to create a batch file accordingly.
unset LD_LIBRARY_PATH
If you do not set this in the script file, you will get an error during conversion. The bug details are at the link below; it is a registered Alfresco issue.
https://issues.alfresco.com/jira/browse/ALF-19946
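The answer's fix is a shell wrapper. Purely to illustrate the same idea, here is a Python sketch that starts a command with a copy of the environment from which LD_LIBRARY_PATH has been removed, so the child process does not load libraries from the directories that variable lists. It assumes, as the answer implies, that the conflicting libraries are picked up through LD_LIBRARY_PATH; the function names are hypothetical and nothing here is Alfresco API.

```python
import os
import subprocess
from typing import Mapping, Optional, Sequence


def env_without_ld_library_path(base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment (os.environ by default) without LD_LIBRARY_PATH."""
    source = os.environ if base is None else base
    return {key: value for key, value in source.items() if key != "LD_LIBRARY_PATH"}


def run_without_ld_library_path(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    """Run cmd directly (no shell) with LD_LIBRARY_PATH unset; raise if it exits non-zero."""
    return subprocess.run(
        list(cmd),
        env=env_without_ld_library_path(),
        check=True,
        capture_output=True,
        text=True,
    )
```

For example, run_without_ld_library_path(["/path/to/ocr-command", "in.pdf", "out.pdf"]), where the command and file names are placeholders for whatever your transformation script runs.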
PDF-to-PDF conversion is explained in detail at the link below.
http://www.krutikjayswal.com/2016/07/ocr-on-pdf-file-in-alfresco.html
You might need to change the source code for TIFF conversion.
I read a jpg image into R as well as all of its EXIF metadata.
Then I do some manipulation to the image and write the output as jpg.
How could I copy all EXIF information to the new jpg file?
(I understand I will need to modify only few tags).
I found useful information on how to read EXIF metadata in R
(e.g., packages Thermimage or exiftoolr), but not on how to write it.
The exiftoolr package lets you call exif_call() to run any command that the underlying exiftool can run. Examples of exiftool commands are at https://exiftool.org/examples.html. For instance, to run the following command from R, you would translate
exiftool -artist="Phil Harvey" -copyright="2011 Phil Harvey" a.jpg
into
exiftoolr::exif_call(args=c(
'-artist="Phil Harvey"',
'-copyright="2011 Phil Harvey"'
),
path="a.jpg")
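Since the question is about copying all EXIF data from one JPEG to another, here is a hedged Python sketch that drives the exiftool command line directly. It assumes exiftool is installed and on the PATH, and relies on exiftool's -TagsFromFile option with -all:all to copy metadata from the source image into the output image, followed by optional tag assignments such as -artist=...; check the exiftool documentation for how those assignments interact with the copied tags. Because the arguments are passed as a list with no shell involved, the values need no extra quotes around "Phil Harvey". The same argument list could presumably be handed to exif_call() as args, with the output file as path. The helper names are hypothetical.

```python
import shutil
import subprocess
from typing import List, Mapping, Optional


def exiftool_copy_args(src: str, dst: str,
                       overrides: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build an exiftool argument list that copies all tags from src into dst,
    then applies explicit TAG=VALUE assignments (e.g. {"artist": "Phil Harvey"})."""
    args = ["exiftool", "-TagsFromFile", src, "-all:all"]
    for tag, value in (overrides or {}).items():
        # No shell is involved, so the value needs no surrounding quotes.
        args.append(f"-{tag}={value}")
    args.append(dst)
    return args


def copy_exif(src: str, dst: str, overrides: Optional[Mapping[str, str]] = None) -> None:
    """Run exiftool; by default it keeps a backup of dst named dst + '_original'."""
    if shutil.which("exiftool") is None:
        raise FileNotFoundError("exiftool was not found on the PATH")
    subprocess.run(exiftool_copy_args(src, dst, overrides), check=True)
```

For example, copy_exif("original.jpg", "processed.jpg", {"copyright": "2011 Phil Harvey"}), with hypothetical file names.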
I am trying to run the tutorial for written exams in the exams package (http://www.r-exams.org/tutorials/exams2nops/). Everything works fine until I process the scanned documents using the nops_scan function, which should normally create a ZIP file.
In the console it says "Creating ZIP file", but nothing happens after that. Last output:
> nops_scan(dir = "nops_scan")
Loading required namespace: png
Reading PNG files:
nops_scan1.png: Trimming PNG, rotating PNG, extracting information, done.
nops_scan2.png: Trimming PNG, rotating PNG, extracting information, done.
Creating ZIP file:
... and then nothing happens.
I have tried to run dir("nops_scan") and it confirms that no zip file has been generated and placed in this folder.
The files in the tutorial are PNG files, so what the tutorial says about running pdftk and ImageMagick should not apply. From the tutorial: "Note that if there were PDF files that need to be scanned, then the PDF toolkit pdftk and the function convert from ImageMagick need to be available outside of R on the command line."
Could the problem still be related to the above comment about pdftk or ImageMagick? (Which program is used to create the ZIP files?) I do not know how to make these programs "available outside of R", so instructions on this would be appreciated!
The zip() function from base R's utils package is used; see ?zip. It calls an external zip program, so if you are on Windows you may need to install Rtools, which is available from CRAN at https://CRAN.R-project.org/bin/windows/Rtools/.
pdftk and ImageMagick are not involved in this case; they are only needed to convert PDF files to PNG, which can then be processed in R. (In case anybody else is looking for this information, http://www.R-exams.org/tutorials/installation/ provides links to installation files for these applications.)
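As a quick diagnostic, a Python sketch that resolves the program utils::zip() would call by default (the R_ZIPCMD environment variable, falling back to "zip") on the PATH; from inside R, Sys.which("zip") gives similar information. If it returns None on Windows, installing Rtools or another zip program and making sure it is on the PATH is the likely fix. The function name is hypothetical.

```python
import os
import shutil
from typing import Optional


def find_zip_command() -> Optional[str]:
    """Locate the program R's utils::zip() calls by default.

    utils::zip() defaults to zip = Sys.getenv("R_ZIPCMD", "zip"); this mirrors that
    lookup and resolves it on the PATH, returning None if nothing is found."""
    cmd = os.environ.get("R_ZIPCMD", "zip")
    return shutil.which(cmd) if cmd else None
```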
I am trying to read the JPEG table from a TIFF file to locate sub-images in the TIFF file. (It comes from a whole-slide image SVS file, and I am trying to delete the label and macro images.) The JPEG table is hex encoded, and I can't figure out how to turn it into readable information to locate the sub-images.
I have tried unpacking the values. I don't want to save the file and open it in Linux; I want to do this from within a Jupyter notebook. I tried for a while using "unpack" from IO core tools, which didn't work. I also briefly tried BeautifulSoup, but it tells me that there is an invalid start byte. Here's the first line I am trying to decode:
b'\xff\xd8\xff\xdb\x00C\x00'
This line should return something like "JPEG image file...". I think if I can translate this line, I can decode the rest of this JPEG table.
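The bytes shown are not text but JPEG marker segments: 0xFF 0xD8 is the SOI (start of image) marker, 0xFF 0xDB starts a DQT (define quantization table) segment, and the next two bytes (0x00 0x43, which Python displays as \x00C) are that segment's big-endian length, 67. A minimal sketch that walks the markers with struct, assuming the input is a JPEG stream (such as the tables-only stream in a TIFF JPEGTables tag) and naming only a few common markers:

```python
import struct
from typing import List, Optional, Tuple

# A few marker codes from the JPEG standard (ITU-T T.81); others are shown as hex.
MARKER_NAMES = {
    0xD8: "SOI (start of image)",
    0xD9: "EOI (end of image)",
    0xDB: "DQT (define quantization table)",
    0xC4: "DHT (define Huffman table)",
    0xC0: "SOF0 (baseline frame)",
    0xDA: "SOS (start of scan)",
}


def list_jpeg_markers(data: bytes) -> List[Tuple[int, str, Optional[int]]]:
    """Return (offset, marker name, segment length) for each marker segment.

    Stops at EOI, at SOS (entropy-coded data follows), or when the data runs out."""
    markers = []
    i = 0
    while i + 1 < len(data):
        if data[i] != 0xFF:
            raise ValueError(f"expected 0xFF at offset {i}, found {data[i]:#04x}")
        code = data[i + 1]
        name = MARKER_NAMES.get(code, f"marker 0xFF{code:02X}")
        if code == 0xD8 or code == 0xD9 or 0xD0 <= code <= 0xD7:
            # SOI, EOI and RSTn are standalone markers without a length field.
            markers.append((i, name, None))
            i += 2
            if code == 0xD9:
                break
            continue
        if i + 4 > len(data):
            markers.append((i, name, None))  # length field is truncated
            break
        # The 2-byte big-endian length counts itself but not the marker bytes.
        (length,) = struct.unpack_from(">H", data, i + 2)
        markers.append((i, name, length))
        if code == 0xDA:
            break
        i += 2 + length
    return markers
```

Applied to b'\xff\xd8\xff\xdb\x00C\x00', it reports an SOI marker at offset 0 and a DQT segment of length 67 at offset 2.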
I used a Python TIFF package to help find the pages of the TIFF file I was looking for.
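For a rough idea of what such a package does: a TIFF file is a chain of image file directories (IFDs), one per page or sub-image, and in SVS files the thumbnail, label and macro images are typically stored as separate IFDs. The following dependency-free Python sketch walks that chain with struct and reports each IFD's offset plus its ImageWidth, ImageLength, NewSubfileType and Compression tags. It assumes a classic (non-BigTIFF) file already read into memory and decodes only single-value SHORT/LONG tags; a maintained package such as tifffile handles the full format.

```python
import struct
from typing import Dict, List, Optional

# Baseline TIFF tag numbers used below.
NEW_SUBFILE_TYPE, IMAGE_WIDTH, IMAGE_LENGTH, COMPRESSION = 254, 256, 257, 259


def list_tiff_ifds(data: bytes) -> List[Dict[str, Optional[int]]]:
    """Walk the IFD chain of a classic TIFF held in memory, one dict per IFD."""
    if data[:2] == b"II":
        e = "<"  # little-endian byte order
    elif data[:2] == b"MM":
        e = ">"  # big-endian byte order
    else:
        raise ValueError("not a TIFF file")
    (magic,) = struct.unpack_from(e + "H", data, 2)
    if magic != 42:
        raise ValueError(f"unsupported TIFF version {magic} (BigTIFF uses 43)")
    (offset,) = struct.unpack_from(e + "I", data, 4)
    ifds = []
    seen = set()
    while offset and offset not in seen:  # 0 ends the chain; `seen` guards against loops
        seen.add(offset)
        (count,) = struct.unpack_from(e + "H", data, offset)
        tags = {}
        for k in range(count):
            # Each 12-byte entry: tag, field type, value count, 4-byte value/offset.
            tag, typ, n, raw = struct.unpack_from(e + "HHI4s", data, offset + 2 + 12 * k)
            if n == 1 and typ == 3:  # one SHORT, stored in the first 2 bytes
                tags[tag] = struct.unpack(e + "H", raw[:2])[0]
            elif n == 1 and typ == 4:  # one LONG
                tags[tag] = struct.unpack(e + "I", raw)[0]
        ifds.append({
            "ifd_offset": offset,
            "width": tags.get(IMAGE_WIDTH),
            "length": tags.get(IMAGE_LENGTH),
            "subfile_type": tags.get(NEW_SUBFILE_TYPE),
            "compression": tags.get(COMPRESSION),
        })
        (offset,) = struct.unpack_from(e + "I", data, offset + 2 + 12 * count)
    return ifds
```

Usage would look like list_tiff_ifds(open("slide.svs", "rb").read()), with a hypothetical file name; comparing the sizes reported for each IFD is one way to spot the small label and macro images.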
I am trying to download the 'Landsat.rar' file (which includes 6 Landsat bands) and unzip it directly in R, but it doesn't work as I expected. Thank you for your help!
library(raster)
ls_url<-"https://github.com/tuyenhavan/Landsat-Data/blob/LS7/Landsat.rar"
temp<-tempfile()
download.file(ls_url,temp)
unzip(temp,"tif$")
myls<-stack("tif$")
If you are using Windows in particular, you may need to use binary mode in download.file():
download.file(ls_url, temp, mode="wb")
otherwise the file gets corrupted.
Also, the URL you are using is incorrect: it is the one for the web interface. To get the file itself, use the following URL (check the link behind the "Download" button):
https://github.com/tuyenhavan/Landsat-Data/raw/LS7/Landsat.rar
Finally, unzip() cannot handle RAR archives. If you created this archive yourself, use the ZIP format instead; otherwise, extract the file with another program (which you could call from R using system()).
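The same pitfalls apply outside R. Here is a Python sketch that assumes the archive has been repacked as ZIP. It downloads raw bytes and checks the first bytes (the file's magic number) to tell a ZIP archive from a RAR archive or from an HTML page, which is what the web-interface URL returns. It then extracts only the .tif members. The function names are hypothetical.

```python
from __future__ import annotations

import io
import urllib.request
import zipfile
from pathlib import Path


def sniff_archive(data: bytes) -> str:
    """Guess the file type from its first bytes (magic numbers)."""
    if data.startswith((b"PK\x03\x04", b"PK\x05\x06")):
        return "zip"
    if data.startswith(b"Rar!\x1a\x07"):
        return "rar"
    if data.lstrip()[:1] == b"<":
        return "html"  # e.g. a web-interface page instead of the raw file
    return "unknown"


def download_bytes(url: str) -> bytes:
    """Fetch the response body as raw bytes (nothing is decoded as text)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def extract_tifs(data: bytes, dest: Path) -> list[Path]:
    """Extract only the .tif/.tiff members of a ZIP archive into dest."""
    kind = sniff_archive(data)
    if kind != "zip":
        raise ValueError(f"expected a ZIP archive, got {kind!r}")
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            if name.lower().endswith((".tif", ".tiff")):
                extracted.append(Path(zf.extract(name, dest)))
    return extracted
```

Passing the result of download_bytes() for the raw URL above to extract_tifs() should raise a ValueError naming 'rar' as long as the file is still a RAR archive, which makes the format problem explicit instead of failing silently.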
Is there a parser tool for HAR (HTTP Archive) files that generates CSV or Excel output of page loading times? I know there are HAR viewers, but I need the output as CSV for plotting.
Note: It is easy enough to write a parser and generate the CSV output (which I have done), but before reinventing the wheel, I want to check for existing tools.
Yes, there is har2csv, a command-line tool, available here, and you can download the zip file here.
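For comparison with the parser mentioned in the question, here is a minimal Python sketch using only json and csv. It reads the log.pages array of a HAR file and writes one CSV row per page. Each row holds the onContentLoad and onLoad values from pageTimings (milliseconds, with -1 meaning unavailable in the HAR 1.2 format) and the number of entries whose pageref points at that page. The function name and column names are arbitrary choices.

```python
import csv
import io
import json
from collections import Counter


def har_pages_to_csv(har_text: str) -> str:
    """Convert a HAR document (as a JSON string) into CSV text, one row per page."""
    log = json.loads(har_text)["log"]
    # Count requests per page via each entry's optional "pageref" field.
    requests_per_page = Counter(e.get("pageref") for e in log.get("entries", []))
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["page_id", "title", "started",
                     "on_content_load_ms", "on_load_ms", "requests"])
    for page in log.get("pages", []):
        timings = page.get("pageTimings", {})
        writer.writerow([
            page.get("id", ""),
            page.get("title", ""),
            page.get("startedDateTime", ""),
            timings.get("onContentLoad", -1),
            timings.get("onLoad", -1),
            requests_per_page.get(page.get("id"), 0),
        ])
    return out.getvalue()
```

Usage: read a HAR file as text (for example, opened with encoding="utf-8") and write the returned string to a .csv file opened with newline="", so the csv module controls the line endings.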