Extracting receipt data line by line in a certain format - google-cloud-vision

How do I extract text from a receipt line by line with Google Vision?
I have tried matching on the Y coordinates, but the results are not ideal.
Is there an alternative approach?

You should use DOCUMENT_TEXT_DETECTION in the JSON request; the text in the fullTextAnnotation of the response will then contain \n at the line breaks.
If you are open to open-source code, this Git repository may be useful for line segmentation:
line-segmentation-algorithm-to-gcp-vision
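
As a rough illustration of the first suggestion, here is a minimal Python sketch that splits the fullTextAnnotation text on its \n characters. The sample response is hypothetical and heavily trimmed (a real one also carries pages, blocks, and bounding boxes), and the function name receipt_lines is made up for this example.

```python
import json

# Hypothetical, trimmed response body from a DOCUMENT_TEXT_DETECTION request.
SAMPLE_RESPONSE = json.loads(r'''
{"responses": [{"fullTextAnnotation": {"text": "MILK 1.99\nBREAD 2.49\nTOTAL 4.48\n"}}]}
''')


def receipt_lines(response):
    """Return the non-empty lines of the first response's full text."""
    annotation = response["responses"][0].get("fullTextAnnotation", {})
    text = annotation.get("text", "")
    # splitlines() breaks on the \n characters contained in the text field.
    return [line.strip() for line in text.splitlines() if line.strip()]


for line in receipt_lines(SAMPLE_RESPONSE):
    print(line)
```

The line breaks are whatever Vision detected, so an item and its price may still end up on separate lines; that is the case the line segmentation repository is meant to address.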

Related

Using com.opencsv.CSVReader on windows stops reading lines prematurely

I have two files that are identical except for their line endings. The one that uses the newline character (Linux/Unix) works (all 550 rows of data are read), and the one that uses carriage return plus line feed (Windows) stops returning lines after reading 269 lines. In both cases the data is read correctly up to the point where reading stops.
If I run dos2unix on the file that fails, the resulting file works.
I would like to be able to read CSV files regardless of their origin. If I could at least detect that the file is in the wrong format before reading part of the data, that would be helpful.
Even if I could tell in the middle of reading the file that it was not going to work, I could output an error.
My current state of reading half the file and terminating with no error is dangerous.
The problem is that, under the covers, openCSV uses a BufferedReader, which reads a line from the stream until it gets to the system's line.separator.
If you know beforehand what the line separator of the file is, then in your application just call System.setProperty("line.separator", newLine), where newLine is either "\n" or "\r\n" depending on the file you are about to parse. Or you can pass it in as a parameter.
If you want to detect the file's line ending automatically, create a method that takes the file you want, creates a BufferedReader, and reads a single line. If the last character is a '\r', then your system uses "\n" but you want to set it to "\r\n". Else, if line.contains("\n") returns true, then you are on a system that uses "\r\n" and you want to set it to "\n". Otherwise the system and the file you are reading have compatible line endings.
Just note that if you do change the system line separator, be sure to set it back after processing the file, in case your program is processing multiple files.
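
The detection step can also be done on the raw bytes, before any reader has interpreted the line endings. The following is a minimal Python sketch of that idea, not openCSV code; the function name detect_line_ending and its return labels are made up for this example.

```python
def detect_line_ending(data: bytes) -> str:
    """Classify the line endings in raw file bytes.

    Returns "CRLF", "LF", "CR", "mixed", or "none".
    """
    crlf = data.count(b"\r\n")
    # Subtract the CRLF pairs so lone \n and lone \r are counted separately.
    lf = data.count(b"\n") - crlf
    cr = data.count(b"\r") - crlf
    found = [name for name, n in (("CRLF", crlf), ("LF", lf), ("CR", cr)) if n > 0]
    if not found:
        return "none"
    if len(found) > 1:
        return "mixed"
    return found[0]


print(detect_line_ending(b"a,b\r\nc,d\r\n"))
print(detect_line_ending(b"a,b\nc,d\n"))
```

For a real file, the bytes would come from open(path, "rb").read(); opening in binary mode matters, because text mode may translate the line endings before they can be counted.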

Treating "#" as a regular character when reading data

I'm almost certain this has been asked before, but due to a certain social media app I'm drowning in unrelated search results.
The data set that I'm importing contains actual "#" characters, as in Apartment #404, and I'd like to preserve the character if possible, but R thinks it marks the end of a line or something. At first it would bomb out on the first occurrence; then I set fill=TRUE, and now it just ignores the rest of the line after that.
How does one instruct R to treat "#" as a regular character?
If you are not using "#" as a comment symbol in your data, you can use
read.table(..., comment.char="")
That should treat "#" like any other character.

HackerRank's text processing units in UNIX -- how to take input?

I am trying to learn the text processing units in Unix through hackerrank.com's practice problems.
One of the problems is this: https://www.hackerrank.com/challenges/text-processing-cut-4
But I don't understand how to take input from standard input in Unix, so I'm having trouble submitting my answer. Can you please let me know how to get started with taking input? Thanks
In this problem we don't have to take input explicitly; we just have to display the first four characters of each line of text.
Solution:
cut -c1-4
I know that this is an old question, but I'll answer anyway, so that anyone who needs a response can find one.
As you can read on the man page, the synopsis is cut [OPTION]... [FILE]..., so the FILE argument is optional (because it's surrounded by square brackets).
In addition:
With no FILE, or when FILE is -, read standard input.
So if you simply call the cut command without specifying a file, it will read from STDIN.
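
To make the "reads from standard input" idea concrete, here is a small Python sketch that does roughly what cut -c1-4 does. It is only an illustration of the concept, not a replacement for cut; the function name first_chars is made up, and an io.StringIO object stands in for standard input so the example runs on its own.

```python
import io


def first_chars(lines, n=4):
    """Yield the first n characters of each line, without the trailing newline."""
    for line in lines:
        yield line.rstrip("\n")[:n]


# Stand-in for standard input; in a real script, pass sys.stdin instead.
fake_stdin = io.StringIO("Hello\nHi\nWorld\n")
for out in first_chars(fake_stdin):
    print(out)
```

Passing sys.stdin instead of the stand-in lets the script be fed the same way as cut, for example through a pipe or a < redirect.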

Is there a way to output line separator values in sqlite text output

I've got text like this in one of my sqlite table columns:
Mantas are found in temperate, subtropical and tropical waters. Both species are pelagic; M. birostris migrates across open oceans, singly or in groups, while M. alfredi tends to be resident and coastal. They are filter feeders and eat large quantities of zooplankton, which they swallow with their open mouths as they swim. Gestation lasts over a year, producing live pups.
Mantas may visit cleaning stations for the removal of parasites. Like whales, they breach, for unknown reasons.
The last two lines are broken from the previous one by either a \r or a \n. I want to be able to see the actual \r or \n value in the shell output of the column. Any ideas?
There doesn't seem to be any way to directly do this in the SQLite shell, but you can use .output <file> to output the result to a file, then use a text or hex editor to see what the line endings are.
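
Outside the shell, another option (a different technique from the .output approach) is to read the value from a script and print its escaped form. Below is a minimal Python sketch using the standard sqlite3 module; the table name articles, the column name body, and the sample text are assumptions made for this example.

```python
import sqlite3

# In-memory database with a hypothetical table, so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (body TEXT)")
conn.execute(
    "INSERT INTO articles VALUES (?)",
    ("Gestation lasts over a year.\r\nMantas may visit cleaning stations.\nLike whales, they breach.",),
)


def visible_line_endings(connection):
    """Return each body value with \\r and \\n shown as escape sequences."""
    rows = connection.execute("SELECT body FROM articles").fetchall()
    # repr() renders control characters as escapes instead of breaking the line.
    return [repr(body) for (body,) in rows]


for shown in visible_line_endings(conn):
    print(shown)
```

For an existing database file, sqlite3.connect would take the file path instead of ":memory:", and the CREATE and INSERT statements would be dropped.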

R: Extract value and lines after key word (text file mining)

Setting:
I have (simple) .csv and .dat files created by laboratory devices and other programs, storing information on measurements or calculations. I have found solutions for this in other languages, but not for R.
Problem:
Using R, I am trying to extract values to quickly display results without opening the created files. I have two typical cases:
a) I need to read a priori unknown values after known key words
b) I need to read lines after known key words or lines
I can't make functions such as scan() and grep() work.
c) Finally, I would like to loop over dozens of files in a folder and get a summary (to make the picture complete: I will manage this part)
I would appreciate any form of help.
OK, it works for the key value (although perhaps not very nicely):
ks <- scan("file.csv", what=character(), sep="")
returns a character vector of everything.
ks[grep("keyword", ks)+2] # + 2 as the actual value is stored two places ahead
returns the sought values as characters.
as.numeric(lapply(ks, gsub, patt=",", replace="."))
For completeness: the data had to be converted to numbers, and the "," versus "." decimal separator problem needed to be solved.
In one line:
data=as.numeric(lapply(ks[grep("Ks_Boden", ks)+2], gsub, patt=",", replace="."))
Perseverance is not too bad of an asset ;-)
The rest isn't finished yet; I will post once it is.
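
For comparison, the same keyword-plus-offset idea can be sketched in Python. This is only an illustration of the technique, not a translation of a finished solution: the sample text, the function name value_after_keyword, and the offset default are assumptions chosen to mirror the R lines above.

```python
def value_after_keyword(tokens, keyword, offset=2):
    """Return the numeric values found `offset` tokens after each keyword match."""
    values = []
    for i, token in enumerate(tokens):
        if keyword in token and i + offset < len(tokens):
            # Convert a decimal comma to a decimal point before parsing.
            values.append(float(tokens[i + offset].replace(",", ".")))
    return values


# Hypothetical file content; split() breaks on whitespace, like scan(..., sep="").
text = "Probe A Ks_Boden = 0,35 cm/d\nProbe B Ks_Boden = 1,2 cm/d"
tokens = text.split()
print(value_after_keyword(tokens, "Ks_Boden"))
```

Keeping the offset as a parameter reflects the "+ 2" in the R code: it depends on how many tokens sit between the keyword and its value in a given file format.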
