Google Cloud Vision API error reading PDF - google-cloud-vision

I am currently trying to process a large PDF document using the Google Cloud Vision API. When reading the document, I receive an error at the json_format.Parse( call. I have attached my code below. How can I fix this?

You are getting the error on that line of code because you are passing json_string, which has type <class 'bytes'>, together with a non-existent object, vision.types.AnnotateFilesResponse() (the message type is AnnotateFileResponse), to json_format.Parse(), which is documented as follows:
google.protobuf.json_format.Parse(text, message, ignore_unknown_fields=False, descriptor_pool=None)
Parses a JSON representation of a protocol message into a message.
Parameters:
text – Message JSON representation.
message – A protocol buffer message to merge into.
ignore_unknown_fields – If True, do not raise errors for unknown fields.
descriptor_pool – A Descriptor Pool for resolving types. If None, use the default.
Returns: The same message passed as argument.
Raises: ParseError: On JSON parsing problems.
Since your goal is to read the response of async_batch_annotate_files(), note that this method saves its JSON response to the Cloud Storage bucket output location you defined. You can simply read the data in json_string and parse it into a dictionary with json.loads(), then work your way through the dictionary by referring to the AnnotateFileResponse reference, using the code below:
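import json

# blob_list: the output blobs that async_batch_annotate_files() wrote to your bucket
output = blob_list[0]
json_string = output.download_as_string()  # returns bytes (download_as_bytes() in newer google-cloud-storage)
response = json.loads(json_string)  # json.loads accepts bytes directly
first_page_response = response['responses'][0]
annotation = first_page_response['fullTextAnnotation']
print('Full text:\n')
print(annotation['text'])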
NOTE: Make sure you are reading the correct JSON response file (output = blob_list[0]); otherwise, parsing the results will yield an error.
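A slightly more defensive variant of the same idea, as a sketch: it assumes the output file has the AnnotateFileResponse JSON layout used above (a top-level "responses" list whose entries may carry "fullTextAnnotation"), and the function name is hypothetical. Instead of reading only the first page, it joins the text of every page in the file and skips pages that have no fullTextAnnotation (for example, pages where no text was detected) instead of raising a KeyError.

```python
import json


def extract_full_text(json_bytes):
    """Join the fullTextAnnotation text of every page in one Vision output file.

    json_bytes is what blob.download_as_string() returns for one output blob.
    Pages without a fullTextAnnotation entry are skipped.
    """
    response = json.loads(json_bytes)  # bytes are accepted (Python 3.6+)
    texts = []
    for page_response in response.get("responses", []):
        annotation = page_response.get("fullTextAnnotation")
        if annotation is not None:
            texts.append(annotation.get("text", ""))
    return "\n".join(texts)


# Made-up sample shaped like an output file with three pages, one without text.
sample = json.dumps({
    "responses": [
        {"fullTextAnnotation": {"text": "Page one"}},
        {},
        {"fullTextAnnotation": {"text": "Page two"}},
    ]
}).encode("utf-8")
print(extract_full_text(sample))
```

Using .get() rather than indexing trades a loud failure for a silent skip, so if an empty result would be surprising for your documents, check for it explicitly.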

Related

Firebase Realtime DB "orderBy must be a valid JSON encoded path"

I'm writing a REST query for an app but I'm suddenly experiencing an error I've never gotten before. When I try to sort responses by their timestamp I get the error:
error: "orderBy must be a valid JSON encoded path"
My URL looks like https://{db url}.firebaseio.com/users/{user id}/surveys.json?auth={auth token}
My rules are set up like this:
And the database is structured like this:
If I add ?orderBy="timestamp" the error shows up.
I am using correct quotation marks in the query and have data indexed by timestamp in my rules. What could be happening here? Why would this suddenly no longer work after using it for a long time?
Because your URL already has a query string (?auth=...), orderBy has to be appended with & rather than a second ?.
If you are using curl:
curl 'https://{db url}.firebaseio.com/users/{user id}/surveys.json?auth={auth token}&orderBy="timestamp"'
For a fetch URL:
https://{db url}.firebaseio.com/users/{user id}/surveys.json?auth={auth token}&orderBy="timestamp"
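When the URL is assembled in code, letting a URL-encoding helper build the query string avoids the wrong separator and also percent-encodes the double quotes around the key (as %22). A minimal Python sketch; the database URL, user id and token below are placeholders, and the helper name is hypothetical:

```python
from urllib.parse import urlencode


def build_query_url(base_url, auth_token, order_by):
    # urlencode joins every parameter after the first with "&" and
    # percent-encodes the double quotes that orderBy expects around the key.
    params = {"auth": auth_token, "orderBy": '"{}"'.format(order_by)}
    return base_url + "?" + urlencode(params)


url = build_query_url(
    "https://example-db.firebaseio.com/users/some-user-id/surveys.json",  # placeholder
    "AUTH_TOKEN",
    "timestamp",
)
print(url)
# https://example-db.firebaseio.com/users/some-user-id/surveys.json?auth=AUTH_TOKEN&orderBy=%22timestamp%22
```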

Why is MS Graph truncating the JSON response?

I'm processing M365 mailbox messages via MS Graph. I'm using .NET 5 and the latest version of the MS Graph SDK for .NET (in particular the PageIterator for processing email messages), but I actually experience the issue even with a plain call from Postman: in some cases the response is just truncated abnormally, so the response JSON cannot be parsed.
One example: ~56k messages are processed successfully, then, while the iterator tries to get the next page, I get a JSON parsing exception (sometimes an unclosed string, sometimes an unexpected character). The point of failure looks random to me: for some mailboxes it is around 56k, for others at 78k, but almost always above 50k.
If I take the actual next-page link from the iterator while catching the exception, I can reproduce the issue in Postman: the response is truncated.
If I query the truncated message separately via its id, the full message is available in the response.
An example call that fails, with the response payload JSON truncated (even though the call succeeds with HTTP 200):
https://graph.microsoft.com/v1.0/me/messages?$orderby=receivedDateTime+ASC&$select=ToRecipients,CcRecipients,Subject,From,Body,HasAttachments,ReceivedDateTime&$expand=attachments($select=name)&$top=32&$skip=57454
The end of the result json:
"#odata.etag": "someetaghere",
"id": "someiidhere",
"receivedDateTime": "2017-03-19T09:15:42Z",
"hasAttachments": false,
"subject": "Fwd: Contrat Morval",
"body": {
"contentType": "text",
"content":"Some text just an example which ends somewhere in the middle of the text
Some UPDATES for this particular case:
In Postman:
if I remove "Body" from the $select list of the above query, it consistently fails with "503 Service Unavailable" after long response times (up to roughly 20 seconds)
unless I set the $top parameter to 31 or lower; then everything works, regardless of whether "Body" is included in the $select list or not
if I use $top > 31 with "Body" included, the response payload is always truncated at the same position, in the 24th item of the result array, regardless of the value of $top
I hoped that by using 30 as the page size in my Graph SDK code I could forget this bug :), but unfortunately there I receive "503 Service Unavailable" for the same query that succeeds in Postman, with the message
Code: generalException
Message: Unexpected exception returned from the service.
ClientRequestId: 610103aa-ac07-4b8b-b7af-0aa7bdbcce0e
The timestamp from the response headers: Thu, 03 Dec 2020 10:49:36 GMT
Any help would be appreciated: how can I ensure that the messages are loaded correctly? I thought about throttling, or some quota, message size limit or other restriction, but I could not find anything, and now I can reproduce the issue in Postman at any time.
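Code cannot fix a service-side truncation, but given the observation above that smaller pages come back intact, a client could at least detect a truncated page and retry it with a smaller page size. The Python sketch below is only an illustration under those assumptions: fetch_text(url) is a hypothetical helper (authentication and HTTP are left out), $expand is omitted, and it pages with $skip/$top directly instead of following @odata.nextLink as the SDK's PageIterator does.

```python
import json
from urllib.parse import urlencode

GRAPH_MESSAGES_URL = "https://graph.microsoft.com/v1.0/me/messages"
SELECT = "ToRecipients,CcRecipients,Subject,From,Body,HasAttachments,ReceivedDateTime"


def page_url(skip, top):
    # Same $orderby/$select as the failing request above; safe="$," keeps
    # "$" and "," literal, as in the original URL.
    params = {"$orderby": "receivedDateTime ASC", "$select": SELECT, "$top": top, "$skip": skip}
    return GRAPH_MESSAGES_URL + "?" + urlencode(params, safe="$,")


def fetch_page(fetch_text, skip, top, min_top=1):
    """Return (messages, top_used), halving $top while the body is not valid JSON.

    fetch_text is a hypothetical callable: url -> response body (str). It is
    expected to raise on non-200 responses such as 503; those are not retried here.
    """
    while True:
        body = fetch_text(page_url(skip, top))
        try:
            return json.loads(body)["value"], top
        except json.JSONDecodeError:
            if top <= min_top:
                raise  # even the smallest page is truncated: give up
            top = max(min_top, top // 2)
```

Because a retried page can be smaller than requested, the caller should advance $skip by len(messages) rather than by the original $top.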

Get Analyse Form Result API is returning error code 3003

I used the form labelling tool to train my model. I got the modelID and ran the Analyse Form API successfully, but when I called Get Analyse Form Result, I got the error code:
3003 "OCR extraction error: [Wrong response code: FailedToDownloadImage. Message: Failed to download image from input URL..]"
I haven't tested the model on any of these 5 pictures that I used for training purposes. Instead, I used 3 completely new documents.
Any idea how I could get this to work?
This is the form I analysed (pdf)
When you submit the 3 new documents for analysis, do you submit them from your Azure storage blob, from the local file system, or from somewhere else via a URL? If it's the last case (URL), the current service has a bug. You could try the first 2 options and see if they solve your problem.
-xin (Form Recognizer Team)
Check that your URL follows URL-encoding rules.
This error can be thrown when you send a URL that is not URL-encoded.
For example, spaces need to be replaced by %20.
For instance, this URL:
"https://test.com/Attachments/Recognized 3728_001.pdf"
needs to be changed to
"https://test.com/Attachments/Recognized%203728_001.pdf"
Check this link for other cases:
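If the URL is built in code, the path can be percent-encoded before the request is sent. A minimal sketch using only the Python standard library (the function name is hypothetical); it encodes only the path, so the scheme, host and any query string are left untouched:

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url_path(url):
    # Percent-encode only the path component (a space becomes %20).
    # Do not apply this to a URL that is already encoded, or "%20"
    # would be encoded again as "%2520".
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=quote(parts.path)))


print(encode_url_path("https://test.com/Attachments/Recognized 3728_001.pdf"))
# https://test.com/Attachments/Recognized%203728_001.pdf
```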

Is a corrupted file an Invalid Argument?

I'm programming a service with a team. The service receives a file as a byte array and returns a response. We are expecting a specific type of file (PDF, Word, Excel, TXT, etc.).
We are discussing what type of exception to throw if the file is corrupted or invalid (a 0-byte PDF file, for example).
We are using gRPC as the communication protocol, so I'm thinking of returning an Invalid Argument status code, but a coworker disagrees with me and proposes using the Unknown status code.
Which scenarios allow me to use the Invalid Argument status code?
UNKNOWN should be reserved for cases when you don't know what sort of failure happened; this normally happens when converting errors from one type to another and it isn't clear what the original error implied.
INVALID_ARGUMENT's documentation:
// The client specified an invalid argument. Note that this differs
// from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments
// that are problematic regardless of the state of the system
// (e.g., a malformed file name).
That's exactly the case presented here, where the server does not consider the input valid.
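To make this concrete, here is a Python sketch of the kind of input check that maps an empty or malformed upload to INVALID_ARGUMENT. It uses a small stand-in enum instead of grpc.StatusCode so it runs without grpcio, and the signature table is an assumption for illustration: PDF files start with %PDF-, DOCX/XLSX are ZIP archives starting with PK\x03\x04, and plain text has no signature. A matching signature is only a cheap first check, not proof that the file is well formed. In a real grpcio servicer, the code would be passed to context.abort(grpc.StatusCode.INVALID_ARGUMENT, details).

```python
from enum import Enum
from typing import Tuple


class Code(Enum):
    # Stand-in for grpc.StatusCode; the numeric values match the gRPC codes.
    OK = 0
    INVALID_ARGUMENT = 3


# Assumed leading bytes per file type; b"" means "no signature to check".
MAGIC = {
    "pdf": b"%PDF-",
    "docx": b"PK\x03\x04",
    "xlsx": b"PK\x03\x04",
    "txt": b"",
}


def validate_upload(data: bytes, file_type: str) -> Tuple[Code, str]:
    # Every failure here depends only on the request itself, not on server
    # state, which is what distinguishes INVALID_ARGUMENT from FAILED_PRECONDITION.
    if not data:
        return Code.INVALID_ARGUMENT, "file is empty"
    magic = MAGIC.get(file_type)
    if magic is None:
        return Code.INVALID_ARGUMENT, "unsupported file type: " + file_type
    if not data.startswith(magic):
        return Code.INVALID_ARGUMENT, "file is not a valid " + file_type
    return Code.OK, ""
```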

Posting binary buffer payload using Node-RED

I am trying to send a byte array through POST using Node-RED. I can successfully create the buffer using this module and store it in msg.payload. However, I can't figure out how to add it as a parameter in an http request node.
The receiving application requires enclosing quotes, so I use the payload in the following URL: localhost:port/path?var=\"{{payload}}\", but it gives
"Error converting http params to args: invalid character '\' looking for beginning of value"
If I use it in the request URL without quotes, localhost:port/path?var={{payload}}, nothing gets through (as I can see on the other end).
I am using Protobuf because of the application on the other side, but I've also tried creating a buffer as described here. However, nothing changes.
POSTs should not have arguments in the URL. The data should all be in the body.
You need to make msg.payload an object with keys matching the argument names, for example in a function node:
// `buffer` is the Buffer you created earlier
msg.payload = {
    var: buffer
};
return msg;
You will probably have to play around with the Content-Type header: by default, I believe, Node-RED will send a JSON body, and you probably want application/x-www-form-urlencoded.
You can set the headers by adding a msg.headers object.
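To see what a form-encoded body carrying raw bytes looks like on the wire, here is a small Python sketch. It is not Node-RED code, and the payload bytes are made up for illustration (the protobuf encoding of field 1 set to 150). Whether the receiving application turns the escaped value back into the original byte array depends on that application.

```python
from urllib.parse import urlencode

# Hypothetical protobuf-encoded payload: field 1 (varint) = 150.
buffer = b"\x08\x96\x01"

# Body for Content-Type: application/x-www-form-urlencoded. Bytes outside the
# unreserved set are percent-escaped, so the binary value survives as text.
body = urlencode({"var": buffer})
print(body)  # var=%08%96%01
```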
