Amazon Textract returns different results returns different results between WebApp Demo, AnalyzeDocumentRequest and StartDocumentAnalysisRequest - amazon-textract

this is my first question on StackOverFlow, I would like to extract key-value pairs (FORMS) from a (scanned) PDf document via Amazon Textract. What I have noticed, however, is that some key-value pairs returned by the webapp demo (https://us-east-2.console.aws.amazon.com/textract/home?region=us-east-2#/demo) are absent from the methods that can be implemented in the code.
Furthermore, between these two methods, the Synchronous method (AnalyzeDocumentRequest), which does not accept PDF but forces a pre-conversion of the document into an image, in turn finds key-value pairs (Sync Result Example) which the Asynchronous method does not. (Async Result Example)
The problem is similar to this guy's, when he talks about the difference in results between the two methods of analyzing the document : AWS Textract - GetDocumentAnalysisRequest only returns correct results for first page of document
The code implementation is equal to these example:
Synchronous Method: https://docs.aws.amazon.com/textract/latest/dg/examples-extract-kvp.html
Asynchronous Method: https://github.com/awsdocs/amazon-textract-developer-guide/blob/master/doc_source/async-analyzing-with-sqs.md
Has anyone ever had the same problem?

We had this problem recently. The demo website provided by AWS found 50 fields, our own code using the provided API yielded 30 fields.
After some trial land error and a lot of googling we found that the response returned by GetDocumentAnalysisAsync included a NextToken which is used to ask for more results. Turns out we had to call GetDocumentAnalysisAsync again with this token (rinse-and-repeat) until the result response no longer included a NextToken.
At that point we knew we had all the data.

Related

Can I get consistent order of fields from a doc.get().data() query in a Firestore database?

I have a firestore database with data like this:
Now, I access this data with doc('mydoc').get().data() and it returns the data. But, even without the data changing, if I make the same call again and again, I get different response. I mean, the data is the same, but the order of the fields is different each time.
Here's my logs with two calls, see how the field order is random? Not just between objects in the same request, but between the same object in different requests.
I'm accessing this data in a Cloud Function and serving it as an API endpoint. I want to cache the response if the data (in the database) hasn't changed, but I can't, because the data (as returned by doc.get().data()) is constantly changing.
From what I could find, this might stem from ProtoBuf encoding.
My question: is there any way to get a consistent response to a firebase query when the underlying data isn't changing?
And if no, is my only option to JSON.stringify() the whole object before putting it into firestore? (I don't need to query within document objects.)
Edit for clarity: I am not expecting to know in advance the order of the fields being returned. I am expecting (hoping) that the order will be the same each time.
JSON object fields are unordered as per the JSON spec. Individual implementations of JSON are free to rearrange order however they see fit, and there's no surefire way to guarantee an order. See e.g. this answer.
This isn't a Firestore-specific problem, this is just generally how JSON objects work. You cannot and should not depend on the order of fields for any parsing or representation.
If display order is extremely important to you, you might want to investigate libraries like ordered-json.

Jmeter: How do I create a separate report for each assertion?

Right now, my HTTP call has 3 assertions. The reporting options for the results listener only provide a checkbox for "Assertion Results", which lumps all of my assertion results into one value in the CSV output.
The team would like to create a csv output for each assertion. The problem is, you can't add a results listener under an assertion, it must be under the HTTP call. I can't think of a way to create separate reports besides making three separate HTTP calls, each with their own results listener writing a report. That is not ideal.
I tried with jtl with XML.
Below is the config. Use minimum parameter required as it consume a lot of resource.
Below is the output.
But i think if you try to have response time in the same then there will be same entry twice. Similarly, other calculation might get impacted. So, you can use two one this and other simple data writer.
Hope this helps.

Using response data from one scenario to another

With Karate, I'm looking to simulate an end-to-end test structure where I do the following:
Make a GET request to specific data
Store a value as a def variable
Use that information for a separate scenario
This is what I have so far:
Scenario: Search for asset
Given url "https://foo.bar.buzz"
When method get
Then status 200
* def responseItem = $.items[0].id // variable initialized from the response
Scenario: Modify asset found
Given url "https://foo.bar.buzz/" + responseItem
// making request payload
When method put.....
I tried reading the documentation for reusing information, but that seemed to be for more in-depth testing.
Thoughts?
It is highly recommended to model flows like this as one scenario. Please refer to the documentation: https://github.com/intuit/karate#script-structure
Variables set using def in the Background will be re-set before every
Scenario. If you are looking for a way to do something only once per
Feature, take a look at callonce. On the other hand, if you are
expecting a variable in the Background to be modified by one Scenario
so that later ones can see the updated value - that is not how you
should think of them, and you should combine your 'flow' into one
scenario. Keep in mind that you should be able to comment-out a
Scenario or skip some via tags without impacting any others. Note that
the parallel runner will run Scenario-s in parallel, which means they
can run in any order.
That said, maybe the Background or hooks is what you are looking for: https://github.com/intuit/karate#hooks

Is it ok to model a computation as a web resource (REST resource)?

Let's say I have a component wich performs an addition as a business operation.
I don't need the result of the sum to be persisted anywhere because, let's say, the only thing that matters is the result of the addition.
Let´s say that the client component should get interested in saving the result of the addition, so I need to indicate the client how to save the result of the addition, so he can come back later and retrieve this result.
May the addition service be modeled as a web resource? Something like:
GET api.mycompany.com/addition?param1=x&param2=y
should return the result of the business operation. The response may present the following as a link (here comes the hypermedia) to persist the result:
POST api.mycompany.com/addition?param1=x&param2=y
Is this approach correct? -In the sense of a truly restful api -
Considering CRUD operations, this one:
GET api.mycompany.com/addition?param1=x&param2=y
Is idempotent, safe and cacheable, so I'll consider it as a RESTful GET.
As soon as your parameters become more complex, you could POST them to your 'addition' resource, returning a URL to computed result.
Yes, it is fine. But perhaps it is better to have a resource called operation that returns a list of links to supported operations. You could then have operation/addition?param1=x&param2=y and so on and so fort. Of course, the links should be opaque and the documented media-type returned by a call to the operation resource should provide information about the other available resources.

How should Transient Resources be retrieved in a RESTful API

For a while I was (wrongly) thinking that a RESTful API just exposed CRUD operation to persisted entities for a web application. When you code something up in "the real world" you soon find out that this is not enough. For example, a bank account transfer doesn't have to be a persisted entity. It could be a transient resource where you POST to /transfers/ and in the payload you specify the details:
{"accountToCredit":1234, "accountToDebit":5678, "amount":10}
Using POST here makes sense because it changes the state on the server ($10 moves from one account to another every time this POST occurs).
What should happen in the case where it doesn't affect the server? The simple first answer would be to use GET. For example, you want to get a list of savings and checking accounts that have less than $100. You would then call something like GET to /accounts/searchResults?minBalance=0&maxBalance=100. What happens though if your search parameter need to use complex objects that wouldn't fit in the maximum length of a GET request.
My first thought was to use POST, but after thinking about it some more it should probably be a PUT since it isn't changing the state of the server, but from my (limited) understanding I always though of PUT as updating a resource and POST as creating a resource (like creating this search results). So which should be used in this case?
I found the following links which provide some information but it wasn't clear to me what should be used in the different cases:
Transient REST Representations
How to design RESTful search/filtering?
RESTful URL design for search
I would agree with your approach, it seems reasonable to me to use GET when searching for resources, and as said in one of your provided links, the whole point of query strings is for doing things like search. I also agree that PUT fits better when you want to update some resource in an idempotent way (no matter how many times you hit the request, the result will be the same).
So generally, I would do it as you propose. Now, if you are limited by the maximum length of GET request, then you could use POST or PUT, passing your parameters in a JSON, in a URI like:
PUT /api/search
You could see this as a "search resource" where you send new parameters. I know it seems like a workaround and you may be worried that REST is about avoiding verbs in the URIs. Well, there are few cases that it's still acceptable and RESTful to use verbs, e.g. in cases where calculation or conversion is involved to generate the result (for more about this, check this reference).
PS. I think this workaround is still RESTful, but even if it wasn't, REST isn't an obsession and an ultimate goal. Being pragmatic and keeping a clean API design might be a better approach, even if in few cases you are not RESTful.

Resources