StartDocumentAnalysis is not producing TABLE data - amazon-textract

I am trying to detect and extract TABLE/FORMS data from a multi-page PDF (async operation), for which, as per the docs, the "StartDocumentAnalysis" API is the right one. However, the output differs between the console and automation (using the above API).
Case 1: When analyzing the same multi-page PDF from the Textract console, I see that 13 tables are detected along with forms/lines/words.
{
"BlockType": "CELL",
"Confidence": 67.00711822509766,
"RowIndex": 1,
"ColumnIndex": 2,
"RowSpan": 1,
"ColumnSpan": 1,
"Geometry": {....},
},
"Id": "fba85b26-3340-4f00-912c-5b86f253b7cd",
"Relationships": [
{
"Type": "CHILD",
"Ids": [
"684d019a-8c89-4cc1-81a0-702e581938f6",
"de8def9a-de59-4f70-ae2f-1624b1b607d7"
]
}
],
"Page": 3,
"childText": "Ficus Bank ",
"SearchKey": "Ficus Bank "
}
Case 2: When using the "StartDocumentAnalysis" API on the same PDF from S3, blocks with BlockType = "TABLE" are present in the output but without any data.
{
"BlockType": "CELL",
"ColumnIndex": 2,
"ColumnSpan": 1,
"Confidence": 67.00711822509766,
"EntityTypes": null,
"Geometry": {....},
},
,
"Hint": null,
"Id": "77c2a38a-c36c-4d8d-ba32-7e51a022ee23",
"Page": 3,
"Query": null,
"Relationships": [
{
"Ids": [
"e9b3b510-1041-4d6a-ab42-1fa42643038e",
"0a4d2092-c09e-4a56-aafe-700c006b85a3"
],
"Type": "CHILD"
}
],
"RowIndex": 1,
"RowSpan": 1,
"SelectionStatus": null,
"Text": null,
"TextType": null
},
Observation: The two keys below ("childText" and "SearchKey") are missing from the async API's output for the CELL blocks of a TABLE.
"childText": "Ficus Bank ",
"SearchKey": "Ficus Bank "
Any info/direction would be appreciated.
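In the API output, a CELL block carries no text of its own; the text sits on the WORD blocks listed under its CHILD relationship. "childText" and "SearchKey" are not among the Block fields shown in the API output above, so they are presumably added by the console export. Below is a minimal Python sketch of rebuilding the cell text from the raw blocks. The helper names (cell_text, tables_as_cells) and the sample blocks are made up for illustration, and it assumes all pages of the GetDocumentAnalysis response have already been collected into one list.

```python
def cell_text(cell, blocks_by_id):
    """Join the text of the WORD blocks that a CELL block points to."""
    words = []
    # "Relationships" can be missing or null, as in the output above.
    for rel in cell.get("Relationships") or []:
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = blocks_by_id.get(child_id)
            if child is not None and child["BlockType"] == "WORD":
                words.append(child["Text"])
    return " ".join(words)


def tables_as_cells(blocks):
    """Return {table_id: {(row, column): text}} for every TABLE block."""
    by_id = {b["Id"]: b for b in blocks}
    tables = {}
    for block in blocks:
        if block["BlockType"] != "TABLE":
            continue
        cells = {}
        for rel in block.get("Relationships") or []:
            if rel["Type"] != "CHILD":
                continue
            for cell_id in rel["Ids"]:
                cell = by_id.get(cell_id)
                if cell is not None and cell["BlockType"] == "CELL":
                    key = (cell["RowIndex"], cell["ColumnIndex"])
                    cells[key] = cell_text(cell, by_id)
        tables[block["Id"]] = cells
    return tables


# Hypothetical, hand-made blocks shaped like the output in the question.
sample_blocks = [
    {"BlockType": "TABLE", "Id": "t1",
     "Relationships": [{"Type": "CHILD", "Ids": ["c1"]}]},
    {"BlockType": "CELL", "Id": "c1", "RowIndex": 1, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w1", "w2"]}]},
    {"BlockType": "WORD", "Id": "w1", "Text": "Ficus"},
    {"BlockType": "WORD", "Id": "w2", "Text": "Bank"},
]
result = tables_as_cells(sample_blocks)
```

If the job result is paginated, a cell's child WORD blocks may arrive in a different response page than the cell itself, which is why the lookup table is built over the full list first.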

Getting a specific item in a sub array and selecting one value from it

I want to get the boardgame rank (value) from this nested array in Cosmos DB.
{
"name": "Alpha",
"statistics": {
"numberOfUserRatingVotes": 4155,
"averageRating": 7.26201,
"baysianAverageRating": 6.71377,
"ratingStandardDeviation": 1.18993,
"ratingMedian": 0,
"rankings": [
{
"id": 1,
"name": "boardgame",
"friendlyName": "Board Game Rank",
"type": "subtype",
"value": 746
},
{
"id": 4664,
"name": "wargames",
"friendlyName": "War Game Rank",
"type": "family",
"value": 140
},
{
"id": 5497,
"name": "strategygames",
"friendlyName": "Strategy Game Rank",
"type": "family",
"value": 434
}
],
"numberOfComments": 1067,
"weight": 2.3386,
"numberOfWeightVotes": 127
},
}
So I want:
{
"name": "Alpha",
"rank": 746
}
Using this query:
SELECT g.name, r
FROM Games g
JOIN r IN g.statistics.rankings
WHERE r.name = 'boardgame'
I get this (so close!):
{
"name": "Alpha",
"r": {
"id": 1,
"name": "boardgame",
"friendlyName": "Board Game Rank",
"type": "subtype",
"value": 746
}
},
But extending the query to this:
SELECT g.name, r.value as rank
FROM Games g
JOIN r IN g.statistics.rankings
WHERE r.name = 'boardgame'
I get this error:
Failed to query item for container Games:
Message: {"errors":[{"severity":"Error","location":{"start":21,"end":26},"code":"SC1001","message":"Syntax error, incorrect syntax near 'value'."}]}
ActivityId: 0a0cb394-2fc3-4a67-b54c-4d02085b6878, Microsoft.Azure.Documents.Common/2.14.0
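I don't understand why this doesn't work or what the syntax error is. I tried adding square brackets, but that didn't help. Can someone help me understand why I get this error and how to achieve the output I'm looking for?
This should work. "value" is a reserved keyword in the Cosmos DB SQL grammar (as in SELECT VALUE), so r.value fails to parse; bracket notation with a quoted property name avoids the clash: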
SELECT g.name, r["value"] as rank
FROM Games g
JOIN r IN g.statistics.rankings
WHERE r.name = 'boardgame'
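To make the shape of the result concrete, here is a plain Python sketch of what the JOIN plus WHERE does: it pairs each game with each element of its rankings array and keeps the pairs that match. This only emulates the query on an in-memory copy of the sample document (trimmed to the relevant fields); it does not call Cosmos DB, and the function name boardgame_ranks is made up.

```python
# Trimmed copy of the sample document from the question.
game = {
    "name": "Alpha",
    "statistics": {
        "rankings": [
            {"id": 1, "name": "boardgame", "value": 746},
            {"id": 4664, "name": "wargames", "value": 140},
            {"id": 5497, "name": "strategygames", "value": 434},
        ]
    },
}


def boardgame_ranks(games):
    """Emulate: SELECT g.name, r["value"] AS rank ... WHERE r.name = 'boardgame'."""
    rows = []
    for g in games:                              # FROM Games g
        for r in g["statistics"]["rankings"]:    # JOIN r IN g.statistics.rankings
            if r["name"] == "boardgame":         # WHERE r.name = 'boardgame'
                rows.append({"name": g["name"], "rank": r["value"]})
    return rows


rows = boardgame_ranks([game])
```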

How to Add data to a nested field in Firebase Firestore using a conditional field path

var k = [
{
"category": "Cars",
"products": [
{
"productName": "Aston Martin",
"quantity": 2,
"costPrice": 13500000,
"coverPhoto": "www.astonmartinphotoUrl.jpg"
},
{
"productName": "Mercedes",
"quantity": 1,
"costPrice": 220000,
"coverPhoto": "www.mercerdezphotoUrl.jpg"
}
]
},
{
"category": "Food",
"products": [
{
"productName": "Pizza",
"quantity": 50,
"costPrice": 30,
"coverPhoto": "www.pizzaphotoUrl.jpg"
},
{
"productName": "Pancake",
"quantity": 3,
"costPrice": 3,
"coverPhoto": "www.pancakephotoUrl.jpg"
}
]
}
];
Given the above code sample, I'm trying to add a new map to the nested list, products, using FieldValue.arrayUnion():
Map _map =
{
"productName": 'Tesla',
"quantity": 2,
"costPrice": 45000,
"coverPhoto": "www.teslaPhotoUrl.jpg"
};
I want to add this map only where the key 'category' is equal to 'Cars':
_firebaseFirestoreRef.collection('data').doc(id).update({
"fieldPath to the list where key category == Cars": FieldValue.arrayUnion([_map])
});
And I want the final result to look like this in my Firestore database:
[
{
"category": "Cars",
"products": [
{
"productName": "Aston Martin",
"quantity": 2,
"costPrice": 13500000,
"coverPhoto": "www.astonmartinphotoUrl.jpg"
},
{
"productName": "Mercedes",
"quantity": 1,
"costPrice": 220000,
"coverPhoto": "www.mercerdezphotoUrl.jpg"
},
{
"productName": "Tesla",
"quantity": 2,
"costPrice": 45000,
"coverPhoto": "www.teslaPhotoUrl.jpg"
}
]
},
{
"category": "Food",
"products": [
{
"productName": "Pizza",
"quantity": 50,
"costPrice": 30,
"coverPhoto": "www.pizzaphotoUrl.jpg"
},
{
"productName": "Pancake",
"quantity": 3,
"costPrice": 3,
"coverPhoto": "www.pancakephotoUrl.jpg"
}
]
}
];
I know I need to use the FirebaseFirestore FieldPath and some query objects, but I don't know how to use it effectively to achieve this ...
Since your k is an array, you're trying to update an existing element in an array field, which is not possible in Firestore. You'll first need to read the document, get the k array from it, update it in your application code, and then write the resulting field back to the database.
This has been covered quite regularly before, so I recommend looking at some other questions about updating an item in an array.
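The modify step in the middle of that read, modify, write cycle can be sketched as follows. This is plain Python on in-memory data rather than Dart, the Firestore read and write calls are deliberately left out, and add_product is a made-up helper name; the sample data is a trimmed copy of k from the question.

```python
import copy


def add_product(categories, category, product):
    """Return a copy of the array with `product` appended to the matching category."""
    updated = copy.deepcopy(categories)  # leave the value that was read untouched
    for entry in updated:
        if entry["category"] == category:
            entry["products"].append(product)
            return updated
    raise KeyError(f"no category named {category!r}")


# Trimmed copy of the k array from the question.
k = [
    {"category": "Cars", "products": [{"productName": "Aston Martin", "quantity": 2}]},
    {"category": "Food", "products": [{"productName": "Pizza", "quantity": 50}]},
]
tesla = {
    "productName": "Tesla",
    "quantity": 2,
    "costPrice": 45000,
    "coverPhoto": "www.teslaPhotoUrl.jpg",
}
new_k = add_product(k, "Cars", tesla)
```

The whole new_k array is then what gets written back to the document field. If several clients can update the document at once, doing the read and write inside a transaction avoids losing concurrent changes.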
You can also consider turning the top-level array into a map, using the category for the first-level field name:
products: {
"Cars": [{
"productName": "Aston Martin",
"quantity": 2,
"costPrice": 13500000,
"coverPhoto": "www.astonmartinphotoUrl.jpg"
}, ... ],
"Food": [{
...
}]
}
Now you can add an item to the Cars array with an array union on products.Cars.
As Dharmaraj commented, you could also consider putting the products into a subcollection. This will allow you to query the products separately, and allows you to read the parent document without reading all products.

How to know which product variation has which Variation ID in WooCommerce REST API

I'm using the WooCommerce REST API in WordPress to build an Android application, and I'm trying to get the product variations from the product detail response. I get the product variations with their attributes and names in a list, but I don't know how to check which variation has which variation ID.
"attributes": [
{
"id": 3,
"name": "Ships From",
"position": 0,
"visible": false,
"variation": true,
"options": [
"China"
]
},
{
"id": 20,
"name": "Type",
"position": 1,
"visible": false,
"variation": true,
"options": [
"1",
"2",
"3"
]
},
{
"id": 21,
"name": "Length (m/ft)",
"position": 2,
"visible": false,
"variation": true,
"options": [
"0.25 / 0.82",
"0.5 / 1.64",
"1 / 3.28",
"1.5 / 4.92",
"2 / 6.56"
]
}
],
"default_attributes": [],
"variations": [
158435,
158436,
158437,
158438,
158439,
158440,
158441,
158442,
158443,
158444,
158445,
158446
],
Here I have the names and IDs of the attributes, and below them the IDs of all product variations, but I don't know how to find out which variation ID corresponds to a given combination, for example China + 1 + "0.25 / 0.82".
Thanks in advance.
You have to make another API call to wp-json/wc/v3/products/<product_id>/variations, or make individual calls to wp-json/wc/v3/products/<product_id>/variations/<variation_id> using the IDs in the "variations" array, and check which variation has which combination of attributes. There is no direct relation between a variation ID and the attributes it uses.
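Once the variations have been fetched, the matching itself is a small lookup. The Python sketch below assumes each variation object has an "id" and an "attributes" list whose entries carry "name" and "option", which is how the variations endpoint is understood to respond; check that against your own response. The function name and the sample pairing of IDs to options are made up for illustration.

```python
def find_variation_id(variations, wanted):
    """Return the id of the variation whose attribute options equal `wanted`, else None."""
    for variation in variations:
        options = {a["name"]: a["option"] for a in variation["attributes"]}
        if options == wanted:
            return variation["id"]
    return None


# Hypothetical response data: the real ID-to-option pairing must come from the API.
sample_variations = [
    {"id": 158435, "attributes": [
        {"id": 3, "name": "Ships From", "option": "China"},
        {"id": 20, "name": "Type", "option": "1"},
        {"id": 21, "name": "Length (m/ft)", "option": "0.25 / 0.82"},
    ]},
    {"id": 158436, "attributes": [
        {"id": 3, "name": "Ships From", "option": "China"},
        {"id": 20, "name": "Type", "option": "1"},
        {"id": 21, "name": "Length (m/ft)", "option": "0.5 / 1.64"},
    ]},
]
match = find_variation_id(
    sample_variations,
    {"Ships From": "China", "Type": "1", "Length (m/ft)": "0.25 / 0.82"},
)
missing = find_variation_id(sample_variations, {"Type": "9"})
```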

How to fix this Matchmaking Rule set for AWS Game Lift

I am new to GameLift and am trying to write a rule set for a Jeopardy game for a project I am creating. I try to express what I want for the matchmaking, but I always get this error and cannot figure out what is wrong.
I want 3 players, each with roughly the same skill level, to keep it fair. Can someone explain what I am doing wrong?
I have already searched the GameLift documentation, but I am still confused about how this part works. The examples given there worked, and I tried editing them to my own liking, but my version does not work.
{
"name": "Normal_Game",
"ruleLanguageVersion": "1.0",
"playerAttributes": [{
"name": "skill",
"type": "number",
"default": 10
}],
"teams": [{
"name": "red",
"maxPlayers": 1,
"minPlayers": 1
}, {
"name": "blue",
"maxPlayers": 1,
"minPlayers": 1
},{
"name": "green",
"maxPlayers": 1,
"minPlayers":1
}],
"rules": [{
"name": "FairTeamSkill",
"description": "The average skill of players in each team is within 10 points from the average skill of all players in the match",
"type": "distance",
// get skill values for players in each team and average separately to produce list of two numbers
"measurements": [ "avg(teams[*].players.attributes[skill])" ],
// get skill values for players in each team, flatten into a single list, and average to produce an overall average
"referenceValue": "avg(flatten(teams[*].players.attributes[skill]))",
"maxDistance": 10 // minDistance would achieve the opposite result
}, {
"name": "EqualTeamSizes",
"description": "Only launch a game when the number of players in each team matches, e.g. 4v4, 5v5, 6v6, 7v7, 8v8",
"type": "comparison",
"measurements": [ "count(teams[red].players)" ],
"referenceValue": "count(teams[blue].players)",
"operation": "=" // other operations: !=, <, <=, >, >=
"referenceValue": "count(teams[green].players)",
"operation": "="
}],
"expansions": [{
"target": "rules[FairTeamSkill].maxDistance",
"steps": [{
"waitTimeSeconds": 5,
"value": 50
}, {
"waitTimeSeconds": 15,
"value": 100
}]
}]
}
I validate it every time expecting it to be accepted, but it isn't; the error message is always this:
Rule set*
Encountered JSON parsing error: Unexpected character ('"' (code 34)): was expecting comma to separate Object entries at [Source: { "name": "Normal_Game", "ruleLanguageVersion": "1.0", "playerAttributes": [{ "name": "skill", "type": "number", "default": 10 }], "teams": [{ "name": "red", "maxPlayers": 1, "minPlayers": 1 }, { "name": "blue", "maxPlayers": 1, "minPlayers": 1 },{ "name": "green", "maxPlayers": 1, "minPlayers":1 }], "rules": [{ "name": "FairTeamSkill", "description": "The average skill of players in each team is within 10 points from the average skill of all players in the match", "type": "distance", // get skill values for players in each team and average separately to produce list of two numbers "measurements": [ "avg(teams[*].players.attributes[skill])" ], // get skill values for players in each team, flatten into a single list, and average to produce an overall average "referenceValue": "avg(flatten(teams[*].players.attributes[skill]))", "maxDistance": 10 // minDistance would achieve the opposite result }, { "name": "EqualTeamSizes", "description": "Only launch a game when the number of players in each team matches, e.g. 4v4, 5v5, 6v6, 7v7, 8v8", "type": "comparison", "measurements": [ "count(teams[red].players)" ], "referenceValue": "count(teams[blue].players)", "operation": "=" // other operations: !=, <, <=, >, >= "referenceValue": "count(teams[green].players)", "operation": "=" }], "expansions": [{ "target": "rules[FairTeamSkill].maxDistance", "steps": [{ "waitTimeSeconds": 5, "value": 50 }, { "waitTimeSeconds": 15, "value": 100 }] }] }; line: 38, column: 10]
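You have "referenceValue" and "operation" defined twice in the EqualTeamSizes rule, which might cause issues, and there is a comma missing after the first "operation": "=". The "was expecting comma to separate Object entries" parsing error is consistent with that missing comma. This is the rule in question: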
{
"name": "EqualTeamSizes",
"description": "Only launch a game when the number of players in each team matches, e.g. 4v4, 5v5, 6v6, 7v7, 8v8",
"type": "comparison",
"measurements": [ "count(teams[red].players)" ],
"referenceValue": "count(teams[blue].players)",
"operation": "=" // other operations: !=, <, <=, >, >=
"referenceValue": "count(teams[green].players)",
"operation": "="
}
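Duplicate keys are easy to miss because many JSON parsers silently keep only the last one. As a rough way to spot them before uploading a rule set, here is a Python sketch using the standard json module's object_pairs_hook. The function name find_duplicate_keys is made up, and the sample is the rule above with the // comments removed and the missing comma added, since Python's json module accepts neither.

```python
import json


def find_duplicate_keys(text):
    """Return the keys that appear more than once within the same JSON object."""
    duplicates = []

    def check_pairs(pairs):
        # Called once per JSON object with its (key, value) pairs in source order.
        seen = set()
        for key, _ in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        return dict(pairs)

    json.loads(text, object_pairs_hook=check_pairs)
    return duplicates


# The EqualTeamSizes rule with comments stripped and the comma restored.
rule = """
{
  "name": "EqualTeamSizes",
  "type": "comparison",
  "measurements": ["count(teams[red].players)"],
  "referenceValue": "count(teams[blue].players)",
  "operation": "=",
  "referenceValue": "count(teams[green].players)",
  "operation": "="
}
"""
duplicates = find_duplicate_keys(rule)
```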

Want to output two values from each line of a huge JSONL file in R Studio

I'm walking through a huge JSONL file (100G, 100M rows) line by line extracting two key values from the data. Ideally, I want this written to a file with two columns. I'm a real beginner here.
Here is an example of the JSON on each row of the file referenced on my C drive:
https://api.unpaywall.org/v2/10.6118/jmm.2017.23.2.135?email=YOUR_EMAIL
or:
{
"best_oa_location": {
"evidence": "open (via page says license)",
"host_type": "publisher",
"is_best": true,
"license": "cc-by-nc",
"pmh_id": null,
"updated": "2018-02-14T11:18:21.978814",
"url": "FAKEURL",
"url_for_landing_page": "URL2",
"url_for_pdf": "URL4",
"version": "publishedVersion"
},
"data_standard": 2,
"doi": "10.6118/jmm.2017.23.2.135",
"doi_url": "URL5",
"genre": "journal-article",
"is_oa": true,
"journal_is_in_doaj": false,
"journal_is_oa": false,
"journal_issns": "2288-6478,2288-6761",
"journal_name": "Journal of Menopausal Medicine",
"oa_locations": [
{
"evidence": "open (via page says license)",
"host_type": "publisher",
"is_best": true,
"license": "cc-by-nc",
"pmh_id": null,
"updated": "2018-02-14T11:18:21.978814",
"url": "URL6",
"url_for_landing_page": "URL7",
"url_for_pdf": "URL8",
"version": "publishedVersion"
},
{
"evidence": "oa repository (via OAI-PMH doi match)",
"host_type": "repository",
"is_best": false,
"license": "cc-by-nc",
"pmh_id": "oai:pubmedcentral.nih.gov:5606912",
"updated": "2017-10-21T18:12:39.724143",
"url": "URL9",
"url_for_landing_page": "URL11",
"url_for_pdf": "URL12",
"version": "publishedVersion"
},
{
"evidence": "oa repository (via pmcid lookup)",
"host_type": "repository",
"is_best": false,
"license": null,
"pmh_id": null,
"updated": "2018-10-11T01:49:34.280389",
"url": "URL13",
"url_for_landing_page": "URL14",
"url_for_pdf": null,
"version": "publishedVersion"
}
],
"published_date": "2017-01-01",
"publisher": "The Korean Society of Menopause (KAMJE)",
"title": "A Case of Granular Cell Tumor of the Clitoris in a Postmenopausal Woman",
"updated": "2018-06-20T20:31:37.509896",
"year": 2017,
"z_authors": [
{
"affiliation": [
{
"name": "Department of Obstetrics and Gynecology, Soonchunhyang University Cheonan Hospital, University of Soonchunhyang College of Medicine, Cheonan, Korea."
}
],
"family": "Min",
"given": "Ji-Won"
},
{
"affiliation": [
{
"name": "Department of Obstetrics and Gynecology, Soonchunhyang University Cheonan Hospital, University of Soonchunhyang College of Medicine, Cheonan, Korea."
}
],
"family": "Kim",
"given": "Yun-Sook"
}
]
}
Here's the code I wrote:
library (magrittr)
library (jqr)
con = file("C:/users/ME/desktop/miniunpaywall.jsonl", "r");
while ( length(line <- readLines(con, n = -1)) > 0) {
write.table( line %>% jq ('.doi,.best_oa_location.license'), file='test.txt', quote=FALSE, row.names=FALSE);}
What results from this is a line of text for each row of JSON that looks like this:
"10.1016/j.ijcard.2018.10.014,CC-BY"
This is effectively:
"[DOI],[LICENSE]"
I want ideally to have the output be:
[DOI] tab [LICENSE]
I believe my problem is that I'm writing the values as a string into a single column when I say:
write.table( line %>% jq ('.doi,.best_oa_location.license')
I haven't figured out how to remove the quotes I'm getting around each line in my file, or how to separate the two values with a tab. I feel I'm pretty close. Help!
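For comparison, here is the same streaming idea as a minimal sketch in Python rather than R: read one line at a time, parse it, and write the two values separated by a tab. The function name jsonl_to_tsv is made up, and the second sample record uses a hypothetical DOI to show a row with no best_oa_location. Two things stand out in the R code above: readLines(con, n = -1) reads the whole file in one go rather than line by line, and write.table without append = TRUE overwrites the file on every call.

```python
import io
import json


def jsonl_to_tsv(lines, out):
    """Write 'doi<TAB>license' for each JSON line; missing values become empty strings."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        doi = record.get("doi") or ""
        best = record.get("best_oa_location") or {}  # may be null in the data
        license_name = best.get("license") or ""
        out.write(f"{doi}\t{license_name}\n")


# Two small in-memory rows standing in for the real file.
sample = io.StringIO(
    '{"doi": "10.6118/jmm.2017.23.2.135", "best_oa_location": {"license": "cc-by-nc"}}\n'
    '{"doi": "10.1000/example", "best_oa_location": null}\n'
)
out = io.StringIO()
jsonl_to_tsv(sample, out)
tsv = out.getvalue()
```

With the real file, the same function can be given two open file objects (one for reading the .jsonl, one for writing test.txt); iterating over a file object yields one line at a time, so the 100 GB input never has to fit in memory.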