Create index on nested array value with DynamoDB

I have the following data stored in a DynamoDB table called elo-history.
{
  "gameId": "chess",
  "guildId": "abc123",
  "id": "c3c640e2d8b76b034605d8835a03bef8",
  "recordedAt": 1621095861673,
  "results": [
    {
      "oldEloRating": null,
      "newEloRating": 2010,
      "place": 1,
      "playerIds": [
        "abc1"
      ]
    },
    {
      "oldEloRating": null,
      "newEloRating": 1990,
      "place": 2,
      "playerIds": [
        "abc2"
      ]
    }
  ],
  "versus": "1v1"
}
I have 2 indexes, guildId-recordedAt-index and gameId-recordedAt-index. These allow me to query on those fields.
I am trying to add another index for results[].playerIds[]. I want to be able to run a query for records with playerId=abc1 and have those results sorted, just like I can with guildId and gameId. Does DynamoDB support something like this? Do I need to restructure the data, or save it in two different formats, to support this type of query?
Something like this: a new table called player-elo-history, in addition to the elo-history table, which would store the list of games by playerId.
{
  "id": "abc1",
  "gameId": "chess",
  "guildId": "abc123",
  "recordedAt": 1621095861673,
  "results": [
    [
      {
        "oldEloRating": null,
        "newEloRating": 2010,
        "place": 1,
        "playerIds": [
          "abc1"
        ]
      },
      {
        "oldEloRating": null,
        "newEloRating": 1990,
        "place": 2,
        "playerIds": [
          "abc2"
        ]
      }
    ]
  ]
}
{
  "id": "abc2",
  "gameId": "chess",
  "guildId": "abc123",
  "recordedAt": 1621095861673,
  "results": [
    [
      {
        "oldEloRating": null,
        "newEloRating": 2010,
        "place": 1,
        "playerIds": [
          "abc1"
        ]
      },
      {
        "oldEloRating": null,
        "newEloRating": 1990,
        "place": 2,
        "playerIds": [
          "abc2"
        ]
      }
    ]
  ]
}

It looks like you're modeling the one-to-many relationship between Games and Results using a complex attribute (e.g. a list of objects) on the Game item. This is a completely valid approach to modeling one-to-many relationships and is best used when 1) the Results data doesn't change (or changes rarely) and 2) you don't have any access patterns around Results.
Since it sounds like you do have access patterns around Results, you'd be better off storing your Results in their own items.
For example, you might consider modeling results in the user partition with PK=USER#user_id, SK=RESULT#game_id. This would allow you to fetch results by User ID (Query where PK=USER#user_id and SK begins_with RESULT). Alternatively, you could model results with PK=RESULT#game_id, SK=USER#user_id and create a GSI that swaps the PK and SK, which would allow you to group results by User.
I don't know the specifics of your access patterns, but I can say that you'll need to move Results into their own items if you want to support access patterns around game results.
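For a concrete picture, here is a minimal boto3 sketch of that layout (one Result item per player per game, plus a GSI that swaps the keys). The table name, the GSI name, and the PK/SK/GSI1PK/GSI1SK attribute names are illustrative assumptions, not something from the original question:

import boto3
from boto3.dynamodb.conditions import Key

# Assumed single-table layout: PK=USER#<player>, SK=RESULT#<game record id>,
# plus a GSI ("GSI1") with the keys swapped so results can be grouped per game record.
table = boto3.resource("dynamodb").Table("elo-history")

# One Result item per player per recorded game.
table.put_item(Item={
    "PK": "USER#abc1",
    "SK": "RESULT#c3c640e2d8b76b034605d8835a03bef8",
    "GSI1PK": "RESULT#c3c640e2d8b76b034605d8835a03bef8",
    "GSI1SK": "USER#abc1",
    "gameId": "chess",
    "guildId": "abc123",
    "recordedAt": 1621095861673,
    "newEloRating": 2010,
    "place": 1,
})

# All results for a player.
player_results = table.query(
    KeyConditionExpression=Key("PK").eq("USER#abc1") & Key("SK").begins_with("RESULT#")
)["Items"]

# All results for a game record, grouped by player, via the swapped GSI.
game_results = table.query(
    IndexName="GSI1",
    KeyConditionExpression=Key("GSI1PK").eq("RESULT#c3c640e2d8b76b034605d8835a03bef8"),
)["Items"]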

Related

Great Expectations - Result validation for row_count and column_freshness

I would like to validate results for row count and column freshness on some data on AWS. I am using a check_config.json file to configure the checks, and I use Terraform to create a Glue job that runs the checks and writes the result to DynamoDB. The result in DynamoDB is not elaborate, and I would like it to be more specific about the exact values obtained before a check is marked as fail or pass. I would like to see, for example, when the table was last modified (column freshness) and the number of rows obtained from the count (expect_row_count).
Below is the JSON configuration for the checks:
{
  "table": "table1",
  "checks": [
    {
      "check": "custom_expect_column_to_be_fresh",
      "parameters": {
        "columns": [
          "column1"
        ],
        "strftime_format": "%Y-%m-%d",
        "threshold_days": 0,
        "threshold_hours": 10
      }
    },
    {
      "check": "expect_table_row_count_to_be_between",
      "result_format": "COMPLETE",
      "include_config": "True",
      "parameters": {
        "min_value": 1,
        "max_value": 100000
      },
      "alarm": {
        "threshold": 100,
        "period": 3600
      }
    }
  ]
}
I was expecting a more elaborate result showing how many rows were obtained before the row count check is marked as a failure, and I would also like to see the last table modification timestamp before the column freshness check is marked as a failure.

How can I query nested arrays in Data Explorer?

I am trying to write a query in Data Explorer over a Cosmos DB to give me a list of results where the order has a discount applied. That requires that I examine every element of the Totals array for a Discounts element that is not empty.
I've tried to use ARRAY_LENGTH within ARRAY_CONTAINS as shown below, and that didn't return a result set. I know ARRAY_CONTAINS is used to look for a field value within an array, but I was hoping it would accept the ARRAY_LENGTH function.
SELECT * FROM c where ARRAY_CONTAINS(c.OrderHeader.Totals,{ARRAY_LENGTH(Discounts):1},true))
I've also tried to check for a value in the CampaignId field of the Discounts array using the following query. It didn't return a result set either.
SELECT * FROM c where ARRAY_CONTAINS(c.OrderHeader.Totals.Discounts,{CampaignId:null},false)
I would assume there's a way to do this, so any input would be greatly appreciated!
{
  "OrderHeader": {
    "Totals": [
      {
        "Currency": "CAD",
        "Price": 10.00,
        "Discounts": []
      },
      {
        "Currency": "CAD",
        "Price": 20.00,
        "Discounts": []
      },
      {
        "Currency": "CAD",
        "Price": 30.00,
        "Discounts": [
          {
            "CampaignId": "Campaign2",
            "CouponDefinition": null
          }
        ]
      }
    ]
  }
}
Please try this SQL:
SELECT t.Currency, t.Price, t.Discounts
FROM c
JOIN t IN c.OrderHeader.Totals
WHERE ARRAY_LENGTH(t.Discounts) > 0
Here is the result:
[
  {
    "Currency": "CAD",
    "Price": 30,
    "Discounts": [
      {
        "CampaignId": "Campaign2",
        "CouponDefinition": null
      }
    ]
  }
]
Hope it can help you.
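If you want to run the same query outside Data Explorer, a minimal sketch with the azure-cosmos Python SDK could look like this; the account endpoint, key, and database/container names are placeholders, not part of the original question:

from azure.cosmos import CosmosClient

# Placeholder endpoint/key/database/container names -- substitute your own.
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("<database>").get_container_client("<container>")

query = (
    "SELECT t.Currency, t.Price, t.Discounts "
    "FROM c JOIN t IN c.OrderHeader.Totals "
    "WHERE ARRAY_LENGTH(t.Discounts) > 0"
)
for item in container.query_items(query=query, enable_cross_partition_query=True):
    print(item)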

Gremlin group by vertex property and get sum other properties in the same vertex

We have vertices which store various jobs, with their types and counts as properties. I have to group by the status and sum their counts. I tried the following query, which works for one property (receiveCount):
g.V().hasLabel("Jobs").has("Type",within("A","B","C")).group().by("Type").by(fold().match(__.as("p").unfold().values("receiveCount").sum().as("totalRec")).select("totalRec")).next()
I want to add 10 more properties like successCount, FailedCount, etc. Is there a better way to do that?
You could use the cap() step, like this:
g.V().has("name","marko").out("knows").groupCount("a").by("name").group("b").by("name").by(values("age").sum()).cap("a","b")
And the result would be:
"data": [
{
"a": {
"vadas": 1,
"josh": 1
},
"b": {
"vadas": [
27.0
],
"josh": [
32.0
]
}
}
]
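Applied to the Jobs example from the question, the same side-effect group()/cap() pattern might look like the following gremlin-python sketch. The server URL is an assumption; the Type, receiveCount, and successCount property names come from the question:

from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.traversal import P

# Assumed Gremlin Server endpoint.
conn = DriverRemoteConnection("ws://localhost:8182/gremlin", "g")
g = traversal().withRemote(conn)

# One side-effect group() per counter, all collected at the end with cap().
totals = (
    g.V().hasLabel("Jobs").has("Type", P.within("A", "B", "C"))
     .group("receive").by("Type").by(__.values("receiveCount").sum_())
     .group("success").by("Type").by(__.values("successCount").sum_())
     .cap("receive", "success")
     .next()
)
print(totals)
conn.close()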

DocumentDB - WHERE clause within collection

Given a sample document like this one from the Microsoft examples:
{
  "id": "AndersenFamily",
  "lastName": "Andersen",
  "parents": [
    { "firstName": "Thomas" },
    { "firstName": "Mary Kay" }
  ],
  "children": [
    {
      "firstName": "Henriette Thaulow",
      "gender": "female",
      "grade": 5,
      "pets": [{ "givenName": "Fluffy" }]
    }
  ],
  "address": { "state": "WA", "county": "King", "city": "seattle" },
  "creationDate": 1431620472,
  "isRegistered": true
}
We can see that there is a sub-collection children containing an array of documents.
Let's say I wanted to write a query using the SELECT ... FROM ... WHERE ... type syntax, how would I go about writing a query to find families with any daughters (any children with gender "female")?
So something like
SELECT c.id
FROM c
WHERE c.children.contains( // I'm stuck!
I'm wondering if I'm missing a JOIN or something but honestly I'm not sure where I go from here, and I'm struggling to find anything helpful in Google partially because I'm not sure how to phrase my search!
You need the JOIN keyword to unwind the children, then apply the filter on gender like the query below:
SELECT f.id
FROM family f
JOIN child IN f.children
WHERE child.gender = "female"
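For what it's worth, the same query can also be issued from the azure-cosmos Python SDK with a parameter instead of an inlined literal; the endpoint, key, and database/container names below are placeholders:

from azure.cosmos import CosmosClient

# Placeholder endpoint/key/database/container names.
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("<database>").get_container_client("<container>")

daughters = container.query_items(
    query="SELECT f.id FROM family f JOIN child IN f.children WHERE child.gender = @gender",
    parameters=[{"name": "@gender", "value": "female"}],
    enable_cross_partition_query=True,
)
for doc in daughters:
    print(doc["id"])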

Query to get exact matches of an Elastic field with multiple values in an array

I want to write a query in Elastic that applies a filter based on values I have in an array (in my R program). Essentially the query should:
1. Match a time range (the time field in Elastic)
2. Match the "trackId" field in Elastic to any value in the array oth_usr
3. Return 2 fields - "trackId" and "propertyId"
I have the following primitive version of the query, but I do not know how to use the oth_usr array in the query (part 2 above).
query <- sprintf('{"query":{"range":{"time":{"gte":"%s","lte":"%s"}}}}',start_date,end_date)
view_list <- elastic::Search(index = "organised_recent",type = "PROPERTY_VIEW",size = 10000000,
body=query, fields = c("trackId", "propertyId"))$hits$hits
You need to add a terms query and embed it, along with the range query, inside a bool/must query. Try updating your query like this:
terms <- paste(sprintf("\"%s\"", oth_usr), collapse=", ")
query <- sprintf('{"query":{"bool":{"must":[{"terms": {"trackId": [%s]}},{"range": {"time": {"gte": "%s","lte": "%s"}}}]}}}',terms,start_date,end_date)
I'm not fluent in R syntax, but this is a raw JSON query that works.
It checks whether your time field matches the given range (start_time and end_time) and whether one of your terms exactly matches trackId.
It returns only the trackId and propertyId fields, as per your request:
POST /indice/_search
{
  "_source": {
    "include": [
      "trackId",
      "propertyId"
    ]
  },
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "time": {
              "gte": "start_time",
              "lte": "end_time"
            }
          }
        },
        {
          "terms": {
            "trackId": [
              "terms"
            ]
          }
        }
      ]
    }
  }
}
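If it helps to sanity-check the request outside R, here is a small Python sketch that builds the same body as a data structure (which avoids quoting mistakes when injecting the oth_usr values into the terms clause) and posts it with requests. The host, dates, and track IDs are placeholders; the index name comes from the question:

import requests

oth_usr = ["track-id-1", "track-id-2"]  # placeholder track IDs
body = {
    "_source": {"include": ["trackId", "propertyId"]},
    "query": {
        "bool": {
            "must": [
                {"range": {"time": {"gte": "2021-01-01", "lte": "2021-02-01"}}},
                {"terms": {"trackId": oth_usr}},
            ]
        }
    },
    "size": 10000,
}

# Placeholder host; index name taken from the question.
resp = requests.post("http://localhost:9200/organised_recent/_search", json=body)
hits = resp.json()["hits"]["hits"]
print(len(hits))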
