Cosmos DB query: trim the strings in an array while retaining grouping? - azure-cosmosdb

Say I have two documents in my collection:
{
"id": "111",
"linkedId": [
"ABC:123",
"ABC:456"
]
}
{
"id": "222",
"linkedId": [
"DEF:321",
"DEF:654"
]
}
What query can I run to get one bare array per document, without a property name, like this?
[
"123",
"456"
],
[
"321",
"654"
]
I have tried
SELECT c.linkedId FROM c
But this returns "linkedId" as the property name in the result set. I also tried LEFT, but it doesn't trim the first 4 characters of the string.
Then I tried
SELECT VALUE cc FROM cc IN c.linkedId
But this loses the grouping.
Any idea?

Since the elements are plain strings rather than JSON objects, I suggest using a user-defined function (UDF) in the Cosmos DB SQL query.
UDF:
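function userDefinedFunction(arr){
    // Strip the 4-character "XXX:" prefix from every element,
    // e.g. "ABC:123" -> "123". substring(4) keeps the rest of
    // the string whatever its length.
    var returnArr = [];
    for (var i = 0; i < arr.length; i++) {
        returnArr.push(arr[i].substring(4));
    }
    return returnArr;
}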
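SQL (register the function as a UDF with the id test; the query calls it by that id, not by the JavaScript function name):
SELECT VALUE udf.test(c.linkedId) FROM c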
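To check the UDF logic and the grouping behaviour without a Cosmos DB account, here is a small local simulation in plain JavaScript (the language Cosmos DB UDFs are written in). stripPrefix is a hypothetical stand-in for the UDF; instead of a fixed offset it cuts at the first ":", which assumes each value has a single "PREFIX:" part. It contrasts the per-document result of SELECT VALUE udf.test(c.linkedId) FROM c with the flattened result of SELECT VALUE cc FROM cc IN c.linkedId.

```javascript
// Hypothetical local stand-in for the Cosmos DB UDF: drops everything up to
// and including the first ":" (e.g. "ABC:123" -> "123").
// Strings without a colon are returned unchanged.
function stripPrefix(arr) {
  return arr.map(function (s) {
    var idx = s.indexOf(":");
    return idx === -1 ? s : s.substring(idx + 1);
  });
}

// The two sample documents from the question.
var sampleDocs = [
  { id: "111", linkedId: ["ABC:123", "ABC:456"] },
  { id: "222", linkedId: ["DEF:321", "DEF:654"] }
];

// SELECT VALUE udf.test(c.linkedId) FROM c -> one array per document
var grouped = sampleDocs.map(function (c) { return stripPrefix(c.linkedId); });

// SELECT VALUE cc FROM cc IN c.linkedId -> one row per array element
var flattened = [];
sampleDocs.forEach(function (c) {
  c.linkedId.forEach(function (cc) { flattened.push(cc); });
});

console.log(JSON.stringify(grouped));   // [["123","456"],["321","654"]]
console.log(JSON.stringify(flattened)); // ["ABC:123","ABC:456","DEF:321","DEF:654"]
```

Splitting on the colon rather than using a fixed offset keeps working if a prefix is ever longer or shorter than three characters.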

Related

Great Expectations - Result validation for row_count and column_freshness

I would like to validate row count and column freshness on some data on AWS. I use a check_config.json file to configure the checks, and Terraform to create a Glue job that runs the checks and writes the results to DynamoDB. The result stored in DynamoDB is not detailed: I would like it to include the exact values obtained before a check is marked as pass or fail, for example when the table was last modified (column freshness) and how many rows were counted (expect_table_row_count_to_be_between).
Here is my check_config.json:
{
"table": "table1",
"checks": [
{
"check": "custom_expect_column_to_be_fresh",
"parameters": {
"columns": [
"column1"
],
"strftime_format": "%Y-%m-%d",
"threshold_days": 0,
"threshold_hours": 10
}
},
{
"check": "expect_table_row_count_to_be_between",
"result_format" : "COMPLETE",
"include_config": "True",
"parameters": {
"min_value": 1,
"max_value": 100000
},
"alarm" : {
"threshold": 100,
"period": 3600
}
}
]
}
I expected a more detailed result: how many rows were counted before row_count is marked as a failure, and the table's last modification timestamp before column freshness is marked as a failure.
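One way to get those values is to store the observed value from each expectation result rather than only pass/fail. The sketch below assumes the Glue job can serialize the Great Expectations validation result to JSON in the 0.x shape (results[].success, results[].expectation_config.expectation_type and kwargs, results[].result.observed_value), with a result_format richer than BOOLEAN_ONLY. summariseValidation is a hypothetical helper, written in JavaScript to match the other examples here, and the field names should be checked against your GE version. For the custom freshness check, the custom expectation itself has to put the last-modified timestamp into its result for this to pick it up.

```javascript
// Hypothetical helper: turn a parsed GE validation result into one flat
// record per check, keeping the observed value next to pass/fail.
// Field names are assumptions about the 0.x result shape.
function summariseValidation(validation) {
  return (validation.results || []).map(function (r) {
    var cfg = r.expectation_config || {};
    var res = r.result || {};
    return {
      check: cfg.expectation_type,
      success: r.success,
      // e.g. the counted rows for expect_table_row_count_to_be_between
      observedValue: res.observed_value === undefined ? null : res.observed_value,
      kwargs: cfg.kwargs || {}
    };
  });
}

// Made-up input, trimmed to the fields the helper reads.
var sampleValidation = {
  results: [
    {
      success: false,
      expectation_config: {
        expectation_type: "expect_table_row_count_to_be_between",
        kwargs: { min_value: 1, max_value: 100000 }
      },
      result: { observed_value: 0 }
    }
  ]
};

console.log(JSON.stringify(summariseValidation(sampleValidation)));
```

Each summary record could then be written as a DynamoDB item alongside the existing pass/fail flag.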

Terraform: Add item to a DynamoDB Table

What is the correct way to add items with tuple and key/value-pair values to a DynamoDB table via Terraform?
I am trying like this:
resource "aws_dynamodb_table_item" "item" {
table_name = aws_dynamodb_table.dynamodb-table.name
hash_key = aws_dynamodb_table.dynamodb-table.hash_key
for_each = {
"0" = {
location = "Madrid"
coordinates = [["lat", "40.49"], ["lng", "-3.56"]]
visible = false
destinations = [0, 4]
}
}
item = <<ITEM
{
"id": { "N": "${each.key}"},
"location": {"S" : "${each.value.location}"},
"visible": {"B" : "${each.value.visible}"},
"destinations": {"L" : [{"N": "${each.value.destinations}"}]
}
ITEM
}
And I am getting the message:
each.value.destinations is tuple with 2 elements
│
│ Cannot include the given value in a string template: string required.
I also have no idea how to add the coordinates variable.
Thanks!
The list should look something like this (in DynamoDB JSON, number values are written as strings):
"destinations": {"L": [{ "N" : "0" }, { "N" : "4" }]}
You are trying to pass
"destinations": {"L": [{ "N" : [0,4] }]}
Also, the destinations line is missing its closing }.
TL;DR: I think the problem is that you are trying to store an L of N values (a list of numbers), while your current Terraform code tries to put all the destinations into a single N (number).
Instead of:
[{"N": "${each.value.destinations}"}]
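you need to iterate over destinations and build an {"N": ...} element for each value.
Using a number set (NS) instead did the trick:
"destinations": {"NS": ${jsonencode(each.value.destinations)}}
Note that NS is a set: unlike L, it does not preserve element order or allow duplicates.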
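Two more details of the item format: "B" in the question is DynamoDB's Binary type (booleans use "BOOL"), and nested values such as coordinates need their own type descriptors. The JavaScript sketch below (not Terraform) shows the typed JSON the item string has to end up as; toAttributeValue is a hypothetical helper. In Terraform the same structure could be produced with for expressions and jsonencode.

```javascript
// Hypothetical helper: convert a plain JS value into DynamoDB's typed
// ("attribute value") JSON. Numbers become {"N": "<string>"}, booleans
// {"BOOL": ...}, arrays {"L": [...]} and plain objects {"M": {...}}.
function toAttributeValue(v) {
  if (v === null) return { NULL: true };
  if (typeof v === "string") return { S: v };
  if (typeof v === "number") return { N: String(v) };
  if (typeof v === "boolean") return { BOOL: v };
  if (Array.isArray(v)) return { L: v.map(toAttributeValue) };
  if (typeof v === "object") {
    var m = {};
    Object.keys(v).forEach(function (k) { m[k] = toAttributeValue(v[k]); });
    return { M: m };
  }
  throw new Error("unsupported type: " + typeof v);
}

// The value from the question, with "id" taken from the for_each key "0".
var itemValue = {
  location: "Madrid",
  coordinates: [["lat", "40.49"], ["lng", "-3.56"]],
  visible: false,
  destinations: [0, 4]
};
var item = { id: { N: "0" } };
Object.keys(itemValue).forEach(function (k) { item[k] = toAttributeValue(itemValue[k]); });

console.log(JSON.stringify(item));
```

Here coordinates becomes a list of two-element string lists, mirroring the input; turning the pairs into an M map keyed by "lat" and "lng" would be an equally valid design choice.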

Cosmos DB - can't use a property (array) with IN Keyword

I'm attempting to query for several documents at once, using some properties that were found on the first document, similar to a TSQL left join on property value.
My attempt in CosmosDB:
select c from assets
join ver on c.versions
where c.id = '123' OR c.id IN ver.otherIds
--NOTE: ver.otherIds is an array
The query above results in a syntax error stating that it doesn't understand ver.otherIds. The docs give the syntax as WHERE c.id IN ("123", "456", ...).
Things I've tried to work around this:
Attempted a custom UDF that takes in the array and generates the wanted syntax, e.g. ["123","456"] --> ("123", "456")
Attempted using array_contains(ver.otherIds, c.id)
Attempted a subquery approach, which produced a "The cardinality of a scalar subquery result set cannot be greater than one" error:
select value c from c
where array_contains((select ... that produces array), c.id)
None of the above worked.
I can, of course, pull the first asset, then generate a second query to pull the rest, but I'd rather not do that. I can also just de-normalize all the data, but without giving specifics to my scenario, it would end up being a very bad idea.
Any ideas?
Thanks in advance!
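If the two-query fallback mentioned in the question turns out to be acceptable, the second query can stay parameterized: ARRAY_CONTAINS(@ids, c.id) plays the role of c.id IN (...) when the ids arrive as an array. This sketch assumes the @azure/cosmos SDK, whose container.items.query accepts a { query, parameters } spec; buildRelatedAssetsQuery is a hypothetical helper, and the SDK call itself is left out so the example runs without a database.

```javascript
// Hypothetical helper: collect the first asset's id plus every id in its
// versions[].otherIds, then build a parameterized query spec for them.
function buildRelatedAssetsQuery(firstAsset) {
  var ids = [firstAsset.id];
  (firstAsset.versions || []).forEach(function (ver) {
    (ver.otherIds || []).forEach(function (id) {
      if (ids.indexOf(id) === -1) ids.push(id); // skip duplicates
    });
  });
  return {
    query: "SELECT * FROM c WHERE ARRAY_CONTAINS(@ids, c.id)",
    parameters: [{ name: "@ids", value: ids }]
  };
}

var spec = buildRelatedAssetsQuery({
  id: "123",
  versions: [{ otherIds: ["1", "2"] }, { otherIds: ["2", "3"] }]
});
console.log(JSON.stringify(spec.parameters[0].value)); // ["123","1","2","3"]
```

Passing the array as a single parameter avoids building the IN list as a string, which is what the custom UDF attempt was trying to do.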
You could use your second approach, ARRAY_CONTAINS.
My sample documents:
[
{
"id": "1",
"versions": [
{
"otherIds": [
"1",
"2",
"3"
]
}
]
},
{
"id": "2",
"versions": [
{
"otherIds": [
"1",
"2",
"3"
]
},
{
"otherIds": [
"123",
"2",
"3"
]
}
]
},
{
"id": "123",
"versions": [
{
"otherIds": [
"1",
"2",
"3"
]
},
{
"otherIds": [
"123",
"2",
"3"
]
}
]
}
]
SQL:
SELECT DISTINCT c.id, c.versions FROM c
JOIN ver IN c.versions
WHERE c.id = "123" OR ARRAY_CONTAINS(ver.otherIds, c.id, false)
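The optional third argument of ARRAY_CONTAINS specifies whether the match is partial (true) or full (false, the default); partial matching only matters when the array elements are objects.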
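It helps to see exactly which sample documents this query returns. In Cosmos DB, JOIN pairs a document only with its own nested array, so each document's id is tested against that same document's otherIds, not against the ids listed in document "123". The plain-JavaScript simulation below (runQuery is a hypothetical helper, not an SDK call) evaluates the query over the three sample documents: "1" and "2" match because their own otherIds contain their own id, and "123" matches the first condition.

```javascript
// Local simulation of:
//   SELECT DISTINCT c.id, c.versions FROM c
//   JOIN ver IN c.versions
//   WHERE c.id = "123" OR ARRAY_CONTAINS(ver.otherIds, c.id, false)
function runQuery(docs) {
  var matched = [];
  docs.forEach(function (c) {
    // JOIN: one row per (document, own version) pair
    (c.versions || []).forEach(function (ver) {
      var hit = c.id === "123" || (ver.otherIds || []).indexOf(c.id) !== -1;
      // DISTINCT: keep each matching document only once
      if (hit && matched.indexOf(c.id) === -1) matched.push(c.id);
    });
  });
  return matched;
}

var assetDocs = [
  { id: "1", versions: [{ otherIds: ["1", "2", "3"] }] },
  { id: "2", versions: [{ otherIds: ["1", "2", "3"] }, { otherIds: ["123", "2", "3"] }] },
  { id: "123", versions: [{ otherIds: ["1", "2", "3"] }, { otherIds: ["123", "2", "3"] }] }
];

console.log(JSON.stringify(runQuery(assetDocs))); // ["1","2","123"]
```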

Query to get exact matches of Elastic Field with multiple values in Array

I want to write a query in Elastic that applies a filter based on values I have in an array (in my R program). Essentially the query:
Matches a time range (time field in Elastic)
Matches "trackId" field in Elastic to any value in array oth_usr
Return 2 fields - "trackId", "propertyId"
I have the following primitive version of the query but do not know how to use the oth_usr array in a query (part 2 above).
query <- sprintf('{"query":{"range":{"time":{"gte":"%s","lte":"%s"}}}}',start_date,end_date)
view_list <- elastic::Search(index = "organised_recent",type = "PROPERTY_VIEW",size = 10000000,
body=query, fields = c("trackId", "propertyId"))$hits$hits
You need to add a terms query and put it, together with the range query, inside a bool/must query. Try updating your query like this:
terms <- paste(sprintf("\"%s\"", oth_usr), collapse=", ")
query <- sprintf('{"query":{"bool":{"must":[{"terms": {"trackId": [%s]}},{"range": {"time": {"gte": "%s","lte": "%s"}}}]}}}',terms,start_date,end_date)
I'm not fluent in R, but here is a raw JSON query that works.
It checks whether your time field falls within the given range (start_time to end_time) and whether trackId exactly matches one of your terms.
It returns only the trackId and propertyId fields, as you requested:
POST /indice/_search
{
"_source": {
"include": [
"trackId",
"propertyId"
]
},
"query": {
"bool": {
"must": [
{
"range": {
"time": {
"gte": "start_time",
"lte": "end_time"
}
}
},
{
"terms": {
"trackId": [
"terms"
]
}
}
]
}
}
}
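Splicing quoted values into a JSON string with sprintf breaks as soon as a trackId contains a quote or backslash. A safer pattern is to build the body as a data structure and serialize it (in R, jsonlite::toJSON can play that role). The JavaScript sketch below shows the same body as the raw JSON query; buildTrackQuery is a hypothetical helper, and "include" is kept exactly as in that query.

```javascript
// Hypothetical helper: build the search body as an object and let
// JSON.stringify handle quoting/escaping of every value.
function buildTrackQuery(trackIds, startDate, endDate) {
  return JSON.stringify({
    _source: { include: ["trackId", "propertyId"] },
    query: {
      bool: {
        must: [
          { range: { time: { gte: startDate, lte: endDate } } },
          { terms: { trackId: trackIds } }
        ]
      }
    }
  });
}

console.log(buildTrackQuery(["u1", "u2"], "2020-01-01", "2020-01-31"));
```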

Kendo - Grid - Aggregate with Complex Objects

I have a Kendo UI grid. The grid has a datasource with complex object data. For example, {"foo": {"bar" : 10}}. Although the column field can navigate the object graph (i.e. foo.bar), the aggregate field doesn't seem to be able to.
Here's the code:
var grid = $("#grid").kendoGrid({
dataSource: {
data: [
{"foo": {"bar": 10}},
{"foo": {"bar": 20}}
],
aggregate: [
{field: "foo.bar", aggregate: "sum"}
]
},
columns: [
{
field: "foo.bar",
footerTemplate: "Sum: #= sum # "
}
]
}).data("kendoGrid");
Here's the fiddle:
http://jsfiddle.net/e6shF/1/
Firebug reports "TypeError: data.foo is undefined" in line 8 of kendo.all.min.js.
Am I doing something incorrectly? If this is a bug in Kendo, is there a way to work around this? I have to keep the objects complex.
Here's a "better" answer from Kendo Support:
The behavior you are experiencing is caused by the fact that the "path" you have specified will be used as a key in the map created as the result of the aggregation, producing an object similar to the following:
{ "foo.bar" : { sum: 30 } }
Unfortunately, this construct is not supported by the footer template generation and will not be resolved correctly. A possible workaround for this scenario is to use a function instead. I have modified the sample to illustrate this.
var grid = $("#grid").kendoGrid({
dataSource: {
data: [
{"foo": {"bar": 10}},
{"foo": {"bar": 20}}
],
aggregate: [
{field: "foo.bar", aggregate: "sum"}
]
},
columns: [
{
field: "foo.bar",
footerTemplate: function(data) { return "Sum: " + data["foo.bar"].sum; }
}
]
}).data("kendoGrid");
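The support answer's point can be reproduced in plain JavaScript: when the aggregate map uses the whole path "foo.bar" as one key, a bracket lookup with that key works, while walking the path as nested properties finds no foo property, which mirrors the "data.foo is undefined" error. This is an illustration of the described map, not Kendo's internal template code; footer copies the workaround function from the sample above.

```javascript
var rows = [{ foo: { bar: 10 } }, { foo: { bar: 20 } }];

// Aggregate map as described by Kendo Support: the path is a single flat key.
var aggregates = {
  "foo.bar": { sum: rows.reduce(function (acc, r) { return acc + r.foo.bar; }, 0) }
};

// The workaround's footer function looks the key up with brackets.
function footer(data) { return "Sum: " + data["foo.bar"].sum; }

console.log(footer(aggregates)); // Sum: 30
console.log(aggregates.foo);     // undefined (there is no nested "foo" object)
```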
It seems it is not possible to have complex objects in aggregates, since the dynamically generated function that evaluates them treats foo.bar as the name of a single field.
Do you really need that complex field?
I understand that the server providing the grid's data may send that complex foo, but you can always flatten it using the parse or data functions of the data source schema. Something like this:
var grid = $("#grid").kendoGrid({
dataSource:{
data:[
{"foo":{"bar":10}},
{"foo":{"bar":20}}
],
aggregate:[
{field:"foo_bar", aggregate:"sum"}
],
schema: {
parse:function (data) {
var res = [];
$.each(data, function (idx, elem) {
res.push({ "foo_bar": elem.foo.bar });
});
return res;
}
}
},
columns: [
{
field: "foo_bar",
footerTemplate:"Sum: #= sum # "
}
]
}).data("kendoGrid");
Here I transform the received foo.bar into foo_bar and use that for the aggregation.
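If there are many nested fields, the parse step above can be generalized instead of listing each one by hand. The sketch below is a hypothetical helper, flattenRecord, that joins nested keys with "_" (so foo.bar becomes foo_bar, matching the field name used in the sample). It only recurses into plain objects; arrays, dates and other values are copied as-is. It is an assumption that underscore-joined names do not collide with existing flat fields in your data.

```javascript
// Hypothetical helper: flatten nested plain objects into underscore-joined
// keys, e.g. { foo: { bar: 10 } } -> { foo_bar: 10 }.
function flattenRecord(obj, prefix, out) {
  out = out || {};
  Object.keys(obj).forEach(function (key) {
    var name = prefix ? prefix + "_" + key : key;
    var val = obj[key];
    // Recurse only into plain objects; arrays, Dates, etc. are copied as-is.
    if (Object.prototype.toString.call(val) === "[object Object]") {
      flattenRecord(val, name, out);
    } else {
      out[name] = val;
    }
  });
  return out;
}

// Possible use inside the data source:
//   schema: { parse: function (data) { return data.map(function (d) { return flattenRecord(d); }); } }
var flatRows = [{ foo: { bar: 10 } }, { foo: { bar: 20 } }].map(function (d) {
  return flattenRecord(d);
});
console.log(JSON.stringify(flatRows)); // [{"foo_bar":10},{"foo_bar":20}]
```

The wrapper function in map matters: passing flattenRecord directly would hand it the array index as the prefix argument.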
