DynamoDB boto3: How to query if a key has some value - amazon-dynamodb

I'm writing Python code to query a table in DynamoDB. Along with other conditions, I'd also like to get rows where there exists some value for the key I'm specifying.
KeyConditionExpression = Key('Status').eq('Done') & Key('Name').begins_with('A') & Key('Error').exists()
In the code above, I'd like to display rows where the column "Error" has some value, be it anything.
But the last condition throws an error:
AttributeError: 'Key' object has no attribute 'exists'.
How can I modify the code to incorporate the third condition?

I'm assuming that your "Error" key is neither your partition key nor your sort key, since you must always specify the partition key name and value as an equality condition. For the sort key, you can use one of the following comparison operators.
a = b — true if the attribute a is equal to the value b
a < b — true if a is less than b
a <= b — true if a is less than or equal to b
a > b — true if a is greater than b
a >= b — true if a is greater than or equal to b
a BETWEEN b AND c — true if a is greater than or equal to b, and less than or equal to c.
begins_with(a, substr) — true if the value of attribute a begins with a particular substring.
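In boto3, these operators map to methods on the Key object. A quick illustrative sketch (treating 'Name' as the sort key, as in your example):
from boto3.dynamodb.conditions import Key

# Each call builds a sort-key condition for use in KeyConditionExpression:
Key('Name').eq('Atlanta')       # a = b
Key('Name').lt('B')             # a < b
Key('Name').between('A', 'C')   # a BETWEEN b AND c
Key('Name').begins_with('A')    # begins_with(a, substr)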
Now, coming back to your issue: if you are using the Query operation, then you can definitely use QueryFilter to check whether some non-null value exists for the key "Error". The official documentation defines QueryFilter as:
In a Query operation, QueryFilter is a condition that evaluates the query results after the items are read and returns only the desired values.
The following comparison operators are available for QueryFilter:
EQ | NE | LE | LT | GE | GT | NOT_NULL | NULL | CONTAINS | NOT_CONTAINS | BEGINS_WITH | IN | BETWEEN
Refer to the link below for more detail on QueryFilter and FilterExpression.
AWS official documentation: https://docs.amazonaws.cn/en_us/amazondynamodb/latest/developerguide/LegacyConditionalParameters.QueryFilter.html
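Note that QueryFilter belongs to the legacy parameter set; with boto3's resource API, the equivalent is FilterExpression, and non-key attributes use Attr rather than Key. A minimal sketch, assuming 'Status' is your partition key and 'Name' your sort key (the table name 'MyTable' is hypothetical):
import boto3
from boto3.dynamodb.conditions import Key, Attr

table = boto3.resource('dynamodb').Table('MyTable')

response = table.query(
    # Only the partition and sort keys may appear in KeyConditionExpression.
    KeyConditionExpression=Key('Status').eq('Done') & Key('Name').begins_with('A'),
    # Non-key attributes go in FilterExpression; Attr (unlike Key) supports exists().
    FilterExpression=Attr('Error').exists()
)
items = response['Items']
Like QueryFilter, a FilterExpression is applied after the items are read, so it reduces what is returned but not the read capacity consumed.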

Related

Can we make pack_all consider only non-null & non-empty columns

The pack_all() function considers all the input columns when building a dynamic object. Is it possible to somehow force it to consider only non-empty and non-null columns? If not, is there any workaround to apply a filter on top of the resulting dynamic value?
There is no flavor of pack_all that will do it, but as an alternative, you can combine mv-apply and mv-expand operators to achieve this. Here is an example (adapted from the docs):
datatable(SourceNumber:string, TargetNumber:string, CharsCount:long)
[
    '555-555-1234', '555-555-1212', 46,
    '555-555-1234', '555-555-1213', 50,
    '555-555-1212', '', int(null)
]
| extend values = pack_all()
// For each row, collect the names of the empty/null properties, then drop them from the bag.
| mv-apply removeProperties = values on
(
    mv-expand kind = array values
    | where isempty(values[1])
    | summarize propsToRemove = make_set(values[0])
)
| extend values = bag_remove_keys(values, propsToRemove)
| project-away propsToRemove
It should be added as a newer answer that pack_all() has in the meantime gained an option to exclude null/empty values:
pack_all([ignore_null_empty])
ignore_null_empty: An optional bool indicating whether to ignore null/empty columns and exclude them from the resulting property bag. Default: false.
https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/packallfunction
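A quick usage sketch with the sample row from the first answer:
datatable(SourceNumber:string, TargetNumber:string, CharsCount:long)
[
    '555-555-1212', '', int(null)
]
| extend values = pack_all(true)  // the empty TargetNumber and null CharsCount are excluded from the bag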

Kusto sub query selection using toscalar - returns only last matching record

I am referring to sqlcheatsheet - Nested queries.
Query 1:
traces
| where customDimensions.Domain == "someDomain"
| where message contains "some-text"
| project itemId=substring(itemId,indexof(itemId,"-"),strlen(itemId))
Result:
itemId
-c580-11e9-888a-8776d3f65945
-c580-11e9-888a-8776d3f65945
-c580-11e9-9b01-c3be0f4a2bf2
Query 2:
traces
| where customDimensions.Domain == "someDomain"
| where itemId has toscalar(
traces
| where customDimensions.Domain == "someDomain"
| where message contains "some-text"
| project itemId=substring(itemId,indexof(itemId,"-"),strlen(itemId)))
The result of the second query contains only records matching the last record of the subquery, i.e.:
-c580-11e9-9b01-c3be0f4a2bf2
Question:
How do I get the entire result set matching all three items?
My requirement is to retrieve the entire sequence of logs for a particular request.
To get that, I have the following inputs: I am able to take one log, and from it I can find the itemId.
The itemId looks like "b5066283-c7ea-11e9-9e9b-2ff40863cba4". All other logs related to this request must contain the value "-c7ea-11e9-9e9b-2ff40863cba4"; only the first part is incremented, like b5066284, b5066285, b5066286, and so on.
toscalar(), as its name implies, returns a scalar value.
Given a tabular argument with N columns and M rows, it will return the value in the 1st column of the 1st row.
For example, the following will return a single value, 1:
let T = datatable(a:int, b:int, c:int)
[
    1, 2, 3,
    4, 5, 6,
    7, 8, 9,
]
;
print toscalar(T)
If I understand the intention in your 2nd query correctly, you should be able to achieve your requirement by using has_any.
For example:
let T = datatable(item_id:string)
[
    "c580-11e9-888a-8776d3f65945",
    "c580-11e9-888a-8776d3f65945",
    "c580-11e9-9b01-c3be0f4a2bf2",
]
;
T
| where item_id has_any (
    (
        T
        | parse item_id with * "-" item_id
    )
)

Replacing empty string column with null in Kusto

How do I replace an empty (non-null) column of string datatype with a null value?
Say the following query returns a non-zero record set:
mytable | where mycol == ""
Now, these are the rows with mycol containing empty strings, and I want to replace them with nulls. From what I have read in the Kusto documentation, there are datatype-specific null literals such as int(null), datetime(null), guid(null), etc., but there is no string(null). The closest to string is guid, but when I use it in the following manner, I get an error:
mytable | where mycol == "" | extend test = translate(mycol,guid(null))
The error:
translate(): argument #0 must be string literal
So what is the way out, then?
Update:
datatable(n:int, s:string)
[
    10, "hello",
    10, "",
    11, "world",
    11, "",
    12, ""
]
| summarize myset = make_set(s) by n
If you execute this, you can see that empty strings are included in the sets. I don't want this; no such empty strings should be part of my array. But at the same time, I don't want to lose the value of n, which is exactly what will happen if I use the isnotempty function. In the following example, you can see that the row where n=12 is not returned; there is no need to skip n=12, as one could always get an empty array:
datatable(n:int, s:string)
[
    10, "hello",
    10, "",
    11, "world",
    11, "",
    12, ""
]
| where isnotempty(s)
| summarize myset = make_set(s) by n
There's currently no support for null values for the string datatype: https://learn.microsoft.com/en-us/azure/kusto/query/scalar-data-types/null-values
I'm pretty certain that in itself, that shouldn't block you from reaching your end goal, but that goal isn't currently clear.
[update based on your update:]
datatable(n:int, s:string)
[
    10, "hello",
    10, "",
    11, "world",
    11, "",
    12, ""
]
// todynamic("") evaluates to a null dynamic value, and make_set ignores nulls,
// so empty strings are dropped while the n=12 row still yields an (empty) set.
| summarize make_set(todynamic(s)) by n

JanusGraph - Warning about all vertices scan after index was created

I am using JanusGraph 0.2.0 and have the following vertex defined (in Python):
class Airport(TypedVertex):
    type = goblin.VertexProperty(goblin.String, card=Cardinality.single)
    airport_code = goblin.VertexProperty(goblin.String, card=Cardinality.single)
    airport_city = goblin.VertexProperty(goblin.String, card=Cardinality.single)
    airport_name = goblin.VertexProperty(goblin.String, card=Cardinality.single)
    airport_region = goblin.VertexProperty(goblin.String, card=Cardinality.single)
    airport_runways = goblin.VertexProperty(goblin.Integer, card=Cardinality.single)
    airport_longest_runway = goblin.VertexProperty(goblin.Integer, card=Cardinality.single)
    airport_elev = goblin.VertexProperty(goblin.Integer, card=Cardinality.single)
    airport_country = goblin.VertexProperty(goblin.String, card=Cardinality.single)
    airport_lat = goblin.VertexProperty(goblin.Float, card=Cardinality.single)
    airport_long = goblin.VertexProperty(goblin.Float, card=Cardinality.single)
I then defined an index for this node on the airport code field using the following commands (some commands were excluded to keep it short).
mgmt.makePropertyKey('type').dataType(String.class).cardinality(Cardinality.SINGLE).make()
mgmt.makePropertyKey('airport_city').dataType(String.class).cardinality(Cardinality.SINGLE).make()
mgmt.makePropertyKey('airport_code').dataType(String.class).cardinality(Cardinality.SINGLE).make()
mgmt.makePropertyKey('airport_country').dataType(String.class).cardinality(Cardinality.SINGLE).make()
airport_code = mgmt.getPropertyKey('airport_code')
airport_city = mgmt.getPropertyKey('airport_city')
airport_country = mgmt.getPropertyKey('airport_country')
mgmt.buildIndex('by_airport_code_unique', Vertex.class).addKey(airport_code).unique().buildCompositeIndex()
mgmt.buildIndex('by_airport_city', Vertex.class).addKey(airport_city).buildCompositeIndex()
mgmt.buildIndex('by_airport_country', Vertex.class).addKey(airport_country).buildCompositeIndex()
mgmt.awaitGraphIndexStatus(graph, 'by_airport_code_unique').call()
mgmt.awaitGraphIndexStatus(graph, 'by_airport_city').call()
mgmt.awaitGraphIndexStatus(graph, 'by_airport_country').call()
After creating them, I use a script to describe the :schema, and I see that all the indexes are REGISTERED:
| Graph Index            | Type      | Element          | Unique | Backing       | PropertyKey     | Status     |
|------------------------|-----------|------------------|--------|---------------|-----------------|------------|
| by_airport_code_unique | Composite | JanusGraphVertex | true   | internalindex | airport_code    | REGISTERED |
| by_airport_city        | Composite | JanusGraphVertex | false  | internalindex | airport_city    | REGISTERED |
| by_airport_country     | Composite | JanusGraphVertex | false  | internalindex | airport_country | REGISTERED |
When I try to insert a second vertex with the same airport_code, I get an exception on the constraint violation, as expected. However, if I go into the Gremlin console and run a traversal to retrieve the vertices by their airport_code:
g.V().has('airport_code').values()
I get a warning: WARN org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Query requires iterating over all vertices [()]. For better performance, use indexes
I had a similar problem a few weeks ago; the issue then was that I was trying to define indexes based on labels, and I was told that, at the time, JanusGraph did not support indexes on labels. However, I don't think that is the case here.
Any suggestions or ideas on why my index is not working or not being used?
Thanks in advance for any help.
--MD
You are seeing the warning because your query does not utilize the index. A composite index is used for equality matches.
Composite indexes are very fast and efficient but limited to equality lookups for a particular, previously-defined combination of property keys. Mixed indexes can be used for lookups on any combination of indexed keys and support multiple condition predicates in addition to equality depending on the backing index store.
In order to leverage a composite index, you need to provide the property and a value to match. For example:
g.V().has('airport_code', 'JFK').toList()
I'm not sure why the index wasn't ENABLED after creation; perhaps it was something in the steps you left out. If you create the index within the same management transaction as the property keys, it should be ENABLED rather than REGISTERED. Check out the index lifecycle wiki.
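A minimal sketch of that pattern in the Gremlin console (assuming the property key does not already exist; names match your schema):
mgmt = graph.openManagement()
// Create the property key and the index that uses it in the same management transaction...
airport_code = mgmt.makePropertyKey('airport_code').dataType(String.class).cardinality(Cardinality.SINGLE).make()
mgmt.buildIndex('by_airport_code_unique', Vertex.class).addKey(airport_code).unique().buildCompositeIndex()
// ...so the index can go straight to ENABLED on commit.
mgmt.commit()
For an index stuck in REGISTERED, the usual path is to reindex and enable it via mgmt.updateIndex(...); the index lifecycle wiki covers the details.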

Trying to put the same values in an item's list and getting a ValidationException

I recently decided to try out amazon-dynamodb and am still trying to get the hang of it.
In my use case, I have to store two variables which have the same values as a string list ("SS") in an item in DynamoDB. When I tried to do that, this is what I got:
[ValidationException: One or more parameter values were invalid: Input collection [X,X] contains duplicates.]
message: 'One or more parameter values were invalid: Input collection [X, X] contains duplicates.',
code: 'ValidationException',
time: Fri Apr 25 2014 20:38:21 GMT+0000 (UTC),
statusCode: 400
My question: Is there a way to store duplicate values in an item's list?
Any knowledge regarding this will be appreciated.
No. The String "list" is actually a set. If you try to add a list of values containing duplicates, it will give you this exception; if you try to add values that are already in the set, the write will succeed silently.
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DataModel.html#DataModel.DataTypes
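If your input may contain duplicates, one workaround is to deduplicate client-side before writing. A hedged sketch in Python/boto3 (the question appears to use the Node.js SDK, but the behavior is the same; table and attribute names are illustrative):
import boto3

table = boto3.resource('dynamodb').Table('MyTable')  # hypothetical table

values = ['X', 'X']  # input containing duplicates
table.put_item(Item={
    'id': 'item-1',
    'tags': set(values),  # a Python set deduplicates; boto3 stores it as a DynamoDB String Set
})
If you genuinely need to keep duplicates (and their order), store the values in a List ("L") attribute instead of a String Set, since lists allow duplicate elements.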
In DynamoDB, the Hash Key, or the Hash Key + Range Key combination, must be unique.
On a Hash Key (id) Only Table, id must be unique.
id
--------
john
mary
jane
On a Hash Key (id) + Range Key (timestamp) Table, combination of id + timestamp must be unique
id | timestamp
--------|-------------------------
john | 2014-04-28T07:53:29.000Z
john | 2014-04-28T08:53:29.000Z
john | 2014-04-28T09:53:29.000Z
mary | 2014-04-28T07:53:29.000Z
jane | 2014-04-28T07:53:29.000Z
Similarly, if an item has an attribute whose type is a String Set, DynamoDB expects the values within that set to be unique as well (note that a set cannot itself serve as a key attribute, since key attributes must be scalar types):
tags (String Set)
------------------------
["john"]
["mary"]
["jane"]
["john", "mary"]
["john", "jane"]
["john", "jane", "mary"]
Therefore, the following can be achieved, because the values within each set are unique:
tags (String Set)
----------------
["john", "mary"]
["john", "jane"]
But if the input collection for a single set contains duplicates, as in your question, the exception will occur:
tags (String Set)
----------------
["john", "john", "mary"]
