Replacing empty string column with null in Kusto - azure-data-explorer

How do I replace an empty (non-null) value in a string column with a null value?
Say the following query returns a non-empty record set:
mytable | where mycol == ""
These are the rows where mycol contains an empty string, and I want to replace those values with nulls. From what I have read in the Kusto documentation, there are datatype-specific null literals such as int(null), datetime(null), guid(null), etc., but there is no string(null). The closest to string is guid, but when I use it in the following manner, I get an error:
mytable | where mycol == "" | extend test = translate(mycol,guid(null))
The error:-
translate(): argument #0 must be string literal
So what is the way out then?
Update:-
datatable(n:int,s:string)
[
10,"hello",
10,"",
11,"world",
11,"",
12,""
]
| summarize myset=make_set(s) by n
If you execute this, you can see that the empty strings are included in the sets. I don't want that: no empty string should be part of my array. At the same time, I don't want to lose any value of n, which is exactly what happens if I use the isnotempty function. In the following example, the row where n=12 is not returned. There is no need to skip n=12; it could simply get an empty array:
datatable(n:int,s:string)
[
10,"hello",
10,"",
11,"world",
11,"",
12,""
]
| where isnotempty(s)
| summarize myset=make_set(s) by n

There's currently no support for null values for the string datatype: https://learn.microsoft.com/en-us/azure/kusto/query/scalar-data-types/null-values
I'm pretty certain that in itself, that shouldn't block you from reaching your end goal, but that goal isn't currently clear.
[update based on your update:]
datatable(n:int,s:string)
[
10,"hello",
10,"",
11,"world",
11,"",
12,""
]
| summarize make_set(todynamic(s)) by n
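The intent of the query above is that empty strings no longer end up in the set while every value of n is kept. As a rough illustration of that expected result only, here is a minimal Python sketch (not Kusto; the function name is hypothetical) that groups by n, drops empty strings, and still returns n=12 with an empty array:

```python
from collections import defaultdict


def make_sets_skipping_empty(rows):
    """Group (n, s) pairs by n, collecting distinct non-empty s values."""
    groups = defaultdict(list)
    for n, s in rows:
        bucket = groups[n]  # touching the key keeps n even if every s is empty
        if s and s not in bucket:
            bucket.append(s)
    return dict(groups)


# Same sample data as the datatable in the question.
rows = [(10, "hello"), (10, ""), (11, "world"), (11, ""), (12, "")]
result = make_sets_skipping_empty(rows)
print(result)
```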

Related

Extract the numeric value from string in Kusto

This is my datatable:
datatable(Id:dynamic)
[
dynamic([987654321][Just Kusto Things]),
]
and I've extracted one field from a JSON value using
| project ID=parse_json(Data).["CustomValue"]
The result is something like [987654321][Just Kusto Things]. I want to extract the numeric value (987654321) inside the first pair of square brackets. What is the best way to retrieve that value: split, parse, or extract?
The datatable in the sample is not valid. If the values are just an array, you can get the result by indexing into the array, like this:
datatable(Id:dynamic)
[
dynamic([987654321,"Just Kusto Things"]),
]
| extend Id = Id[0]
If it is something else, please provide a valid datatable with an example that is representative of the real data.
"The result is something like - [987654321][Just Kusto Things]. I wanted to extract the numbered value(987654321) within the 1st square brackets. How to best retrieve that value?"
You can use the parse operator.
For example:
print input = '[987654321][Just Kusto Things]'
| parse input with '[' output:long ']' *
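For comparison outside Kusto, the same pattern (a literal '[', a number, a literal ']', then anything) can be sketched with a regular expression. This is a minimal Python illustration of the matching logic, not a replacement for the parse operator; the function name is hypothetical:

```python
import re


def first_bracketed_number(text):
    """Return the integer inside the leading [...] pair, or None if absent."""
    match = re.match(r"\[(\d+)\]", text)
    return int(match.group(1)) if match else None


value = first_bracketed_number("[987654321][Just Kusto Things]")
print(value)
```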

Can we make pack_all consider only non-null & non-empty columns

The pack_all() function considers all the input columns when building a dynamic object. Is it possible to force it to consider only non-empty and non-null columns? If not, is there a workaround to filter the resulting dynamic value?
There is no flavor of pack_all that will do it, but as an alternative, you can combine mv-apply and mv-expand operators to achieve this. Here is an example (adapted from the docs):
datatable(SourceNumber:string,TargetNumber:string,CharsCount:long)
[
'555-555-1234','555-555-1212',46,
'555-555-1234','555-555-1213',50,
'555-555-1212','',int(null)
]
| extend values =pack_all()
| mv-apply removeProperties = values on
(
mv-expand kind = array values
| where isempty(values[1])
| summarize propsToRemove = make_set(values[0])
)
| extend values = bag_remove_keys(values, propsToRemove)
| project-away propsToRemove
Adding this as a new answer: pack_all() has since gained an option to exclude null/empty values:
pack_all([ignore_null_empty])
ignore_null_empty: An optional bool indicating whether to ignore null/empty columns and exclude them from the resulting property bag. Default: false.
https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/packallfunction
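To make the intended effect concrete, here is a small Python sketch of the filtering that both approaches aim for: each row becomes a property bag without its null or empty-string entries. It only models the result on the sample data from the first answer; `pack_non_empty` is a hypothetical helper, not a Kusto function:

```python
def pack_non_empty(row):
    """Build a dict from a row, skipping None and empty-string values."""
    return {
        key: value
        for key, value in row.items()
        if value is not None and value != ""
    }


rows = [
    {"SourceNumber": "555-555-1234", "TargetNumber": "555-555-1212", "CharsCount": 46},
    {"SourceNumber": "555-555-1212", "TargetNumber": "", "CharsCount": None},
]
packed = [pack_non_empty(row) for row in rows]
print(packed)
```

Note that the check is against None and "" explicitly, so a legitimate value such as 0 is kept.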

Kusto sub query selection using toscalar - returns only last matching record

I am referring to sqlcheatsheet - Nested queries.
Query 1:
traces
| where customDimensions.Domain == "someDomain"
| where message contains "some-text"
| project itemId=substring(itemId,indexof(itemId,"-"),strlen(itemId))
Result :
itemId
-c580-11e9-888a-8776d3f65945
-c580-11e9-888a-8776d3f65945
-c580-11e9-9b01-c3be0f4a2bf2
Query 2:
traces
| where customDimensions.Domain == "someDomain"
| where itemId has toscalar(
traces
| where customDimensions.Domain == "someDomain"
| where message contains "some-text"
| project itemId=substring(itemId,indexof(itemId,"-"),strlen(itemId)))
The second query returns only the records matching the last record of the subquery,
i.e. -c580-11e9-9b01-c3be0f4a2bf2
Question:
How do I get the entire result set that matches all three items?
My requirement is to retrieve the entire sequence of logs for a particular request.
For that I have the following input: I am able to find one log, and from it I can get the itemId.
The itemId looks like "b5066283-c7ea-11e9-9e9b-2ff40863cba4". All other logs related to this request have the value "-c7ea-11e9-9e9b-2ff40863cba4". Only the first part gets incremented: b5066284, b5066285, b5066286, and so on.
toscalar(), as its name implies, returns a scalar value.
Given a tabular argument with N columns and M rows it'll return the value in the 1st column and the 1st row.
For example, the following returns a single value, 1:
let T = datatable(a:int, b:int, c:int)
[
1,2,3,
4,5,6,
7,8,9,
]
;
print toscalar(T)
If I understand the intention in your 2nd query correctly, you should be able to achieve your requirement by using has_any.
For example:
let T = datatable(item_id:string)
[
"c580-11e9-888a-8776d3f65945",
"c580-11e9-888a-8776d3f65945",
"c580-11e9-9b01-c3be0f4a2bf2",
]
;
T
| where item_id has_any (
(
T
| parse item_id with * "-" item_id
)
)
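The difference between comparing against one scalar and comparing against a whole column of values can be modelled in a few lines of Python. This is only a sketch of the logic: it approximates the matching with an exact comparison of the part after the first "-", which is simpler than Kusto's term-based has/has_any. Apart from the first itemId, the sample IDs are made up for illustration:

```python
def to_scalar(table):
    """Model of toscalar(): the value in the first column of the first row."""
    return table[0][0]


def suffix(item_id):
    """Everything after the first '-', the part shared by one request."""
    return item_id.split("-", 1)[1]


logs = [
    "b5066283-c7ea-11e9-9e9b-2ff40863cba4",
    "b5066284-c7ea-11e9-9e9b-2ff40863cba4",  # hypothetical
    "a1111111-c580-11e9-888a-8776d3f65945",  # hypothetical
    "ffffffff-0000-0000-0000-000000000000",  # hypothetical, unrelated request
]

# Rows returned by the inner query (here: two different requests).
matched = [logs[0], logs[2]]

# Compare against every suffix from the inner query, not just one of them.
wanted = {suffix(item_id) for item_id in matched}
related = [item_id for item_id in logs if suffix(item_id) in wanted]
print(related)
```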

DynamoDB boto3 : How to query if a key has some value

I'm writing Python code to query a table in DynamoDB. Along with other conditions, I'd also like to get rows where some value exists for the key I'm specifying.
KeyConditionExpression = Key('Status').eq('Done') & Key('Name').begins_with('A') & Key('Error').exists()
In the code above, I'd like to return rows where the "Error" column has some value, whatever it is,
but the last condition throws an error:
AttributeError: 'Key' object has no attribute 'exists'.
How can I modify the code to incorporate the third condition?
I assume your “Error” key is neither the partition key nor the sort key, since you must always specify the partition key name and value as an equality condition. For the sort key, you can use one of the following comparison operators:
a = b — true if the attribute a is equal to the value b
a < b — true if a is less than b
a <= b — true if a is less than or equal to b
a > b — true if a is greater than b
a >= b — true if a is greater than or equal to b
a BETWEEN b AND c — true if a is greater than or equal to b, and less than or equal to c.
begins_with (a, substr)— true if the value of attribute a begins with a particular substring.
Coming back to your issue: if you are using the Query operation, you can use QueryFilter to check whether a non-null value exists for the “Error” key. The official documentation defines QueryFilter as:
In a Query operation, QueryFilter is a condition that evaluates the query results after the items are read and returns only the desired values.
The following comparison operators are available for QueryFilter:
EQ | NE | LE | LT | GE | GT | NOT_NULL | NULL | CONTAINS | NOT_CONTAINS | BEGINS_WITH | IN | BETWEEN
Please refer to the link below for more on QueryFilter, which is a legacy parameter, and on FilterExpression, which replaces it:
AWS Official Documentation Link : https://docs.amazonaws.cn/en_us/amazondynamodb/latest/developerguide/LegacyConditionalParameters.QueryFilter.html
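With FilterExpression, the "has some value" check is the attribute_exists function in DynamoDB's expression syntax (or Attr('Error').exists() when using the helpers in boto3.dynamodb.conditions). Below is a minimal sketch of the query arguments, assuming "Status" is the partition key and "Name" is the sort key, which the question suggests but does not confirm. The function name and the table name in the usage comment are hypothetical:

```python
def build_query_kwargs(status, name_prefix):
    """Keyword arguments for a boto3 Table.query() call.

    Assumes "Status" is the partition key and "Name" is the sort key.
    "Error" is a non-key attribute, so it goes in the filter, not the
    key condition.
    """
    return {
        "KeyConditionExpression": "#s = :status AND begins_with(#n, :prefix)",
        # Applied after the items are read; keeps items that have "Error".
        "FilterExpression": "attribute_exists(#e)",
        # Placeholders avoid clashes with DynamoDB reserved words.
        "ExpressionAttributeNames": {"#s": "Status", "#n": "Name", "#e": "Error"},
        "ExpressionAttributeValues": {":status": status, ":prefix": name_prefix},
    }


kwargs = build_query_kwargs("Done", "A")
print(kwargs)

# Hypothetical usage (requires boto3 and AWS credentials):
#   table = boto3.resource("dynamodb").Table("my-table")
#   items = table.query(**kwargs)["Items"]
```

Because the filter runs after the key condition has selected the items, it reduces what is returned but not what is read.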

Filenet query number conversion in order by

I have this Filenet query:
SELECT
[This], [Ente], [IDAtto], [Numero], [Tipologia], [DataEmissione]
FROM
[AttoNormativo]
WHERE
([DataEmissione] > 20160405T215959Z AND [DataEmissione] < 20160408T220001Z)
ORDER BY
[DataEmissione] desc, [Tipologia], [Numero], [Ente]
OPTIONS (TIMELIMIT 180)
The problem is that the [Numero] property is a string, so it does not sort properly. Is there a cast function I can use to convert it to a numeric value?
Thank you very much.
No, there is not. According to the docs the orderby is a property_spec followed optionally by ASC or DESC.
<orderby> ::= <property_spec> [ ASC | DESC ]
The only function allowed in the ORDER BY is COALESCE() which can be used to provide a default sorting value when the data is null.
As per the documentation, properties of type Boolean, DateTime, Float64, ID, Integer32, and Object may appear in an ORDER BY clause, along with short String properties. Neither Binary nor long String properties may be used to order a query.
You can define a custom string property to store in either a short or long database column by setting the UsesLongColumn property when the property is created.
Now - if you are worried about the null values, then you may consider using the COALESCE function.
<orderby> ::= [ COALESCE '(' <property_spec>, <literal> ')' || <property_spec> ] [ ASC | DESC ]
You can find more in the Relational Queries documentation.
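Since the query itself cannot cast [Numero], the ordering problem in the question comes down to text comparison versus numeric comparison. The short Python sketch below shows the difference on made-up values and one possible client-side workaround, under the assumption that every Numero value consists only of digits:

```python
# Hypothetical rows as they might come back from the query.
rows = [
    {"Numero": "10", "Ente": "A"},
    {"Numero": "9", "Ente": "B"},
    {"Numero": "100", "Ente": "C"},
]

# String ordering compares character by character, so "10" sorts before "9".
as_text = sorted(row["Numero"] for row in rows)

# Converting to int for the sort key gives the numeric order instead.
as_number = sorted((row["Numero"] for row in rows), key=int)

print(as_text)
print(as_number)
```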
