Create a dynamic dictionary from a column for keys and a column for values in Kusto - azure-data-explorer

If I have a table like below, how do I create a dictionary of dynamic type from the 2 columns? E.g. {"a":"1", "b":"2", etc}
let test = datatable (
keys: string,
vals: string
) [
"a,b,c,d", "1,2,3,4"
];
There are the split() and zip() functions, but they produce an array of arrays, which doesn't work with todynamic().

Alternate variation
let test = datatable (keys: string, vals: string) ["a,b,c,d", "1,2,3,4"];
test
| mv-apply k = split(keys, ",") to typeof(string)
,v = split(vals, ",") to typeof(string)
on (summarize make_bag(bag_pack(k, v)))
| keys | vals | bag_
| a,b,c,d | 1,2,3,4 | {"a":"1","b":"2","c":"3","d":"4"}

You could try something like this, assuming both input arrays have the same length:
test
| extend keys = split(keys, ","),
vals = split(vals, ",")
| mv-apply with_itemindex = i k = keys to typeof(string) on (
summarize bag = make_bag(pack(k, vals[i]))
)
| project bag
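For readers who think more easily in Python, the pairing that both queries perform can be sketched as follows. This is only an illustration of the logic (split both strings, pair the i-th key with the i-th value, collect the pairs into one dictionary), not KQL; the helper name build_bag is hypothetical.

```python
def build_bag(keys: str, vals: str, sep: str = ",") -> dict:
    """Pair the i-th key with the i-th value, like make_bag(pack(k, vals[i]))."""
    key_list = keys.split(sep)
    val_list = vals.split(sep)
    # Same assumption as the answer above: both lists have the same length.
    if len(key_list) != len(val_list):
        raise ValueError("keys and vals must have the same number of items")
    return dict(zip(key_list, val_list))


bag = build_bag("a,b,c,d", "1,2,3,4")
print(bag)  # {'a': '1', 'b': '2', 'c': '3', 'd': '4'}
```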

Related

Kusto KQL: how to check if JSON array in dataset contains element of another array?

The dataset (table) I'm querying has a column containing a JSON string array.
I have a fixed list of verbs which I need to check against each entry in the table and find those, where at least one of the items in the JSON list starts with one of the verbs from the fixed list.
// Verbs to look for (actual list is longer).
let verbs = datatable (verb : string) [
"discover",
"gain"
];
// Data. Second column is a JSON string.
let data = datatable(id : int, json: string) [
1, "[\"Discover me\", \"some text\"]",
2, "[\"All good\", \"no invalid verbs\"]",
3, "[\"first element fine\", \"gain power isn't ok\"]",
];
// Query: I need to know if at least one of the items in the "json" column starts
// with one of the verbs of the "verbs" list.
data
| extend parsedJson = parse_json(json)
| extend OneOrMoreListItemsHaveVerb = false
| project id, OneOrMoreListItemsHaveVerb
I tried to use mv-apply but failed, because I'm comparing two lists/arrays against each other rather than one array against a single item.
For the example data above, I expect items with IDs 1 and 3 to be returned. The first element of item 1 has "discover" and the 2nd element of item 3 starts with "gain".
You could create an array from your input table (e.g. using summarize make_set()), then loop over it using mv-apply, once for each of the inputs.
For example:
let verbs = datatable (verb: string) [
"discover", "gain"
]
;
let verbs_list = toscalar(verbs | summarize make_set(verb))
;
let data = datatable(id: int, json: string) [
1, "[\"Discover me\", \"some text\"]",
2, "[\"All good\", \"no invalid verbs\"]",
3, "[\"first element fine\", \"gain power isn't ok\"]",
]
;
data
| mv-apply verb = verbs_list on (
mv-apply input = parse_json(json) on (
where input startswith verb
)
)
| project ['id'], json
| id | json
| 1 | ["Discover me", "some text"]
| 3 | ["first element fine", "gain power isn't ok"]
Alternatively, you can implement similar logic using the partition operator:
let verbs = datatable (verb: string) [
"discover", "gain"
]
;
let data = datatable(id: int, json: string) [
1, "[\"Discover me\", \"some text\"]",
2, "[\"All good\", \"no invalid verbs\"]",
3, "[\"first element fine\", \"gain power isn't ok\"]",
]
;
verbs
| partition by verb
{
data
| mv-apply input = parse_json(json) on(
where input startswith verb
)
| project ['id'], json
}
| id | json
| 1 | ["Discover me", "some text"]
| 3 | ["first element fine", "gain power isn't ok"]
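The matching rule that both queries implement (keep a row if any item in its JSON array starts with any verb) can be sketched in Python as a nested "any" check. This is an illustration of the logic only, not KQL; the helper name has_verb is hypothetical, and the comparison is lowercased on the assumption that it should be case-insensitive, as KQL's startswith is.

```python
import json

verbs = ["discover", "gain"]
data = [
    (1, '["Discover me", "some text"]'),
    (2, '["All good", "no invalid verbs"]'),
    (3, '["first element fine", "gain power isn\'t ok"]'),
]


def has_verb(json_text, verbs):
    """True if at least one array item starts with at least one verb."""
    items = json.loads(json_text)
    return any(
        item.lower().startswith(verb.lower())
        for item in items
        for verb in verbs
    )


matching_ids = [row_id for row_id, json_text in data if has_verb(json_text, verbs)]
print(matching_ids)  # [1, 3]
```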

Kusto Query Dynamic sort Order

I recently started working with Azure Data Explorer (Kusto).
I need to make the sort order of a Kusto table dynamic.
// Variable declaration
let SortColumn ="run_date";
let OrderBy="desc";
// Actual Code
tblOleMeasurments
| take 10
|distinct column1,column2,column3,run_date
|order by SortColumn OrderBy
My code works fine with SortColumn alone, but when I add OrderBy after SortColumn, Kusto gives me an error.
I need to pass the asc/desc value from the OrderBy variable.
Please suggest workarounds or solutions.
The sort column and order cannot be passed as expressions; the order must be a literal (asc or desc). If you want to pass the sort column and sort order as variables, create a union instead, where the filters on the variables select the leg with the desired ordering. Here is an example:
let OrderBy = "desc";
let sortColumn = "run_date";
let Query = tblOleMeasurments | take 10 |distinct column1,column2,column3,run_date;
union
(Query | where OrderBy == "desc" and sortColumn == "run_date" | order by run_date desc),
(Query | where OrderBy == "asc" and sortColumn == "run_date" | order by run_date asc)
The number of union legs is the number of candidate sort columns times two (one leg per sort order).
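If the query text is assembled on the client, the legs can be generated rather than written by hand. The following is a minimal sketch in Python under that assumption; the helper build_union_legs and the second candidate column (column1) are hypothetical and only illustrate how the leg count grows.

```python
from itertools import product


def build_union_legs(query_name, sort_columns):
    """Return one KQL union leg per (column, order) pair."""
    legs = []
    for column, order in product(sort_columns, ("asc", "desc")):
        legs.append(
            f'({query_name} | where OrderBy == "{order}" '
            f'and sortColumn == "{column}" | order by {column} {order})'
        )
    return legs


legs = build_union_legs("Query", ["run_date", "column1"])
kql = "union\n" + ",\n".join(legs)
print(len(legs))  # 4 legs: 2 candidate columns x 2 sort orders
print(kql)
```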
An alternative is to sort by a calculated column that is based on sort_order and sort_column. The example below works for numeric columns:
let T = range x from 1 to 5 step 1 | extend y = -10 * x;
let sort_order = "asc";
let sort_column = "y";
T
| order by column_ifexists(sort_column, "") * case(sort_order == "asc", -1, 1)
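The trick relies on Kusto's default sort direction being descending: multiplying the value by -1 flips the result to ascending. The same idea can be sketched in Python, with reverse=True standing in for Kusto's default; the function name dynamic_sort is hypothetical.

```python
# Same data as T in the query above: x from 1 to 5, y = -10 * x.
rows = [{"x": x, "y": -10 * x} for x in range(1, 6)]


def dynamic_sort(rows, sort_column, sort_order):
    """Always sort descending, but negate the key when ascending is requested."""
    sign = -1 if sort_order == "asc" else 1
    return sorted(rows, key=lambda r: r[sort_column] * sign, reverse=True)


asc_y = [r["y"] for r in dynamic_sort(rows, "y", "asc")]
desc_y = [r["y"] for r in dynamic_sort(rows, "y", "desc")]
print(asc_y)   # [-50, -40, -30, -20, -10]
print(desc_y)  # [-10, -20, -30, -40, -50]
```

As in the KQL version, this only works for values that can be negated, which is why it is limited to numeric columns.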

neo4j percentage of attribute for social network

How can I calculate the percentage of an attribute for all the connections of a social network?
In this particular sample I would want to calculate the fraudulence of a user by assessing its interactions (call, sms):
CREATE (Alice:Person {id:'a', fraud:1})
CREATE (Bob:Person {id:'b', fraud:0})
CREATE (Charlie:Person {id:'c', fraud:0})
CREATE (David:Person {id:'d', fraud:0})
CREATE (Esther:Person {id:'e', fraud:0})
CREATE (Fanny:Person {id:'f', fraud:0})
CREATE (Gabby:Person {id:'g', fraud:0})
CREATE (Fraudster:Person {id:'h', fraud:1})
CREATE
(Alice)-[:CALL]->(Bob),
(Bob)-[:SMS]->(Charlie),
(Charlie)-[:SMS]->(Bob),
(Fanny)-[:SMS]->(Charlie),
(Esther)-[:SMS]->(Fanny),
(Esther)-[:CALL]->(David),
(David)-[:CALL]->(Alice),
(David)-[:SMS]->(Esther),
(Alice)-[:CALL]->(Esther),
(Alice)-[:CALL]->(Fanny),
(Fanny)-[:CALL]->(Fraudster)
When trying to query like:
MATCH (a)-->(b)
WHERE b.fraud = 1
RETURN (count() / ( MATCH (a) -->(b) RETURN count() ) * 100)
I see the following error:
Invalid input '>': expected 0..9, '.', UnsignedHexInteger, UnsignedOctalInteger or UnsignedDecimalInteger (line 3, column 33 (offset: 66))
"RETURN (count() / ( MATCH (a) -->(b) RETURN count() ) * 100)"
^
This query returns, for each fraudulent node, the percentage of all connections that point to it:
MATCH (:Person)-[:CALL|:SMS]->(f:Person)
WITH TOFLOAT(COUNT(*))/100 AS divisor, COLLECT(f) AS fs
UNWIND fs AS f
WITH divisor, f
WHERE f.fraud = 1
RETURN f, COUNT(*)/divisor AS percentage
With the sample data, the result is:
+----------------------------------------------+
| f                        | percentage        |
+----------------------------------------------+
| Node[13]{id:"h",fraud:1} | 9.090909090909092 |
| Node[6]{id:"a",fraud:1}  | 9.090909090909092 |
+----------------------------------------------+
This query only needs a single scan of the DB, and is explicit about the node labels and relationship types -- to filter out any other data that might be in the DB.
In your RETURN clause, you invoke a new query: MATCH (a) -->(b) RETURN count().
This is not allowed in Neo4j. Instead, compute the total first and carry it forward with the WITH keyword:
MATCH ()-->()
WITH count(*) AS total
MATCH ()-->(b)
WHERE b.fraud = 1
RETURN toFloat(count(*)) / total * 100
Or, in your case, because you only want the total count of relationships in your DB, you can use this query:
MATCH ()-->(b)
WHERE b.fraud = 1
RETURN toFloat(count(*)) / size(()-->()) * 100
Update: added toFloat to the Cypher queries; otherwise the division yields an integer, not a float.
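To see why the toFloat call matters, the same arithmetic can be reproduced on the sample data from the question. This is a sketch in Python rather than Cypher; floor division (//) stands in for integer division on two integer counts.

```python
# Relationships from the question's sample data, as (source id, target id).
relationships = [
    ("a", "b"), ("b", "c"), ("c", "b"), ("f", "c"), ("e", "f"), ("e", "d"),
    ("d", "a"), ("d", "e"), ("a", "e"), ("a", "f"), ("f", "h"),
]
fraud = {"a": 1, "b": 0, "c": 0, "d": 0, "e": 0, "f": 0, "g": 0, "h": 1}

total = len(relationships)
to_fraud = sum(1 for _, target in relationships if fraud[target] == 1)

int_pct = to_fraud // total * 100   # integer division truncates to 0 first
float_pct = to_fraud / total * 100  # float division keeps the fraction

print(total, to_fraud)  # 11 2
print(int_pct)          # 0
print(round(float_pct, 2))  # 18.18
```

The overall 18.18% is the sum of the two per-node percentages (about 9.09% each) reported by the first answer.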

Return multiple COLUMN_JSON results as JSON array

I am storing data in standard tables in a MariaDB, but would like to return records from related tables as a JSON string.
What I intend to do is have a function where I can pass in exerciseId and the function returns a JSON string of all related exerciseMuscle records, meaning each exercise record returned by a stored proc can also include nested data from child tables.
I have been able to create JSON records using COLUMN_JSON and COLUMN_CREATE, but can only get this to return a set of individual records rather than an array of JSON values as I need. The SQL I'm using is:
select
e.id,
CONVERT(COLUMN_JSON(COLUMN_CREATE(
'role', em.muscleRoleName,
'muscle', em.muscleName
)) USING utf8) as musclesJson
from
exercise e
inner join exerciseMuscle em
on e.id = em.exerciseId
where
e.id = 96;
This returns:
| id | musclesJson
| 96 | {"role":"main","muscle":"biceps"}
| 96 | {"role":"secondary","muscle":"shoulders"}
When what I want is:
| id | musclesJson
| 96 | [{"role":"main","muscle":"biceps"},{"role":"secondary","muscle":"shoulders"}]
Is it possible to return multiple results in one row without having to iterate through the results and build it manually? If I add a group by to the SQL then the JSON only includes the first record.
Turns out it was GROUP_CONCAT that I needed, and specifying a comma as the delimiter. So changing my SQL to:
select
e.id,
CONVERT(
GROUP_CONCAT(
COLUMN_JSON(
COLUMN_CREATE(
'role', em.muscleRoleName,
'muscle', em.muscleName
)
)
SEPARATOR ','
) USING utf8) as muscles
from
exercise e
inner join exerciseMuscle em
on e.id = em.exerciseId
where
e.id = 96;
Returns:
| id | muscles
| 96 | {"role":"main","muscle":"biceps"},{"role":"secondary","muscle":"shoulders"}
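Note that the value returned above is a comma-separated list of objects without the surrounding square brackets, so it is not yet the JSON array asked for in the question. The brackets still have to be added, either by the consumer or, as an untested assumption that is not part of the original answer, by wrapping the GROUP_CONCAT in CONCAT('[', ..., ']'). A small Python sketch of the consumer-side fix:

```python
import json

# The string as returned by the GROUP_CONCAT query above.
fragment = '{"role":"main","muscle":"biceps"},{"role":"secondary","muscle":"shoulders"}'

# On its own the fragment is not valid JSON: the parser stops after the first object.
try:
    json.loads(fragment)
    fragment_is_valid = True
except json.JSONDecodeError:
    fragment_is_valid = False

# Wrapping it in brackets turns it into a JSON array of two objects.
muscles = json.loads("[" + fragment + "]")

print(fragment_is_valid)          # False
print([m["muscle"] for m in muscles])  # ['biceps', 'shoulders']
```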

Cassandra - CqlEngine - using collection

I want to know how to work with collections in cqlengine.
I can insert a value into a list, but only one value; I can't append further values to the list.
I want to do this:
In CQL3:
UPDATE users
SET top_places = [ 'the shire' ] + top_places WHERE user_id = 'frodo';
In CqlEngine:
connection.setup(['127.0.0.1:9160'])
TestModel.create(id=1,field1 = [2])
This code adds 2 to my list, but when I insert a new value it replaces the old value in the list.
The only help I found in the cqlengine docs:
https://cqlengine.readthedocs.org/en/latest/topics/columns.html#collection-type-columns
I also want to know how to read a collection field with cqlengine.
Is it a dictionary in my Django project? How can I use it?
Please help.
Thanks
Looking at your example it's a list.
Given a table based on the Cassandra CQL documentation:
CREATE TABLE plays (
id text PRIMARY KEY,
game text,
players int,
scores list<int>
)
You have to declare the model like this:
from cqlengine import columns
from cqlengine.models import Model

class Plays(Model):
    id = columns.Text(primary_key=True)
    game = columns.Text()
    players = columns.Integer()
    scores = columns.List(columns.Integer())
You can create a new entry like this (omitting the code how to connect):
Plays.create(id = '123-afde', game = 'quake', players = 3, scores = [1, 2, 3])
Then to update the list of scores one does:
play = Plays.objects.filter(id = '123-afde').get()
play.scores.append(20) # <- this will add a new entry at the end of the list
play.save() # <- this will propagate the update to Cassandra - don't forget it
Now if you query your data with the CQL client you should see new values:
 id       | game  | players | scores
----------+-------+---------+---------------
 123-afde | quake |       3 | [1, 2, 3, 20]
To get the values in Python, you can simply index into the list:
print("Length is %(len)s and 3rd element is %(val)d" %
      {"len": len(play.scores), "val": play.scores[2]})

Resources