How to check if Kusto input query parameters exist in a table? - azure-data-explorer

I am passing a list of node IDs to a Kusto query as input. Right now I am printing the rows of a table after checking if the rows' node IDs are in the input list. This is a way to check if the input node IDs are valid and have corresponding rows in the table. This is my current query:
declare query_parameters(nodeIds:string);
let nodes = todynamic(parse_json(nodeIds));
let serverNames = database('xyz').abc
| where NodeId in~ (nodes)
| project NodeId, DeviceName;
serverNames;
I would like to instead print all the input nodes, valid or invalid - and if they are valid, their corresponding rows in the table. How can I do that?
(Note: I changed the names of the database and table to preserve anonymity. Thanks.)
I've tried iff, case and !in~.

Instead of using the in operator, you can use a left outer join, which keeps every input node and leaves DeviceName empty for nodes that have no matching row.
For example:
let T = datatable(NodeId:string, DeviceName:string) [
"n1","d1",
"n2","d2",
"n5","d5",
"n6","d6"
]
;
let nodeIds = '["n1","n3","n4","n6"]'
;
let nodes =
print nodes = parse_json(nodeIds)
| mv-expand NodeId = nodes to typeof(string)
| project NodeId
;
let serverNames =
nodes
| join kind=leftouter hint.strategy=broadcast (
T
| project NodeId, DeviceName
) on NodeId
;
serverNames
| NodeId | NodeId1 | DeviceName |
| ------ | ------- | ---------- |
| n1 | n1 | d1 |
| n6 | n6 | d6 |
| n4 | | |
| n3 | | |
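To sanity-check the expected shape of the result outside Kusto, the same left-outer logic can be sketched in plain Python. This is only an illustration: the helper name left_outer_lookup is hypothetical, it assumes each NodeId appears at most once in the table, and the dictionary lookup is case-sensitive (unlike the in~ operator in the question).

```python
def left_outer_lookup(node_ids, table):
    """Return one (NodeId, DeviceName) pair per input id.

    DeviceName is None when the id has no row in the table, which
    mirrors the empty cells produced by a left outer join.
    Assumes NodeId is unique within the table.
    """
    devices = {row["NodeId"]: row["DeviceName"] for row in table}
    return [(node_id, devices.get(node_id)) for node_id in node_ids]


# Same sample data as the datatable in the Kusto example.
table = [
    {"NodeId": "n1", "DeviceName": "d1"},
    {"NodeId": "n2", "DeviceName": "d2"},
    {"NodeId": "n5", "DeviceName": "d5"},
    {"NodeId": "n6", "DeviceName": "d6"},
]

rows = left_outer_lookup(["n1", "n3", "n4", "n6"], table)
for node_id, device in rows:
    print(node_id, device if device is not None else "<no match>")
```

Every input id appears exactly once in the output, so invalid ids are easy to spot by their missing DeviceName.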

Related

How to retrieve custom property corresponding to another property in azure

I am trying to write a kusto query to retrieve a custom property as below.
I want to retrieve count of pkgName and corresponding organization. I could retrieve the count of pkgName and the code is attached below.
let mainTable = union customEvents
| extend name =replace("\n", "", name)
| where iif('*' in ("*"), 1 == 1, name in ("*"))
| where true;
let queryTable = mainTable;
let cohortedTable = queryTable
| extend dimension = customDimensions["pkgName"]
| extend dimension = iif(isempty(dimension), "<undefined>", dimension)
| summarize hll = hll(itemId) by tostring(dimension)
| extend Events = dcount_hll(hll)
| order by Events desc
| serialize rank = row_number()
| extend dimension = iff(rank > 10, 'Other', dimension)
| summarize merged = hll_merge(hll) by tostring(dimension)
| project ['pkgName'] = dimension, Counts = dcount_hll(merged);
cohortedTable
Please help me to get the organization along with each pkgName projected.
Please try this simple query:
customEvents
| summarize counts=count() by pkgName=tostring(customDimensions.pkgName),organization=tostring(customDimensions.organization)
Please feel free to modify it to meet your requirement.
If the above does not meet your requirement, try creating another table that holds the pkgName-to-organization relationship, then use the join operator to join the tables. For example:
//create a table which contains the relationship
let temptable = customEvents
| summarize by pkgName=tostring(customDimensions.pkgName),organization=tostring(customDimensions.organization);
//then use the join operator to join these tables on the keyword pkgName.

SQLite and multiple insert clean

I would like to populate a freshly created Table in a SQLite DB.
In this table, some keys are references to other tables and I'd like not to hard-code these references
-> I'm currently using a "mapping" table in order to fetch ids using names (~ constants emulation)
The problem is: this solution works but is very verbose
Minimal working example: (storing dictionary words, using foreign keys to a category table)
-- Tables creation
CREATE TABLE categories(
id INTEGER PRIMARY KEY,
name TEXT
);
CREATE TABLE words(
id INTEGER PRIMARY KEY,
id_category INTEGER NOT NULL,
name TEXT,
FOREIGN KEY(id_category) REFERENCES categories(id)
);
CREATE TABLE CONSTANTS(
name TEXT PRIMARY KEY,
value INTEGER NOT NULL
);
INSERT INTO categories(name) VALUES("noun");
INSERT INTO CONSTANTS(name, value) VALUES("category_noun", last_insert_rowid());
INSERT INTO categories(name) VALUES("abreviation");
INSERT INTO CONSTANTS(name, value) VALUES("category_abreviation", last_insert_rowid());
INSERT INTO categories(name) VALUES("character");
INSERT INTO CONSTANTS(name, value) VALUES("category_character", last_insert_rowid());
And now, the core of the problem: it is too verbose.
This example has only one foreign key and a few inserts, just enough to illustrate the problem:
INSERT INTO words(id_category, name) VALUES
((SELECT value FROM CONSTANTS WHERE name = "category_noun"),
"hello"),
((SELECT value FROM CONSTANTS WHERE name = "category_abreviation"),
"SO"),
((SELECT value FROM CONSTANTS WHERE name = "category_abreviation"),
"user"),
((SELECT value FROM CONSTANTS WHERE name = "category_character"),
"!")
;
I would like to have something looking like this pseudo-sqlite code:
-- same table creations as before
INSERT INTO words(id_category, name) VALUES
-- Fetch constants once
(
CAT_NOUM = (SELECT value FROM CONSTANTS WHERE name = "category_noun"),
CAT_ABREV = (SELECT value FROM CONSTANTS WHERE name = "category_abreviation"),
CAT_CHAR = (SELECT value FROM CONSTANTS WHERE name = "category_character")
)
-- Fill the table, using constants
(CAT_NOUM, "Hello"),
(CAT_ABREV, "SO"),
(CAT_NOUM, "user"),
(CAT_CHAR, "SO"),
...
;
I'm wondering whether:
- There is already a SQLite solution to this problem
- I should use something like sed to replace a hard-coded string like __SED__CAT_NOUM with its grepped value in the SQLite script
- Doing this stuff programmatically would be the right way
It is better to use INSERT...SELECT with UNION ALL instead of INSERT...VALUES:
INSERT INTO words(id_category, name)
SELECT value, 'hello' FROM CONSTANTS WHERE name = 'category_noun' UNION ALL
SELECT value, 'SO' FROM CONSTANTS WHERE name = 'category_abreviation' UNION ALL
SELECT value, 'user' FROM CONSTANTS WHERE name = 'category_abreviation' UNION ALL
SELECT value, '!' FROM CONSTANTS WHERE name = 'category_character';
Or join CONSTANTS to a VALUES list, whose columns SQLite names column1 and column2:
INSERT INTO words(id_category, name)
SELECT c.value, t.column2
FROM CONSTANTS C INNER JOIN (
VALUES ('category_noun', 'hello'),
('category_abreviation', 'SO'),
('category_abreviation', 'user'),
('category_character', '!')
) t ON t.column1 = c.name;
Results:
SELECT * FROM words;
| id | id_category | name |
| --- | ----------- | ----- |
| 1 | 1 | hello |
| 2 | 2 | SO |
| 3 | 2 | user |
| 4 | 3 | ! |
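For the "do it programmatically" option raised in the question, here is a minimal sketch using Python's built-in sqlite3 module. It is an illustration rather than part of the answer above: the helper name populate_words is hypothetical, and it assumes category names are unique so that ids can be looked up directly in categories, without the CONSTANTS table.

```python
import sqlite3


def populate_words(conn, rows):
    """Insert (category_name, word) pairs, resolving category ids by name.

    Each pair runs one INSERT ... SELECT, so no category id is hard-coded.
    A pair whose category name does not exist inserts nothing.
    """
    conn.executemany(
        "INSERT INTO words(id_category, name) "
        "SELECT id, ? FROM categories WHERE name = ?",
        [(word, category) for category, word in rows],
    )


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories(id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE words(
        id INTEGER PRIMARY KEY,
        id_category INTEGER NOT NULL,
        name TEXT,
        FOREIGN KEY(id_category) REFERENCES categories(id)
    );
    INSERT INTO categories(name) VALUES ('noun'), ('abreviation'), ('character');
""")

populate_words(conn, [
    ("noun", "hello"),
    ("abreviation", "SO"),
    ("abreviation", "user"),
    ("character", "!"),
])

result = conn.execute(
    "SELECT id, id_category, name FROM words ORDER BY id"
).fetchall()
print(result)
```

The data to insert stays a plain list of (category, word) pairs, which keeps the verbose subqueries out of the script at the cost of moving the insert step into application code.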

neo4j percentage of attribute for social network

How can I calculate the percentage of an attribute for all the connections of a social network?
In this particular sample I would want to calculate the fraudulence of a user by assessing its interactions (call, sms):
CREATE (Alice:Person {id:'a', fraud:1})
CREATE (Bob:Person {id:'b', fraud:0})
CREATE (Charlie:Person {id:'c', fraud:0})
CREATE (David:Person {id:'d', fraud:0})
CREATE (Esther:Person {id:'e', fraud:0})
CREATE (Fanny:Person {id:'f', fraud:0})
CREATE (Gabby:Person {id:'g', fraud:0})
CREATE (Fraudster:Person {id:'h', fraud:1})
CREATE
(Alice)-[:CALL]->(Bob),
(Bob)-[:SMS]->(Charlie),
(Charlie)-[:SMS]->(Bob),
(Fanny)-[:SMS]->(Charlie),
(Esther)-[:SMS]->(Fanny),
(Esther)-[:CALL]->(David),
(David)-[:CALL]->(Alice),
(David)-[:SMS]->(Esther),
(Alice)-[:CALL]->(Esther),
(Alice)-[:CALL]->(Fanny),
(Fanny)-[:CALL]->(Fraudster)
When trying to query like:
MATCH (a)-->(b)
WHERE b.fraud = 1
RETURN (count() / ( MATCH (a) -->(b) RETURN count() ) * 100)
I see the following error:
Invalid input '>': expected 0..9, '.', UnsignedHexInteger, UnsignedOctalInteger or UnsignedDecimalInteger (line 3, column 33 (offset: 66))
"RETURN (count() / ( MATCH (a) -->(b) RETURN count() ) * 100)"
^
This query returns, for each fraudulent node, the percentage of all connections that point to it:
MATCH (:Person)-[:CALL|:SMS]->(f:Person)
WITH TOFLOAT(COUNT(*))/100 AS divisor, COLLECT(f) AS fs
UNWIND fs AS f
WITH divisor, f
WHERE f.fraud = 1
RETURN f, COUNT(*)/divisor AS percentage
With the sample data, the result is:
+----------------------------------------------+
| f | percentage |
+----------------------------------------------+
| Node[13]{id:"h",fraud:1} | 9.090909090909092 |
| Node[6]{id:"a",fraud:1} | 9.090909090909092 |
+----------------------------------------------+
This query only needs a single scan of the DB, and is explicit about the node labels and relationship types -- to filter out any other data that might be in the DB.
In your RETURN clause, you start a new query: MATCH (a) -->(b) RETURN count().
This is not allowed in Neo4j; compute the total first and pass it on with the WITH keyword:
MATCH ()-->()
WITH count(*) AS total
MATCH ()-->(b)
WHERE b.fraud = 1
RETURN toFloat(count(*)) / total * 100
Or, because in your case you only want the total number of relationships in your DB, you can use this query:
MATCH ()-->(b)
WHERE b.fraud = 1
RETURN toFloat(count(*)) / size(()-->()) * 100
Update: added toFloat to the Cypher queries; otherwise the division gives an integer, not a float.
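To double-check the arithmetic, the sample graph can be reproduced as a plain edge list in Python. This is only a sketch for verifying the numbers, not a replacement for the Cypher queries; the function name fraud_target_percentage is hypothetical, and nodes are referred to by their id property.

```python
# fraud flag per node id, copied from the CREATE statements in the question
fraud = {"a": 1, "b": 0, "c": 0, "d": 0, "e": 0, "f": 0, "g": 0, "h": 1}

# (source, target) for every CALL or SMS relationship in the sample
edges = [
    ("a", "b"), ("b", "c"), ("c", "b"), ("f", "c"),
    ("e", "f"), ("e", "d"), ("d", "a"), ("d", "e"),
    ("a", "e"), ("a", "f"), ("f", "h"),
]


def fraud_target_percentage(edges, fraud):
    """Percentage of relationships whose target node has fraud == 1."""
    total = len(edges)
    if total == 0:
        return 0.0
    hits = sum(1 for _, target in edges if fraud[target] == 1)
    # "/" is float division in Python; Cypher needs toFloat for the same effect
    return hits / total * 100


pct = fraud_target_percentage(edges, fraud)
print(round(pct, 2))

# Integer division, which is what Cypher does with two integers, loses the result
truncated = 2 // 11 * 100
print(truncated)
```

Two of the eleven relationships point at a fraudulent node, one each for ids a and h, which agrees with the two rows of roughly 9.09% in the first answer's result.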

Return multiple COLUMN_JSON results as JSON array

I am storing data in standard tables in a MariaDB, but would like to return records from related tables as a JSON string.
What I intend to do is have a function where I can pass in exerciseId and the function returns a JSON string of all related exerciseMuscle records, meaning each exercise record returned by a stored proc can also include nested data from child tables.
I have been able to create JSON records using COLUMN_JSON and COLUMN_CREATE but can only get this to return as a set of individual records, rather than an array of JSON values as I need. The SQL I'm using is:
select
e.id,
CONVERT(COLUMN_JSON(COLUMN_CREATE(
'role', em.muscleRoleName,
'muscle', em.muscleName
)) USING utf8) as musclesJson
from
exercise e
inner join exerciseMuscle em
on e.id = em.exerciseId
where
e.id = 96;
This returns:
| id | musclesJson
| 96 | {"role":"main","muscle":"biceps"}
| 96 | {"role":"secondary","muscle":"shoulders"}
When what I want is:
| id | musclesJson
| 96 | [{"role":"main","muscle":"biceps"},{"role":"secondary","muscle":"shoulders"}]
Is it possible to return multiple results in one row without having to iterate through the results and build it manually? If I add a group by to the SQL then the JSON only includes the first record.
Turns out it was GROUP_CONCAT that I needed, and specifying a comma as the delimiter. So changing my SQL to:
select
e.id,
CONVERT(
GROUP_CONCAT(
COLUMN_JSON(
COLUMN_CREATE(
'role', em.muscleRoleName,
'muscle', em.muscleName
)
)
SEPARATOR ','
) USING utf8) as musclesJson
from
exercise e
inner join exerciseMuscle em
on e.id = em.exerciseId
where
e.id = 96;
Returns:
| id | musclesJson
| 96 | {"role":"main","muscle":"biceps"},{"role":"secondary","muscle":"shoulders"}
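Note that the returned value is a comma-separated list of objects without the enclosing square brackets, so it is not yet a valid JSON array. One way to close that gap, sketched here on the client side in Python, is to wrap the string in brackets before parsing. The function name to_json_array is hypothetical, and the sketch assumes the concatenated string was not truncated by the server.

```python
import json


def to_json_array(concatenated):
    """Wrap a comma-separated list of JSON objects in [ ] and parse it.

    An empty or missing value (no child rows) yields an empty list.
    """
    if not concatenated:
        return []
    return json.loads("[" + concatenated + "]")


# The value returned by the GROUP_CONCAT query for exercise 96
raw = '{"role":"main","muscle":"biceps"},{"role":"secondary","muscle":"shoulders"}'

muscles = to_json_array(raw)
for muscle in muscles:
    print(muscle["role"], muscle["muscle"])
```

The same wrapping could presumably be done in SQL by concatenating '[' and ']' around the GROUP_CONCAT result, but that variant is not shown in the original answer.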

SQLite subquery: "IN" the result of the outer query

I have two tables user and pair. I want to get the number of duplicate pairs (a, b) for each user.name.
user
name | id
-------------
"Alice" | 0
"Bob" | 1
"Alice" | 2
pair
id | a | b
-----------
0 | 0 | 1
0 | 1 | 3
1 | 0 | 1
2 | 1 | 3
In the above example, the result should be:
name | id | c
-------------------
"Alice" | 0,2 | 1
"Bob" | 1 | 0
When there is only one id for each user, I can do this:
SELECT name, id, (
SELECT COUNT(*) FROM pair JOIN pair AS p USING (id, a, b)
WHERE id = user.id AND pair.rowid < p.rowid
) AS c FROM user;
When there are multiple ids, I can get the correct result from the query below, but it is quite slow when there are more rows and more subqueries.
SELECT name, GROUP_CONCAT(id), (
WITH t AS (SELECT id FROM user AS u WHERE name = user.name)
SELECT COUNT(*) FROM pair JOIN pair AS p USING (a, b)
WHERE pair.id IN t AND p.id IN t AND pair.rowid < p.rowid
) AS c FROM user GROUP BY name;
Is there a simple and efficient way to do this, like changing the WHERE clause from pair.id = user.id to pair.id IN <<the user.id list>>?
/* This will not work! "Error: no such table: user.id" */
SELECT name, GROUP_CONCAT(id), (
SELECT COUNT(*) FROM pair JOIN pair AS p USING (a, b)
WHERE pair.id IN user.id AND p.id IN user.id AND pair.rowid < p.rowid
) AS c FROM user GROUP BY name;
The GROUP BY name operation can be sped up if the database is able to go through the rows in order, without having to sort the table.
This can be done with an index on the name column (the other column makes this a covering index, which helps only a little more):
CREATE INDEX user_name_id_index ON user(name, id);
The query looks up pair rows by their id, a, and b values; these lookups can be sped up with an index on these columns:
CREATE INDEX pair_id_a_b_index ON pair(id, a, b);
To help the query optimizer make better decisions when selecting indexes, run ANALYZE.
The query optimizer gets improved constantly; get the newest SQLite version, if possible.
To check how your queries are executed, look at the output of the EXPLAIN QUERY PLAN command.
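The steps above (create the indexes, run ANALYZE, inspect the plan) can be tried out with Python's built-in sqlite3 module. This is a minimal sketch on the sample data from the question; the helper name query_plan is hypothetical, and the exact plan text depends on the SQLite version, so it is printed rather than assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user(name TEXT, id INTEGER);
    CREATE TABLE pair(id INTEGER, a INTEGER, b INTEGER);
    INSERT INTO user VALUES ('Alice', 0), ('Bob', 1), ('Alice', 2);
    INSERT INTO pair VALUES (0, 0, 1), (0, 1, 3), (1, 0, 1), (2, 1, 3);

    -- Indexes suggested in the answer
    CREATE INDEX user_name_id_index ON user(name, id);
    CREATE INDEX pair_id_a_b_index ON pair(id, a, b);

    -- Collect statistics for the query optimizer
    ANALYZE;
""")


def query_plan(conn, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan = query_plan(conn, "SELECT name, GROUP_CONCAT(id) FROM user GROUP BY name")
for step in plan:
    print(step)

index_names = {
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
}
```

If a printed step mentions one of the index names, the optimizer is using that index for the step; a plain table scan followed by a temporary B-tree for GROUP BY would suggest the index is not being picked up.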
