I'm fairly new to KQL. I need to find out whether a certain pair of associated values (x,y) exists in a table (T).
My thought was to write the line:
let T =
...
...
...
DataTable
| where x, y in T
but the in operator only takes one argument as input, so this does not work. How can I find only x,y pairs that exist in T?
If I understand correctly, I think you're looking for a join between the tables. Will the following work for you?
Return all records from DataTable for which x,y exist in T. Note that it's recommended to place the smaller data set on the left side of the join, so you may want to switch the order based on your datasets.
let T = datatable(x:string, y:string)
[
"A", "B"
];
let DataTable = datatable(x:string, y:string, col1:long)
[
"A", "B", 1,
"C", "D", 2
];
T
| join kind=inner DataTable on x,y
| project-away x1, y1
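For intuition, the inner join on x,y behaves like a membership test on (x, y) tuples. Below is a minimal Python sketch of that logic using the same sample rows. The names pairs_in_t and rows are made up for illustration, it assumes T contains no duplicate pairs, and it is not a description of how Kusto executes the join.

```python
# Pairs present in T, stored as a set of tuples for fast membership tests.
pairs_in_t = {("A", "B")}

# Rows of DataTable as (x, y, col1) tuples.
rows = [("A", "B", 1), ("C", "D", 2)]

# Keep only the rows whose (x, y) pair exists in T.
matches = [row for row in rows if (row[0], row[1]) in pairs_in_t]
print(matches)  # [('A', 'B', 1)]
```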
I have an Azure data explorer table that contains the values of property fields for objects in my source database. The table has rows for different types of object, so not all columns are applicable to each object type.
I'd like to run queries to show the data for objects, but only project the columns that are populated with values and not the columns that are not applicable. So I won't know the column names at the time of querying as they are being triggered by an action that only contains the object name, not type, or schema.
Here is a way to achieve this (credit Alex):
datatable(col1:string, col2: string , col3:int)
[
'aa', '', 5,
'cc', 'dd', int(null)
]
| where col1=="aa"
| as T
| extend values = pack_all()
| mv-apply values on
(
mv-expand kind = array values
| where isnotempty(values[1])
| summarize EmptyValuesRemoved = make_bag(pack(tostring(values[0]), values[1]))
)
| project EmptyValuesRemoved
| evaluate bag_unpack(EmptyValuesRemoved)
If the data was ingested as JSON, you can save the original object in a column of its own by mapping the root object (use the "$" notation for that); then you just need to return that column.
I've used isnotempty:
https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/isnotemptyfunction
There is also isnotnull.
I am trying to write code that deletes all repeated elements in a Vector. How do I do this?
I already tried using unique and union, but they both keep one copy of each repeated item. I want every copy to be deleted.
For example: let x = [1,2,3,4,1,6,2]. Using union or unique returns [1,2,3,4,6]. What I want as my result is [3,4,6].
There are lots of ways to go about this. One approach that is fairly straightforward and probably reasonably fast is to use countmap from StatsBase:
using StatsBase
function f1(x)
    d = countmap(x)
    return [ key for (key, val) in d if val == 1 ]
end
or as a one-liner:
[ key for (key, val) in countmap(x) if val == 1 ]
countmap creates a dictionary mapping each unique value from x to the number of times it occurs in x. The solution can then be found by extracting every key from the dictionary that maps to a val of 1, i.e. all elements of x that occur precisely once.
It might be faster in some situations to use sort!(x) and then construct an index for the elements of the sorted x that only occur once, but this will be messier to code, and the output will be in sorted order, which you may not want. Note that countmap returns a Dict, whose iteration order is not guaranteed, so the countmap method does not necessarily preserve the original ordering either; if that matters, filter x itself against the counts.
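If the original order matters, one option is to count first and then filter the input itself. Here is a sketch of that idea in Python rather than Julia: collections.Counter plays the role of countmap, and the function name unique_once is hypothetical.

```python
from collections import Counter

def unique_once(items):
    """Return the elements that occur exactly once, in their original order."""
    counts = Counter(items)  # maps each value to its number of occurrences
    return [item for item in items if counts[item] == 1]

print(unique_once([1, 2, 3, 4, 1, 6, 2]))  # [3, 4, 6]
```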
I need to go through a few million rows, searching for a year that is passed as a parameter to a method. The year comes as a varchar.
This is the query I'm working with
SELECT X,Y
FROM A
WHERE mch_code = 'KN'
AND contract = '15KTN'
AND to_char(cre_date, 'YYYY') = year_;
cre_date is of type date and year_ is of type varchar.
This query takes around 25 minutes to complete.
Does anyone know a different approach that would execute faster?
Please help.
This didn't work out.
SELECT X,Y
FROM A
WHERE mch_code = 'KN'
AND contract = '15KTN'
AND cre_date LIKE '%2013';
The reason might be 'cre_date' and '%2013' are of different types
If you have an index on (mch_code, contract, cre_date) columns, you can improve performance by doing something like:
select x, y
from a
where mch_code = 'KN'
and contract = '15KTN'
and cre_date >= to_date('01/01/'||year_, 'dd/mm/yyyy')
and cre_date < add_months(to_date('01/01/'||year_, 'dd/mm/yyyy'), 12);
Even better would be to declare the start of the year as a DATE variable prior to running the sql, eg:
v_year_dt := to_date('01/01/'||year_, 'dd/mm/yyyy');
which would make the query:
select x, y
from a
where mch_code = 'KN'
and contract = '15KTN'
and cre_date >= v_year_dt
and cre_date < add_months(v_year_dt, 12);
If you don't have an index on those three columns, you could create a function based index on (mch_code, contract, to_char(cre_date, 'yyyy')) that should help speed up your query, depending on the percentage of rows you're expecting to select. It may help even more if you added the x and y columns into the index, so that no table access was required at all.
Alternatively, you could think about partitioning the table on cre_date, monthly or yearly.
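The half-open date range used in the queries above (on or after 1 January of the year, strictly before 1 January of the next) can be sketched outside SQL as well. This is a minimal Python illustration; the function name year_bounds is hypothetical, and it assumes the year arrives as a plain four-digit string.

```python
from datetime import date

def year_bounds(year_text):
    """Return (start, end) so that start <= d < end covers the whole year."""
    year = int(year_text)  # the year arrives as text, e.g. "2013"
    return date(year, 1, 1), date(year + 1, 1, 1)

start, end = year_bounds("2013")
print(start, end)                          # 2013-01-01 2014-01-01
print(start <= date(2013, 12, 31) < end)   # True
print(start <= date(2014, 1, 1) < end)     # False
```

Because the comparison is done on the raw date value, nothing has to be computed per row, which is what lets an index on the date column be used.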
The reason your query is slow is that you're applying a function to a column on every row in your table. Let's try it another way:
SELECT X,Y
FROM A
WHERE mch_code = 'KN' AND
contract = '15KTN' AND
CRE_DATE >= TO_DATE('01/01/' || year_, 'DD/MM/YYYY') AND
CRE_DATE < TO_DATE('01/01/' || year_, 'DD/MM/YYYY') + INTERVAL '1' YEAR;
This eliminates the need to apply a function against every row in the table, and should allow any indexes on CRE_DATE to be used.
Best of luck.
You can try with EXTRACT function:
SELECT X,Y
FROM A
WHERE mch_code = 'KN'
AND contract = '15KTN'
AND EXTRACT(YEAR FROM cre_date) = year_;
I have a dictionary of names with a number (a score) assigned to them. The file is laid out as so:
Person A,7
Person B,6
If a name is repeated in the file, e.g. Person B occurs on 3 lines with 3 different scores, I want to calculate the mean of these scores and then append the result to a dictionary in the form of a list. However, I keep encountering an error when I try to sort the dictionary. Code below.
else:
    for key in results:
        keyValue = results[key]
        if len(keyValue) > 1:
            # Line below this needs modification
            keyValue = list(sum(keyValue)/len(keyValue))
            newResults[key] = keyValue
            # Error in above code...
        else:
            newResults[key] = keyValue
    print(newResults)
    print(sorted(zip(newResults.values(), newResults.keys()), reverse=True))
Results is a dictionary of the people (the keys) and their scores (the values) where the values are lists so that:
results = {'Bob':[7],'Jane':[8,9]}
If you're using Python 3.x you can use its statistics library which contains a function mean. Now assuming that your dict looks like: results = {'Bob': [7], 'Jane': [8, 9]} you can create a newResults dict like this:
from statistics import mean
newResults = {key: mean(results[key]) for key in results}
This is called a dict comprehension and, as you can see, it's kinda intuitive. Starting with { tells Python that a dict is going to be created. Then key: value defines its structure. Lastly, the for clause iterates over the collection that will be used to build the dict. You can achieve the same with:
newResults = {}
for key in results:
    newResults[key] = mean(results[key])
You want to sort the dict in the end. Unfortunately, a dict cannot be sorted in place. You can either create an OrderedDict, which remembers the insertion order of its items, or a list that contains the sorted keys of your dict. The latter will look like:
sortedKeys = sorted(newResults, key=lambda x: newResults[x])
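Putting the pieces together, here is a small end-to-end sketch using the sample results dict from the question. The names new_results and ranking are chosen for illustration; sorting highest first is an assumption based on the reverse=True in the question's code.

```python
from statistics import mean

results = {"Bob": [7], "Jane": [8, 9]}

# Average each person's scores; mean() also works for a single-element list.
new_results = {name: mean(scores) for name, scores in results.items()}

# Sort (name, average) pairs by average, highest first.
ranking = sorted(new_results.items(), key=lambda item: item[1], reverse=True)
print(ranking)  # [('Jane', 8.5), ('Bob', 7)]
```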
Are there any standard library calls I can use to either perform set operations on two arrays, or implement such logic myself (ideally as functionally and also efficiently as possible)?
Yes, Swift has the Set type (a struct in the standard library).
let array1 = ["a", "b", "c"]
let array2 = ["a", "b", "d"]
let set1:Set<String> = Set(array1)
let set2:Set<String> = Set(array2)
Swift 3.0+ can do operations on sets as:
set1.union(set2) // Union of two sets
set1.intersection(set2) // Intersection of two sets
set1.symmetricDifference(set2) // Symmetric difference (formerly exclusiveOr)
Swift 2.0 can calculate on array arguments:
set1.union(array2) // {"a", "b", "c", "d"}
set1.intersect(array2) // {"a", "b"}
set1.subtract(array2) // {"c"}
set1.exclusiveOr(array2) // {"c", "d"}
Swift 1.2+ can calculate on sets:
set1.union(set2) // {"a", "b", "c", "d"}
set1.intersect(set2) // {"a", "b"}
set1.subtract(set2) // {"c"}
set1.exclusiveOr(set2) // {"c", "d"}
If you're using custom structs, you need to implement Hashable.
Thanks to Michael Stern in the comments for the Swift 2.0 update.
Thanks to Amjad Husseini in the comments for the Hashable info.
Swift Set operations
Example
let a: Set = ["A", "B"]
let b: Set = ["B", "C"]
Union of A and B: a.union(b)
let result = a.union(b)
var a2 = a
a2.formUnion(b)
//["A", "B", "C"]
Symmetric difference of A and B: a.symmetricDifference(b)
let result = a.symmetricDifference(b)
//["A", "C"]
Difference A \ B: a.subtracting(b)
let result = a.subtracting(b)
//["A"]
Intersection of A and B: a.intersection(b)
let result = a.intersection(b)
//["B"]
Please note that the order of the elements in the result depends on hashing, so it is not guaranteed.
The most efficient method I know is to use Gödel numbers. Google for Gödel encoding.
The idea is as follows. Suppose you have N possible numbers and need to make sets of them. For example, N=100,000 and you want to make sets like {1,2,3}, {5, 88, 19000} etc.
The idea is to keep the list of N prime numbers in memory and for a given set {a, b, c, ...} you encode it as
prime[a]*prime[b]*prime[c]*...
So you encode a set as a BigNumber. Operations on BigNumbers, although slower than operations on Integers, are still very fast.
To unite two sets A and B, encoded as the numbers a and b, you take
UNITE(A, B) = lcm(a, b)
the least common multiple of a and b.
To make the intersection you take
INTERSECT(A, B) = gcd(a, b)
the greatest common divisor.
and so on.
This encoding is called Gödelization; you can google for more. The whole language of arithmetic written using the logic of Frege can be encoded as numbers in this way.
The is-member? operation is very simple: x is in S when its prime divides the encoding s.
ISMEMBER(x, S) = remainder(s, prime[x]) == 0
Getting the cardinality is a little more complicated:
CARDINAL(S) = # of prime factors in s
You decompose the number s representing the set S into a product of prime factors and add their exponents. If the set does not allow duplicates, all exponents will be 1.
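The operations described above can be sketched in a few lines of Python, whose integers are arbitrary precision. The small fixed prime table, the 0-based element indices, and all function names here are assumptions made for illustration only.

```python
from math import gcd

# First few primes; element i of the universe is represented by PRIMES[i].
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

def encode(elements):
    """Encode a set of indices as the product of their primes."""
    code = 1
    for i in set(elements):
        code *= PRIMES[i]
    return code

def union(a, b):
    return a * b // gcd(a, b)  # least common multiple

def intersection(a, b):
    return gcd(a, b)  # greatest common divisor

def is_member(i, code):
    return code % PRIMES[i] == 0  # divisible by the element's prime

def cardinality(code):
    # Count the primes dividing the code (exponents are all 1 for sets).
    return sum(1 for p in PRIMES if code % p == 0)

a = encode({0, 1, 2})  # 2 * 3 * 5 = 30
b = encode({1, 2, 3})  # 3 * 5 * 7 = 105
print(union(a, b))               # 210
print(intersection(a, b))        # 15
print(is_member(3, a))           # False
print(cardinality(union(a, b)))  # 4
```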
There aren't any standard library calls, but you may want to look at the ExSwift library. It includes a bunch of new functions on Arrays including difference, intersection and union.
You may want to follow the same pattern as in Objective-C, which also lacks such operations but has a simple workaround:
how to intersect two arrays in objective C?