has_any performance tanks after 64 values - azure-data-explorer

I have a query which was performing really well via an optimization that uses has_any to filter on values in a dynamic object.
Suddenly its performance has tanked, and I noticed that this happens when the list has more than 64 values.
Here are some stats with an increasing number of values used in has_any():
limit 10   totalcpu 58s      datascanned 9.1GB
limit 20   totalcpu 63s      datascanned 13.2GB
limit 30   totalcpu 80s      datascanned 17.5GB
limit 40   totalcpu 92s      datascanned 21.8GB
limit 60   totalcpu 124s     datascanned 30.3GB
limit 64   totalcpu 130s     datascanned 32.1GB
limit 65   totalcpu 12412s   datascanned 930GB
limit 70   totalcpu 12263s   datascanned 868GB
limit 80   totalcpu 13410s   datascanned 1.9TB

has_any() internally rewrites itself as a regex above a certain limit (64 values right now; in the future the limit may grow, but a limit will still exist).
If you find yourself looking for specific elements in a dynamic array, you can try the set_intersect() function:
https://learn.microsoft.com/en-us/azure/kusto/query/setintersectfunction
Using this function, the check becomes:
... | where array_length(set_intersect(source, lookup_array)) > 0

Related

How is object size in R calculated?

> print(object.size(runif(1e6)),unit="Mb")
7.6 Mb
This gives me 7.6 Mb for a vector with 1 million elements. But why? Is each element 32-bit or 64-bit? I cannot make these numbers add up.
They're 64-bit (8-byte) floating-point values. One megabyte (Mb) here is 2^20 bytes (not 10^6 - see below) ... so ...
8*1e6/(2^20)
[1] 7.629395
Lots of potential for confusion about what Mb means:
according to Wikipedia, "MB" is the recommended abbreviation for "megabyte", but R uses "Mb"
there is plenty of confusion about whether "mega" means 10^6 or 2^20 in this context.
As usual, this is clearly documented, deep in the details of ?object.size ...
As illustrated by below tables, the legacy and IEC standards use binary units (multiples of 1024), whereas the SI standard uses decimal units (multiples of 1000) ...
object size   legacy    IEC
1             1 bytes   1 B
1024          1 Kb      1 KiB
1024^2        1 Mb      1 MiB
Google's conversion appears to use SI units (1 MB = 10^6 bytes) instead.
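For completeness, the print method for object.size() can be asked for each convention explicitly. A small sketch, assuming a reasonably recent R version (3.3.0 or later, where the standard argument exists):

x <- runif(1e6)
print(object.size(x), units = "Mb")                     # legacy: 1024^2 bytes per Mb -> ~7.6 Mb
print(object.size(x), units = "MB", standard = "SI")    # SI: 10^6 bytes per MB -> ~8 MB
print(object.size(x), units = "MiB", standard = "IEC")  # IEC: 1024^2 bytes per MiB -> ~7.6 MiB
8 * 1e6 / 2^20                                          # the hand calculation above: 7.629395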

Growing trees.. Progress: 1%. Estimated remaining time: 1 hour, 4 minutes, 21 seconds

When I run a random forest using "ranger" with the caret package, I receive output like the following while it runs:
Growing trees.. Progress: 1%. Estimated remaining time: 1 hour, 4 minutes, 21 seconds.
Growing trees.. Progress: 2%. Estimated remaining time: 1 hour, 3 minutes, 43 seconds.
Growing trees.. Progress: 4%. Estimated remaining time: 51 minutes, 28 seconds.
Growing trees.. Progress: 5%. Estimated remaining time: 45 minutes, 32 seconds.
Growing trees.. Progress: 7%. Estimated remaining time: 39 minutes, 12 seconds.
Once it gets to 100% it starts again. It has been running a few hours already and is already on its 5th iteration.
Looking at the documentation here, it looks like num.trees defaults to 500?
I just want to sanity-check that I have understood this correctly. If I leave my computer running, will it try to do this 500 times? Or have I misunderstood the output?
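For reference, a minimal sketch of the kind of call that produces this output; the dataset, resampling scheme and arguments here are assumptions, not the asker's actual code. num.trees is the size of a single ranger forest, and all of those trees are grown within one ranger call:

library(caret)
library(ranger)

set.seed(1)
fit <- train(
  Species ~ ., data = iris,
  method    = "ranger",
  num.trees = 500,   # ranger's default forest size; one progress bar covers all 500 trees
  trControl = trainControl(method = "cv", number = 5)
)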

Unit conversions in transmission delay

I'm currently learning about transmission delay and propagation. I'm really having a tough time with the conversions. I understand how it all works, but I can't get through the converting. For example:
8000 bits / 5 Mbps (megabits per second). I have no idea how to do this conversion; I've tried looking online but no one explains how the conversion happens. I'm supposed to get 1.6 ms, but I cannot see how the heck that happens. I tried doing it this way, 8000 b / 5x10^6 b/s, but that gives me 1600 s.
(Posting as an answer because this would not fit in a comment:)
8000 bits = 8000 / 1000 = 8 kbit, or 8000 / 1000 / 1000 = 0.008 Mbit
(or 8000 / 1024 = 7.8 Kibit, or 8000 / 1024 / 1024 = 0.0076 Mibit,
see here: https://en.wikipedia.org/wiki/Data_rate_units)
Say you have a throughput of 5 Mbps (megabits per second); to transmit your 8000 bits, that's:
(0.008 Mbit) / (5 Mbit/s) = 0.0016 s = 1.6 ms
That is, unit-wise:
bit / (bit/s)
The bit in the numerator cancels the bit in the denominator, and dividing by "per second" is the same as multiplying by seconds, so the result is not "something per second" but simply seconds.
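The same arithmetic as a quick check (a small sketch in R; the variable names are just for illustration):

bits     <- 8000       # message size in bits
rate_bps <- 5e6        # 5 Mbit/s expressed in bit/s
delay_s  <- bits / rate_bps
delay_s                # 0.0016 s
delay_s * 1000         # 1.6 ms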

convert 56 kbps to monthly usage in GB

From my internet connection (SIM card) of 56 kbps (unlimited data), what would be the total gigabytes of data I could consume, provided I was using it continuously?
My basic math:
30 days = 2592000 seconds
56 * 2592000 = 145152000 kb = 141750 MB = 141 GB
Does this calculation make sense?
Your basic maths is good; unfortunately you were tricked by the notations, which are very confusing in this domain.
1) Lower-case b stands for a bit, while a capital B is a byte, which is made of 8 bits. So when you get 56 kb/s you actually get 56/8 = 7 kB/s.
This gives you 18144000 kB per month.
2) Now comes the second problem. The definition of what a kB, a MB or a GB is, is not uniform. Normally you would expect them to be defined following powers of ten (as in any other science), in which case your 18144000 kB per month would convert into 18144 MB per month, or 18.1 GB per month.
However, for historical reasons, a MB is sometimes defined as 1024 kB and a GB as 1024 MB. In this case you would get 17719 MB per month, or 17.3 GB per month.
Which convention you should use depends on what you actually want to do with it. But such a small difference is probably irrelevant to you compared to potential fluctuations in the actual transfer rate of your connection.
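The corrected calculation in both conventions, as a quick sketch in R:

seconds <- 30 * 24 * 60 * 60   # 2592000 s in a 30-day month
kB      <- 56 / 8 * seconds    # 7 kB/s -> 18144000 kB per month
kB / 1000 / 1000               # ~18.1 GB with decimal (powers-of-ten) prefixes
kB / 1024 / 1024               # ~17.3 GB with binary (1024-based) prefixes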

Profiler shows OpenCL not using all available registers

Here is a copy of the occupancy analysis of my kernel from the NVIDIA Compute Visual Profiler:
Kernel details : Grid size: 300 x 1, Block size: 224 x 1 x 1
Register Ratio = 0.75 ( 24576 / 32768 ) [48 registers per thread]
Shared Memory Ratio = 0 ( 0 / 49152 ) [0 bytes per Block]
Active Blocks per SM = 2 : 8
Active threads per SM = 448 : 1536
Occupancy = 0.291667 ( 14 / 48 )
Achieved occupancy = 0.291667 (on 14 SMs)
Occupancy limiting factor = Registers
Warning: Grid Size (300) is not a multiple of available SMs (14).
I am new to OpenCL and I did a lot of optimisations to bring down the number of registers used, so that 3 concurrent blocks can be launched on an SM. However, the profiler shows that only 2 blocks can run concurrently and that the limiting factor is registers. But my kernel only uses 224 x 48 = 10752 registers per block and should therefore be capable of running 3 blocks (i.e. 224 x 48 x 3 = 32256 registers out of the 32768 available). The problem still exists when I reduce the number of threads per block to 208, which means it should only use 208 x 48 x 3 = 29952 / 32768 for 3 blocks...
At first I thought it was because of local memory, but my calculation of local memory shows it should be able to launch 3 blocks per SM. And I don't know why the profiler does not show a Shared Memory Ratio although my kernel uses local memory.
Thanks for your help.
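The arithmetic from the question, restated as a quick check in R (the numbers are taken from the profiler output above; this naive count does not account for any register allocation granularity the hardware may impose):

regs_per_thread   <- 48
threads_per_block <- 224
regs_per_sm       <- 32768

regs_per_block <- threads_per_block * regs_per_thread   # 10752 registers per block
regs_per_sm %/% regs_per_block                           # 3 blocks fit by this naive count
3 * regs_per_block                                       # 32256 <= 32768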
