GroupBy within bucket and get max count within that interval - kibana

I have an index that records a set of objects and the status of each object every 30 seconds. The number of objects remains constant from one publish to the next, but their states can change. I'm trying to use Timelion to graph the count of objects in a specific state.
This works fine when my interval is less than 30 seconds, but once publishes start getting bucketed together, I get double (or more) counts of the items. So I'm trying to get the max count of items in a given state within each interval. I'm not sure if this is possible in Timelion.
t = 0: object1 = A, object2 = A, object3 = B, object4=A (so objects in state A would be 3)
t = 30: object1 = A, object2 = B, object3 = B, object4=A (objects in A would be 2)
t = 60: object1 = A, object2 = B, object3 = B, object4=A (objects in A would be 2)
t = 90: object1 = B, object2 = B, object3 = B, object4=A (objects in A would be 1)
With the above set, I would get a count of 3 for the first publish, then 2, then 2 and then 1. If I chart a count of this with a 1m interval, it would count 5 (3 + 2) and then 3 (2 + 1), but I want the max of any publish within that interval, so I would want 3 and then 2.
Is there any way I can do this in timelion?
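To pin down the intended result, here is a minimal Python sketch of the aggregation described above. It is not a Timelion expression and says nothing about whether Timelion can do this; the `publishes` dictionary and the function name `max_count_per_bucket` are made up for illustration. It counts objects in the chosen state per publish, then takes the maximum of those counts within each bucket.

```python
from collections import defaultdict

# Sample publishes from the question: timestamp (seconds) -> object states
publishes = {
    0:  {"object1": "A", "object2": "A", "object3": "B", "object4": "A"},
    30: {"object1": "A", "object2": "B", "object3": "B", "object4": "A"},
    60: {"object1": "A", "object2": "B", "object3": "B", "object4": "A"},
    90: {"object1": "B", "object2": "B", "object3": "B", "object4": "A"},
}


def max_count_per_bucket(publishes, state, interval):
    """Per bucket of `interval` seconds, the largest per-publish count of `state`."""
    per_bucket = defaultdict(int)
    for timestamp, objects in publishes.items():
        # Count objects in the wanted state for this single publish
        count = sum(1 for s in objects.values() if s == state)
        bucket = timestamp - timestamp % interval
        # Keep the maximum instead of summing across publishes
        per_bucket[bucket] = max(per_bucket[bucket], count)
    return dict(sorted(per_bucket.items()))


print(max_count_per_bucket(publishes, "A", 60))  # {0: 3, 60: 2}
```

With a 60-second interval this gives 3 and then 2, rather than the summed 5 and 3.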

Related

The game with the marbles

Problem: There are R red marbles, G green marbles and B blue marbles (R≤G≤B). Count the number of ways to arrange them in a straight line so that any two adjacent marbles are of different colors.
For example, R=G=B=2, the answer is 30.
I have tried using recursion, and of course it gets TLE (time limit exceeded):
Define r(R,B,G) to be the number of ways of arranging them where the first marble is red. Define b(R,B,G) and g(R,B,G) analogously for blue and green.
Then r(R, B, G) = b(R-1,B,G) + g(R-1,B,G)
And the answer is r(R,B,G) + b(R,B,G) + g(R,B,G)
But we can see that r(R, B, G) = b(B, R, G) ...
So, we just need a function f(x,y,z)=f(y,x−1,z)+f(z,x−1,y)
And the answer is f(x,y,z) + f(y,z,x) + f(z,x,y).
The time limit is 2 seconds.
I don't think dynamic programming will avoid TLE either, because R, G, B <= 2e5.
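For reference, the recurrence above can be sketched in Python as follows. This is only a direct, memoized transcription for small inputs, and the base cases are my assumption since the question does not state them: f(x, y, z) counts arrangements that start with the colour that has x marbles. It is far too slow for counts near 2e5.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def f(x, y, z):
    """Arrangements starting with the colour that has x marbles left."""
    if x == 0:
        return 0  # cannot start with a colour that has no marbles
    if y == 0 and z == 0:
        return 1 if x == 1 else 0  # only a single marble can stand alone
    # Place one marble of the first colour, then continue with either other colour
    return f(y, x - 1, z) + f(z, x - 1, y)


def count_arrangements(r, g, b):
    return f(r, g, b) + f(g, b, r) + f(b, r, g)


print(count_arrangements(2, 2, 2))  # 30
```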
Some things to limit the recursion:
If R>G+B+1, then there is no way to avoid having 2 adjacent reds. (Similar argument for G>R+B+1 & B>R+G+1.)
If R=G+B+1, then you alternate reds with non-reds, and your problem is reduced to how many ways you can arrange G greens and B blues without worrying about adjacency (and should thus have a closed-form solution). (Again, similar argument for G=R+B+1 and B=R+G+1.)
You can use symmetry to cut down the number of recursions.
For example, if (R, G, B) = (30, 20, 10) and the last marble was red, the number of permutations from this position is exactly the same as if the last marble was blue and (R, G, B) = (10, 20, 30).
Given that R ≤ G ≤ B is set as a starting condition, I would suggest keeping this relationship true by swapping the three values when necessary.
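As a small sanity check on the R=G+B+1 case mentioned above: the reds are forced into every other position, so the count should be the number of ways to order G greens and B blues in the remaining slots, i.e. the binomial coefficient C(G+B, G). The sketch below (function names are mine, and the brute force is only usable for a handful of marbles) compares that closed form with exhaustive enumeration.

```python
from itertools import permutations
from math import comb


def count_when_red_dominates(g, b):
    """Closed form for R == G + B + 1: choose which gaps hold the greens."""
    return comb(g + b, g)


def brute_force(r, g, b):
    """Enumerate distinct orderings and keep those with no equal neighbours."""
    marbles = "R" * r + "G" * g + "B" * b
    valid = {p for p in permutations(marbles)
             if all(x != y for x, y in zip(p, p[1:]))}
    return len(valid)


print(count_when_red_dominates(2, 1), brute_force(4, 2, 1))  # 3 3
```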
Here's some Python code I came up with:
memo = {}

def marble_seq(r, g, b, last):
    # last = colour of last marble placed (-1:nothing, 0:red, 1:green, 2:blue)
    if r == g == b == 0:
        # All the marbles have been placed, so we found a solution
        return 1
    # Enforce r <= g <= b
    if r > g:
        r, g = g, r
        last = (0x201 >> last * 4) & 0x0f   # [1, 0, 2][last]
    if r > b:
        r, b = b, r
        last = (0x012 >> last * 4) & 0x0f   # [2, 1, 0][last]
    if g > b:
        g, b = b, g
        last = (0x120 >> last * 4) & 0x0f   # [0, 2, 1][last]
    # Abort if there are too many marbles of one colour
    if b > r + g + 1:
        return 0
    # Fetch value from memo if available
    if (r, g, b, last) in memo:
        return memo[(r, g, b, last)]
    # Otherwise check remaining permutations by recursion
    result = 0
    if last != 0 and r > 0:
        result += marble_seq(r-1, g, b, 0)
    if last != 1 and g > 0:
        result += marble_seq(r, g-1, b, 1)
    if last != 2 and b > 0:
        result += marble_seq(r, g, b-1, 2)
    memo[(r, g, b, last)] = result
    return result

marble_seq(50, 60, 70, -1)  # Call with `last` set to -1 initially
(Result: 205435997562313431685415150793926465693838980981664)
This probably still won't work fast enough for values up to 2×10^5, but even with values in the hundreds, the results are quite enormous. Are you sure you stated the problem correctly? Perhaps you're supposed to give the results modulo some prime number?

Verify that all edges in a 2D graph are sufficiently far from each other

I have a graph where each node has coordinates in 2D (it's actually a geographic graph, with latitude and longitude.)
I need to verify that if the distance between two edges is less than MAX_DIST then they share a node. Of course, if they intersect, then the distance between them is zero.
The brute force algorithm is trivial, is there a more efficient algorithm?
I was thinking of trying to adapt https://en.wikipedia.org/wiki/Closest_pair_of_points_problem to graph edges (and ignoring pairs of edges with a shared node), but it is not trivial to do so.
I was curious to see how the R-tree index idea would perform, so I created a small script to test it using two really cool libraries for Python: Rtree and shapely.
The snippet generates 1000 segments with 1 < length < 5 and coordinates in the [0, 100] interval, populates the index and then counts the pairs that are closer than MAX_DIST==0.1 (using the classic and the index-based method).
In my tests the index method was around 25x faster using the conditions above; this might vary greatly for your data set but the result is encouraging:
found 532 pairs of close segments using classic method
7.47 seconds for classic count
found 532 pairs of close segments using index method
0.28 seconds for index count
The performance and correctness of the index method depend on how your segments are distributed (how many are close together, whether you have very long segments) and on the parameters used.
import time
import random
from rtree import Rtree
from shapely.geometry import LineString

def generate_segments(number):
    segments = {}
    for i in range(number):
        while True:
            x1 = random.randint(0, 100)
            y1 = random.randint(0, 100)
            x2 = random.randint(0, 100)
            y2 = random.randint(0, 100)
            segment = LineString([(x1, y1), (x2, y2)])
            if 1 < segment.length < 5:  # only add relatively small segments
                segments[i] = segment
                break
    return segments

def populate_index(segments):
    idx = Rtree()
    for index, segment in segments.items():
        idx.add(index, segment.bounds)
    return idx

def count_close_segments(segments, max_distance):
    count = 0
    for i in range(len(segments)-1):
        s1 = segments[i]
        for j in range(i+1, len(segments)):
            s2 = segments[j]
            if s1.distance(s2) < max_distance:
                count += 1
    return count

def count_close_segments_index(segments, idx, max_distance):
    count = 0
    for index, segment in segments.items():
        close_indexes = idx.nearest(segment.bounds, 10)
        for close_index in close_indexes:
            if index >= close_index:  # do not count duplicates
                continue
            close_segment = segments[close_index]
            if segment.distance(close_segment) < max_distance:
                count += 1
    return count

if __name__ == "__main__":
    MAX_DIST = 0.1
    s = generate_segments(1000)
    r_idx = populate_index(s)
    t = time.time()
    print("found %d pairs of close segments using classic method" % count_close_segments(s, MAX_DIST))
    print("%.2f seconds for classic count" % (time.time() - t))
    t = time.time()
    print("found %d pairs of close segments using index method" % count_close_segments_index(s, r_idx, MAX_DIST))
    print("%.2f seconds for index count" % (time.time() - t))
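Note that the `nearest(..., 10)` query only looks at a fixed number of candidates per segment, so it could miss pairs in dense data. As a rough alternative, here is a standard-library-only sketch that swaps the R-tree for a plain uniform grid (a different technique from the script above): each segment is registered in every cell touched by its bounding box expanded by `max_dist`, and only segments sharing a cell are compared exactly. It also skips pairs that share a node, as the question requires. The `cell` size and all function names are assumptions for illustration, and segments are plain tuples of endpoint tuples.

```python
from collections import defaultdict
from itertools import combinations
from math import floor, hypot


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return hypot(p[0] - a[0], p[1] - a[1])
    # Project p onto the line and clamp to the segment
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def segment_distance(s, t):
    """Exact distance between two segments (0 if they cross or touch)."""
    (a, b), (c, d) = s, t
    # Proper crossing: each segment's endpoints lie strictly on opposite sides
    if cross(c, d, a) * cross(c, d, b) < 0 and cross(a, b, c) * cross(a, b, d) < 0:
        return 0.0
    # Otherwise the minimum is attained at one of the four endpoints
    return min(point_segment_distance(a, c, d), point_segment_distance(b, c, d),
               point_segment_distance(c, a, b), point_segment_distance(d, a, b))


def close_pairs(segments, max_dist, cell=5.0):
    """Index pairs (i, j), i < j, closer than max_dist and sharing no node."""
    grid = defaultdict(list)
    for i, (a, b) in enumerate(segments):
        x0, x1 = sorted((a[0], b[0]))
        y0, y1 = sorted((a[1], b[1]))
        for gx in range(floor((x0 - max_dist) / cell), floor((x1 + max_dist) / cell) + 1):
            for gy in range(floor((y0 - max_dist) / cell), floor((y1 + max_dist) / cell) + 1):
                grid[gx, gy].append(i)
    found = set()
    for members in grid.values():
        for i, j in combinations(members, 2):
            if (i, j) in found or set(segments[i]) & set(segments[j]):
                continue  # already reported, or the edges share a node
            if segment_distance(segments[i], segments[j]) < max_dist:
                found.add((i, j))
    return found


segments = [((0, 0), (1, 0)), ((0, 0.05), (1, 0.05)), ((1, 0), (2, 0)),
            ((10, 10), (11, 10)), ((0.5, -1), (0.5, 1))]
print(sorted(close_pairs(segments, 0.1)))  # [(0, 1), (0, 4), (1, 2), (1, 4)]
```

Any non-empty result means the graph violates the condition. A grid works best when segments are short relative to the cell size; very long edges land in many cells, which is where an R-tree is likely the better fit.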

Number of action per year. Combinatorics question

I'm writing a diploma thesis about vaccines. There is a region, its population and 12 months. There is an array of 12 values, each from 0 to 1 with step 0.01. Each value says what fraction of the population we should vaccinate in that month.
For example, if we have array = [0.1,0,0,0,0,0,0,0,0,0,0,0], that means that we should vaccinate 0.1 of the region's population in the first month only.
Another array = [0, 0.23,0,0,0,0,0,0, 0.02,0,0,0]. It means that we should vaccinate 0.23 of the region's population in the second month and 0.02 of the region's population in the 9th month.
So the question is: how to generate (using 3 loops) 12 (months) * 12 (times of vaccinating) * 100 (number of steps from 0 to 1) = 14,400 arrays that will contain every version of these combinations.
For now I have this code:
for (int month = 0; month < 12; month++) {
    for (double step = 0; step <= 1; step += 0.01) {
        double[] arr = new double[12];
        arr[month] = step;
    }
}
I need to add a third loop that will vary the number of vaccinations per year.
Have no idea how to write it.
Idk if it is understandable.
Hope u get it otherwise ask me, please.
You have 101 variants for the first month 0.00, 0.01..1.00
And 101 variants for the second month - same values.
And 101*101 possible combinations for two months.
Continuing - for all 12 months you have 101^12 variants ~ 10^24
It is not possible to generate and store so many combinations (at least in the current decade)
If the step is larger than 0.01, the number of combinations might be manageable. The general formula is P=N^M, where N is the number of variants per month and M is the number of months.
You can traverse all combinations by representing every integer in the range 0..P-1 in the base-N numeral system. Or make a digit counter:
fill array D[12] with zeros
repeat
    increment element at the last index by step value
    if it reaches the limit, make it zero
        and increment element at the next index
until the first element reaches the limit
It is similar to counting 08, 09: we cannot increment the 9, so we reset it to 0 and carry into the next digit to make 10, and so on.
s = 1
m = 3
mx = 3
l = [0]*m
i = 0
while i < m:
    print([x/3 for x in l])
    i = 0
    l[i] += s
    while (i < m) and l[i] > mx:
        l[i] = 0
        i += 1
        if i < m:
            l[i] += s
Python code prints 64 ((mx/s+1)^m=4^3) variants like [0.3333, 0.6666, 0.0]
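The other option mentioned above, representing every integer in 0..P-1 in base N, could look roughly like this. It is a sketch with made-up names (`digits_of`, `all_plans`, `levels`, `months`), shown for a small case only, since 101^12 arrays cannot be enumerated in practice. It assumes `levels` is at least 2.

```python
def digits_of(value, base, width):
    """Digits of `value` in the given base, least significant first."""
    digits = []
    for _ in range(width):
        value, digit = divmod(value, base)
        digits.append(digit)
    return digits


def all_plans(levels, months):
    """Yield every array of `months` fractions with `levels` evenly spaced values in [0, 1]."""
    for value in range(levels ** months):
        yield [d / (levels - 1) for d in digits_of(value, levels, months)]


plans = list(all_plans(levels=4, months=3))
print(len(plans))  # 64
```

Because this is a generator, the arrays can be processed one at a time without storing them all.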

Create logical ordered queue list from percentage value

This is a math task. I need to create an ordered list (or queue) from x values; each one is a percentage, and they sum to 100. I want a logical order for these values. Consider this:
a = 50,
b = 25,
c = 15,
d = 10
The greatest common divisor of these numbers is 5, so the queue should have length 100/5 = 20. And the result should look like this (or very similar):
a, b, a, b, a, c, a, b, a, d, a, c, a, b, a, c, a, b, a, d
I'm looking for a formula that produces this order. Thanks in advance.
I take it that you'd like to distribute each letter as uniformly as possible throughout the array or string. The preliminary step of finding the greatest common divisor and dividing the numbers 50,25,15,10 by it is straightforward. Once this is done, you get the number of times each letter must appear. Then the algorithm can be: beginning with the empty string, add the "most underrepresented" letter, repeat. I define "most underrepresented" as the one with the maximal difference of (desired proportion) - (actual proportion so far).
Here is this algorithm implemented in Python.
count = {'a': 10, 'b': 5, 'c': 3, 'd': 2}
length = sum(count.values())
queue = ''  # the queue built so far
while len(queue) < length:
    deficit = {}
    for char in count:
        # (desired proportion) - (actual proportion so far)
        deficit[char] = count[char]/length - (queue.count(char)/len(queue) if queue else 0)
    queue += max(deficit, key=deficit.get)
print(queue)
The output is abcadabacabadabacaba. Split by letter to show the distribution:
a..a.a.a.a.a.a.a.a.a
.b....b...b...b...b.
..c.....c.......c...
....d.......d.......
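The preliminary step (dividing the percentages by their greatest common divisor to get the per-letter counts) is not shown above; a minimal sketch, with a function name of my own choosing, might be:

```python
from functools import reduce
from math import gcd


def counts_from_percentages(percentages):
    """Divide every percentage by their common GCD to get occurrence counts."""
    divisor = reduce(gcd, percentages.values())
    return {key: value // divisor for key, value in percentages.items()}


counts = counts_from_percentages({'a': 50, 'b': 25, 'c': 15, 'd': 10})
print(counts)                # {'a': 10, 'b': 5, 'c': 3, 'd': 2}
print(sum(counts.values()))  # 20
```

This assumes the percentages are integers; the resulting dictionary is what the algorithm above takes as `count`.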

Calculate if trend is up, down or stable

I'm writing a VBScript that sends out a weekly email with client activity. Here is some sample data:
a      b      c      d      e    f      g
2,780  2,667  2,785  1,031  646  2,340  2,410
Since this is email, I don't want a chart with a trend line. I just need a simple function that returns "up", "down" or "stable" (though I doubt it will ever be perfectly stable).
I'm terrible with math so I don't even know where to begin. I've looked at a few other questions for Python or Excel but there's just not enough similarity, or I don't have the knowledge, to apply it to VBS.
My goal would be something as simple as this:
a      b      c      d      e    f      g      trend
2,780  2,667  2,785  1,031  646  2,340  2,410  ↘
If there is some delta or percentage or other measurement I could display that would be helpful. I would also probably want to ignore outliers. For instance, the 646 above. Some of our clients are not open on the weekend.
First of all, your data is listed as
a      b      c      d      e    f      g
2,780  2,667  2,785  1,031  646  2,340  2,410
To get a trend line, you need to assign numerical values to the variables a, b, c, ...
To do that, you need a little more information about how the data were taken. Suppose you took data a on 1st January; you can assign it any value, like 0 or 1. If you took data b ten days later, you can assign it the value 10 or 11. If you took data c thirty days after a, you can assign it the value 30 or 31. The numerical values of a, b, c, ... must be proportional to the time intervals between the measurements to get a more accurate trend line.
If they are taken at regular intervals (which is most likely your case), let's say every 7 days, then you can assign evenly spaced values: a, b, c, ... ~ 1, 2, 3, ... The starting point is entirely your choice, so choose something that makes it easy. It does not affect your final calculation.
Then you need to calculate the slope of the linear regression, which you can find on this url; from it you need to calculate the value of b with the following table.
In the first column, from row 2 to row 8, I have my values of a, b, c, ..., for which I put 1, 2, 3, ...
In the second column, I have my data.
In the third column, I multiplied each cell in the first column by the corresponding cell in the second column.
In the fourth column, I squared the value of each cell in the first column.
In row 10, I added up the values of the columns above.
Finally, use the values of row 10.
total_number_of_data*C[10] - A[10]*B[10]
b = -------------------------------------------
total_number_of_data*D[10]-square_of(A[10])
The sign of b determines what you are looking for. If it's positive, the trend is up; if it's negative, the trend is down; and if it's zero, it is stable.
This was a huge help! Here it is as a function in Python:
def trend_value(nums: list):
    summed_nums = sum(nums)
    multiplied_data = 0
    summed_index = 0
    squared_index = 0
    for index, num in enumerate(nums):
        index += 1
        multiplied_data += index * num
        summed_index += index
        squared_index += index**2
    numerator = (len(nums) * multiplied_data) - (summed_nums * summed_index)
    denominator = (len(nums) * squared_index) - summed_index**2
    if denominator != 0:
        return numerator/denominator
    else:
        return 0

val = trend_value([2781, 2667, 2785, 1031, 646, 2340, 2410])
print(val)  # -139.5
In Python:
def get_trend(numbers):
    rows = []
    total_numbers = len(numbers)
    currentValueNumber = 1
    n = 0
    while n < len(numbers):
        rows.append({'row': currentValueNumber, 'number': numbers[n]})
        currentValueNumber += 1
        n += 1
    sumLines = 0
    sumNumbers = 0
    sumMix = 0
    squareOfs = 0
    for k in rows:
        sumLines += k['row']
        sumNumbers += k['number']
        sumMix += k['row']*k['number']
        squareOfs += k['row'] ** 2
    a = (total_numbers * sumMix) - (sumLines * sumNumbers)
    b = (total_numbers * squareOfs) - (sumLines ** 2)
    c = a/b
    return c

trendValue = get_trend([2781,2667,2785,1031,646,2340,2410])
print(trendValue)  # output: -139.5
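Neither function returns the "up", "down" or "stable" label the question asks for. A minimal sketch of that last step, using the same slope formula, could look like this. The name `classify_trend` and the `tolerance` parameter are my own additions (a hypothetical threshold below which the slope is treated as flat), and the original question wants VBScript, so this only illustrates the logic.

```python
def classify_trend(values, tolerance=0.0):
    """Label a series by the sign of its least-squares slope (x = 1, 2, 3, ...)."""
    n = len(values)
    xs = range(1, n + 1)
    sum_x = sum(xs)
    sum_y = sum(values)
    sum_xy = sum(x * y for x, y in zip(xs, values))
    sum_xx = sum(x * x for x in xs)
    denominator = n * sum_xx - sum_x ** 2
    if denominator == 0:
        return "stable"  # fewer than two points: no slope to speak of
    slope = (n * sum_xy - sum_x * sum_y) / denominator
    if slope > tolerance:
        return "up"
    if slope < -tolerance:
        return "down"
    return "stable"


print(classify_trend([2780, 2667, 2785, 1031, 646, 2340, 2410]))  # down
```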
