Graphite + StatsD: different values over different time ranges

While using StatsD and Graphite, I'm running into a problem when viewing the same stats_counts.* metric over different time ranges:
As you can see from the graphs above, the same measured data is displayed differently when a bigger time range is picked.
I would have understood a loss of accuracy on older data, due to Whisper's storage-aggregation scheme, but that really doesn't explain why recent data is displayed as if it had different values over different time ranges.
Just for the record, my storage-aggregation.conf looks like this:
[munin]
pattern = ^munin\..*
xFilesFactor = 0
aggregationMethod = average
[min]
pattern = \.lower$
xFilesFactor = 0.1
aggregationMethod = min
[max]
pattern = \.upper$
xFilesFactor = 0.1
aggregationMethod = max
[sum]
pattern = \.sum$
xFilesFactor = 0
aggregationMethod = sum
[count]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum
[count_legacy]
pattern = ^stats_counts.*
xFilesFactor = 0
aggregationMethod = sum
[logster]
pattern = ^logster\..*
xFilesFactor = 0
aggregationMethod = sum
[default_average]
pattern = .*
xFilesFactor = 0.3
aggregationMethod = average
and my storage-schemas.conf:
[carbon]
pattern = ^carbon\..*
retentions = 60:90d
[munin]
pattern = ^munin\..*
retentions = 10s:6h,1min:7d,10min:5y
[stats]
pattern = ^stats\..*
retentions = 10s:6h,1min:7d,10min:1y
[stats_counts]
pattern = ^stats_counts\..*
retentions = 10s:6h,1min:7d,10min:1y
[logster]
pattern = ^logster\..*
retentions = 60s:12h,10m:1y
# [default_1min_for_1day]
# pattern = .*
# retentions = 60s:1d
Any idea what might be wrong? Maybe a configuration option I missed?

It looks like you're running into regular Graphite behavior: it averages y-values whenever there are more data points than pixels along the x-axis of the graph (2 hours of 10-second data is already 720 data points). Does that also happen when you view the graph with &width=1000?
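If pixel consolidation is the cause, two quick experiments against the render API can confirm it (the host and metric below are placeholders; consolidateBy is a documented Graphite render function): request the graph wide enough that every data point gets its own pixel, or tell Graphite to consolidate the counter by sum instead of the default average:
http://graphite.example.com/render?target=stats_counts.some.counter&from=-2h&width=1000
http://graphite.example.com/render?target=consolidateBy(stats_counts.some.counter,'sum')&from=-24h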

Related

How do I apply velocity and acceleration to match the result of math formulas

So I have an initial velocity iv, a final velocity fv (which is always 0), a time t, and an acceleration variable a.
I use these variables to calculate the final distance fd.
Note: the language used here is Kotlin.
Note: the formulas used for calculating fd and a are not something I came up with.
var iv = 10.0 // initial velocity
var fv = 0.0 // final velocity
var t = 8.0 // time
var a = ((fv - iv)/t) // acceleration
var fd: Double = ((iv*t) + (a/2.0*Math.pow(t,2.0)))
I get the result fd = 40.0.
But when I try to model this numerically, the way I would apply it in code:
var d = 0.0 // current distance traveled
var v = iv  // current velocity (assumed to start at iv; v was undeclared in the snippet)
var i = 0   // current time elapsed
while (i < t) {
    d += v
    v += a
    i++
}
I end up with d = 45.0, when d should equal fd at the end.
What am I doing wrong in applying acceleration to velocity and velocity to distance, so that my results differ from what the mathematical formulas say they should be?
Don't worry about "formulas" - think about the physics.
If you have ever studied calculus and physics you know that:
a = dv/dt // a == acceleration; v == velocity; t == time
v = ds/dt // v == velocity; s == distance; t == time
If you know calculus well enough you can integrate the equation for acceleration twice to get the distance traveled as a function of time:
a(t) = dv/dt = a0
v(t) = ds/dt = a0*t + v0
s(t) = (a0/2)*t^2 + v0*t + s0
You can calculate the constants:
a0 = -1.25 m/sec^2
v0 = 10 m/s
s0 = 0 m
Substituting:
a(t) = -1.25
v(t) = 10 - 1.25*t
s(t) = -0.625*t^2 + 10*t = (10 - 0.625*t)*t
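As a quick check, this matches the asker's numbers: s(8) = -0.625*8^2 + 10*8 = -40 + 80 = 40, exactly the fd = 40.0 from the closed-form Kotlin calculation.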
You can also calculate the answer numerically. That's what you're doing with Kotlin.
If you know the initial conditions
a(0), v(0), and s(0)
you can calculate the values at the end of a time increment dt like this:
a(t+dt) = f(t+dt) // the given acceleration; here a constant a0
v(t+dt) = v(t) + a(t)*dt
s(t+dt) = s(t) + v(t)*dt
Looks like you are assuming that acceleration is constant throughout the time you're interested in.
You don't say what units you're using. I'll assume metric units: length in meters and time in seconds.
You decelerate from an initial velocity of 10 m/sec to a final velocity of 0 m/second over 8 seconds. That means a constant acceleration of -1.25 m/sec^2.
You should be able to substitute values into these equations and get the answers you need.
Do the calculations by hand before you try to code them.
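To make the discretization point concrete, here is a small numerical sketch (Python for brevity; it assumes the question's values v0 = 10, a = -1.25, t = 8). Forward Euler with dt = 1 reproduces the asker's 45.0, a smaller step approaches the exact 40.0, and averaging the start- and end-of-step velocities (the trapezoidal rule) is exact for constant acceleration:

def euler_distance(v0, a, t_total, dt):
    # forward Euler: uses the velocity at the start of each step
    d, v = 0.0, v0
    for _ in range(int(round(t_total / dt))):
        d += v * dt
        v += a * dt
    return d

def trapezoid_distance(v0, a, t_total, dt):
    # averages start- and end-of-step velocity; exact for constant a
    d, v = 0.0, v0
    for _ in range(int(round(t_total / dt))):
        v_next = v + a * dt
        d += 0.5 * (v + v_next) * dt
        v = v_next
    return d

print(euler_distance(10.0, -1.25, 8.0, 1.0))      # 45.0, the asker's result
print(euler_distance(10.0, -1.25, 8.0, 0.001))    # ~40.005, converging to 40
print(trapezoid_distance(10.0, -1.25, 8.0, 1.0))  # 40.0 exactly

So the 45.0 is not an arithmetic bug: it is the discretization error of d += v with a one-second step, and it vanishes as the step shrinks.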

Verify that all edges in a 2D graph are sufficiently far from each other

I have a graph where each node has coordinates in 2D (it's actually a geographic graph, with latitude and longitude.)
I need to verify that if the distance between two edges is less than MAX_DIST then they share a node. Of course, if they intersect, then the distance between them is zero.
The brute-force algorithm is trivial; is there a more efficient algorithm?
I was thinking of trying to adapt https://en.wikipedia.org/wiki/Closest_pair_of_points_problem to graph edges (and ignoring pairs of edges with a shared node), but it is not trivial to do so.
I was curious to see how the R-tree index idea would perform, so I created a small script to test it using two really cool Python libraries: Rtree and shapely.
The snippet generates 1000 segments with 1 < length < 5 and coordinates in the [0, 100] interval, populates the index, and then counts the pairs that are closer than MAX_DIST == 0.1 (using both the classic and the index-based method).
In my tests the index method was around 25x faster under the conditions above; this may vary greatly for your data set, but the result is encouraging:
found 532 pairs of close segments using classic method
7.47 seconds for classic count
found 532 pairs of close segments using index method
0.28 seconds for index count
The performance and correctness of the index method depend on how your segments are distributed (how many are close, whether you have very long segments, the parameters used).
import time
import random
from rtree import Rtree
from shapely.geometry import LineString
def generate_segments(number):
    segments = {}
    for i in range(number):
        while True:
            x1 = random.randint(0, 100)
            y1 = random.randint(0, 100)
            x2 = random.randint(0, 100)
            y2 = random.randint(0, 100)
            segment = LineString([(x1, y1), (x2, y2)])
            if 1 < segment.length < 5:  # only add relatively small segments
                segments[i] = segment
                break
    return segments

def populate_index(segments):
    idx = Rtree()
    for index, segment in segments.items():
        idx.add(index, segment.bounds)
    return idx

def count_close_segments(segments, max_distance):
    count = 0
    for i in range(len(segments) - 1):
        s1 = segments[i]
        for j in range(i + 1, len(segments)):
            s2 = segments[j]
            if s1.distance(s2) < max_distance:
                count += 1
    return count

def count_close_segments_index(segments, idx, max_distance):
    count = 0
    for index, segment in segments.items():
        close_indexes = idx.nearest(segment.bounds, 10)
        for close_index in close_indexes:
            if index >= close_index:  # do not count duplicates
                continue
            close_segment = segments[close_index]
            if segment.distance(close_segment) < max_distance:
                count += 1
    return count

if __name__ == "__main__":
    MAX_DIST = 0.1
    s = generate_segments(1000)
    r_idx = populate_index(s)
    t = time.time()
    print("found %d pairs of close segments using classic method" % count_close_segments(s, MAX_DIST))
    print("%.2f seconds for classic count" % (time.time() - t))
    t = time.time()
    print("found %d pairs of close segments using index method" % count_close_segments_index(s, r_idx, MAX_DIST))
    print("%.2f seconds for index count" % (time.time() - t))

magnitude squared coherence in matlab

I am trying to get the magnitude squared coherence (MSC) and I am running into some problems.
In theory, the MSC is the squared magnitude of the cross-spectrum of two signals, divided by the autospectra of each signal.
Therefore, this is my code:
Fs = 1000; % Sampling frequency
T = 1/Fs; % Sampling period
L = length(myData(1,:)); % length of segment
Hz = Fs*(0:(L/2))/L; % frequency vector
dat1 = fft(myData(1,:));
dat2 = fft(myData(2,:));
pow1 = dat1.*conj(dat1); % autospectra signal 1
pow2 = dat2.*conj(dat2); % autospectra signal 2
cpow = abs(dat1.*conj(dat2)).^2; % crosspectra
coh = cpow./(pow1.*pow2); % getting the coherence
coherence = coh(1:L/2+1);
coherence(2:end-1) = coherence(2:end-1); % adjusting length to the Nyquist freq
figure;
plot(Hz,coherence)
but the result is all ones and it does not make much sense, so there must be a mistake, but I just can't find it.
Thanks for your help
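A worked check of the algebra (not from the original thread) shows why this happens: with a single whole-record FFT per signal, the ratio is identically 1 at every frequency, which matches the all-ones output. Writing X = fft of signal 1 and Y = fft of signal 2:
cpow = abs(X .* conj(Y)).^2 = (abs(X).^2) .* (abs(Y).^2)
pow1 .* pow2 = (abs(X).^2) .* (abs(Y).^2)
coh = cpow ./ (pow1 .* pow2) = 1 everywhere
The MSC only becomes informative when the cross- and autospectra are averaged over several segments or trials before the ratio is taken; this is what Welch-style estimators, such as MATLAB's mscohere, do.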

What's wrong with my Euclidean Distance Calculation? (Julia)

I'm trying to compute Perceptually Important Points (PIPs) using three different methods:
Euclidean distance;
Perpendicular distance;
Vertical distance.
Methods 2 and 3 give me the same point, but the Euclidean distance does not. I can't find the mistake I made; I hope someone can help me.
pt = 7.6 #pt
_t = 1 #t
ptT = 10.7 #p(t+T)
_T = 253 #t+T
# Distances
dE = Float64[] #Euclidean Distances
dP = Float64[] #Perpendicular Distances
dV = Float64[] #Vertical Distances
xi = Float64[] #x values
for i in 2:length(stockdf[:Price])-1
    _de = sqrt((_t - i)^2 + (pt - stockdf[:Price][i])^2) + sqrt((_T - i)^2 + (ptT - stockdf[:Price][i])^2)
    push!(dE, _de)
    _dP = abs(_s*i + _c - stockdf[:Price][i]) / sqrt(_s^2 + 1)
    push!(dP, _dP)
    _dV = abs(_s*i + _c - stockdf[:Price][i])
    push!(dV, _dV)
    push!(xi, i)
end
Both methods 2 and 3 give me the max point at index 153, but method 1 gives me a point which is not the max point, at index 230.
The formula for the 3rd PIP with Euclidean distance is:
dE = sqrt((t - i)^2 + (p_t - p_i)^2) + sqrt((t+T - i)^2 + (p_(t+T) - p_i)^2)
EDIT:
For a better understanding I reproduced the code with other variables which you can test for yourself.
xs = Array(1:10)
ys = rand(1:1:10, 10)
dde = Float64[]
ddP = Float64[]
ddV = Float64[]
xxi = Float64[]
# Connecting line of the first 2 PIPs
_ss = (ys[end] - ys[1]) / 10
_cc = ys[1] - (1 * (ys[end] - ys[1])) / 10
_zz = Float64[]
for i in 1:length(xs)
    push!(_zz, _ss*i + _cc)
end
for i in 2:length(xs)-1
    _dde = sqrt((1 - i)^2 + (ys[1] - ys[i])) + sqrt((10 - i)^2 + (ys[end] - ys[i])^2)
    push!(dde, _dde)
    _ddP = abs(_ss*i + _cc - ys[i]) / sqrt(_ss^2 + 1)
    push!(ddP, _ddP)
    _ddV = abs(_ss*i + _cc - ys[i])
    push!(ddV, _ddV)
    push!(xxi, i)
end
println(dde)
for i in 1:length(dde)
    if ddV[i] == maximum(ddV)
        println(i)
    end
end
For the Euclidean distance I get index 7; for the perpendicular and vertical distances I get index 5. Look at the graphs:
Euclidean distance on graph
Perpendicular distance on graph
EDIT:
I'm working through a book about pattern recognition in financial time series. I downloaded the same data that the book used, and now the results are the same: all three methods give me the same index. But with different data sets method 1 differs from 2 and 3, and I don't know why.

Find nearest 3D point

I have two data files, each containing a large number of 3-dimensional points (file A stores approximately 50,000 points, file B approximately 500,000). My goal is to find, for every point (a) in file A, the point (b) in file B which has the smallest distance to (a). I store the points in two lists like this:
List A nodes:
(ID X Y Z)
[ ['478277', -107.0, 190.5674, 128.1634],
['478279', -107.0, 190.5674, 134.0172],
['478282', -107.0, 190.5674, 131.0903],
['478283', -107.0, 191.9798, 124.6807],
... ]
List B data:
(X Y Z Data)
[ [-28.102, 173.657, 229.744, 14.318],
[-28.265, 175.549, 227.824, 13.648],
[-27.695, 175.925, 227.133, 13.142],
...]
My first approach was simply to iterate through both lists with a nested loop and compute the distance between every pair of points:
outfile = open(job[0] + '/' + output, 'wb')
dist_min = float(job[5])
dist_max = float(job[6])
dists = []
for node in nodes:
    shortest_distance = 1000.0
    shortest_data = 0.0
    for entry in data:
        dist = math.sqrt((node[1] - entry[0])**2 + (node[2] - entry[1])**2 + (node[3] - entry[2])**2)
        if (dist_min <= dist <= dist_max) and (dist < shortest_distance):
            shortest_distance = dist
            shortest_data = entry[3]
    outfile.write(node[0] + ', ' + str('%10.5f' % shortest_data + '\n'))
outfile.close()
I realized that the number of loop iterations Python has to run is far too large (~25,000,000,000), so I had to speed up my code. I tried first calculating all distances with list comprehensions, but the code is still too slow:
p_x = [row[1] for row in nodes]
p_y = [row[2] for row in nodes]
p_z = [row[3] for row in nodes]
q_x = [row[0] for row in data]
q_y = [row[1] for row in data]
q_z = [row[2] for row in data]
dx = [[(px - qx) for px in p_x] for qx in q_x]
dy = [[(py - qy) for py in p_y] for qy in q_y]
dz = [[(pz - qz) for pz in p_z] for qz in q_z]
dx = [[dxxx * dxxx for dxxx in dxx] for dxx in dx]
dy = [[dyyy * dyyy for dyyy in dyy] for dyy in dy]
dz = [[dzzz * dzzz for dzzz in dzz] for dzz in dz]
D = [[(dx[i][j] + dy[i][j] + dz[i][j]) for j in range(len(dx[0]))] for i in range(len(dx))]
D = [[(DDD**(0.5)) for DDD in DD] for DD in D]
To be honest, at this point I do not know which of the two approaches is better; in any case, neither of them seems feasible. I'm not even sure it is possible to write code that calculates all the distances in an acceptable time. Is there another way to solve my problem without calculating all distances?
Edit: I forgot to mention that I am running Python 2.5.1 and am not allowed to install or add any new libraries...
Just in case someone is interested in the solution:
I found a way to speed up the whole process by not calculating all the distances:
I created a 3D list representing a grid in the given 3D space, divided in X, Y and Z by a given step size (e.g. (max - min) / 1,000). Then I iterated over every 3D point to put it into the grid. After that I iterated over the points of set A again, checking whether there are points from B in the same cube; if not, the search radius is increased, so the process looks in the adjacent 26 cubes for points. The radius keeps increasing until at least one point is found. The resulting list is comparatively small, can be sorted quickly, and the nearest point is found.
The processing time went down to a couple of minutes and it works fine.
p_x = [row[1] for row in nodes]
p_y = [row[2] for row in nodes]
p_z = [row[3] for row in nodes]
q_x = [row[0] for row in data]
q_y = [row[1] for row in data]
q_z = [row[2] for row in data]
min_x = min(p_x + q_x)
min_y = min(p_y + q_y)
min_z = min(p_z + q_z)
max_x = max(p_x + q_x)
max_y = max(p_y + q_y)
max_z = max(p_z + q_z)
max_n = max(max_x, max_y, max_z)
min_n = min(min_x, min_y, min_z)
gridcount = 1000
step = (max_n - min_n) / gridcount
ruler_x = [min_x + (i * step) for i in range(gridcount + 1)]
ruler_y = [min_y + (i * step) for i in range(gridcount + 1)]
ruler_z = [min_z + (i * step) for i in range(gridcount + 1)]
grid = [[[0 for i in range(gridcount)] for j in range(gridcount)] for k in range(gridcount)]
for node in nodes:
    loc_x = self.abatemp_get_cell(node[1], ruler_x)
    loc_y = self.abatemp_get_cell(node[2], ruler_y)
    loc_z = self.abatemp_get_cell(node[3], ruler_z)
    if grid[loc_x][loc_y][loc_z] == 0:
        grid[loc_x][loc_y][loc_z] = [[node[1], node[2], node[3], node[0]]]
    else:
        grid[loc_x][loc_y][loc_z].append([node[1], node[2], node[3], node[0]])

for entry in data:
    loc_x = self.abatemp_get_cell(entry[0], ruler_x)
    loc_y = self.abatemp_get_cell(entry[1], ruler_y)
    loc_z = self.abatemp_get_cell(entry[2], ruler_z)
    if grid[loc_x][loc_y][loc_z] == 0:
        grid[loc_x][loc_y][loc_z] = [[entry[0], entry[1], entry[2], entry[3]]]
    else:
        grid[loc_x][loc_y][loc_z].append([entry[0], entry[1], entry[2], entry[3]])
out = []
outfile = open(job[0] + '/' + output, 'wb')
for node in nodes:
    neighbours = []
    radius = -1
    loc_nx = self.abatemp_get_cell(node[1], ruler_x)
    loc_ny = self.abatemp_get_cell(node[2], ruler_y)
    loc_nz = self.abatemp_get_cell(node[3], ruler_z)
    reloop = True
    while reloop:
        if neighbours:  # one extra pass after the first hit
            reloop = False
        radius += 1
        start_x = 0 if ((loc_nx - radius) < 0) else (loc_nx - radius)
        start_y = 0 if ((loc_ny - radius) < 0) else (loc_ny - radius)
        start_z = 0 if ((loc_nz - radius) < 0) else (loc_nz - radius)
        end_x = (len(ruler_x) - 1) if ((loc_nx + radius + 1) > (len(ruler_x) - 1)) else (loc_nx + radius + 1)
        end_y = (len(ruler_y) - 1) if ((loc_ny + radius + 1) > (len(ruler_y) - 1)) else (loc_ny + radius + 1)
        end_z = (len(ruler_z) - 1) if ((loc_nz + radius + 1) > (len(ruler_z) - 1)) else (loc_nz + radius + 1)
        for i in range(start_x, end_x):
            for j in range(start_y, end_y):
                for k in range(start_z, end_z):
                    if grid[i][j][k] != 0:
                        for grid_entry in grid[i][j][k]:
                            if not isinstance(grid_entry[3], basestring):
                                neighbours.append(grid_entry)
    dists = []
    for n in neighbours:
        d = math.sqrt((node[1] - n[0])**2 + (node[2] - n[1])**2 + (node[3] - n[2])**2)
        dists.append([d, n[3]])
    dists = sorted(dists)
    outfile.write(node[0] + ', ' + str(dists[0][-1]) + '\n')
outfile.close()
Function to get the grid position of a point:
def abatemp_get_cell(self, n, ruler):
    for i in range(len(ruler) - 1):
        if ruler[i] <= n <= ruler[i + 1]:
            return i
    return False
The gridcount variable gives one the chance to tune the process: with a small gridcount, sorting the points into the grid is very fast, but the neighbour lists in the search loop get bigger, and more time is needed for that part of the process. With a big gridcount, more time is needed at the beginning, but the loop runs faster.
The only issue I have now is that there are cases where the process finds neighbours, yet there are other points, not yet found, which are closer (see picture). The underlying reason is that a ring of cells only bounds distances coarsely: a point just outside the searched ring can be nearer than one inside it. So far I have worked around this by incrementing the search radius one more time once neighbours are found, and even then there are points which are closer but not in the neighbours list, although it is a very small number (92 out of ~100,000). I could reduce it further by incrementing the radius twice after finding neighbours, but that solution does not seem very smart; a watertight stopping rule would be to keep expanding until the closest possible distance of the next unexplored ring exceeds the best distance found so far. Maybe you guys have an idea...
This is the first working draft of the process; I think it will be possible to improve it even further, but it should give you an idea of how it works...
It took me a bit of thought, but in the end I think I found a solution for you.
Your problem is not in the code you wrote but in the algorithm it implements.
There is an algorithm called Dijkstra's algorithm; here is the gist of it: https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm .
Now what you need to do is to use this algorithm in a clever way:
Create a node S (standing for source).
Now link edges from S to all the nodes in group B.
After you have done that, link edges from each point b in B to each point a in A.
Set the cost of the links from the source to 0, and the cost of the others to the distance between the two points (in 3D).
Now if we run Dijkstra's algorithm, the output we get will be the cost to travel from S to each point in the graph (we are only interested in the distances to points in group A).
Since the cost is 0 to each point b in B, and S is only connected to points in B, the path to any point a in A must include a node in B (actually exactly one, since the shortest path between two points is a single line).
I am not sure this will speed up your code, but as far as I know a way to solve this problem without calculating all distances does not exist, and this algorithm has the best time complexity one can hope for.
Take a look at this generic 3D data structure:
https://github.com/m4nh/skimap_ros
It has a very fast RadiusSearch feature, ready to be used. This solution (similar to an octree, but faster) saves you from having to create the regular grid first (you don't have to fix MAX/MIN sizes along each axis), and it saves a lot of memory.
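For readers who, unlike the asker, are allowed to install libraries, the standard tool for this query is a k-d tree. A minimal sketch with SciPy's cKDTree (the random arrays are illustrative stand-ins for the points parsed from the two files):

import numpy as np
from scipy.spatial import cKDTree

a_points = np.random.rand(50000, 3) * 100   # X, Y, Z of the A nodes
b_points = np.random.rand(500000, 3) * 100  # X, Y, Z of the B points
b_data = np.random.rand(500000)             # data value of each B point

tree = cKDTree(b_points)            # build once over the larger set
dists, idxs = tree.query(a_points)  # nearest B point for every A point
nearest_data = b_data[idxs]         # the data values to write out

For these sizes the whole query should run in seconds, and unlike the fixed-radius grid search it is always exact.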
