Graphite: append a "current time" point to the end of a series - graphite

I have a "succeeded" metric that is just the timestamp. I want to see the time between successive successes (this is how long the data is stale for). I have
derivative(Success)
but I also want to know how long between the last success time and the current time. since derivative transforms xs[n] to xs[n+1] - xs[n], the "last" delta doesn't exist. How can I do this? Something like:
derivative(append(Success, now()))
I don't see any graphite functions for appending series, and I don't see any user-defined graphite functions.
The general problem is to be alerted when the data is stale, via graphite monitoring. There may be a better solution than the one I'm thinking about.

identity is a function whose value at any given time is the timestamp of that time.
keepLastValue is a function that takes a series and replicates data points forward over gaps in the data.
So then diffSeries(identity("now"), keepLastValue(Success)) will be a "sawtooth" series that climbs steadily while Success isn't updated, and jumps down to zero (or close to it — there might be some time skew) every time Success has a data point. If you use graphite monitoring to get the current value of that expression and compare it to some threshold, it will probably do what you want.

Related

Reconstruct time series from readings of EMAs and EMAs on EMAs

I received readings of multiple multiple filters for a time series x. Those filters are differences of EMAs and EMAs on EMAs. My filters have been calculated as
F1 = EMA(x, period.short) - EMA(x, period.long) and
F2 = ZMA(x, period.short) - ZMA(x, period.long).
ZMA is a delagged EMA which is calculated as EMA(EMA(x, period), period). Unfortunately the original values of the time series x have been lost. Thus, I wonder whether it is possible to somehow reconstruct the original time series. I suppose that an exact reconstruction won't be possible, however to get an glimpse of an idea I'd be already happy to get one possible time series history for the filter values of identical time points.
Therefor I wonder how such a reconstruction could be done in an efficient way and how it could be implemented (if possible in R). Maybe there are already some packages available for such a purpose?!
Thus, I'd be happy if you'd have any advice. Many thanks in advance!

RRDTOOL one second logging, values missing

I spent more than two months with RRDTOOL to find out how to store and visualize data on graph. I'm very close now to my goal, but for some reason I don't understand why it is happening that some data are considered to be NaN in my case.
I counting lines in gigabytes sized of log files and have feeding the result to an rrd database to visualize events occurrence. The stepping of the database is 60 seconds, the data is inserted in seconds base whenever it is available, so no guarantee the the next timestamp will be withing the heartbeat or within the stepping. Sometimes no data for minutes.
If have such big distance mostly my data is considered to be NaN.
b1_5D.rrd
1420068436:1
1420069461:1
1420073558:1
1420074583:1
1420076632:1
1420077656:1
1420079707:1
1420080732:1
1420082782:1
1420083807:1
1420086881:1
1420087907:1
1420089959:1
1420090983:1
1420094055:1
1420095080:1
1420097132:1
1420098158:1
1420103284:1
1420104308:1
1420107380:1
1420108403:1
1420117622:1
1420118646:1
1420121717:1
1420122743:1
1420124792:1
1420125815:1
1420131960:1
1420134007:1
1420147326:1
1420148352:1
rrdtool create b1_4A.rrd --start 1420066799 --step 60 DS:Value:GAUGE:120:0:U RRA:AVERAGE:0.5:1:1440 RRA:AVERAGE:0.5:10:1008 RRA:AVERAGE:0.5:30:1440 RRA:AVERAGE:0.5:360:1460
The above gives me an empty graph for the input above.
If I extend the heart beat, than it will fill the time gaps with the same data. I've tried to insert zero values, but that will average out the counts and bring results in mils.
Maybe I taking something wrong regarding RRDTool.
It would be great if someone could explain what I doing wrong.
Thank you.
It sounds as if your data - which is event-based at irregular timings - is not suitable for an RRD structure. RRD prefers to have its data at constant, regular intervals, and will coerce the incoming data to match its requirements.
Your RRD is defined to have a 60s step, and a 120s heartbeat. This means that it expects one sample every 60s, and no further apart than 120s.
Your DS is a gauge, and so the values you enter (all of them '1' in your example) will be the values stored, after any time normalisation.
If you increase the heartbeat, then a value received within this time will be used to make a linear approximation to fill in all samples since the last one. This is why doing so fills the gaps with the same data.
Since your step is 60s, the smallest sample time sidth will be 1 minute.
Since you are always storing '1's, your graph will therefore either show '1' (when the sample was received in the heartbeart window) or Unknown (when the heartbeat expired).
In other words, your graph is showing exactly what you gave it. You data are being coerced into a regular set of numerical values at a 1-minute step, each being 1 or Unknown.

R - Cluster x number of events within y time period

I have a dataset that has 59k entries recorded over 63 years, I need to identify clusters of events with the criteria being:
6 or more events within 6 hours
Each event has a unique ID, time HH:MM:SS and date DD:MM:YY, an output would ideally have a cluster ID, the eventS that took place within each cluster, and start and finish time and date.
Thinking about the problem in R we would need to look at every date/time and count the number of events in the following 6 hours, if the number is 6 or greater save the event IDs, if not move onto the next date and perform the same task. I have taken a data extract that just contains EventID, Date, Time and Year.
https://dl.dropboxusercontent.com/u/16400709/StackOverflow/DataStack.csv
If I come up with anything in the meantime I will post below.
Update: Having taken a break to think about the problem I have a new approach.
Add 6 hours to the Date/Time of each event then count the number of events that fall within the start end time, if there are 6 or more take the eventIDs and assign them a clusterID. Then move onto the next event and repeat 59k times as a loop.
Don't use clustering. It's the wrong tool. And the wrong term. You are not looking for abstract "clusters", but something much simpler and much more well defined. In particular, your data is 1 dimensional, which makes things a lot easier than the multivariate case omnipresent in clustering.
Instead, sort your data and use a sliding window.
If your data is sorted, and time[x+5] - time[x] < 6 hours, then these events satisfy your condition.
Sorting is O(n log n), but highly optimized. The remainder is O(n) in a single pass over your data. This will beat every single clustering algorithm, because they don't exploit your data characteristics.

GeoServer: extract raster data of a time period

is there a way to use WMS->GetFeatureInfo specifying a TIME period (eg: 2014-01-01/2014-03-01) to extract a series of values from a raster layer loaded from a GeoServer instance?
Thanks in advance
Not at the moment, no. It may be added in the future though, it's not the first time I hear this request. I don't have a ETA, it depends on when funding to work on it shows up.
In the meantime, a somewhat complex workaround might be to configure the image mosaic index as a WFS feature type, query it by date, figure out the exact time values intersected by the interval, and then do N GetFeatureInfo requests, one for each of those values.

Graphite does not graph values correctly when using long durations?

I'm trying to graph data using statsd and graphite. I have a simple counter, I increment it by 1, and then when I graph the values for the counter over the day, I see strange values like 0.09 as the peak in my graph (see http://i.stack.imgur.com/o4gmz.png)
This graph should be showing 2 logins, but instead it's showing 0.09. If I change the time scale from 1 day to the last 15 minutes, then it correctly shows the two logins (see http://i.stack.imgur.com/23vDJ.png)
I've set up my finest retention to be in 10s increments in storage-schemas.conf:
retentions = 10s:7d,1m:21d,24h:5y
I've set up my storage-aggregation.conf file to sum counts:
[sum]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum
(And, before you ask, yes; this is a .count).
If I try my URL with &rawData=true then in either case I see some Nones, some 0.0s, and a pair of 1.0s separated by some 0.0s. I never see these fractional values that somehow show up on the graph. So... Is this a bug? Am I doing something wrong?
There's also consolidateBy function which tells graphite what to do if there's no enough pixels to draw everything accurately. By default it's using "avg" function and therefore strange results when time ranges are greater. Here excerpt from documentation:
When a graph is drawn where width of the graph size in pixels is
smaller than the number of datapoints to be graphed, Graphite
consolidates the values to to prevent line overlap. The
consolidateBy() function changes the consolidation function from the
default of ‘average’ to one of ‘sum’, ‘max’, or ‘min’. This is
especially useful in sales graphs, where fractional values make no
sense and a ‘sum’ of consolidated values is appropriate.
Another function that could be useful is hitcount. Short excerpt from here why it's useful:
This function is like summarize(), except that it compensates
automatically for different time scales (so that a similar graph
results from using either fine-grained or coarse-grained records) and
handles rarely-occurring events gracefully.
I spent some time scratching my head why I get fractions for my counter with time ranges longer than couple hours when my aggregation rule is max. It's pretty confusing, especially at the beginning when you play with single counters to see if everything works. Checking rawData is quite a good way for debugging sanity check ;)

Resources