Graphite: show change from previous value - graphite

I am sending Graphite the time spent in Garbage Collection (getting this from jvm via jmx). This is a counter that increases. Is their a way to have Graphite graph the change every minute so I can see a graph that shows time spent in GC by minute?

You should be able to turn the counter into a hit-rate with the Derivative function, then use the summarize function to the counter into the time frame that your after.
&target=summarize(derivative(java.gc_time), "1min") # time spent per minute
derivative(seriesList)
This is the opposite of the integral function. This is useful for taking a
running totalmetric and showing how many requests per minute were handled.
&target=derivative(company.server.application01.ifconfig.TXPackets)
Each time you run ifconfig, the RX and TXPackets are higher (assuming there is network traffic.)
By applying the derivative function, you can get an idea of the packets per minute sent or received, even though you’re only recording the total.
summarize(seriesList, intervalString, func='sum', alignToFrom=False)
Summarize the data into interval buckets of a certain size.
By default, the contents of each interval bucket are summed together.
This is useful for counters where each increment represents a discrete event and
retrieving a “per X” value requires summing all the events in that interval.
Source: http://graphite.readthedocs.org/en/0.9.10/functions.html

Related

AWS DynamoDB matrix doesn't work correctly

I am using dynamoDB and curious of this
When I see table matrix with period of 10, 30 seconds, it seems that it exceeds provisioned value
Then, when I see table matrix with period of 1 minutes, it doesn't reach provisioned value at all
I want to know why this happens.
This isn't a DynamoDB thing. It's a CloudWatch rendering thing.
Consumed capacity has a natural period of 1 minute. When you set your graph period to 1 minute everything is correct. Consumed is below provisioned.
If you change the graph period to 30 seconds, your consumed view adjusts and you see consumption that's double what's real. The math behind the scenes divides by the wrong period. Graph period of 10 seconds, you get 6x reality. Graph period of 5 minutes, you get 1/5th reality.
The Provisioned line isn't based on an equation involving Period so it's not affected by the chosen period.
Maybe someone can comment on why the user is allowed to control the Period of the view but it just messes things up when it doesn't match the natural period of the data.

Averaging when events are slower than measurement time

I'm trying to come up with a better way of providing an instantaneous average when input signal is very slow. This seems like a math-y kinda question so if it should be over there let me know.
I have events that are measured as a pulse. Normally I can collect the pulses in a counter and then read the counter value at a fixed interval, say 1/4 second. I can then take the count value and divide by the number of seconds so n/0.25 and get a rate. I then apply a low pass filter to clean up the average and that works great normally.
What do I do when the events happen once every 1-60 seconds? The obvious choice is to wait until I have a sufficient number of counts and divide by total time. However, I need to provide the user with a reading every few seconds so waiting is not an option. I need some way to estimate the value.
I've thought of one solution that's kinda hard to explain. I was wondering if there was a standard way of doing this. I'm pretty sure I have to utilize a different kind of "data," the lack of an event. The goal is to estimate until enough time/events have passed to really calculate a rate and to transition from estimate to real rate seamlessly.

How do you sum a statsd counter over a large time range with correct values?

Background
A basic use case for statsd & grafana is learning how many times a function has been called over a time range-- whether that is "last 6h", "since beginning of today", "since beginning of time", etc.
What I'm struggling to find is the correct function to achieve this. I'm using a hosted solution; however, I can confirm that data is being flushed from StatsD to Graphite in 10s intervals.
Current Setup
StatsD Flush: 10s
Graph Function: hitcount(counters.login.employer.count, "10seconds")
Time Range: 24h
Problem
When using hitcount(counters.login.employer.count, "10seconds"), the data returned is incorrect. In fact, I can do 24h, 23h, 22h, and note the values are actually increasing.
I've performed all testing here in a controlled environment, only my machine is sending metrics to StatsD. This is not yet in production code.
Any idea what could be going on here?
The way counters work is that on each interval the value of the counter is sent to graphite and reset in statsd, so what you're looking for is the sum of the series.
You can do that using consolidateBy('sum') combined with maxDataPoints=1.
Be aware that if your series is being aggregated in graphite you'll need to make sure that the aggregation is by sum, otherwise when values get rolled up from the individual values reported by statsd into aggregated buckets they'll be averaged, and your sum won't work across longer intervals. You can read more about configuring aggregation in Graphite here.

graphite: how to get per-second metrics from batch metrics?

I'm trying to measure a online mini-batch processing system with a per-second metrics (total query per second). For every batch, a metric (e.g. "stats.gauges.<host>.query.count") will be send to graphite. batches are processed in several different hosts in parallel and a batch of data take about 5 seconds to process.
I've tried:
simply sum series: sumSeries(stats.gauges.*.query.count),
the result metrics is many times greater than the actual value;
scale to 1 second:
scaleToSeconds(sumSeries(stats.gauges.*.query.count),
1), the result metrics is much less than the actual value;
integral then derivative: nonNegativeDerivative(sumSeries(integral(stats.gauges.*.query.count))), same as the first case ...
send gauges with
delta=True param, then derivative. the result is about 20% greater
than the actual value
so, how to get per-second metrics from batch metrics? what is the best practice?
You should use carbon-aggregator service to add several metrics together as they come in. There is an example which fits your case at http://graphite.readthedocs.io/en/latest/config-carbon.html#aggregation-rules-conf
As your batch takes 5 secs to process, frequency should be 5 to buffer all the metrics. After five seconds, aggregator will sum them up and write to graphite.

How to intrepret values of the ASP.NET Requests/sec performance counter?

If I monitor ASP.NET Requests/sec performance counter every N seconds, how should I interpret the values? Is it number of requests processed during sample interval divided by N? or is it current requests/sec regardless the sample interval?
It is dependent upon your N number of seconds value. Which coincidentally is also why the performance counter will always read 0 on its first nextValue() after being initialized. It makes all of its calculations relatively from the last point at which you called nextValue().
You may also want to beware of calling your counters nextValue() function at intervals less than a second as it can tend to produce some very outlandish outlier results. For my uses, calling the function every 5 seconds provides a good balance between up to date information, and smooth average.

Resources