Graphite not aggregating data - graph

I'm using Graphite and Carbon-cache and trying to understand why it doesn't appear to be applying aggregation to data.
I have a whisper database:
whisper-create.py /opt/graphite/storage/whisper/test/test.wsp 60:1y
From the metadata I am using an average aggregation method:
Meta data:
aggregation method: average
max retention: 31536000
xFilesFactor: 0.5
And I am writing two values to it:
echo "test.test 1 `date +%s`" | nc localhost 2003;
echo "test.test 100 `date +%s`" | nc localhost 2003;
When I look at my whisper database I see the following value:
42: 1395315780, 100
I would have expected this value to be (100 + 1) / 2 = 50.5
It appears to be using the last value, rather than an average of the two values.
I feel like I may be missing something here. Could anyone explain?

The answer was to use the carbon-aggregator, not the carbon-cache.
The carbon-cache will always replace the value, no matter what. If the time per point is 1 second and you send more than one value within that second, the last value is what gets stored; with your 60-second schema, the same applies to each 60-second window.
If you want more than one value to be kept you need to use the carbon-aggregator (running on a different port) and configure how it should aggregate the data (sum, average).
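For example, with the metric above, a minimal aggregation-rules.conf entry might look like this (60 matches the 60-second retention; 2023 is carbon-aggregator's default line-receiver port, adjust to your carbon.conf):
test.test (60) = avg test.test
Then send the values to the aggregator instead of carbon-cache:
echo "test.test 1 `date +%s`" | nc localhost 2023
echo "test.test 100 `date +%s`" | nc localhost 2023
When the 60-second interval closes, the aggregator emits a single averaged point (50.5 in the example above) on to carbon-cache.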

I had the same problem and no access to the Graphite / whisper settings.
There is another solution: aggregate the data externally and then send it to the Graphite data port.
https://github.com/floringavrila/graphite-feeder
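A minimal sketch of that idea in shell, reusing the values from the question above: average the readings yourself, then send a single point to Carbon's plaintext port.
# compute the average of the collected readings, then send one point
avg=$(printf '%s\n' 1 100 | awk '{ s += $1; n++ } END { print s / n }')
echo "test.test $avg `date +%s`" | nc localhost 2003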

Related

How do I create accumulated bandwidth usage in RRDtool (i.e. GBs per month down)?

The following data comes from a mobile phone provider: a list of kilobytes downloaded at given times, usually on a per-minute basis.
It's not the average, not the max, but the total for that time interval, which allows the data consumption to be tracked precisely. These graphs were made with PIL, and instead of showing spikes to indicate large data consumption, they show large steps, which is much more revealing, because a step doesn't just say "much happened here" but "exactly this much happened here". For example, in the second graph, Sat 10 at night: 100 MB. A rate-of-change graph wouldn't be as informative.
I'm also trying to find a way to do this with rrd.
I was misled, when using COUNTER to track my network's data usage, into thinking I would be able to precisely compute the monthly/weekly accumulated usage, but that turned out to be a wrong assumption.
How do I store my data in rrd in order to easily generate graphs like the ones below? Would that be by using ABSOLUTE and subtracting the previous insertion value before each update? Would that be precise down to the byte when checking the monthly usage?
You can add up all the values in your chart quite easily:
CDEF:sum=data,$step_width,*,PREV,ADDNAN
If your chart covers just one month, that should be all you have to do. If you want it to cover multiple months, you will have to use a combination of IF and TIME operators to reset the line to 0 at the start of each month.
Version 1.5.4 will contain an additional operator called STEPWIDTH, which pushes the step width onto the stack, making this even simpler.
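Putting it together, a minimal sketch of a graph command (the RRD path, the DS name bytes, and the 300-second step are assumptions; substitute your own step for $step_width):
rrdtool graph total.png --start -30d \
  DEF:data=traffic.rrd:bytes:AVERAGE \
  CDEF:sum=data,300,*,PREV,ADDNAN \
  LINE2:sum#0000FF:"cumulative bytes"
Each step contributes rate times step-width bytes, and PREV,ADDNAN accumulates those amounts into a running total (ADDNAN treats the initial unknown value as 0).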
This is a common question with very few answers online, but I first encountered a method to do this with RRD back in 2009.
The DS type to use is GAUGE, and in your update script you manually handle resetting the GAUGE to 0 at the start of the month for monthly usage graphs.
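A sketch of that manual reset, run from cron (the state file, RRD path, and the get_bytes_since_last_run helper are hypothetical):
# keep a running monthly total, resetting when the month changes
state=/var/tmp/traffic.state                  # stores "YYYYMM total"
read month total < "$state" 2>/dev/null || { month=""; total=0; }
now=$(date +%Y%m)
[ "$now" != "$month" ] && total=0             # new month: reset the GAUGE to 0
delta=$(get_bytes_since_last_run)             # hypothetical helper
total=$((total + delta))
echo "$now $total" > "$state"
rrdtool update /var/lib/rrd/monthly.rrd N:"$total"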
Then along came the mrtg-traffic-sum tool (shipped with MRTG).
More recently I've had to monitor both traffic bandwidth and traffic volume, so I first created a standard RRD for the bandwidth and confirmed that was working.
With the bandwidth being sampled into that RRD, use the mrtg-traffic-sum tool to generate the stats needed, as in the example below, then pump them into a second RRD created with just a GAUGE DS type and just LAST (no need for MIN/AVG/MAX).
This allows using RRDs to collect both traffic bandwidth as well as monthly traffic volumes / traffic quota limits.
root@server:~# /usr/bin/mrtg-traffic-sum --range=current --units=MB /etc/mrtg/R4.cfg
Subject: Traffic total for '/etc/mrtg/R4.cfg' (1.9) 2022/02
Start: Tue Feb 1 01:00:00 2022
End: Tue Mar 1 00:59:59 2022
Interface In+Out in MB
------------------------------------------------------------------------------
eth0 0
eth1 14026
eth2 5441
eth3 0
eth4 15374
switch0.5 12024
switch0.19 151
switch0.49 1
switch0.51 0
switch0.92 2116
root@server:~#
From mrtg-traffic-sum, just write up a script that populates your second RRD with these values, and presto, you have a traffic volume / quota graph as well.
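A sketch of such a script (the RRD path, DS name, and interface are assumptions; the volume RRD is created once with a single GAUGE DS and a LAST RRA, as described above):
# one-time creation of the volume RRD: one GAUGE DS, LAST only
rrdtool create /var/lib/rrd/traffic-volume.rrd --step 86400 \
  DS:mb:GAUGE:172800:0:U \
  RRA:LAST:0.5:1:120
# cron job: extract this month's running total for eth1 and store it
mb=$(/usr/bin/mrtg-traffic-sum --range=current --units=MB /etc/mrtg/R4.cfg \
  | awk '$1 == "eth1" { print $2 }')
rrdtool update /var/lib/rrd/traffic-volume.rrd N:"$mb"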

How does Graphite handle oversamples

I am trying to understand how Graphite treats oversamples. I read the documentation but could not find the answer.
For example, if I specify in Graphite that the retention policy should be 1 sample per 60 seconds and Graphite receives something like 200 values in those 60 seconds, what exactly will be stored? Will Graphite take an average, or a random point out of those 200?
Short answer: it depends on the configuration; the default is to take the last one.
Long answer: Graphite can be configured, using regular expressions, with a strategy to aggregate several points into one sample.
These strategies are configured in the storage-aggregation.conf file, using regexps to select metrics:
[all_min]
pattern = \.min$
aggregationMethod = min
This example conf will aggregate points using their minimum.
By default, the last point to arrive wins.
This strategy will always be used to aggregate from higher resolutions to lower resolutions.
For example, if storage-schemas.conf contains:
[all]
pattern = .*
retentions = 1s:8d,1h:1y
With the sum aggregation method, points stored at one-second resolution will be summed down to one-hour resolution once they are older than 8 days.
The aggregation configuration only applies when moving from archive i to archive i+1. For oversampling within a single interval, it always picks the last sample in the period.
The recommendation is to match your sending rate to the configured retention.
see graphite issue
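To check which aggregation method and xFilesFactor actually apply to an existing metric, whisper-info.py prints the metadata of the .wsp file (the same details shown in the first question above), e.g.:
whisper-info.py /opt/graphite/storage/whisper/test/test.wsp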

Non standard intervals in date histogram in Kibana

I'm using Kibana 4 to graph response times. When there is low load on the system, the average responses vary a lot if I aggregate them by second (because there might only be a couple of requests coming in during that second).
I could aggregate them by minute, but then I would lose a lot of detail. I would like to aggregate by some other interval, like 5 or 10 seconds, but I cannot find a way to do that.
Solved! (maybe in 4.1?)
Now you can select the "Custom" interval and enter a value such as 10s or 5m.

graphite summarize function not working as expected

I am feeding data into a metric, let's say it is "local.junk". What I send is just that metric, a 1 for the value, and the timestamp:
local.junk 1 1394724217
Where the timestamp changes of course. I want to graph the total number of these instances over a period of time so I used
summarize(local.junk, "1min")
Then I went and made some data entries. I expected to see the number of requests received in each minute, but it always just shows the line at 1. If I summarize over a longer period like 5 minutes, it shows me some random number: I tried 10 requests and the graph shows something like 4 or 5. Am I loading the data wrong, or using the summarize function wrong?
The summarize() method just sums up your data values, so correlate and verify that you are indeed sending correct values.
Also, to localize whether the function or the data has issues, you can run it on metricsReceived:
summarize(carbon.agents.ip-10-0-0-1-a.metricsReceived,"1hour")
Which version of Graphite are you running?
You may want to check your carbon aggregator settings. By default carbon aggregates data every 10 seconds. Without an entry in aggregation-rules.conf, Graphite only saves the last metric it receives within the 10-second window.
You are seeing the above problem because of that behaviour. You need to add an entry for your metric in aggregation-rules.conf with the sum method, like this:
local.junk (10) = sum local.junk

Graphite is not graphing anything for ranges bigger than 7 hours

My current retention rule is like so:
[whatever]
priority = 110
pattern = ^stats\.whatever\..*
retentions = 60:10080,600:262974
If I understand correctly, this will save 7 days of 1-minute data (10080 points × 60 s) and about 5 years of 10-minute data (262974 points × 600 s).
I have been sending data to Graphite for the last couple of hours and I can see a graph of this data, but only for ranges shorter than 7 hours. If I try to visualize the data over a range of, for example, 1 day, the resulting graph doesn't show a single data point.
Is this caused by my retention rule?
thanks in advance.
I had this same problem. After you change your retention rules, you need to restart carbon-cache.py. If you want to keep the data you have you need to run whisper-resize.py on your whisper files (.wsp).
This link should help too:
https://answers.launchpad.net/graphite/+question/140289
However in that link, the parameters passed to whisper-resize.py are in the wrong order. It should be whisper-resize.py <file> <retention rate>
Here's a helpful command for resizing:
find /opt/graphite/storage/whisper -type f -name "*.wsp" -exec whisper-resize.py {} <retention rate> \;
Adjust it as needed.
I had a similar problem; for me it wasn't the retention rules, but the aggregation rules. By default, my counters were being assigned --aggregationMethod average and --xFilesFactor 0.5. But my data was nowhere near that dense, so the aggregator was throwing away my data on the grounds that there wasn't a statistically significant sample available.
In my particular use case, I was interested in the peak value over some time period, so I used whisper-resize.py to reconfigure my database: --aggregationMethod max, --xFilesFactor 0.0 gave me the behavior I was expecting.
See also storage-aggregation.conf
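For reference, a resize invocation along those lines (the path is hypothetical; the retentions are the ones from the question):
whisper-resize.py /opt/graphite/storage/whisper/stats/whatever/example.wsp 60:10080 600:262974 --aggregationMethod=max --xFilesFactor=0.0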
