Non-standard intervals in date histogram in Kibana

I'm using Kibana 4 to graph response times. When there is low load on the system, the average response times vary a lot if I aggregate them by second (because there might be only a couple of requests coming in during that second).
I could aggregate them by minute, but then I would lose a lot of detail. I would like to aggregate by some other interval, like 5 or 10 seconds, but I cannot find a way to do that.

Solved! (maybe in 4.1?)
Now you can select a "Custom" interval for the date histogram.
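For example, assuming the custom interval box accepts standard Elasticsearch date-interval expressions, values like these give the in-between granularities asked about:

5s    (5-second buckets)
10s   (10-second buckets)
30s   (30-second buckets)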

Related

How do I create accumulated bandwidth usage in RRDtool (i.e. GB per month downloaded)?

The following data comes from a mobile phone provider: a list of kilobytes downloaded at certain times, usually on a per-minute basis.
It's not the average, not the max, but the total for that time interval, which allows tracking data consumption precisely. These graphs were made with PIL, and instead of showing spikes to indicate large data consumption, they show large steps, which is much more revealing: it doesn't just say "much happened here" but "exactly this much happened here". For example, the second graph shows 100 MB on Saturday the 10th at night. A rate-of-change graph wouldn't be as informative.
I'm also trying to find a way to do this with rrd.
I was misled, when using COUNTER to track my network's data usage, into thinking that I could precisely compute the monthly/weekly accumulated data usage, but that turned out to be a wrong assumption.
How do I store my data in RRD in order to easily generate graphs like these? Would that be by using ABSOLUTE, subtracting the previous insertion's value before each update? And would that be precise down to the byte when checking the monthly usage?
You can add up all the values in your chart quite easily:
CDEF:sum=data,$step_width,*,PREV,ADDNAN
If your chart covers just one month, that should be all you have to do. If you want it to cover multiple months, you will have to use a combination of IF and TIME operators to reset the line to 0 at the start of each month.
Version 1.5.4 will contain an additional operator called STEPWIDTH, which pushes the step width onto the stack, making this even simpler.
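As a minimal sketch of a complete graph command built around that CDEF, assuming rrdtool >= 1.5.4 (for the STEPWIDTH operator) and an RRD file traffic.rrd with a DS named traffic (both names are hypothetical):

rrdtool graph usage.png --start now-30d --end now \
  DEF:data=traffic.rrd:traffic:AVERAGE \
  CDEF:sum=data,STEPWIDTH,*,PREV,ADDNAN \
  AREA:sum#99ccff:"accumulated bytes" \
  GPRINT:sum:LAST:"total\: %.0lf bytes"

The DEF reads the stored rate, the CDEF turns each step's rate into an amount and adds it to the running total, and the AREA draws the resulting staircase.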
This is a common question with very few answers online, but I first encountered a method for doing it with RRD in 2009.
The DS type to use is GAUGE, and your update script manually resets the GAUGE to 0 at the start of the month for monthly usage graphs.
Then along came the 'mrtg-traffic-sum' package.
More recently I've had to monitor both traffic bandwidth and traffic volume, so I created a standard RRD for the bandwidth first and confirmed that it was working.
So, with the bandwidth being sampled (captured to the RRD), use the mrtg-traffic-sum tool to generate the stats needed, as in the example below, then pump them into another RRD created with just the GAUGE DS type and just LAST (no need for MIN/AVG/MAX).
This allows using RRDs to collect both traffic bandwidth as well as monthly traffic volumes / traffic quota limits.
root@server:~# /usr/bin/mrtg-traffic-sum --range=current --units=MB /etc/mrtg/R4.cfg
Subject: Traffic total for '/etc/mrtg/R4.cfg' (1.9) 2022/02
Start: Tue Feb 1 01:00:00 2022
End: Tue Mar 1 00:59:59 2022
Interface                                    In+Out in MB
------------------------------------------------------------------------------
eth0                                                    0
eth1                                                14026
eth2                                                 5441
eth3                                                    0
eth4                                                15374
switch0.5                                           12024
switch0.19                                            151
switch0.49                                              1
switch0.51                                              0
switch0.92                                           2116
root@server:~#
From the mrtg-traffic-sum output, just write up a short script that populates your second RRD with these values, and presto: you have a traffic volume / quota graph as well.
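A hedged sketch of that glue script, assuming a second RRD at /var/lib/rrd/volume.rrd with a single GAUGE DS and taking eth1's monthly total from the output format above (all paths and names are hypothetical):

#!/bin/sh
# Pull this month's eth1 total (in MB) out of the mrtg-traffic-sum table.
TOTAL=$(/usr/bin/mrtg-traffic-sum --range=current --units=MB /etc/mrtg/R4.cfg \
        | awk '$1 == "eth1" { print $2 }')
# Store it as the latest GAUGE reading in the volume RRD.
rrdtool update /var/lib/rrd/volume.rrd "N:${TOTAL}"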

How does Graphite handle oversampling

I am trying to understand how Graphite treats oversampling. I read the documentation but could not find the answer.
For example, if I specify in Graphite that the retention policy should be 1 sample per 60 seconds and Graphite receives something like 200 values in those 60 seconds, what exactly will be stored? Will Graphite take an average, or a random point out of those 200?
Short answer: it depends on the configuration; the default is to take the last one.
Long answer: Graphite can be configured, using regular expressions, with a strategy for aggregating several points into one sample.
These strategies are configured in the storage-aggregation.conf file, using a regexp to select metrics:
[all_min]
pattern = \.min$
aggregationMethod = min
This example configuration will aggregate points using their minimum.
By default, the last point to arrive wins.
This strategy will always be used to aggregate from higher resolutions to lower resolutions.
For example, if storage-schemas.conf contains:
[all]
pattern = .*
retentions = 1s:8d,1h:1y
With a sum aggregation method configured, points older than 8 days will be rolled up to one-hour resolution by summing them.
Note that the aggregation configuration only applies when moving from archive i to archive i+1. For oversampling within the highest-precision archive (several points arriving in the same second here), the last sample in the period always wins.
The recommendation is to match your sampling rate with the configured retention.
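As a concrete sketch of that matching (the metric pattern is hypothetical): if the application emits one point every 10 seconds, store at 10-second resolution in storage-schemas.conf and declare the roll-up method in storage-aggregation.conf:

# storage-schemas.conf
[app]
pattern = ^app\.
retentions = 10s:8d,1h:1y

# storage-aggregation.conf
[app_counts]
pattern = ^app\..*\.count$
xFilesFactor = 0
aggregationMethod = sum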
See the related Graphite issue.

How can I obtain the percentage increase in a graphite value over several consecutive time periods?

I have several queue lengths stored as gauges in Graphite and I need to form a query URL which checks for a certain trend. Specifically, I need to know if a queue is:
...growing at a rate of more than 15% over three consecutive 1-hour periods.
I have tried various approaches including:
summarize(derivative(queue.length),"3h",avg)
...and various options with a timeStack(queue.length,"1h",3) series, but I just can't work it out.
Any suggestions appreciated.
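One hedged sketch, not a definitive answer: with standard Graphite functions (summarize, timeShift, divideSeries) you can chart the hour-over-hour growth ratio and check whether it stays above 1.15 for three consecutive buckets:

divideSeries(summarize(queue.length,"1h","last"),timeShift(summarize(queue.length,"1h","last"),"1h"))

Each point of this series is the current hour's queue length divided by the previous hour's; three consecutive points above 1.15 match the 15%-growth condition.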

R - Cluster x number of events within y time period

I have a dataset of 59k entries recorded over 63 years, and I need to identify clusters of events, the criterion being:
6 or more events within 6 hours
Each event has a unique ID, a time HH:MM:SS, and a date DD:MM:YY; an output would ideally have a cluster ID, the events that took place within each cluster, and the start and finish time and date.
Thinking about the problem in R: we would need to look at every date/time and count the number of events in the following 6 hours; if the number is 6 or greater, save the event IDs, and if not, move on to the next date and perform the same task. I have taken a data extract that just contains EventID, Date, Time and Year.
https://dl.dropboxusercontent.com/u/16400709/StackOverflow/DataStack.csv
If I come up with anything in the meantime I will post below.
Update: having taken a break to think about the problem, I have a new approach.
Add 6 hours to the date/time of each event, then count the number of events that fall within that start-to-end window; if there are 6 or more, take their event IDs and assign them a cluster ID. Then move on to the next event and repeat, looping over all 59k events.
Don't use clustering. It's the wrong tool, and the wrong term: you are not looking for abstract "clusters", but for something much simpler and much better defined. In particular, your data is 1-dimensional, which makes things a lot easier than the multivariate case that is omnipresent in clustering.
Instead, sort your data and use a sliding window.
If your data is sorted and time[x+5] - time[x] < 6 hours, then the events x through x+5 satisfy your condition.
Sorting is O(n log n), but highly optimized, and the remainder is a single O(n) pass over your data. This will beat every clustering algorithm, because clustering algorithms don't exploit this characteristic of your data.
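A minimal sketch of that sort-plus-sliding-window approach in R, assuming the column names from the extract above (EventID, Date, Time) and the DD:MM:YY / HH:MM:SS formats described; anchoring the 6-hour span at the first event of each cluster is one plausible reading of the requirement:

# Build one sortable timestamp per event (formats assumed from the question).
events <- read.csv("DataStack.csv", stringsAsFactors = FALSE)
events$ts <- as.POSIXct(paste(events$Date, events$Time),
                        format = "%d:%m:%y %H:%M:%S")
events <- events[order(events$ts), ]

# Sliding window: event i opens a cluster if events i..i+5 span < 6 hours.
n <- nrow(events)
cluster <- integer(n)  # 0 = not part of any cluster
id <- 0
i <- 1
while (i <= n - 5) {
  if (difftime(events$ts[i + 5], events$ts[i], units = "hours") < 6) {
    id <- id + 1
    j <- i + 5
    # Extend the cluster while further events still fall in the 6-hour span.
    while (j < n && difftime(events$ts[j + 1], events$ts[i], units = "hours") < 6) {
      j <- j + 1
    }
    cluster[i:j] <- id
    i <- j + 1
  } else {
    i <- i + 1
  }
}
events$ClusterID <- cluster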

graphite summarize function not working as expected

I am feeding data into a metric, let's say it is "local.junk". What I send is just that metric, a 1 for the value, and the timestamp:
local.junk 1 1394724217
where the timestamp changes, of course. I want to graph the total number of these instances over a period of time, so I used
summarize(local.junk, "1min")
Then I went and made some data entries. I expected to see the number of requests received in each minute, but it always just shows the line at 1. If I summarize over a longer period like 5 minutes, it shows me some seemingly random number: I tried 10 requests and the graph shows something like 4 or 5. Am I loading the data wrong, or using the summarize function wrong?
The summarize() method just sums up your data values, so correlate and verify that you are indeed sending the correct values.
Also, to isolate whether the function or the data has issues, you can run it on metricsReceived:
summarize(carbon.agents.ip-10-0-0-1-a.metricsReceived,"1hour")
Which version of Graphite are you running?
You may want to check your carbon aggregator settings. By default, carbon aggregates data every 10 seconds. Without an entry in aggregation-rules.conf, Graphite only saves the last metric it receives within the 10-second window.
You are seeing the problem above because of that behaviour. You need to add an entry for your metric in aggregation-rules.conf with the sum method, like this:
local.junk (10) = sum local.junk
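For reference, the general form of such a rule is output_template (frequency) = method input_pattern, so a hedged example that sums a whole subtree (names hypothetical) would look like:

<env>.applications.<app>.all.requests (60) = sum <env>.applications.<app>.*.requests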
