I can't find the correct mathematical formula to compute an SLA (availability) with Grafana.
I have a graph that shows the duration of downtime for each day:
From this, I would like to compute the SLA (e.g. 99.5%).
On the graph for the selected period (Last 7 days) I can see this data:
71258 is the total duration of downtime in seconds. I get this with summarize(1day, max, false).
I also need the total duration of the selected period (here 7 days = 604800 seconds). But how?
Once I have that value, I will compute:
(100 * 71258) / 604800 = X %
100 % - X % = my SLA!
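With the numbers above (71258 s of downtime over a 604800 s window), that works out to roughly:
(100 * 71258) / 604800 ≈ 11.78 %
100 % - 11.78 % ≈ 88.22 % SLA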
My question is: which formula can I use to get the total duration of the selected period in Grafana?
One of the databases you can run behind Grafana is Axibase Time Series Database (ATSD). It provides built-in aggregation functions that can perform SLA-type calculations, for example computing the percentage of the period during which the value exceeded a threshold:
THRESHOLD_COUNT - number of violations in the period
THRESHOLD_DURATION - cumulative duration of the violations
THRESHOLD_PERCENT - duration divided by period
In your example, that would be THRESHOLD_PERCENT.
Here's a sample SLA report for an Amazon Web Services instance: https://apps.axibase.com/chartlab/0aa34311/6/. THRESHOLD_PERCENT is visualized on the top chart.
The API request looks as follows:
{
  "queries": [{
    "startDate": "2016-02-22T00:00:00Z",
    "endDate": "2016-02-23T00:00:00Z",
    "timeFormat": "iso",
    "entity": "nurswgvml007",
    "metric": "app.response_time",
    "aggregate": {
      "types": [
        "THRESHOLD_COUNT",
        "THRESHOLD_DURATION",
        "THRESHOLD_PERCENT"
      ],
      "period": {
        "count": 1,
        "unit": "HOUR"
      },
      "threshold": {
        "max": 200
      }
    }
  }]
}
ATSD driver: https://github.com/grafana/grafana-plugins/tree/master/datasources/atsd
Disclosure: I work for Axibase.
I have set up a log-based alert in Microsoft Azure. The alerts are deployed via an ARM template, where you can input your query and set a threshold like below:
"triggerThresholdOperator": {
"value": "GreaterThan"
},
"triggerThreshold": {
"value": 0
},
"frequencyInMinutes": {
"value":15
},
"timeWindowInMinutes": {
"value": 15
},
"severityLevel": {
"value": "0"
},
"appInsightsQuery": {
"value": "exceptions\r\n| where A_ != '2000' \r\n| where A_ != '4000' \r\n| where A_ != '3000' "
}
As far as I understand, we can only set a threshold once on an entire query.
Question: I have multiple values in my query which I am excluding since they are just noise. But now I want to set a threshold of 5 on value 3000 and also a time window of 30 minutes in the same query, meaning exclude 3000 unless it has occurred 5 times in the last 30 minutes (at the time the query runs).
exceptions
| where A_ != '2000'
| where A_ != '4000'
| where A_ != '3000'
I am pretty sure that I can't set a threshold like this in the query, and the only workaround is to create a new alert just for value 3000 and set its threshold in the ARM template. I haven't found any such threshold/time filters in Azure. Is there any way I can set multiple thresholds and time filters in a single query, which are then checked against different thresholds and time filters in the ARM template?
Thanks.
I don't fully understand your question.
But for your time-window question, you could do something like:
exceptions
| summarize count() by A_, bin(TimeGenerated, 30m)
That way you will get a count of A_ in blocks of 30 minutes.
Another way would be to do:
let Materialized = materialize(
exceptions
| summarize Count=count(A_) by bin(TimeGenerated, 30m)
);
Materialized | where Count == 10
But then again, it all depends on what you would like to achieve.
You can easily set that in the query and fire based on the aggregate result.
exceptions
| where timestamp > ago(30m)
| summarize count2000 = countif(A_ == '2000'), count3000 = countif(A_ == '3000'), count4000 = countif(A_ == '4000')
| where count2000 > 5 or count3000 > 3 or count4000 > 4
If the number of results is greater than zero, then the aggregate condition applies.
I am using Vegeta to run some stress tests, but I am having trouble generating a JSON report. Running the following command I am able to see the text results:
vegeta attack -targets="./vegeta_sagemaker_True.txt" -rate=10 -duration=2s | vegeta report -output="attack.json" -type=text
Requests [total, rate] 20, 10.52
Duration [total, attack, wait] 2.403464884s, 1.901136s, 502.328884ms
Latencies [mean, 50, 95, 99, max] 945.385864ms, 984.768025ms, 1.368113304s, 1.424427549s, 1.424427549s
Bytes In [total, mean] 5919, 295.95
Bytes Out [total, mean] 7104, 355.20
Success [ratio] 95.00%
Status Codes [code:count] 200:19 400:1
Error Set:
400
When I run the same command changing -type=text to -type=json, I get really weird numbers and they don't make sense to me:
{
  "latencies": {
    "total": 19853536952,
    "mean": 992676847,
    "50th": 972074984,
    "95th": 1438787021,
    "99th": 1636579198,
    "max": 1636579198
  },
  "bytes_in": {
    "total": 5919,
    "mean": 295.95
  },
  "bytes_out": {
    "total": 7104,
    "mean": 355.2
  },
  "earliest": "2019-04-24T14:32:23.099072+02:00",
  "latest": "2019-04-24T14:32:25.00025+02:00",
  "end": "2019-04-24T14:32:25.761337546+02:00",
  "duration": 1901178000,
  "wait": 761087546,
  "requests": 20,
  "rate": 10.519793517492838,
  "success": 0.95,
  "status_codes": {
    "200": 19,
    "400": 1
  },
  "errors": [
    "400 "
  ]
}
Does anyone know why this is happening?
Thanks!
These numbers are nanoseconds -- the internal representation of time.Duration in Go.
For example, the latencies.mean in the JSON is 992676847, which means 992676847 nanoseconds, that is 992676847/1000/1000 = 992.676847ms.
Actually, in vegeta, if you declare type as text (-type=text), it will use NewTextReporter, and print the time.Duration as a user-friendly string. If you declare type as json (-type=json), it will use NewJSONReporter and return time.Duration's internal representation:
A Duration represents the elapsed time between two instants as an int64 nanosecond count. The representation limits the largest representable duration to approximately 290 years.
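For a quick sanity check, here is a minimal Go sketch (using the latencies.mean value from the JSON report above) that prints the raw nanosecond count back as a human-readable duration:
package main

import (
	"fmt"
	"time"
)

func main() {
	// latencies.mean from the JSON report, interpreted as nanoseconds
	mean := time.Duration(992676847)
	fmt.Println(mean) // prints "992.676847ms"
}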
I have created a table with a collection, inserted a record, and took an sstabledump of it, and I can see there is a range tombstone for it in the sstable. Does this tombstone ever get removed? Also, when I run sstablemetadata on the only sstable, it shows "Estimated droppable tombstones" as 0.5, and similarly it shows one record with the insert time as the epoch under "Estimated tombstone drop times: 1548384720: 1". Does this mean that when I run sstablemetadata on a table with collections, the estimated droppable tombstone ratio and drop times are not true, dependable values because of the collection/list range tombstones?
CREATE TABLE ks.nmtest (
reservation_id text,
order_id text,
c1 int,
order_details map<text, text>,
PRIMARY KEY (reservation_id, order_id)
) WITH CLUSTERING ORDER BY (order_id ASC)
user@cqlsh:ks> insert into nmtest (reservation_id , order_id , c1, order_details ) values('3','3',3,{'key':'value'});
user@cqlsh:ks> select * from nmtest ;
reservation_id | order_id | c1 | order_details
----------------+----------+----+------------------
3 | 3 | 3 | {'key': 'value'}
(1 rows)
[root@localhost nmtest-e1302500201d11e983bb693c02c04c62]# sstabledump mc-5-big-Data.db
WARN 02:52:19,596 memtable_cleanup_threshold has been deprecated and should be removed from cassandra.yaml
[
  {
    "partition" : {
      "key" : [ "3" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 41,
        "clustering" : [ "3" ],
        "liveness_info" : { "tstamp" : "2019-01-25T02:51:13.574409Z" },
        "cells" : [
          { "name" : "c1", "value" : 3 },
          { "name" : "order_details", "deletion_info" : { "marked_deleted" : "2019-01-25T02:51:13.574408Z", "local_delete_time" : "2019-01-25T02:51:13Z" } },
          { "name" : "order_details", "path" : [ "key" ], "value" : "value" }
        ]
      }
    ]
  }
SSTable: /data/data/ks/nmtest-e1302500201d11e983bb693c02c04c62/mc-5-big
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Bloom Filter FP chance: 0.010000
Minimum timestamp: 1548384673574408
Maximum timestamp: 1548384673574409
SSTable min local deletion time: 1548384673
SSTable max local deletion time: 2147483647
Compressor: org.apache.cassandra.io.compress.LZ4Compressor
Compression ratio: 1.0714285714285714
TTL min: 0
TTL max: 0
First token: -155496620801056360 (key=3)
Last token: -155496620801056360 (key=3)
minClustringValues: [3]
maxClustringValues: [3]
Estimated droppable tombstones: 0.5
SSTable Level: 0
Repaired at: 0
Replay positions covered: {CommitLogPosition(segmentId=1548382769966, position=6243201)=CommitLogPosition(segmentId=1548382769966, position=6433666)}
totalColumnsSet: 2
totalRows: 1
Estimated tombstone drop times:
1548384720: 1
Another question is on the nodetool tablestats output: what does "slice" refer to in Cassandra?
Average live cells per slice (last five minutes): 1.0
Maximum live cells per slice (last five minutes): 1
Average tombstones per slice (last five minutes): 1.0
Maximum tombstones per slice (last five minutes): 1
Dropped Mutations: 0
sstablemetadata does not have any information about your table that is not held within the sstable itself, since it is not guaranteed to be run on a system that has Cassandra running, and even if it were, it would be quite complex to pull the schema information from it.
Since gc_grace_seconds is a table parameter and not stored in the metadata, the tool defaults to assuming a gc grace of 0, so by default the droppable times listed in that histogram are really a histogram of the tombstone creation times. If you know your gc grace, you can pass it with the -g parameter to your sstablemetadata call, like:
sstablemetadata -g 864000 mc-5-big-Data.db
See http://cassandra.apache.org/doc/latest/tools/sstable/sstablemetadata.html for information on the tool's output.
With collections it's just a normal range tombstone, with all that it entails. They are used to avoid the need for a read-before-write when overwriting the value of a multi-cell collection.
How do I make a synced-cron job execute every 2 hours during the day? I'm using the percolate:synced-cron package in Meteor.
Here is my server code:
SyncedCron.add({
  name: 'Cron job will start on',
  schedule: function(parser) {
    // parser is a later.parse object
    return parser.text('every 2 hour');
  },
  job: function() {
    FetchDailyBasisData();
  }
});
Meteor.startup(function () {
  SyncedCron.start();
});
It's supposed to work, but no luck.
It should be return parser.text('every 2 hours');
Notice the s in hours.
In the docs, the following periods are defined:
Periods
s, sec, seconds, m, min, minutes, h, hours, day, day of the month, day instance, day of the week, day of the year, week, week of the year, month, year
I wanted to start playing with it and test it, so I put in this config:
storage-schemas.conf:
[short2]
pattern = ^short2\.
retentions = 10s:1m
storage-aggregation.conf
[sum]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum
What I think my config says:
collect data every 10 seconds and keep it for 1 minute, so a total of 6 points will be saved
Now if I go to
http://localhost/render/?target=short2.sum&format=json&from=-1h
I see a lot of datapoints with null values, many more than expected.
OK, so I gave up on that. Then I said let's try to feed it data once every 10 seconds. If I do
echo "short2.sum 22 `date +%s`" | nc -q0 127.0.0.1 2003
wait 11 seconds
echo "short2.sum 23 `date +%s`" | nc -q0 127.0.0.1 2003
Now looking at the API I can see that only the last point gets registered, like:
[
null,
1464781920
],
[
null,
1464781980
],
[
null,
1464782040
],
[
null,
1464782100
],
[
23,
1464782160
],
Now if I send it another point (well after 10 seconds):
echo "short2.sum 24 `date +%s`" | nc -q0 127.0.0.1 2003
This is what I get:
[
null,
1464781920
],
[
null,
1464781980
],
[
null,
1464782040
],
[
null,
1464782100
],
[
24,
1464782160
],
Only once in a couple of tries will I see them counted as new; otherwise they just overwrite each other instead of acting like new data.
Actually:
[short2]
pattern = ^short2\.
retentions = 10s:1m
means: all metrics starting with short2. are kept for 1 minute at 10-second resolution (each datapoint represents 10s). It also means that if no other storage schemas are defined for short2., it will only have values for the last 1 minute.
http://graphite.readthedocs.io/en/latest/config-carbon.html#storage-schemas-conf
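If you want data to survive beyond that single minute, you can chain additional, coarser archives in the same retention definition. As an illustration (these retention values are an example, not your original config):
[short2]
pattern = ^short2\.
retentions = 10s:1m,1m:1h,10m:1d
How the finer archive is rolled up into the coarser ones is then controlled by the matching rule in storage-aggregation.conf (aggregationMethod and xFilesFactor).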