I have set up alerts for disk usage and now I want to set up alerts for memory usage, but I am unable to: it gives the error FAILED TO TEST RULE. Axis B is Used and axis C is Total.
In terms of defining an alert, I think you are doing it correctly.
Looking at Grafana's source code here, the error message "Failed to test rule" seems to be a nasty one, as it returns an HTTP 500 code, which means "internal server error".
In other words, this is possibly a Grafana server bug. It is probably worth raising it with the Grafana team here, with the steps to reproduce.
I just checked my data source: it was $datasource, and since I am using InfluxDB, I changed it to influxDb.
I recommend changing the avg() in your Conditions. Set it to last(), so that only the most recent value is evaluated for your alert.
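For illustration, a classic Grafana alert condition using last() looks roughly like this; the 5m window and the threshold of 80 are placeholders, not values taken from your dashboard:

WHEN last() OF query(B, 5m, now) IS ABOVE 80

This evaluates only the latest point of the B query instead of averaging over the whole window.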
I'm new to EFK. I had a problem with logs showing up in Kibana. I have already resolved it, but I'm not sure about my approach.
Problem: when Elasticsearch is restarted, Kibana only shows logs after about 10 minutes.
I studied https://docs.fluentd.org/configuration/buffer-section
and found out that if I set the timekey_wait parameter in the buffer section to 0s, Kibana shows the logs without delay.
The problem is resolved, but I still have some concerns about the timekey_wait parameter:
1. Are there other things impacted by this change?
2. Why is timekey_wait needed? Please give me an example of why it is necessary.
Thank you for your time!
1 & 2) According to the documentation, with the timekey_wait parameter Fluentd waits the specified amount of time before writing chunks. This way, delayed log lines that need to go into the same chunk are not missed.
If your timekey is 60m and timekey_wait is 10m, the chunks will be written after 70m, not 60m.
If you don't have delayed log lines coming in, it becomes less important. In one of my implementations I use the flush_interval parameter instead; that way timekey is not needed (buffer chunks are flushed after this interval).
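As a reference, here is a minimal sketch of the two options discussed above, assuming a file buffer in front of an Elasticsearch output; the match pattern, path and values are placeholders:

<match app.**>
  @type elasticsearch
  # host, port and index settings omitted
  <buffer time>
    @type file
    path /var/log/fluentd/buffer
    timekey 60m        # group records into hourly chunks
    timekey_wait 10m   # wait 10 extra minutes for late records; 0s flushes as soon as the window closes
  </buffer>
</match>

With the flush_interval approach, the buffer section would instead drop the time chunk key and contain something like flush_interval 10s.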
I'm an IBM BPM developer and I have the following issue:
On my BPD I have an intermediate event that is triggered by a UCA called from a Human Service. Everything is working fine; the only problem is that I always get a warning like this in SystemOut.log:
CWLLG0297W: The intermediate event with ID BpdEvent.51155527-fdce-45de-b2be-5da9fb67ab7a can never receive a message from UCA UCA.5e12e401-0968-49f4-8c63-fb7110fdbfb6 because it is correlated on an invalid output parameter.
I tried correlating on tw.system.process.instanceId and tw.system.currentProcessInstance.id; both work, but both raise the warning. From my research this is a common issue in BPM 7.5 and 8, but I'm using 8.5.6, and in 8.5.5 the behaviour is the same.
Can anyone help me?
Thanks in advance!
No, IME stands for "Intermediate Message Event", which is the widget on the BPD that listens for the UCA completion and uses it to correlate. A UCA can't really have a correlation problem, since it doesn't do the correlation itself; a correlation problem is a problem with the listener, a.k.a. the IME. All I'm suggesting is that you pick one of the listeners on the BPD, delete it and recreate it, then test to see if it is now correlating properly. If so, you fell victim to a copy/paste bug.
from https://www.ibm.com/developerworks/community/forums/html/topic?id=1fd6422f-c361-413f-a5f9-7280557e269d
Is the description component of an HTTP status code used? For example, in the HTTP response '200 OK', is the OK (i.e. the description) ever used, or is it just for humans to read?
No, the "reason phrase" is purely there for humans to read. Nothing should be using it programmatically - especially because in HTTP/2, it's been eliminated:
HTTP/2 does not define a way to carry the version or reason phrase that is included in an HTTP/1.1 status line.
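To illustrate, here is roughly how the same response starts on the wire in each protocol version; the reason phrase text in HTTP/1.1 is arbitrary and clients are expected to ignore it:

HTTP/1.1 404 Not Found          <- status code plus reason phrase
HTTP/1.1 404 Whatever You Like  <- equally valid; clients should key off the 404 alone

:status: 404                    <- HTTP/2 carries only the code, as a pseudo-header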
The status code is often used. For example, if you are using AJAX, you will most likely check the HTTP status code before using the data returned.
However, the description is just for humans, as computers recognize the status code and what it entails without the description.
The description is a standard message associated with the code itself, giving a quick summary of the code. It is used for diagnostics, along with its full description, which can be found in the HTTP status code list. So it is not "used" in the sense I think you're implying, but only for diagnostics by humans.
The data sections of messages Error, Forward and redirection responses may be used to contain human-readable diagnostic information.
My client-side sensu metric is reporting a WARN and the data is not getting to my OpenTSDB.
It seems to be stuck, but I don't understand what the message is telling me. Can someone translate?
The command is a ruby script.
In /var/log/sensu/sensu-client.log:
{"timestamp":"2014-09-11T16:06:51.928219-0400",
 "level":"warn",
 "message":"previous check command execution in progress",
 "check":{"handler":"metric_store","type":"metric",
  "standalone":true,"command":"...",
  "output_type":"json","auto_tag_host":"yes",
  "interval":60,"description":"description here",
  "subscribers":["system"],
  "name":"foo_metric","issued":1410466011,"executed":1410465882
 }
}
My questions:
What does this message mean?
What causes this?
Does it really mean we are waiting for the same check to run? If so, how do we clear it?
This error means that Sensu is (or thinks it is) currently executing this check:
https://github.com/sensu/sensu/blob/4c36d2684f2e89a9ce811ca53de10cc2eb98f82b/lib/sensu/client.rb#L115
This can be caused by stacked checks that take longer than their interval (60 seconds in this case) to run.
You can try to set the "timeout" option in the check definition:
https://github.com/sensu/sensu/blob/4c36d2684f2e89a9ce811ca53de10cc2eb98f82b/lib/sensu/client.rb#L101
That should make Sensu time out on that check after a while. You could also add internal logic to your check so that it does not hang.
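As a sketch, a standalone check definition with a timeout might look like the following; the command path and the 30-second value are placeholders, and the other attributes are taken from the log excerpt above:

{
  "checks": {
    "foo_metric": {
      "type": "metric",
      "standalone": true,
      "command": "/etc/sensu/plugins/foo_metric.rb",
      "interval": 60,
      "timeout": 30,
      "handler": "metric_store",
      "subscribers": ["system"]
    }
  }
}

Keeping the timeout below the interval gives each run a chance to be stopped before the next one is scheduled.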
In my case, I had accidentally configured two sensu-client instances to have the same name. I think that caused one of them to always think its checks were already running when in reality they were not. Giving them unique names solved the problem for me.
The operation could not be performed because the filter is in the wrong state
I am getting this error when attempting to run hr = m_pGrabber->GetCurrentBuffer(&cbBuffer, NULL);.
The strange part is that it initially worked when I stopped the graph; now it fails whether the graph is running or stopped.
So, what state should it be in?
The sample grabber code in MSDN that I copied does not say whether the graph should be stopped or running to get the buffer size, but the way it is presented, the graph is running. I assume the graph should be running to fill the buffer, but I am not getting past sizing the buffer.
The graph is OK: all filters are connected and it renders as required, in my app and in GraphEdit.
I am trying to save the captured still frame into a bitmap file, so I need the captured data in the buffer.
Buffering and GetCurrentBuffer expose a copy of the last known media sample. Hence, you might hit the conditions "no media sample available yet to copy from" and "the last known media sample was released due to the transition to the stopped state". In both cases the request in question might fail. Copy the data from SampleCB instead of using buffered mode, and it will be completely reliable.
See also: ISampleGrabber::GetCurrentBuffer() returning VFW_E_WRONG_STATE
Using GetCurrentBuffer is a bad idea in most cases. The proper way to use the sample grabber is to set a callback and receive the data in SampleCB.
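A minimal sketch of that callback approach, assuming the usual ISampleGrabberCB declarations from qedit.h; error handling is trimmed and the bitmap-writing part is left as a comment:

class GrabberCB : public ISampleGrabberCB
{
public:
    // Fake COM reference counting: the object is expected to outlive the graph.
    STDMETHODIMP_(ULONG) AddRef()  { return 2; }
    STDMETHODIMP_(ULONG) Release() { return 1; }

    STDMETHODIMP QueryInterface(REFIID riid, void** ppv)
    {
        if (riid == IID_ISampleGrabberCB || riid == IID_IUnknown) {
            *ppv = static_cast<ISampleGrabberCB*>(this);
            return S_OK;
        }
        *ppv = NULL;
        return E_NOINTERFACE;
    }

    // Called on the streaming thread for every sample that passes through the grabber.
    STDMETHODIMP SampleCB(double SampleTime, IMediaSample* pSample)
    {
        BYTE* pData = NULL;
        if (SUCCEEDED(pSample->GetPointer(&pData))) {
            long cb = pSample->GetActualDataLength();
            // Copy pData[0..cb) into your own storage here and write the bitmap
            // from another thread, so the streaming thread is not blocked.
        }
        return S_OK;
    }

    // Not used when SetCallback(..., 0) selects SampleCB.
    STDMETHODIMP BufferCB(double, BYTE*, long) { return S_OK; }
};

// Before running the graph:
//   static GrabberCB cb;
//   m_pGrabber->SetCallback(&cb, 0);   // 0 = SampleCB, 1 = BufferCB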