sensu: "previous check command execution in progress" - opentsdb

My client-side sensu metric is reporting a WARN and the data is not getting to my OpenTSDB.
It seems to be stuck, but I don't understand what the message is telling me. Can someone translate?
The command is a ruby script.
In /var/log/sensu/sensu-client.log:
{"timestamp":"2014-09-11T16:06:51.928219-0400",
"level":"warn",
"message":"previous check command execution in progress",
"check":{"handler":"metric_store","type":"metric",
"standalone":true,"command":"...",
"output_type":"json","auto_tag_host":"yes",
"interval":60,"description":"description here",
"subscribers"["system"],
"name":"foo_metric","issued":1410466011,"executed":1410465882
}
}
My questions:
What does this message mean?
What causes this?
Does it really mean we are waiting for the same check to run? If so, how do we clear it?

This error means that Sensu is (or thinks it is) currently executing this check:
https://github.com/sensu/sensu/blob/4c36d2684f2e89a9ce811ca53de10cc2eb98f82b/lib/sensu/client.rb#L115
This can be caused by stacked checks that take longer than their interval (60 seconds in this case) to run.
You can try setting the "timeout" option in the check definition so that Sensu times the check out after a while:
https://github.com/sensu/sensu/blob/4c36d2684f2e89a9ce811ca53de10cc2eb98f82b/lib/sensu/client.rb#L101
You could also add internal logic to your check so that it cannot hang.
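For example, a sketch of such a check definition (attributes reused from the log above; the 30-second timeout is just illustrative, and the elided command is left as-is):
{
  "checks": {
    "foo_metric": {
      "type": "metric",
      "standalone": true,
      "command": "...",
      "handler": "metric_store",
      "subscribers": ["system"],
      "interval": 60,
      "timeout": 30
    }
  }
}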

In my case, I had accidentally configured two sensu-client instances to have the same name. I think that caused one of them to always think its checks were already running when in reality they were not. Giving them unique names solved the problem for me.
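If that turns out to be the cause, the fix is just making sure each client configuration has its own name. A sketch of the relevant part of a Sensu client definition (name and address are illustrative); the important bit is that "name" differs per host:
{
  "client": {
    "name": "unique-name-per-host",
    "address": "10.0.0.1",
    "subscriptions": ["system"]
  }
}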

Related

Asynchronous requests are not always executed

I'm currently working on some asynchronous calls.
I'm experiencing an issue where a call will sometimes just do nothing, without throwing an error.
It happens maybe once or twice in every 20-30 calls.
This is what I currently have:
DEFINE VARIABLE hAppServer AS HANDLE NO-UNDO.
hAppServer = getServersHandle(AppSrvConnectionEnum:apsvWorkFlow).
RUN ServiceInterface/StartAsync.p ON SERVER hAppServer ASYNCHRONOUS EVENT-PROCEDURE "ProcedureComplete" IN hCallBack (INPUT ipiWorkflowId).
I'm running it on Progress Version 11.6.4. I have also put messages in the ServiceInterface/StartAsync.p procedure, and when a call does not get through, no messages are written on the AppServer side, of course.
Does anyone have an idea?
Off the top of my head, you might have a dead reference to the server. Just for debugging, try messaging VALID-HANDLE(hAppServer) and hAppServer itself, since I suspect that at some point it returns an invalid reference (though I imagine the enum does not change during runtime). Nothing looks wrong in the code you posted, at first glance.
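A small debugging sketch along those lines, reusing the handle and the getServersHandle/AppSrvConnectionEnum helpers from the question (purely diagnostic, to run right before the async call):
/* Check the server handle just before RUN ... ASYNCHRONOUS. */
DEFINE VARIABLE hAppServer AS HANDLE NO-UNDO.

hAppServer = getServersHandle(AppSrvConnectionEnum:apsvWorkFlow).

MESSAGE "VALID-HANDLE:" VALID-HANDLE(hAppServer) SKIP
        "Handle:" hAppServer SKIP
        "CONNECTED:" (IF VALID-HANDLE(hAppServer) THEN hAppServer:CONNECTED() ELSE FALSE)
    VIEW-AS ALERT-BOX INFORMATION.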

In the debugger, can we make `q` choose a given restart?

I am trying out a program. On error, I get into the debugger with several custom restarts. The first one retries the operation (and thus does nothing); the fourth one is the one that quits correctly. Pressing q leads to a memory error.
How can the developer make sure programmatically that when the user presses q, the right restart is called, and not the one bound to q that leads to a memory error? Is that possible?
That may be too specific to the library I'm trying, or totally the wrong approach.
I only found that q is sldb-quit and that it "invoke[s] a restart which restores to a known program state". q doesn't call the first restart. What does it do? Is it possible to make it call a given restart?
Thanks.
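For illustration only (not specific to SLIME or to the library in question): in plain Common Lisp, a restart established with RESTART-CASE can be invoked by name with INVOKE-RESTART, which is the kind of programmatic selection the question is about.
;; Minimal standard-CL sketch: invoke a particular named restart from a handler,
;; independently of whatever the debugger's q key happens to be bound to.
(handler-bind ((error (lambda (c)
                        (declare (ignore c))
                        (invoke-restart 'clean-quit))))
  (restart-case (error "something went wrong")
    (retry ()      :report "Retry the operation." (format t "retrying~%"))
    (clean-quit () :report "Quit correctly."      (format t "quitting cleanly~%"))))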

I am not able to set an alert for a Grafana graph

I have set alerts for disk usage, and now I want to set alerts for memory usage, but I am not able to: it gives the error FAILED TO TEST RULE. Axes B is Used and axes C is Total.
In terms of defining an alert, I think you are doing it correctly.
Looking at Grafana's source code here, the error message "Failed to test rule" seems to be a nasty one, as it comes back with an HTTP 500 code, which means "internal server error".
In other words, this is possibly a Grafana server bug. You should probably raise it with the Grafana team here, with the steps to reproduce.
I just checked my Data Source: it was $datasource, and since I am using InfluxDB I changed it to influxDb.
I recommend changing the avg() in your Conditions to last(), so that only the last value is taken for your alert.
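For reference, this is roughly how such a condition ends up in the panel JSON of a classic dashboard alert (query letter, time range and threshold below are placeholders); the relevant part is the reducer type being last instead of avg:
"alert": {
  "conditions": [
    {
      "type": "query",
      "query": { "params": ["B", "5m", "now"] },
      "reducer": { "type": "last", "params": [] },
      "evaluator": { "type": "gt", "params": [80] },
      "operator": { "type": "and" }
    }
  ]
}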

Status of accessing the current offset of a consumer?

I see that there was some discussion on this subject before[*], but I cannot see any way how to access this. Is it possible yet?
Reasoning: we have, for example, ContainerStoppingErrorHandler. It would be vital, if the container was automatically stopped, to know where it stopped. I guess ContainerStoppingErrorHandler could be improved to log partition/offset, as it has access to the Consumer (I hope I'm not mistaken). But if I'm stopping/pausing it manually via MessageListenerContainer, I cannot see a way to get info about the last processed offset. Is there a way?
[*] https://github.com/spring-projects/spring-kafka/issues/431
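One workaround sketch (not a built-in spring-kafka feature; topic and listener id below are placeholders): track the last processed offset per partition yourself inside the listener, so the information is still available after you stop or pause the container.
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.TopicPartition;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class OffsetTrackingListener {

    // Last processed offset per partition, readable even after the container stops.
    private final Map<TopicPartition, Long> lastProcessed = new ConcurrentHashMap<>();

    @KafkaListener(id = "my-listener", topics = "my-topic")
    public void listen(ConsumerRecord<String, String> record) {
        // ... normal processing of the record goes here ...
        lastProcessed.put(new TopicPartition(record.topic(), record.partition()),
                record.offset());
    }

    public Map<TopicPartition, Long> lastProcessedOffsets() {
        return Collections.unmodifiableMap(lastProcessed);
    }
}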

Kill turtle which causes runtime error

I'm curious as to whether there is a way of reporting which turtle causes a runtime error.
I have a model which includes many agents and will run fine for hours; however, sometimes a runtime error will occur. I have tried a few different things to fix it, but an error always seems to occur eventually, and I can't spare the time to track it down due to deadlines.
As the occurrence is so rare, the easiest solution is to just write ask turtle X [die] in the command center, after which I click GO and the problem is 'fixed'.
However, I was wondering if anyone knows of a way to automatically kill the turtle producing the error every time a runtime error occurs, to save me entering this manually.
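One possible sketch, assuming the error happens inside a turtle procedure (do-my-stuff is a placeholder for your own code): NetLogo's carefully primitive can trap the runtime error in the offending turtle's own context, so that turtle can remove itself instead of halting the run.
to go
  ask turtles [
    carefully [
      do-my-stuff            ; placeholder for the code that occasionally errors
    ] [
      show (word "runtime error, removing myself: " error-message)
      die
    ]
  ]
  tick
end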
