How to force trigger aodh alarm action immediately? - openstack

I have an OpenStack Aodh alarm that starts its action when memory usage is greater than 85% for one minute. Now I would like to trigger the action immediately and manually, i.e. force the alarm action to start even though the condition hasn't reached the limit, but how?
According to the docs, I've tried setting the state of the Aodh alarm to alarm, but it didn't work: the evaluator re-evaluated the memory usage, did nothing (since it is less than 85%), and then set the state back to ok again.
Is there any way to force-trigger an Aodh alarm action? I would appreciate any help.
Here are the parts of my Aodh alarm:
aggregation_method: mean
alarm_actions: [u'trust+http://192.168.0.100:8004/v1/284e047522bd4adfa3aa5109f1c7513b/stacks/corey_test/d9915fd3-5086-4d38-971b-2694c41e8099/resources/rdgw_scaleup_policy/signal']
alarm_id: e6402673-9a8e-4745-a8df-699edd6ab57a
comparison_operator: gt
enabled: True
evaluation_periods: 1
granularity: 60
metric: memory_util
ok_actions: []
repeat_actions: True
resource_type: instance
severity: low
state: ok
state_reason: Transition to ok due to 1 samples inside threshold, most recent: 11.0
threshold: 85.0
type: gnocchi_aggregation_by_resources_threshold
Update 2020/11/04
The only thing that comes to mind is to reduce the threshold and evaluation_periods temporarily (e.g. threshold: 1, evaluation_periods: 1), which forces the alarm to start scaling; after the new instance is created, restore the original threshold and evaluation_periods values. It works, but I don't think it is the best method.

The alarm actions are, AFAIU, just HTTP POSTs to the URLs listed in alarm_actions, so you can perform the POST yourself (provided you have access to that URL).
In your particular case it is clearly a Heat stack scaling action. You should be able to make an HTTP POST to the corresponding URL: replace the trust+http://<host>:<port> part with the public Heat endpoint (see openstack catalog show orchestration) and add a valid Keystone token to the request header.
Alternatively, for Heat stack scaling you can use the openstack stack resource signal command (which makes effectively the same REST call, but helps you with auth and endpoint discovery). The stack ID and the resource name are visible in the URL, so in your case it would be: openstack stack resource signal d9915fd3-5086-4d38-971b-2694c41e8099 rdgw_scaleup_policy
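To make the manual POST concrete, here is a small sketch that rebuilds the Heat-native signal URL from the trust+http alarm action. The endpoint host heat.example:8004 is a placeholder for whatever openstack catalog show orchestration returns in your cloud, and the token in the comment is assumed to be obtained separately:

```python
from urllib.parse import urlparse

def native_signal_url(alarm_action, heat_public_endpoint):
    """Swap the trust+http host for the public Heat endpoint's host."""
    action = urlparse(alarm_action.replace("trust+", "", 1))
    public = urlparse(heat_public_endpoint)
    # Keep the original path (tenant/stack/resource), change scheme and host.
    return f"{public.scheme}://{public.netloc}{action.path}"

url = native_signal_url(
    "trust+http://192.168.0.100:8004/v1/284e047522bd4adfa3aa5109f1c7513b"
    "/stacks/corey_test/d9915fd3-5086-4d38-971b-2694c41e8099"
    "/resources/rdgw_scaleup_policy/signal",
    "http://heat.example:8004",  # from `openstack catalog show orchestration`
)
# Then POST it with a valid Keystone token, e.g.:
#   curl -X POST -H "X-Auth-Token: $TOKEN" "$url"
print(url)
```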

Related

How to setup web hooks to send message to Slack when Firebase functions crash?

I need to actively receive crash notifications for Firebase Functions.
Is there any way to set up Slack webhooks to receive a message when Firebase Functions throw an error, crash, or something like that?
I would also love to receive issue messages by frequency, e.g.: Firebase Functions crashed 50 times a day.
Thank you so much.
First you have to create a log-based (counter) metric that counts the specific error occurrences, and second, you create an alerting policy with a Slack notification channel.
Let's start by finding the corresponding logs that appear when the function throws an error. Since I didn't have a function that would crash, I used logs indicating that it had started.
Next you have to create a log-based metric. Ignore the next screen and go to Monitoring > Alerting. Click on "Create new policy", find your metric, and set "Rolling window" to whatever time period you need. For testing I used 1 minute. Then set "Rolling window function" to "mean".
Now configure when the alert has to be triggered; I chose over 3 occurrences (within the 1-minute window).
On the next screen you select the notification channel. In the case of Slack, it has to be configured first under "Notification Channels".
You can save the policy now.
After a few minutes I had gathered enough data to generate two incidents.
And here's some alerting-related documentation that may help you understand how to use these features.

dp:url-open's timeout value is getting ignored in datapower

I am providing a timeout of one second; however, when the URL is down it takes 120+ seconds for the response to come. Is there some variable or setting that overrides the timeout in dp:url-open?
Update: I was calling dp:url-open in the request transformation as well as in the response transformation. So the overriding timeout was 60 sec on each side; adding both sides, it became 120 sec.
Here's how I am calling this (I store the time before and after the dp:url-open call and return both in the response):
Case 1: when the URL is reachable, I get a result like:
Case 2: when the URL is not reachable:
Update: FIXED: It seems the port I was using was first being timed out by the firewall, and that is where the first minute was spent. I was earlier trying to hit an application running on port 8077; after I changed that to 8088, I started seeing the same timeout that I was passing.
The dp:url-open() timeout only affects the operation done in the script, not the service itself. It depends on how you have built the solution, but the timeout from dp:url-open() should be honored.
You can check this by setting logs to debug and adding a <xsl:message>Before url-open</xsl:message> (and one after) to see in the log whether it is your url-open call or the service that waits 120+ sec.
If it is the url-open, you most likely have some error in the script; if it is the service that halts the response, you need to return from the script (or throw an error, depending on your needs) to halt the service.
You can set the timeout for the service itself, or set a timeout in the User Agent for the specific URL you are calling.
Please note that a timeout set at the service level will terminate the service after that time, so 1 sec would not be recommended there!

Retry and Failure queue prioritization in NiFi

I have a queue in NiFi that contains items to be processed through an API query (InvokeHTTP). These items can be processed and return the data correctly (status 200), they can be not found (status 404), or the call can fail (status 500).
However, in the case of statuses 404 and 500, false negatives can happen: if I query the same data that gave an error again, it may return status 200. But there are cases where there really is a failure and it is not a false negative.
So I created a retry/failure queue that feeds InvokeHTTP again to consult the API. I set an expiration time of 5 minutes so that data that is genuinely failing does not query the API forever.
However, I want to prioritize this failure/retry queue so that, as soon as an item reaches it, it is queried against the API again ahead of the standard processing queue, so as not to lose the data that produced false negatives.
Is it possible to do this with the existing self-loop relationship, or is a new flow file needed?
Each queue can have a prioritizer configured on the queue's settings. Currently you have two separate queues for InvokeHttp, the failure/retry queue and the incoming queue from the matched relationship of EvaluateJsonPath. You need to put a funnel in front of InvokeHttp and send both of these queues to the funnel, then the funnel to InvokeHttp. This way you can create a single incoming queue to InvokeHttp and configure the prioritizer there.
In order to prioritize it correctly, you may want to use the PriorityAttributePrioritizer. You would use UpdateAttribute to add a "priority" attribute to each flow file: the ones for failure/retry get priority "A" and the others get priority "B" (or anything that sorts after "A").
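This is not NiFi code, just a small sketch of the ordering that a priority-attribute prioritizer produces: queued flow files whose priority attribute sorts first are dequeued first. The attribute name and the A/B values simply mirror the scheme above:

```python
# Illustrative only: mimic prioritizing queued flow files by a "priority" attribute.
flowfiles = [
    {"uuid": "ff-1", "priority": "B"},  # normal item from EvaluateJsonPath
    {"uuid": "ff-2", "priority": "A"},  # retry/failure item
    {"uuid": "ff-3", "priority": "B"},
]

# Lexicographic sort: "A" (retry/failure) comes out ahead of "B" (normal).
ordered = sorted(flowfiles, key=lambda ff: ff["priority"])
print([ff["uuid"] for ff in ordered])  # ff-2 first
```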

How long does Firebase throttle you?

Even with debug enabled for RemoteConfig, I still managed to get the following:
Error fetching remote config values Optional(Error Domain=com.google.remoteconfig.ErrorDomain Code=8002 "(null)"
UserInfo={error_throttled_end_time_seconds=1483110267.054194})
Here is my debug code:
let debug = FIRRemoteConfigSettings(developerModeEnabled: true)
FIRRemoteConfig.remoteConfig().configSettings = debug!
Shouldn't the above prevent throttling?
How long will the throttle error remain in effect?
I've experienced the same error due to throttling. I was calling FIRRemoteConfig.remoteConfig().fetchWithExpirationDuration with an expiry that was less than 60 seconds.
To immediately get around this issue during testing, use an alternative device. The throttling occurs against a particular device. e.g. move from your simulator to a device.
The intention is not to have a single client flooding the server with fetch requests every second. Make sensible use of the caching it offers out of the box and fetch only when necessary.
When you receive this error, plug the value of error_throttled_end_time_seconds into an epoch converter (like this one at https://www.epochconverter.com) and it will tell you the time when throttling ends. I've tested this myself, and the throttling remains in effect for 1 hour from the first moment you are throttled. So either wait an hour or try some of the other recommendations given here.
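You don't even need a website for this; any language with epoch support works. A quick check of the timestamp from the error above (Python, purely illustrative):

```python
from datetime import datetime, timezone

# Value taken from error_throttled_end_time_seconds in the error above.
throttle_end = datetime.fromtimestamp(int(1483110267.054194), tz=timezone.utc)
print(throttle_end.isoformat())  # 2016-12-30T15:04:27+00:00
```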
UPDATE: Also, if you continue making config requests and receive the throttle error, the expire timeout does not increase (i.e. "you are not further penalized").
The quick and easy hack to get your app running is to delete the application and reinstall it. Firebase identifies your device as a new device after reinstalling.
Hope it helps and saves you time.

How does Heat set alarm configuration and get alarm back from Ceilometer?

I really need your help. Currently, I am working on Heat auto-scaling. I have already read some documents about auto-scaling in Heat. I know that Heat uses the Ceilometer API to set the alarm configuration and gets the alarm back from Ceilometer via a webhook. These actions are shown in the HOT template (OS::Heat::Ceilometer::Alarm). I tried to look at the Heat code but I still cannot find which modules handle the alarm actions. In particular, which module is responsible for creating the alarm URL, and which module receives and handles the alarm URL triggered from Ceilometer?
Thanks
For creating the alarm URL: see the method _get_ec2_signed_url.
For the alarm URL being triggered: it is a signal in the heat-api-cfn service. You can find more code (Liberty) in heat/api/cfn/v1/__init__.py:
mapper.connect('/signal/{arn:.*}',
               controller=signal_controller,
               action='signal',
               conditions=dict(method=['POST']))
and heat/api/cfn/v1/signal.py
def signal(self, req, arn, body=None):
    con = req.context
    identity = identifier.ResourceIdentifier.from_arn(arn)
    try:
        self.rpc_client.resource_signal(
            con,
            stack_identity=dict(identity.stack()),
            resource_name=identity.resource_name,
            details=body)
    except Exception as ex:
        return exception.map_remote_error(ex)
From there you can follow the call chain to find what you want.
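For orientation, the ARN that signal.py parses has roughly the form arn:openstack:heat::<tenant>:stacks/<stack_name>/<stack_id>/resources/<resource_name>. Below is a rough, illustrative re-implementation of what ResourceIdentifier.from_arn extracts; it is not Heat's actual code, and the sample values are made up:

```python
def parse_heat_resource_arn(arn):
    """Illustrative only; Heat's real parsing is identifier.ResourceIdentifier.from_arn."""
    # arn:openstack:heat::<tenant>:<path>
    _, _, _, _, tenant, path = arn.split(":", 5)
    parts = path.split("/")  # ['stacks', <name>, <id>, 'resources', <resource>]
    return {
        "tenant": tenant,
        "stack_name": parts[1],
        "stack_id": parts[2],
        "resource_name": parts[4],
    }

info = parse_heat_resource_arn(
    "arn:openstack:heat::mytenant:stacks/teststack/33/resources/scale_policy")
print(info["stack_id"], info["resource_name"])
```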
