Heat Autoscaling - Scaleup - Change a flavor - openstack

I am using Heat autoscaling in my environment. I can see that it works fine for scale-up (technically scale-out): adding an instance based on the load, as well as deleting an instance, works as expected.
But I need to scale a resource vertically (technically scale up) once the load limit is reached.
That is, once the load limit we specified is reached, I need to scale the CPU resource by changing the flavor of the instance.
Can anyone please let me know how we can achieve this?
Any help is appreciated.
Heat YAML:
web_server_scaleup_policy:
  type: OS::Heat::ScalingPolicy
  properties:
    adjustment_type: change_in_capacity
    auto_scaling_group_id: {get_resource: webserver}
    cooldown: 60
    scaling_adjustment: 1

With the OpenStack Heat autoscaling feature, it is only possible to scale identical resources (same flavor, image, etc.) in/out dynamically, based on Ceilometer metrics. Your requirement is really a resize of an existing instance, which has to be done manually and can't be done through Heat autoscaling.
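If it helps, here is a minimal sketch (outside of Heat) of how an out-of-band resize could be scripted with the openstacksdk. The cloud name, server name and target flavor below are placeholders, and you would still need your own logic (e.g. reacting to an alarm) to decide when to trigger it:
import openstack

# "mycloud" must exist in clouds.yaml; the server and flavor names are examples only.
conn = openstack.connect(cloud="mycloud")

server = conn.compute.find_server("web-instance-1")
flavor = conn.compute.find_flavor("m1.large")

# Ask Nova to resize (i.e. change the flavor of) the instance, wait for it to
# reach VERIFY_RESIZE, then confirm so the new flavor becomes permanent.
conn.compute.resize_server(server, flavor.id)
conn.compute.wait_for_server(server, status="VERIFY_RESIZE", wait=600)
conn.compute.confirm_server_resize(server)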

aws neptune times out for large graph drop

There are some threads on this subject already, particularly this one, but is there any recommended solution for dropping a large graph other than batching?
I tried increasing the timeout and it doesn't work.
Below is an example:
gremlin> g.V().count()
==>5230885
gremlin> g.V().drop().iterate()
{"requestId":"77c64369-45fa-462f-91d7-5712e3308497","detailedMessage":"A timeout occurred within the script during evaluation of [RequestMessage{, requestId=77c64369-45fa-462f-91d7-5712e3308497, op='eval', processor='', args={gremlin=g.V().drop().iterate(), bindings={}, batchSize=64}}] - consider increasing the timeout","code":"TimeLimitExceededException"}
Type ':help' or ':h' for help.
Display stack trace? [yN]N
gremlin> g.E().count()
==>83330550
gremlin> :remote config timeout none
==>Remote timeout is disabled
gremlin> g.E().drop().iterate()
{"requestId":"d418fa03-72ce-4154-86d8-42225e4b9eca","detailedMessage":"A timeout occurred within the script during evaluation of [RequestMessage{, requestId=d418fa03-72ce-4154-86d8-42225e4b9eca, op='eval', processor='', args={gremlin=g.E().drop().iterate(), bindings={}, batchSize=64}}] - consider increasing the timeout","code":"TimeLimitExceededException"}
Type ':help' or ':h' for help.
Display stack trace? [yN]N
You can increase the timeout of your Neptune cluster by using the parameter group option neptune_query_timeout.
If you are using version 3.3.7 of the Java client, you can specify the timeout for specific requests:
Set Timeouts at a Per-Query Level
Hopefully soon you will be able to run:
g.with("scriptEvaluationTimeout", 600).V().drop()
You currently have two options to drop an entire graph that is large. One option, of course, is to delete the current cluster and create a new one. To delete the existing graph, the best approach is to use multiple threads that drop chunks of the graph in batches. I have been working on some Python code that can do just that. It is currently on a branch at this location:
https://github.com/awslabs/amazon-neptune-tools/tree/master/drop-graph
For a graph of the size you have the tool should work fine as is. It does have some limitations currently that are documented in the code.
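For reference, here is a single-threaded sketch of the same batched-drop idea using gremlinpython (the linked tool parallelizes this and handles more cases). The endpoint is a placeholder, and the batch size should be tuned so each request finishes well inside your timeout:
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

conn = DriverRemoteConnection("wss://your-neptune-endpoint:8182/gremlin", "g")
g = traversal().withRemote(conn)

BATCH = 5000  # tune so a single batch completes well within neptune_query_timeout

# Drop edges first, then vertices, in small chunks instead of one huge traversal.
while g.E().limit(1).count().next() > 0:
    g.E().limit(BATCH).drop().iterate()
while g.V().limit(1).count().next() > 0:
    g.V().limit(BATCH).drop().iterate()

conn.close()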
UPDATED 2021-Dec-8 to add:
Since this question was asked Amazon Neptune now supports a Fast Reset API that can be used to delete all of the data in a cluster. More details here: https://docs.aws.amazon.com/neptune/latest/userguide/manage-console-fast-reset.html
If you have trouble with timeouts on LOAD or UNLOAD requests, note that the neptune_query_timeout configuration setting applies but has to be set for every DB instance that you address (not just once for the cluster). Different configuration parameter sets can be applied to the cluster and to the individual instances, so make sure the parameter set for the instance in question has the right timeout setting.
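For completeness, the Fast Reset mentioned above is a two-step call against the cluster endpoint. A rough sketch with the requests library (the endpoint is a placeholder, IAM authentication/signing is ignored here, and the exact response shape is worth verifying against the documentation page linked above):
import requests

ENDPOINT = "https://your-neptune-endpoint:8182/system"  # placeholder

# Step 1: request a one-time token for the reset.
resp = requests.post(ENDPOINT, json={"action": "initiateDatabaseReset"})
token = resp.json()["payload"]["token"]

# Step 2: confirm with the token; this irreversibly deletes all data in the cluster.
requests.post(ENDPOINT, json={"action": "performDatabaseReset", "token": token})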

GMFBridge DirectShow filter SetLiveTiming effect

I am using the excellent GMFBridge directshow family of filters to great effect, allowing me to close a video recording graph and open a new one, with no data-loss.
My original source graph was capturing live video from standard video and audio inputs.
There is an undocumented method on the GMFBridgeController filter named SetLiveTiming(). From the name, I figured that it should be set to true when capturing from a live graph (not from a file), as in my case. I set this value to true and everything worked as expected.
The same capture hardware allows me to capture live TV signals (ATSC in my case), so I created a new version of the graph using the BDA architecture filters, for tuning purposes. Once the data flows out from the MPEG demuxer, the rest of the graph is virtually the same as my original graph.
However, on this occasion my muxing graph (on the other side of the bridge) was not working. Data flowed from the BridgeSource filter (video and audio) and reached an MP4 muxer filter; however, no data was flowing from the muxer output feeding a FileWriter filter.
After several hours I traced the problem to the SetLiveTiming() setting. I turned it off and everything began working as expected: the muxer filter began producing an output file. However, the audio was not synchronized to the video.
Can someone enlighten me on the real purpose of the SetLiveTiming() setting and perhaps, why one graph works with the setting enabled, while the other fails?
UPDATE
I managed to compile the GMFBridge project, and it seems that the filter is dropping every received sample because of a negative timestamp computation. However, I am completely baffled by the results I am seeing after enabling the filter log.
UPDATE 2: The dropped samples were introduced by the way I launched the secondary (muxer) graph. I inspected a sample using a SampleGrabber (thus inside a streaming thread) as a trigger point and used a Task.Run() .NET call to instantiate the muxer graph. This somehow messed up the clocks and I ended up having a 'reference start point' in the future; when the bridge attempted to fix the timestamp by subtracting the reference start point, it produced a negative timestamp. Once I corrected this and spawned the graph from the application thread (by posting a graph event), the problem was fixed.
Unfortunately, my multiplexed video (regardless of the SetLiveTiming() setting) is still out of sync.
I read that the GMFBridge filter can have trouble when the InfTee filter is being used; however, I think my graph shouldn't have this problem, as no instance of the InfTee filter is directly connected to the bridge sink.
Here is my current source graph:
                                                                 -->[TIF]
                                                                 |
[NetworkProvider]-->[DigitalTuner]-->[DigitalCapture]-->[demux]--|-->[Mpeg Tables]
                                                                 |
                                                                 |-->[lavAudioDec]-->[tee]-->[audioConvert]-->[sampleGrabber]-->[NULL]
                                                                 |                     |
                                                                 |                     |
                                                                 |                     ->[aacEncoder]----------------
                                                                 |                                                   |--->[*Bridge Sink*]
                                                                 -->[VideoDecoder]-->[sampleGrabber]-->[x264Enc]-----
Here is my muxer graph:
                    video
... |bridge source|-------->[MP4 muxer]--->[fileWriter]
           |                     ^
           |       audio         |
           -----------------------
All the sample grabbers in the graph are read-only. If I mux the output file without bridging (by placing the muxer on the capture graph), the output file remains in sync (this ended up being not true; the out-of-sync problem was introduced by a latency setting in the H264 encoder), but then I can't avoid losing a few seconds between releasing the current capture graph and running the new one (with the updated file name).
UPDATE 3:
The out-of-sync problem was inadvertently introduced by me several days ago, when I switched off a "Zero-latency" setting in the x264vfw encoder. I hadn't noticed that this setting had desynchronized my already-working graphs too, and I was blaming the bridge filter.
In summary, I screwed things up by:
Launching the muxer graph from a thread other than the application thread (the thread processing the graph's event loop).
A latency switch in an upstream filter that was probably delaying things too much for the muxer to be able to keep up.
Author's comment:
// using this option, you can share a common clock
// and avoid any time mapping (essential if audio is in mux graph)
[id(13), helpstring("Live Timing option")]
HRESULT SetLiveTiming([in] BOOL bIsLiveTiming);
The method enables a special mode of operation which addresses live data. In this mode, sample times are converted between the graphs relative to the respective clock start times. Otherwise, the default mode is to expect time stamps to be reset to zero with graph changes.
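As a purely hypothetical sketch of that distinction (illustration only, not the filter's actual code, which is C++/COM), the mapping might be pictured like this; it also hints at how a sink graph whose clock reference lies in the future can produce the negative timestamps described in UPDATE 2:
def map_sample_time(sample_time, source_clock_start, sink_clock_start, live_timing):
    # Illustration of the two modes described above, not GMFBridge's real code.
    if live_timing:
        # SetLiveTiming(TRUE): both graphs share a common clock, so the sample
        # time is re-expressed relative to the sink graph's clock start. If the
        # sink graph's reference start lies ahead of the source's, this goes
        # negative and the bridge drops the sample.
        return sample_time - (sink_clock_start - source_clock_start)
    # Default mode: the sink graph is expected to restart its time stamps at zero.
    return sample_time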

scaling an azure website

I have a Standard website in Azure with a small instance, (1 core and 1.75 GB memory). It seems to be coping fine and handling the requests smoothly, although I am expecting a lot more within the week.
It is unclear, though, under what circumstances I should be looking to scale the instance size to the next level, i.e. to Medium (besides MemoryWorkingSet of course, which is rather obvious :)).
I.e., will moving up to a Medium instance resolve high CPU time?
What other telltale signs should I be watching for?
I am NOT comfortable scaling the number of instances to more than one at the moment until I resolve some cache issues.
I think the key point I am trying to understand is the link between the metrics provided and the means of scaling available regardless of it being scaled horizontally or vertically.
I am trying to keep the average response time as low as possible as the number of users that interact with the website increase.
Which of the other metrics will alert me when the load on the server is getting to its limits and I will need to scale vertically?
The idea behind scaling in Azure is to scale horizontally, i.e. add more instances. Azure can do this for you automatically. If you can't add more instances, Azure can't do the scaling for you automatically.
You can move to a Medium instance and overall capacity will increase, but it is impossible to say what your application will require under heavy load. I suggest you run a profiler and load tests to find the weak parts of your app and improve them before you see an actual increase in usage.

When to scale up an Azure Standard Instance Size

I have 19 websites running on Azure Standard Websites, with the instance size set to Small.
Right now I can't scale out to multiple instances (or use auto-scale) because some of these sites are legacy sites that won't play nicely across multiple instances.
The sites running now are fairly basic, but there are 3 sites that are growing fast, and I don't want to have them all bogged down because of the small instance, but I also don't want to pay for a large instance if I don't have to.
How do I know when I should scale up to a Medium or Large instance?
There doesn't seem to be any way to see CPU load in the portal, only CPU time.
Instead of scaling up, you need to scale out. You can set a CPU metric and specify the number of instances you need for that metric.

How can I configure Munin to give me a total of all my cloud servers?

I have a dozen load balanced cloud servers all monitored by Munin.
I can track each one individually just fine. But I'm wondering if I can somehow bundle them up to see just how much collective CPU usage (for example) there is among the cloud cluster as a whole.
How can I do this?
The munin.conf file makes it easy enough to handle this for subdomains, but I'm not sure how to configure it for simple web nodes. Assume my web nodes are named web_node_1 through web_node_10.
My conf looks something like this right now:
[web_node_1]
address 10.1.1.1
use_node_name yes
...
[web_node_10]
address 10.1.1.10
use_node_name yes
Your help is much appreciated.
You can achieve this with sum and stack.
I've just had to do the same thing, and I found this article pretty helpful.
Essentially you want to do something like the following:
[web_nodes;Aggregated]
update no
cpu_aggregate.update no
cpu_aggregate.graph_args --base 1000 -r --lower-limit 0 --upper-limit 200
cpu_aggregate.graph_category system
cpu_aggregate.graph_title Aggregated CPU usage
cpu_aggregate.graph_vlabel %
cpu_aggregate.graph_order system user nice idle
cpu_aggregate.graph_period second
cpu_aggregate.user.label user
cpu_aggregate.nice.label nice
cpu_aggregate.system.label system
cpu_aggregate.idle.label idle
cpu_aggregate.user.sum web_node_1:cpu.user web_node_2:cpu.user
cpu_aggregate.nice.sum web_node_1:cpu.nice web_node_2:cpu.nice
cpu_aggregate.system.sum web_node_1:cpu.system web_node_2:cpu.system
cpu_aggregate.idle.sum web_node_1:cpu.idle web_node_2:cpu.idle
There are a few other things you can tweak to give the graph the same scale, min/max, etc. as the main plugin; those can be copied from the "cpu" plugin file. The key thing here is the last four lines: that's where the summing of values from other graphs comes in.
