Need help setting up CRUSH rules in Ceph for SSD and HDD OSDs

We have a Proxmox cluster with 3 nodes. Each node has 4 SSDs and 12 HDDs.
My plan is to create 2 CRUSH rules (one for the SSD devices and another one for the HDD devices).
With these 2 rules I will create 2 pools: one SSD pool and one HDD pool.
But in the Ceph documentation I found this: https://docs.ceph.com/en/latest/rados/operations/crush-map/#custom-crush-rules.
I am trying to understand this rule. Would it be more useful for my hardware?
Can somebody explain, in simple words, what this rule does?
Thank you so much.

The easiest way to use SSDs or HDDs in your crush rules would be these, assuming you're using replicated pools:
rule rule_ssd {
    id 1
    type replicated
    min_size 1
    max_size 10
    step take default class ssd
    step chooseleaf firstn 0 type host
    step emit
}
rule rule_hdd {
    id 2
    type replicated
    min_size 1
    max_size 10
    step take default class hdd
    step chooseleaf firstn 0 type host
    step emit
}
These rules select the desired device class (ssd or hdd) and then choose hosts within that selection; depending on your pool size it will choose that many distinct hosts (don't use size=2 except for testing purposes). So in this case the failure domain is "host".
The purpose of the rule you refer to in the docs is in its name, "mixed_replicated_rule": it spreads the replicas across different device classes (by the way, the autoscaler doesn't work well with mixed device classes). I wouldn't really recommend it unless you have a good reason to. Stick to the easy ruleset and just use device classes, which are usually detected automatically when the drives are added.
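For reference, you don't have to edit the CRUSH map by hand for this. A minimal sketch of the equivalent CLI commands; the pool names and PG counts here are made-up examples, adjust them to your cluster:

# One replicated rule per device class: root "default", failure domain "host"
ceph osd crush rule create-replicated rule_ssd default host ssd
ceph osd crush rule create-replicated rule_hdd default host hdd
# Create a pool on each rule
ceph osd pool create ssd_pool 128 128 replicated rule_ssd
ceph osd pool create hdd_pool 512 512 replicated rule_hdd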

What is better practice: create one unix socket with multiple connections, or multiple sockets with one connection each?

I'm designing a program that will create multiple processes with exec and then open connections to them using sockets, and I have multiple alternatives but I don't know which one is better.
Every child process will have around 3 services it wants to use to communicate with the server.
Should I create 3 sockets, connect every child to those sockets, and distinguish between them by sending the ID when the connection starts: /temp/service{1|2|3}.sock?
Or should I create 3 new sockets for every child: /temp/{ID}/service{1|2|3}.sock?
The second option seems a bit better because I don't have to tell the server who I am when the connection starts; it is implicit in the name of the socket, and each service will have its own socket. But I don't know if it will be inefficient to create 3 sockets for every child.
Interesting question. Here are my thoughts around it:
(Around option 1)
If you have heavy traffic flowing through those sockets, then at some point they may become a bottleneck. If that's not the case (low traffic), then option 1 would work.
(Around option 2)
Let N be the number of child processes created in a given timeframe. If N * 3 > (total number of file descriptors on your machine), for the same timeframe, then definitely option 2 doesn't seem to be the right fit.
If you can also account for a file descriptor recycling rate, that would give more accuracy to the overall evaluation.
(Overall)
I would think about those 2 tradeoffs and make a decision based on that. Without some numbers it would be hard to make an informed decision.
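To make option 1 concrete, here's a minimal Python sketch of the shared-socket variant where each child identifies itself on connect (the socket path and the ID framing are my assumptions, not from the question):

import os
import socket

SOCK = "/tmp/service1.sock"  # hypothetical path, one per service

def serve():
    # One shared listening socket per service; children identify themselves on connect.
    if os.path.exists(SOCK):
        os.unlink(SOCK)
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(SOCK)
    srv.listen()
    conn, _ = srv.accept()
    child_id = conn.recv(64).decode().strip()  # first message identifies the child
    print(f"service1: child {child_id} connected")
    conn.close()
    srv.close()

def child(child_id: str):
    cli = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    cli.connect(SOCK)
    cli.sendall(f"{child_id}\n".encode())  # announce the ID once, at connect time
    cli.close()

Option 2 would instead bind a fresh socket per child (e.g. /tmp/{ID}/service1.sock) and skip the ID message entirely.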

How to create a zabbix problem whenever a cisco switch interface utilizes more than 80 mbps (80% of its bandwidth)

I'm trying to create a trigger in Zabbix which will show me a problem and alert me by email whenever an interface on a Cisco switch (with SNMPv2) crosses 80% of its bandwidth (100 Mbps or 1000 Mbps), without hardcoding anything. I tried using this trigger expression:
{<switch name>:net.if.out[ifHCOutOctets.<switch interface>].min(10)}>80000000
I would like to know how I can write this trigger expression so that it works without applying it to every single interface item on every switch. I think macros could help in these situations, but I found no explanation or guide about how to use them, or how to use low level discovery, which may be part of the solution for my need.
Thanks in advance.
You are correct: you want to use Low Level Discovery (LLD) to do this, as well as to discover all your interfaces. At a high level, low level discovery consists of two things: 1) you tell Zabbix how to go discover a bunch of dynamic things and assign an LLD macro to each of them; that is done at the level of the Discovery rule. 2) You tell Zabbix what item prototypes, trigger prototypes, etc. to dynamically create as actual items and triggers every time the discovery rule runs.
Take a look at the Arista SNMPv2 template included with Zabbix as an example. There are a number of Discovery Rules included in that template, one of which is the Network Interfaces discovery rule. Within that rule, Zabbix basically does an SNMP walk, gets a list of all the interfaces, and assigns LLD (low level discovery) macros for each interface, such as {#IFINDEX}, {#IFSTATUS}, etc. The template's prototypes, as in all LLD rules, take the output of the "Network Interfaces" discovery rule and use it to dynamically create actual items on each host the template is applied to.
The next part to understand is the prototypes. Once Zabbix finds all the network interfaces, your question should be: how do I get it to create new items on my host for each interface it finds, and how do I get it to create triggers for each interface it finds, dynamically, automatically, and without user intervention? The answer is prototypes. Prototypes are child elements of a Low Level Discovery rule. They are what actually creates the new items and triggers for everything it discovered.
Take a look here for some examples and docs on low level discovery rules.
https://www.zabbix.com/documentation/4.2/manual/discovery/low_level_discovery#trigger_prototypes
Zabbix can create LLD rules via numerous discovery methods, including SNMP (v1/v2/v3), all of which can be configured in the UI or API, plus other custom discovery rules not included out of the box, through the use of user parameters, external checks, etc.
If your make and model of switch is already known to Zabbix, a template will exist in "Templates/Network Devices" (at least I think that's the path), just like the Arista and Juniper ones.
You can create custom low level discovery rules as well, for non-SNMP stuff. Basically you write a script that finds the things you want to dynamically add to Zabbix, and your script needs to return valid JSON output with the {#MACRONAMES} and values you want added. For example, a custom file system discovery rule (which shouldn't be needed, because those are already included if you're using the agent) would produce lines like the ones shown in this example in the official docs.
https://www.zabbix.com/documentation/4.2/manual/discovery/low_level_discovery#creating_custom_lld_rules
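For reference, the JSON such a script returns looks roughly like this (the filesystem macros are just the example those docs use):

{
    "data": [
        { "{#FSNAME}": "/",     "{#FSTYPE}": "ext4" },
        { "{#FSNAME}": "/home", "{#FSTYPE}": "ext4" }
    ]
}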
In short, check whether a template already exists for your switch, with a discovery rule and the item prototypes to discover things the way you want. LLD basically allows Zabbix to walk a dynamic data structure from any source, as long as that data structure has a definition known to Zabbix, and you tell it which keys and values in the JSON you want created as items, triggers, etc.
You should leverage the low level discovery feature of Zabbix.
After a standard setup you should have a "Template Module Interfaces SNMPv2", which is used by some other templates as a standard for interface discovery.
Within the template you will find a "Network Interfaces Discovery" discovery rule that runs every hour to:
query the target device for an interface list
create N items for each interface (bits in, bits out, speed, type etc), defined as Item Prototypes
create some triggers, defined as Trigger Prototypes
The speed item is the negotiated interface speed (ie: 1000 for a gigabit switch port), which is your 100% limit.
You can add some definitions to this template to get the alert you need:
Create a calculated item prototype and set its formula to
100*(currentOutputBits/speed)
(a concrete sketch of this formula follows this list)
Create a template macro to define your alert threshold, e.g. {$INTERFACE_OUTPUT_THRESHOLD}, and set it to 80
Create a trigger prototype that fires when the calculated item is greater than {$INTERFACE_OUTPUT_THRESHOLD} for N minutes
Optionally, do the same for the currentInputBits item
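As a sketch, in Zabbix 4.x calculated-item syntax the formula could look like the following. The item keys are my assumptions based on the standard interfaces template (check the actual prototype keys in yours), and the *1000000 assumes the speed item reports Mbps:

100*last("net.if.out[ifHCOutOctets.{#SNMPINDEX}]")/(last("net.if.speed[ifHighSpeed.{#SNMPINDEX}]")*1000000)

Drop the *1000000 if your speed item is already preprocessed into bits per second.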
This setup will create 1 additional item and 1 additional trigger for each physical or logical interface found on the target device: it makes sense to untick the "Create enabled" option on the trigger prototype and enable it only on specific ports of specific devices.
The threshold macro can be changed at template level, affecting any device linked to it, or at host level for a custom threshold.
Thanks for your replies. I checked the Template Module Interfaces SNMPv2 again and saw that there is a trigger prototype that solves my question.
For anyone who wants to do the same with a switch:
Add the "Template Module Interfaces SNMPv2" template to your switch (host).
Change the IF_UTIL_MAX macro to whatever value you want; the default is 90. This is the macro responsible for the percentage of bandwidth that will trigger a problem. For example, if you change it to 60, then when a host's bandwidth utilization averages more than 60% on any interface for 15 minutes, a problem will be added to the problems tab or the dashboard.
If the period of 15 minutes isn't right for you, you can change it by going to: Configuration -> Templates -> Template Module Interfaces SNMPv2 -> Discovery rules -> Network Interfaces Discovery -> Trigger prototypes -> look for a trigger whose name contains "high bandwidth usage" -> in the problem expression and the recovery expression, find the .avg() function and change the value inside it to whatever is right for you (for example, 15m = 15 minutes, 1s = 1 second, etc.).
I actually recommend cloning the trigger prototype, changing the clone, and then disabling the built-in one, instead of just changing the time inside it; this makes debugging errors easier in the long run. To do this, open the built-in trigger prototype, press Clone in the bottom left corner of the screen, then change the clone's name and settings to whatever suits you best.
Press Add in the bottom left corner of the screen, and if you took my advice, also click the green "Yes" link of the built-in trigger in the trigger prototypes table to disable it.
You can also go ahead and try the other answers in this thread; I'm seriously thankful for them, but I don't have enough time to check whether they work, since I already figured it out after reading Simone Zabberoni's and helllordkb's answers and checking the built-in low level discovery in the "Template Module Interfaces SNMPv2" template.

What's an elegant/scalable way to dispatch hash queries, whose key is a string, to multiple machines?

I want to make it scalable. Suppose the letters are all lower case. For example, if I only have two machines, queries whose first character is within a-m can be dispatched to the first machine, while the n-z queries can be dispatched to the second machine.
However, when a third machine comes, to keep the queries spread as evenly as possible, I have to re-calculate the rules and re-distribute the contents stored on the previous two machines. I feel it could get messy. For example, in a more complex case: when I already have 26 machines, what should I do when the 27th one comes? What do people usually do to achieve scalability here?
The process of (self-) organizing machines in a DHT to split the load of handling queries to a pool of objects is called Consistent Hashing:
https://en.wikipedia.org/wiki/Consistent_hashing
I don't think there's a definitive answer to your question.
First is the question of balance. The DHT is balanced when:
each node is under similar load? (load balancing is probably what you're after)
each node is responsible for a similar number of objects? (this is what you seemed to suggest)
(less likely) each node is responsible for similar amount of the addressing space?
I believe your objective is to make sure none of the machines is overloaded. Unless queries to a single object are enough to saturate a single machine, this is unlikely to happen if you rebalance properly.
If one of the machines is under significantly lower load than another, you can make the less-loaded machine take over some of the objects of the more-loaded machine by shifting their positions in the ring.
Another way of rebalancing is through virtual nodes: each machine can simulate being k machines. If its load is low, it can increase its number of virtual nodes (and take over more objects). If its load is high, it can remove some of its virtual nodes.
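A minimal consistent-hash ring with virtual nodes could look like this Python sketch (the hash function and vnode count are arbitrary choices):

import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, vnodes: int = 100):
        self.vnodes = vnodes   # virtual nodes per machine
        self._points = []      # sorted hash positions on the ring
        self._owner = {}       # hash position -> machine name

    def add(self, machine: str):
        for i in range(self.vnodes):
            h = _hash(f"{machine}#{i}")
            bisect.insort(self._points, h)
            self._owner[h] = machine

    def remove(self, machine: str):
        for i in range(self.vnodes):
            h = _hash(f"{machine}#{i}")
            self._points.remove(h)
            del self._owner[h]

    def lookup(self, key: str) -> str:
        # The first virtual node clockwise from the key's hash owns the key.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]

With this scheme, adding a 27th machine moves only about 1/27 of the keys instead of forcing a full redistribution.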

Not able to set cpuset attribute on vcpu element in instance XML

In relation to this patch, I have not been able to use it. I mean, it has been merged, so it is in my code, but I am not sure how to enable/use it.
This particular patch adds a cpuset attribute to the vcpu element of the instance's XML file. I can't figure out how exactly I can do that for a particular instance. Suppose I want to pin pCPUs 2 and 3, so the vcpu entry would be
<vcpu cpuset="2,3">2</vcpu>
How can I do that?
From what I understand, this patch does not allow you to do pinning on a per-instance basis. Instead, it allows you to specify a subset of physical CPUs, guaranteeing that the instances will only run on those CPUs. You specify the CPUs with the vcpu_pin_set config option in /etc/nova/nova.conf. Here's an example from the patch:
vcpu_pin_set=4-12,^8,15
Presumably, this would ensure that all instances only run on CPUs 4,5,6,7,9,10,11,12,15
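The set syntax expands as in this small Python sketch (my own illustration of the parsing rules, not code from the patch):

def parse_cpu_set(spec: str) -> set:
    # Expand nova's vcpu_pin_set syntax, e.g. "4-12,^8,15".
    include, exclude = set(), set()
    for part in spec.split(","):
        target = include
        if part.startswith("^"):   # "^N" excludes CPU N
            target, part = exclude, part[1:]
        if "-" in part:            # "A-B" is an inclusive range
            lo, hi = part.split("-")
            target.update(range(int(lo), int(hi) + 1))
        else:
            target.add(int(part))
    return include - exclude

print(sorted(parse_cpu_set("4-12,^8,15")))  # [4, 5, 6, 7, 9, 10, 11, 12, 15]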

How can I configure Munin to give me a total of all my cloud servers?

I have a dozen load balanced cloud servers all monitored by Munin.
I can track each one individually just fine. But I'm wondering if I can somehow bundle them up to see just how much collective CPU usage (for example) there is among the cloud cluster as a whole.
How can I do this?
The munin.conf file makes it easy enough to handle this for subdomains, but I'm not sure how to configure it for simple web nodes. Assume my web nodes are named web_node_1 - web_node_10.
My conf looks something like this right now:
[web_node_1]
address 10.1.1.1
use_node_name yes
...
[web_node_10]
address 10.1.1.10
use_node_name yes
Your help is much appreciated.
You can achieve this with sum and stack.
I've just had to do the same thing, and I found this article pretty helpful.
Essentially you want to do something like the following:
[web_nodes;Aggregated]
update no
cpu_aggregate.update no
cpu_aggregate.graph_args --base 1000 -r --lower-limit 0 --upper-limit 200
cpu_aggregate.graph_category system
cpu_aggregate.graph_title Aggregated CPU usage
cpu_aggregate.graph_vlabel %
cpu_aggregate.graph_order system user nice idle
cpu_aggregate.graph_period second
cpu_aggregate.user.label user
cpu_aggregate.nice.label nice
cpu_aggregate.system.label system
cpu_aggregate.idle.label idle
cpu_aggregate.user.sum web_node_1:cpu.user web_node_2:cpu.user
cpu_aggregate.nice.sum web_node_1:cpu.nice web_node_2:cpu.nice
cpu_aggregate.system.sum web_node_1:cpu.system web_node_2:cpu.system
cpu_aggregate.idle.sum web_node_1:cpu.idle web_node_2:cpu.idle
There are a few other things you can tweak to give the graph the same scale, min/max, etc. as the main plugin; those can be copied from the "cpu" plugin file. The key thing here is the last four lines - that's where the summing of values from other graphs comes in.
