nova boot baremetal, select specific machine in pool to 'boot' - openstack

I am using Ironic to help me deploy bare metal in a data center environment using 1U Dell servers. It works very well, I can use Ironic to marshall dozens of servers in the rack, then when I need a bare metal instance (via nova) I just use the flavor associated with those servers and I get one of them. Is there a way I can get a specific one? For example, my servers are numbered from the top, starting with control0, control1 all the way down to control39. So, first I create all of the baremetal servers, introspect them. Then I create a flavor (like below, please forgive the pseudo code) and associate each baremetal server with that profile.
openstack flavor create --id auto --ram 6144 --disk 40 --vcpus 4 control
openstack flavor set --property "cpu_arch"="x86_64" --property "capabilities:boot_option"="local" --property "capabilities:profile"="control" control
i = 0
for each baremetal server's uuid:
ironic node-update server-uuid add name=control$i
i = i + 1
ironic node-update server-uuid add properties/capabilities="profile:control,boot_option:local"
When I loop through the list I know that the servers are in top down physical order. What I would like to be able to do is get nova to create a boot instance on a specific ironic bare metal (like control3). I could create separate flavors for each one but I think there must be a way to select a specific piece of hardware? Or a strategy that would pick them in the order I specify.
I am pretty new to Ironic. I have done quite a bit of googling on the topic but haven't found anything. Here is how I start nova:
nova boot --flavor control --image rhel-server-7.1-x86_64-dvd.iso --nic 'net-id=723e7b11-3e61-481a-827e-e58b369dd28f' mybootinstance1
Which works fine. What I would like to do is have a nova boot line which uses the flavor control, and also the name (control0) or any other property that I can assign to make that machine unique. Something like:
nova boot --flavor control --ironic-instance-name control0 --image rhel-server-7.1-x86_64-dvd.iso --nic 'net-id=723e7b11-3e61-481a-827e-e58b369dd28f' mybootinstance1
This is actually a simplification of the nova pool selection process. I don't want to use a pool, but rather, a specific piece of hardware.
If that isn't possible, is there a big drawback to using 40 flavors to create individual 'pools'?

I think you can use --hint in nova boot to select specific machine in pool.
Preconditions: edit /etc/nova/nova.conf, add 'JsonFilter' in scheduler_default_filters and restart nova-scheduler.Then use nova boot command like this:
nova boot --flavor <flavor> --image <image_id> --nic net-id=<net_id> --hint reservation=<reservation_id> --hint query='["=","$hypervisor_hostname", "<node_uuid>"]' <instance_name>

I'm not quite familiar to this topic, but I'd like to share how to boot an instance to specific host via availability zone.
In my devstack (master) development environment, the procedure is:
$ nova availability-zone-list
+---------------------+----------------------------------------+
| Name | Status |
+---------------------+----------------------------------------+
| internal | available |
| |- fcwszq | |
| | |- nova-conductor | enabled :-) 2015-11-23T06:31:46.000000 |
| | |- nova-cert | enabled :-) 2015-11-23T06:31:41.000000 |
| | |- nova-scheduler | enabled :-) 2015-11-23T06:31:43.000000 |
| | |- nova-network | enabled :-) 2015-11-23T06:31:44.000000 |
| nova | available |
| |- fcwszq | |
| | |- nova-compute | enabled :-) 2015-11-23T06:31:41.000000 |
+---------------------+----------------------------------------+
Note that my environment only gets one compute node whose hostname is fcwszq, but still can be specified as:
nova boot --availability-zone nova:fcwszq --flavor 1 --image c38f0c7e-8ee0-4b0f-8a56-022040b4696f test02
If I specify a non-existent node, for example, nova:non-existent, the instance cannot be created correctly (state is ERROR).
Hope this can help you.
Another way is using host aggregate and flavor metadata to boot instance on a random server in a group, reference: http://docs.openstack.org/liberty/config-reference/content/section_compute-scheduler.html#d6e21786

Related

Possible to pass through all functions of PCI devices in Openstack?

Many PCI devices (e.g. GPUs) are multifunction.
For instance, for an NVIDIA 3090:
02:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1)
02:00.1 Audio device: NVIDIA Corporation GA102 High Definition Audio Controller (rev a1)
However, OpenStack only supports passing through the root device (like 02:00.0). Separately, you can pass through the other sub-devices.
When passing through all devices separately on multi-GPU servers, we sometimes get this situation:
mysql> use nova;
mysql> select address, product_id, dev_id, status from pci_devices where deleted = 0;
+--------------+------------+------------------+-----------+
| address | product_id | dev_id | status |
+--------------+------------+------------------+-----------+
| 0000:01:00.0 | 24b0 | pci_0000_01_00_0 | allocated |
| 0000:01:00.1 | 228b | pci_0000_01_00_1 | available |
| 0000:45:00.0 | 24b0 | pci_0000_45_00_0 | available |
| 0000:45:00.1 | 228b | pci_0000_45_00_1 | allocated |
+--------------+------------+------------------+-----------+
14 rows in set (0.00 sec)
As you can see, in this machine, OpenStack has passed through the audio device of PCI 45:00, whereas it has passed through the GPU of PCI 01:00.
This configuration is problematic, as we need the audio devices and the GPUs themselves to be on the same physical PCI card.
Any thoughts or advice?
One option is to just use Libvirt and use the "multifunction=on" and skip OpenStack entirely, but I'm really curious as to if OpenStack has some similar functionality so that all slots / subfunctions are passed through into VMs form the same physical PCI device.
Thanks!

Maxscale is writing on slave with router_options=master (slave/master replication) and listeners stopped

I've configured on 2 servers(srv50/51),
one of them is Master and the second one is slave,
Here the configuration of my configuration file /etc/maxscale.cnf :
[Read-Only Service]
type=service
router=readconnroute
servers=server50, server51
user=YYYYYYYYYYYYY
passwd=XXXXXXXXXXXXXX
router_options=slave
[Write-Only Service]
type=service
router=readconnroute
servers=server50, server51
user=YYYYYYYYYYYYY
passwd=XXXXXXXXXXXXXX
router_options=master
[Read-Only Listener]
type=listener
service=Read-Only Service
protocol=MySQLClient
port=4008
[Write-Only Listener]
type=listener
service=Write-Only Service
protocol=MySQLClient
port=4009
As i understool the router_options look who is the master and send the writing query to the master
Maxscale (via maxadmin) seems to discover the 2 serveur and understand witch one is the Master :
MaxScale> list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server | Address | Port | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
server51 | 192.168.0.51 | 3306 | 0 | Slave, Running
server50 | 192.168.0.50 | 3306 | 0 | Master, Running
-------------------+-----------------+-------+-------------+--------------------
But even if I connect in Mysql in local on my Maxscale Write-Only Listener port (4009), Listener are in Stopped mode, is it normal ?
MaxScale> list listeners
Listeners.
---------------------+--------------------+-----------------+-------+--------
Service Name | Protocol Module | Address | Port | State
---------------------+--------------------+-----------------+-------+--------
Read-Only Service | MySQLClient | * | 4008 | Stopped
Write-Only Service | MySQLClient | * | 4009 | Stopped
MaxAdmin Service | maxscaled | * | 6603 | Running
---------------------+--------------------+-----------------+-------+--------
I've try to create a database in srv51 (slave), and it was created only on srv51, not in srv50.
Is something wrong in my configuration ? It's strange because it's not my first cluster, and on other cluster all write go to the master (but listeners are Running). Do i don't understand well the meaning of "router_options=master" ? How to start listeners ? I prefere to keep the 51 in Write list to detect topology change
===== UPDATE =====
After watching Log file /var/log/maxscale/maxscale1.log
I found that my monitor user didn't have the correct password :
[MySQL Monitor]
type=monitor
module=mysqlmon
servers=server50, server51
user=MONITOR
passwd=MONITOR_PASS
monitor_interval=10000
I corrected password for user and restarted maxscale, Now everything is running :
MaxScale> list listeners
Listeners.
---------------------+--------------------+-----------------+-------+--------
Service Name | Protocol Module | Address | Port | State
---------------------+--------------------+-----------------+-------+--------
Read-Only Service | MySQLClient | * | 4008 | Running
Write-Only Service | MySQLClient | * | 4009 | Running
MaxAdmin Service | maxscaled | * | 6603 | Running
---------------------+--------------------+-----------------+-------+--------
But write query are still done on Slave and not on Master
Thanks to MariaDb support, I was trying to connect like this :
mysql -h localhost --port=4009 -u USER -p
But Maxscale & Mysql were installed in the same server, even if Mysql bind port 3306, when you specify 'localhost', the connection is done on Mysql port 3306 and not in Maxscale port 4009, the port is ignore !!
The solution is to connect like this :
mysql -h 127.0.0.1 --port=4009 -u USER -p
or like this :
mysql -h localhost --protocol=tcp --port=4009 -u USER -p
I've try both solution and they works.
The solution about the listener not Running is on update of the question.
If writes are done on the slaves, the simplest explanation would be that you're executing writes on the wrong port or your configuration is wrong. To diagnose these problems, enable the info log level by adding log_info=true under the [maxscale] section.
If enabling the info log and inspecting the log files does not provide any clues, I'd suggest opening a bug report on the Maxscale Jira.

OpenStack: add security rules failed due to multiple security groups named with "default"

After installed devstack, I follow the doc to set security rules:
$ openstack security group rule create --proto icmp --dst-port 0 default
More than one security_group exists with the name 'default'.
It claims I have multiple security_group which named with default, but I check with nova secgroup-list:
$ nova secgroup-list
WARNING: Command secgroup-list is deprecated and will be removed after Nova 15.0.0 is released. Use python-neutronclient or python-openstackclient instead.
+--------------------------------------+---------+------------------------+
| Id | Name | Description |
+--------------------------------------+---------+------------------------+
| e5466481-f656-46fa-ac72-56f7ab118c70 | default | Default security group |
+--------------------------------------+---------+------------------------+
So, there is only one security group named with default...
Could any one give me some suggestion about that?
Yes that's a common problem when you have multiple tenants, each with a "default" group. The solution I have applied before is to use the group ID instead of the group name when it comes to the "default" group.

how to let bosh errand execute on existed vms

All,
I have an issue, how to let bosj errand on existed vms,
I have a vm deployed a mongodb, and i want to run some errand command in the vm, but i don't know how?
Do any one know this?
Director task 2121
Task 2121 done
+---------+---------+---------+--------------+
| VM | State | VM Type | IPs |
+---------+---------+---------+--------------+
| app-p/0 | stopped | app | 10.62.90.171 |
+---------+---------+---------+--------------+
How to run an errand:
bosh run errand ERRAND_NAME [--download-logs] [--logs-dir DESTINATION_DIRECTORY]
ref: http://bosh.io/docs/sysadmin-commands.html#errand
Assuming you have already marked the deployment job 'lifecycle' as 'errand' in your manifest
ref: Jobs Block section of http://bosh.io/docs/deployment-manifest.html
Like Ben Moss mentioned in his comment, a new VM is created for running an errand job. The errand VM does not stay around after the errand completes. Errand is a short lived job and can be run multiple times

WebLogic OBIEE Scheduler Component Down

I have an OBIEE 11g installation in a Red Hat machine, but I'm finding problems to make it running. I can start WebLogic and its services, so I’m able to enter the WebLogic console and Enterprise Manager, but problems come when I try to start OBIEE components with opmnctl command.
The steps I’m performing are the following:
1) Start WebLogic
cd /home/Oracle/Middleware/user_projects/domains/bifoundation_domain/bin/
./startWebLogic.sh
2) Start NodeManager
cd /home/Oracle/Middleware/wlserver_10.3/server/bin/
./startNodeManager.sh
3) Start Managed WebLogic
cd /home/Oracle/Middleware/user_projects/domains/bifoundation_domain/bin/
./startManagedWebLogic.sh bi_server1
4) Set up OBIEE Components
cd /home/Oracle/Middleware/instances/instance1/bin/
./opmnctl startall
The result is:
opmnctl startall: starting opmn and all managed processes...
================================================================================
opmn id=JustiziaInf.mmmmm.mmmmm.9999
Response: 4 of 5 processes started.
ias-instance id=instance1
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
ias-component/process-type/process-set:
coreapplication_obisch1/OracleBISchedulerComponent/coreapplication_obisch1/
Error
--> Process (index=1,uid=1064189424,pid=4396)
failed to start a managed process after the maximum retry limit
Log:
/home/Oracle/Middleware/instances/instance1/diagnostics/logs/OracleBISchedulerComponent/
coreapplication_obisch1/console~coreapplication_obisch1~1.log
5) Check the status of components
cd /home/Oracle/Middleware/instances/instance1/bin/
./opmnctl status
Processes in Instance: instance1
---------------------------------+--------------------+---------+---------
ias-component | process-type | pid | status
---------------------------------+--------------------+---------+---------
coreapplication_obiccs1 | OracleBIClusterCo~ | 8221 | Alive
coreapplication_obisch1 | OracleBIScheduler~ | N/A | Down
coreapplication_obijh1 | OracleBIJavaHostC~ | 8726 | Alive
coreapplication_obips1 | OracleBIPresentat~ | 6921 | Alive
coreapplication_obis1 | OracleBIServerCom~ | 7348 | Alive
Read the log files from /home/Oracle/Middleware/instances/instance1/diagnostics/logs/OracleBISchedulerComponent/
coreapplication_obisch1/console~coreapplication_obisch1~1.log.
I would recommend trying the the steps in the below link as this is a common issue when upgrading OBIEE.
http://www.askjohnobiee.com/2012/11/fyi-opmnctl-failed-to-start-managed.html
Not sure, what your log says, but try these below steps and check if it works or not
Login as superuser
cd $ORACLE_HOME/Apache/Apache/bin
chmod 6750 .apachectl
logout and login as ORACLE user
opmnctl startproc process-type=OracleBIScheduler

Resources