All,
I have an issue: how do I run a BOSH errand on existing VMs?
I have a VM with MongoDB deployed on it, and I want to run an errand command on that VM, but I don't know how.
Does anyone know?
Director task 2121
Task 2121 done
+---------+---------+---------+--------------+
| VM | State | VM Type | IPs |
+---------+---------+---------+--------------+
| app-p/0 | stopped | app | 10.62.90.171 |
+---------+---------+---------+--------------+
How to run an errand:
bosh run errand ERRAND_NAME [--download-logs] [--logs-dir DESTINATION_DIRECTORY]
ref: http://bosh.io/docs/sysadmin-commands.html#errand
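For example, a minimal invocation (the errand name smoke-tests is hypothetical):
bosh run errand smoke-tests --download-logs --logs-dir /tmp/errand-logs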
This assumes you have already marked the deployment job's 'lifecycle' as 'errand' in your manifest.
ref: Jobs Block section of http://bosh.io/docs/deployment-manifest.html
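A minimal sketch of such a jobs block (the job name, pool, and network are hypothetical; the schema follows the deployment-manifest doc above):
jobs:
- name: smoke-tests
  lifecycle: errand
  instances: 1
  resource_pool: default
  networks:
  - name: default
  templates:
  - name: smoke-tests
    release: my-release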
As Ben Moss mentioned in his comment, a new VM is created to run an errand job. The errand VM does not stay around after the errand completes. An errand is a short-lived job and can be run multiple times.
I am trying to install a three-node Airflow cluster. Each node runs the Airflow scheduler, worker, and webserver; the cluster also includes Celery, a RabbitMQ cluster, and a Postgres multi-master cluster (implemented with Bucardo). Software versions:
Airflow 2.0.1
PostgreSQL 13.2
Ubuntu 20.04
Python 3.8.5
Celery 4.4.7
Bucardo 5.6.0
RabbitMQ 3.8.2
I ran into a problem starting the Airflow scheduler.
When I launch the first one (the database is empty), it starts successfully.
But when I then launch another scheduler on another machine (I tried launching it on the same machine too), it fails with the following:
sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "job_pkey"
DETAIL: Key (id)=(25) already exists.
[SQL: INSERT INTO job (dag_id, state, job_type, start_date, end_date, latest_heartbeat, executor_class, hostname, unixname) VALUES (%(dag_id)s, %(state)s, %(job_type)s, %(start_date)s, %(end_date)s, %(latest_heartbeat)s, %(executor_class)s, %(hostname)s, %(unixname)s) RETURNING job.id]
[parameters: {'dag_id': None, 'state': 'running', 'job_type': 'SchedulerJob', 'start_date': datetime.datetime(2021, 4, 21, 7, 39, 20, 429478, tzinfo=Timezone('UTC')), 'end_date': None, 'latest_heartbeat': datetime.datetime(2021, 4, 21, 7, 39, 20, 429504, tzinfo=Timezone('UTC')), 'executor_class': 'CeleryExecutor', 'hostname': 'hostname', 'unixname': 'root'}]
(Background on this error at: http://sqlalche.me/e/13/gkpj)
After a few launch attempts the scheduler eventually starts working. I assume the id keeps incrementing until the insert succeeds and the row is added to the database:
airflow=> select * from job order by state;
id | dag_id | state | job_type | start_date | end_date | latest_heartbeat | executor_class | hostname | unixname
----+--------+---------+--------------+-------------------------------+-------------------------------+-------------------------------+----------------+------------------------------+----------
26 | | running | SchedulerJob | 2021-04-21 07:39:22.243721+00 | | 2021-04-21 07:39:22.243734+00 | CeleryExecutor | machine name | root
25 | | running | SchedulerJob | 2021-04-21 07:39:14.515009+00 | | 2021-04-21 07:39:19.632811+00 | CeleryExecutor | machine name | root
There is a warning about the log table as well (once the second and subsequent schedulers have started successfully):
WARNING - Failed to log action with (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "log_pkey"
DETAIL: Key (id)=(40) already exists.
I understand why the scheduler cannot insert the data, but how is this supposed to work correctly? How do I launch multiple schedulers? The official documentation says no additional configuration is required. I hope I have explained this clearly. Thanks!
Looks like there is a race condition between the Airflow schedulers and Bucardo.
Probably the easiest way to fix it is to query all servers sequentially, using a connection string like this in your airflow.cfg (the same on all nodes):
[core]
sql_alchemy_conn=postgresql://USER:PASS@/DB?host=node1:port1&host=node2:port2&host=node3:port3
For this to work you'll need SQLAlchemy >= 1.3.
Why this happens
There is a race condition between your schedulers and Bucardo trying to read and write data to the same table on different hosts. Changes do not propagate as quickly as they should, and writes to the table fail.
Even though you treat all your nodes as "multi-master", making all nodes look first at the same server will remediate this problem. In case of failure, they will use the second one.
I asked the Airflow developers. The problem is with Bucardo, since it does not support 'SELECT ... FOR UPDATE':
I suspect that the problem is with Bucardo, which does not support record locking properly. We have high expectations, because it is a key protection mechanism against running the same task by many schedulers.
http://airflow.apache.org/docs/apache-airflow/stable/scheduler.html#database-requirements
If that doesn't work, you will have problems with duplicate keys.
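To illustrate the locking pattern the scheduler depends on, here is a minimal SQL sketch (the table and id mirror the error above; these are not Airflow's actual queries):
BEGIN;
SELECT * FROM job WHERE id = 25 FOR UPDATE;
-- a second scheduler issuing the same SELECT ... FOR UPDATE blocks here
-- until this transaction commits, so only one scheduler proceeds
UPDATE job SET state = 'running' WHERE id = 25;
COMMIT;
With Bucardo each master only takes the lock locally, so two schedulers on different nodes can both acquire it and later collide on replication.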
Thanks!
I want to debug why the Horizon dashboard is slow and takes too long to load. Any hints or references to relevant code would be very helpful.
Thanks!
I also ran into this problem: not only does the OpenStack dashboard respond slowly, the CLI commands do too (the dashboard is just even slower than the CLI).
The results are as follows:
[root@kolla ~]# time neutron port-list | wc -l
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
98
real 0m16.366s
user 0m0.735s
sys 0m0.164s
[root@kolla ~]# time nova list | wc -l
25
real 0m10.767s
user 0m0.798s
sys 0m0.121s
[root@kolla ~]# time port list | wc -l
From the dashboard, it takes about 18 seconds to get all instances of a specific tenant.
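To narrow down where the time goes, python-openstackclient has a global --timing flag that prints the elapsed time of each REST call (a sketch, reusing the host above):
[root@kolla ~]# openstack --timing server list
If calls to one service dominate the total, or every call is slow (which points at keystone token validation), that is the place to start digging.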
I was in riak-shell when ssh lost its connection to the server. After reconnecting, I did the following:
sudo riak-shell
and get:
An instance of riak-shell is already running
So I restarted the Riak node in question. This did not seem to solve the problem, and I do not see anything to kill with ps aux. According to the docs, only one instance can run at a time. That makes sense, but when I run riak-shell from another node and try to connect to any node, I now get the following:
Error: invalid function call : connection_EXT:connect ["riak@<<<ip_address_elided>>>"]
You can connect to a specific node (whether in your riak_shell.config
or not) by typing 'connect "dev1@127.0.0.1";' substituting your
node name for dev1.
You may need to change the Erlang cookie to do this.
See also the 'reconnect' command.
Unhandled message received is {#Ref<0.0.0.135>,disconnected}
riak-shell(3)>
I have not changed the cookies during this process, and the cookie appears to be the same (at least in /etc/riak/riak_shell.config). (I am running the Riak TS AMI on AWS.)
riak-shell runs in its own Erlang VM - entirely separate from the riak node
(You don't need to run riak-shell from the machine your node is on - it uses the normal riak-erlang-client to talk to riak)
If you are on Linux, run ps aux | grep riak_shell_app; it will give you the process number you need to kill that instance:
08:30:45:~ $ ps aux | grep riak_shell_app
vagrant 4671 0.0 0.3 493260 34884 pts/4 Sl+ Aug17 0:03 /home/vagrant/riak_ee/dev/dev1/erts-5.10.3/bin/beam.smp -- -root /home/vagrant/riak_ee/dev/dev1 -progname erl -- -home /home/vagrant -- -boot /home/vagrant/riak_ee/dev/dev1/releases/2.1.1/start_clean -run riak_shell_app boot debug_off /home/vagrant/riak_ee/dev/dev1/bin/../log/riak_shell/riak_shell -noshell -config /home/vagrant/riak_ee/dev/dev1/bin/../etc/riak
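Then kill it using the pid from the second column of that output (4671 in this example):
kill 4671        # add -9 only if it ignores the default TERM signal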
I wrote a good chunk of it, so let me know how you got on:
https://github.com/basho/riak_shell/graphs/contributors
I am using Ironic to help me deploy bare metal in a data center environment using 1U Dell servers. It works very well: I can use Ironic to marshal dozens of servers in the rack, and when I need a bare-metal instance (via nova) I just use the flavor associated with those servers and get one of them. Is there a way to get a specific one? For example, my servers are numbered from the top, starting with control0, control1, all the way down to control39. So first I create all of the bare-metal servers and introspect them. Then I create a flavor (like below; please forgive the pseudo code) and associate each bare-metal server with that profile.
openstack flavor create --id auto --ram 6144 --disk 40 --vcpus 4 control
openstack flavor set --property "cpu_arch"="x86_64" --property "capabilities:boot_option"="local" --property "capabilities:profile"="control" control
i=0
for uuid in $(cat node-uuids.txt); do    # node-uuids.txt: one node UUID per line, in top-down rack order
  ironic node-update $uuid add name=control$i
  ironic node-update $uuid add properties/capabilities="profile:control,boot_option:local"
  i=$((i + 1))
done
When I loop through the list, I know the servers are in top-down physical order. What I would like is to get nova to create a boot instance on a specific Ironic bare-metal node (like control3). I could create a separate flavor for each one, but I think there must be a way to select a specific piece of hardware, or a strategy that picks them in the order I specify.
I am pretty new to Ironic and have done quite a bit of googling on the topic, but haven't found anything. Here is how I boot an instance with nova:
nova boot --flavor control --image rhel-server-7.1-x86_64-dvd.iso --nic 'net-id=723e7b11-3e61-481a-827e-e58b369dd28f' mybootinstance1
That works fine. What I would like is a nova boot line that uses the flavor control plus the name (control0), or any other property I can assign, to make that machine unique. Something like:
nova boot --flavor control --ironic-instance-name control0 --image rhel-server-7.1-x86_64-dvd.iso --nic 'net-id=723e7b11-3e61-481a-827e-e58b369dd28f' mybootinstance1
This is really a simplification of nova's pool-selection process: I don't want a pool, but a specific piece of hardware.
If that isn't possible, is there a big drawback to using 40 flavors to create individual 'pools'?
I think you can use --hint in nova boot to select a specific machine in the pool.
Preconditions: edit /etc/nova/nova.conf, add 'JsonFilter' to scheduler_default_filters, and restart nova-scheduler. Then use a nova boot command like this:
nova boot --flavor <flavor> --image <image_id> --nic net-id=<net_id> --hint reservation=<reservation_id> --hint query='["=","$hypervisor_hostname", "<node_uuid>"]' <instance_name>
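For reference, a sketch of the nova.conf change (the filter list shown is the default set of that era; just append JsonFilter to whatever your file already has):
[DEFAULT]
scheduler_default_filters=RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,JsonFilter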
I'm not that familiar with this topic, but I'd like to share how to boot an instance on a specific host via an availability zone.
In my devstack (master) development environment, the procedure is:
$ nova availability-zone-list
+---------------------+----------------------------------------+
| Name | Status |
+---------------------+----------------------------------------+
| internal | available |
| |- fcwszq | |
| | |- nova-conductor | enabled :-) 2015-11-23T06:31:46.000000 |
| | |- nova-cert | enabled :-) 2015-11-23T06:31:41.000000 |
| | |- nova-scheduler | enabled :-) 2015-11-23T06:31:43.000000 |
| | |- nova-network | enabled :-) 2015-11-23T06:31:44.000000 |
| nova | available |
| |- fcwszq | |
| | |- nova-compute | enabled :-) 2015-11-23T06:31:41.000000 |
+---------------------+----------------------------------------+
Note that my environment has only one compute node, whose hostname is fcwszq, but it can still be specified explicitly:
nova boot --availability-zone nova:fcwszq --flavor 1 --image c38f0c7e-8ee0-4b0f-8a56-022040b4696f test02
If I specify a non-existent node, for example nova:non-existent, the instance cannot be created correctly (its state goes to ERROR).
Hope this can help you.
Another way is to use a host aggregate plus flavor metadata to boot an instance on a random server in a group; reference: http://docs.openstack.org/liberty/config-reference/content/section_compute-scheduler.html#d6e21786
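A sketch of that approach (the aggregate name and metadata key are made up; it requires AggregateInstanceExtraSpecsFilter to be enabled in nova-scheduler):
nova aggregate-create control-aggregate
nova aggregate-set-metadata control-aggregate group=control
nova aggregate-add-host control-aggregate control0
nova flavor-key control set aggregate_instance_extra_specs:group=control
Any instance booted with the control flavor will then land on a host in that aggregate.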
I have an OBIEE 11g installation on a Red Hat machine, but I'm having problems getting it running. I can start WebLogic and its services, so I'm able to enter the WebLogic console and Enterprise Manager, but problems come when I try to start the OBIEE components with the opmnctl command.
The steps I’m performing are the following:
1) Start WebLogic
cd /home/Oracle/Middleware/user_projects/domains/bifoundation_domain/bin/
./startWebLogic.sh
2) Start NodeManager
cd /home/Oracle/Middleware/wlserver_10.3/server/bin/
./startNodeManager.sh
3) Start Managed WebLogic
cd /home/Oracle/Middleware/user_projects/domains/bifoundation_domain/bin/
./startManagedWebLogic.sh bi_server1
4) Start OBIEE components
cd /home/Oracle/Middleware/instances/instance1/bin/
./opmnctl startall
The result is:
opmnctl startall: starting opmn and all managed processes...
================================================================================
opmn id=JustiziaInf.mmmmm.mmmmm.9999
Response: 4 of 5 processes started.
ias-instance id=instance1
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
ias-component/process-type/process-set:
coreapplication_obisch1/OracleBISchedulerComponent/coreapplication_obisch1/
Error
--> Process (index=1,uid=1064189424,pid=4396)
failed to start a managed process after the maximum retry limit
Log:
/home/Oracle/Middleware/instances/instance1/diagnostics/logs/OracleBISchedulerComponent/
coreapplication_obisch1/console~coreapplication_obisch1~1.log
5) Check the status of components
cd /home/Oracle/Middleware/instances/instance1/bin/
./opmnctl status
Processes in Instance: instance1
---------------------------------+--------------------+---------+---------
ias-component | process-type | pid | status
---------------------------------+--------------------+---------+---------
coreapplication_obiccs1 | OracleBIClusterCo~ | 8221 | Alive
coreapplication_obisch1 | OracleBIScheduler~ | N/A | Down
coreapplication_obijh1 | OracleBIJavaHostC~ | 8726 | Alive
coreapplication_obips1 | OracleBIPresentat~ | 6921 | Alive
coreapplication_obis1 | OracleBIServerCom~ | 7348 | Alive
Read the log files from /home/Oracle/Middleware/instances/instance1/diagnostics/logs/OracleBISchedulerComponent/
coreapplication_obisch1/console~coreapplication_obisch1~1.log.
I would recommend trying the steps in the link below, as this is a common issue when upgrading OBIEE.
http://www.askjohnobiee.com/2012/11/fyi-opmnctl-failed-to-start-managed.html
Not sure what your log says, but try the steps below and check whether it works:
Log in as superuser
cd $ORACLE_HOME/Apache/Apache/bin
chmod 6750 .apachectl
Log out and log back in as the ORACLE user
opmnctl startproc process-type=OracleBIScheduler
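If that process-type name doesn't match your install, opmnctl also accepts the component id shown in your opmnctl status output (using the id from the question above):
cd /home/Oracle/Middleware/instances/instance1/bin/
./opmnctl startproc ias-component=coreapplication_obisch1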