Restart Autosys job when terminated via another Autosys job - autosys

I am setting up an Autosys job to restart another job when the main job is terminated.
Insert_job: Job_Restarter
Job_type: cmd
Condition: t(main_job)
Box_name: my_test_box
Permission: gx,get
Command: sendevent -E FORCE_STARTJOB -J main_job
When the main job is terminated, the restart job runs but fails with exit code 1. I know this is a generic error code, but does anyone have an idea of what I am doing wrong?
Edit:
I did some digging, and "sendevent" is not recognized as a command. Is there another way to restart the job through another job?

Related

Error while running a slurm job through crontab which uses Intel MPI

I am trying to run WRF (real.exe, wrf.exe) through crontab using compute nodes, but the compute nodes are not able to run the Slurm job. I think there is some issue with the MPI library when it runs in the cron environment.
I tried to replicate the terminal's PATH variables in the crontab job.
The log files generated when running the job from the terminal and from crontab are attached as with_terminal and with_crontab, respectively, at this link:
https://drive.google.com/drive/folders/1YE9OchSB8alpZSdRl-8uIbBPm6lI-0DQ
The error when running the job from crontab is as follows:
#################################
compute-dy-c5n18xlarge-1
#################################
Processes 4
[mpiexec#compute-dy-c5n18xlarge-1] check_exit_codes (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:117): unable to run bstrap_proxy on compute-dy-c5n18xlarge-1 (pid 23070, exit code 65280)
[mpiexec#compute-dy-c5n18xlarge-1] poll_for_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:159): check exit codes error
[mpiexec#compute-dy-c5n18xlarge-1] HYD_dmx_poll_wait_for_proxy_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:212): poll for event error
[mpiexec#compute-dy-c5n18xlarge-1] HYD_bstrap_setup (../../../../../src/pm/i_hydra/libhydra/bstrap/src/intel/i_hydra_bstrap.c:772): error waiting for event
[mpiexec#compute-dy-c5n18xlarge-1] main (../../../../../src/pm/i_hydra/mpiexec/mpiexec.c:1938): error setting up the boostrap proxies
Thanks for looking into the issue.
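Since the question hinges on the cron environment differing from the terminal one, one way to see exactly what differs is to dump the environment in both contexts and compare the dumps. The Python sketch below assumes two hypothetical files, env_terminal.txt (from running env > env_terminal.txt in the terminal) and env_cron.txt (from an env > env_cron.txt line in the crontab); the file names and helper functions are illustrative only, and the simple parser assumes one NAME=value pair per line.

```python
import sys


def parse_env(text):
    """Parse `env`-style output (one NAME=value per line) into a dict.

    Simple sketch: multi-line values are not handled.
    """
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name:  # skip blank lines and lines without '='
            env[name] = value
    return env


def diff_env(terminal, cron):
    """Return (names missing in cron, names whose values differ), both sorted."""
    missing = sorted(set(terminal) - set(cron))
    differing = sorted(
        k for k in terminal.keys() & cron.keys() if terminal[k] != cron[k]
    )
    return missing, differing


# Hypothetical usage: python diff_env.py env_terminal.txt env_cron.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        term, cron = parse_env(f1.read()), parse_env(f2.read())
    missing, differing = diff_env(term, cron)
    for name in missing:
        print(f"only in terminal: {name}={term[name]}")
    for name in differing:
        print(f"differs: {name}\n  terminal: {term[name]}\n  cron:     {cron[name]}")
```

Variables that only the terminal has (or that have different values there) are the candidates to export explicitly in the crontab entry or the job script.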

Docker: Permission denied when running rocker/shiny-verse and rocker/shiny images

I'm trying to run Shiny Server on an EC2 instance running Ubuntu.
Following this page, I run this command: docker run --rm -p 3838:3838 rocker/shiny.
I get the following warning and error:
s6-supervise shiny-server: warning: unable to spawn ./run - waiting 10 seconds
s6-supervise (child): fatal: unable to exec run: Exec format error
After that, those two outputs just repeat every 10 seconds. I can't even kill the process, so I have to close the terminal and start another one.
I'm new to Docker and have no real clue how to proceed from here. Any help would be appreciated.
Updates:
At least I've tracked down what s6-supervise is: it is part of a set of utilities for process supervision and management, logging, and system initialization.

How to trigger AutoSys insert_job and sendevent stored in a JIL file in one shot?

I'm very new to AutoSys jobs, and I have the following commands stored in a single JIL file; let's call it test.jil.
insert_job: job_A
command: echo 'mock'
description : mock job A
sendevent -E JOB_ON_ICE -J job_A
I'm trying to run jil < test.jil, but it doesn't recognize sendevent. How can I get it working?
In a JIL file we can write commands like insert_job, delete_job, and update_job, but sendevent is a different command, which should be triggered through the AutoSys agent.
So you can separately create an executable file in which you write the sendevent command, and execute it through the CLI.
Thanks.
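Following this answer, the JIL file keeps only the job definition, and sendevent runs as a separate command afterwards. A minimal Python sketch of such a wrapper, assuming the jil and sendevent executables are on the PATH of an AutoSys-configured shell; the function names are hypothetical, and by default the script only prints the commands (dry run) instead of running them.

```python
import subprocess

# JIL definitions only: jil understands insert_job/update_job/delete_job,
# so the sendevent line is removed from the file contents.
JIL_TEXT = """insert_job: job_A
command: echo 'mock'
description: mock job A
"""


def build_steps(jil_text, job_name):
    """Return the commands to run, in order, as (argv, stdin_text) pairs."""
    return [
        (["jil"], jil_text),  # same as: jil < test.jil
        (["sendevent", "-E", "JOB_ON_ICE", "-J", job_name], None),
    ]


def run_steps(steps, dry_run=True):
    """Print the commands (dry run) or execute them, stopping at the first failure."""
    for argv, stdin_text in steps:
        if dry_run:
            print("would run:", " ".join(argv))
            continue
        # check=True raises CalledProcessError if a step exits non-zero,
        # so sendevent is not sent when the jil load fails.
        subprocess.run(argv, input=stdin_text, text=True, check=True)


if __name__ == "__main__":
    run_steps(build_steps(JIL_TEXT, "job_A"))
```

Passing dry_run=False would actually execute both steps; the ordering matters because the job must exist before the JOB_ON_ICE event can be sent to it.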
Actually, there was a change in one of the recent service packs: in your JIL you can now specify a status:
insert_job: test_job2
command:dir
machine:localhost
status:on_ice
The valid values are:
FAILURE, INACTIVE, ON_HOLD, ON_ICE, ON_NOEXEC, SUCCESS, or TERMINATED.
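With the status attribute, the separate sendevent step is not needed. As a hedged illustration, the Python helper below (a hypothetical name, not part of AutoSys) writes an insert_job definition in the same form as the example above and rejects any status outside the valid values listed in the answer; it assumes lowercase values such as on_ice are accepted, as in that example.

```python
# Valid status values as listed in the answer.
VALID_STATUSES = {
    "FAILURE", "INACTIVE", "ON_HOLD", "ON_ICE", "ON_NOEXEC", "SUCCESS", "TERMINATED",
}


def insert_job_jil(name, command, machine, status=None):
    """Build an insert_job JIL definition, optionally with an initial status."""
    lines = [
        f"insert_job: {name}",
        f"command: {command}",
        f"machine: {machine}",
    ]
    if status is not None:
        normalized = status.strip().upper()
        if normalized not in VALID_STATUSES:
            raise ValueError(
                f"invalid status {status!r}; expected one of {sorted(VALID_STATUSES)}"
            )
        # Written in lowercase, matching the status:on_ice example.
        lines.append(f"status: {normalized.lower()}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The output could then be loaded with jil, e.g. saved to test.jil and run as jil < test.jil
    print(insert_job_jil("test_job2", "dir", "localhost", status="ON_ICE"), end="")
```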

nohup command in submitting jobs to cluster

I am trying to submit a job to a cluster that may take up to a few days. Usually, for a shorter job, I simply run qsub Arun1_scr and then wait for the job to finish while monitoring its status with qstat. Arun1_scr is a basic script. If I want to be able to exit the shell, and maybe even turn off the computer, while the job runs on the cluster, is all I have to do nohup qsub Arun1_scr?
Thank you!
If you submit your job using qsub Arun1_scr, you can exit the shell and it will still continue to run on the cluster, so you do not need to change anything.
If you use the nohup command and want it to run in the background, the syntax is nohup command-name & (without the &, your job will not run in the background and will be stopped after you close the shell).
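For comparison, the nohup command-name & pattern can be sketched in Python with the standard subprocess module: the child is started through nohup so it ignores SIGHUP, its output goes to a log file, and the call returns immediately without waiting. This only illustrates the background-plus-nohup idea on a POSIX system with nohup installed; start_detached and the default nohup.out log name are assumptions, and for a qsub submission none of this is needed, as the answer says.

```python
import subprocess


def start_detached(cmd, log_path="nohup.out"):
    """Start `cmd` roughly like `nohup cmd > log 2>&1 &` and return immediately."""
    log = open(log_path, "ab")
    try:
        proc = subprocess.Popen(
            ["nohup", *cmd],          # nohup: the child ignores SIGHUP
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,   # also detach from the shell's session (POSIX only)
        )
    finally:
        log.close()  # the child keeps its own copy of the file descriptor
    return proc  # like `&`: the caller does not wait for the command to finish
```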

Unable to stop the cron job

I have a cron job in cronjob.txt as follows:
* * * * * nohup sh cronScheduleInit.sh >> cronlog.txt &
and installed it using the command:
crontab cronjob.txt
After testing, I deleted the cron job entry using the following command:
crontab -e
When I display the list of jobs using
crontab -l
it shows no entries, but the cron job is still running; that is, it is still writing entries to the log file. I even commented out the job entry in cronjob.txt.
I also tried deleting the crontab and listing the jobs again. It shows no cron jobs, but the log is still being written. The command I used:
crontab -r
What should I do? Please help!
You can find the process using the ps aux command, so check:
ps aux|grep crontab #or
ps aux|grep cronjob
Then you will get something like:
user 29587 2.0 1.1 748804 88968 pts/31 Sl+ Mar04 19:55 grunt
This example result is for the grunt service; you have to search for crontab or cronjob.
Then kill the process using its process ID.
Here:
sudo kill -9 29587
Format:
sudo kill -9 <process_id>
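The ps aux | grep and kill -9 steps from this answer can be sketched in Python as follows. The sketch assumes the usual 11-column ps aux layout (USER, PID, ..., COMMAND) and a search pattern you choose yourself; in this question a likely candidate is the script name from the crontab line, such as cronScheduleInit.sh, but that is an assumption. The function names are hypothetical, kill_matching defaults to a dry run, and it sends SIGKILL to mirror kill -9.

```python
import os
import signal
import subprocess


def matching_pids(ps_output, pattern):
    """Return PIDs from `ps aux` output whose COMMAND column contains `pattern`."""
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the USER PID %CPU ... header
        fields = line.split(None, 10)        # COMMAND (11th column) may contain spaces
        if len(fields) == 11 and pattern in fields[10] and "grep" not in fields[10]:
            pids.append(int(fields[1]))
    return pids


def kill_matching(pattern, sig=signal.SIGKILL, dry_run=True):
    """Like `ps aux | grep pattern` followed by `kill -9 <pid>` for each match."""
    out = subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    ).stdout
    pids = [p for p in matching_pids(out, pattern) if p != os.getpid()]
    for pid in pids:
        if dry_run:
            print(f"would send signal {int(sig)} to {pid}")
        else:
            os.kill(pid, sig)  # may need the same privileges as `sudo kill`
    return pids
```

Running it once with dry_run=True shows which processes would be killed before anything is actually signalled.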
