How to make a dummy job run for 30 minutes in Control-M

I have a requirement to run dummy jobs for 30 minutes and 60 minutes respectively.
I have tried --delay 30 in command-line jobs, but I did not get the expected delay.

Designating a job as type ‘dummy’ will bypass anything contained in the command line field.
You have two options to create a 30/60-minute timer job.
Option a:
Make the job a command-line type job and put sleep 1800 or sleep 3600 in the command line field.
Option b:
Make the job a dummy type job and put sleep 1800 or sleep 3600 in either the pre-execution or post-execution field.
The sleep command takes its argument in seconds. On Windows you may want to look into the PowerShell equivalent, which would be powershell.exe -command Start-Sleep 1800.

Use _sleep instead of sleep
Another way to add a waiting time, either before or after an OS-type job, is to use the pre-execution or post-execution command options, as appropriate.
Using _sleep is more convenient because it is operating-system independent and is provided by the Control-M/Agent, which means that you do not require an extra deployment for that functionality.
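For example, a minimal sketch of the field contents, assuming _sleep takes its interval in seconds like the OS sleep command:
# Pre-execution (or Post-execution) command field of the timer job:
_sleep 1800    # 30-minute timer (use _sleep 3600 for the 60-minute job)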

Related

Is there a way to make a Denodo 8 VDP scheduler job WAIT() for a certain amount of time?

I want to make a VDP scheduler job in Denodo 8 wait for a certain amount of time. The wait function in the job creation process is not working as expected, so I figured I'd write it into the VQL. However, when I try the suggested function from the documentation (https://community.denodo.com/docs/html/browse/8.0/en/vdp/vql/stored_procedures/predefined_stored_procedures/wait), the Denodo 8 VQL shell doesn't recognize the function.
--Not working
SELECT WAIT('10000');
Returns the following error:
Function 'wait' with arity 1 not found
--Not working
WAIT('10000');
Returns the following error:
Error parsing command 'WAIT('10000')'
Any suggestions would be much appreciated.
There are two ways of invoking WAIT:
Option #1:
-- Wait for one minute
CALL WAIT(60000);
Option #2:
-- Wait for ten seconds
SELECT timeinmillis
FROM WAIT()
WHERE timeinmillis = 10000;

How to make zsh's REPORTTIME work? (time for long-running commands)

This is my .zshrc:
export REPORTTIME=3
When I run sleep 4 it doesn't output anything.
If I change it to REPORTTIME=blablabla (or anything nonsensical) it doesn't raise an error and starts behaving as REPORTTIME=0, i.e. reporting the time taken for everything.
Interestingly, if I try REPORTTIME=3s I get the following message:
zsh: bad math expression: operator expected at `s'
sleep 4 0.00s user 0.00s system 0% cpu 4.004 total
So I get the error and still the output.
I tried REPORTTIME="3" and even REPORTTIME=1+2. None of these work.
Also, if I run python -c "import time; time.sleep(4)" I get the same results (so the problem is not with sleep).
Of course, I tried other values too (other than 3).
I'm running MacOS with iterm2 and zsh is my default shell.
You need to set it explicitly to a non-negative integer representing seconds, i.e.
% REPORTTIME=3
Setting it to non-integer values does not work on my Zsh v5.4.2 either. Running something like a system update (e.g., yaourt) then acts as if I had put time on the front of it. Pretty slick!
So you need a command that eats some user/system CPU time; sleep does not. Although its total elapsed time is long enough, its user and system time is not:
% time sleep 3
sleep 3 0.00s user 0.00s system 0% cpu 3.002 total
Also, no need to export this since it's directly used by Zsh.
You can undo/turn off this behavior with:
% unset REPORTTIME
Docs on REPORTTIME from man zshparam:
If nonnegative, commands whose combined user and system execution times (measured in seconds) are greater than this value have timing statistics printed for them. Output is suppressed for commands executed within the line editor, including completion; commands explicitly marked with the time keyword still cause the summary to be printed in this case.
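A quick way to see the distinction in practice; the Python one-liner is just an arbitrary CPU-bound command, and exact timings will vary:
% REPORTTIME=3
% sleep 10                                       # long elapsed time, ~0 CPU time: no report
% python -c 'sum(i * i for i in range(10**8))'   # several seconds of CPU time: report is printed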

Using MPI on Slurm, is there a way to send messages between separate jobs?

I'm new to using MPI (mpi4py) and Slurm. I need to run about 50000 tasks, so to obey the administrator-set limit of about 1000, I've been running them like this:
sbrunner.sh:
#!/bin/bash
for i in {1..50}
do
sbatch m2slurm.sh $i
sleep 0.1
done
m2slurm.sh:
#!/bin/bash
#SBATCH --job-name=mpi
#SBATCH --output=mpi_50000.out
#SBATCH --time=0:10:00
#SBATCH --ntasks=1000
srun --mpi=pmi2 --output=mpi_50k${1}.out python par.py data_50000.pkl ${1} > ${1}py.out 2> ${1}.err
par.py (irrelevant stuff omitted):
offset = (int(sys.argv[2])-1)*1000
comm = MPI.COMM_WORLD
k = comm.Get_rank()
d = data[k+offset]
# ... do something with d ...
allresults = comm.gather(result, root=0)
comm.Barrier()
if k == 0:
    print(allresults)
Is this a sensible way to get around the limit of 1000 tasks?
Is there a better way to consolidate results? I now have 50 files I have to concatenate manually. Is there some comm_world that can exist between different jobs?
I think you need to make your application divide the work among 1000 tasks (MPI ranks) and consolidate the results after that with MPI collective calls, i.e. MPI_Reduce or MPI_Allreduce.
Trying to work around the limit won't help you, as the jobs you started will just be queued one after another.
Job arrays will give behavior similar to what you did in the batch file you provided, so your application must still be able to process all data items given only N tasks (MPI ranks).
There is no need to poll to make sure all other jobs are finished; take a look at the Slurm job dependency parameter:
https://hpc.nih.gov/docs/job_dependencies.html
Edit:
You can use a job dependency to create a new job that runs after all other jobs finish; this job collects the results and merges them into one big file. I still believe you are overthinking it; the obvious solution is to make rank 0 (the master) collect all results and save them to disk.
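For illustration, a rough sketch of that approach in the submit script; merge_results.sh is a hypothetical post-processing job, and sbatch --parsable makes sbatch print only the job ID:
#!/bin/bash
# collect the job IDs of the compute jobs as they are submitted
jids=""
for i in {1..50}
do
    jid=$(sbatch --parsable m2slurm.sh "$i")
    jids="$jids:$jid"
    sleep 0.1
done
# the merge job starts only after every compute job has finished successfully
sbatch --dependency=afterok$jids merge_results.sh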
This looks like a perfect candidate for job arrays. Each job in an array is identical with the exception of a $SLURM_ARRAY_TASK_ID environment variable. You can use this in the same way that you're using the command line variable.
(You'll need to check that MaxArraySize is set high enough by your sysadmin. Check the output of scontrol show config | grep MaxArraySize )
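For example, a rough sketch of m2slurm.sh rewritten as a job array, keeping the original resource requests; %a in the output file name expands to the array task ID:
#!/bin/bash
#SBATCH --job-name=mpi
#SBATCH --array=1-50
#SBATCH --ntasks=1000
#SBATCH --time=0:10:00
#SBATCH --output=mpi_50k%a.out
# SLURM_ARRAY_TASK_ID replaces the $1 that sbrunner.sh used to pass in
srun --mpi=pmi2 python par.py data_50000.pkl ${SLURM_ARRAY_TASK_ID}
Submitted once with sbatch m2slurm.sh, this replaces the 50-iteration loop in sbrunner.sh.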
What do you mean by 50000 tasks?
1) Do you mean one MPI job with 50000 MPI tasks?
2) Or do you mean 50000 independent serial programs?
3) Or do you mean any combination where (number of MPI jobs) * (number of tasks per job) = 50000?
If 1), well, consult your system administrator. Of course you can allocate 50000 slots in multiple Slurm jobs, manually wait until they are all running at the same time, and then mpirun your app outside of Slurm, but this is both ugly and inefficient, and you might get in trouble if it is seen as an attempt to circumvent the system limits.
If 2) or 3), then a job array is a good candidate. And if I understand your app correctly, you will need an extra post-processing step in order to concatenate/merge all your outputs into a single file.
If you go with 3), you will need to find the sweet spot: generally speaking, 50000 serial programs are more efficient than fifty 1000-task MPI jobs or one 50000-task MPI program, but merging 50000 files is less efficient than merging 50 files (or not merging anything at all).
If you cannot do the post-processing on a frontend node, then you can use a job dependency to start it after all the computations have completed.

MRJob - Limit Number of Task Attemps

In MRJob, how do you limit the number of task attempts if a task fails?
I have long-running tasks (and have increased the timeout accordingly), but I want the job to end after 2 failed attempts at the same task, rather than 4-5.
I couldn't find anything like this in the docs:
http://mrjob.readthedocs.org/en/latest/guides/configs-reference.html
For map jobs, you can set mapreduce.map.maxattempts in Hadoop 2. For reduce jobs, set mapreduce.reduce.maxattempts (source).
The equivalents in Hadoop 1 are: mapred.map.max.attempts and mapred.reduce.max.attempts.
If you are using a conf file in MRJob, you can set this as:
runners:
  emr:
    jobconf:
      mapreduce.map.maxattempts: 2
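If you'd rather not maintain a conf file, the same Hadoop property can also be passed per run with mrjob's --jobconf command-line switch; the script and input names here are placeholders:
python my_mr_job.py -r emr --jobconf mapreduce.map.maxattempts=2 input.txt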

Bind query resolution time in munin

Is it possible to graph the query resolution time of bind9 in munin?
I know there is a way to graph it for an Unbound server; is it already done for BIND? If not, how do I start writing a munin plugin for that? I'm getting stats from http://127.0.0.1:8053/ on the bind9 server.
I don't believe that "query time" is a function of BIND. About the only time that I see that value (with individual lookups) is when using dig. If you're willing to use that, the following might be a good starting point:
#!/bin/sh
# Munin plugin: report the query time dig measures for a fixed lookup
case $1 in
    config)
        # "config" mode: print the graph definition for Munin
        cat <<'EOM'
graph_title Red Hat Query Time
graph_vlabel time
time.label msec
EOM
        exit 0;;
esac
# normal mode: extract the number from dig's "Query time: NN msec" line
echo -n "time.value "
dig www.redhat.com|grep Query|cut -d':' -f2|cut -d\  -f2
Note that there are two spaces after the "-d\" in the second cut statement. If you save the above as "querytime" and run it at the command line, the output should look something like:
root@pi1:~# ./querytime
time.value 189
root@pi1:~# ./querytime config
graph_title Red Hat Query Time
graph_vlabel time
time.label msec
I'm not sure of the value in tracking this, though. The response time can be affected by whether the query is an initial lookup or the answer is already cached locally, by server load, by intervening network congestion, etc.
Note: the above may be a bit buggy as I've written it on the fly, but it should give you a good starting point. That it returned the above output is a good sign.
In any case, I recommend reading the following before you write your own: http://munin-monitoring.org/wiki/HowToWritePlugins
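Once the script behaves as above, the usual Munin deployment is roughly the following; paths and the service command vary by distribution, so adjust as needed:
# install the plugin and activate it for this node
cp querytime /usr/share/munin/plugins/querytime
chmod +x /usr/share/munin/plugins/querytime
ln -s /usr/share/munin/plugins/querytime /etc/munin/plugins/querytime
# restart the node agent so it picks up the new plugin
service munin-node restart
# sanity check: run the plugin through Munin's wrapper
munin-run querytime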
