How to reschedule a coordinator job in OOZIE without restarting the job? - oozie

When i changed the start time of a coordinator job in job.properties in oozie, the job is not taking the changed time, instead its running in the old scheduled time.
Old job.properties:
startMinute=08
startTime=${startDate}T${startHour}:${startMinute}Z
New job.properties:
startMinute=07
startTime=${startDate}T${startHour}:${startMinute}Z
The job is not running at the changed time:07th minute,its running at 08th minute in every hour.
Please can you let me know the solution, how i can make the job pickup the updated properties(changed timing) without restarting or killing the job.

You can't really change the timing of the co-ordinator via any methods given by Oozie(v3.3.2) . When you submit a job the contents properties are stored in the database whereas the actual workflow is in the HDFS.
Everytime you execute the co-ordinator it is necessary to have the workflow in the path specified in properties during job submission but the properties file is not needed. What I mean to imply is the properties file does not come into the picture after submitting the job.
One hack is to update the time directly in the database using SQL query.But I am not sure about the implications of it.The property might become inconsistent across the database.
You have to kill the job and resubmit a new one.
Note: oozie provides a way to change the concurrency,endtime and pausetime as specified in the official docs.

Related

OOZIE stuck in RUNING status

I use OOZIE to run a workflow. But a simple official example shell-wf (echo hello oozie) stuck in RUNNING state and never end. The workflow can be submitted but stuck at RUNNING state. There is not any error in job log in OOZIE UI.
When submitting a shell with spark-submit inside, the job will be never submitted and can not be seen in Spark UI. I suspect the shell didn't run at all.
What's the possible problem?
A Quick Checklist
For those who have the same problem, there is a checklist to check your system. Hope it helps!
Check jobTracker in your Oozie configuration. Note: If a job has been successfully run, it probably not the problem of jobTracker. Related discussion can be found here
Check your disk usage. If ## Heading ##disk usage is greater than 90%, remove some files to make sure disk usage is less than 90%. (That's my case!)
Check Console URL of the stuck action. It can be found in Job - Job Info tab - Actions - Action - Action Info tab. Job state here may help you to find the problem.
Check Oozie log. It's typically in /usr/local/oozie/logs. Check oozie.log* to find if there are exceptions.
Details
Disk usage
If your action state is
YarnApplicationState: ACCEPTED: waiting for AM container to be allocated, launched and register with RM.
That may be the disk problem. Relative discussion can be found in MapReduce job hangs, waiting for AM container to be allocated. Solutions can be found in Why does Hadoop report "Unhealthy Node local-dirs and log-dirs are bad"?.

Symfony - background task from form setup

Would you know how to run a background task on Symfony 4, based on the setup of a form ? This would avoid that the user has to remain on the form until the task is finished.
The idea would be that when the form is validated, it starts an independant background task. Then the user can continue its navigation and come back once the task is finished to get the results.
Thanks for your help,
You need to use pattern Message Bus. Symfony has own implementation of this pattern since version 4.1 introducing Messenger Component.
You can see documentation here: https://symfony.com/doc/current/components/messenger.html
To get it work you need some external program that will implement AMQP protocol. Most popular in PHP world IMHO RabbitMQ.
A very simple solution for this could be the following procedure:
Form is valid.
A temporary file is created.
Cronjob gets executed every five minutes and starts a symfony command.
The command checks if the file exists and if it's empty.
If so, the command works of the background task. But before this, the command write it's process id in the file to prevent from beeing excuted a second time.
Remove the file when the command has finished.
As long as the file exists you can show a hint for the user that the task is running.

Autosys Email Generation if the job runs more than the time specified

Currenty I have requirement in my enviroment for the autosys email notifition.
Requirement: If the job runs more than the specified time it should trigger an email.
What I am trying is using max_run_alam, but I am not successful.
Lets say i have a job that runs for 10mins(lets say the time as 10.00). i set max_run_alarm as 3. i should get an email at 10.03 where i can goahead and see why the job is running more than the max_run_alarm. if i use max_run_alarm i am able to see in the logs triggering that alarm, but I cannot spend all day monitoring the logs to see which job is taking long as i have many jobs. my question is am i using max_run_alarm in the correct way or is there something else i am missing or is there entirely different way for the emails to generate.
pls advise.
We are using autosys R11 at work. I believe the triggering of emails is already automated in higher versions of autosys, but, in our version, to send automatic emails after a certain time, we create two extra autosys jobs. One autosys job starts at the same time as the job you want to "monitor". This job contains a 'sleep' command. (in your example, the command would be "sleep 180" for the job to run for 3 minutes until completion). The second extra job is the sending of the email and only starts after successful completion of the sleep-job.
To prevent the mail from being send every time the autosys box starts, you have to add your first job as BOX_SUCCESS condition. The sleep-job will run to completion, but the mail-job went from the "ACTIVATED" state to the "INACTIVE" state because the autosys box isn't RUNNING anymore.

Any code can trigger a batching override action?

I am working on a project which will batching some 834 records in file.
I setup the batching trigger as when the record count reaches a number, a batch file will release. But I also want release a batch even the record count is not reached (for example, every night, release all queueing record as a final file).
I know it can be done by click the override button in Batch Configuration window, but it need be done automatically.
So, basically, my question is, what did BizTalk do when I clicked the override button? Does BizTalk prove anyway to let me do that in a program?
I must say I did not try to send a controlmessage to a batch setting as release per record count, if you know this works, please let me know.
You're almost there and to complete the process isn't that difficult.
Leave the Batch configuration at the records count as it is.
Then, setup a process where an External Release trigger is sent at the appropriate time. A Windows Scheduled Task is a viable option, it can copy a file to a File Receive Location.
This article describes how to create the trigger message: http://msdn.microsoft.com/en-us/library/bb246108.aspx

Pagodabox or PHPfog + Gearman

All,
I'm looking for a good way to do some job backgrounding through either of these two services.
I see PHPFog supports IronWorks, but i need something more realtime. Through these cloud based PaaS services, I'm not able to use popen(background.php --token=1234). So I'm thinking the best solution, might be to try to kick off a gearman worker to handle the job. (Actually my preferred method would be to use websockets to keep a connection open and receive feedback from the job, rather than long polling a db table through AJAX, but none of these guys support websockets)
Question 1 is, is there a better solution than using gearman to offload the job?
Question 2 is, http://help.pagodabox.com/customer/portal/articles/430779 I see pagodabox supports 'worker listeners' ... has anybody set this up with gearman? Would it work?
Thanks
I am using PagodaBox with a background worker in an application I am building right now. Basically, PagodaBox daemonizes a PHP process for you (meaning it will continually run in the background), so all you really have to do is create a script that checks a database table for tasks to run, runs them, and then sleeps a bit so it's not running too many queries against your database.
This is a simplified version of what I have running:
// Remove time limit
set_time_limit(0);
// Show ALL errors
error_reporting(-1);
// Run daemon
echo "--- Starting Daemon ---\n";
while(true) {
// Query 'work_queue' table for new tasks
// Loop over items and do whatever tasks are associated with them
// Update row to mark task as completed
// Wait a bit
sleep(30);
}
A benefit to this approach is that it's easy to test via CLI:
php tasks.php
You will see all the echo statements come through in console as it's running, and of course it's much easier to do than a more complicated setup with other dependencies like Gearman.
So whenever you add a new task to the table, the maximum amount of time you'll wait for that task to be started in a batch is 30 seconds (or whatever your sleep time is). This is better and preferable to cron jobs, because if you setup a cron job to run every minute (the lowest possible interval) and the work you have to do takes longer than a minute, another cron process will start working on the same queue and you could end up with quite a lot of duplicated task work that could cause a lot of issues that are hard to debug and troubleshoot. So if you instead have either only one background worker that runs all tasks, or multiple background workers that work on different task types, you will never run into this issue.

Resources