Oozie coordinator app: how to configure an action triggered by an external data source? - oozie

I would like to run a job every time an external data source is updated, for example when a government file such as http://www.ic.gc.ca/folder/filename.zip is updated. Is there a way of doing this?
Please provide some code examples with an external URL data source.
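The thread itself has no answer. As far as I know, Oozie coordinator datasets are declared over HDFS-style URIs rather than HTTP URLs, so one possible workaround (an assumption, not something stated in the thread) is a small poller outside Oozie that notices when the remote file changes and then lands the file, or a done-flag, on HDFS where a coordinator dataset can pick it up. The sketch below covers only the change detection, by comparing the HTTP ETag and Last-Modified headers between two polls. The function names are hypothetical.

```python
import urllib.request
from typing import Mapping, Optional


def fetch_validators(url: str) -> dict:
    """Send a HEAD request and keep only the headers useful for change detection."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return {
            name: resp.headers[name]
            for name in ("ETag", "Last-Modified")
            if resp.headers.get(name)
        }


def remote_file_changed(previous: Optional[Mapping[str, str]],
                        current: Mapping[str, str]) -> bool:
    """Return True if the validators suggest the remote file was updated."""
    if previous is None:
        return True  # first poll: nothing to compare against
    for name in ("ETag", "Last-Modified"):
        if name in previous and name in current:
            return previous[name] != current[name]
    # The server exposed no comparable validator; err on the side of re-fetching.
    return True
```

A scheduled poller could call `fetch_validators("http://www.ic.gc.ca/folder/filename.zip")`, compare the result with the values saved from the previous poll, and only download and publish the file when `remote_file_changed` returns True.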

Related

Control-M: SMART Folder - get information of inside job

I use a SMART Folder to get email notifications when jobs inside that folder change their status to failed.
When an email notification is sent, I need to get the name of the failing job inside the SMART Folder.
Is there a way to get information about failed jobs inside a SMART Folder via some variables?
I tried %%SCHEDTAB and %%JOBNAME, but these only relate to the SMART Folder itself and not to the failing job inside it.
On Do Action in SMART folder
Monitoring view example of SMART table with failed job
You can set the alert to be sent from the individual jobs instead. I often just have a standardised post-proc panel for the job that generates an alert using local variables, e.g.:
%%JOBNAME failed on %%NODEID with code = %%COMPSTAT.. This is a %%APPLGROUP job running on the %%APPLIC system.

ADF Pipeline Bulk copy activity to Log files

I have a bulk copy template set up in ADF to transfer data to Azure Blob Storage. This activity dynamically produces 'n' files.
I need to write a log file (txt format) after the pipeline activity has completed.
The log file should contain the pipeline start and completion datetimes, the number of files output, the status, etc.
What is the best way to do this, and which activity should I choose?
Firstly, I have to say that ADF won't automatically generate log files with the execution information. See "Visually monitor" and "Programmatically monitor" for activities in ADF.
From those pages you can get the start time of the pipeline: Run Start. Even though there is no Run End, you can calculate it yourself: Run End = Run Start + Duration.
As for the number of files, please refer to this link.
In any case, I think all these metrics need to be retrieved programmatically; you can choose whichever language you are most comfortable with.
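The Run End = Run Start + Duration calculation can be sketched as follows. This is a minimal illustration that assumes the run start and duration have already been retrieved from the monitoring data. The duration unit (milliseconds), the function names and the log line layout are all assumptions, not part of ADF.

```python
from datetime import datetime, timedelta, timezone


def run_end(run_start: datetime, duration_ms: int) -> datetime:
    """Compute Run End = Run Start + Duration (duration assumed to be in ms)."""
    return run_start + timedelta(milliseconds=duration_ms)


def format_log(pipeline: str, run_start: datetime, duration_ms: int,
               status: str, files_written: int) -> str:
    """Build one plain-text log line; the field layout is hypothetical."""
    end = run_end(run_start, duration_ms)
    return (
        f"pipeline={pipeline} "
        f"run_start={run_start.isoformat()} "
        f"run_end={end.isoformat()} "
        f"status={status} "
        f"files_written={files_written}"
    )


if __name__ == "__main__":
    start = datetime(2020, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    print(format_log("bulk_copy", start, 90_000, "Succeeded", 12))
```

The resulting line could then be appended to a txt file or uploaded to blob storage by whatever step runs after the pipeline.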

How to reschedule a coordinator job in OOZIE without restarting the job?

When I changed the start time of a coordinator job in job.properties in Oozie, the job did not pick up the changed time; instead it is still running at the old scheduled time.
Old job.properties:
startMinute=08
startTime=${startDate}T${startHour}:${startMinute}Z
New job.properties:
startMinute=07
startTime=${startDate}T${startHour}:${startMinute}Z
The job is not running at the changed time (the 07th minute); it is still running at the 08th minute of every hour.
Can you please let me know how I can make the job pick up the updated properties (the changed timing) without restarting or killing the job?
You can't really change the timing of the coordinator via any method provided by Oozie (v3.3.2). When you submit a job, the contents of the properties file are stored in the database, whereas the actual workflow is in HDFS.
Every time the coordinator executes, the workflow must be present at the path specified in the properties at job submission, but the properties file itself is not needed. In other words, the properties file no longer comes into the picture once the job has been submitted.
One hack is to update the time directly in the database using a SQL query, but I am not sure about the implications; the property might become inconsistent across the database.
Otherwise, you have to kill the job and submit a new one.
Note: Oozie provides a way to change the concurrency, endtime and pausetime, as specified in the official docs.
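For the three values that the note says can be changed on a running coordinator, the Oozie command line offers `oozie job -change <job-id> -value ...`, with the new values separated by semicolons. The sketch below only assembles that argument list. The helper name, server URL and job id are hypothetical, and, as the answer says, the start time is not among the changeable values.

```python
from typing import List, Optional


def build_change_command(oozie_url: str, job_id: str,
                         endtime: Optional[str] = None,
                         concurrency: Optional[int] = None,
                         pausetime: Optional[str] = None) -> List[str]:
    """Assemble the argv for `oozie job -change` (helper name is hypothetical)."""
    changes = []
    if endtime is not None:
        changes.append(f"endtime={endtime}")
    if concurrency is not None:
        changes.append(f"concurrency={concurrency}")
    if pausetime is not None:
        changes.append(f"pausetime={pausetime}")
    if not changes:
        raise ValueError("nothing to change")
    # Passed as a list (no shell), so the ';' separators need no escaping.
    return ["oozie", "job", "-oozie", oozie_url,
            "-change", job_id, "-value", ";".join(changes)]


if __name__ == "__main__":
    # Placeholder server URL and coordinator job id.
    print(build_change_command(
        "http://localhost:11000/oozie",
        "0000001-000000000000000-oozie-C",
        endtime="2030-01-01T00:00Z",
        concurrency=2,
    ))
```

The returned list can be handed to `subprocess.run` on a machine where the Oozie client is installed.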

Can any code trigger a batching override action?

I am working on a project which batches some 834 records into a file.
I set up the batching trigger so that a batch file is released when the record count reaches a given number. But I also want to release a batch even if the record count has not been reached (for example, every night, release all queued records as a final file).
I know it can be done by clicking the override button in the Batch Configuration window, but it needs to be done automatically.
So, basically, my question is: what does BizTalk do when I click the override button? Does BizTalk provide any way to let me do that programmatically?
I must say I have not tried sending a control message to a batch configured to release by record count; if you know this works, please let me know.
You're almost there, and completing the process isn't that difficult.
Leave the Batch configuration at the record count as it is.
Then set up a process where an External Release trigger is sent at the appropriate time. A Windows Scheduled Task is a viable option; it can copy a file to a File Receive Location.
This article describes how to create the trigger message: http://msdn.microsoft.com/en-us/library/bb246108.aspx

Automatically fetch data every 10 minutes (Simple html dom)

I'm working on a project where I want to fetch last-minute flights and then save them into my database. The problem is that I don't want to scrape every time a user visits the website and then save into my database, because that will only cause a lot of duplicates. Can I somehow make the website fetch the data for me at a scheduled time and then delete the previous records in the database?
If you want the OS to execute a task periodically, a cron job is what you want.
Either have the cron job call your program via the command line, or use wget to fetch the page that triggers the data fetching.
More on cron jobs:
http://www.thesitewizard.com/general/set-cron-job.shtml
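The "delete previous records, then save the fresh ones" step that the cron job would run can be sketched as below. The question uses PHP with Simple HTML DOM; Python and SQLite are swapped in here purely for illustration, and the table layout, function name and script path are assumptions. The scraping itself is left out.

```python
import sqlite3
from typing import Iterable, Tuple

# (destination, departure date, price): a hypothetical record layout.
Flight = Tuple[str, str, float]


def replace_flights(conn: sqlite3.Connection, flights: Iterable[Flight]) -> int:
    """Swap the stored flights for the freshly fetched ones in one transaction."""
    rows = list(flights)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS flights "
        "(destination TEXT, departure TEXT, price REAL)"
    )
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute("DELETE FROM flights")
        conn.executemany("INSERT INTO flights VALUES (?, ?, ?)", rows)
    return len(rows)


# Example crontab entry to run a script every 10 minutes (path is a placeholder):
# */10 * * * * /usr/bin/python3 /path/to/fetch_flights.py
```

Because the old rows are removed in the same transaction that inserts the new ones, repeated runs do not accumulate duplicates, and a failed run leaves the previous data in place.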
