How to trigger an oozie workflow job with Bamboo? - oozie

I am new to bamboo. I know in general how to trigger an oozie workflow job in CDH env. Could someone please suggest some good documentation which describes this?
In Bamboo I have just created a plan which does the code build pointing to my repository each time I check in. Now I need to know - how can I trigger a workflow job from bamboo?
I understand that this should be some kind of command which needs to trigger from bamboo to execute. Please, suggest

You could use oozie command line interface from script execution step.
Install oozie client on the machine with Bamboo (or on any other and ssh it). On CDH cluster machines it should be preinstalled.
Create bamboo script task (Details: Bamboo Script)
Run start job command in script task, e.g. oozie job -oozie http://localhost:8080/oozie -start 14-20090525161321-oozie-joe. See details on cli usage here

Related

How to watch tasks logs via CLI in Airflow?

So I am having a problem: there are no logs displaying in Airflow UI I am currently working with, I don't know the reason, but I've already informed my colleagues about it and they're looking for a solution.
Meanwhile I need to watch logs of certain tasks of my Dag. Is there any way to do it via airflow CLI?
I am using airflow tasks run command but it only seems to run tasks, but doesn't show a thing in command line.
By default Airflow should store your logs at $AIRFLOW_HOME/logs/ maybe you'll find them there, if they are still generated.
In the meantime you can use airflow tasks test <dag_id> <task_id> this tests a specific task and displays the logs in your terminal.

Using Airflow to Run .bat file or PowerShell program located in remote Windows Box

Currently some of the jobs are running in different Windows VM's.
for eg.,
Task Scheduler to run
Powershell files
.bat files
python files.
Sql Agent jobs
To run SSIS packages
We are planning to use Airflow to trigger all these jobs to have better visibility and manage dependencies.
Our Airflow in Ubuntu.
I would like know if there is any way to trigger above mentioned jobs in Windows via Airflow.
Can I get some examples on how to achieve my objectives? Please suggest what packages/libraries/plugins/operators I can use.
Yes there is. I would start by looking into the winrm operator and hook that you find in under Microsoft in providers:
http://airflow.apache.org/docs/apache-airflow-providers-microsoft-winrm/stable/index.html
and maybe also:
https://github.com/diyan/pywinrm

How to configure oozie shell action to run on all nodes

I have to make oozie shell action to run on all nodes for e.g creating parent directory for logs on local directory.
Thanks in advance!
It is not possible, as far as I know.
But you can try the below approaches initially proposed here:
MapReduce action can run on all nodes, but requires Java application. link
hadoop Streaming + MapReduce shell scripts. link; You can launch it as ssh or shell action in oozie

Oozie editor in HUE doesn't show workflows I submited by oozie command

I submited a workflow and run the job by oozie command:
oozie job -oozie http://node1:11000/oozie -config job.properties -submit
oozie job -oozie http://node1:11000/oozie -start [job_id]
it worked well.
And I wanted to edit the workflow by oozie editor in HUE but couldn't find it. What shall I do to make the workflow shown in oozie editor?
CDH version: 5.9
I would not recommend o use HUE for editing workflows. As it uses set of metadata stored in DB and owned by itself it will not allow you to introduce changes into workflow submitted from the outside.
Hue is pretty nice tool for prototyping and monitoring the workflow/coordinator progress, checking the logs and having a first view in case of any issue investigation. You can import your xml and allow editing via GUI - http://blog.cloudera.com/blog/2013/03/how-to-import-a-pre-existing-oozie-workflow-into-hue/ , but this is hard to manage and maintain in production like environments. For example running multiple instances of the workflow with various parameters.

Can oozie control jobs outside of Hadoop?

From documentation, it isn't very clear whether oozie can schedule and control jobs outside of Hadoop? Can someone shed some light on this? If not, is there any open source based workflow engine which can do that?
Try consider using chronos (from airbnb) advanced version of cron with a UI, built on top of mesos. airbnb.github.com/chronos/
Cheers.
I believe no. Because Oozie itself does not have a resource management policy, all it does is submitting jobs to Hadoop's job tracker at the right time. Besides, for each Oozie workflow, there will be one launcher job which is responsible for submitting the real jobs in the workflow to Hadoop. The launcher job is itself a Hadoop job. So, I think for the versions earlier than Oozie 3.2, the answer should be no.
You might consider trying azkaban by linked in. It was specifically built for hadoop. But unix commands can be specified in the job file of azkaban. So you may develop a workflow for any application(s) that can be run using command line.
I've been working on a new workflow engine called Soop. https://github.com/radixCSgeek/soop it is very lightweight and simple to setup and run using a cron-like syntax. It can run any Java POJO as well as running shell processes, so you can kick off a bash script or whatever.

Resources