How to configure an Oozie shell action to run on all nodes

I need to make an Oozie shell action run on all nodes, e.g. to create a parent directory for logs on each node's local filesystem.
Thanks in advance!

It is not possible, as far as I know.
But you can try the approaches below, originally proposed here:
A MapReduce action can run on all nodes, but it requires a Java application. link
Hadoop Streaming + MapReduce with shell scripts. link You can launch it as an ssh or shell action in Oozie; see the sketch after this list.
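A best-effort sketch of the streaming approach, assuming the standard hadoop CLI is on the path; the streaming-jar location, directory names, and mapper count are all assumptions. Each mapper creates the local log directory on whichever node it lands on. Note that YARN does not guarantee one container per node, so oversubscribe the mapper count and treat coverage as best-effort; the same submission could be wrapped in an Oozie shell action.

```python
import subprocess

# Tiny mapper script: drain stdin (streaming feeds each mapper an input
# split), then create the local directory as a side effect.
with open("mkdir_logs.sh", "w") as f:
    f.write("#!/bin/sh\ncat > /dev/null\nmkdir -p /var/log/myapp\n")

subprocess.run(
    [
        "hadoop", "jar",
        "/usr/lib/hadoop-mapreduce/hadoop-streaming.jar",  # assumed jar path
        "-D", "mapreduce.job.maps=20",   # more mappers than nodes, best-effort
        "-input", "/tmp/mkdir_input",    # any small existing HDFS file
        "-output", "/tmp/mkdir_output",  # must not exist yet
        "-mapper", "sh mkdir_logs.sh",
        "-file", "mkdir_logs.sh",        # ship the script to every task
    ],
    check=True,
)
```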

Related

Using Airflow to run a .bat file or PowerShell program located on a remote Windows box

Currently some of our jobs run on different Windows VMs, for example:
Task Scheduler to run PowerShell files, .bat files, and Python files
SQL Agent jobs to run SSIS packages
We are planning to use Airflow to trigger all these jobs so we have better visibility and can manage dependencies.
Our Airflow instance runs on Ubuntu.
I would like to know if there is any way to trigger the above-mentioned jobs on Windows via Airflow.
Can I get some examples of how to achieve my objective? Please suggest what packages/libraries/plugins/operators I can use.
Yes, there is. I would start by looking into the WinRM operator and hook that you can find under Microsoft in the Airflow providers:
http://airflow.apache.org/docs/apache-airflow-providers-microsoft-winrm/stable/index.html
and maybe also:
https://github.com/diyan/pywinrm
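A minimal sketch of that operator, assuming an Airflow connection named winrm_windows_vm that you define yourself and a hypothetical script path on the Windows VM; check the provider docs above for the exact parameters supported by your version:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.microsoft.winrm.hooks.winrm import WinRMHook
from airflow.providers.microsoft.winrm.operators.winrm import WinRMOperator

with DAG(
    dag_id="trigger_windows_jobs",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,  # trigger manually, or set a cron string
    catchup=False,
) as dag:
    # Host and credentials come from an Airflow connection you create
    # yourself ("winrm_windows_vm" is an assumed name).
    hook = WinRMHook(ssh_conn_id="winrm_windows_vm", transport="ntlm")

    run_powershell = WinRMOperator(
        task_id="run_powershell_script",
        winrm_hook=hook,
        # Hypothetical script path on the Windows VM.
        command='powershell.exe -File "C:\\jobs\\nightly_etl.ps1"',
    )
```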

Is it possible to run a unix script using oozie outside hadoop cluster?

We have written a Unix batch script, and it is hosted on a Unix server outside the Hadoop cluster. Is it possible to run that script via Oozie?
If so, how can this be achieved?
What is the script doing? If the script just needs to run regularly, you could equally use a cron job or something like that.
Besides this, Oozie has an action type for SSH actions on remote hosts:
https://oozie.apache.org/docs/3.2.0-incubating/DG_SshActionExtension.html
Maybe you can work something out with that by logging into the remote host, running the script, waiting for completion, and continuing from there.
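For reference, a minimal sketch of the SSH action element described in those docs, emitted here as a Python string only to keep this page's examples in one language; the host, user, and script path are assumptions:

```python
# Write this out as workflow.xml in the workflow app's HDFS directory.
workflow_xml = """\
<workflow-app name="remote-script-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="run-remote-script"/>
    <action name="run-remote-script">
        <ssh xmlns="uri:oozie:ssh-action:0.1">
            <host>batchuser@unix-server.example.com</host>
            <command>/home/batchuser/scripts/batch.sh</command>
            <capture-output/>
        </ssh>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>SSH action failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
"""
print(workflow_xml)
```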

How to trigger an oozie workflow job with Bamboo?

I am new to Bamboo. I know in general how to trigger an Oozie workflow job in a CDH environment. Could someone please suggest some good documentation which describes this?
In Bamboo I have just created a plan which builds the code from my repository each time I check in. Now I need to know how I can trigger a workflow job from Bamboo.
I understand that this should be some kind of command which Bamboo needs to execute. Please suggest.
You could use the Oozie command line interface from a script execution step:
Install the Oozie client on the machine running Bamboo (or on any other machine and ssh to it). On CDH cluster machines it should be preinstalled.
Create a Bamboo script task (details: Bamboo Script).
Run the start-job command in the script task, e.g. oozie job -oozie http://localhost:8080/oozie -start 14-20090525161321-oozie-joe. See details on CLI usage here; a sketch follows below.
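A minimal sketch of such a script task, wrapped in Python here for illustration (in Bamboo itself a plain shell one-liner is enough); the Oozie URL and the properties-file path are assumptions for your environment:

```python
import subprocess

# Submit and start the workflow in one step; alternatively, run
# "oozie job -oozie <url> -start <job-id>" against an already-submitted
# job, as in the example above.
subprocess.run(
    [
        "oozie", "job",
        "-oozie", "http://localhost:8080/oozie",     # Oozie server URL
        "-config", "/opt/workflows/job.properties",  # assumed properties file
        "-run",
    ],
    check=True,
)
```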

oozie reading and writing in hdfs as mapred user

I am running a Python script in an Oozie workflow. The script reads a file from HDFS, manipulates it, and writes it back to HDFS in a new folder. I am not getting any error while running the workflow, but the manipulated data is not written to HDFS. I do see that the new folder is owned by the mapred user by default, and I am not sure whether this is related to the mapred user. I am running the Oozie workflow as the hdfs user. When the Python script is run from a shell, it runs successfully and gives the expected result.
Any help would be appreciated.
Thanks!
Oozie would be running your script as the hdfs user.
Try logging in as hdfs and then browsing HDFS to view your output folder.
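To confirm this, a hypothetical debugging snippet for the top of the launched Python script: log which user the action actually runs as, and list who owns the output folder (the path below is an assumption):

```python
import getpass
import subprocess

# Log the effective user inside the Oozie launcher container.
print("Running as user:", getpass.getuser())

# List the output folder to see which user owns the files written by the job.
subprocess.run(["hdfs", "dfs", "-ls", "/user/hdfs/output"], check=False)
```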

Can oozie control jobs outside of Hadoop?

From the documentation it isn't very clear whether Oozie can schedule and control jobs outside of Hadoop. Can someone shed some light on this? If not, is there any open-source workflow engine which can do that?
Consider using Chronos (from Airbnb), an advanced version of cron with a UI, built on top of Mesos: airbnb.github.com/chronos/
Cheers.
I believe the answer is no. Oozie itself does not have a resource management policy; all it does is submit jobs to Hadoop's job tracker at the right time. Besides, for each Oozie workflow there is one launcher job which is responsible for submitting the real jobs in the workflow to Hadoop, and the launcher job is itself a Hadoop job. So I think for versions earlier than Oozie 3.2 the answer should be no.
You might consider trying Azkaban by LinkedIn. It was specifically built for Hadoop, but Unix commands can be specified in an Azkaban job file, so you can develop a workflow for any application that can be run from the command line.
I've been working on a new workflow engine called Soop: https://github.com/radixCSgeek/soop. It is very lightweight and simple to set up and run, using a cron-like syntax. It can run any Java POJO as well as shell processes, so you can kick off a bash script or whatever.
