I am attempting to create oozie workflows from templates that can be pushed into a workspace.
I have the process in place as far as getting the workflow into the right place in HDFS and if I call
oozie job -oozie http://hostname:11000/oozie -config config.xml -submit
then call the -start command on the oozie command line with the returned job ID, the job runs successfully.
Ideally, I'd like the workflow definition to now be available in the workflow manager. This can be done through Hue using the import button on the workflow manager dashboard (http://hostname:8888/oozie/list_workflows/) but I want to import the definition from a script.
Is this possible and if so how do I do it?
I was trying to run an R script on Oozie.
Though the R script was triggered, the Oozie job kept printing Heartbeat in the stdout file without proceeding any further into the R code.
What may have caused this, and how can I avoid it?
Edit :
The script was supposed to read data from HDFS and a series of R scripts were supposed to be operating on the data to generate a json output.
The script was triggered by the oozie workflow from command line.
This oozie job is further planned to be exposed as an API.
I have tried changing the scheduler to the fair scheduler as mentioned in this post, but it still didn't work.
I need an Oozie shell action to run on all nodes, e.g. to create a parent directory for logs on each node's local disk.
Thanks in advance!
It is not possible, as far as I know.
But you can try the approaches below, which were originally proposed here:
A MapReduce action can run on all nodes, but it requires a Java application.
Hadoop Streaming with MapReduce shell scripts. You can launch it from an ssh or shell action in Oozie.
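If the goal is only to create a directory on every node, a plain SSH loop started from one place (for example from a single shell action) may be enough. Note that this is a different technique from the two approaches listed above, not an implementation of them. It is a minimal sketch that assumes passwordless SSH to every host; the hosts file (one hostname per line), the function name and the paths are all hypothetical.

```shell
# Sketch: create a directory on every node listed in a hosts file.
# Assumes passwordless SSH; hosts file format and names are hypothetical.
make_dir_on_all_nodes() {
  local hosts_file="$1" dir="$2" host failed=0
  while IFS= read -r host; do
    # Skip blank lines in the hosts file
    [ -n "$host" ] || continue
    # -n keeps ssh from consuming the rest of the hosts file on stdin
    ssh -n "$host" "mkdir -p '$dir'" || { echo "failed on $host" >&2; failed=1; }
  done < "$hosts_file"
  # Non-zero if at least one host failed
  return "$failed"
}

# Usage: make_dir_on_all_nodes nodes.txt /var/log/myapp
```

The loop keeps going after a failure so that one unreachable node does not stop the others, and it reports the failure through the return code.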
After running an oozie workflow using the command line I am unable to edit it using the Hue Workflow editor or even find it in the list of workflows.
I have an R script that generates the workflow.xml and job.properties, and will run the commands necessary to run the scripts:
workflow.path <- "workflow.xml" # Hard coded for the sake of this example
system2("hadoop", args = c("fs", "-put", "-f", workflow.path, "/User/service/Test/")) # one argument per element
system("oozie job --oozie http://localhost:11000/oozie -config job.properties -run")
Moving the workflow into HDFS works fine, and I have verified with oozie that it is a valid workflow. Running the job also works like a charm. However, if I open Hue and navigate to the workflow, I cannot edit it, only rerun it.
Some background on what I am trying to do: we have a large number of automated workflows and we are always adding more. They all follow the same pattern, so automating the creation of the coordinators and workflows is simple. Sometimes these workflows have to be modified by people, and they need to be able to use the web interface.
Any help would be appreciated.
Indeed, only workflows created via the Drag&Drop Editor can be edited.
Workflows submitted via the CLI can only be visualized.
I submitted a workflow and ran the job with the oozie command:
oozie job -oozie http://node1:11000/oozie -config job.properties -submit
oozie job -oozie http://node1:11000/oozie -start [job_id]
It worked well.
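For reference, the two commands above can be chained in a script by capturing the job ID that -submit prints (a line of the form "job: <job_id>") and passing it to -start. This is only a sketch: the function name and the OOZIE_URL variable are illustrative and not part of the oozie CLI.

```shell
#!/usr/bin/env bash
# Sketch: submit a workflow, capture the returned job ID, then start it.
# OOZIE_URL and the function name are illustrative; adjust for your cluster.
OOZIE_URL="${OOZIE_URL:-http://node1:11000/oozie}"

submit_and_start() {
  local config="$1" output job_id
  # -submit prints a line of the form "job: <job_id>"
  output="$(oozie job -oozie "$OOZIE_URL" -config "$config" -submit)" || return 1
  job_id="$(printf '%s\n' "$output" | sed -n 's/^job: *//p' | head -n 1)"
  [ -n "$job_id" ] || { echo "could not parse job ID from: $output" >&2; return 1; }
  # Send any output of -start to stderr so stdout carries only the job ID
  oozie job -oozie "$OOZIE_URL" -start "$job_id" >&2 || return 1
  printf '%s\n' "$job_id"
}

# Usage: job_id="$(submit_and_start job.properties)"
```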
I then wanted to edit the workflow in the Oozie editor in Hue but couldn't find it. What should I do to make the workflow show up in the Oozie editor?
CDH version: 5.9
I would not recommend using Hue for editing workflows. Because Hue relies on its own metadata, stored in its own database, it will not let you change a workflow that was submitted from outside.
Hue is a pretty nice tool for prototyping, monitoring workflow/coordinator progress, checking the logs and getting a first look when investigating an issue. You can import your XML and allow editing via the GUI - http://blog.cloudera.com/blog/2013/03/how-to-import-a-pre-existing-oozie-workflow-into-hue/ , but this is hard to manage and maintain in production-like environments, for example when running multiple instances of the workflow with various parameters.
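To illustrate the "multiple instances with various parameters" case from the command line, the oozie CLI accepts -D to set or override a job property for a single run. The sketch below launches one run per value. The property name run_date is hypothetical (the workflow.xml would have to reference it as ${run_date}), and the function name and URL default are assumptions as well.

```shell
# Sketch: launch one workflow run per value, overriding a property with -D.
# "run_date" is a hypothetical property; the URL default is an assumption.
OOZIE_URL="${OOZIE_URL:-http://node1:11000/oozie}"

run_for_each() {
  local config="$1" value
  shift
  for value in "$@"; do
    # -run submits and starts in one step; -D overrides a job property
    oozie job -oozie "$OOZIE_URL" -config "$config" "-Drun_date=$value" -run || return 1
  done
}

# Usage: run_for_each job.properties 2017-01-01 2017-01-02
```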
I am new to bamboo. I know in general how to trigger an oozie workflow job in CDH env. Could someone please suggest some good documentation which describes this?
In Bamboo I have just created a plan which does the code build pointing to my repository each time I check in. Now I need to know - how can I trigger a workflow job from bamboo?
I understand that this should be some kind of command that Bamboo needs to execute. Please advise.
You could use the Oozie command-line interface from a script execution step.
Install the Oozie client on the machine running Bamboo (or on any other machine that you can ssh into). On CDH cluster machines it should be preinstalled.
Create a Bamboo script task (details: Bamboo Script).
Run the start-job command in the script task, e.g. oozie job -oozie http://localhost:8080/oozie -start 14-20090525161321-oozie-joe. See details on CLI usage here
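A script task usually also needs to fail the build when the workflow fails, so the steps above could be extended along these lines. This is a sketch under assumptions: the function names, the poll interval and the URL default are mine (Oozie's default port is 11000; the example command above uses 8080), and it relies on oozie job -info printing a "Status : ..." line for the job.

```shell
# Sketch of a Bamboo script-task body: wait for a workflow and fail the build
# unless it ends in SUCCEEDED. URL, poll interval and names are assumptions.
OOZIE_URL="${OOZIE_URL:-http://localhost:11000/oozie}"
POLL_SECONDS="${POLL_SECONDS:-30}"

job_status() {
  # -info prints a summary that includes a line such as "Status        : RUNNING"
  oozie job -oozie "$OOZIE_URL" -info "$1" |
    sed -n 's/^Status *: *\([A-Z_]*\).*/\1/p' | head -n 1
}

wait_for_job() {
  local job_id="$1" status
  while :; do
    status="$(job_status "$job_id")"
    case "$status" in
      SUCCEEDED) return 0 ;;
      KILLED|FAILED) echo "workflow $job_id ended as $status" >&2; return 1 ;;
      PREP|RUNNING|SUSPENDED) sleep "$POLL_SECONDS" ;;
      *) echo "unexpected status '$status' for $job_id" >&2; return 1 ;;
    esac
  done
}

# Usage in the script task:
#   job_id="$(oozie job -oozie "$OOZIE_URL" -config job.properties -run | sed -n 's/^job: *//p')"
#   wait_for_job "$job_id"
```

A non-zero exit code from the script is what makes Bamboo mark the task as failed, which is why wait_for_job returns 1 for anything other than SUCCEEDED.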