Oozie editor in HUE doesn't show workflows I submitted by oozie command - oozie

I submitted a workflow and ran the job with the oozie command:
oozie job -oozie http://node1:11000/oozie -config job.properties -submit
oozie job -oozie http://node1:11000/oozie -start [job_id]
It worked well.
I then wanted to edit the workflow in the Oozie editor in HUE but couldn't find it. What should I do to make the workflow show up in the Oozie editor?
CDH version: 5.9

I would not recommend using HUE for editing workflows. It relies on its own set of metadata stored in its database, so it will not let you change a workflow that was submitted from outside HUE.
Hue is a pretty nice tool for prototyping, for monitoring workflow/coordinator progress, for checking the logs, and for getting a first look when investigating an issue. You can import your XML and edit it via the GUI - http://blog.cloudera.com/blog/2013/03/how-to-import-a-pre-existing-oozie-workflow-into-hue/ , but this is hard to manage and maintain in production-like environments, for example when running multiple instances of the workflow with various parameters.
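As a rough illustration of the "multiple instances with various parameters" case, the sketch below runs the same workflow once per parameter value from the command line. It assumes the CLI's -D option for overriding a property from job.properties; the function name run_for_each and the property name inputDate are hypothetical, and the URL is the one from the question.

```shell
#!/bin/sh
# Sketch only: OOZIE_URL defaults to the server named in the question.
OOZIE_URL="${OOZIE_URL:-http://node1:11000/oozie}"

# Run one workflow instance per value, overriding a single property each time.
# $1 is the property name (hypothetical, e.g. inputDate); the rest are values.
run_for_each() {
    property="$1"
    shift
    for value in "$@"; do
        # -D overrides the property from job.properties for this run only.
        oozie job -oozie "$OOZIE_URL" -config job.properties \
            -D "$property=$value" -run || return 1
    done
}
```

For example, run_for_each inputDate 2017-01-01 2017-01-02 would submit and start two instances, each with its own value for the property.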

Related

Editing Oozie Workflow after submitting through Command Line

After running an oozie workflow using the command line I am unable to edit it using the Hue Workflow editor or even find it in the list of workflows.
I have an R script that generates the workflow.xml and job.properties, and will run the commands necessary to run the scripts:
workflow.path <- "workflow.xml" # Hard coded for the sake of this example
system2("hadoop", args = c("fs -put -f ", workflow.path, "/User/service/Test/" ))
system("oozie job --oozie http://localhost:11000/oozie -config job.properties -run")
Moving the workflow into HDFS works fine, and I have verified it is a valid workflow using oozie. Running the job also works like a charm, however if I open up Hue, and navigate to the Workflow and find it, I cannot edit it, only rerun it.
Some background on what I am trying to do: we have a large number of automated workflows and we are always adding more. They all follow the same pattern, so automating the creation of the coordinators and workflows is simple. Sometimes these workflows have to be modified by people, and they need to be able to use the web interface.
Any help would be appreciated.
Indeed, only workflows created via the Drag&Drop Editor can be edited.
Workflows submitted via the CLI can only be visualized.

Apache oozie under Hue

I need step-by-step documentation for setting up a workflow scheduler with Oozie under Hue, including the configuration or parameterization steps.
I have a school project, "Workflows for the Big Data", where I am asked to use Oozie for scheduling tasks in Hadoop, which I do not know at all. After searching on the internet (the Apache Oozie site, the Hue site and documentation) and in some books, I have not found a satisfactory result.
However, I defined some jobs in XML files, but whenever I try to run them Oozie systematically kills the job. I'm new to the forum and to big data.

How to trigger an oozie workflow job with Bamboo?

I am new to bamboo. I know in general how to trigger an oozie workflow job in CDH env. Could someone please suggest some good documentation which describes this?
In Bamboo I have just created a plan which does the code build pointing to my repository each time I check in. Now I need to know - how can I trigger a workflow job from bamboo?
I understand that this should be some kind of command that Bamboo triggers. Please suggest.
You could use the Oozie command-line interface from a script execution step:
1. Install the Oozie client on the machine running Bamboo (or on any other machine and ssh to it). On CDH cluster machines it should be preinstalled.
2. Create a Bamboo script task (Details: Bamboo Script).
3. Run the start-job command in the script task, e.g. oozie job -oozie http://localhost:8080/oozie -start 14-20090525161321-oozie-joe. See the Oozie CLI usage documentation for details.
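A build step usually also needs to know whether the job finished successfully. The sketch below polls the job with -info until it reaches a terminal state, so the script task can fail the build on KILLED or FAILED. It assumes the -info output contains a line of the form "Status : RUNNING"; the function names job_status and wait_for_job and the POLL_SECONDS variable are hypothetical.

```shell
#!/bin/sh
# Sketch only: the URL is the one used in the answer above.
OOZIE_URL="${OOZIE_URL:-http://localhost:8080/oozie}"
POLL_SECONDS="${POLL_SECONDS:-30}"

# Print the workflow status, assuming -info prints a "Status : <STATE>" line.
job_status() {
    oozie job -oozie "$OOZIE_URL" -info "$1" |
        awk -F' *: *' '$1 == "Status" { sub(/[ \t]+$/, "", $2); print $2; exit }'
}

# Block until the job ends; return 0 on SUCCEEDED, non-zero otherwise.
wait_for_job() {
    job_id="$1"
    while :; do
        status=$(job_status "$job_id")
        case "$status" in
            SUCCEEDED) return 0 ;;
            KILLED|FAILED)
                echo "job $job_id ended as $status" >&2
                return 1 ;;
            "")
                echo "could not read status for $job_id" >&2
                return 2 ;;
        esac
        # PREP, RUNNING and SUSPENDED fall through to another poll.
        sleep "$POLL_SECONDS"
    done
}
```

In the script task this could follow the start command, e.g. oozie job -oozie "$OOZIE_URL" -start "$JOB_ID" && wait_for_job "$JOB_ID", so that a non-zero exit code marks the Bamboo build as failed.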

Oozie import workflow from command line (Cloudera)

I am attempting to create oozie workflows from templates that can be pushed into a workspace.
I have the process in place as far as getting the workflow into the right place in HDFS and if I call
oozie job -oozie http://hostname:11000/oozie -config config.xml -submit
then call the -start command on the oozie command line with the returned job ID, the job runs successfully.
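The submit-then-start sequence described above can be scripted roughly as follows. This is a sketch that assumes the CLI prints the new ID on a line of the form "job: <id>" after -submit; the function names extract_job_id and submit_and_start are hypothetical.

```shell
#!/bin/sh
# Sketch only: the URL is the placeholder used in the question.
OOZIE_URL="${OOZIE_URL:-http://hostname:11000/oozie}"

# Read CLI output on stdin and print the ID from the first "job: <id>" line.
extract_job_id() {
    sed -n 's/^job: *//p' | head -n 1
}

# Submit the workflow described by the given config file, then start it.
# Prints the job ID on stdout so callers can keep track of the job.
submit_and_start() {
    config_file="$1"
    job_id=$(oozie job -oozie "$OOZIE_URL" -config "$config_file" -submit | extract_job_id)
    if [ -z "$job_id" ]; then
        echo "could not read a job ID from the submit output" >&2
        return 1
    fi
    # Send any output of -start to stderr so stdout carries only the ID.
    oozie job -oozie "$OOZIE_URL" -start "$job_id" >&2 || return 1
    echo "$job_id"
}
```

Calling submit_and_start config.xml would then run both steps and print the ID of the started job.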
Ideally, I'd like the workflow definition to now be available in the workflow manager. This can be done through Hue using the import button on the workflow manager dashboard (http://hostname:8888/oozie/list_workflows/) but I want to import the definition from a script.
Is this possible and if so how do I do it?

Can oozie control jobs outside of Hadoop?

From documentation, it isn't very clear whether oozie can schedule and control jobs outside of Hadoop? Can someone shed some light on this? If not, is there any open source based workflow engine which can do that?
Consider using Chronos (from Airbnb), an advanced version of cron with a UI, built on top of Mesos. airbnb.github.com/chronos/
Cheers.
I believe not. Oozie itself has no resource management policy; all it does is submit jobs to Hadoop's job tracker at the right time. Besides, for each Oozie workflow there is one launcher job that is responsible for submitting the real jobs in the workflow to Hadoop, and the launcher job is itself a Hadoop job. So I think that for versions earlier than Oozie 3.2, the answer should be no.
You might consider trying Azkaban by LinkedIn. It was built specifically for Hadoop, but Unix commands can be specified in an Azkaban job file, so you can develop a workflow for any application(s) that can be run from the command line.
I've been working on a new workflow engine called Soop: https://github.com/radixCSgeek/soop. It is very lightweight and simple to set up and run, using a cron-like syntax. It can run any Java POJO as well as shell processes, so you can kick off a bash script or whatever.
