Hue - Oozie Job failing - Unable to resolve parameters - oozie

I am using Hue to run my workflow, which uses parameters. I would like the workflow to pick up parameters from a job.properties file without prompting the user. I intend to regenerate this job.properties file with new parameter values before every run.
In my current setup, I have manually created a job.properties file in the same working directory as workflow.xml. I have not added the parameters to the Hive action, since doing so results in a prompt, but the Hive SQL uses the same parameters as specified in the job.properties file.
When I run the workflow, it fails because it is unable to resolve the parameters. I believe it is not picking up my job.properties file for some reason.
Any pointers would really help. I've been beating my head against this for almost two days now!

Are you using the Workflow Editor? At this time (Hue 3.7), job.properties is only picked up when submitting a workflow from the File Browser.
Properties need to be entered as 'Oozie parameters' in the Properties section of the workflow. Would just doing this solve your problem?
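For reference, a job.properties file of the kind the question describes might look like this (the host names, paths, and the hive_date parameter are illustrative, not from the original question):

```properties
nameNode=hdfs://namenode:8020
jobTracker=resourcemanager:8032
oozie.wf.application.path=${nameNode}/user/hue/oozie/workspaces/my_workflow
# parameter referenced as ${hive_date} in the Hive SQL
hive_date=2015-01-01
```

When using the Workflow Editor instead, the equivalent would be adding hive_date as an 'Oozie parameter' in the workflow's Properties section, as described above.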

Related

How to control the execution of tasks that depend on the download of a file

I have a DAG that runs every two minutes. The first task tries to download a file, and the subsequent tasks manipulate this downloaded file.
I'm using a control file that is set to True when the download has completed successfully, and my other scripts first check whether this flag is True in the control file.
I was just wondering if there is a better way to trigger my other scripts instead of running them all every two minutes.
Could you give more details about your problem?
If I understood your problem correctly, here are some suggestions:
Instead of using a control file, use XCom to pass parameters between tasks. This isn't a solution to your problem by itself, but don't use files to pass parameters, since you could end up with concurrency issues.
To verify the download, you could use a file sensor instead, and then define the dependencies as follows: download_task >> file_sensor >> script_to_exec_task. Don't forget to configure the sensor's timeout correctly depending on your constraints and needs.
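A minimal sketch of that dependency chain, assuming Airflow 2.x; the DAG id, file path, commands, and connection id are illustrative placeholders, not taken from the question:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.sensors.filesystem import FileSensor

with DAG(
    dag_id="download_and_process",      # illustrative name
    start_date=datetime(2024, 1, 1),
    schedule_interval="*/2 * * * *",    # every two minutes, as in the question
    catchup=False,
) as dag:
    # Task that attempts the download (placeholder command)
    download_task = BashOperator(
        task_id="download_task",
        bash_command="curl -o /tmp/data.csv https://example.com/data.csv",
    )

    # Waits until the file actually exists, replacing the control-file check
    file_sensor = FileSensor(
        task_id="file_sensor",
        filepath="/tmp/data.csv",
        fs_conn_id="fs_default",
        poke_interval=10,   # re-check every 10 seconds
        timeout=60,         # fail the task if the file never appears
    )

    # Downstream processing only runs once the sensor succeeds
    script_to_exec_task = BashOperator(
        task_id="script_to_exec_task",
        bash_command="python /opt/scripts/process.py /tmp/data.csv",
    )

    download_task >> file_sensor >> script_to_exec_task
```

With this layout, the downstream task is gated by the scheduler itself rather than by each script re-checking a shared file, which avoids the concurrency issue mentioned above.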

Heartbeat print on running R script via Oozie

I was trying to run an R script via Oozie.
Though the R script was triggered, the Oozie job kept printing Heartbeat in the stdout file without proceeding further into the R code.
What may have caused this? Also, how can it be avoided?
Edit :
The script was supposed to read data from HDFS, and a series of R scripts were then supposed to operate on the data to generate a JSON output.
The script was triggered by the Oozie workflow from the command line.
This Oozie job is further planned to be exposed as an API.
I have tried changing the scheduler to the fair scheduler as mentioned in this post, but it still didn't work.

Editing Oozie Workflow after submitting through Command Line

After running an Oozie workflow from the command line, I am unable to edit it using the Hue Workflow editor, or even find it in the list of workflows.
I have an R script that generates the workflow.xml and job.properties, and runs the commands necessary to launch the job:
workflow.path <- "workflow.xml" # Hard coded for the sake of this example
# Copy the workflow definition into HDFS, overwriting any existing copy
system2("hadoop", args = c("fs", "-put", "-f", workflow.path, "/User/service/Test/"))
# Submit the job to Oozie
system("oozie job --oozie http://localhost:11000/oozie -config job.properties -run")
Moving the workflow into HDFS works fine, and I have verified it is a valid workflow using Oozie. Running the job also works like a charm; however, if I open up Hue and navigate to the workflow, I cannot edit it, only rerun it.
Some background on what I am trying to do: we have a large number of automated workflows and we are always adding more. They all follow the same pattern, so automating the creation of the coordinators and workflows is simple. Sometimes these workflows have to be modified by people, and they need to be able to use the web interface.
Any help would be appreciated.
Indeed, only workflows created via the Drag&Drop Editor can be edited.
Workflows submitted via the CLI can only be visualized.

Oozie Error: E1310 : E1310: Bundle Job submission Error: [null]

I created an Oozie bundle consisting of several coordinators and their respective workflows. The bundle ran fine previously, but after adding a new workflow it stopped working completely.
For simplification and debugging, I stripped the bundle down to the absolute minimum: one coordinator starting one workflow.
The XMLs seem to be valid (validated with Oozie), and the coordinator and workflow work fine on their own (with fitting properties).
The problem is that I do not get any meaningful errors on -dryrun or run.
Dryrun produces the error Error: E1310 : E1310: Bundle Job submission Error: [null], which does not lead me anywhere.
Just running the job results in the bundle being submitted and marked as "FAILED" with no coordinator started. Therefore I do not get any error reports on the coordinator to work with.
After playing around with the coordinator and workflow and the propagation of variables from the bundle.properties file to the coordinator and the workflow, I found a couple of important things that solved my problem in the end:
-dryrun on a bundle does not seem to work as intended. The above error persisted even after the bundle was fixed and ran fine in Oozie. I could not find anything documenting that dryrun is unsupported on bundles, but the [null] suggests that dryrun cannot handle them.
HDFS paths have to include port numbers to work correctly. I had several paths in the format hdfs://nodename/hdfs/dir/... that did not seem to be propagated correctly; with the port number added, in the format hdfs://nodename:8020/hdfs/dir/..., they worked fine.
I had missed a couple of variables in the bundle.xml that were used in the coordinator.xml. Oozie did not report this at all; instead, the coordinator was simply never started. The bundle is just listed by -info with the status "running" and no scheduled coordinators, which is pretty hard to debug given the missing feedback from Oozie. Make sure to test your coordinator with a properties file, and use that "working" properties file as a template to check bundle.properties and bundle.xml for any missed variables.
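To illustrate the last point, a minimal sketch of passing a variable from a bundle down to its coordinator might look like this (the app name, path, and property name are illustrative):

```xml
<bundle-app name="my_bundle" xmlns="uri:oozie:bundle:0.2">
  <coordinator name="my_coord">
    <app-path>hdfs://nodename:8020/apps/my_coord</app-path>
    <configuration>
      <!-- every variable referenced in coordinator.xml must be supplied here
           (or in the bundle.properties); a missing one silently prevents the
           coordinator from starting -->
      <property>
        <name>workflowPath</name>
        <value>hdfs://nodename:8020/apps/my_workflow</value>
      </property>
    </configuration>
  </coordinator>
</bundle-app>
```

Note the explicit port number in the hdfs:// paths, matching the second point above.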

Does Flyway have something like DbMaintain's "markDatabaseAsUpToDate" (Maven) task?

Our common workflow when creating a new SQL migration script is
to write and execute every single statement in the developer's local database schema. When finished, it's checked into the source control system.
The problem is: in the database schema of the creating developer, the script is already "executed". For scripts that are not re-entrant, it would be convenient to have something like DbMaintain's Maven task "markDatabaseAsUpToDate".
Does Flyway have something equivalent?
P.S.: Our current workflow (as a workaround) is as follows:
run "mvn flyway:migrate" with the new migration as an empty file (without content, so it never fails).
Put the SQL statements in, save, and "migrate" again.
Run "mvn flyway:repair".
Thanks
While the workflow you describe sounds like it can do the job, you can achieve the same in a simpler and fully automated way: set cleanOnValidationError to true (only for dev!) and every time a script changes, the DB gets recreated.
More info: http://flywaydb.org/documentation/maven/migrate.html#cleanOnValidationError
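In the Maven setup the question describes, that flag would go into the flyway-maven-plugin configuration, for example (a sketch; the connection settings are omitted and would come from your existing configuration):

```xml
<plugin>
  <groupId>org.flywaydb</groupId>
  <artifactId>flyway-maven-plugin</artifactId>
  <configuration>
    <!-- DEV ONLY: drops and recreates the schema whenever a checked-in
         migration's checksum no longer matches what was applied -->
    <cleanOnValidationError>true</cleanOnValidationError>
  </configuration>
</plugin>
```

Since this wipes the schema on any checksum mismatch, it should never be enabled against a shared or production database.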
