Can Oozie HDFS action use file patterns or glob? - oozie

Can I use wildcards (e.g. *) or file patterns (e.g. {}) in Oozie move actions?
I am trying to move the results of my job into an archive directory.
State of the directory structure:
output
- 201304
- 201305
archive
- 201303
My action:
<fs name="archive-files">
<move source="hdfs://namenode/output/{201304,201305}"
target="hdfs://namenode/archive" />
<ok to="next"/>
<error to="fail"/>
</fs>
resulting error:
FS006: move, source path [hdfs://namenode/output/{201304,201305}] does not exist
Is there an easy way to move more than one file with a glob or bash-like syntax? I am looking to do something similar to this hadoop command:
hadoop fs -mv hdfs://namenode/output/{201304,201305} hdfs://namenode/archive
Am I missing something? The hadoop fs command accepts glob. Does Oozie?

The Oozie HDFS (fs) action has quite limited functionality, which is fully described in the functional specification. For anything more complicated you can use the Shell action, which lets you run arbitrary shell commands as part of a workflow, e.g. hadoop fs in your case.
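A rough sketch of such a shell action (the shell-action schema version, transition names, and paths here are illustrative, and the hadoop client must be available on the node that runs the action; hadoop fs expands glob patterns such as {201304,201305} itself, so the pattern is passed as a single argument):

<action name="archive-files">
    <shell xmlns="uri:oozie:shell-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- run the hadoop CLI and let it expand the glob -->
        <exec>hadoop</exec>
        <argument>fs</argument>
        <argument>-mv</argument>
        <argument>hdfs://namenode/output/{201304,201305}</argument>
        <argument>hdfs://namenode/archive</argument>
    </shell>
    <ok to="next"/>
    <error to="fail"/>
</action>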

No, in my experience it doesn't work:
FS006: move, source path [hdfs://nodename:8020/projects/blah/201*.gz] does not exist

In workflow.xml use this:
<action name="Movefiles">
<fs>
<move source='${SourcePath}' target='${DestinationPath}'/>
</fs>
<ok to="end"/>
<error to="fail"/>
</action>
and in job.properties write:
SourcePath=output/*/
DestinationPath=archive

Related

Is it possible to run an Oozie Spark Action without specifying inputDir & outputDir

According to https://oozie.apache.org/docs/3.3.1/WorkflowFunctionalSpec.html#a4.1_Workflow_Job_Properties_or_Parameters we know:
When submitting a workflow job for the workflow definition above, 3 workflow job properties must be specified:
jobTracker:
inputDir:
outputDir:
I have a PySpark script that specifies its input and output locations in the script itself. I neither need nor want an inputDir and outputDir in my workflow XML. When running my PySpark script via Oozie, I get these warning messages.
WARN ParameterVerifier:523 - SERVER[<my_server>] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] The application does not define formal parameters in its XML definition
WARN JobResourceUploader:64 - SERVER[<my_server>] Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2018-05-24 11:52:29,844 WARN JobResourceUploader:171 - SERVER[<my_server>] No job jar file set. User classes may not be found. See Job or Job#setJar(String).
Based on https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/util/ParameterVerifier.java , my first warning is caused by the fact that I don't have an "inputDir":
else {
// Log a warning when the <parameters> section is missing
XLog.getLog(ParameterVerifier.class).warn("The application does not define formal parameters in its XML "
+ "definition");
}
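For reference, the <parameters> block that this check looks for is declared at the top of the workflow definition (see the spec linked above). A minimal sketch, with an illustrative default value so the property no longer has to be supplied at submission time (outputDir would be analogous):

<workflow-app name="spark-wf" xmlns="uri:oozie:workflow:0.4">
    <parameters>
        <property>
            <name>inputDir</name>
            <!-- illustrative default; the PySpark script manages its own I/O -->
            <value>not-used</value>
        </property>
    </parameters>
    <!-- start node and actions follow -->
</workflow-app>

Per the quoted verifier code, this particular warning is only about the missing <parameters> element, not about the inputDir value itself.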
Can I get around this at all?
Update -- my XML structure
<action name="spark-node">
<spark xmlns="uri:oozie:spark-action:0.1" >
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<property>
<name>mapred.input.dir</name>
<value>${inputDir}</value>
</property>
<property>
<name>mapred.output.dir</name>
<value>${outputDir}</value>
</property>
</configuration>
<master>yarn-master</master>
<!-- <mode>client</mode> -->
<name>oozie_test</name>
<jar>oozie_test.py</jar>
<spark-opts>--num-executors 1 --executor-memory 10G --executor-cores 1 --driver-memory 1G</spark-opts>
</spark>
<ok to="end" />
<error to="fail" />
</action>

How to copy files from a Unix share to a Windows machine using ant?

I have some files on a Unix machine which I can access from my Windows PC with Windows Explorer using \\host\directory
However, when using an ant copy task, ant keeps on saying the directory doesn't exist...
So, the ant part is:
<if>
<available file="${unix-dbs-dir}" type="dir" />
<then>
<echo message="${unix-dbs-dir} exists"/>
</then>
<else>
<echo message="${unix-dbs-dir} doesn't exist"/>
</else>
</if>
<copy todir="${dbsDir}" verbose="true">
<fileset dir="${unix-dbs-dir}">
<include name="*.bd"/>
</fileset>
</copy>
The output of this is:
15:28:42 [echo] \\hyperion\dbs doesn't exist
15:28:42
15:28:42 BUILD FAILED
15:28:42 ... \\hyperion\dbs does not exist.
If I try the same with a remote Windows network path, it does work...
Any idea how to fix this? Seems strange that I can just access \\hyperion\dbs with my Windows Explorer, but ant apparently can't...
The Unix machine is CentOS 6.5, but I guess that doesn't matter.
Some extra information. I've created a small build.xml script to copy a file from our Unix machine to a Windows machine. If I execute the build.xml ant script from the command line (not started as administrator by the way), then the output is:
C:\Users\lievenc\TestCopyHyperion>%ANT_HOME%/bin/ant.bat -lib lib
Unable to locate tools.jar. Expected to find it in C:\Program Files\Java\jre1.8.0_45\lib\tools.jar
Buildfile: C:\Users\lievenc\TestCopyHyperion\build.xml
[echo] Load them from directory \\srv-linuxdev\pde\appl\samplenet\dbs
[echo] \\srv-linuxdev\pde\appl\samplenet\dbs exists
[copy] Copying 1 file to C:\Users\lievenc\TestCopyHyperion
[copy] Copying \\srv-linuxdev\pde\appl\samplenet\dbs\apif.d to C:\Users\lievenc\TestCopyHyperion\apif.d
When executing this build.xml script from Jenkins, I get following output:
[workspace] $ cmd.exe /C '"C:\Jenkins\tools\hudson.tasks.Ant_AntInstallation\1.9.4\bin\ant.bat -lib lib && exit %%ERRORLEVEL%%"'
Buildfile: C:\Jenkins\jobs\test-copying-from-hyperion\workspace\build.xml
[echo] Load them from directory \\srv-linuxdev\pde\appl\samplenet\dbs
[echo] \\srv-linuxdev\pde\appl\samplenet\dbs doesn't exist
Can't seem to figure out what the difference is. cmd.exe must be executed as some other user? I'm just guessing here, but from my command line in Windows, I'm executing ant as a Domain User. Maybe this is different from Jenkins?
Ant script:
<?xml version="1.0"?>
<project basedir="." xmlns:ac="antlib:net.sf.antcontrib">
<!-- antcontrib -->
<taskdef resource="net/sf/antcontrib/antcontrib.properties"/>
<echo message="Load them from directory \\srv-linuxdev\pde\appl\samplenet\dbs" />
<if>
<available file="\\srv-linuxdev\pde\appl\samplenet\dbs" type="dir" />
<then>
<echo message="\\srv-linuxdev\pde\appl\samplenet\dbs exists"/>
</then>
<else>
<echo message="\\srv-linuxdev\pde\appl\samplenet\dbs doesn't exist"/>
</else>
</if>
<copy todir="${basedir}" verbose="true">
<fileset dir="\\srv-linuxdev\pde\appl\samplenet\dbs">
<include name="apif.d"/>
</fileset>
</copy>
</project>
Can't seem to figure out what the difference is. cmd.exe must be executed as some other user?
100%. Not only is the user different, but so are the %PATH% and any credentials you may have cached. Additionally, your ant executable is different too: running from cmd you get whatever comes from %PATH%, while running through Jenkins uses one of Jenkins' configured installations. However, that wasn't the question here.
The Jenkins user depends on how you have it set up. If Jenkins runs as a Windows service, manage the user through the Windows Services dialog: change it from "Local System" to an account you are more familiar with, such as your own user.
Several things to check first:
Can you even ping the host through Jenkins? Configure an "Execute Batch Command" step, just type ping srv-linuxdev, execute it through Jenkins and see if that works.
Can you still copy the file if you omit the available check altogether?
How are the permissions set up on the Linux share? Is it 100% open? For which user? I don't see any credentials being passed in your case. Are the credentials cached in your user session? This all ties back to Jenkins running as a different user.
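One way to confirm the user and credential theory from inside the build itself is a small diagnostic target along these lines (a sketch; the host and share path are taken from the question):

<target name="diagnose-share">
    <!-- print the Windows account the build actually runs as -->
    <exec executable="cmd.exe" osfamily="winnt">
        <arg line="/c whoami"/>
    </exec>
    <!-- check whether the file server is reachable at all -->
    <exec executable="cmd.exe" osfamily="winnt" failonerror="false">
        <arg line="/c ping -n 1 srv-linuxdev"/>
    </exec>
    <!-- try to list the share directly; this fails when credentials are missing -->
    <exec executable="cmd.exe" osfamily="winnt" failonerror="false">
        <arg line="/c dir \\srv-linuxdev\pde\appl\samplenet\dbs"/>
    </exec>
</target>

Run it both from your own command line and from Jenkins and compare the whoami output.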

How to make a Hue Oozie workflow run a Java job which has a config file?

I have a buildModel.jar and a folder "conf" which contains a configuration file named config.properties.
The command line to run it looks like this:
hadoop jar /home/user1/buildModel.jar -t fp-purchased-products -i hdfs://Hadoop238:8020/user/user2/recommend_data/bought_together
After doing some analysis, it uses the DB information in the "config.properties" file to store data in MongoDB.
Now I need to run it with a Hue Oozie workflow, so I used Hue to upload the jar file and the "conf" folder to HDFS, then created a workflow. I also added the "config.properties" file to the workflow.
This is the workflow.xml
<workflow-app name="test_service" xmlns="uri:oozie:workflow:0.4">
<start to="run_java_file"/>
<action name="run_java_file">
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<main-class>xxx.xxx.recommender.buildModel.Application</main-class>
<arg>-t=fp-purchased-products</arg>
<arg>-i=hdfs://Hadoop238:8020/user/user2/recommend_data/bought_together</arg>
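<!-- ships config.properties into the action's working directory; the name after # is the local file name the job sees -->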
<file>/user/user2/service/build_model/conf/config.properties#config.properties</file>
</java>
<ok to="end"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
And this is the workflow-metadata.json
{"attributes": {"deployment_dir": "/user/hue/oozie/workspaces/_user2_-oozie-31-1416890719.12", "description": ""}, "nodes": {"run_java_file": {"attributes": {"jar_path": "/user/user2/service/build_model/buildModel.jar"}}}, "version": "0.0.1"}
After doing the analysis, it got an error when saving data to MongoDB. It seems that the Java code can't see config.properties.
Can anyone guide me on how to use Hue Oozie to run a Java job which has a config file?
Sorry for the late answer.
As Romain explained above, Hue will copy config.properties to the same directory as buildModel.jar. So I changed the code to make buildModel.jar read the config file from the same directory. It worked!

integrating grunt with ant

Are there any good tutorials for integrating grunt with ant? Our current build uses ant because we are a Java shop. However, the front-end is becoming a first-class citizen, and we are considering node and grunt for the front-end build. I need to integrate the front-end build with the ant build. I need to know how to normalize the exit codes for all my custom tasks as well as the built-in grunt tasks, and to limit the console output to these predefined codes when the grunt tasks are called by ant. Any help would be greatly appreciated.
You could use this macro:
<macrodef name="exec-node">
<attribute name="module" description="The name of the NodeJS module to execute"/>
<attribute name="failonerror" default="true" description="Fail if the exit code is not 0"/>
<element name="args" implicit="yes" description="Argument to pass to the exec task"/>
<sequential>
<exec executable="cmd.exe" failonerror="#{failonerror}" osfamily="winnt">
<arg line="/c #{module}" />
<args/>
<!-- Windows cmd output workaround: http://stackoverflow.com/a/10359327/227349 -->
<!-- Forces node's stderror and stdout to a temporary file -->
 _tempfile">
<arg line=" > _tempfile.out 2>&1"/>
<!-- If command exits with an error, then output the temporary file -->
<!-- to stdout delete the temporary file and finally exit with error level 1 -->
<!-- so that the apply task can catch the error if #failonerror="true" -->
<arg line=" || (type _tempfile.out & del _tempfile.out & exit /b 1)"/>
<!-- Otherwise, just type the temporary file and delete it-->
<arg line=" & type _tempfile.out & del _tempfile.out &"/>
</exec>
<exec executable="#{module}" failonerror="#{failonerror}" osfamily="unix">
<args/>
</exec>
</sequential>
</macrodef>
And you can call any command, for example:
<target name="jshint">
<exec-node module="grunt">
<arg value="jshint" />
</exec-node>
</target>
Works like a charm; it also ensures that stderr is printed, which is a common problem when calling grunt.
Grunt can call out to the command line, so you could easily create several tasks in grunt that do nothing but execute an ant task via the shell.
The grunt-shell library makes it especially easy to execute external commands from a grunt task: https://github.com/sindresorhus/grunt-shell
Since you're talking about custom exit codes, though, you'll probably end up having to write your own custom grunt task that executes a shell command and then looks at the response code (perhaps using the grunt.util.spawn helper): https://github.com/gruntjs/grunt/blob/master/docs/api_utils.md#gruntutilsspawn
My advice? My organization recently went through the same thing and it's best if possible to just make a clean break from ant and get rid of it entirely for your JavaScript-related projects.
Grunt has such a growing and useful library of plugins that I'd be surprised if you couldn't duplicate your ant build files and create a 100% JavaScript solution.
You might use http://abc.tools.qafoo.com/, which includes an npm module *1).
The only thing you then need is a custom target like:
…
<target
name="-mm:compile:main~hooked"
extensionOf="-compile:main~hook"
depends="
-my-compile-npm-hook
"
/>
<target
name="-my-compile-npm-hook"
>
<echo>install local grunt-cli</echo>
<antcall target="npm:install">
<param name="in.npm.package.name" value="grunt-cli" />
</antcall>
</target>
…
After that you can run grunt from the .npm/node_modules/.bin/ directory, alias ${npm.local.modulesdir}/.bin/.
Don't forget to include or define the properties from src/main/resources/extensions/npm/npm.properties.
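For example, a rough sketch of an Ant target that calls the locally installed grunt-cli (the property name is the one mentioned above; on Windows you would point at grunt.cmd instead):

<target name="run-grunt" depends="-my-compile-npm-hook">
    <!-- run the grunt-cli binary installed by the npm:install call above -->
    <exec executable="${npm.local.modulesdir}/.bin/grunt" dir="${basedir}" failonerror="true">
        <arg value="jshint"/>
    </exec>
</target>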
*1): unfortunately buggy with the current node.js version

How can I visit a webpage with ant?

The Ant GET task will download the response of an HTTP request to a file.
How can I visit a webpage, send the response to the current logger, and maybe take some decision based on the response?
Thanks in advance.
Edit:
It worked out like this:
<target name="genera">
<exec executable="curl" outputproperty="webProcess" errorproperty="error">
<arg line="http://web/web.php"/>
</exec>
<echo message="${webProcess}" />
<condition property="isOk">
<equals arg1="OK" arg2="${webProcess}"/>
</condition>
<echo message="${isOk}" />
<antcall target="doStuffIfOk" />
</target>
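The doStuffIfOk target itself isn't shown; presumably it is guarded so the antcall becomes a no-op unless the check passed, e.g. with an if attribute:

<target name="doStuffIfOk" if="isOk">
    <!-- only runs when the isOk property was set by the condition above -->
    <echo message="web.php reported OK, continuing"/>
</target>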
I believe you would have to <exec> an external program like curl or wget to get this kind of functionality — the Ant get task only seems to handle basic file downloading.
There is a small problem with executing an external program: it will not work on different OS platforms. You'll need to distinguish and support the various platforms in the build file, and this will become a mess.
Have a look at the POST task in the ant-contrib package (http://ant-contrib.sourceforge.net/tasks/tasks/post_task.html). It is similar to the built-in GET task, but you can specify a property for the response.
