File as command-line argument to Spark action in Oozie workflow

How do I pass a file as a command-line argument to a Spark job in an Oozie workflow? My Spark job expects a file as a command-line argument, but when I pass that file in the workflow as /file/location, the job does not pick it up.

I found one workaround: put the file in a custom-directory in the Oozie shared library, with a few additional changes in job.properties:
oozie.use.system.libpath=true
oozie.action.sharelib.for.spark=spark,custom-directory
oozie.libpath=true
Then update the shared library using the command below:
oozie admin -auth SIMPLE -sharelibupdate
After that, the file we placed in custom-directory can be referenced in the Oozie workflow directly by its bare name.
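For reference, a minimal sketch of what the matching spark action could look like (the class, jar, and file names are placeholders); the file is passed by bare name because custom-directory is now on the sharelib path:
<action name="spark-node">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>yarn-cluster</master>
        <name>MySparkJob</name>
        <class>com.example.Main</class>
        <jar>${nameNode}/apps/my-spark-job.jar</jar>
        <!-- bare file name resolves via the sharelib custom-directory -->
        <arg>myconfig.txt</arg>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
</action>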

Related

Unix script in informatica post command task

I wrote a script to find a particular file name in a folder and copy the files to a backup directory after they are loaded into the target table using Informatica.
I use this script in an Informatica post-command task, but my session failed: it did not load into the target tables, yet it still copied the files to the backup directory.
cd /etl_mbl/SrcFiles/MainFiles || exit 1
# copy each source file to the backup directory, dropping the .csv suffix
for f in Test.csv
do
    cp -v "$f" /etl_mbl/SrcFiles/Backup/"${f%.csv}"
done
I want to correct my script so that it copies to the backup directory only the files that Informatica actually loaded into the target.
Do not use a separate command task. Use Informatica's Post session success command and Post session failure command to achieve this. Put your Unix code in the Post session success command so it is only triggered after the session succeeds.
Go with @Utsav's approach. Alternatively, you can use the condition $YourSessionName.Status = SUCCEEDED on the link between your Session and Command Task.
The benefit of this approach is that the command is clearly visible at first glance.

How to save Robot framework test run logs in some folder with timestamp?

I am using Robot Framework to run 50 test cases. Every time, it creates the following three files, as expected:
c:\users\<user>\appdata\local\output.xml
c:\users\<user>\appdata\local\log.html
c:\users\<user>\appdata\local\report.html
But when I run the same robot file again, these files are removed and new log files are created.
I want to keep all previous run logs for future reference. The log files should be saved in a folder whose name carries a timestamp.
NOTE: I am running the robot file from the command prompt (pybot test.robot), NOT from RIDE.
Could anyone guide me on this?
Using the built-in features of robot
The Robot Framework user guide has a section titled Timestamping output files, which describes how to do this.
From the documentation:
All output files listed in this section can be automatically timestamped with the option --timestampoutputs (-T). When this option is used, a timestamp in the format YYYYMMDD-hhmmss is placed between the extension and the base name of each file. The example below would, for example, create such output files as output-20080604-163225.xml and mylog-20080604-163225.html:
robot --timestampoutputs --log mylog.html --report NONE tests.robot
To specify a folder, this too is documented in the user guide, in the section Output Directory, under Different Output Files:
...The default output directory is the directory where the execution is started from, but it can be altered with the --outputdir (-d) option. The path set with this option is, again, relative to the execution directory, but can naturally be given also as an absolute path...
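Putting the two options together, a run along these lines (the output folder is a placeholder) writes timestamped output files into a dedicated directory:
robot --outputdir c:\robot\logs --timestampoutputs test.robot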
Using a helper script
You can write a script (in python, bash, powershell, etc) that performs two duties:
launches pybot with all the options you want
renames the output files
You then just use this helper script instead of calling pybot directly.
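A minimal sketch of such a helper in Python (the robot command, base folder, and folder naming are assumptions; adapt as needed):
#!/usr/bin/env python
"""Run robot with default output file names inside a per-run timestamped folder."""
import datetime
import os
import subprocess
import sys

# Build a folder name like 20080604-163225 under an assumed base directory.
stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
outdir = os.path.join("robot-logs", stamp)
os.makedirs(outdir)

# Forward any extra arguments (test files, --variable, etc.) straight to robot.
sys.exit(subprocess.call(["robot", "--outputdir", outdir] + sys.argv[1:]))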
I'm having trouble working out how to create a timestamped directory at the end of the execution. This is my script: it timestamps the files, but that is not really what I want; I just want the default file names inside a timestamped directory after each execution.
CALL "C:\Python27\Scripts\robot.bat" --variable BROWSER:IE --outputdir C:\robot\ --timestampoutputs --name "Robot Execution" Tests\test1.robot
You can have the output directory created with a timestamp in its name, as I explain in the RIDE FAQ.
In your case this would be:
-d ./%date:~-4,4%%date:~-10,2%%date:~-7,2%
(each %date:~offset,length% substring pulls the year, month, and day out of the locale-dependent %date% value to build a YYYYMMDD folder name)
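Applied to the command above, that gives something like the following (the offsets assume a locale where %date% uses MM/DD/YYYY ordering; adjust them for yours). Note that --timestampoutputs is dropped so the default file names land inside the dated directory:
CALL "C:\Python27\Scripts\robot.bat" --variable BROWSER:IE --outputdir C:\robot\%date:~-4,4%%date:~-10,2%%date:~-7,2% --name "Robot Execution" Tests\test1.robot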
You can change Robot Framework's default output folder when running from the PyCharm IDE by updating the value of the "OutputDir" key in the settings.py file found in the folder mentioned below.
..ProjectDirectory\venv\Lib\site-packages\robot\conf\settings.py
Update the 'OutputDir' entry in the _cli_opts dictionary of class _BaseSettings(object) so that it points to a timestamped folder:
_cli_opts = {
    # Original: 'OutputDir': ('outputdir', abspath('.')),
    # Replaced so every run writes into a fresh timestamped folder
    # (also requires "import os" and "import datetime" at the top of settings.py):
    'OutputDir': ('outputdir', str(os.getcwd()) + "//Results//Report_"
                  + datetime.datetime.now().strftime("%d%b%Y_%H%M%S") + "//"),
    'Report': ('report', 'report.html'),

How to add a jar file stored outside of the Oozie project's ./lib folder?

I'm writing an Oozie Java action whose custom code sits in a jar file in the job's ./lib folder.
I would also like to add to the classpath a jar in a folder external to my job (e.g. /home/me/otherjars/spark-assembly.jar).
The ./lib folder gets added to the classpath automatically. How can I get Oozie to also add the external jar?
The oozie.libpath property is definitely what you need. Please check:
- the Oozie documentation
- this Oozie JIRA about global/local scope for that property
- this orphan thread about precedence order (search for that phrase)
- this post and this other post, for example
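For example, pointing oozie.libpath at an HDFS directory in job.properties (the path here is illustrative) makes Oozie add every jar under that directory to the action's classpath:
oozie.use.system.libpath=true
oozie.libpath=hdfs://namenode:8020/user/me/otherjars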
The best way to use custom jars in Oozie: once the Oozie sharelib is installed in the cluster, you can place the jar in a subfolder and pass the parameter
oozie.use.system.libpath=true
The jar will then be picked up whenever the jobs start.
Another option is to add the custom jar's path to the Hadoop classpath in the hadoop-env.sh file. This requires a Hadoop restart to take effect, and the jar path must be available on all the nodes of the Hadoop cluster.
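For instance, a line like this in hadoop-env.sh (the jar path is a placeholder) would put the jar on the classpath of each node:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/home/me/otherjars/spark-assembly.jar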

Job.properties file for Oozie using Hue dashboard

I have a setup with Hue 3.8 and HDP 2.3 installed through Ambari.
When I try to run a dummy script from the Oozie dashboard, it creates a job.properties file for it. This file contains the wrong HDFS URL mapping, which makes the script fail.
I need help understanding where this properties file gets populated from.
Any help would be highly appreciated.
Thanks.
It comes from the HDFS section of the hue.ini config file.
You should check this value:
[hadoop]
[[hdfs_clusters]]
[[[default]]]
# Enter the filesystem uri
fs_defaultfs=hdfs://localhost:8020

How to delete the first file in a directory using the fs action in Oozie

How do I delete the first subdirectory in a directory using the Oozie HDFS action?
I have a directory named error_dir; I process the first subdirectory (in another action)
and then I want it removed so I can process the others.
<fs>
<delete path='${error_dir}/<blank>'/>
</fs>
I don't know what to fill in that blank.
The Oozie FS action doesn't have such specialized functionality. You can implement this in a Java action using the HDFS API.
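If a full Java action is more than you need, a lighter-weight alternative (not what the answer above describes) is to wrap the HDFS command line in a script, e.g. run from an Oozie shell action. A sketch in Python, where the path and the lexicographic notion of "first" are assumptions:
import subprocess

error_dir = "/user/me/error_dir"  # hypothetical HDFS path

# "hdfs dfs -ls" prints one entry per line; directory entries start with 'd'
# and the full path is the last whitespace-separated column.
listing = subprocess.check_output(["hdfs", "dfs", "-ls", error_dir],
                                  universal_newlines=True)
subdirs = sorted(line.split()[-1]
                 for line in listing.splitlines()
                 if line.startswith("d"))

if subdirs:
    # Recursively remove the first (lexicographically smallest) subdirectory.
    subprocess.check_call(["hdfs", "dfs", "-rm", "-r", subdirs[0]])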
