Running shell script with Oozie

I am trying to run a shell (sh) script through Oozie, but I am facing a problem:
Cannot run program "script.sh" (in directory
"/mapred/local/taskTracker/dell/jobcache/job_201312061003_0001/attempt_201312061003_0001_m_000000_0/work"):
java.io.IOException: error=2, No such file or directory.
Please help me with the necessary steps.

This error is really ambiguous. Here are some issues that helped me solve it.
- If you are running Oozie workflows on a Kerberized cluster, make sure to authenticate by passing your Kerberos keytab as an argument (see the sketch after this list):
...
<shell>
<exec>scriptPath.sh</exec>
<file>scriptPath.sh</file>
<file>yourKeytabFilePath</file>
</shell>
...
- In your shell file (scriptPath.sh), make sure to remove the first-line shell reference (the shebang):
#!/usr/bin/bash
Indeed, if this shell reference isn't deployed on all data nodes, it can lead to this error.
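For the keytab case (the first point above), a minimal sketch of what the script itself might do; the principal name is an assumption, and yourKeytabFile stands for the base name of the keytab that the <file> element localizes next to the script:
# scriptPath.sh -- no shebang line, per the second point above
kinit -kt yourKeytabFile your_principal@YOUR.REALM   # authenticate before any Hadoop call
hdfs dfs -ls /user/dell                              # example of a Kerberos-protected command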

I had this same issue because of something really silly. I added a shell block in the workflow, then I selected the corresponding sendMail.sh, but I forgot to add the file sendMail.sh in FILE +.

An Oozie shell action is executed on an arbitrary Hadoop node, i.e. not necessarily on the machine where the Oozie server is running. As Oleksii says, you have to make sure that your script is available on the node that executes the job.
See the following complete examples of a shell action and an ssh action:
https://github.com/airawat/OozieSamples/tree/master/oozieProject/workflowShellAction
https://github.com/airawat/OozieSamples/tree/master/oozieProject/workflowSshAction

workflow.xml :
...
<shell>
<exec>script.sh</exec>
<file>scripts/script.sh</file>
</shell>
...
Make sure you have scripts/script.sh in the same folder in HDFS as the workflow (i.e. relative to the workflow application path).
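For reference, a rough sketch of staging the script next to the workflow in HDFS; the application directory is an assumption:
# assuming the workflow application directory is /user/dell/shell-app in HDFS
hdfs dfs -mkdir -p /user/dell/shell-app/scripts
hdfs dfs -put -f script.sh /user/dell/shell-app/scripts/
hdfs dfs -ls /user/dell/shell-app/scripts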

In addition to what others said, this can also be caused by the shell script having the wrong line endings (e.g. CRLF from Windows). At least this is what happened to me :)

Try to give the full HDFS path, like
<exec>/user/nathalok/run.sh</exec>
<file>/user/nathalok/run.sh#run.sh</file>
(the #run.sh fragment makes the file available in the task's working directory under that name), and ensure that in job.properties the paths for the library and workflow.xml are correct:
oozie.libpath=hdfs://server/user/oozie/share/lib/lib_20150312161328/oozie
oozie.wf.application.path=hdfs://bcarddev/user/budaledi/Teradata_Flow
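To confirm that the script really exists at that HDFS location before launching the workflow, a quick check:
hdfs dfs -ls /user/nathalok/run.sh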

If your shell file exists in the correct directory of your project, then it is the shell file's format that causes this error. You need to convert it from DOS to Unix format using dos2unix: dos2unix xxxx.sh

Also make sure that the shell scripts are Unix-compliant. If these shell scripts were written in a Windows environment, they carry Windows-specific end-of-line (EOL) characters and are not recognized by Oozie, so you will get "no such file or directory found" in Oozie shell actions.
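A quick way to check for and fix this (the file name is a placeholder):
file script.sh              # prints "... with CRLF line terminators" if the endings are wrong
dos2unix script.sh          # convert in place
sed -i 's/\r$//' script.sh  # alternative if dos2unix is not installed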

The workflow.xml would look something like this:
<workflow-app name="HiveQuery_execution" xmlns="uri:oozie:workflow:0.5">
<start to="shell-3c43"/>
<kill name="Kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<action name="shell-3c43">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>/user/path/hivequery.sh</exec>
<file>/user/path/hivequery.sh#hivequery.sh</file>
<capture-output/>
</shell>
<ok to="End"/>
<error to="Kill"/>
</action>
<end name="End"/>
</workflow-app>
Job.properties
jobTracker=xxxx.xxx.xxx.com:port
nameNode=hdfs://xxxx.xxx.xxx.com:port
It is better to configure these through the UI, as suggested above.
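Once workflow.xml and job.properties are in place, the job can also be submitted from the command line, roughly like this (the Oozie server URL is an assumption):
oozie job -oozie http://oozieserver:11000/oozie -config job.properties -run
oozie job -oozie http://oozieserver:11000/oozie -info <job-id>   # check the job status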

Related

httpuv fails to start oauth server when running batch file via Task Scheduler

I have a batch file that kicks off an R script:
"C:\Program Files\R\R-3.3.1\bin\x64\Rscript.exe" "C:\Folder\Script.R"
Within this R script, I use the gs_auth function to grab my auth token and use it in the R session. I can successfully run my R script by double-clicking on the batch file in File Explorer, and everything runs perfectly.
When I set up a Task Scheduler task to run the batch file, I instead get the following error:
Error in httpuv::startServer(use$host, use$port, list(call = listen)) :
Failed to create server
Calls: gs_title ... init_oauth2.0 -> oauth_authorize -> oauth_listener -> <Anonymous>
Execution halted
I have tried 'Running with highest privileges' in Task Scheduler and also entered my Windows password, but nothing seems to work.
Is there something I am missing in terms of loading the token?
Code
The commented-out lines are there because they show the initialisation steps used to 'bake' the token so that I can use it later on. These steps were followed as described here.
##Need to run gs_auth line only because need to paste stuff in from IE
#token <- gs_auth(cache = FALSE)
#gd_token()
##Bake token to working directory
#saveRDS(token, file = "googlesheets_token.rds")
setwd("C:/Folder")
gs_auth(token = "googlesheets_token.rds")
for_gs <- gs_title("data_log")
for_gs_sheet <- gs_read(for_gs)
XML:
<?xml version="1.0" encoding="UTF-16"?>
<Task version="1.4" xmlns="http://schemas.microsoft.com/windows/2004/02/mit/task">
<RegistrationInfo>
<Date>2018-08-18T15:11:51.0347984</Date>
<Author>abc-PC\User</Author>
<URI>\ozb</URI>
</RegistrationInfo>
<Principals>
<Principal id="Author">
<UserId>S-1-5-21-2866463773-2659414307-4023886308-1002</UserId>
<LogonType>InteractiveToken</LogonType>
</Principal>
</Principals>
<Settings>
<DisallowStartIfOnBatteries>true</DisallowStartIfOnBatteries>
<StopIfGoingOnBatteries>true</StopIfGoingOnBatteries>
<Enabled>false</Enabled>
<MultipleInstancesPolicy>IgnoreNew</MultipleInstancesPolicy>
<IdleSettings>
<StopOnIdleEnd>true</StopOnIdleEnd>
<RestartOnIdle>false</RestartOnIdle>
</IdleSettings>
<UseUnifiedSchedulingEngine>true</UseUnifiedSchedulingEngine>
</Settings>
<Triggers>
<CalendarTrigger>
<StartBoundary>2018-08-18T15:00:00</StartBoundary>
<Repetition>
<Interval>PT15M</Interval>
</Repetition>
<ScheduleByDay>
<DaysInterval>1</DaysInterval>
</ScheduleByDay>
</CalendarTrigger>
</Triggers>
<Actions Context="Author">
<Exec>
<Command>"C:\Folder\run_it.bat"</Command>
</Exec>
</Actions>
</Task>
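When the batch file is launched by Task Scheduler, any error text printed by Rscript is easy to miss; one way to capture it is to redirect the output to a log file. A minimal sketch of run_it.bat (the log path is an assumption):
@echo off
REM redirect both stdout and stderr so the httpuv/gs_auth error ends up in a file
"C:\Program Files\R\R-3.3.1\bin\x64\Rscript.exe" "C:\Folder\Script.R" > "C:\Folder\script_log.txt" 2>&1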

Passing variables to sshexec command resource file in ANT

I have created a large script to be passed to an sshexec commandResource in Ant, as follows:
<sshexec host="${host.server}"
username="${username}"
password="${oracle.password}"
commandResource="path-to-file/script.txt"
/>
This is working as intended.
There are several lines that will get executed on the server from that script. The contents of the script.txt file are as follows:
/script/compile_objects.sh /path-to-code/ login password
/script/compile_code.sh /path-to-code/ login password
However, I would prefer not to store the login and password as clear text in the script.txt file. Is it possible to pass parameters to each line of the command resource? I'm aware that each line in the command resource file gets executed in its own shell.
I tried string replacement from a property in the Ant script, but it failed with a bad substitution error, as shown below. Is there another way to do this?
<property name="oracle.password" value="thepassword"/>
and then in the script file, alter to:
/script/compile_objects.sh /path-to-code/ login ${oracle.password}
/script/compile_code.sh /path-to-code/ login ${oracle.password}
This type of replacement works when using the command attribute, but seems to fail when using commandResource.
Edit :
I am using Apache Ant 1.9.4
and jsch-0.1.54.jar
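One alternative approach, sketched here rather than taken from the thread, is to have Ant generate script.txt from a template just before the sshexec call, so the placeholder is expanded by Ant rather than by the remote shell; the template file and token names are assumptions:
<!-- script.txt.template contains lines such as:
     /script/compile_objects.sh /path-to-code/ login @ORACLE_PASSWORD@ -->
<property name="oracle.password" value="thepassword"/>
<copy file="path-to-file/script.txt.template"
      tofile="path-to-file/script.txt" overwrite="true">
    <filterchain>
        <replacetokens>
            <token key="ORACLE_PASSWORD" value="${oracle.password}"/>
        </replacetokens>
    </filterchain>
</copy>
<sshexec host="${host.server}"
         username="${username}"
         password="${oracle.password}"
         commandResource="path-to-file/script.txt"/>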

Informatica XML Import Issue from command line: PMREP

I wanted to import an XML file into my Informatica repository from the command line using the PMREP command.
The command I executed:
pmrep objectimport -i .XML -c Control.XML -l Import_Log.txt
The control file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE IMPORTPARAMS SYSTEM "impcntl.dtd">
<!--apply label name LABEL_IMPORT_NEW to imported objects-->
<IMPORTPARAMS CHECKIN_AFTER_IMPORT="YES" CHECKIN_COMMENTS="NEWOBJECTS"
APPLY_LABEL_NAME="LABEL_IMPORT_NEW">
<FOLDERMAP SOURCEFOLDERNAME="sOURCE_FOLDER_NAME" SOURCEREPOSITORYNAME="SOURCE_REP_NAME" TARGETFOLDERNAME="TARGET_FOLDER_NAME"
TARGETREPOSITORYNAME="TARGET_REP_NAME"/>
<!--replace all mappings-->
<RESOLVECONFLICT>
<TYPEOBJECT OBJECTTYPE="ALL" RESOLUTION="REPLACE"/>
</RESOLVECONFLICT>
</IMPORTPARAMS>
I renamed the control file to Control.XML, then Control.dtd, then Control.cnf, but nothing worked.
When I executed the command, the repository was invoked, but it immediately failed with this message:
Invoked at Fri May 01 06:26:22 2015
failed to execute objectimport
When I checked the log file, I got the following error:
FATAL:Error at (file /.../Control.XML, line 2, char 45): An exception occured! Type:XMLPlatformException, Message:Could not close the file.
I tried to name the control file impcntl.dtd (this was just a try), but got this error:
FATAL:Error at (file /.../Control.XML, line 2,char 3): Expected a markup declaration.
When I removed the <!DOCTYPE IMPORTPARAMS SYSTEM "impcntl.dtd"> line from the control file, I got the following error:
Label [LABEL_IMPORT_NEW] cannot be found in the repository [SOURCE_REP_NAME]
I am using Unix version: Solaris SunOS 5.10(sparc), and Infa version: 9.1 hotfix 4.
It would be great if somebody can give me a solution for this.
Thanks!
I do a lot with pmrep; in general your control file looks fine and it obeys the DTD definition. Just making sure of a few things:
Did you make a call to connect before the objectimport call? For example:
connect -r RepositoryName -n UserID -x password -h serverPath -o port# -s SecDomain
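For reference, a minimal sketch of the full command-line sequence; the repository name, credentials, host, port, and export file name are placeholders:
pmrep connect -r RepositoryName -n UserID -x password -h serverPath -o port#
pmrep objectimport -i export.XML -c Control.XML -l Import_Log.txt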

Opentripplanner Graph.obj file not found error

I am trying to run OpenTripPlanner, using a Graph.obj built for a country.
But whenever I try to run the server, it throws a FileNotFoundException for /otp/Graph.obj, although I have put Graph.obj in that same location.
Stacktrace while running the server is :
Graph file not found or not openable for routerId '' under file:/otp
java.io.FileNotFoundException: /otp/Graph.obj (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.(FileInputStream.java:146)
at java.io.FileInputStream.(FileInputStream.java:101)
at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
I had the same problem trying to run OTP.
First problem: the routerId is missing.
Second problem: the default directory is /var/otp/graph, and OTP seems to search for the graph under /var/otp/graph/{routerId}.
Third problem (may not be yours): I tried to run OTP in Cygwin (Windows) and there was probably some trouble with the slashes/backslashes, so I decided to copy the graph under "../gtfs/gurgaon".
Solution:
Step 1) Create a subdirectory like "/var/otp/graph/gurgaon" and copy Graph.obj over there.
Step 2) Run something like: $ java -jar target/otp.jar --router gurgaon --graphs ../gtfs --server
It worked like a charm for me!
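Putting the pieces together, the pattern is that the graph must live at <graphs-dir>/<routerId>/Graph.obj; a sketch using the paths from step 2:
mkdir -p ../gtfs/gurgaon                 # <graphs-dir>/<routerId>
cp Graph.obj ../gtfs/gurgaon/
java -jar target/otp.jar --router gurgaon --graphs ../gtfs --server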
Try this command: java -Xmx5G -jar target/otp-0.20.0-SNAPSHOT-shaded.jar --build <path to your GTFS and OSM .pbf files> --inMemory
It worked perfectly for me.

Hive query execution for custom udf is exepecting hdfs jar path instead of local path in CDH4 with Oozie flow

We are migrating from CDH3 to CDH4, and as part of this migration we are moving all the jobs that we have on CDH3. We have noticed one critical issue: when a workflow is executed through Oozie to run a python script that internally invokes a Hive query (hive -e {query}), the query adds a custom jar using add jar {LOCAL PATH FOR JAR} and creates a temporary function for the custom UDF. Everything looks fine up to that point. But when the query starts executing the custom UDF function, it fails in the distributed cache with a File Not Found exception: it looks for the jar in the HDFS path instead of the local path.
I am not sure if I am missing some configuration here.
Exception trace:
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated.
Please use org.apache.hadoop.log.metrics.EventCounter in all the
log4j.properties files. Execution log at:
/tmp/yarn/yarn_20131107020505_79b41443-b9f4-4d36-a0eb-4f0d79cd3ce9.log
java.io.FileNotFoundException: File does not exist:
hdfs://aa.bb.com:8020/opt/nfsmount/mypath/custom.jar
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:824)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
..... .....
Any help on this is highly appreciated.
Regards,
GHK.
There are a few options. All the required jars should be in the classpath before you run the Hive query.
Option 1: add your custom jar with <file>/hdfs/path/to/your/jar</file> in the Oozie workflow.
Option 2: use the --auxpath /local/path/to/your/jar option when calling your Hive script in python, e.g.: hive --auxpath /local/path/to/your.jar -e {query}
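For option 1, the jar listed with <file> is localized into the task's working directory, so the query run by the python script can then add it by its base name; a sketch in which the jar, class, table, and function names are assumptions:
# inside the script launched by the Oozie shell action
hive -e "add jar custom.jar;
         create temporary function my_udf as 'com.example.MyUDF';
         select my_udf(col) from my_table;"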
