Oozie reading and writing in HDFS as mapred user

I am running a Python script in an Oozie workflow. The script reads a file from HDFS, manipulates it, and writes it back to HDFS in a new folder. The workflow runs without any error, but the manipulated data is not written to HDFS. I do see that the new folder is owned by the mapred user by default, though I am not sure whether that is related. I am running the Oozie workflow as the hdfs user. When the Python script is run from a shell script, it runs successfully and gives the expected result.
Any help would be appreciated.
Thanks!

Oozie would be running your script as the hdfs user.
Try logging in as hdfs and then browse HDFS to view your output folder.
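A swallowed error would also match the "no error, no output" symptom, so it can help to make the script fail loudly when an HDFS command fails and to log which OS user it runs as. The following is only a minimal sketch, assuming the script shells out to the `hdfs dfs` command line; the helper names `build_put_command` and `run_checked` are hypothetical and not part of the original script.

```python
import getpass
import subprocess
import sys


def build_put_command(local_path, hdfs_dir):
    """Build the `hdfs dfs -put` command that copies a local file into HDFS."""
    # -f overwrites an existing destination file instead of failing.
    return ["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir]


def current_os_user():
    """Return the OS user this process runs as (useful in the launcher log)."""
    return getpass.getuser()


def run_checked(cmd):
    """Run a command and raise if it exits non-zero, so failures are not silent."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Echo stderr so the message shows up in the job's stderr log.
        sys.stderr.write(result.stderr)
        raise RuntimeError(
            "command failed (%d): %s" % (result.returncode, " ".join(cmd))
        )
    return result.stdout
```

In the script, something like `print(current_os_user())` followed by `run_checked(build_put_command("/tmp/output.csv", "/user/hdfs/output"))` (both paths are placeholders) would then surface a permission error in the Oozie logs instead of ending quietly.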

Related

How to execute Python Script through Informatica Cloud

I have a Python script that I need to execute and automate via IICS. The output of the script is a CSV file, which should be loaded to the target. How can I achieve this via Informatica Cloud? Please point me to some information and documentation on this.
Thanks
There are two ways to do this.
1. Create an executable (using py2exe or a similar tool) from your .py script and put that file on the Informatica Cloud agent server. You can then call it using a shell command. With this approach, you do not need to install Python or any packages on the agent server.
2. Put the .py file on the agent server and run it from the shell, like $PYTHON_HOME/python your_script.py. You need to make sure the Python version is compatible and that all required packages are installed on the agent server.
You can refer to the screenshot below for how to set up the shell command. You can then run it as part of a workflow and schedule it if needed.
https://i.stack.imgur.com/wnDOV.png
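For the second option, a small preflight check can confirm that the agent server's Python is recent enough and has the packages the script needs before the real job runs. This is a hedged sketch rather than anything Informatica provides; the function names and the example package list are hypothetical.

```python
import importlib.util
import sys


def missing_packages(packages):
    """Return the top-level package names that cannot be imported here."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


def check_environment(min_version, packages):
    """Return a list of problems; an empty list means the environment looks ready."""
    problems = []
    # Compare only (major, minor), e.g. (3, 8).
    if sys.version_info[:2] < min_version:
        problems.append("Python %d.%d or newer is required" % min_version)
    for name in missing_packages(packages):
        problems.append("missing package: " + name)
    return problems
```

Calling `check_environment((3, 8), ["csv", "pandas"])` at the top of the script and exiting with a non-zero status when the list is not empty gives the shell command a clear failure to report.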

Heartbeat print on running R script via Oozie

I was trying to run an R script on Oozie.
Though the R script was triggered, the Oozie job kept printing Heartbeat in the stdout file without proceeding further into the R code.
What may have caused this, and how can I avoid it?
Edit:
The script was supposed to read data from HDFS, and a series of R scripts was supposed to operate on the data to generate JSON output.
The script was triggered by the Oozie workflow from the command line.
This Oozie job is further planned to be exposed as an API.
I have tried changing the scheduler to the fair scheduler as mentioned in this post, but it still didn't work.

How to configure oozie shell action to run on all nodes

I need an Oozie shell action to run on all nodes, for example to create a parent directory for logs in a local directory.
Thanks in advance!
It is not possible, as far as I know.
But you can try the approaches below, initially proposed here:
A MapReduce action can run on all nodes, but it requires a Java application. link
Hadoop Streaming with MapReduce shell scripts. link; you can launch it as an ssh or shell action in Oozie.

Is it possible to run a unix script using oozie outside hadoop cluster?

We have written a Unix batch script, and it is hosted on a Unix server outside the Hadoop cluster. Is it possible to run that script via Oozie?
If so, how can this be achieved?
What is the script doing? If the script just needs to run regularly, you can just as well use a cron job or something similar.
Besides this, Oozie has an SSH action for running commands on remote hosts.
https://oozie.apache.org/docs/3.2.0-incubating/DG_SshActionExtension.html
Maybe you can work something out with that by logging into the remote host, running the script, waiting for completion, and working on from there.
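To illustrate the log in, run, wait pattern outside of Oozie's own SSH action, here is a rough Python sketch that drives the standard `ssh` client. It is not the Oozie configuration itself; the function names are hypothetical, and key-based (passwordless) login to the remote host is assumed.

```python
import shlex
import subprocess


def build_ssh_command(user, host, script_path, args=()):
    """Build an ssh invocation that runs one script on the remote host."""
    # Quote the remote command so arguments with spaces survive the remote shell.
    remote_command = shlex.join([script_path, *args])
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", "%s@%s" % (user, host), remote_command]


def run_remote(user, host, script_path, args=(), timeout=None):
    """Run the script remotely, block until it finishes, and return its exit code."""
    completed = subprocess.run(
        build_ssh_command(user, host, script_path, args), timeout=timeout
    )
    # ssh exits with the remote command's status, or 255 if ssh itself failed.
    return completed.returncode
```

A caller can branch on the returned exit code to decide what to do next, which mirrors how a workflow would move on to its success or failure path.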

Script runs in Unix but not in Informatica command task

My script ran successfully in Unix but not in a command task of an Informatica workflow. The permissions are fine, and the parameter file and variables have been declared in the workflow. Why is this happening?
Make sure that the machine you are running Informatica on is a Unix box.
If it is on a Windows machine, you will have to run the DOS equivalent of the commands in your script.
Check whether the Informatica repository is pointing to the same Unix server on which the script to be executed from Informatica is present.
I too faced the same situation. Please check the properties of the script file and set its permissions to 755 (chmod); it should then work.
Regards
Rama
This could be a permission issue. If you are executing a shell command from Informatica, right-click the file in your SFTP client and click "Properties" (I am using WinSCP). Give full permission to the file, and it should work.
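The permission check suggested in these answers can also be scripted. The sketch below, with hypothetical helper names, tests whether a file is executable by owner, group, and others, and sets mode 755 the way `chmod 755` would; it assumes it runs on the Unix server as a user allowed to change the file's mode.

```python
import os
import stat


def is_executable_by_all(path):
    """Return True if owner, group, and others all have execute permission."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    wanted = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    return mode & wanted == wanted


def make_executable(path):
    """Set the file mode to 755 (rwxr-xr-x), like `chmod 755 path`."""
    os.chmod(path, 0o755)
```

Mode 755 is usually enough for a command task to run the script; full permission (777) also lets other users modify it.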
