Cant run Shell script in Oozie using Hue - oozie

I am trying to run shell script in Oozie using Hue. However it is not running and give exception.
My Shell script file:
#! /bin/bash
hive -e 'desc emptable;'
=======================================
Also added same script file in FILE option in script action.
=======================================
Gives exceptions:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
=========================================
I also tried with hive-site file, added in FILE option. but not worked.

Oozie has a specific Hive action, and for a good reason: on a standard Hadoop setup, Hive/Pig/whatever are not installed on the "slave" nodes.
I suggest that you read some documentation on your own.

You should check the logs of the action on the graph of the workflow dashboard page, it should print what really happened.
In the latest Hue, there is even an HiveServer2 action.

Related

(Dagster) Schedule my_hourly_schedule was started from a location that can no longer be found

I'm getting the following Warning message when trying to start the dagster-daemon:
Schedule my_hourly_schedule was started from a location Scheduler that can no longer be found in the workspace, or has metadata that has changed since the schedule was started. You can turn off this schedule in the Dagit UI from the Status tab.
I'm trying to automate some pipelines with dagster and created a new project using dagster new-project Scheduler where "Scheduler" is my project.
This command, as expected, created a diretory with some hello_world files. Inside of it I put the dagster.yaml file with configuration for a PostgreDB to which I want to right the logs. The whole thing looks like this:
However, whenever I run dagster-daemon run from the directory where the workspace.yaml file is located, I get the message above. I tried runnning running the daemon from other folders, but it then complains that it can't find any workspace.yaml files.
I guess, I'm running into a "beginner mistake", but could anyone help me with this?
I appreciate any counsel.
One thing to note is that the dagster.yaml file will not do anything unless you've set your DAGSTER_HOME environment variable to point at the directory that this file lives.
That being said, I think what's going on here is that you don't have the Scheduler package installed into the python environment that you're running your dagster-daemon in.
To fix this, you can run pip install -e . in the Scheduler directory, although the README.md inside that directory has more specific instructions for working with virtualenvs.

Error that says Rscript is not recognized as an internal or external command, operable program or batch file [duplicate]

shell_exec("Rscript C:\R\R-3.2.2\bin\code.R ");
This is the call to script.On calling the above script, the error occurs.
I am trying to call my R script from the above path but no output is being shown. While checking the error logs of PHP, it says 'Rscript' is not recognized as an internal or external command, operable program or batch file.' The script is working fine on the Rstudio but not running on the command line.
Add the Rscript path to your environment variables in Windows:
Go to Control Panel\System and Security\System and click Advanced System Settings, then environment variables, click on path in the lower box, edit, add "C:\R\R-3.2.2\bin"
Restart everything. Should be good to go. Then you should be able to do
exec('Rscript PATH/TO/my_code.R')
instead of typing the full path to Rscript. Won't need the path to your my_code.R script if your php file is in the same directory.
You need to set the proper path where your RScript.exe program is located.
exec ("\"C:\\R\\R-3.2.2\\bin\\Rscript.exe\"
C:\\My_work\\R_scripts\\my_code.R my_args";
#my_args only needed if you script take `args`as input to run
other way is you declare header in your r script (my_code.r)
#!/usr/bin/Rscript
and call it from command line
./my_code.r
If you are running it in Git Bash terminal, you could follow a revised version of the idea suggested by #user5249203: in the first line of your file my_code.R, type the following
#!/c/R/R-3.2.2/bin/Rscript.exe
I assumed that your path to Rscript.exe is the one listed above C:\R\R-3.2.2\bin. For anyone having a different path to Rscript.exe in Windows, just modify the path-to-Rscript accordingly. After this modification of your R code, you could run it in the Git Bash terminal using path-to-the-code/mycode.R. I have tested it on my pc.
I faced the same problem while using r the first time in VS Code, just after installing the language package (CRAN).
I restart the application and everything worked perfectly. I think restarting would work for you as well.

How to see logs of running robot script?

The robot scripts when ran on RIDE, generate output.xml, report.html etc files, once run is over.
Is there any way available to view logs when script is still running? (When I use pause on failure)
Also some times I had to Stop/Abort the run in middle, and no logs are generated in such cases.
Kindly help,
Thanks in advance
As for first part — RIDE runs tests adding own listener, providing more verboseness of the output and pausing/resuming functionality.The easiest thing is to run tests not from RIDE, but from console using robot/pybot script. In this case much less logs are written to output (though it doesn't provide pause/resume functionality).
For second part — robot (RIDE starts robot script — you can see it in execution log: command: pybot.bat...) generates output.xml file not after but during execution, so generated output.xml is not valid until test is finished. After normal execution rebot tool generating log.html automatically. So generally it's possible to take following steps:
"Fix" your incomplete output.xml file after execution stop with fixml. output.xml location for RIDE execution can be found in the very same execution log of yours (e.g. ...\appdata\local\temp\RIDEv_0yrp.d\ in my case)
Run rebot stand-alone: rebot output.xml --log log.html --report report.html. Rebot options description you can check using rebot --help (as usual)
Please also note that directory where RIDE output files are stored is temporary — exists only when RIDE is started. You will lose your output on exiting RIDE
I'm using RIDE 1.5 so maybe my answer is not valid for other versions
In RIDE, under Run Tab , when you are running the scripts , you have a option, show message log, it will shows the runtime log.
Try this out.

Oozie executing hadoop commands in shell action as yarn

Environment : Hortonworks Sandbox HDP 2.2.4
Issue : Unable to run the hadoop commands present in the shell scripts as a root user. The oozie job is getting triggered as a root user, but when the hadoop fs or any mapreduce command is executed, then it runs as yarn user. As yarn, doesn’t have access to some of the file system , so the shell script is failing to execute. Let me know what changes I need to do , for making it run the hadoop commands as root user.
It is an expected behaviour to get Yarn in place whenever we are invoking shell actions in oozie. Yarn user only have the capabilities to run shell actions. One thing we can do is to give access permissions to Yarn on the file system.
This is more like a shell script question than an Oozie question. In theory, Oozie job runs as the user who submits the job. In a kerberos' env, the user is whoever signed in with keytab/password.
Once job is running on Hadoop cluster, in order to change the ownership of command, you can use "sudo" within your shell script. In your case, you may also want to make sure user "yarn" is allowed to sudo to the commands you want to execute.
Add below property to workflow:
HADOOP_USER_NAME=${wf:user()}

cron job not creating logs

I am trying to execute some sql statements using an unix script. The script is placed in crontab to run everyday at 12.00 midnight and get the output in a log file.
Though my script is running and I can see the changes in DB but the log file is not generating. However manually running the script is generating the log file. Please suggest a solution.
now=`date "+%d%m%y"`
LOG="table_partition_$now.log"
test=`sqlplus -s ${USER}/${CPWD}#${DB} << THEEND > $LOG
...
...
...
exit
This is my code snippet. Please suggest
sqlplus has no closing `
Also, can you say whether the script runs correctly outside of cron? If it only fails in cron, you may want to call
env > /tmp/mylatestslog.txt
at the start and compare the differences with your local environment. (May be differences in the user, or variables used from you personal .bashrc).
(PS. also edited the question to show one command per line.)

Resources