Unable to find out why Oozie job remains in Running state

We are facing an issue with Oozie where a job is launched and then stays in the Running state. The Oozie job internally runs a Sqoop job that connects to MySQL and imports a table into HDFS. The Sqoop job works fine when run independently. The jar file mysql-connector-java-5.1.36.jar is located in the HDFS folders /user/oozie/share/lib/lib_20180601073112/oozie and /user/oozie/share/lib/lib_20180601073112/sqoop.
We are running Oozie on Cloudera 5.14.2.
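A job that never leaves Running usually means the launcher started but the underlying Sqoop MapReduce job is stuck (often waiting for YARN containers) or failing quietly. A minimal diagnostic sketch, assuming the default Oozie URL on the Oozie host; the workflow ID below is hypothetical:

# Show which action the workflow is stuck on (job ID is hypothetical)
oozie job -oozie http://localhost:11000/oozie -info 0000001-180601073112-oozie-oozi-W
# Inspect the launcher log for that workflow
oozie job -oozie http://localhost:11000/oozie -log 0000001-180601073112-oozie-oozi-W
# Verify the sqoop sharelib the server actually has loaded in memory
oozie admin -oozie http://localhost:11000/oozie -shareliblist sqoop
# Check whether both the launcher and the Sqoop job are getting YARN containers
yarn application -list -appStates RUNNING,ACCEPTED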

Related

Changes made to airflow.cfg file not reflecting when running Airflow using Ubuntu

I have created a DAG file and saved it in the Airflow home folder (C:\ubuntu\rootfs\home\admin123\airflow\dags), but it didn't show up in the web UI. So I tried to change the dags folder in the airflow.cfg file. This is the new dags folder location: /c/users/myuser/airflowhome/dags/
After making this change I restarted the scheduler and the webserver in Ubuntu, but my DAG file is still not showing in the web UI.
I'm using Airflow 2.2.5 and Ubuntu 18.04.
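A quick way to check which folder the running Airflow actually scans, and whether the DAG file parses at all (a sketch, assuming a standard Airflow 2.x install):

# Print the dags_folder the current configuration resolves to
airflow config get-value core dags_folder
# List all DAGs Airflow can parse; import errors are reported here as well
airflow dags list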

Airflow command not found for all commands

I am getting an error when attempting to create a user. I have Airflow running on an Ubuntu VirtualBox VM and I connect over SSH from Visual Studio Code. As a sanity test, I ran airflow scheduler and got "command not found" again. I attempted to run the command with sudo as well.
It turns out that if you are working in a virtualenv (the sandbox) and close Visual Studio Code, you need to re-activate the virtualenv before running the commands:
>source sandbox/bin/activate
>airflow db init
>airflow webserver

How to mount local directory to concourse pipeline job?

I am trying to connect a local git repository to Concourse so that I can run automated tests in my local environment even before committing the code to the Git repo. In other words, I want to perform some tasks before git commit using a Concourse pipeline, for which I want to mount my local working directory into the pipeline's jobs.
You can't run a pipeline or a complete job with a local repository, only a task. But that's OK, as a job's main goal is to set up inputs and outputs for a task, and you will be providing them locally.
The command is fly execute, and the complete doc is here: https://concourse-ci.org/tasks.html#running-tasks
To run a task locally you will have to have the task in a separate YAML file, not inline in your pipeline.
The basic command, where you run the task run-tests.yml with the input repository set to the current directory, is:
fly -t my_target execute --config run-tests.yml --input repository=.
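For reference, a minimal sketch of what such a separate task file could look like. The alpine image and the run-tests.sh script are assumptions for illustration, not part of the original answer:

# Write a hypothetical run-tests.yml task file
cat > run-tests.yml <<'EOF'
platform: linux
image_resource:
  type: registry-image
  source: {repository: alpine}
inputs:
  - name: repository   # matched by --input repository=.
run:
  path: sh
  args: ["-c", "cd repository && sh ./run-tests.sh"]
EOF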

Restart Oozie on EMR cluster after making changes to oozie-site.xml

I am trying to restart the Oozie server after making changes to oozie-site.xml, but so far I have not been successful. I am using an EMR 4.3 cluster.
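One hedged sketch: the 4.x-era EMR AMIs manage daemons with upstart on the master node, so something along these lines may work (the service name oozie is an assumption; check /etc/init/ on your master node for the exact name):

# On the EMR master node, after editing oozie-site.xml
sudo stop oozie
sudo start oozie
# Confirm the server came back up
oozie admin -oozie http://localhost:11000/oozie -status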

Oozie Shared Lib: where to place jars

I have installed the Cloudera CDH QuickStart VM 5.5, and I'm running a Sqoop action in my Oozie workflow. I encountered an error saying the MySQL JDBC driver is missing, and I came across an SO answer here that says mysql-connector-java.jar should be placed in Oozie's HDFS shared lib path, under the sqoop path.
When I browse Oozie's HDFS shared lib path, however, I notice two sqoop subdirectories where the jar could go:
/user/oozie/share/lib/sqoop
and
/user/oozie/share/lib/lib_20151118030154/sqoop
Aside from sqoop, the hive, pig, distcp, and mapreduce-streaming paths also exist in both lib and lib/lib_20151118030154.
So the question is: where do I place my connector jar, in the first or the second one?
What is the difference (or difference of purpose) between these two paths with respect to the sqoop, hive, pig, distcp, and mapreduce-streaming jars for Oozie?
The lib_20151118030154 sub-dir would be the current version of the ShareLib, as of 18-NOV-2015. The versioning allows you to update the libraries without stopping the Oozie service -- check the documentation here.
In other words, the Oozie service keeps in memory a list of the JARs in each ShareLib (based on what was present for the latest version at startup time), so adding a JAR will not make a difference until (a) you stop/restart the service, or (b) you resync the service as explained in the doc above.
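As a concrete sketch of option (b), using the paths from the question and assuming an Oozie 4.x CLI on the cluster:

# Copy the driver into the current (timestamped) sqoop ShareLib
hadoop fs -put mysql-connector-java.jar /user/oozie/share/lib/lib_20151118030154/sqoop/
# Ask the running Oozie server to rescan the ShareLib, no restart needed
oozie admin -oozie http://localhost:11000/oozie -sharelibupdate
# Confirm the jar now appears in the list Oozie holds in memory
oozie admin -oozie http://localhost:11000/oozie -shareliblist sqoop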
