Loading own DAGs in Airflow

When running the Airflow server for the first time after installation, the DAG list already contains some DAGs, such as "example_bash_operator", "example_branch_labels", etc.
The Airflow docs say that, to create our own DAGs, we should put them in a dags folder, which should be at AIRFLOW_HOME/airflow/dags/ (AIRFLOW_HOME is the folder where I installed Airflow). I put a sample dag1.py in this folder, but after logging back in to localhost:8080, I still see only the standard list of DAGs from the installation; I don't see dag1.py. I have both the webserver and the scheduler running with:
airflow webserver --port 8080
airflow scheduler
The full folder structure is as follows:
AIRFLOW_HOME/
    airflow/
        airflow-webserver.pid
        airflow.db
        logs/
        airflow.cfg
        dags/
            dag1.py
        webserver_config.py
This thread advised running airflow dags list first. dag1.py does not appear in the list when I run that command, and after running it and restarting the server and scheduler, the web UI still does not list dag1.py.
In airflow.cfg, I have this line defining the dags folder:
dags_folder = /xxxxxx/airflow/dags
where the xxxxxx is the absolute path of AIRFLOW_HOME.
The content of dag1.py is code copied from a YouTube tutorial, so I think it is a valid DAG.
What am I missing?

The AIRFLOW_HOME is NOT where you install Airflow. The AIRFLOW_HOME is where you either:
have the AIRFLOW_HOME environment variable point to,
or, if you have no AIRFLOW_HOME variable defined when you run airflow, it defaults to "${HOME}/airflow".
See:
https://airflow.apache.org/docs/apache-airflow/stable/howto/set-config.html?highlight=airflow_home
The first time you run Airflow, it will create a file called airflow.cfg in your $AIRFLOW_HOME directory (~/airflow by default). This file contains Airflow's configuration and you can edit it to change any of the settings. You can also set options with environment variables by using this format: AIRFLOW__{SECTION}__{KEY} (note the double underscores).
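Putting that together for the setup in the question (a minimal sketch, assuming a Unix shell; /xxxxxx stands for the same absolute path as in the question):
# Where does Airflow currently think its home is?
echo "${AIRFLOW_HOME:-$HOME/airflow}"
# Point AIRFLOW_HOME at the folder that actually contains airflow.cfg and dags/
export AIRFLOW_HOME=/xxxxxx/airflow
# Confirm the scheduler will scan the right folder, then re-check the DAG list;
# restart the webserver and scheduler from a shell where this variable is exported
grep dags_folder "$AIRFLOW_HOME/airflow.cfg"
airflow dags list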

Related

How to prepare the nodes folder structure for the network-bootstrapper?

I am trying to create a bootstrap test network on AWS, and I am using this:
java -jar corda-tools-network-bootstrapper-4.5.jar --dir ./
I get:
Bootstrapping local test network in /home/ubuntu
No nodes found
The jar seems to be correct. The docs (https://docs.corda.net/docs/corda-os/4.5/network-bootstrapper.html) state:
java -jar network-bootstrapper-4.5.jar --dir <nodes-root-dir>
I cannot find network-bootstrapper-4.5.jar, only corda-tools-network-bootstrapper-4.5.jar. The error seems to be related to the node.conf file.
Does anyone have any ideas?
If you follow the steps that are mentioned here, you will see that it says:
Create a directory containing a node config file...for each node
The keywords are node config file; so you must do the following (see the sketch after these steps):
Build your nodes: from the root folder of your project, run ./gradlew deployNodes; this will create a folder for every node that you defined inside the deployNodes Gradle task of your root build.gradle file.
The folders will be inside path-to-project-folder/build/nodes. If you inspect them, you'll see that each node has a node.conf file, which is the file the bootstrapper documentation is talking about.
Run the bootstrapper command with <nodes-root-dir> set to path-to-project-folder/build/nodes, since that directory contains all of your nodes.
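Put together, the whole flow looks something like this (a sketch assuming a standard CorDapp project; path-to-project-folder stands for your project root):
cd path-to-project-folder
./gradlew deployNodes
# build/nodes now holds one folder per node, each with its own node.conf
java -jar corda-tools-network-bootstrapper-4.5.jar --dir build/nodes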

What is the export command doing in the Apache Airflow setup?

I am following this tutorial.
https://towardsdatascience.com/getting-started-with-apache-airflow-df1aa77d7b1b
When I run the export command as below:
export AIRFLOW_HOME='pwd' airflow_home
what is this export command doing? Will it create an environment variable AIRFLOW_HOME = pwd?
Is that the purpose?
When I run the next command, airflow initdb, it creates a folder called pwd inside my newly created project directory and puts the files in there.
Am I missing something here?
I am using a MacBook, Python 3.7, and Airflow 1.10.9.
You're missing the correct backtick ` instead of a single quote '.
On *nix systems, `pwd` will be evaluated to the current directory. That's why it creates a folder called pwd instead of using the current directory as the Airflow home.
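To see the difference side by side (backticks and $(...) are both command substitution; $(...) is the preferred modern form):
export AIRFLOW_HOME='pwd'       # literal string "pwd" -- airflow initdb creates a folder named pwd
export AIRFLOW_HOME=`pwd`       # runs pwd and uses its output, i.e. the current directory
export AIRFLOW_HOME="$(pwd)"    # same result, easier to read and nest
echo "$AIRFLOW_HOME"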

Corda plugins and base-directory configuration

I've set up a 4-node Corda network with Notary, NodeA, NodeB and NodeC. When I bring up node and webserver instances for the individual nodes, the network comes up healthy. But:
1) I want to keep the configs under /etc/node.conf and the runtime environment under /opt/corda directories for each of the nodes. When I provide the --config-file and --base-directory arguments, per the documentation, Corda refuses to run with both arguments as inputs. Is there a way to isolate runtime environments and configs?
2) How do I make the nodes pick up the jars under plugins? I've created a plugins directory for each of these nodes under the base-directory path, /opt/corda/plugins, but Corda created its own plugins directory. (Albeit, in my current setup I have a node.conf file under /opt/corda/ to keep it going.) Where must I deploy my CorDapps if Corda is not picking them up from the plugins folder I've created? Am I missing something here? I've followed the docs during my setup.
1) As you observed, using the --config-file and --base-directory command line arguments together is currently disallowed.
However, you can store your node.conf file in a separate location by creating a symlink in the root of the node folder that points to the actual location of the node.conf file (e.g. ln -s ./conf/node.conf ./node.conf on Mac if you are storing the node.conf file in a conf folder in the node's root directory).
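As a concrete sketch (assuming the node's base directory is /opt/corda and the real config lives under /etc/corda; neither path is mandated by Corda):
mkdir -p /etc/corda
mv /opt/corda/node.conf /etc/corda/node.conf
ln -s /etc/corda/node.conf /opt/corda/node.conf   # the node still finds node.conf in its base directory
cd /opt/corda
java -jar corda.jar   # the base directory defaults to the current working directory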
2) Awaiting clarification.

Oozie: Is there anything that needs to be done after placing an updated jar under the lib folder?

I am trying to place an updated jar under the lib path and remove the old jar. Unfortunately, I still see the old jar's logs in the Oozie console. For confidentiality reasons I am unable to show the logs here, but I am doing the following steps:
Replacing the jar (mycode.jar) under the lib folder that is mentioned in workflow.xml
Submitting the Oozie job using oozie job -oozie http://host -config job.properties -run
When I look at the logs in the console, I can still see the old jar's logs (from the older version of mycode.jar) even though the jar has been replaced.
If you are talking about the lib directory in the Oozie workflow application, then you need not do anything: the next execution of the workflow will automatically pick up the new (updated) jar.
For updating jars in the share lib (/user/oozie/share/lib/lib_*/*), after replacing the jar you need to execute the following command to refresh the share lib on the Oozie server:
oozie admin -sharelibupdate
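To confirm the server picked up the change, you can list the share lib currently in use afterwards (a sketch; substitute your own Oozie URL and port):
oozie admin -oozie http://host:11000/oozie -sharelibupdate
oozie admin -oozie http://host:11000/oozie -shareliblist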
Hope this will help. Thanks.
To make sure the issue is the same, I'll narrate what I was facing:
Created a MapReduce JAR and placed it in the lib folder.
Ran the Oozie (MapReduce action) job; it picked up the JAR as expected and ran fine.
I had some functionality changes in my code (JAR), so I added new log statements to make sure the new JAR was being picked up. I built the JAR and replaced the old JAR with the newly built one in the lib folder (HDFS).
Ran the Oozie job again; code from the old JAR was executed, because the new log statements did not show up.
After some searching I found the following tips:
Clear the YARN cache: found on the HortonWorks site (https://community.hortonworks.com/articles/92339/how-to-clear-local-file-cache-and-user-cache-for-y.html) - pasting the content below for reference.
Short Description:
To use a different version of a jar file with the same name, clear the cache on all NodeManager hosts to prevent the application from using the old jar.
a. Find out the cache location by checking the value of the yarn.nodemanager.local-dirs property
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/hadoop/yarn/local</value>
</property>
b. Remove the filecache and usercache folders located inside the directories specified by yarn.nodemanager.local-dirs.
[yarn@node2 ~]$ cd /hadoop/yarn/local/
[yarn@node2 local]$ ls
filecache  nmPrivate  spark_shuffle  usercache
[yarn@node2 local]$ rm -rf filecache/ usercache/
c. Restart YARN service.
I was unable to clear the cache because I did not have the necessary access, so I followed the workaround below:
Rename the package or class: since the package/class was written by me, I had the liberty to simply rename it; when Oozie looked up the new class name, the new functionality was executed automatically.
This second option may not be viable for many, and the question remains open as to why Oozie does not pick up the new JAR/class.

Passing environment variables through a jar file which the app uses

I am currently trying out the Docker link between my app and db containers. I've checked on my app container, and the environment variables are automatically set when I link the containers together.
What I want to do is for my config file, which is packaged into a jar file, to receive the environment variables and set the required values from them. Any advice or help?
This is how I create the config file in my jar file to connect to MySQL:
database {
  url = "jdbc:mysql://${MYSQL_PORT_3306_TCP_ADDR}:${MYSQL_PORT_3306_TCP_PORT}/mydb"
  driver = "com.mysql.jdbc.Driver"
}
Updating the config file inside the jar could be quite overkill.
I think you have several choices:
read the environment variables directly in your program
use the variables either directly or generate the config file from them
create a launch script (the details depend on your guest OS in Docker; sh/bash for Linux, etc.)
that script can generate a new config file from the environment and put it on the classpath before the jar, so your program sees it.
EDIT: added example
You can save this kind of launcher script in the Docker image; it dynamically creates the configuration before launching the actual program.
#!/bin/bash
# some default values for testing even without links to other container
MYSQL_PORT_3306_TCP_ADDR=${MYSQL_PORT_3306_TCP_ADDR:-127.0.0.1}
MYSQL_PORT_3306_TCP_PORT=${MYSQL_PORT_3306_TCP_PORT:-3306}
cat << EOF > /opt/yourprogram/dbconfig.conf
database {
  url = "jdbc:mysql://${MYSQL_PORT_3306_TCP_ADDR}:${MYSQL_PORT_3306_TCP_PORT}/mydb"
  driver = "com.mysql.jdbc.Driver"
}
EOF
scala -classpath /opt/yourprogram YourProgram
What I did is write the sh file in my directory /tmp/restcore-1.0-SNAPSHOT/bin like this:
#!/bin/bash
echo "database { url=\"jdbc:mysql://${MYSQL_PORT_3306_TCP_ADDR}:${MYSQL_PORT_3306_TCP_PORT}/mydb\", driver=\"com.mysql.jdbc.Driver\" }" > myconf.conf
jar uf /tmp/restcore-SNAPSHOT/lib/com.organization.restcore-1.0-SNAPSHOT.jar /tmp/restcore-1.0-SNAPSHOT/bin/myconf.conf
After building the Dockerfile and running the sh file in CMD, I use cat myconf.conf to check the config file, and I can see the environment variables filled in.
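If you want to double-check that jar uf really added the file, you can list the jar's contents (using the same jar path as in the snippet above):
jar tf /tmp/restcore-SNAPSHOT/lib/com.organization.restcore-1.0-SNAPSHOT.jar | grep myconf.conf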
