If I set environment variables corresponding to Airflow config settings after executing the airflow binary, while DAG definitions are being loaded into memory, will this have the same effect as setting those same env vars at the OS level before executing the binary?
I wasn't able to find any documentation on whether this would work as intended, and I figured that if I had to read through the source to find out, it's probably not a good idea to be doing it in the first place.
Instead of setting environment variables at runtime, I've created two airflow.cfg files: airflow.prod.cfg and airflow.dev.cfg. I then created a shell script, start.sh, that copies the appropriate .cfg file to airflow.cfg before executing the airflow binary.
I don't love having to use a shell script to boot things up, but I'd prefer that to chancing any kind of spooky action as a result of setting env vars at runtime.
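For reference, the wrapper amounts to something like the sketch below; only the config file names come from the setup above, and the environment-selection convention is hypothetical:

#!/bin/bash
# start.sh: copy the environment-specific config into place, then hand off to airflow
# usage: ./start.sh prod webserver   (or ./start.sh dev scheduler, etc.)
ENV="$1"; shift
cp "$AIRFLOW_HOME/airflow.${ENV}.cfg" "$AIRFLOW_HOME/airflow.cfg"
exec airflow "$@"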
The output in GoLand's terminal:
The output of go env in the OS terminal:
GoLand manages some of the environment variables and settings and overrides what's configured in the system, so that it can create a reproducible environment whether you run code in the built-in terminal or use the editor itself.
From what I can see in the images, there are some differences around GOFLAGS, GOPROXY, and GOMOD.
If I understand correctly what you are trying to do, you need to enable Go Modules support via Preferences | Go | Go Modules (vgo) and turn on the Go Modules integration. Once you do this, you'll see that the GOFLAGS value changes.
There you can also set the Proxy field to configure the GOPROXY environment variable.
The GOMOD difference comes from the directory where you invoked the go env command; in this case they appear to be different directories. Invoke the command in the same directory in both the IDE terminal and the OS terminal and you'll see the same value. GOMOD indicates which go.mod file, if any, is used by the current command.
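A quick way to confirm this (directory paths are just placeholders):

cd ~/projects/my-module && go env GOMOD    # prints .../my-module/go.mod
cd /tmp && go env GOMOD                    # prints nothing (or /dev/null) when no go.mod is in use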
Finally, I recommend upgrading to GoLand 2019.3 as it will automatically enable Go Modules support when it detects that the project is created in a directory with a go.mod file present.
If I have multiple Airflow DAGs with some overlapping Python package dependencies, how can I keep each of these projects' dependencies decoupled? E.g., if I had project A and project B on the same server, I would run each of them with something like...
source /path/to/virtualenv_a/activate
python script_a.py
deactivate
source /path/to/virtualenv_b/activate
python script_b.py
deactivate
Basically, I would like to run DAGs in the same situation (e.g. each DAG uses Python scripts that may have overlapping package dependencies which I would like to develop separately, i.e. not have to update all code using a package when I want to update the package for just one project). Note, I've been using the BashOperator to run Python tasks like...
do_stuff = BashOperator(
    task_id='my_task',
    bash_command='python /path/to/script.py',
    execution_timeout=timedelta(minutes=30),
    dag=dag)
Is there a way to get this working? Is there some other best-practice way that Airflow intends for people to address (or avoid) these kinds of problems?
Based on discussion on the apache-airflow mailing list, the simplest answer that fits the modular way in which I am using various Python scripts for tasks is to directly call each virtualenv's Python interpreter binary for each script or module, e.g.
source /path/to/virtualenv_a/activate
python script_a.py
deactivate
source /path/to/virtualenv_b/activate
python script_b.py
deactivate
would translate to something like
do_stuff_a = BashOperator(
    task_id='my_task_a',
    bash_command='/path/to/virtualenv_a/bin/python /path/to/script_a.py',
    execution_timeout=timedelta(minutes=30),
    dag=dag)
do_stuff_b = BashOperator(
    task_id='my_task_b',
    bash_command='/path/to/virtualenv_b/bin/python /path/to/script_b.py',
    execution_timeout=timedelta(minutes=30),
    dag=dag)
in an Airflow DAG.
To the question of passing args to the tasks, it depends on the nature of the args you want to pass in. In my case, there are certain args that depend on what a data table looks like on the day the DAG is run (e.g. the highest timestamp record in the table, etc.). To add these args to the tasks, I have a "config" DAG that runs before this one. In the config DAG, there is a task that generates the args for the "real" DAG as a Python dict and writes them to a pickle file. The config DAG then has a TriggerDagRunOperator task that activates the "real" DAG, whose initial logic reads the pickle file generated by the config DAG (in my case, as a dict), and I interpolate the values into the bash_command string like bash_command=f"python script.py {configs['arg1']}".
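The read side of that looks roughly like the sketch below; the pickle path and arg name are placeholders, and the BashOperator import path shown is the Airflow 2 one (airflow.operators.bash_operator in 1.10):

import pickle
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

dag = DAG('real_dag', start_date=datetime(2021, 1, 1), schedule_interval=None)

# load the dict written by the "config" dag's task at parse time
with open('/path/to/configs.pkl', 'rb') as f:
    configs = pickle.load(f)

do_stuff_a = BashOperator(
    task_id='my_task_a',
    bash_command=f"/path/to/virtualenv_a/bin/python /path/to/script_a.py {configs['arg1']}",
    dag=dag)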
You can use packaged DAGs, where each DAG is packaged together with its dependencies:
http://airflow.apache.org/concepts.html#packaged-dags
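Roughly, a packaged DAG is a zip file with the DAG module at its root and its (pure-Python) dependencies alongside it, dropped into the dags folder; the file names below are placeholders:

# bundle the dag file and its pure-Python dependencies into one zip
zip -r my_dag.zip my_dag.py my_package/
cp my_dag.zip $AIRFLOW_HOME/dags/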
There are operators for running Python code. A relatively new one, the PythonVirtualenvOperator, will create an ephemeral virtualenv, install your dependencies, run your Python callable, and then tear down the environment. This creates some per-task overhead, but it is a functional (if not ideal) approach to your dependency-overlap issue.
https://airflow.apache.org/docs/apache-airflow/stable/howto/operator/python.html#pythonvirtualenvoperator
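A minimal sketch of what that looks like; the callable and requirement are placeholders, and the import path shown is the Airflow 2 one:

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonVirtualenvOperator

dag = DAG('venv_example', start_date=datetime(2021, 1, 1), schedule_interval=None)

def run_script_a():
    # imports go inside the callable because it executes in the ephemeral virtualenv
    import project_a_lib  # placeholder for project A's pinned dependency
    project_a_lib.main()

do_stuff_a = PythonVirtualenvOperator(
    task_id='my_task_a',
    python_callable=run_script_a,
    requirements=['project_a_lib==1.0.0'],  # installed fresh on every run
    system_site_packages=False,
    dag=dag)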
When I use the SSHOperator to execute a command on a remote server as a given user, the command doesn't get the proper system environment; I think it doesn't source /etc/profile or ~/.bashrc.
This issue is kind of annoying; for example, I have to write the absolute path of the Python interpreter everywhere. Is there any way to avoid this?
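For context, the task looks roughly like the sketch below; forcing a login shell with bash -lc is one possible workaround (my assumption, not something from the Airflow docs), and the connection id and script path are placeholders. The import path shown is the Airflow 2 provider one (airflow.contrib.operators.ssh_operator in 1.10):

from datetime import datetime
from airflow import DAG
from airflow.providers.ssh.operators.ssh import SSHOperator

dag = DAG('ssh_example', start_date=datetime(2021, 1, 1), schedule_interval=None)

run_remote = SSHOperator(
    task_id='run_remote_script',
    ssh_conn_id='my_ssh_conn',
    # a non-interactive SSH session skips /etc/profile and ~/.bashrc;
    # bash -lc starts a login shell so the profile gets sourced
    command='bash -lc "python /path/to/script.py"',
    dag=dag)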
OS: UNIX Solaris, Oracle Application Server 10g
To run a shell script from Oracle Forms, I used host('/bin/bash /u01/compile.sh') and it works well.
Now I need to run a unix command, something like
host('mv form1.fmx FORM1.FMX'), but it's not working.
I tried appending the command mv form1.fmx FORM1.FMX to the compile.sh shell script, but that isn't working either, although the rest of the shell script runs fine.
The solution was simply to use the full path of the mv command, and it worked well, as follows:
/bin/mv /u01/oracle/runtime/test/form1.fmx /u01/oracle/runtime/test/FORM1.FMX
In case anyone else encounters the same problem, the cause is that the Forms process creates a subprocess to execute the host() command, and that subprocess inherits the environment variables of the parent process, which are derived from default.env (or another env file, as defined in the server config). There is a PATH variable defined in that file, but it doesn't contain the usual /bin or /usr/bin, so commands will not execute unless the full path is specified.
The solution is to set the correct PATH variable either in the executed script (via export PATH=$PATH:...) or in default.env. I set it in the script since, knowing Oracle, there's no guarantee that modifying default.env won't break something.
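Concretely, the script ends up looking something like this sketch (the mv paths are the ones from the question; the extra PATH entries are the ones the default.env-derived environment was missing):

#!/bin/bash
# compile.sh, invoked from Forms via host('/bin/bash /u01/compile.sh')
export PATH=$PATH:/bin:/usr/bin
mv /u01/oracle/runtime/test/form1.fmx /u01/oracle/runtime/test/FORM1.FMX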
I'm trying to run my program under the Torque scheduler using mpirun. In my PBS file I set the library path with
export LD_LIBRARY_PATH=/path/to/library
yet it gives the error:
error while loading shared libraries: libarmadillo.so.3:
cannot open shared object file: No such file or directory.
I guess the problem is that LD_LIBRARY_PATH is not set on all the nodes. How can I make this work?
LD_LIBRARY_PATH is not automatically exported to the MPI processes spawned by mpirun. You should use
mpirun -x LD_LIBRARY_PATH ...
to push the value of LD_LIBRARY_PATH. Also make sure that the specified path exists on all nodes in the cluster and that libarmadillo.so.3 is available everywhere.
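Putting it together in the PBS script, assuming Open MPI's mpirun (the -x flag is Open MPI-specific, and the program name and node count are placeholders):

#PBS -l nodes=2:ppn=8
export LD_LIBRARY_PATH=/path/to/library:$LD_LIBRARY_PATH
# -x pushes the variable into the environment of every spawned rank
mpirun -x LD_LIBRARY_PATH ./my_program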
On some systems, your environment isn't always propagated via mpirun. You should set all those variables in your .bashrc file.