rJava import not working in Airflow

I am trying to schedule an R script in Airflow, and the script uses the rJava library. rJava and xlsx work fine in the R terminal, but not in the Airflow environment. I get this error:
libjvm.so: cannot open shared object file: No such file or directory
In my ~/.bashrc file,
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/bin/jar
export LD_LIBRARY_PATH=/usr/lib/jvm/java-8-openjdk-amd64/lib/amd64:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server
In my ~/.profile file,
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/bin/jar
export HADOOP_HOME='/home/ubuntu/spark-2.2.0-bin-hadoop2.7/hadoop-2.7.4'
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native/:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server:$LD_LIBRARY_PATH
In my /etc/environment,
JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64/bin/jar";
LD_LIBRARY_PATH="/usr/lib/jvm/java-8-openjdk-amd64/lib/amd64:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server";
Also, I tried to add this line in the top of my R script before importing rJava,
system('export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/bin/jar')
system('export LD_LIBRARY_PATH=/usr/lib/jvm/java-8-openjdk-amd64/lib/amd64:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server')
Even then I keep getting the libjvm.so missing error, although I can see that file in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server
When I checked the log in Airflow, the DAG is running the script from a temporary location: Temporary script location: /tmp/airflowtmp7Ws3X2//tmp/airflowtmp7Ws3X2/nz-property-report6vTyGr
I think it is not picking up the environment variables. This is the full error:
Loading required package: xlsx
[2018-08-09 21:39:23,755] {base_task_runner.py:98} INFO - Subtask: [2018-08-09 21:39:23,755] {bash_operator.py:101} INFO - Error: package or namespace load failed for ‘xlsx’:
[2018-08-09 21:39:23,755] {base_task_runner.py:98} INFO - Subtask: [2018-08-09 21:39:23,755] {bash_operator.py:101} INFO - .onLoad failed in loadNamespace() for 'rJava', details:
[2018-08-09 21:39:23,755] {base_task_runner.py:98} INFO - Subtask: [2018-08-09 21:39:23,755] {bash_operator.py:101} INFO - call: dyn.load(file, DLLpath = DLLpath, ...)
[2018-08-09 21:39:23,755] {base_task_runner.py:98} INFO - Subtask: [2018-08-09 21:39:23,755] {bash_operator.py:101} INFO - error: unable to load shared object '/home/ubuntu/R/x86_64-pc-linux-gnu-library/3.4/rJava/libs/rJava.so':
[2018-08-09 21:39:23,756] {base_task_runner.py:98} INFO - Subtask: [2018-08-09 21:39:23,755] {bash_operator.py:101} INFO - libjvm.so: cannot open shared object file: No such file or directory
Can anyone help me with using rJava in my R script in airflow?
EDIT: As requested, here is my DAG script,
import airflow
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
#from airflow.models import DAG
from datetime import datetime
dag = DAG(
    dag_id='property_report',
    schedule_interval=None,
)
task = BashOperator(
    task_id='report',
    dag=dag,
    bash_command="Rscript /home/ubuntu/airflow/dags/scripts/r-scripts/recreate_lastmonthreport_from_snapshotdata.R",
    start_date=airflow.utils.dates.days_ago(1),
    owner='airflow')

For anyone looking for an answer to this: I had to run source ~/.bashrc in both screen sessions, the one running the web server and the one running the scheduler, and then restart both. After that, Airflow picked up the environment variables fine.
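A likely reason this works: in a setup like this one, a task's process inherits its environment from the scheduler process that launches it, so variables exported after the scheduler was started are not visible to tasks until it is restarted. Here is a minimal Python sketch of that inheritance. The helper names env_with_jvm and child_sees are made up for illustration, and the JVM directory is the one from the question.

```python
import os
import subprocess
import sys

# Directory that contains libjvm.so according to the question.
JVM_SERVER_DIR = "/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server"


def env_with_jvm(base_env=None, jvm_dir=JVM_SERVER_DIR):
    """Return a copy of base_env with jvm_dir first in LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    # Keep existing entries, but avoid listing jvm_dir twice.
    parts = [jvm_dir] + [p for p in existing.split(":") if p and p != jvm_dir]
    env["LD_LIBRARY_PATH"] = ":".join(parts)
    return env


def child_sees(var, env):
    """Start a child process with env and report the value it sees for var."""
    code = "import os; print(os.environ.get(%r, ''))" % var
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # The child only sees what the parent hands over at launch time.
    print(child_sees("LD_LIBRARY_PATH", env_with_jvm()))
```

If restarting the services is not convenient, BashOperator also accepts an env argument (a dict) for setting variables per task. Note that when env is given, it is used instead of the inherited environment rather than added to it.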

Airflow quickstart DagRunNotFound DagRun example_bash_operator not found

I'm learning Airflow and just want to get up and running with the Quickstart: https://airflow.apache.org/docs/apache-airflow/stable/start.html
I'm not sure if this is a virtual environment issue or something obvious that I'm missing with Airflow. This may be a duplicate of this question from 2017: Running Airflow task from the command line does not work, but there were no answers there.
My OS is Pop!_OS (Debian-based).
I have created a new virtual environment and installed Airflow by running the script provided:
# Airflow needs a home. `~/airflow` is the default, but you can put it
# somewhere else if you prefer (optional)
export AIRFLOW_HOME=~/airflow
# Install Airflow using the constraints file
AIRFLOW_VERSION=2.5.0
PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
# For example: 3.7
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
# For example: https://raw.githubusercontent.com/apache/airflow/constraints-2.5.0/constraints-3.7.txt
pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
# The Standalone command will initialise the database, make a user,
# and start all components for you.
airflow standalone
# Visit localhost:8080 in the browser and use the admin account details
# shown on the terminal to login.
# Enable the example_bash_operator dag in the home page
I was expecting this to be plug-and-play and I haven't even reached the tutorials yet.
airflow standalone works and I can run the DAG from the web UI. However, if I run
# run your first task instance
airflow tasks run example_bash_operator runme_0 2015-01-01
from the CLI, I get
airflow.exceptions.DagRunNotFound: DagRun for example_bash_operator with run_id or execution_date of '2015-01-01' not found
full error:
(airflow) jasonstewartnz#pop-os:~$ airflow tasks run example_bash_operator runme_0 2015-01-01
[2022-12-22 11:11:18,776] {dagbag.py:538} INFO - Filling up the DagBag from /home/jasonstewartnz/airflow/dags
[2022-12-22 11:11:18,820] {taskmixin.py:205} WARNING - Dependency <Task(BashOperator): create_entry_group>, delete_entry_group already registered for DAG: example_complex
[2022-12-22 11:11:18,820] {taskmixin.py:205} WARNING - Dependency <Task(BashOperator): delete_entry_group>, create_entry_group already registered for DAG: example_complex
[2022-12-22 11:11:18,820] {taskmixin.py:205} WARNING - Dependency <Task(BashOperator): create_entry_gcs>, delete_entry already registered for DAG: example_complex
[2022-12-22 11:11:18,820] {taskmixin.py:205} WARNING - Dependency <Task(BashOperator): delete_entry>, create_entry_gcs already registered for DAG: example_complex
[2022-12-22 11:11:18,820] {taskmixin.py:205} WARNING - Dependency <Task(BashOperator): create_tag>, delete_tag already registered for DAG: example_complex
[2022-12-22 11:11:18,820] {taskmixin.py:205} WARNING - Dependency <Task(BashOperator): delete_tag>, create_tag already registered for DAG: example_complex
[2022-12-22 11:11:18,832] {taskmixin.py:205} WARNING - Dependency <Task(_PythonDecoratedOperator): print_the_context>, log_sql_query already registered for DAG: example_python_operator
[2022-12-22 11:11:18,832] {taskmixin.py:205} WARNING - Dependency <Task(_PythonDecoratedOperator): log_sql_query>, print_the_context already registered for DAG: example_python_operator
[2022-12-22 11:11:18,833] {taskmixin.py:205} WARNING - Dependency <Task(_PythonDecoratedOperator): print_the_context>, log_sql_query already registered for DAG: example_python_operator
[2022-12-22 11:11:18,833] {taskmixin.py:205} WARNING - Dependency <Task(_PythonDecoratedOperator): log_sql_query>, print_the_context already registered for DAG: example_python_operator
[2022-12-22 11:11:18,833] {taskmixin.py:205} WARNING - Dependency <Task(_PythonDecoratedOperator): print_the_context>, log_sql_query already registered for DAG: example_python_operator
[2022-12-22 11:11:18,833] {taskmixin.py:205} WARNING - Dependency <Task(_PythonDecoratedOperator): log_sql_query>, print_the_context already registered for DAG: example_python_operator
[2022-12-22 11:11:18,833] {taskmixin.py:205} WARNING - Dependency <Task(_PythonDecoratedOperator): print_the_context>, log_sql_query already registered for DAG: example_python_operator
[2022-12-22 11:11:18,833] {taskmixin.py:205} WARNING - Dependency <Task(_PythonDecoratedOperator): log_sql_query>, print_the_context already registered for DAG: example_python_operator
/home/jasonstewartnz/.venv/airflow/lib/python3.10/site-packages/airflow/models/dag.py:3492 RemovedInAirflow3Warning: Param `schedule_interval` is deprecated and will be removed in a future release. Please use `schedule` instead.
[2022-12-22 11:11:18,914] {taskmixin.py:205} WARNING - Dependency <Task(_PythonDecoratedOperator): prepare_email>, send_email already registered for DAG: example_dag_decorator
[2022-12-22 11:11:18,914] {taskmixin.py:205} WARNING - Dependency <Task(EmailOperator): send_email>, prepare_email already registered for DAG: example_dag_decorator
Traceback (most recent call last):
File "/home/jasonstewartnz/.venv/airflow/bin/airflow", line 8, in <module>
sys.exit(main())
File "/home/jasonstewartnz/.venv/airflow/lib/python3.10/site-packages/airflow/__main__.py", line 39, in main
args.func(args)
File "/home/jasonstewartnz/.venv/airflow/lib/python3.10/site-packages/airflow/cli/cli_parser.py", line 52, in command
return func(*args, **kwargs)
File "/home/jasonstewartnz/.venv/airflow/lib/python3.10/site-packages/airflow/utils/cli.py", line 108, in wrapper
return f(*args, **kwargs)
File "/home/jasonstewartnz/.venv/airflow/lib/python3.10/site-packages/airflow/cli/commands/task_command.py", line 384, in task_run
ti, _ = _get_ti(task, args.map_index, exec_date_or_run_id=args.execution_date_or_run_id, pool=args.pool)
File "/home/jasonstewartnz/.venv/airflow/lib/python3.10/site-packages/airflow/utils/session.py", line 75, in wrapper
return func(*args, session=session, **kwargs)
File "/home/jasonstewartnz/.venv/airflow/lib/python3.10/site-packages/airflow/cli/commands/task_command.py", line 159, in _get_ti
dag_run, dr_created = _get_dag_run(
File "/home/jasonstewartnz/.venv/airflow/lib/python3.10/site-packages/airflow/cli/commands/task_command.py", line 115, in _get_dag_run
raise DagRunNotFound(
airflow.exceptions.DagRunNotFound: DagRun for example_bash_operator with run_id or execution_date of '2015-01-01' not found
The web UI tells me my config is /home/jasonstewartnz/airflow/airflow.cfg
The dags_folder set in that config, /home/jasonstewartnz/airflow/dags, is empty.
When I go to
http://localhost:8080/dags/example_bash_operator/details
I see that the fileloc attribute for the dag is:
fileloc /home/jasonstewartnz/.venv/airflow/lib/python3.10/site-packages/airflow/example_dags/example_bash_operator.py
Even if I copy this file to the DAGs directory, change the DAGs directory in the config to the above, or add it to the path, the CLI still seems unable to find the DAG.
Did you enable the example_bash_operator DAG in the UI, as the instructions specify? I am referring to this step in the guide:
# Visit localhost:8080 in the browser and use the admin account details
# shown on the terminal to login.
# Enable the example_bash_operator dag in the home page
This should be done before you attempt to execute the airflow tasks run command.
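The error message itself says that no DagRun matched '2015-01-01', and the traceback shows the CLI raising as soon as that lookup fails. As a rough mental model only (a toy sketch, not Airflow's implementation; RunRegistry and its methods are hypothetical names), the command looks up an existing run instead of creating one:

```python
class DagRunNotFound(Exception):
    """Stand-in for airflow.exceptions.DagRunNotFound (toy model only)."""


class RunRegistry:
    """Toy registry of DAG runs keyed by (dag_id, run_id or execution date)."""

    def __init__(self):
        self._runs = set()

    def record_run(self, dag_id, run_key):
        """Remember that a run exists, e.g. after the DAG was triggered."""
        self._runs.add((dag_id, run_key))

    def get_run(self, dag_id, run_key):
        """Return the run if it exists; otherwise fail like the CLI did."""
        if (dag_id, run_key) not in self._runs:
            raise DagRunNotFound(
                f"DagRun for {dag_id} with run_id or execution_date "
                f"of {run_key!r} not found"
            )
        return (dag_id, run_key)


if __name__ == "__main__":
    registry = RunRegistry()
    try:
        registry.get_run("example_bash_operator", "2015-01-01")
    except DagRunNotFound as err:
        print(err)
```

In this model the lookup only succeeds once a run with that exact key has been recorded, which is why it is worth checking in the UI which runs actually exist for the DAG before calling the command.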

Log messages for DAG import errors in Airflow 2.x

I'm running Apache Airflow 2.x locally, using the Docker Compose file that is provided in the documentation. In the .\dags directory on my local filesystem (which is mounted into the Airflow containers), I create a new Python script file, and implement the DAG using the TaskFlow API.
The changes to my DAG are sometimes invalid. For example, I might have an ImportError due to an invalid module name, or a syntax error. When Airflow attempts to import the DAG, I cannot find any log messages from the web server, scheduler, or worker that would indicate a problem, or what the specific problem is.
Instead, I have to read through my code line by line and look for a problem. This is compounded by the fact that my local Python environment on Windows 10 and the Python environment for Airflow use different Python versions and have different packages installed. Hence, I cannot reliably use my local development environment to detect package import failures.
Thus, I need some kind of error logging to indicate that a DAG import failed.
Question: When a DAG fails to update / import, where are the logs to indicate if an import failure occurred, and what the exact error message was?
Currently, the DAG parsing logs are under $AIRFLOW_HOME/logs/scheduler/EXECUTION_DATE/DAG_FILE.py.log
Example:
Let's say my DAG file is example-dag.py with the following contents. Note the typo in the datetime import:
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import dattime # <-- This Line has typo
dag = DAG(
    dag_id='example_Dag',
    schedule_interval=None,
    start_date=datetime(2019, 2, 6),
)
t1 = BashOperator(
    task_id='print_date1',
    bash_command='sleep $[ ( $RANDOM % 30 ) + 1 ]s',
    dag=dag)
Now check the log at $AIRFLOW_HOME/logs/scheduler/2021-04-07/example-dag.py.log, where $AIRFLOW_HOME/logs is the value I have set in $AIRFLOW__LOGGING__BASE_LOG_FOLDER or [logging] base_log_folder in airflow.cfg (https://airflow.apache.org/docs/apache-airflow/2.0.1/configurations-ref.html#base-log-folder).
That file should contain logs like these:
[2021-04-07 21:39:02,222] {scheduler_job.py:182} INFO - Started process (PID=686) to work on /files/dags/example-dag.py
[2021-04-07 21:39:02,230] {scheduler_job.py:633} INFO - Processing file /files/dags/example-dag.py for tasks to queue
[2021-04-07 21:39:02,233] {logging_mixin.py:104} INFO - [2021-04-07 21:39:02,233] {dagbag.py:451} INFO - Filling up the DagBag from /files/dags/example-dag.py
[2021-04-07 21:39:02,368] {logging_mixin.py:104} INFO - [2021-04-07 21:39:02,357] {dagbag.py:308} ERROR - Failed to import: /files/dags/example-dag.py
Traceback (most recent call last):
File "/opt/airflow/airflow/models/dagbag.py", line 305, in _load_modules_from_file
loader.exec_module(new_module)
File "<frozen importlib._bootstrap_external>", line 678, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/files/dags/example-dag.py", line 3, in <module>
from datetime import dattime
ImportError: cannot import name 'dattime'
[2021-04-07 21:39:02,380] {scheduler_job.py:645} WARNING - No viable dags retrieved from /files/dags/example-dag.py
[2021-04-07 21:39:02,407] {scheduler_job.py:190} INFO - Processing /files/dags/example-dag.py took 0.189 seconds
You will also see the import error in the webserver UI.
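To avoid opening the log by hand each time, the path layout and the "Failed to import" line shown above can be checked with a few lines of Python. This is only a sketch based on the log format in this answer. The function names dag_parsing_log_path and failed_imports are made up for illustration, and the format may differ in other Airflow versions.

```python
import re
from pathlib import PurePosixPath

# Matches the dagbag error line shown in the log excerpt above.
IMPORT_ERROR = re.compile(r"ERROR - Failed to import: (?P<path>\S+)")


def dag_parsing_log_path(base_log_folder, day, dag_file):
    """Build <base_log_folder>/scheduler/<day>/<dag_file>.log."""
    return str(PurePosixPath(base_log_folder) / "scheduler" / day / (dag_file + ".log"))


def failed_imports(log_text):
    """Return the DAG file paths that the log reports as failed imports."""
    return [match.group("path") for match in IMPORT_ERROR.finditer(log_text)]


if __name__ == "__main__":
    sample = (
        "[2021-04-07 21:39:02,368] {logging_mixin.py:104} INFO - "
        "[2021-04-07 21:39:02,357] {dagbag.py:308} ERROR - "
        "Failed to import: /files/dags/example-dag.py\n"
    )
    print(dag_parsing_log_path("/opt/airflow/logs", "2021-04-07", "example-dag.py"))
    print(failed_imports(sample))
```

The traceback that follows the matched line in the real log carries the actual exception, so in practice you would print a few lines after each match as well.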

struct.error: 'i' format requires -2147483648 <= number <= 2147483647 while creating stand alone executable using PyInstaller

I am trying to create a stand alone executable of the script afsara3.py using PyInstaller. I am getting this error
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
but cannot figure out why. It seems like the problem occurs while doing
Building PKG (CArchive) PKG-00.pkg
I cannot provide the script because it isn't supposed to be made public.
This is the command I ran and its output:
pyinstaller --specpath ./dist --distpath ./dist --workpath ./dist --onefile ./afsara3.py
368 INFO: PyInstaller: 4.0
368 INFO: Python: 3.6.9
377 INFO: Platform: Linux-5.4.0-48-generic-x86_64-with-Ubuntu-18.04-bionic
378 INFO: wrote ./dist/afsara3.spec
394 INFO: UPX is not available.
400 INFO: Extending PYTHONPATH with paths
['/home/afsara-ben/Downloads/coding assignment/src',
'/home/afsara-ben/Downloads/coding assignment/src/dist']
537 INFO: checking Analysis
2786 INFO: Building because /home/afsara-ben/Downloads/coding assignment/src/afsara3.py changed
2786 INFO: Initializing module dependency graph...
2799 INFO: Caching module graph hooks...
2833 INFO: Analyzing base_library.zip ...
6192 INFO: Caching module dependency graph...
6255 INFO: running Analysis Analysis-00.toc
6303 INFO: Analyzing afsara3.py
7149 INFO: Processing pre-find module path hook distutils from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks/pre_find_module_path/hook-distutils.py'.
7181 INFO: distutils: retargeting to non-venv dir '/usr/lib/python3.6'
8643 INFO: Processing pre-find module path hook site from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks/pre_find_module_path/hook-site.py'.
8650 INFO: site: retargeting to fake-dir '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/fake-modules'
13561 INFO: Processing pre-safe import module hook six.moves from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks/pre_safe_import_module/hook-six.moves.py'.
23794 INFO: Processing module hooks...
23794 INFO: Loading module hook 'hook-lxml.etree.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/_pyinstaller_hooks_contrib/hooks/stdhooks'...
23800 INFO: Loading module hook 'hook-certifi.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/_pyinstaller_hooks_contrib/hooks/stdhooks'...
23811 INFO: Loading module hook 'hook-openpyxl.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/_pyinstaller_hooks_contrib/hooks/stdhooks'...
23851 INFO: Loading module hook 'hook-pkg_resources.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
24243 INFO: Processing pre-safe import module hook win32com from '/home/afsara-ben/.local/lib/python3.6/site-packages/_pyinstaller_hooks_contrib/hooks/pre_safe_import_module/hook-win32com.py'.
24245 WARNING: Hidden import "pkg_resources.py2_warn" not found!
24245 WARNING: Hidden import "pkg_resources.markers" not found!
24247 INFO: Excluding import '__main__'
24249 INFO: Removing import of __main__ from module pkg_resources
24249 INFO: Loading module hook 'hook-PIL.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
24259 INFO: Import to be excluded not found: 'PySide'
24259 INFO: Import to be excluded not found: 'tkinter'
24259 INFO: Excluding import 'PyQt4'
24261 INFO: Import to be excluded not found: 'FixTk'
24261 INFO: Excluding import 'PyQt5'
24262 INFO: Removing import of PyQt5.QtCore from module PIL.ImageQt
24263 INFO: Removing import of PyQt5.QtGui from module PIL.ImageQt
24263 INFO: Loading module hook 'hook-numpy.core.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
25202 INFO: Loading module hook 'hook-setuptools.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
25889 INFO: Loading module hook 'hook-xml.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
25900 INFO: Loading module hook 'hook-PyQt5.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
47189 WARNING: Hidden import "PyQt5.sip" not found!
47189 INFO: Loading module hook 'hook-PyQt5.QtWidgets.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
47444 INFO: Loading module hook 'hook-PyQt5.QtCore.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
47480 INFO: Loading module hook 'hook-xml.etree.cElementTree.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
47481 INFO: Loading module hook 'hook-xml.dom.domreg.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
47482 INFO: Loading module hook 'hook-pytz.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
47483 INFO: Loading module hook 'hook-PyQt5.QtGui.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
47572 INFO: Loading module hook 'hook-pandas.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
50178 INFO: Loading module hook 'hook-PIL.Image.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
50845 INFO: Loading module hook 'hook-lib2to3.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
50870 INFO: Loading module hook 'hook-matplotlib.backends.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
52994 INFO: Matplotlib backend "GTK3Agg": added
53501 INFO: Matplotlib backend "GTK3Cairo": added
53770 INFO: Matplotlib backend "MacOSX": ignored
cannot import name '_macosx'
54036 INFO: Matplotlib backend "nbAgg": ignored
No module named 'IPython'
-c:12: MatplotlibDeprecationWarning:
The matplotlib.backends.backend_qt4agg backend was deprecated in Matplotlib 3.3 and will be removed two minor releases later.
54508 INFO: Matplotlib backend "Qt4Agg": added
-c:12: MatplotlibDeprecationWarning:
The matplotlib.backends.backend_qt4cairo backend was deprecated in Matplotlib 3.3 and will be removed two minor releases later.
54934 INFO: Matplotlib backend "Qt4Cairo": added
55312 INFO: Matplotlib backend "Qt5Agg": added
55696 INFO: Matplotlib backend "Qt5Cairo": added
56323 INFO: Matplotlib backend "TkAgg": added
56752 INFO: Matplotlib backend "TkCairo": added
57025 INFO: Matplotlib backend "WebAgg": ignored
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/matplotlib/backends/backend_webagg.py", line 27, in <module>
import tornado
ModuleNotFoundError: No module named 'tornado'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<string>", line 12, in <module>
File "/usr/local/lib/python3.6/dist-packages/matplotlib/backends/backend_webagg.py", line 29, in <module>
raise RuntimeError("The WebAgg backend requires Tornado.") from err
RuntimeError: The WebAgg backend requires Tornado.
57438 INFO: Matplotlib backend "WX": ignored
No module named 'wx'
57697 INFO: Matplotlib backend "WXAgg": ignored
No module named 'wx'
57970 INFO: Matplotlib backend "WXCairo": ignored
No module named 'wx'
58280 INFO: Matplotlib backend "agg": added
58592 INFO: Matplotlib backend "cairo": added
59056 INFO: Matplotlib backend "pdf": added
59466 INFO: Matplotlib backend "pgf": added
59780 INFO: Matplotlib backend "ps": added
60112 INFO: Matplotlib backend "svg": added
60516 INFO: Matplotlib backend "template": added
60708 INFO: Processing pre-safe import module hook gi.repository.Gio from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks/pre_safe_import_module/hook-gi.repository.Gio.py'.
60724 INFO: Processing pre-safe import module hook gi.repository.GLib from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks/pre_safe_import_module/hook-gi.repository.GLib.py'.
60725 INFO: Processing pre-safe import module hook gi.repository.GObject from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks/pre_safe_import_module/hook-gi.repository.GObject.py'.
60735 INFO: Processing pre-safe import module hook gi.repository.Gtk from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks/pre_safe_import_module/hook-gi.repository.Gtk.py'.
60736 INFO: Processing pre-safe import module hook gi.repository.Gdk from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks/pre_safe_import_module/hook-gi.repository.Gdk.py'.
61248 INFO: Loading module hook 'hook-gi.repository.Gtk.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
78487 INFO: Processing pre-safe import module hook gi.repository.xlib from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks/pre_safe_import_module/hook-gi.repository.xlib.py'.
78492 INFO: Processing pre-safe import module hook gi.repository.Atk from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks/pre_safe_import_module/hook-gi.repository.Atk.py'.
78509 INFO: Loading module hook 'hook-sqlite3.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
78592 INFO: Loading module hook 'hook-numpy.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
78593 INFO: Loading module hook 'hook-distutils.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
78593 INFO: Loading module hook 'hook-sysconfig.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
78611 INFO: Loading module hook 'hook-gi.repository.Gio.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
78878 INFO: Loading module hook 'hook-_tkinter.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
79036 INFO: checking Tree
79047 INFO: checking Tree
79049 INFO: Loading module hook 'hook-matplotlib.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
79322 INFO: Loading module hook 'hook-encodings.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
79506 INFO: Loading module hook 'hook-gi.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
79541 INFO: Loading module hook 'hook-gi.repository.GLib.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
79848 INFO: Loading module hook 'hook-gi.repository.Gdk.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
80002 INFO: Processing pre-safe import module hook gi.repository.cairo from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks/pre_safe_import_module/hook-gi.repository.cairo.py'.
80003 INFO: Processing pre-safe import module hook gi.repository.Pango from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks/pre_safe_import_module/hook-gi.repository.Pango.py'.
80004 INFO: Processing pre-safe import module hook gi.repository.GdkPixbuf from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks/pre_safe_import_module/hook-gi.repository.GdkPixbuf.py'.
80004 INFO: Loading module hook 'hook-gi.repository.cairo.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
80147 INFO: Loading module hook 'hook-PIL.SpiderImagePlugin.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
80152 INFO: Import to be excluded not found: 'FixTk'
80152 INFO: Excluding import 'tkinter'
80153 INFO: Loading module hook 'hook-gi.repository.Atk.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
80347 INFO: Loading module hook 'hook-gi.repository.GObject.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
80506 WARNING: Hidden import "gi._gobject" not found!
80507 INFO: Loading module hook 'hook-gi.repository.xlib.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
80632 INFO: Loading module hook 'hook-gi.repository.Pango.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
80777 INFO: Loading module hook 'hook-gi.repository.GdkPixbuf.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
81114 INFO: Processing pre-safe import module hook gi.repository.GModule from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks/pre_safe_import_module/hook-gi.repository.GModule.py'.
81116 INFO: Loading module hook 'hook-gi.repository.GModule.py' from '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks'...
82059 INFO: Looking for ctypes DLLs
82341 WARNING: library user32 required via ctypes not found
82472 WARNING: library msvcrt required via ctypes not found
82491 INFO: Analyzing run-time hooks ...
82505 INFO: Including run-time hook '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks/rthooks/pyi_rth_pkgres.py'
82506 INFO: Including run-time hook '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks/rthooks/pyi_rth_multiprocessing.py'
82509 INFO: Including run-time hook '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks/rthooks/pyi_rth__tkinter.py'
82509 INFO: Including run-time hook '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks/rthooks/pyi_rth_pyqt5.py'
82510 INFO: Including run-time hook '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks/rthooks/pyi_rth_gdkpixbuf.py'
82511 INFO: Including run-time hook '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks/rthooks/pyi_rth_gtk.py'
82512 INFO: Including run-time hook '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks/rthooks/pyi_rth_glib.py'
82512 INFO: Including run-time hook '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks/rthooks/pyi_rth_gio.py'
82513 INFO: Including run-time hook '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks/rthooks/pyi_rth_gi.py'
82514 INFO: Including run-time hook '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks/rthooks/pyi_rth_mplconfig.py'
82515 INFO: Including run-time hook '/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/hooks/rthooks/pyi_rth_mpldata.py'
82515 INFO: Including run-time hook '/home/afsara-ben/.local/lib/python3.6/site-packages/_pyinstaller_hooks_contrib/hooks/rthooks/pyi_rth_certifi.py'
82536 INFO: Looking for dynamic libraries
89670 INFO: Looking for eggs
89670 INFO: Python library not in binary dependencies. Doing additional searching...
89716 INFO: Using Python library /usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0
89733 INFO: Warnings written to ./dist/afsara3/warn-afsara3.txt
89840 INFO: Graph cross-reference written to ./dist/afsara3/xref-afsara3.html
94490 INFO: checking PYZ
94537 INFO: Building because toc changed
94537 INFO: Building PYZ (ZlibArchive) ./dist/afsara3/PYZ-00.pyz
96136 INFO: Building PYZ (ZlibArchive) ./dist/afsara3/PYZ-00.pyz completed successfully.
96443 INFO: checking PKG
96443 INFO: Building PKG because PKG-00.toc is non existent
96443 INFO: Building PKG (CArchive) PKG-00.pkg
Traceback (most recent call last):
File "/home/afsara-ben/.local/bin/pyinstaller", line 11, in <module>
sys.exit(run())
File "/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/__main__.py", line 114, in run
run_build(pyi_config, spec_file, **vars(args))
File "/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/__main__.py", line 65, in run_build
PyInstaller.building.build_main.main(pyi_config, spec_file, **kwargs)
File "/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/building/build_main.py", line 720, in main
build(specfile, kw.get('distpath'), kw.get('workpath'), kw.get('clean_build'))
File "/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/building/build_main.py", line 667, in build
exec(code, spec_namespace)
File "./dist/afsara3.spec", line 33, in <module>
console=True )
File "/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/building/api.py", line 437, in __init__
upx_exclude=self.upx_exclude
File "/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/building/api.py", line 200, in __init__
self.__postinit__()
File "/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/building/datastruct.py", line 160, in __postinit__
self.assemble()
File "/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/building/api.py", line 285, in assemble
pylib_name=pylib_name)
File "/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/archive/writers.py", line 332, in __init__
super(CArchiveWriter, self).__init__(archive_path, logical_toc)
File "/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/archive/writers.py", line 64, in __init__
self._finalize()
File "/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/archive/writers.py", line 96, in _finalize
self.save_trailer(toc_pos)
File "/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/archive/writers.py", line 439, in save_trailer
tocstr = self.toc.tobinary()
File "/home/afsara-ben/.local/lib/python3.6/site-packages/PyInstaller/archive/writers.py", line 264, in tobinary
flag, ord(typcd), nm + pad))
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
I had the same problem, and the following worked for me:
Use a virtual environment (you can follow this tutorial, which provides clear explanations).
Fix pandastable and matplotlib (in my case the error was connected in some way with pandastable, but I'm not sure what the exact problem was).
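For context on the last line of the traceback: struct.error with the 'i' format is what Python raises when a value does not fit in a signed 32-bit integer. One plausible reading, not confirmed in this thread, is that some value written into the archive's table of contents (such as a position or length) exceeded 2147483647, which can happen once the bundle grows past roughly 2 GiB. That would be consistent with a leaner virtual environment helping. The sketch below only reproduces the error with the standard struct module; fits_int32 is a hypothetical helper and this is not PyInstaller's code.

```python
import struct

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit 'i' field can hold


def fits_int32(value):
    """Return True if `value` can be packed with the 'i' format."""
    try:
        struct.pack("i", value)
        return True
    except struct.error:
        # Same exception type as in the traceback above
        return False


print(fits_int32(INT32_MAX))      # a value just under 2 GiB packs fine
print(fits_int32(INT32_MAX + 1))  # one more is out of range for 'i'
```

If this is the cause, checking the size of the build directory before and after switching to a clean virtual environment should show the difference.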

airflow trigger_dag command throwing error

I am executing the airflow trigger_dag cng-hello_world command on the Airflow server and it results in the error below. Please suggest a fix.
I followed this link: http://michal.karzynski.pl/blog/2017/03/19/developing-workflows-with-apache-airflow/
The same DAG has been executed via the Airflow UI.
[2019-02-06 11:57:41,755] {settings.py:174} INFO - setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=2000
[2019-02-06 11:57:43,326] {plugins_manager.py:97} ERROR - invalid syntax (airflow_api.py, line 7)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/airflow/plugins_manager.py", line 86, in <module>
m = imp.load_source(namespace, filepath)
File "/home/ec2-user/airflow/plugins/airflow_api.py", line 7
<!DOCTYPE html>
^
SyntaxError: invalid syntax
[2019-02-06 11:57:43,326] {plugins_manager.py:98} ERROR - Failed to import plugin /home/ec2-user/airflow/plugins/airflow_api.py
[2019-02-06 11:57:43,326] {plugins_manager.py:97} ERROR - invalid syntax (__init__.py, line 7)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/airflow/plugins_manager.py", line 86, in <module>
m = imp.load_source(namespace, filepath)
File "/home/ec2-user/airflow/plugins/__init__.py", line 7
<!DOCTYPE html>
^
SyntaxError: invalid syntax
[2019-02-06 11:57:43,327] {plugins_manager.py:98} ERROR - Failed to import plugin /home/ec2-user/airflow/plugins/__init__.py
[2019-02-06 11:57:47,236] {__init__.py:51} INFO - Using executor CeleryExecutor
[2019-02-06 11:57:48,420] {models.py:258} INFO - Filling up the DagBag from /home/ec2-user/airflow/dags
[2019-02-06 11:57:48,783] {cli.py:237} INFO - Created <DagRun cng-hello_world # 2019-02-06 11:57:48+00:00: manual__2019-02-06T11:57:48+00:00, externally triggered: True>

how to install dask on google composer

I tried to install dask on Google Composer (Airflow). I used PyPI (GCP UI) to add dask and the required packages below (I'm not sure whether all the Google ones are required, though; I couldn't find a requirements.txt):
dask
toolz
partd
cloudpickle
google-cloud
google-cloud-storage
google-auth
google-auth-oauthlib
decorator
When I run my DAG, which calls dd.read_csv("a gcp bucket"), it shows the error below in the Airflow log:
[2018-10-24 22:25:12,729] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/lib/python2.7/site-packages/dask/bytes/core.py", line 350, in get_fs_token_paths
[2018-10-24 22:25:12,733] {base_task_runner.py:98} INFO - Subtask: fs, fs_token = get_fs(protocol, options)
[2018-10-24 22:25:12,735] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/lib/python2.7/site-packages/dask/bytes/core.py", line 473, in get_fs
[2018-10-24 22:25:12,740] {base_task_runner.py:98} INFO - Subtask: "Need to install `gcsfs` library for Google Cloud Storage support\n"
[2018-10-24 22:25:12,741] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/lib/python2.7/site-packages/dask/utils.py", line 94, in import_required
[2018-10-24 22:25:12,748] {base_task_runner.py:98} INFO - Subtask: raise RuntimeError(error_msg)
[2018-10-24 22:25:12,751] {base_task_runner.py:98} INFO - Subtask: RuntimeError: Need to install `gcsfs` library for Google Cloud Storage support
[2018-10-24 22:25:12,756] {base_task_runner.py:98} INFO - Subtask: conda install gcsfs -c conda-forge
[2018-10-24 22:25:12,758] {base_task_runner.py:98} INFO - Subtask: or
[2018-10-24 22:25:12,762] {base_task_runner.py:98} INFO - Subtask: pip install gcsfs
So I tried to install gcsfs using PyPI, but got the Airflow error below:
{
insertId: "17ks763f726w1i"
logName: "projects/xxxxxxxxx/logs/airflow-worker"
receiveTimestamp: "2018-10-25T15:42:24.935880717Z"
resource: {…}
severity: "ERROR"
textPayload: "Traceback (most recent call last):
File "/usr/local/bin/gcsfuse", line 7, in <module>
from gcsfs.cli.gcsfuse import main
File "/usr/local/lib/python2.7/site-packages/gcsfs/cli/gcsfuse.py", line 3, in <module>
from fuse import FUSE
ImportError: No module named fuse
"
timestamp: "2018-10-25T15:41:53Z"
}
It seems to be trapped in a loop of required packages. Did I miss anything here? Any thoughts?
You don't need to add google-cloud-storage to your PyPI packages; it's already installed. I ran a DAG (image-version: composer-1.3.0-airflow-1.10.0) that logs the version of the pre-installed package, and it appears to be 1.13.0. I also added the following to my DAG in order to replicate your case:
import logging
import dask.dataframe as dd

def read_csv_dask():
    # gs:// paths need gcsfs for dask to reach Google Cloud Storage
    df = dd.read_csv('gs://gcs_path/data.csv')
    logging.info("csv from gs://gcs_path/ read alright")
Before anything else, I added the following dependencies via the UI:
dask==0.20.0
toolz==0.9.0
partd==0.3.9
cloudpickle==0.6.1
The corresponding task failed with the same message as yours ("Need to install gcsfs library for Google Cloud Storage support"), at which point I returned to the UI and attempted to add gcsfs==0.1.2. This never succeeded. However, I did not get the error you did; instead, it repeatedly failed with "Composer Backend timed out".
At this point, you could consider the following alternatives:
1) Install gcsfs with pip in a BashOperator. This is not optimal, as you will be installing gcsfs every time the DAG is run.
2) Use another library. What are you doing with this CSV? If you upload it to the gs://composer_gcs_bucket/data/ directory (check here), you can then read it using e.g. the csv module from the standard library, like so:
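import csv

def read_csv():
    # Files uploaded to the bucket's data/ directory are read from this local path
    with open('/home/airflow/gcs/data/data.csv', 'r') as f:
        reader = csv.reader(f)
        return list(reader)  # each row is a list of strings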
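For the first alternative above, the cost of reinstalling on every run could be reduced by installing only when the import fails. The following is a minimal sketch of that idea, run from a Python callable rather than a BashOperator. The helper names ensure_module and pip_install are hypothetical, and it assumes the worker is allowed to run pip at task time, which I have not verified on Composer.

```python
import subprocess
import sys


def pip_install(package):
    # Use the interpreter running the task so the package lands in its environment.
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])


def ensure_module(module_name, package=None, installer=pip_install):
    """Install `package` only when `module_name` cannot be imported.

    Returns True if the installer was called, False if the module was already present.
    """
    try:
        __import__(module_name)
    except ImportError:
        installer(package or module_name)
        return True
    return False


def read_csv_dask():
    ensure_module("gcsfs")  # does nothing where gcsfs is already importable
    import dask.dataframe as dd  # imported after the check so gcsfs is present first
    return dd.read_csv("gs://gcs_path/data.csv")
```

The installer argument exists only so the check can be exercised without touching pip. In a DAG, read_csv_dask would be the callable handed to the task.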

Resources