How to reduce parsing time when running dbt on Airflow - airflow

I am running dbt version 1.0.4 on Airflow. My ETL pipeline is running fine.
But I notice that dbt takes a long time to parse files every time it runs on Airflow. Some lines from the log:
[2022-06-14 05:06:54,523] {subprocess.py:78} INFO - 05:06:54.506639 [debug] [MainThread]: Parsing macros/common/helpers/dropif.sql
[2022-06-14 05:06:55,826] {subprocess.py:78} INFO - 05:06:55.809703 [debug] [MainThread]: 1605: jinja rendering because of STATIC_PARSER flag. file: mart/domain_1/model_1.sql
Since my project is quite big, it takes a looooong time before dbt actually runs the query.
So, is there any way for me to bypass the parsing?

I added --no-static-parser, but I want to reduce the parsing time further.
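Before and after changing flags, it can help to measure how much wall-clock time the parse phase actually takes. A minimal Python sketch that does this from the Airflow task log, assuming every line keeps the "[YYYY-MM-DD HH:MM:SS,mmm]" prefix seen in the log lines above. The helper parse_window_seconds and the two marker strings are hypothetical choices based only on those two sample lines, not part of dbt or Airflow.

```python
import re
from datetime import datetime

# Airflow's timestamp prefix on each task-log line, e.g. "[2022-06-14 05:06:54,523]"
AIRFLOW_TS = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]")

# Hypothetical markers taken from the two sample dbt debug lines; adjust for your logs.
PARSE_MARKERS = ("Parsing ", "jinja rendering")


def parse_window_seconds(lines):
    """Seconds between the first and last log line that mentions parsing."""
    stamps = []
    for line in lines:
        match = AIRFLOW_TS.match(line)
        if match and any(marker in line for marker in PARSE_MARKERS):
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f"))
    if len(stamps) < 2:
        return 0.0
    return (max(stamps) - min(stamps)).total_seconds()


sample = [
    "[2022-06-14 05:06:54,523] {subprocess.py:78} INFO - 05:06:54.506639 [debug] "
    "[MainThread]: Parsing macros/common/helpers/dropif.sql",
    "[2022-06-14 05:06:55,826] {subprocess.py:78} INFO - 05:06:55.809703 [debug] "
    "[MainThread]: 1605: jinja rendering because of STATIC_PARSER flag. "
    "file: mart/domain_1/model_1.sql",
]
print(parse_window_seconds(sample))  # 1.303
```

Run against a full task log, the number gives a baseline for comparing runs with different parser settings.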

Related

Error in Cloud composer with data build tool (dbt) path ['name']: 'jaffle_shop' does not match '^[^\\d\\W]\\w*$'

I am testing a deployment of dbt within Cloud Composer. On my local machine (Ubuntu 20.04) I have successfully run the dbt models with Airflow. When running on Google Cloud Composer, I get the following error:
{subprocess.py:74} INFO - Output:
{subprocess.py:78} INFO - Running with dbt=0.21.0
{subprocess.py:78} INFO - Encountered an error while reading the project:
{subprocess.py:78} INFO - ERROR: Runtime Error
{subprocess.py:78} INFO - at path ['name']: 'jaffle_shop' does not match '^[^\\d\\W]\\w*$'
{subprocess.py:78} INFO -
{subprocess.py:78} INFO - Error encountered in /home/airflow/gcs/dags/dbt_project.yml
{subprocess.py:78} INFO - Encountered an error:
{subprocess.py:78} INFO - Runtime Error
{subprocess.py:78} INFO - Could not run dbt
{subprocess.py:82} INFO - Command exited with return code 2
{taskinstance.py:1503} ERROR - Task failed with exception
We are using a BashOperator to run dbt models in Airflow.
Initially we had some problems with dependencies, but they were solved.
We are using a standard dbt_project.yml file with a single model just to test how this works.
Another way is to use Docker, but we still need to try whether that works.
Edit
Versions
dbt: 0.21.0
cloud-composer: 1.17.1
airflow: 2.1.2
Pypi Packages
airflow-dbt: 0.4.0
dbt: 0.21.0
jsonschema: 3.1 (added this because PyPI was throwing an error about the version)
I would really appreciate it if anyone can help.
Pete
The problem here is the jsonschema dependency. Version 3.1.0 does not work, while versions 3.1.1 and 3.2.0 will work, and they should also fit within Composer's dependency requirements.
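A quick way to confirm which jsonschema release an environment actually has is a small Python check. This is a sketch, not part of dbt or Composer: the helper names are hypothetical, and it assumes Python 3.8+ for importlib.metadata. It also treats a "3.1" pin as the 3.1.0 release, since version comparison pads missing components with zeros.

```python
import re
from importlib import metadata


def release_tuple(version: str) -> tuple:
    """'3.1' -> (3, 1, 0); keeps the leading digits of the first three parts."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = re.match(r"\d+", piece)
        parts.append(int(digits.group()) if digits else 0)
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def jsonschema_is_known_bad(version: str) -> bool:
    # Per the answer: 3.1.0 breaks dbt's project-name check; 3.1.1 and 3.2.0 work.
    return release_tuple(version) == (3, 1, 0)


def describe_installed_jsonschema() -> str:
    try:
        installed = metadata.version("jsonschema")
    except metadata.PackageNotFoundError:
        return "jsonschema is not installed"
    if jsonschema_is_known_bad(installed):
        return f"jsonschema {installed} is the known-bad release; pin 3.1.1 or 3.2.0"
    return f"jsonschema {installed} is not the known-bad 3.1.0 release"


print(describe_installed_jsonschema())
```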
It looks like the jsonschema maintainers' switch to js-regex in 3.1.0 caused an issue, which led them to revert to Python's regular re module in 3.1.1.
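To see that the project name itself is fine, the pattern from the error message can be checked with Python's standard re module, which is what 3.1.1 went back to. The doubled backslashes in the error text are just escaping; as a raw string the pattern means "one letter or underscore, then any word characters".

```python
import re

# Pattern from the error message: first char is a word char that is not a digit,
# followed by zero or more word chars.
PROJECT_NAME = re.compile(r"^[^\d\W]\w*$")

for name in ["jaffle_shop", "1jaffle", "jaffle-shop"]:
    print(name, bool(PROJECT_NAME.match(name)))
# jaffle_shop True
# 1jaffle False
# jaffle-shop False
```

Since "jaffle_shop" matches under re, the failure points at the regex engine used by jsonschema 3.1.0 rather than at the name in dbt_project.yml.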
There are some details here, and a couple of related issues described here and here.
In general, it would be much nicer if Cloud Composer supported virtual environments to avoid this entire dependency-collision mess, but apparently Google does not support that approach.

Log messages for DAG import errors in Airflow 2.x

I'm running Apache Airflow 2.x locally, using the Docker Compose file that is provided in the documentation. In the .\dags directory on my local filesystem (which is mounted into the Airflow containers), I create a new Python script file, and implement the DAG using the TaskFlow API.
The changes to my DAG are sometimes invalid. For example, I might have an ImportError due to an invalid module name, or a syntax error. When Airflow attempts to import the DAG, I cannot find any log messages from the webserver, scheduler, or worker that indicate a problem, or what the specific problem is.
Instead, I have to read through my code line by line and look for the problem. This is compounded by the fact that my local Python environment on Windows 10 and the Python environment for Airflow are different versions and have different packages installed. Hence, I cannot reliably use my local development environment to detect package import failures, because the packages I expect in the Airflow environment differ from the ones I have locally. Additionally, the Python version I use to write code locally does not match the one Airflow uses.
Thus, I need some kind of error logging to indicate that a DAG import failed.
Question: When a DAG fails to update / import, where are the logs to indicate if an import failure occurred, and what the exact error message was?
Currently, the DAG parsing logs are under $AIRFLOW_HOME/logs/scheduler/DATE/DAG_FILE.py.log
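Because the file name is derived from the DAG file and the date folder, the log for a given file can be located programmatically. A Python sketch under stated assumptions: it follows the layout used in the example's log path (base_log_folder/scheduler/YYYY-MM-DD/dag_file.py.log), which may differ with other Airflow versions or logging settings; /opt/airflow/logs and both helper names are hypothetical.

```python
from datetime import date
from pathlib import Path


def scheduler_log_path(base_log_folder: str, day: date, dag_file: str) -> Path:
    """Build <base_log_folder>/scheduler/<YYYY-MM-DD>/<dag file name>.log."""
    return Path(base_log_folder) / "scheduler" / day.isoformat() / f"{Path(dag_file).name}.log"


def failed_imports(log_file: Path) -> list:
    """Lines of a scheduler log that report a failed DAG import (empty if no log)."""
    if not log_file.exists():
        return []
    return [line for line in log_file.read_text().splitlines() if "Failed to import" in line]


log_file = scheduler_log_path("/opt/airflow/logs", date(2021, 4, 7), "/files/dags/example-dag.py")
print(log_file)  # /opt/airflow/logs/scheduler/2021-04-07/example-dag.py.log
print(failed_imports(log_file))
```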
Example:
Let's say my DAG file is example-dag.py, with the following contents. Notice the typo in the datetime import:
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import dattime  # <-- this line has a typo
dag = DAG(
    dag_id='example_Dag',
    schedule_interval=None,
    start_date=datetime(2019, 2, 6),
)
t1 = BashOperator(
    task_id='print_date1',
    bash_command='sleep $[ ( $RANDOM % 30 ) + 1 ]s',
    dag=dag,
)
Now, check the log at $AIRFLOW_HOME/logs/scheduler/2021-04-07/example-dag.py.log, where $AIRFLOW_HOME/logs is what I have set in $AIRFLOW__LOGGING__BASE_LOG_FOLDER or [logging] base_log_folder in airflow.cfg (https://airflow.apache.org/docs/apache-airflow/2.0.1/configurations-ref.html#base-log-folder).
That file should contain logs like the following:
[2021-04-07 21:39:02,222] {scheduler_job.py:182} INFO - Started process (PID=686) to work on /files/dags/example-dag.py
[2021-04-07 21:39:02,230] {scheduler_job.py:633} INFO - Processing file /files/dags/example-dag.py for tasks to queue
[2021-04-07 21:39:02,233] {logging_mixin.py:104} INFO - [2021-04-07 21:39:02,233] {dagbag.py:451} INFO - Filling up the DagBag from /files/dags/example-dag.py
[2021-04-07 21:39:02,368] {logging_mixin.py:104} INFO - [2021-04-07 21:39:02,357] {dagbag.py:308} ERROR - Failed to import: /files/dags/example-dag.py
Traceback (most recent call last):
File "/opt/airflow/airflow/models/dagbag.py", line 305, in _load_modules_from_file
loader.exec_module(new_module)
File "<frozen importlib._bootstrap_external>", line 678, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/files/dags/example-dag.py", line 3, in <module>
from datetime import dattime
ImportError: cannot import name 'dattime'
[2021-04-07 21:39:02,380] {scheduler_job.py:645} WARNING - No viable dags retrieved from /files/dags/example-dag.py
[2021-04-07 21:39:02,407] {scheduler_job.py:190} INFO - Processing /files/dags/example-dag.py took 0.189 seconds
You will also see the error in the webserver UI.
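Since the local Windows environment differs from Airflow's, another option is to import the DAG files directly inside the Airflow container (for example via docker compose exec), so the Python version and packages match. A minimal stdlib-only Python sketch: it approximates what Airflow does when it loads DAG files, but it is not Airflow's API (Airflow's own airflow.models.DagBag exposes an import_errors dict for this). The helper name is hypothetical, /opt/airflow/dags is assumed to be where the Docker Compose setup mounts .\dags, and note that importing a file runs its top-level code.

```python
import importlib.util
import traceback
from pathlib import Path


def find_import_errors(dag_folder: str) -> dict:
    """Import each .py file in dag_folder; return {file path: traceback} for failures."""
    errors = {}
    for path in sorted(Path(dag_folder).glob("*.py")):
        # Module names cannot contain "-", so derive a safe name from the file stem.
        module_name = path.stem.replace("-", "_")
        spec = importlib.util.spec_from_file_location(module_name, str(path))
        module = importlib.util.module_from_spec(spec)
        try:
            spec.loader.exec_module(module)
        except Exception:  # ImportError, SyntaxError, NameError, ...
            errors[str(path)] = traceback.format_exc()
    return errors


for file_path, tb in find_import_errors("/opt/airflow/dags").items():
    print(f"--- {file_path}\n{tb}")
```

For the example-dag.py above, the reported traceback would end with the same ImportError about 'dattime' that appears in the scheduler log.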

Airflow Cluster Policy is not getting invoked

I am trying to set up and understand a custom policy. I'm not sure what I am doing wrong, but following this is not working.
Airflow Version: 1.10.10
Expected result: it should throw an exception if I try to run a DAG with default_owner
Actual Result: no such exception
/root/airflow/config/airflow_local_settings.py
class PolicyError(Exception):
    pass
def cluster_policy(task):
    print("cluster_policy")
    raise PolicyError
def task_instance_mutation_hook(ti):
    print("task_instance_mutation_hook")
    raise PolicyError
The /root/airflow/config/airflow_local_settings.pyc file is being created, so I know Airflow is processing this file.
If there is any compilation error in this file, all my DAGs fail; however, that does not happen with the file above.
I'm not sure what I am doing wrong.
This feature is only available from version 1.10.12 onward.
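Because the hook silently does nothing on older releases, a version guard makes the mismatch visible. A Python sketch with a hypothetical helper name; it assumes plain X.Y.Z version strings, and in a real environment the input would be airflow.__version__.

```python
import re


def supports_task_instance_mutation_hook(airflow_version: str) -> bool:
    # Per the answer above, the hook is only available from 1.10.12 onward.
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", airflow_version)
    if match is None:
        raise ValueError(f"unrecognised version string: {airflow_version!r}")
    return tuple(int(part) for part in match.groups()) >= (1, 10, 12)


for v in ["1.10.10", "1.10.12", "2.1.2"]:
    print(v, supports_task_instance_mutation_hook(v))
# 1.10.10 False
# 1.10.12 True
# 2.1.2 True
```

With the asker's 1.10.10, the guard returns False, which is consistent with no exception being raised.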

Getting error while making a new dag in apache airflow

PendingDeprecationWarning: The requested task could not be added to the DAG because a task with task_id create_tag_template_field_result is already in the DAG. Starting in Airflow 2.0, trying to overwrite a task will raise an exception.
First, this is just a warning for now, but starting with Airflow 2.0 it will raise an exception, so it can crash your pipeline if not handled (once you upgrade the airflow module).
The warning suggests that you are adding a task twice, or using the same task_id (create_tag_template_field_result) for two different tasks.
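A quick way to catch this before Airflow does is to check the task_ids for duplicates. A stdlib-only Python sketch: duplicate_task_ids and the loop values are hypothetical, and the loop pattern is only one example of how a fixed task_id can end up reused.

```python
from collections import Counter


def duplicate_task_ids(task_ids):
    """Return every task_id that occurs more than once, in first-seen order."""
    counts = Counter(task_ids)
    return [task_id for task_id, n in counts.items() if n > 1]


# The warning's situation: the same id used for two tasks.
fixed = ["create_tag_template_field_result" for _ in range(2)]
print(duplicate_task_ids(fixed))  # ['create_tag_template_field_result']

# One way to keep ids unique when tasks are built in a loop: add a distinguishing suffix.
fields = ["field_a", "field_b"]  # hypothetical loop values
unique = [f"create_tag_template_field_result_{field}" for field in fields]
print(duplicate_task_ids(unique))  # []
```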

JavaFX to Android - Execution failed for task ':deleteSrcAndLayout'

Good day. I have an issue creating an Android project. I'm currently using Windows 7 with JDK8u40 installed, and I'm using the latest Dalvik SDK. But when I attempted to create an Android project, an error was thrown:
* What went wrong:
Execution failed for task ':deleteSrcAndLayout'.
> Directory does not exist: C:\AndroidFX\CodeGenerator\src
Here's the complete error log:
C:\dalvik-sdk\samples\Ensemble8>./gradlew --info createProject -PDEBUG -PDIR=C:/AndroidFX -PPACKAGE="hello" -PNAME="CodeGenerator" -PANDROID_SDK=C:/AndroidSDK/sdk -PJFX_SDK=C:/dalvik-sdk -PJFX_APP=C:/Jar -PJFX_MAIN="hello.Hello"
Starting Build
Settings evaluated using empty settings script.
Projects loaded. Root project using build file 'C:\dalvik-sdk\samples\Ensemble8\build.gradle'.
Included projects: [root project 'Ensemble8']
Evaluating root project 'Ensemble8' using build file 'C:\dalvik-sdk\samples\Ensemble8\build.gradle'.
Starting file lock listener thread.
All projects evaluated.
Selected primary task 'createProject'
Tasks to be executed: [task ':conf', task ':androidCreateProject', task ':deleteSrcAndLayout', task ':writeAntProperties', task ':updateManifest', task ':updateStringsXml', task ':updateBuildXml', task ':createProject']
:conf (Thread[main,5,main]) started.
:conf
Executing task ':conf' (up-to-date check took 0.0 secs) due to:
Task has not declared any outputs.
====================================================
Android SDK: [C:/AndroidSDK/sdk]
Target: [android-21]
Project name: [CodeGenerator]
Package: [hello]
JavaFX application: [C:/Jar]
JavaFX sdk: [C:/dalvik-sdk]
JavaFX main.class: [hello.Hello]
Workdir: [C:/AndroidFX]
debug: [true]
===================================================
:conf (Thread[main,5,main]) completed. Took 0.078 secs.
:androidCreateProject (Thread[main,5,main]) started.
:androidCreateProject
Executing task ':androidCreateProject' (up-to-date check took 0.0 secs) due to:
Task has not declared any outputs.
Starting process 'command 'C:/AndroidSDK/sdk/tools/android.bat''. Working directory: C:\AndroidFX Command: C:/AndroidSDK/sdk/tools/android.bat create project -n CodeGenerator -p CodeGenerator -t android-21 -k hello -a Activity
An attempt to initialize for well behaving parent process finished.
Successfully started process 'command 'C:/AndroidSDK/sdk/tools/android.bat''
Error: Package name 'hello' contains invalid characters.
A package name must be constitued of two Java identifiers.
Each identifier allowed characters are: a-z A-Z 0-9 _
Process 'command 'C:/AndroidSDK/sdk/tools/android.bat'' finished with exit value 0 (state: SUCCEEDED)
:androidCreateProject (Thread[main,5,main]) completed. Took 1.375 secs.
:deleteSrcAndLayout (Thread[main,5,main]) started.
:deleteSrcAndLayout
Executing task ':deleteSrcAndLayout' (up-to-date check took 0.0 secs) due to:
Task has not declared any outputs.
:deleteSrcAndLayout FAILED
:deleteSrcAndLayout (Thread[main,5,main]) completed. Took 0.594 secs.
FAILURE: Build failed with an exception.
* Where:
Build file 'C:\dalvik-sdk\samples\Ensemble8\build.gradle' line: 203
* What went wrong:
Execution failed for task ':deleteSrcAndLayout'.
> Directory does not exist: C:\AndroidFX\CodeGenerator\src
* Try:
Run with --stacktrace option to get the stack trace. Run with --debug option to get more log output.
BUILD FAILED
Total time: 6.531 secs
Please help me!! I'm stuck!!!!
I also tried JDK7u75, but it didn't work either!!
I successfully created an Android project by editing createHelloWorld.bat under android-tools in the dalvik-sdk.
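The log also hints at the root cause: android.bat rejected -PPACKAGE="hello" ("A package name must be constitued of two Java identifiers"), yet still exited with value 0, so the project directory was apparently never created and :deleteSrcAndLayout then failed on the missing src folder. A Python sketch of the rule as android.bat prints it; the helper name is hypothetical, and it additionally assumes the standard Java rule that an identifier may not start with a digit.

```python
import re

# Allowed characters per the android.bat message: a-z A-Z 0-9 _
# Assumption: like any Java identifier, a segment may not start with a digit.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_package(name: str) -> bool:
    """At least two dot-separated identifiers, each using only the allowed characters."""
    parts = name.split(".")
    return len(parts) >= 2 and all(IDENTIFIER.fullmatch(part) for part in parts)


for candidate in ["hello", "com.hello", "com.1hello"]:
    print(candidate, is_valid_package(candidate))
# hello False
# com.hello True
# com.1hello False
```

Under this check, a two-part name such as com.hello would pass where "hello" does not.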
