Debugging broken DAGs with the new Airflow CLI

I'm trying to debug a "Broken DAG" message from the Airflow UI, following the accepted answer of Debugging Broken DAGs.
However, with Airflow 2.1.0, the airflow list_dags command now just prints a usage message:
usage: airflow [-h] GROUP_OR_COMMAND ...
airflow dags list and airflow dags report both give a table of DAGs, but neither gives the full stack trace I'm looking for.
$ airflow dags report
file            | duration       | dag_num | task_num | dags
================+================+=========+==========+=============
/myBROKENDAG.py | 0:00:00.032594 | 0       | 0        |
/example-dag.py | 0:00:00.009614 | 1       | 9        | example_dag
/mydag2.py      | 0:00:00.007147 | 1       | 1        | mydag2
Any ideas on how to find the full stack trace for a broken DAG with the new CLI?

What I could figure out so far, from the container or host running the Airflow webserver:
cd dags
python myBROKENDAG.py
This also picked up the UI-defined connection and printed the full stack trace.
Credit to this answer
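Alternatively, a minimal sketch (not from the original answer) is to let Airflow's DagBag parse the folder and then print any import errors it collected; the dags path below is a placeholder:
from airflow.models import DagBag

# Parse the DAG folder and print the stack trace collected for each broken file.
# "/path/to/dags" is a placeholder for your dags_folder.
dagbag = DagBag(dag_folder="/path/to/dags", include_examples=False)
for filepath, stacktrace in dagbag.import_errors.items():
    print(f"--- {filepath} ---")
    print(stacktrace)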

airflow.exceptions.AirflowException: dag_id could not be found: sample_dag. Either the dag did not exist or it failed to parse

I have airflow installed.
airflow info yields:
Apache Airflow
version | 2.2.0
executor | SequentialExecutor
task_logging_handler | airflow.utils.log.file_task_handler.FileTaskHandler
sql_alchemy_conn | sqlite:////home/user#local/airflow/airflow.db
dags_folder | /home/user#local/airflow/dags
plugins_folder | /home/user#local/airflow/plugins
base_log_folder | /home/user#local/airflow/logs
remote_base_log_folder |
System info
OS | Linux
architecture | x86_64
uname | uname_result(system='Linux', node='ubuntuVM.local', release='5.11.0-37-generic',
| version='#41~20.04.2-Ubuntu SMP Fri Sep 24 09:06:38 UTC 2021', machine='x86_64')
locale | ('en_US', 'UTF-8')
python_version | 3.9.7 (default, Sep 16 2021, 13:09:58) [GCC 7.5.0]
python_location | /home/user#local/anaconda3/envs/airflow/bin/python
Tools info
git | git version 2.25.1
ssh | OpenSSH_8.2p1 Ubuntu-4ubuntu0.3, OpenSSL 1.1.1f 31 Mar 2020
kubectl | NOT AVAILABLE
gcloud | NOT AVAILABLE
cloud_sql_proxy | NOT AVAILABLE
mysql | NOT AVAILABLE
sqlite3 | 3.36.0 2021-06-18 18:36:39
| 5c9a6c06871cb9fe42814af9c039eb6da5427a6ec28f187af7ebfb62eafa66e5
psql | NOT AVAILABLE
Paths info
airflow_home | /home/user#local/airflow
system_path | /home/user#local/anaconda3/envs/airflow/bin:/home/user#local/anaconda3/
| condabin:/sbin:/bin:/usr/bin:/usr/local/bin:/snap/bin
python_path | /home/user#local/anaconda3/envs/airflow/bin:/home/user#local/anaconda3/
| envs/airflow/lib/python39.zip:/home/user#local/anaconda3/envs/airflow/lib/pytho
| n3.9:/home/user#local/anaconda3/envs/airflow/lib/python3.9/lib-dynload:/home/jm
| ellone#local/anaconda3/envs/airflow/lib/python3.9/site-packages:/home/user#ocp.
| local/airflow/dags:/home/user#local/airflow/config:/home/user#local/air
| flow/plugins
airflow_on_path | True
Providers info
apache-airflow-providers-celery | 2.1.0
apache-airflow-providers-ftp | 2.0.1
apache-airflow-providers-http | 2.0.1
apache-airflow-providers-imap | 2.0.1
apache-airflow-providers-sqlite | 2.0.1
I then cd to /home/user#local/airflow/dags and use touch to create an empty file, sample_dag.py.
Next, I ran:
airflow dags backfill sample_dag
But airflow throws:
airflow.exceptions.AirflowException: dag_id could not be found: sample_dag.py. Either the dag did not exist or it failed to parse.
Further, I do not see my DAG on the localhost:8080 page, but I do see the sample DAGs.
I have no reason to think a blank .py file should actually work, but I would expect it to at least show up.
How do I go about creating my first DAG? Judging by the docs I've read, this should be correct.
The exception is thrown because there are no DAGs with a dag_id of "sample_dag" in the dags_folder location. The dag_id is set when calling the DAG constructor rather than referencing the name of the DAG file.
For example:
from datetime import datetime

from airflow import DAG

with DAG(
    dag_id='hello_world',
    schedule_interval="@daily",
    start_date=datetime(2021, 1, 1),
    catchup=False,
    tags=['example'],
) as dag:
    ...
An empty DAG file will not be recognized by Airflow nor will an empty DAG be created in the UI.
To get going with your first DAG, you can check out the classic tutorial, the TaskFlow API tutorial, or dive into one of the sample DAGs that were loaded initially via the Code View in the Airflow UI.
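For reference, here is a minimal sketch of a complete first DAG along the lines of the classic tutorial; the dag_id, schedule, and echo command are only illustrative:
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Illustrative first DAG: a single task that echoes a greeting.
# dag_id, schedule_interval, and bash_command are placeholders.
with DAG(
    dag_id="my_first_dag",
    schedule_interval="@daily",
    start_date=datetime(2021, 1, 1),
    catchup=False,
) as dag:
    hello = BashOperator(task_id="hello", bash_command="echo hello")
Drop a file like this into the dags_folder shown by airflow info and it should appear in the UI once the scheduler parses it.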

"spawn npm.cmd ENOENT" when running "amplify publish"

I have a simple React application with an Amplify configuration. When I try to publish it using Amplify (on a Mac), it ends with the following error:
❯ amplify publish
✔ Successfully pulled backend environment dev from the cloud.
Current Environment: dev
| Category | Resource name                | Operation | Provider plugin   |
| -------- | ---------------------------- | --------- | ----------------- |
| Api      | sls-demo-twitter-state-api-2 | No Change |                   |
| Hosting  | amplifyhosting               | No Change | awscloudformation |
No changes detected
Publish started for amplifyhosting
command execution terminated with error
An error occurred during the publish operation: spawn npm.cmd ENOENT
The same configuration runs without any problems on my other machine (Windows). Amplify tries to build the app first, and it is this step that fails. I cannot find the reason for this.
amplify publish basically runs the build & start commands from the Amplify project configuration, which you can reconfigure using amplify configure project.
It looks like npm is missing on your machine (Mac).
Install Node with brew install node and then run amplify publish again.
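As a quick sanity check (a generic sketch, not part of the Amplify CLI), you can confirm whether node and npm are actually resolvable on the PATH your shell sees, which is what a spawn ENOENT error usually comes down to:
# Generic PATH check: an ENOENT on spawn usually means the executable
# could not be found. This is not part of the Amplify CLI.
import shutil

for tool in ("node", "npm"):
    path = shutil.which(tool)
    print(f"{tool}: {path or 'NOT FOUND on PATH'}")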

Why does an Airflow task fail with try_number 0 and a null operator?

I have two Airflow workers, each on a separate machine.
A file sensor task runs on machine A, while machine B runs an rsync task (which copies files from A to B) and another file sensor task for the copied files.
Sometimes the rsync task fails without even trying to execute, and it does not produce a log file.
The trigger rule of the tasks is all_done, and the Airflow version is 1.9.
The sensor tasks use xcom_push and xcom_pull, which I disabled, but the same issue still happens.
-[ RECORD 3 ]---+-----------------------------------------------------
task_id | task_name (rsync)
dag_id | dag_name
execution_date | 2019-10-08 19:30:00
start_date |
end_date | 2019-10-08 20:20:21.649662
duration |
state | failed
try_number | 0
hostname |
unixname | nio
job_id |
pool | pool_name
queue | queue name of machine B
priority_weight | 3
operator |
queued_dttm | 2019-10-08 20:20:20.550397
pid |
max_tries | 3
The operator is null, try_number is 0, and the state is failed.
I expected the rsync task to run.

How to save output of SQL query fired from Osquery to a file

I have installed the Osquery utility on my machine. When I run an SQL query, the output goes to STDOUT. Is there any way to redirect that output to a file?
$ sudo osqueryi
I0314 10:57:51.644351 3958 database.cpp:563] Checking database version for migration
I0314 10:57:51.644912 3958 database.cpp:587] Performing migration: 0 -> 1
I0314 10:57:51.645279 3958 database.cpp:619] Migration 0 -> 1 successfully completed!
I0314 10:57:51.645627 3958 database.cpp:587] Performing migration: 1 -> 2
I0314 10:57:51.646088 3958 database.cpp:619] Migration 1 -> 2 successfully completed!
Using a virtual database. Need help, type '.help'
osquery>
osquery>
osquery> SELECT * from memory_info;
+--------------+-------------+----------+----------+-------------+-----------+----------+------------+-----------+
| memory_total | memory_free | buffers | cached | swap_cached | active | inactive | swap_total | swap_free |
+--------------+-------------+----------+----------+-------------+-----------+----------+------------+-----------+
| 513617920 | 270921728 | 15110144 | 99860480 | 0 | 145080320 | 59494400 | 0 | 0 |
+--------------+-------------+----------+----------+-------------+-----------+----------+------------+-----------+
osquery>
I want this output in a file. I checked the official Osquery documentation, but it has not helped with this particular problem. https://osquery.readthedocs.io/en/stable/introduction/sql/#sql-as-understood-by-osquery
You can use the redirection facilities of your shell:
$ osqueryi --json 'select * from osquery_info' > res.json
$ cat res.json
[
{"build_distro":"10.12","build_platform":"darwin","config_hash":"e7c68185a7252c23585d53d04ecefb77b3ebf99c","config_valid":"1","extensions":"inactive","instance_id":"38201952-9a75-41dc-b2f8-188c2119cda1","pid":"26255","start_time":"1552676034","uuid":"4740D59F-699E-5B29-960B-979AAF9BBEEB","version":"3.3.0","watcher":"-1"}
]
Note that in this example we use JSON output. There are other options available: --csv, --line, --list.
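If you would rather drive this from a script than rely on shell redirection, a minimal sketch (the query and output filename are placeholders) could look like this:
# Minimal sketch: run osqueryi with --json and write its stdout to a file.
# The query and the output path are placeholders.
import subprocess

result = subprocess.run(
    ["osqueryi", "--json", "SELECT * FROM memory_info;"],
    capture_output=True,
    text=True,
    check=True,
)
with open("memory_info.json", "w") as f:
    f.write(result.stdout)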
As seph explained in https://stackoverflow.com/a/55164199/491710, it is a common use-case to schedule queries in osqueryd and push the results into a logging pipeline.
osqueryi is generally for interactive use. When saving to files, or having osquery part of a data pipeline, people usually configure scheduled queries with osqueryd.
https://osquery.readthedocs.io/en/stable/deployment/configuration/ has some pretty simple examples of a configuration.
You could also specify the query on the command line and then handle the output however you like in the shell.

Flyway database migration tool info option not printing the version number

We have Flyway integrated with Redshift, and we run it as a simple Java main program to execute all our schema migrations. We also use the info command to print the current version of the database. However, this command runs successfully, or at least appears to, yet does not print the version number.
We have version 4.2 of the Flyway jar. What might we be missing? Thanks.
To manually recreate what the info command-line option does in Java code, you can copy what its implementation does (from the source). Note that dumpToAsciiTable returns a String rather than printing anything, so you need to print the result yourself:
System.out.println(MigrationInfoDumper.dumpToAsciiTable(flyway.info().all()));
An example from the docs is shown below:
+-------------+------------------------+---------------------+---------+
| Version | Description | Installed on | State |
+-------------+------------------------+---------------------+---------+
| 1 | Initial structure | | Pending |
| 1.1 | Populate table | | Pending |
| 1.3 | And his brother | | Pending |
+-------------+------------------------+---------------------+---------+
