airflow experimental api , how to set dag that is running as failed - airflow

I'm using experimental api of airflow
and would like to failed running DAG
i didn't see it in documentation
i tried the below, but it creating new running dag/task
would be glad to get the right experimental api/method/payload for it
thanks in advnace
'''session.post(
url=f'{airflow_env}/api/experimental/dags/{dag_name}/dag_runs',
json={'state':'failed',
"dag_run_id":"scheduled__2022-03-13T20:30:32.887761+00:00"})
'''

Related

Working DAG fails when triggered from another dag in CLI

I have a simple DAG which connects to an impala db and runs an sql script. The dag runs fine when running independently:
airflow dags test original_dag_name 2022-9-27
However, when I test using TriggerDagRunOpertor from another DAG, the DAG fails:
airflow tasks test other_dag_name trigger_task 2022-9-27
Looking at the logs for original_dag_name I see the following:
[Cloudera][ODBC] (11560) Unable to locate SQLGetPrivateProfileString function
...which appears to be driver related, which doesn't make sense as it works fine when I trigger the DAG on its own. Is there some sort of config not getting set correctly when triggering via TriggerDagRunOperator?
Here is the TriggerDagRunOperator task:
task_run_original_dag = TriggerDagRunOperator (
task_id='run_original_dag',
trigger_dag_id='original_dag_name',
execution_date='{{ ds }}',
reset_dag_run=True,
wait_for_completion=True,
poke_interval=60
)

Airflow 2 - debugging why dag is not loading

On Airflow 2 my dag is not showing on the UI, and I'm getting DAG Import Errors (...) for it.
The error message is insufficient for me to debug (it's a custom operator, with a lot of custom logic - so I don't want to get into details of the error itself).
On Airflow 1.X I could use cli:
airflow list_dags
to get more elaborated debug message, is there anything analogical on airflow 2 ?
I'm looking for a cli command/UI option that will provide me with more elaborated error message, than the one I'm getting on the main screen of the webserver.
As described in the Airlfow's documentation, to test DAG loading you can simply run:
python your-dag-file.py
If there is any problem during the DAG loading phase you will get a stack trace here.
The later sections also describe how to test custom operators.
As explained in the upgrading manual the
airflow list_dags has been changed to airflow dags list
The full syntax is:
airflow dags list [-h] [-o table, json, yaml] [-S SUBDIR]
for more information see docs

Docker container failed to start when deploying to Google Cloud Run

I am new to GCP, and am trying to teach myself by deploying a simple R script in a Docker container that connects to BigQuery and writes the system time. I am following along with this great tutorial: https://arbenkqiku.github.io/create-docker-image-with-r-and-deploy-as-cron-job-on-google-cloud
So far, I have:
1.- Made the R script
library(bigrquery)
library(tidyverse)
bq_auth("/home/rstudio/xxxx-xxxx.json", email="xxxx#xxxx.com")
project = "xxxx-project"
dataset = "xxxx-dataset"
table = "xxxx-table"
job = insert_upload_job(project=project, data=dataset, table=table, write_disposition="WRITE_APPEND",
values=Sys.time() %>% as_tibble(), billing=project)
2.- Made the Dockerfile
FROM rocker/tidyverse:latest
RUN R -e "install.packages('bigrquery', repos='http://cran.us.r-project.org')"
ADD xxxx-xxxx.json /home/rstudio
ADD big-query-tutorial_forQuestion.R /home/rstudio
EXPOSE 8080
CMD ["Rscript", "/home/rstudio/big-query-tutorial.R", "--host", "0.0.0.0"]
3.- Successfully run the container locally on my machine, with the system time being written to BigQuery
4.- Pushed the container to my container registry in Google Cloud
When I try to deploy the container to Cloud Run (fully managed) using the Google Cloud Console I get this error: 
"Cloud Run error: Container failed to start. Failed to start and then listen on the port defined by the PORT environment variable. Logs for this revision might contain more information. Logs URL:https://console.cloud.google.com/logs/xxxxxx"
When I review the log, I find these noteworthy entries:
1.- A warning that says "Container called exit(0)"
2.- Two bugs that say "Container Sandbox: Unsupported syscall setsockopt(0x8,0x0,0xb,0x3e78729eedc4,0x4,0x8). It is very likely that you can safely ignore this message and that this is not the cause of any error you might be troubleshooting. Please, refer to https://gvisor.dev/c/linux/amd64/setsockopt for more information."
When I check BigQuery, I find that the system time was written to the table, even though the container failed to deploy.
When I use the port specified in the tutorial (8787) Cloud Run throws an error about an "Invalid ENTRYPOINT".
What does this error mean? How can it be fixed? I'd greatly appreciate input as I'm totally stuck!
Thank you!
H.
The comment of John is the right source of the errors: You need to expose a webserver which listen on the $PORT and answer to HTTP/1 HTTP/2 protocols.
However, I have a solution. You can use Cloud Build for this. Simply define your step with your container name and the args if needed
Let me know if you need more guidance on this (strange) workaround.
Log information "Container Sandbox: Unsupported syscall setsockopt" from Google Cloud Run is documented as an issue for gVisor.

unable to specify master_type in MLEngineTrainingOperator

I am using airflow to schedule a pipeline that will result in training a scikitlearn model with ai platform. I use this DAG to train it
with models.DAG(JOB_NAME,
schedule_interval=None,
default_args=default_args) as dag:
# Tasks definition
training_op = MLEngineTrainingOperator(
task_id='submit_job_for_training',
project_id=PROJECT,
job_id=job_id,
package_uris=[os.path.join(TRAINER_BIN)],
training_python_module=TRAINER_MODULE,
runtime_version=RUNTIME_VERSION,
region='europe-west1',
training_args=[
'--base-dir={}'.format(BASE_DIR),
'--event-date=20200212',
],
python_version='3.5')
training_op
The training package loads the desired csv files and train a RandomForestClassifier on it.
This works fine until the number and the size of the files increase. Then I get this error:
ERROR - The replica master 0 ran out-of-memory and exited with a non-zero status of 9(SIGKILL). To find out more about why your job exited please check the logs:
The total size of the files is around 4 Gb. I dont know what is the default machine used but is seems not enough. Hoping this would solve the memory consumption issue I tried to change the parameter n_jobs of the classifier from -1 to 1, with no more luck.
Looking at the code of MLEngineTrainingOperator and the documentation I added a custom scale_tier and a master_type n1-highmem-8, 8 CPUs and 52GB of RAM , like this:
with models.DAG(JOB_NAME,
schedule_interval=None,
default_args=default_args) as dag:
# Tasks definition
training_op = MLEngineTrainingOperator(
task_id='submit_job_for_training',
project_id=PROJECT,
job_id=job_id,
package_uris=[os.path.join(TRAINER_BIN)],
training_python_module=TRAINER_MODULE,
runtime_version=RUNTIME_VERSION,
region='europe-west1',
master_type="n1-highmem-8",
scale_tier="custom",
training_args=[
'--base-dir={}'.format(BASE_DIR),
'--event-date=20200116',
],
python_version='3.5')
training_op
This resulted in an other error:
ERROR - <HttpError 400 when requesting https://ml.googleapis.com/v1/projects/MY_PROJECT/jobs?alt=json returned "Field: master_type Error: Master type must be specified for the CUSTOM scale tier.">
I don't know what is wrong but it appears that is not the way to do that.
EDIT: Using command line I manage to launch the job:
gcloud ai-platform jobs submit training training_job_name --packages=gs://path/to/package/package.tar.gz --python-version=3.5 --region=europe-west1 --runtime-version=1.14 --module-name=trainer.train --scale-tier=CUSTOM --master-machine-type=n1-highmem-16
However i would like to do this in airflow.
Any help would be much appreciated.
EDIT: My environment used an old version of apache airflow, 1.10.3 where the master_type argument was not present.
Updating the version to 1.10.6 solved this issue
My environment used an old version of apache airflow, 1.10.3 where the master_type argument was not present. Updating the version to 1.10.6 solved this issue

Dag Seems to be missing

I have a dag which checks for new workflows to be generated (Dynamic DAG) at a regular interval and if found, creates them. (Ref: Dynamic dags not getting added by scheduler )
The above DAG is working and the dynamic DAGs are getting created and listed in the web-server. Two issues here:
When clicking on the DAG in web url, it says "DAG seems to be missing"
The listed DAGs are not listed using "airflow list_dags" command
Error:
DAG "app01_user" seems to be missing.
The same is for all other dynamically generated DAGs. I have compiled the Python script and found no errors.
Edit1:
I tried clearing all data and running "airflow run". It ran successfully but no Dynamic generated DAGs were added to "airflow list_dags". But when running the command "airflow list_dags", it loaded and executed the DAG, (which generated Dynamic DAGs). The dynamic DAGs are also listed as below:
[root#cmnode dags]# airflow list_dags
sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8\nLANG=en_US.UTF-8)
sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8\nLANG=en_US.UTF-8)
[2019-08-13 00:34:31,692] {settings.py:182} INFO - settings.configure_orm(): Using pool settings. pool_size=15, pool_recycle=1800, pid=25386
[2019-08-13 00:34:31,877] {__init__.py:51} INFO - Using executor LocalExecutor
[2019-08-13 00:34:32,113] {__init__.py:305} INFO - Filling up the DagBag from /root/airflow/dags
/usr/lib/python2.7/site-packages/airflow/operators/bash_operator.py:70: PendingDeprecationWarning: Invalid arguments were passed to BashOperator (task_id: tst_dyn_dag). Support for passing such arguments will be dropped in Airflow 2.0. Invalid arguments were:
*args: ()
**kwargs: {'provide_context': True}
super(BashOperator, self).__init__(*args, **kwargs)
-------------------------------------------------------------------
DAGS
-------------------------------------------------------------------
app01_user
app02_user
app03_user
app04_user
testDynDags
Upon running again, all the above generated 4 dags disappeared and only the base DAG, "testDynDags" is displayed.
When I was getting this error, there was an exception showing up in the webserver logs. Once I resolved that error and I restarted the webserver it went through normally.
From what I can see this is the error that is thrown when the webserver tried to parse the dag file and there is an error. In my case it was an error importing a new operator I added to a plugin.
Usually, I check in Airflow UI, sometimes the reason of broken DAG appear in there. But if it is not there, I usually run the .py file of my DAG, and error (reason of DAG cant be parsed) will appear.
I never got to work on dynamic DAG generation but I did face this issue when DAG was not present on all nodes ( scheduler, worker and webserver ). In case you have airflow cluster, please make sure that DAG is present on all airflow nodes.
Same error, the reason was I renamed my dag_id in uppercase. Something like "import_myclientname" into "import_MYCLIENTNAME".
I am little late to the party but I faced the error today:
In short: try executing airflow dags report and/or airflow dags reserialize
Check out my comment here:
https://stackoverflow.com/a/73880927/4437153
I found that airflow fails to recognize a dag defined in a file that does not have from airflow import DAG in it, even if DAG is not explicitly used in that file.
For example, suppose you have two files, a.py and b.py:
# a.py
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
def makedag(dag_id="a"):
with DAG(dag_id=dag_id) as dag:
DummyOperator(task_id="nada")
dag = makedag()
and
# b.py
from a import makedag
dag = makedag(dag_id="b")
Then airflow will only look at a.py. It won't even look at b.py at all, even to notice if there's a syntax error in it! But if you add from airflow import DAG to it and don't change anything else, it will show up.

Resources