BigQueryCheckOperator failed with 404 error in Cloud Composer - Airflow

I am using the BigQueryCheckOperator in Airflow to check whether data exists in a BigQuery table, but the DAG is failing with this error - <HttpError 404 when requesting https://bigquery.googleapis.com/bigquery/v2/projects/
Here are the logs of the DAG.
Can someone tell me how to fix this issue?

This is a known Airflow issue with querying BigQuery datasets that reside in non multi-regional locations (i.e. outside US and EU) within some of the BigQuery operator submodules; pull request #8273 has already been raised.
You can also check out this Stack Overflow thread for a more accurate explanation of the problem.
For now, the fix has been announced for the Airflow 2.0 release; however, the community has pushed a backport package to help users on older Airflow 1.10.* versions, and it will be considered in future builds of the Airflow images for GCP Composer.
As a workaround, you can try a BashOperator that invokes the bq command-line tool to perform the required check against the BigQuery dataset inside the particular DAG file.
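For example, a minimal sketch of that workaround might look like the following; the project, dataset, table and location values are placeholders, and the task simply fails when the table has no rows:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

with DAG(
    dag_id="bq_data_check_workaround",  # illustrative name
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
) as dag:
    # Run the row-count query with the bq CLI and fail the task if it returns 0.
    check_bq_data = BashOperator(
        task_id="check_bq_data",
        bash_command=(
            "count=$(bq --quiet --format=csv --location=asia-northeast1 query "
            "--use_legacy_sql=false "
            "'SELECT COUNT(*) FROM `my-project.my_dataset.my_table`' "
            "| tail -1) && test \"$count\" -gt 0"
        ),
    )

Because bq lets you pass the dataset location explicitly, this sidesteps the location handling that trips up the operator described above.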

Related

GitLab pipeline merge request failed: Invalid Firebase project selection

I've been trying to merge a source branch with a target branch, but have consistently gotten the following error message on my failed job(s):
$ firebase use project_name --token $FIREBASE_TOKEN
Error: Invalid project selection, please verify project project-name exists and you have access.
Cleaning up project directory and file based variables
ERROR: Job failed: exit code 1
I have followed the advice from this thread and logged out of and back into Firebase to use the project again, which unfortunately hasn't worked:
firebase logout
firebase login
firebase use project_name
I've triple-checked that I'm using the correct project name in Firebase, rather than the name of my Gitlab repo.
Unsure if it's related, but when setting up the merge request, GitLab notes that "The source branch is 3 commits behind the target branch". I don't believe this is part of the issue, but it seems worth bringing up.
Merging branches has never been an issue until today, and this is the first time I'm seeing this particular error causing the failed jobs. Any advice is appreciated.
EDIT:
I added a screenshot of the projects list, showing I've logged into the necessary project (by project ID) on Firebase. Everything should be connected, but I can't see what I'm missing that's causing the failed jobs.
UPDATE:
I've added firebase projects:list within the pipeline editor and get the following error message.
The issue I have here is that I cannot find a firebase-debug.log file. I've searched for ways to locate it and tried to recreate it by commenting out # firebase-debug.log* in my .gitignore file and running firebase init, following solutions from posts like this. Any thoughts on the original merging issue, or on how to find firebase-debug.log to move closer to a solution, are greatly appreciated.
The setup appears incorrect for CI; e.g. one doesn't need to pass $FIREBASE_TOKEN.
The issue might even stem from the fact that you use --token once and then not again.
Please refer to the user manual: https://github.com/firebase/firebase-tools/#general
Using a service account might be the least troublesome approach.
There's also a login --reauth option, as well as login:ci.

Airflow DAG Versioning

Is DAG versioning a thing? I can't find much on the subject with a few Google searches. I would like to look at the DAGs screen in Airflow and be sure of what DAG code is in the wild.
The simplest solution would be to include a version number as part of the dag_id, but I would appreciate knowing if anyone has a better, alternative solution. Tags would work too and might look good in the UI - they are designed for filtering though, so I'm not sure if there would be undesirable side-effects.
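For illustration, the dag_id/tags idea above might look like the following minimal sketch (names and the version string are placeholders; the tags argument requires Airflow 1.10.8 or later):

from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

# The version is baked into the dag_id, so each release shows up as its own DAG.
with DAG(
    dag_id="my_pipeline_v1_2_0",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    tags=["my_pipeline", "v1.2.0"],  # tags are filterable in the DAGs view
) as dag:
    start = DummyOperator(task_id="start")

Note that changing the dag_id makes Airflow treat it as a brand-new DAG, so run history does not carry over between versions.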
As the author of the DAG Versioning AIP, I can say that this work has been deferred to after the 2.0 release, mainly to support end-to-end DAG versioning.
Originally, we (the Airflow core committers) were planning to have webserver-only DAG versioning, i.e. to improve the visibility behaviour but not execution:
The scope of this AIP to make sure that the visibility behavior of
Airflow is correct, without changing the execution behaviour which
will continue to be based on the most recent version of the DAG.
This means it overcomes the issue where, when you go back to an old version of the DAG to view its shape from a few months back, you see the correct representation instead of "always-latest".
Currently, Airflow suffers from the issue that if you add or remove a task, it gets added or removed in all the previous DagRuns in the webserver.
However, what we have decided is to deliver a Remote DAG Fetcher plus DAG versioning and enable versioning of DAGs on the worker side, so a user will be able to run a DAG with a previous version too.
Currently, we don't have a date, but we are mostly planning to do it around the end of 2021.
The Airflow project has a draft feature open to support DAG versions. The answer for now is that Airflow does not support versions.
The first use case in the link describes a key limitation: log files from previous runs can only surface nodes from the current DAG.
As mentioned above, Airflow doesn't yet have its own functionality for versioning workflows. However, you can manage that on your own by keeping DAGs in their own git repository and fetching its state into the Airflow repository as a submodule. More on that:
https://www.youtube.com/watch?v=a-4yRne3ba4&lc=UgwiIO-ECVFSZPz1hOt4AaABAg

Can't use plugin module with Cloud Composer

I'm trying to use Cloud Composer to run my workflow. I wanted to use the "GoogleCloudStorageToGoogleCloudStorageOperator" operator, which is available from Apache Airflow v1.10 but is not supported in the current Cloud Composer (it supports only Apache Airflow v1.9 for now (2019/01/16)).
Following the guidance of Google's blog post, I added the operator to a Cloud Composer environment myself, and it worked well until a few days ago.
However, now, when I try to create a new Cloud Composer environment and deploy the same DAG that worked well previously, I get the following error message in the Airflow web UI, and the DAG fails.
Broken DAG: [/home/airflow/gcs/dags/xxx.py] Relationships can only be set between Operators; received GoogleCloudStorageToGoogleCloudStorageOperator
I can't understand why this error occurs even though I used the same code and followed the same procedure to deploy my DAG to Cloud Composer.
I would appreciate any advice on how to solve this problem.
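For context, the plugin-based approach from the blog post generally looks like the sketch below; the file names and plugin name here are assumptions, and gcs_to_gcs.py is assumed to be the operator module copied from Airflow v1.10 into the plugins folder (which Airflow adds to sys.path):

# plugins/gcs_to_gcs_plugin.py
from airflow.plugins_manager import AirflowPlugin

# gcs_to_gcs.py is the operator source copied from Airflow v1.10 into plugins/
from gcs_to_gcs import GoogleCloudStorageToGoogleCloudStorageOperator


class GcsToGcsPlugin(AirflowPlugin):
    name = "gcs_to_gcs_plugin"
    operators = [GoogleCloudStorageToGoogleCloudStorageOperator]

The DAG then imports the operator through the plugin namespace, e.g. from airflow.operators.gcs_to_gcs_plugin import GoogleCloudStorageToGoogleCloudStorageOperator. In general, "Relationships can only be set between Operators" means the class the DAG ends up using fails Airflow's isinstance check against BaseOperator, typically because two copies of the class got loaded under different module paths.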
This was due to a bug in Composer 1.4.2, which has already been fixed.
Airflow error importing DAG using plugin - Relationships can only be set between Operators
Try out the DAG on Astronomer Cloud (http://astronomer.io/cloud), free 30 day trial.
Disclosure: I work at Astronomer.

Scheduler not updating package files

I'm developing a DAG on Cloud Composer; my code is separated into a main Python file and one package with subfolders. It looks like this:
my_dag1.py
package1/__init__.py
package1/functions.py
package1/package2/__init__.py
package1/package2/more_functions.py
I updated one of the functions in package1/functions.py to take an additional argument (and updated the reference in my_dag1.py). The code runs correctly in my local environment, and I was not getting any errors when running
gcloud beta composer environments run my-airflow-environment list_dags --location europe-west1
But the web UI raised a Python error:
TypeError: my_function() got an unexpected keyword argument
'new_argument'
I tried renaming the function, and the error changed to
NameError: name 'my_function' is not defined
I tried changing the name of the DAG and uploading the files to the DAGs folder both zipped and unzipped, but nothing worked.
The error disappeared only after I renamed the package folder.
I suspect the issue is related to the scheduler picking up my_dag1.py but not package1/functions.py. The error appeared out of nowhere, as I had made similar updates in the previous weeks.
Any idea on how to fix this issue without refactoring the whole code structure?
EDIT-1
Here's the link to related discussion on Google Groups
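For reference, the change described in the question amounts to something like the following (function and argument names are illustrative):

# package1/functions.py -- after the update
def my_function(existing_argument, new_argument=None):
    # new_argument is the recently added parameter
    return existing_argument, new_argument

# my_dag1.py -- excerpt of the updated call site
from package1.functions import my_function

result = my_function("some value", new_argument="another value")

If the scheduler or webserver reloads my_dag1.py but keeps serving the old copy of package1/functions.py, this call raises exactly the TypeError shown above, which matches the suspicion in the question.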
I've run into a similar issue: the "Broken DAG" error won't dismiss in the web UI. I guess this is a cache bug in the Airflow web server.
Background.
I created a customized operator with Airflow Plugin features.
After I import the customized operator, the Airflow web UI keeps showing the Broken DAG error, saying that it can't find the customized operator.
Why do I think it's a bug in the Airflow web server?
I can manually run the DAG with the command airflow test, so the import should be correct.
Even if I remove the related DAG file from Airflow's /dags/ folder, the error is still there.
Here is what I did to resolve this issue:
Restart the Airflow webserver service (sometimes this alone resolves the issue).
Make sure no DAG is running, then restart the Airflow scheduler service.
Make sure no DAG is running, then restart the Airflow worker.
Hopefully this helps someone who has the same issue.
Try restarting the webserver with:
gcloud beta composer environments restart-web-server ENVIRONMENT_NAME --location=LOCATION

Airflow DAG "seems to be existing only locally. The master scheduler doesn't seem to be aware of its existence"

I use Airflow for workflows of Spark jobs. After installation, I copied the DAG files into the DAGs folder set in airflow.cfg. I can backfill the DAG and run the BashOperators successfully, but there is always a warning like the one mentioned in the title. I didn't verify whether the scheduling is fine, but I doubt scheduling can work, as the warning says the master scheduler doesn't know of my DAG's existence. How can I eliminate this warning and get scheduling to work? Has anybody run into the same issue who can help me out?
This is usually connected to the scheduler not running or the refresh interval being too wide. There are no log entries present, so we cannot analyze from there. Also, unfortunately, the very cause might have been overlooked, because this is usually the root of the problem:
I didn't verify if the scheduling is fine.
So first you should check if both of the following services are running:
airflow webserver
and
airflow scheduler
If that doesn't help, see this post for more reference: Airflow 1.9.0 is queuing but not launching tasks
