We have an Airflow HTTP connection named "service-1-con".
It works fine when we manage the connection through the Airflow CLI.
However, the same connection does not work when specified via the environment variable AIRFLOW_CONN_SERVICE-1-CON.
Other HTTP connections defined through environment variables work; only this one does not.
We suspect this could be due to the hyphens in the name, yet it works when managed through the UI and the Airflow CLI.
Can somebody advise what could be going wrong?
It works for me:
Here's how you can debug:
import os
from airflow.hooks.base_hook import BaseHook
conn_id = 'service-1-con'
env_var = f'AIRFLOW_CONN_{conn_id}'.upper()
os.environ[env_var] = 'ssh://hello:mypassword@'
conn = BaseHook.get_connection(conn_id)
print(conn.get_uri())
Here is the code where the connections are retrieved from env vars.
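Roughly, the lookup in airflow/secrets/environment_variables.py looks like this (a paraphrased sketch; the exact code differs between Airflow versions):

import os
from typing import Optional

CONN_ENV_PREFIX = "AIRFLOW_CONN_"


class EnvironmentVariablesBackend:
    """Retrieves connections from environment variables (simplified sketch)."""

    def get_conn_uri(self, conn_id: str) -> Optional[str]:
        # The conn_id is upper-cased and prefixed, so "service-1-con" becomes
        # AIRFLOW_CONN_SERVICE-1-CON. The lookup itself accepts hyphens, but a
        # POSIX shell will not let you export a variable with that name, so it
        # has to be set another way (os.environ, a container env block, etc.).
        return os.environ.get(CONN_ENV_PREFIX + conn_id.upper())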
I'm trying to implement custom XCOM backend.
Those are the steps I did:
Created "include" directory at the main Airflow dir (AIRFLOW_HOME).
Created these "custom_xcom_backend.py" file inside:
from typing import Any
from airflow.models.xcom import BaseXCom
import pandas as pd


class CustomXComBackend(BaseXCom):

    @staticmethod
    def serialize_value(value: Any):
        if isinstance(value, pd.DataFrame):
            value = value.to_json(orient='records')
        return BaseXCom.serialize_value(value)

    @staticmethod
    def deserialize_value(result) -> Any:
        result = BaseXCom.deserialize_value(result)
        result = pd.read_json(result)
        return result
Set in the config file:
xcom_backend = include.custom_xcom_backend.CustomXComBackend
When I restarted webserver I got:
airflow.exceptions.AirflowConfigException: The object could not be loaded. Please check "xcom_backend" key in "core" section. Current value: "include.cust...
My guess is that it is not recognizing the "include" folder.
But how can I fix it?
*Note: There is no Docker. Airflow is installed on an Ubuntu machine.
Thanks!
So I solved it:
Put custom_xcom_backend.py into the plugins directory
Set in the config file:
xcom_backend = custom_xcom_backend.CustomXComBackend
Restart all Airflow-related services.
*Note: Do not store DataFrames that way (bad practice).
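For reference, a minimal TaskFlow-style DAG sketch to check that the backend is picked up (names are made up; it assumes Airflow 2.x with pandas installed, and it pushes a DataFrame purely to verify the wiring, despite the note above):

import datetime

import pandas as pd
from airflow.decorators import dag, task


@dag(start_date=datetime.datetime(2022, 1, 1), schedule_interval=None, catchup=False)
def xcom_backend_check():
    @task
    def push_df():
        # Returned value goes through CustomXComBackend.serialize_value
        return pd.DataFrame({"a": [1, 2, 3]})

    @task
    def pull_df(df):
        # Received value goes through CustomXComBackend.deserialize_value
        print(type(df), df)

    pull_df(push_df())


check_dag = xcom_backend_check()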
Sources I used:
https://www.youtube.com/watch?v=iI0ymwOij88
I am trying to write some DAG integrity tests in Airflow. The issue I am coming across is that the DAG I am testing references Airflow Variables in some of its tasks.
eg: Variable.get("AIRFLOW_VAR_BLOB_CONTAINER")
I seem to be getting the error:
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such table: variable
from this, because when testing via pytest those variables (and the variable table) don't exist. Does anyone know of any workarounds or suggested methods for handling Variable/Connection references when running DAG integrity tests?
Thanks,
You can create a local metastore for testing. Running airflow db init without any other settings will create a SQLite metastore in your home directory which you can use during testing. My default additional settings for a local metastore for testing are:
AIRFLOW__CORE__LOAD_DEFAULT_CONNECTIONS=False (to ensure there are no defaults to make things magically work)
AIRFLOW__CORE__LOAD_EXAMPLES=False (to ensure there are no defaults to make things magically work)
AIRFLOW__CORE__UNIT_TEST_MODE=True (Set default test settings, skip certain actions, etc.)
AIRFLOW_HOME=[project root dir] (To avoid Airflow files in your home dir)
Running airflow db init with these settings results in three files in your project root dir:
unittests.db
unittests.cfg
webserver_config.py
It's probably a good idea to add those to your .gitignore. With this set up you can safely test against the local metastore unittests.db during your tests (ensure that when running pytest, the same env vars are set).
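For example, a conftest.py sketch that sets these before anything imports Airflow (the project-root assumption is mine; adjust to your layout):

# conftest.py (sketch) -- must run before airflow is imported anywhere
import os

PROJECT_ROOT = os.path.dirname(__file__)

os.environ["AIRFLOW_HOME"] = PROJECT_ROOT
os.environ["AIRFLOW__CORE__LOAD_DEFAULT_CONNECTIONS"] = "False"
os.environ["AIRFLOW__CORE__LOAD_EXAMPLES"] = "False"
os.environ["AIRFLOW__CORE__UNIT_TEST_MODE"] = "True"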
Alternatively, if you don't want a local metastore for whatever reason, you will have to resort to mocking to substitute the calls Airflow makes to the metastore. This requires knowledge of the internals of Airflow. An example:
import datetime
from unittest import mock

from airflow.models import DAG
from airflow.operators.bash import BashOperator


def test_bash_operator(tmp_path):
    with DAG(dag_id="test_dag", start_date=datetime.datetime(2021, 1, 1), schedule_interval="@daily") as dag:
        with mock.patch("airflow.models.variable.Variable.get") as variable_get_mock:
            employees = ["Alice", "Bob", "Charlie"]
            variable_get_mock.return_value = employees
            output_file = tmp_path / "output.txt"
            test = BashOperator(task_id="test", bash_command="echo {{ var.json.employees }} > " + str(output_file))
            dag.clear()
            test.run(
                start_date=dag.start_date,
                end_date=dag.start_date,
                ignore_first_depends_on_past=True,
                ignore_ti_state=True,
            )
            variable_get_mock.assert_called_once()
            assert output_file.read_text() == f"[{', '.join(employees)}]\n"
These lines:

    with mock.patch("airflow.models.variable.Variable.get") as variable_get_mock:
        employees = ["Alice", "Bob", "Charlie"]
        variable_get_mock.return_value = employees

ensure that airflow.models.variable.Variable.get isn't actually called and that the list ["Alice", "Bob", "Charlie"] is returned instead. Since test.run() doesn't return anything, I made the bash_command write to tmp_path and read the file back to assert that the content is what I expected.
This avoids the need for a metastore entirely, but mocking can be a lot of work and can become complex once your tests grow beyond basic examples like this one.
Snowflake is not showing in the connections dropdown.
I am using MWAA 2.0 and the providers are already in the requirements.txt.
MWAA uses Python 3.7; I don't know if that could be related.
Requirements.txt:
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.0.2/constraints-3.7.txt"
asn1crypto
azure-common
azure-core
azure-storage-blob
boto3
botocore
certifi
cffi
chardet
cryptography
greenlet
idna
isodate
jmespath
msrest
numpy
oauthlib
oscrypto
pandas
pyarrow
pycparser
pycryptodomex
PyJWT
pyOpenSSL
python-dateutil
pytz
requests
requests-oauthlib
s3transfer
six
urllib3
apache-airflow-providers-http
apache-airflow-providers-snowflake
#apache-airflow-providers-snowflake[slack]
#apache-airflow-providers-slack
snowflake-connector-python >=2.4.1
snowflake-sqlalchemy >=1.1.0
If anyone else runs into this: instead of choosing Snowflake in the dropdown, you can choose Amazon Web Services as the connection type and it will work fine.
It took me a while to finally figure this one out after trying many different parameter combinations.
My full Snowflake URL is:
https://xx12345.us-east-2.aws.snowflakecomputing.com
The correct format for the Host field is:
xx12345.us-east-2.snowflakecomputing.com
For the Extra field, this is what worked for me:
{
    "account": "xx12345.us-east-2.aws",
    "warehouse": "my_warehouse_name",
    "database": "my_database_name"
}
Make sure you put Amazon Web Services for the Conn Type, like @AXI said.
Also, I have these modules defined in my requirements.txt file:
apache-airflow-providers-snowflake==1.3.0
snowflake-connector-python==2.4.5
snowflake-sqlalchemy==1.2.4
My Airflow version is 2.0.2.
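For completeness, here is a sketch of how the connection can then be used from a DAG (the conn id "snowflake_conn" and the query are placeholders; it assumes the connection is configured as described above and the Snowflake provider is installed):

import datetime

from airflow import DAG
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

with DAG(
    dag_id="snowflake_check",
    start_date=datetime.datetime(2021, 1, 1),
    schedule_interval=None,
) as dag:
    run_query = SnowflakeOperator(
        task_id="run_query",
        snowflake_conn_id="snowflake_conn",  # placeholder conn id
        sql="SELECT CURRENT_VERSION()",
    )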
According to the MWAA docs, it should be enough to add apache-airflow-providers-snowflake==1.3.0 to the requirements file. When I added it to an existing MWAA env, where I had already tried many different combinations of packages, it only helped partially: it was possible to create a connection using the CLI, but not with the UI.
But when I created a new, clean MWAA env with the requirements file as stated in the mentioned AWS doc, it worked well. The connection was available in the UI.
I followed the https://fastapi.tiangolo.com/tutorial/bigger-applications/ tutorial to design my app.
.....game/urls.py....
from fastapi import APIRouter

router = APIRouter()

@router.post("/", response_model=schemas.GameOut, tags=["games"])
def create_game(game: schemas.GameIn, db: Session = Depends(get_db)):
    return Crud.create(db, game, model)
...main.py...
from game import urls as game_urls
app.include_router(game_urls,prefix="/games")
I imported everything properly.
When I run uvicorn main:app --reload it shows a "no attribute 'routes'" error.
I am not able to find what mistake I am making here. Could anyone help me?
It seems you're injecting the entire urls module in your last line:
app.include_router(game_urls, prefix="/games")
I believe you should only inject the router object, e.g. (you might want to import just the router here instead):
app.include_router(game_urls.router, prefix="/games")
Also, if you have an issue with @router not existing, make sure you define the APIRouter as router = APIRouter() and not web_router = APIRouter().
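Here is a minimal sketch of the two files with that fix applied (the schema, CRUD and DB dependencies from the question are left out and replaced with stand-ins):

# game/urls.py (sketch)
from fastapi import APIRouter

router = APIRouter()


@router.post("/", tags=["games"])
def create_game():
    # Stand-in for the real Crud.create(db, game, model) call
    return {"status": "created"}


# main.py (sketch)
from fastapi import FastAPI
from game import urls as game_urls

app = FastAPI()
app.include_router(game_urls.router, prefix="/games")  # pass the router, not the module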
I am trying to configure Airbnb Airflow to use the CeleryExecutor like this:
I changed the executor in airflow.cfg from SequentialExecutor to CeleryExecutor:
# The executor class that airflow should use. Choices include
# SequentialExecutor, LocalExecutor, CeleryExecutor
executor = CeleryExecutor
But I get the following error:
airflow.configuration.AirflowConfigException: error: cannot use sqlite with the CeleryExecutor
Note that the sql_alchemy_conn is configured like this:
sql_alchemy_conn = sqlite:////root/airflow/airflow.db
I looked at Airflow's Git repository (https://github.com/airbnb/airflow/blob/master/airflow/configuration.py)
and found that the following code throws this exception:
def _validate(self):
    if (
            self.get("core", "executor") != 'SequentialExecutor' and
            "sqlite" in self.get('core', 'sql_alchemy_conn')):
        raise AirflowConfigException("error: cannot use sqlite with the {}".
                                     format(self.get('core', 'executor')))
It seems from this validate method that the sql_alchemy_conn cannot contain sqlite.
Do you have any idea how to configure the CeleryExecutor without SQLite? Please note that I downloaded RabbitMQ for working with the CeleryExecutor, as required.
Airflow states that the CeleryExecutor requires a backend other than the default SQLite database. You have to use MySQL or PostgreSQL, for example.
The sql_alchemy_conn in airflow.cfg must be changed to follow the SQLAlchemy connection string structure (see the SQLAlchemy documentation).
For example,
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@127.0.0.1:5432/airflow
To configure Airflow for MySQL:
First, install MySQL (this might help, or just google it).
Go to the Airflow installation directory, usually /home/<username>/airflow.
Edit airflow.cfg.
Locate
sql_alchemy_conn = sqlite:////home/vipul/airflow/airflow.db
and add # in front of it so it looks like
#sql_alchemy_conn = sqlite:////home/vipul/airflow/airflow.db
(if you have the default SQLite).
Add this line below:
sql_alchemy_conn = mysql://<user>:<password>@localhost:3306/<database>
Save the file.
Run the command
airflow initdb
and done!
As other answers have stated, you need to use a database other than SQLite. Additionally, you need to install RabbitMQ, configure it appropriately, and change each of your airflow.cfg files to have the correct RabbitMQ information. For an excellent tutorial on this, see A Guide On How To Build An Airflow Server/Cluster.
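For reference, a sketch of the relevant airflow.cfg sections, assuming a local RabbitMQ with default credentials and the PostgreSQL connection string from the earlier answer (key names have changed across Airflow versions, so check the comments in your own config):

[core]
executor = CeleryExecutor
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@127.0.0.1:5432/airflow

[celery]
broker_url = amqp://guest:guest@localhost:5672//
celery_result_backend = db+postgresql://airflow:airflow@127.0.0.1:5432/airflow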
If you run it on a Kubernetes cluster, use the following config:
airflow:
  config:
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://postgres:airflow@airflow-postgresql:5432/airflow