I am trying to create a Hive connection in Airflow that points to a specific database. I tried to find the relevant parameters in HiveHook and tried the options below in the extras:
{"db":"test_base"} {"schema":"test_base"} and {"database":"test_base"}
But it looks like nothing works and the connection always points to the default database.
Could someone help me point out which parameters can be passed in extra_options?
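For context, I am creating the connection roughly like this (Airflow 2.x CLI syntax; the connection id, host and port are placeholders, and the extras JSON is swapped for each of the attempts above):
airflow connections add my_hive_conn \
    --conn-type hiveserver2 \
    --conn-host hive.example.com \
    --conn-port 10000 \
    --conn-extra '{"schema": "test_base"}'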
I have an Airflow DAG which calls a particular bash command using a variable. On the backend we have an Aurora DB. Are there any tables in the Aurora DB that store information about the variables used in Airflow DAGs? I need to create a report out of it, hence the need to access the variables from the backend.
I tried using the operational_insights schema but could not find any tables with the desired information.
If you are using Airflow Variables you should be able to query a list of them with the REST API, no matter which backend you use:
curl "http://<your Airflow host>/api/v1/variables" --user "login:password"
This is preferred over querying the Airflow metadata database directly, because if you accidentally modify or drop a table you can corrupt your Airflow installation.
With that caveat: the standard table where Airflow Variables are stored is variable, so after logging into the db, SELECT * FROM variable; should return a list.
Again, this is for Airflow Variables. From your question I am not entirely sure whether you mean those or, in general, any variables that tasks use. In the latter case you might be looking for the rendered_fields of the task instances, which can also be retrieved through the API.
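If you do query the metadata database directly despite the caveat, a rough sketch (table and column names as of Airflow 2.x; rendered_task_instance_fields in particular has changed between versions):
-- Airflow Variables (values may be encrypted if a Fernet key is configured)
SELECT key, val FROM variable;
-- rendered template fields per task instance
SELECT dag_id, task_id, rendered_fields FROM rendered_task_instance_fields;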
I am using Airflow 2.0.2 to connect to Databricks using the airflow-databricks operator. The SQL operator doesn't let me specify the database where the query should be executed, so I have to prefix the table_name with the database_name. I tried reading through the databricks-sql-connector doc here -- https://docs.databricks.com/dev-tools/python-sql-connector.html -- and still couldn't figure out whether I could give the database name as a parameter in the connection string itself.
I tried setting database/schema/namespace in the **kwargs, but no luck. The query keeps failing with a table-not-found error, because it keeps getting executed against the default database.
Right now it's not supported. The primary reason is that if you have multiple statements, the connector could reconnect between their execution, and the effect of USE would be lost. databricks-sql-connector also doesn't allow setting a default database.
Right now you can work around that by adding an explicit USE <database> statement to the list of SQLs to execute (the sql parameter can be a list of strings, not only a single string), for example:
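A minimal sketch of that workaround (the import path, connection id and endpoint name below are assumptions about your setup):
from airflow.providers.databricks.operators.databricks_sql import DatabricksSqlOperator

select_from_test_base = DatabricksSqlOperator(
    task_id="select_from_test_base",
    databricks_conn_id="databricks_default",  # assumed connection id
    sql_endpoint_name="my-endpoint",          # or http_path=..., depending on your connection
    sql=[
        "USE test_base",                      # switch the default database first
        "SELECT * FROM my_table LIMIT 10",
    ],
)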
P.S. I'll take a look; maybe I'll add setting of the default catalog/database in one of the next versions.
I have a project requirement to back up the Airflow metadata DB to some data warehouse (but not using an Airflow DAG). At the same time, the requirement mentions some connection called airflow_db.
I am quite new to Airflow, so I googled a bit on the topic, and I am a bit confused about this part. Our Airflow metadata DB is PostgreSQL (this is built from docker-compose, so I am tinkering on a local install), but when I look at Connections in the Airflow web UI, it says airflow_db is MySQL.
I initially assumed that they are the same, but by the looks of it they aren't? Can someone explain the difference and what they are for?
Airflow creates the airflow_db Conn Id with MySQL by default (see the source code).
Default connections are not really useful in a production system; it's just a long list of stuff that you are probably not going to use.
Airflow 1.10.10 introduced the ability not to create the default connections by setting:
load_default_connections = False in airflow.cfg (see PR)
To give more background: the connection list is where hooks find the information needed in order to connect to a service. It's not related to the backend database. That said, the backend is a database like any other, and if you wish to allow hooks to interact with it you can define it in the list like any other connection (which is probably why it exists as a default), for example:
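A sketch of redefining airflow_db to point at a PostgreSQL metadata database like the docker-compose one (host, credentials and database name are placeholders):
# connections can also be supplied as environment variables named AIRFLOW_CONN_<CONN_ID>
export AIRFLOW_CONN_AIRFLOW_DB='postgres://airflow:airflow@postgres:5432/airflow'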
I'm trying to add data from a table in a separate database to my script, but I keep getting an error every time.
My script:
connect database "chris.db" .
run chrisf.p
disconnect databse.
The error I'm getting
How can I get round this issue?
Thank you.
The word "database" is not part of the syntax for the CONNECT statement.
CONNECT "chris".
is the correct syntax.
The OpenEdge documentation for CONNECT is here: https://documentation.progress.com/output/OpenEdge117/openedge117/?_ga=2.93982683.75218856.1547464117-1040589272.1546786181#page/dvpin%2Fthe-connect-statement.html
I'm not sure what you are trying to do with:
run chrisf.p disconnect databse.
but that will run an external procedure called "chrisf.p" and pass 2 "compile on the fly" parameters with values of "disconnect" and "databse". (I'm pretty sure that's not really what you intend.)
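Putting the pieces together, the script was presumably meant to be something along these lines (I'm assuming the logical database name ends up as chris; adjust to your actual names):
CONNECT "chris.db".
RUN chrisf.p.
DISCONNECT chris.  /* DISCONNECT takes the logical database name, with no DATABASE keyword */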
Do we have any way to specify a default schema for cataloged DBs in the DB2 client on AIX?
The problem is that when it connects to the DB, it takes the user ID as the default schema, and that's where it fails.
We have too many scripts doing transactions against the DB without specifying a schema in their DB2 SQL statements, so it's not feasible to change the scripts.
Also, we can't create users to match the schema.
You can try to run SET SCHEMA = <your schema>; before executing your queries.
NOTE: I'm not sure whether this works (I don't have a DB2 database at hand at the moment, but it seems like it should), and it may depend on your DB2 version.
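A rough sketch from the CLP on AIX (database, user and schema names are placeholders; within one shell session the CLP keeps the connection, so the schema setting carries over to the later statements):
db2 connect to MYDB user appuser
db2 "SET CURRENT SCHEMA = 'APPSCHEMA'"
db2 "SELECT * FROM mytable"
# the unqualified mytable now resolves to APPSCHEMA.MYTABLE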
You can create a stored procedure that just changes the current schema and then set that SP as the connect procedure. You can test some conditions before making the schema change, for example whether the stored procedure is executed from the AIX server directly with a given user.
You configure the database to run this SP each time a connection is established by modifying the connect_proc database configuration parameter:
http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.config.doc/doc/r0057371.html
http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.dbobj.doc/doc/c0057372.html
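A minimal sketch of such a connect procedure (names are placeholders; as far as I recall, connect_proc requires a procedure without parameters):
-- switches the default schema for every new connection
CREATE OR REPLACE PROCEDURE ADMSCHEMA.SET_DEFAULT_SCHEMA()
LANGUAGE SQL
BEGIN
  SET CURRENT SCHEMA = 'APPSCHEMA';
END
-- then register it as the connect procedure (from the CLP):
-- db2 update db cfg for MYDB using connect_proc ADMSCHEMA.SET_DEFAULT_SCHEMA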
You can create aliases in the new user's schema that point to the tables in the other schema (a short sketch follows the links below). Refer to these links:
http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.sql.ref.doc/doc/r0000910.html
http://bytes.com/topic/db2/answers/181247-do-you-have-always-specify-schema-when-using-db2-clp
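For instance, assuming the connecting user is APPUSER and the tables live under APPSCHEMA (one alias per table):
CREATE ALIAS APPUSER.MYTABLE FOR APPSCHEMA.MYTABLE;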