specify a database name in databricks sql connection parameters - airflow

I am using Airflow 2.0.2 to connect to Databricks with the airflow-databricks-operator. The SQL operator doesn't let me specify the database in which the query should be executed, so I have to prefix the table name with the database name. I also read through the databricks-sql-connector documentation at https://docs.databricks.com/dev-tools/python-sql-connector.html and still couldn't figure out whether I could pass the database name as a parameter in the connection string itself.
I tried setting database/schema/namespace in the **kwargs, but no luck. The query executor keeps reporting that the table is not found, because the query keeps being executed in the default database.

Right now it's not supported. The primary reason is that if you have multiple statements, the connector could reconnect between their executions, and the result of the use statement would be lost. databricks-sql-connector also doesn't allow setting the default database.
For now you can work around that by adding an explicit use <database> statement to the list of SQL statements to execute (the sql parameter can be a list of strings, not only a single string).
P.S. I'll take a look; maybe I'll add setting of the default catalog/database in one of the next versions.
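
The workaround above could be sketched roughly as follows. This is only an illustration: the helper with_database and the names my_database and my_table are hypothetical, and the resulting list is assumed to be passed as the operator's sql argument.

```python
from typing import List, Union


def with_database(database: str, sql: Union[str, List[str]]) -> List[str]:
    """Return the statements with an explicit USE statement in front.

    `database` is assumed to be a trusted identifier (it is not escaped here).
    """
    # Accept either a single statement or a list, as the sql parameter does.
    statements = [sql] if isinstance(sql, str) else list(sql)
    return [f"USE {database}"] + statements


# Hypothetical database and table names, for illustration only.
statements = with_database("my_database", "SELECT * FROM my_table LIMIT 10")

# `statements` would then be handed to the operator, e.g. sql=statements.
print(statements)
```

Building the list in a small helper keeps the USE statement and the queries together in one sql argument, so no table name needs a database prefix.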

Related

How to create Hive connection in airflow to specific DB?

I am trying to create a Hive connection in Airflow that points to a specific database. I looked for the relevant params in HiveHook and tried the following in the extra options:
{"db":"test_base"}, {"schema":"test_base"} and {"database":"test_base"}
But none of them works; the connection always points to the default db.
Could someone point out which parameters can be passed in extra_options?

Error executing SQLite command: 'too many SQL variables' Azure PullAsync limitation in xamarin form android

This looks like a limitation of the Microsoft Azure mobile client's offline sync service on Android.
In my Xamarin Forms application I have 40 Azure tables to sync with the remote. Whenever a particular request (_abcTable.PullAsync) returns a large number of records, such as 5K, PullAsync throws an exception saying: Error executing SQLite command: 'too many SQL variables'.
The PullAsync URL looks like this: https://abc-xyz.hds.host.com/AppHostMobile/tables/XXXXXXResponse?$filter=(updatedAt ge datetimeoffset'2017-06-20T13:26:17.8200000%2B00:00')&$orderby=updatedAt&$skip=0&$top=5000&ProjectId=2&__includeDeleted=true.
In Postman I can see the same URL returning the 5K records, and it works fine on an iPhone device as well; it fails only on Android.
If I change the "top" parameter in the above PullAsync request from 5000 to 500, it works fine on Android but takes more time. Are there any alternatives that don't hurt performance?
Package version:
Microsoft.Azure.Mobile.Client version="3.1.0"
Microsoft.Azure.Mobile.Client.SQLiteStore version="3.1.0"
Microsoft.Bcl version="1.1.10"
Microsoft.Bcl.Build version="1.0.21"
SQLite.Net.Core-PCL version="3.1.1"
SQLite.Net-PCL version="3.1.1"
SQLitePCLRaw.bundle_green version="1.1.2"
SQLitePCLRaw.core version="1.1.2"
SQLitePCLRaw.lib.e_sqlite3.android version="1.1.2"
SQLitePCLRaw.provider.e_sqlite3.android version="1.1.2"
Please let me know if I need to provide more information. Thanks
Error executing SQLite command: 'too many SQL variables'
As far as I understand, you may be hitting the SQLite limit described under Maximum Number Of Host Parameters In A Single SQL Statement, which reads as follows:
A host parameter is a place-holder in an SQL statement that is filled in using one of the sqlite3_bind_XXXX() interfaces. Many SQL programmers are familiar with using a question mark ("?") as a host parameter. SQLite also supports named host parameters prefaced by ":", "$", or "@" and numbered host parameters of the form "?123".
Each host parameter in an SQLite statement is assigned a number. The numbers normally begin with 1 and increase by one with each new parameter. However, when the "?123" form is used, the host parameter number is the number that follows the question mark.
SQLite allocates space to hold all host parameters between 1 and the largest host parameter number used. Hence, an SQL statement that contains a host parameter like ?1000000000 would require gigabytes of storage. This could easily overwhelm the resources of the host machine. To prevent excessive memory allocations, the maximum value of a host parameter number is SQLITE_MAX_VARIABLE_NUMBER, which defaults to 999.
The maximum host parameter number can be lowered at run-time using the sqlite3_limit(db,SQLITE_LIMIT_VARIABLE_NUMBER,size) interface.
I referred to Debugging the Offline Cache and initialized my MobileServiceSQLiteStore as follows:
var store = new MobileServiceSQLiteStoreWithLogging("localstore.db");
I logged all the SQL commands that are executed against the SQLite store when PullAsync is invoked. I found that, after successfully retrieving the response from the mobile backend via the following request:
https://{your-app-name}.azurewebsites.net/tables/TodoItem?$filter=((UserId%20eq%20null)%20and%20(updatedAt%20ge%20datetimeoffset'1970-01-01T00%3A00%3A00.0000000%2B00%3A00'))&$orderby=updatedAt&$skip=0&$top=50&__includeDeleted=true
Microsoft.Azure.Mobile.Client.SQLiteStore.dll executes the following SQL statements to update the related local table:
BEGIN TRANSACTION
INSERT OR IGNORE INTO [TodoItem] ([id]) VALUES (@p0),(@p1),(@p2),(@p3),(@p4),(@p5),(@p6),(@p7),(@p8),(@p9),(@p10),(@p11),(@p12),(@p13),(@p14),(@p15),(@p16),(@p17),(@p18),(@p19),(@p20),(@p21),(@p22),(@p23),(@p24),(@p25),(@p26),(@p27),(@p28),(@p29),(@p30),(@p31),(@p32),(@p33),(@p34),(@p35),(@p36),(@p37),(@p38),(@p39),(@p40),(@p41),(@p42),(@p43),(@p44),(@p45),(@p46),(@p47),(@p48),(@p49)
UPDATE [TodoItem] SET [Text] = @p0,[UserId] = @p1 WHERE [id] = @p2
UPDATE [TodoItem] SET [Text] = @p0,[UserId] = @p1 WHERE [id] = @p2
.
.
COMMIT TRANSACTION
As far as I understand, you could try setting MaxPageSize to at most 999. Note that this limitation comes from SQLite, and the update processing is handled automatically by Microsoft.Azure.Mobile.Client.SQLiteStore. So far I haven't found any way to override the processing done by Microsoft.Azure.Mobile.Client.SQLiteStore.
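
To illustrate why a smaller page size avoids the error, here is a minimal sketch using Python's built-in sqlite3 module rather than the Azure client. It is not how Microsoft.Azure.Mobile.Client.SQLiteStore works internally; the helper names chunked and insert_ids are made up for the example, and the figure of 999 is taken from the documentation quoted above. The idea is to split the ids into batches so that no single statement binds more host parameters than the limit.

```python
import sqlite3
from typing import Iterator, List

# Default limit quoted above; an assumption about the SQLite build in use.
MAX_VARIABLES = 999


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def insert_ids(conn: sqlite3.Connection, ids: List[str],
               chunk_size: int = MAX_VARIABLES) -> None:
    """Insert ids in batches so each statement stays within the limit."""
    with conn:  # one transaction around all batches
        for chunk in chunked(ids, chunk_size):
            # One "(?)" placeholder per id in this batch.
            placeholders = ",".join("(?)" for _ in chunk)
            conn.execute(
                f"INSERT OR IGNORE INTO TodoItem (id) VALUES {placeholders}",
                chunk,
            )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TodoItem (id TEXT PRIMARY KEY)")
insert_ids(conn, [f"item-{n}" for n in range(5000)])
row_count = conn.execute("SELECT COUNT(*) FROM TodoItem").fetchone()[0]
```

All batches run inside a single transaction, so splitting the insert mostly costs extra statement preparation rather than extra commits.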

Invalid column definition error when using four part name to access Oracle DB as SQL Server linked server

I have set up a linked server in SQL Server 2008 R2 in order to access an Oracle 11g database. The MSDASQL provider is used to connect to the linked server through the Oracle Instant Client ODBC driver. The connection works well when using OPENQUERY with the syntax below:
SELECT *
FROM OPENQUERY(LINKED_SERVER, 'SELECT * FROM SCHEMA.TABLE')
However, when I try to use a four-part name with the syntax below:
SELECT *
FROM LINKED_SERVER..SCHEMA.TABLE
I receive the following error:
Msg 7318, Level 16, State 1, Line 1
The OLE DB provider "MSDASQL" for linked server "LINKED_SERVER" returned an invalid column definition for table ""SCHEMA"."TABLE"".
Does anyone have any insight into what may be causing the four-part name query to fail while the OPENQUERY one works without any problems?
The correct path to follow is to use the OPENQUERY function, because your linked server is Oracle: the four-part name syntax works fine for MSSQL servers, essentially because they understand T-SQL.
With very simple queries, a four-part name can happen to work, but not often in a real scenario. In your case, SELECT * returns all the columns, and one of the column definitions is not compatible with SQL Server. Try another table, or try selecting a single simple column (e.g. a CHAR or a NUMBER); it may work without problems.
In any case, distributed queries can be tricky. The database itself performs some optimizations before executing commands, so it is important for it to know what it can and cannot do. If the DB thinks the linked server is MSSQL, it will take some actions that may not work with Oracle.
When using the four-part name syntax with a linked DB other than MSSQL, you will have other problems as well, for example with database built-in functions (e.g. the Oracle to_date() function will not work because MSSQL would want to use its own convert() function, and so on).
So again, if the linked server is not MSSQL, the right choice is to use OPENQUERY and pass it a query whose syntax is valid in the linked server's SQL dialect.
If you use the OLE DB provider for Oracle, you can query without using OPENQUERY.

specify default schema for a database in db2 client

Is there any way to specify a default schema for cataloged DBs in the db2 client on AIX?
The problem is that when it connects to the DB, it takes the user ID as the default schema, and that's where it fails.
We have too many scripts running transactions against the DB without specifying a schema in their db2 SQL statements, so changing the scripts is not feasible.
Also, we can't create users to match the schema.
You can try running SET SCHEMA=<your schema> ; before executing your queries.
NOTE: I'm not sure this works (I don't have a DB2 database at hand at the moment, but it seems to work), and it may depend on your DB2 version.
You can create a stored procedure that just changes the current schema and then set that SP as the connect procedure. You can test some conditions before making the schema change, for example whether the stored procedure is executed directly from the AIX server by a given user.
You configure the database to use this SP each time a connection is established by modifying connect_proc:
http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.config.doc/doc/r0057371.html
http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.dbobj.doc/doc/c0057372.html
You can create aliases in the new user's schema that point to the tables in the other schema. Refer to these links:
http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.sql.ref.doc/doc/r0000910.html
http://bytes.com/topic/db2/answers/181247-do-you-have-always-specify-schema-when-using-db2-clp

Trace the Cause for update of Sql Table

I have a table Product with a Quantity column. This table gets updated through a .NET application using a stored procedure, based on a flag variable. Now users are reporting that the table is being updated with new values even though the flag variable is not set.
I need to isolate the cause of the issue. How can I check which update, and which application, is modifying this table? I have no idea how to approach it.
What is the best approach to resolve this issue?
Assuming you are using SQL Server:
You can monitor calls to SQL Server using SQL Server Profiler. You can set up a filter to monitor queries affecting the Product table. The log will show what the query looked like, when it was executed, the database user who executed it, the name of the application (if that is specified in the connection string) and a number of other details.
