Airflow: SQL to GCS error with NULL datetime fields

I'm using the MsSqlToGoogleCloudStorageOperator operator to extract data from my MSSQL server DB, and I noticed that if I query tables that have datetime fields using a SELECT * FROM [table_name], my task fails with:
ERROR - Object of type datetime is not JSON serializable
The only workaround so far has been to specify the fields I want, and even then, if a datetime field is NULL, I get the same error. Any clue on how to fix this?
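For reference, the message is Python's standard json encoder rejecting datetime values; it can be reproduced outside Airflow in a couple of lines (the column name and timestamp are illustrative):

import datetime
import json

# json has no default encoding for datetime objects, which is what the task hits.
json.dumps({"modified_at": datetime.datetime(2021, 1, 1, 12, 0)})
# TypeError: Object of type datetime is not JSON serializable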

This is a bug in the current operator. There is a pending PR that fixes it.
However, you don't need to wait for the release to address the issue.
As a temporary workaround, you can copy the code from the PR into your environment as a custom operator, e.g.
TempMsSqlToGoogleCloudStorageOperator
Once the fix is released, you can remove the custom operator and import the fixed operator from the latest version of the provider (for Airflow >= 2.0.0) or backport provider (for Airflow < 2.0.0).
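If copying the whole PR is more than you want, a lighter-weight variant of the same idea is to subclass the existing operator and make its type conversion emit JSON-safe values. This is only a sketch of that approach, not the PR's code: it assumes the operator exposes a convert_type hook (as the SQL-to-GCS operators in recent provider versions do), and the import path and method signature may need adjusting for your Airflow version.

import datetime
import decimal

# Import path is version dependent; adjust to wherever the operator lives in your install.
from airflow.providers.google.cloud.transfers.mssql_to_gcs import MSSQLToGCSOperator


class TempMsSqlToGoogleCloudStorageOperator(MSSQLToGCSOperator):
    """Temporary operator that converts values to JSON-safe types before encoding."""

    @classmethod
    def convert_type(cls, value, schema_type, **kwargs):
        # NULL columns arrive as None and are already JSON serializable.
        if value is None:
            return value
        # datetime/date objects are what trigger "not JSON serializable";
        # render them as ISO-8601 strings.
        if isinstance(value, (datetime.datetime, datetime.date)):
            return value.isoformat()
        # Decimals are another common non-serializable type coming out of MSSQL.
        if isinstance(value, decimal.Decimal):
            return float(value)
        return value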

Related

Specify a database name in Databricks SQL connection parameters

I am using Airflow 2.0.2 to connect to Databricks using the airflow-databricks-operator. The SQL operator doesn't let me specify the database where the query should be executed, so I have to prefix the table_name with the database_name. I tried reading through the docs of databricks-sql-connector here -- https://docs.databricks.com/dev-tools/python-sql-connector.html -- and still couldn't figure out whether I could give the database name as a parameter in the connection string itself.
I tried setting database/schema/namespace in the **kwargs, but no luck. The query executor keeps saying that the table is not found, because the query keeps getting executed in the default database.
Right now it's not supported. The primary reason is that if you have multiple statements, the connector could reconnect between their execution, and the effect of the USE would be lost. databricks-sql-connector also doesn't allow setting a default database.
For now you can work around that by adding an explicit USE <database> statement to the list of SQL statements to execute (the sql parameter can be a list of strings, not only a single string), as in the sketch below.
P.S. I'll look into it; maybe I'll add setting the default catalog/database in a future version.
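A minimal sketch of that workaround, assuming the DatabricksSqlOperator from the Databricks provider package; the connection ID, endpoint name, and database/table names are placeholders:

from airflow.providers.databricks.operators.databricks_sql import DatabricksSqlOperator

# Prepending an explicit USE makes the second statement run against the right database.
# "databricks_default", "my-endpoint", "my_database" and "my_table" are placeholders;
# depending on the provider version you may pass http_path instead of sql_endpoint_name.
select_from_my_db = DatabricksSqlOperator(
    task_id="select_from_my_db",
    databricks_conn_id="databricks_default",
    sql_endpoint_name="my-endpoint",
    sql=[
        "USE my_database",
        "SELECT * FROM my_table LIMIT 10",
    ],
)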

ImportError: No module named None

*** Test cases ***
TestDB
Connect To Database Using Custom Params None database='TestDB', user='system', password='system', host='10.91.41.101', port=1521
Please help - the error is:
ImportError: No module named None
The error most probably comes from the way you call Connect To Database Using Custom Params - the first argument you're passing, which should be the value for dbapiModuleName, is passed as a string object with the value "None".
If you wanted to pass the None object itself (as it's written in the library's help), it should have been written as ${None} in Robot Framework syntax.
I doubt that's going to work though - the DatabaseLibrary probably needs an actual DB module identifier. So if you are using Postgres, for example, you'd call it with "psycopg2":
Connect To Database Using Custom Params psycopg2 database='TestDB', user='username', password='mypass', host='10.1.1.2', port=1521
Keep in mind you'd need the DB driver already installed through pip - psycopg2 in the case of this example.
P.S. Please don't paste actual credentials on SO.
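For context on where the message comes from: the library tries to import whatever string it receives as dbapiModuleName, so a literal "None" produces exactly this error. A quick Python illustration (assuming the library resolves the module name with importlib or equivalent):

import importlib

# Passing the string "None" instead of ${None} makes the library try to import
# a module literally named "None", which does not exist.
try:
    importlib.import_module("None")
except ImportError as exc:
    print(exc)  # "No module named 'None'"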
I assume your question should have been posted something like this...
Issue
When attempting to execute the following test case in Robot Framework, I receive the following error: ImportError: No module named None
Here is the test case in question:
*** Test Cases ***
TestDB
Connect To Database Using Custom Params None database='TestDB', user='system', password='system', host='10.91.41.101', port=1521
If so, your issue may be as simple as spacing. Robot Framework can accept pipes as delimiters, but if you choose to use spaces, you must use 2 or more.
Based on your copy/paste, it looks like you have only one space between Connect To Database Using Custom Params and None (which, I'm assuming, you're passing as the DB API Python module to mean the system default - not sure that's recommended or supported). Make sure you have at least two spaces (I generally aim for 4 unless I have a lot of parameters) between keywords and their parameters.
So:
Make sure you have at least two spaces between the keyword and its parameters; see the reference for the keyword in question.
Verify whether you actually need to specify the Python database driver for the database you are using. Based on the port you've specified (1521), I'm guessing it's an Oracle database, so check the list of Python Oracle drivers.
I had a similar error and read up on it for hours, not realizing I hadn't created a .env file yet. Credit to a friend who brought me to this page, which gave me the hint on what I was missing.
I created a .env file in my root folder (where the manage.py file is), configured my database settings there, and voilà. Thanks Suraj

ORA-14102: only one LOGGING or NOLOGGING clause may be specified

While importing an Oracle schema from a dump file, I am getting the error below while creating tables:
ORA-14102: only one LOGGING or NOLOGGING clause may be specified
I see this error for several tables when creating them from the dump file.
How can I enable or disable LOGGING/NOLOGGING at the schema level before I start the import?
When performing an Oracle database export with expdp on Oracle 11gR2 (11.2.0.1) and then importing it into the database with impdp, the following error messages appear in the import log file:
ORA-39083: Object type INDEX failed to create with error:
ORA-14102: only one LOGGING or NOLOGGING clause may be specified
This is a known Oracle 11gR2 issue. The problem is that DBMS_METADATA.GET_DDL returns invalid syntax for the index being created, so the generated DDL contains both the NOLOGGING and LOGGING keywords. Download and apply Patch 8795792 from Oracle to resolve this issue.

Querying a linked SQLite DB in SSMS

I'm trying to use a SQLite database as a linked server in SSMS. I've managed to get the ODBC driver installed and a linked server created, but I can't seem to find a way to get queries to work. I think it's just a matter of not understanding the proper syntax for it. Here's what I've tried:
exec sp_tables_ex 'SQLITE'
This works as expected, showing all of the tables in the database.
select * from SQLITE.[default].dbo.TRANSLATION
This fails with the following error message:
Invalid use of schema or catalog for OLE DB provider "MSDASQL" for
linked server "SQLITE". A four-part name was supplied, but the
provider does not expose the necessary interfaces to use a catalog or
schema.
Taking a clue from that, I tried removing the schema:
select * from SQLITE.[default].TRANSLATION
But this gives me another error message:
Invalid object name 'SQLITE.default.TRANSLATION'.
Likewise, the following give the same error (with slight changes for the object name):
select * from SQLITE.[default].TRANSLATION
select * from SQLITE.dbo.TRANSLATION
select * from SQLITE.TRANSLATION
Any ideas? I'm not quite sure what to try from here.

When I use Fixture with SqlAlchemy in my unit tests, why am I unable to confirm changes to the database during test?

I am testing a message processer that uses SqlAlchemy (v0.7.4). In my test, I am using Fixture (v1.4) with Sqlite to set up and tear down a temporary database. My fixture data includes a file table with a status field that should get updated when the processor runs.
I have confirmed that the test, the processor being tested, and the fixture are all sharing the same database session.
I query the status field on the file record before the processor is run and afterwards. The value should change (from an int representing "Processing" to "Complete"). I have added debug code within the processor to verify that the field is being updated with the correct new status value. I am also able to independently verify that the processor runs successfully by checking the contents of an output file it produces. However, when I query the status at the end of my test using my test's database session, it is always the same as the value at the beginning.
I have tried explicitly committing and flushing the session before the final status query. Nothing works. Any ideas?
The issue here was twofold: 1) my test was using a temporary SQLite database in memory, and 2) within my functional test, the processor was being spawned in a new process.
So even though I had hacked the processor class to use the same database session as the test itself, the processor ran in a separate memory space, and the in-memory database it was updating was not the one my test code was querying, so its updates were invisible to the test code trying to verify the results.
Solution: set up the temporary SQLite database as a file rather than in memory, as in the sketch below.
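A minimal sketch of that change with SQLAlchemy; the temp-file handling is illustrative, not the original test code:

import os
import tempfile

from sqlalchemy import create_engine

# An in-memory database exists only inside the process that created it, so a
# processor spawned in a separate process can never see the test's data:
#   engine = create_engine("sqlite://")

# A file-backed database is shared by both processes (the path is illustrative).
fd, db_path = tempfile.mkstemp(suffix=".db")
os.close(fd)
engine = create_engine("sqlite:///" + db_path)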
Additionally, I discovered that when Fixture does its teardown, it throws an error if your fixture data isn't in the same state it was set up in (noted at the end of this section of the documentation). But that was a separate issue.
