How to access on-premises Teradata from Azure Databricks - teradata

We need to connect to on-premises Teradata from Azure Databricks.
Is that possible at all?
If yes, please let me know how.

I was looking for this information as well, and I recently managed to access our Teradata instance from Databricks. Here is how I did it.
Step 1. Check your cloud connectivity.
%sh nc -vz 'jdbcHostname' 'jdbcPort'
- 'jdbcHostname' is your Teradata server.
- 'jdbcPort' is your Teradata server's listening port. By default, Teradata listens on TCP port 1025.
Also check out Databricks' best practices on connecting to other infrastructure.
Step 2. Install Teradata JDBC driver.
The Teradata Downloads page provides JDBC drivers by version and archive type. You can also check the Teradata JDBC Driver Supported Platforms page to make sure you pick the right version of the driver.
Databricks offers multiple ways to install a JDBC library JAR for databases whose drivers are not available in Databricks. Please refer to the Databricks Libraries documentation to learn more and pick the approach that is right for you.
Once installed, you should see it listed in the Cluster details page under the Libraries tab.
terajdbc4.jar   dbfs:/workspace/libs/terajdbc4.jar
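For example, one way to stage the JAR at that DBFS path from a notebook is dbutils.fs.cp. A minimal sketch, assuming the JAR has already been uploaded to the driver's local filesystem (the local path is a placeholder); you then attach the DBFS copy as a cluster library:
# Copy the Teradata JDBC driver JAR to DBFS so it can be attached as a cluster library.
dbutils.fs.cp("file:/tmp/terajdbc4.jar", "dbfs:/workspace/libs/terajdbc4.jar")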
Step 3. Connect to Teradata from Databricks.
You can define a few variables to programmatically create these connections. Since my instance required LDAP, I added LOGMECH=LDAP to the URL; without it, I got a "username or password invalid" error message.
(Replace the italicized text with the values from your environment.)
driver = "com.teradata.jdbc.TeraDriver"
url = "jdbc:teradata://Teradata_database_server/Database=Teradata_database_name,LOGMECH=LDAP"
table = "Teradata_schema.Teradata_tablename_or_viewname"
user = "your_username"
password = "your_password"
Now that the connection variables are specified, you can create a DataFrame. You can also explicitly specify a schema for the DataFrame if you already have one defined. Please refer to the Spark SQL Guide for more information.
Now, let’s create a DataFrame in Python.
My_remote_table = spark.read.format("jdbc") \
  .option("driver", driver) \
  .option("url", url) \
  .option("dbtable", table) \
  .option("user", user) \
  .option("password", password) \
  .load()
Now that the DataFrame is created, it can be queried. For instance, you can select particular columns and display them within Databricks.
display(My_remote_table.select("EXAMPLE_COLUMN"))
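If you only need a few columns, you can also push the selection down to Teradata instead of pulling the whole table first. A minimal sketch, assuming Spark 2.4 or later, where the JDBC source accepts a query option in place of dbtable:
# Push the projection down to Teradata so only EXAMPLE_COLUMN crosses the wire.
# Note: the "query" and "dbtable" options are mutually exclusive.
pushed_down = spark.read.format("jdbc") \
  .option("driver", driver) \
  .option("url", url) \
  .option("query", "SELECT EXAMPLE_COLUMN FROM Teradata_schema.Teradata_tablename_or_viewname") \
  .option("user", user) \
  .option("password", password) \
  .load()

display(pushed_down)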
Step 4. Create a temporary view or a permanent table.
My_remote_table.createOrReplaceTempView("YOUR_TEMP_VIEW_NAME")
or
My_remote_table.write.format("parquet").saveAsTable("MY_PERMANENT_TABLE_NAME")
Steps 3 and 4 can also be combined if the intention is simply to create a table in Databricks from Teradata, as in the sketch below. Check out the Databricks documentation SQL Databases Using JDBC for other options.
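For example, a minimal sketch of that combined read-and-save, reusing the variables from Step 3:
# Read from Teradata and persist the result as a table in one chain.
spark.read.format("jdbc") \
  .option("driver", driver) \
  .option("url", url) \
  .option("dbtable", table) \
  .option("user", user) \
  .option("password", password) \
  .load() \
  .write.format("parquet") \
  .saveAsTable("MY_PERMANENT_TABLE_NAME")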
Here is a link to the write-up I published on this topic.
Accessing Teradata from Databricks for Rapid Experimentation in Data Science and Analytics Projects

If you create a virtual network that can connect to on-premises resources, then you can deploy your Databricks workspace into that VNet. See https://docs.azuredatabricks.net/administration-guide/cloud-configurations/azure/vnet-inject.html.
I assume there is a Spark connector for Teradata. I haven't used it myself, but I'm sure one exists.

You can't. If you run Azure Databricks, all the data needs to be stored in Azure. But you can call the data from Teradata using a REST API and then save the data in Azure.

Related

Snowflake ODBC works with Azure and fails with AWS

I created an Azure Snowflake trial account and an ODBC DSN, and it works.
Then I had to create a Snowflake trial in AWS to use the Snowflake training.
When creating a DSN, it fails with this error: Incorrect username or password was specified.
For Azure, I use ..snowflakecomputing.com, and this works.
For AWS, I use .snowflakecomputing.com, and I get the user error.
I tried other combinations, but then I always get a host-unresolved error.
..snowflakecomputing.com
.sg..aws.snowflakecomputing.com
Thanks for hints
As per https://docs.snowflake.com/en/user-guide/organizations-connect.html, your Snowflake URL / server name is in this format:
https://<account_locator>.<region>.<cloud>.snowflakecomputing.com
The cloud is not needed if you are using AWS, but it is if you are using Azure or GCP. The region is not needed for us-east-1 but otherwise must be specified.
Now, it doesn't matter where you are trying to connect to Snowflake from - it only matters where your Snowflake instance is.
So, if you made your Snowflake instances in AWS it is the exact same URL / server name whether you connect to it via the web console, a command-line tool, ODBC from an AWS application, JDBC from an Azure application, or anything else.
Thus, if you made the Snowflake account in AWS you will use something like
https://FV23210.ap-southeast-2.snowflakecomputing.com
no matter where you call it from; if you made the Snowflake account in Azure you will use something like
https://FV23210.ap-southeast-2.azure.snowflakecomputing.com
no matter where you call it from.
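To sanity-check the server name outside the DSN dialog, here is a minimal sketch in Python with pyodbc. It assumes the Snowflake ODBC driver registers itself as SnowflakeDSIIDriver (check your ODBC administrator for the exact name) and uses a made-up account locator and region:
import pyodbc

# DSN-less connection; Server is the same host name you would enter in the DSN
# dialog, without the https:// prefix.
conn_str = (
    "Driver={SnowflakeDSIIDriver};"
    "Server=FV23210.ap-southeast-2.snowflakecomputing.com;"  # AWS account: no cloud segment
    "UID=my_user;"
    "PWD=my_password;"
)
conn = pyodbc.connect(conn_str)
print(conn.cursor().execute("SELECT CURRENT_REGION()").fetchone())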

Credentials for AWS Athena ODBC connection

I want to access AWS Athena in Power BI with ODBC. I used the ODBC driver (1.0.3) that Amazon provides:
https://docs.aws.amazon.com/de_de/athena/latest/ug/connect-with-odbc.html
To access the AWS service, I use user=YYY and password=XXX. To access the relevant data, our administrator created a role "ExternalAthenaAccessRole#99999".
99999 is the ID of the account where Athena runs.
To use the ODBC driver in Power BI, I created the following connection string:
Driver=Simba Athena ODBC Driver;AwsRegion=eu-central-1;S3OutputLocation=s3://query-results-bucket/testfolder;AuthenticationType=IAM Credentials;
But when I enter the user XXX with the password YYY, I get the message "We couldn't authenticate with the credentials provided. Please try again.".
Normally I would think that I must include the role "ExternalAthenaAccessRole#99999" in the connection string, but I couldn't find a parameter for it in the documentation:
https://s3.amazonaws.com/athena-downloads/drivers/ODBC/SimbaAthenaODBC_1.0.3/Simba+Athena+ODBC+Install+and+Configuration+Guide.pdf
Can anybody help me how I can change the connection string so that I can access the data with the ODBC driver in Power BI?
TL;DR:
When using secret keys, do not specify "User / password"; instead, always click on "default credentials" in Power BI to force it to use the local AWS configuration (e.g. C:/...$USER_HOME/.aws/credentials).
Summarized Guide for newbies:
Prerequisites:
The AWS CLI installed locally on your laptop. If you don't have it, just download the MSI installer from here:
https://docs.aws.amazon.com/cli/latest/userguide/install-windows.html
Note: this quick guide only covers configuring the connection using AWS access keys, not federating the credentials through any other security layer.
Configure your AWS credentials locally.
From the Windows command prompt (cmd), execute: aws configure
Enter your AWS Access Key ID, Secret Access Key, and default region; for example, "eu-west-1" for Ireland.
You can get these keys from the AWS console: IAM service > Users > select your user > Security > Create/Download Access Keys.
You should never share these keys, and it's highly recommended to rotate them, for example, every month.
Download the Athena ODBC driver:
https://docs.aws.amazon.com/athena/latest/ug/connect-with-odbc.html
Important: if you have 64-bit Power BI, download the 64-bit ODBC driver (and likewise for 32-bit).
Install it on your laptop, where you have Power BI.
Open the Windows ODBC Data Sources, add a User DSN, and select Simba Athena as the driver.
Always use "Default credentials" and not user/password, since it will use the local keys from the aws configure step above.
Configure an S3 bucket for the temporary results. You can use something like: s3://aws-athena-query-results-eu-west-1-power-bi
In the Power BI app, click on Get Data and type ODBC.
Choose "default" credentials, to use the local AWS keys (from the aws configure step), and, optionally, enter a SELECT query.
Click Load to load the data.
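Outside of Power BI, you can verify the same setup with a quick pyodbc test. A minimal sketch assuming the driver name from the connection string above and the "Default Credentials" authentication type from the Simba guide; the bucket is a placeholder:
import pyodbc

# "Default Credentials" makes the driver use the keys stored by `aws configure`
# (~/.aws/credentials) instead of an explicit UID/PWD pair.
conn_str = (
    "Driver=Simba Athena ODBC Driver;"
    "AwsRegion=eu-west-1;"
    "S3OutputLocation=s3://aws-athena-query-results-eu-west-1-power-bi;"
    "AuthenticationType=Default Credentials;"
)
conn = pyodbc.connect(conn_str)
for row in conn.cursor().execute("SELECT 1"):
    print(row)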
Important concern: I'm afraid Power BI will load all the results of the query into local memory. So if, for example, we're bringing in 3 months of data and that is equivalent to 3 GB, then we will consume that much on our local laptop.
Another important concern:
- For security reasons, you'll need to implement KMS encryption keys. Otherwise, the data is transmitted in clear text instead of being encrypted.
Relevant reference (as listed above), where you can find the steps for this entire configuration process in more detail:
- https://s3.amazonaws.com/athena-downloads/drivers/ODBC/Simba+Athena+ODBC+Install+and+Configuration+Guide.pdf
Carlos.

SQL Server 2017 ODBC via Rstudio or R on SSMS gets connected only to master database

I have been working on SQL Server 2017 via R (in RStudio as well as R in SSMS) and I am unable to connect to a specific database. I mention the database name in the connection prompt, but it gets connected only to the master database. Is there something I am missing while connecting?
The syntax I use for connection is:
conn = "Driver={ODBC Driver 13 for SQL Server};server=;Uid=uid; pwd=pwd;Database = mydb"
I am trying to use both RevoScaleR and the odbc package in RStudio to connect to a specific database, but it still gets connected to the master database. Using the RStudio Connections pane, if I try to explore the other databases, it shows only dbo schemas and no other schemas even if they exist. Can someone help me figure out what might have gone wrong?
Most likely the login you use (the uid) is not authorized for that particular database (it is not created as a user in that database).
Some example code you can run in SSMS as, for example, sa:
--switch over to the database in question
USE mydb
GO
CREATE USER uid FOR LOGIN uid;
The above code creates a user in the database in question with the same name as the login.
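As a quick way to confirm which database a session actually lands in (independent of R), here is a minimal sketch in Python with pyodbc; the server name and credentials are placeholders, and the same SELECT DB_NAME() can be run from R or SSMS as well:
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 13 for SQL Server};"
    "Server=my_server;"
    "Database=mydb;"
    "UID=uid;"
    "PWD=pwd;"
)
# Prints the database this session is actually using; if it prints 'master',
# the Database keyword did not take effect or the login is not mapped in mydb.
print(conn.cursor().execute("SELECT DB_NAME()").fetchone()[0])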
Hope this helps!

Connecting to DSN created by SQLite driver

How do I connect to a DSN created by the SQLite driver using SQL Anywhere APIs from C++ code?
I am using db_string_connect() to connect to Sybase Adaptive Server Anywhere. I want to use the same function to connect to the DSN created by the SQLite driver as well, but the db_string_connect() API returns SQLCODE -103 ["You supplied an invalid user ID or an incorrect password."].
I have this somewhat weird requirement because I want to abstract the connection to different databases at the ODBC layer. The code to connect to Sybase is already written, and I want to minimize the changes to it. I hope I am making sense.
Thanks.
You will not be able to use a function from the SQL Anywhere client library to connect directly to some other database. Typically, if you need to connect to and manipulate different types of database systems, you have to introduce a database abstraction layer that sits between the vendor-specific client libraries and your code. This could be something you write yourself, or you could use an existing one.
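For illustration, connecting through the ODBC driver manager rather than a vendor client library looks like this. A minimal sketch in Python with pyodbc and a hypothetical DSN name, just to show the idea; the C/C++ ODBC API (SQLDriverConnect and friends) follows the same pattern:
import pyodbc

# "MySqliteDsn" is a hypothetical User/System DSN created with the SQLite ODBC driver.
# Because the call goes through the ODBC driver manager, the same code works against
# any DSN, regardless of which vendor's driver is behind it.
conn = pyodbc.connect("DSN=MySqliteDsn;")
for row in conn.cursor().execute("SELECT 1"):
    print(row)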

Pull Sybase data into SQL Server

I have an ASP.NET app that uses a SQL Server database. I now need to pull data from Sybase ASE into that SQL Server database for my app to consume, and I'm not having any success with my ideas.
Has anyone done this? Any ideas/suggestions/tips?
You can configure a linked server from SQL Server to Sybase. It should be fairly vanilla using the Sybase provider on the MS side.
Okay, I've finally (through lame trial and error) found out how to link my Sybase ASE (12.5) server to my SQL Server (2008), which allows the integration I want. Here's roughly how I did it:
Logged in to Sybase ASE OLE DB Configuration Manager (this is like the Sybase version of Windows' ODBC Data Sources) and added an OLE DB data source. I believe you must be an admin on the PC to do this.
In SQL Server 2008 Management Studio, went to Server Objects > Linked Servers. Right-click and select "New Linked Server".
In the Linked Server Properties, I set the following properties:
General:
--Linked server: the name of your linked server as you want it to appear in your linked server list
--Provider: Select Sybase ASE OLE DB Provider from the dropdown list.
--Product name: The exact name of the OLE DB data source you just created in Sybase ASE OLE DB Configuration Manager.
--Data source: Same as Product name.
--Provider string: I left this blank
--Location: I left this blank
--Catalog: The default database (master or whatever) to log on to.
Security:
--You need to map a valid SQL Server logon to a valid Sybase logon. I did not use impersonation (which does a credentials pass-thru).
--I chose "Be made without using a security context" for my connection.
Server Options:
--All the defaults worked for me.
Throughout, the standard SQL Server help worked fairly well as a guide. Though that's not always true, F1 was my friend here.
I can now do distributed queries, DTS or SSIS packages, and use SSRS. This takes a lot of the suck out of Sybase ASE.
Of course, the above can also be done in T-SQL using sp_addlinkedserver, but the GUI is more comfortable for a lowly dev like me.
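For reference, once the linked server exists, a distributed query is just a four-part name (linked_server.database.owner.table). A minimal sketch, driven here from Python with pyodbc purely for illustration; the driver, server, linked server, and table names are placeholders:
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 13 for SQL Server};"
    "Server=my_sql_server;"
    "Database=master;"
    "Trusted_Connection=yes;"
)
# Four-part name: <linked_server>.<database>.<owner>.<table>
rows = conn.cursor().execute(
    "SELECT TOP 10 * FROM SYBASE_LINK.mydb.dbo.my_table"
).fetchall()
for row in rows:
    print(row)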
Use Management Studio or Enterprise Manager to import the data using the data import wizard. That should be it; just make sure you pick the right data provider in the wizard and you should be good to go.
If you want this to be a live feed, create a small Windows service to manage the exchange of information. It should be relatively simple to do, just a little bit of legwork on your end. If you are averse to that, there are plenty of off-the-shelf solutions that can do this for you.
The question is a little vague on specifics:
Is this a one-time conversion or part of a repeated process?
Is the source machine "reachable" from your destination machine (can you connect the two, or do you need to read in files)?
With most conversions there are two parts:
Physically getting data from the source into the destination.
Mapping data from the source to the destination tables.
It is hard to make any recommendations without more info. What would be fine for a one-time conversion would not work if you need to read in data all day, every day. Also, if the source database cannot be connected to and you have to pass files, the methods change.
