IBM Data Catalog doesn't allow downloading the Connected Data Asset - watson-knowledge-catalog

IBM Data Catalog doesn't allow downloading the Connected Data Asset.
I created a connected data asset from a dashDB connection -> selecting a table.
I also tried to create a connected data asset from a Cloudant connection -> selecting a document.
I also uploaded a CSV file as a data asset.
None of the above enables the Download button.

Currently the download button is always disabled. We are looking to re-enable it for certain assets soon. More information will be provided when it is available.

Related

How to access on-premises Teradata from Azure Databricks

We need to connect to on-premises Teradata from Azure Databricks.
Is that possible at all?
If yes, please let me know how.
I was looking for this information as well, and I was recently able to access our Teradata instance from Databricks. Here is how I did it.
Step 1. Check your cloud connectivity.
%sh nc -vz 'jdbcHostname' 'jdbcPort'
- 'jdbcHostname' is your Teradata server.
- 'jdbcPort' is your Teradata server's listening port. By default, Teradata listens on TCP port 1025.
Also check out Databricks' best practices on connecting to other infrastructure.
Step 2. Install Teradata JDBC driver.
Teradata Downloads page provides JDBC drivers by version and archive type. You can also check the Teradata JDBC Driver Supported Platforms page to make sure you pick the right version of the driver.
Databricks offers multiple ways to install a JDBC library JAR for databases whose drivers are not available in Databricks. Please refer to the Databricks Libraries documentation to learn more and pick the one that is right for you.
Once installed, you should see it listed in the Cluster details page under the Libraries tab.
terajdbc4.jar    dbfs:/workspace/libs/terajdbc4.jar
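As an optional sanity check (just a sketch, and it assumes the jar is already attached to the cluster), you can ask the driver JVM to load the class through PySpark's py4j gateway from a notebook cell; if the library is missing, this raises an error:
# Hypothetical check: try to load the Teradata JDBC driver class on the driver JVM.
spark._jvm.java.lang.Class.forName("com.teradata.jdbc.TeraDriver")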
Step 3. Connect to Teradata from Databricks.
You can define some variables so that these connections can be created programmatically. Since my instance required LDAP, I added LOGMECH=LDAP to the URL; without LOGMECH=LDAP, it returns a "username or password invalid" error message.
(Replace the placeholder values with the values from your environment.)
driver = "com.teradata.jdbc.TeraDriver"
url = "jdbc:teradata://Teradata_database_server/Database=Teradata_database_name,LOGMECH=LDAP"
table = "Teradata_schema.Teradata_tablename_or_viewname"
user = "your_username"
password = "your_password"
Now that the connection variables are specified, you can create a DataFrame. You can also explicitly set a particular schema if you already have one defined. Please refer to the Spark SQL Guide for more information.
Now, let’s create a DataFrame in Python.
My_remote_table = spark.read.format("jdbc")\
.option("driver", driver)\
.option("url", url)\
.option("dbtable", table)\
.option("user", user)\
.option("password", password)\
.load()
Now that the DataFrame is created, it can be queried. For instance, you can select particular columns and display them within Databricks.
display(My_remote_table.select("EXAMPLE_COLUMN"))
Step 4. Create a temporary view or a permanent table.
My_remote_table.createOrReplaceTempView("YOUR_TEMP_VIEW_NAME")
or
My_remote_table.write.format("parquet").saveAsTable("MY_PERMANENT_TABLE_NAME")
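Either way, the result can then be queried with Spark SQL. A minimal example, reusing the temp view name above:
display(spark.sql("SELECT * FROM YOUR_TEMP_VIEW_NAME LIMIT 10"))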
Steps 3 and 4 can also be combined if the intention is simply to create a table in Databricks from Teradata. Check out the Databricks documentation on SQL Databases Using JDBC for other options.
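For example, here is a minimal sketch of that combined form, reusing the connection variables from Step 3 (the target table name is a placeholder):
# Read from Teradata over JDBC and persist directly as a Databricks table in one chain.
spark.read.format("jdbc")\
.option("driver", driver)\
.option("url", url)\
.option("dbtable", table)\
.option("user", user)\
.option("password", password)\
.load()\
.write.format("parquet")\
.saveAsTable("MY_PERMANENT_TABLE_NAME")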
Here is a link to the write-up I published on this topic.
Accessing Teradata from Databricks for Rapid Experimentation in Data Science and Analytics Projects
If you create a virtual network that can connect to on-premises resources, then you can deploy your Databricks instance into that VNet. See https://docs.azuredatabricks.net/administration-guide/cloud-configurations/azure/vnet-inject.html.
I assume that there is a Spark connector for Teradata. I haven't used it myself, but I'm sure one exists.
You can't. If you run Azure Databricks, all the data needs to be stored in Azure. But you can call the data from Teradata using a REST API and then save the data in Azure.

Decryption issue when running Presto queries in EMR for data encrypted using an AWS client-side master key

I have used your latest script, which successfully installs Presto server (version 0.99) and Java 8 on an Amazon EMR instance. My data files are located in an S3 bucket and are encrypted with a client-side, customer-managed key. When I create a Hive table that references those encrypted data files in S3, Hive can successfully decrypt the records and display them in the console. However, when viewing the same external table from the Presto command line interface, the data is displayed in its encrypted form. I have looked at the link given in https://prestodb.io/docs/current/release/release-0.57.html and added those properties to my hive.properties file, which now looks like this:
hive.s3.connect-timeout=2m
hive.s3.max-backoff-time=10m
hive.s3.max-error-retries=50
hive.metastore-refresh-interval=1m
hive.s3.max-connections=500
hive.s3.max-client-retries=50
connector.name=hive-hadoop2
hive.s3.socket-timeout=2m
hive.s3.aws-access-key=***
hive.s3.aws-secret-key=**
hive.metastore.uri=thrift://localhost:9083
hive.metastore-cache-ttl=20m
hive.s3.staging-directory=/mnt/tmp/
hive.s3.use-instance-credentials=true
Any help on how to decrypt the files when using the Presto CLI will be much appreciated.
We will follow up in the issue you filed: https://github.com/facebook/presto/issues/2945

Access 2010 Cannot share database across network

I have a Microsoft Access 2010 database in the .accdb format, FixList.accdb.
It has one table and one form that I want a small number of users to access at the same time.
I have split the database, so that the back end is in a different folder to the front end.
Finally, I have gone to Options and selected the following:
- Default Open Mode = Shared
- Default Record Locking = No locks
- Open Databases by using record-level locking (NOT ticked)
It opens fine when one user opens the database, but when a second user double-clicks the Access file to open it, the following message appears: "You do not have exclusive access to the database at this time. If you proceed to make changes, you may not be able to save them later." My question is, what other change(s) can I make to this database so that the error message above does not appear when more than one person opens the file?
Do NOT have more than one concurrent user open the same copy of the front-end (e.g., by having all users open the copy from a folder on the server). Each user must have their own local copy of the front-end .accdb/.accde file.

What would be the best practice for downloading all the files from a directory using SFTP

I would like to implement the following functionality:
- downloading all the files from a specified remote directory to a local directory.
- after downloading all the files, I need a list file which contains all the downloaded files.
(I only want this list file when all the files have been downloaded successfully.)
Point 1:
Let's say we have around 10 files in the remote directory.
I can use an int-sftp:inbound-channel-adapter component to download all the files, but 10 poll cycles are needed to download all of them, since the inbound component is only able to download 1 file per poll request.
Spring Integration creates 10 File messages one by one.
Questions:
How can I identify the last file (message) received from the FTP server?
I don't want to let users access the list file until all the files from the FTP server have been successfully received.
How can I achieve this?
I can write file names into a list file using the int-file:outbound-channel-adapter, but users could read temporary information from that file before the download process is finished.
How can I trigger an event when all the files which are on the FTP server have been downloaded?
Thanks for your advice,
Ferenc
First of all, this isn't correct:
the inbound component is only able to download 1 file per poll request
You can configure it to download an unlimited number of files during a single poll with max-messages-per-poll=-1. Anyway, it is the default option on <poller>.
That said, if your case really is to download one file per poll, you can go ahead with those requirements.
Since any messaging system tries to follow a stateless paradigm, it is normal that one message doesn't know anything about another, and so they don't impact each other. The asynchronous scenario is the best fit for messaging: it means we can process the second message more quickly than the first one.
Your requirement is interesting enough, and I won't dare to call it strange, because any business case may have its place.
Since you are going to process several downloaded files as one group, there will need to be some marker on the remote server. It could be a timeframe which we can extract from the file timestamps, or a marker file stored on the remote server to indicate that a set of files is finished and that you can process their local versions from your application. It would be great if that marker file could contain a list of the file names in that group.
Otherwise we don't have any hook to group the messages for those files.
On the other hand, you can consider using <int-sftp:outbound-gateway> with the MGET command: http://docs.spring.io/spring-integration/docs/latest-ga/reference/html/sftp.html#sftp-outbound-gateway

Read connection settings from a text file

I am developing a transformation (in Pentaho 4.4.0) which basically reads data from one Oracle DB (11g), applies some transformations, and loads the data into another Oracle DB. Now, for a DB table input/output step, when I need to select a connection I have to select it from a drop-down menu in 'Edit Step'. When I edit the connection, it asks me for settings such as host name, database name, port number, user name, and password.
What I want is to create a text file called 'Pentaho_connection_properties' in some directory on my machine where I will save all this info, so that as soon as I choose a connection name from the connection drop-down menu, Pentaho automatically reads the file and populates the settings corresponding to that connection name. The purpose is to get rid of the manual process of entering the settings again and again for multiple uses of the same DB.
Please let me know how this can be done. I would appreciate it if you could be a little explicit, since I am new to Pentaho.
Thanks
Switch to using database repositories. Connection definitions are shared throughout an entire database repository.
